MCMS 2002 migration to a SharePoint 2013 metadata-driven environment

As any SharePoint enthusiast will acknowledge, migrating an existing website to a SharePoint environment is no easy task. This task becomes increasingly difficult when the website is a non-SharePoint website containing 100,000+ pages and documents. Tagging and organizing content in such a way that users will still be able to find it, either through search or through site navigation, can be very difficult – especially when the source content is unstructured. And finally – how can you ensure that all existing URLs are properly converted to their new ones while maintaining the search ranking in the major search providers?

In this blog post I’ll show you how we managed to overcome these challenges, with examples along the way.

Case explanation

Our customer had a website based on Microsoft Content Management System (MCMS) 2002 SP1, which had been in place for over ten years. In those ten years, the website had been modified to meet all of the customer's requirements, and it worked like a charm. But because MCMS and SQL Server 2000 are no longer supported by Microsoft, the customer had to move to a new platform. Since publishing content through the website is one of the customer's core businesses, they needed a robust platform with extensive publishing capabilities. They chose SharePoint 2013.

Along with the platform change, the new website would also present content to the visitor in a very different way. Instead of having visitors browse articles through a hierarchical structure, we went for a search-driven site built on metadata. This meant that metadata would be of vital importance: without it, how could you find an article?

This presented us with a major challenge. More than ten years of publishing content meant that over 70,000 pages and 30,000 documents had to be migrated to SharePoint 2013 and had to be provided with additional metadata. Our goal was after all to make the content meaningful, relevant and, most of all, easily searchable.

Below, you can read in detail how we approached this challenge.

Exporting the existing content

The first step towards migrating the website was to export the existing content. Because we had to deal with different sorts of content (website pages, documents and images), we decided on a generic approach that would work for all file types. To meet this challenge, we created a custom tool that exported all content to XML, resulting in one XML file per URL. Each XML file contained all the information we needed to create the new content in our SharePoint 2013 environment. All webpages and documents were ordered in a logical folder-based structure, so we could import content based on its year of publication. A webpage published in May 2006 would be converted into an XML file with a folder structure like this: {export location}\{language}\20065\{articlename}.xml . The same principle was applied to documents and images, except that they were placed in an extra subfolder named Documents or Images.
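
As a quick sketch in C#, composing this export location could look like the following (the method and variable names are assumptions; the unpadded {year}{month} folder name follows the example above):

    using System;
    using System.IO;

    public static class ExportPathBuilder
    {
        // Builds {export location}\{language}\{year}{month}\{articlename}.xml,
        // creating the folder if it does not yet exist.
        public static string BuildExportFilePath(string exportLocation, string language,
            DateTime publicationDate, string articleName)
        {
            string folder = Path.Combine(exportLocation, language,
                publicationDate.Year.ToString() + publicationDate.Month);

            Directory.CreateDirectory(folder);
            return Path.Combine(folder, articleName + ".xml");
        }
    }

For example, BuildExportFilePath(@"D:\Export", "en", new DateTime(2006, 5, 12), "some-article") returns D:\Export\en\20065\some-article.xml.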

The XML files created were the only source of input we had for creating the new pages and their properties. That is why we had to make sure that these files contained as much information as possible. Along with the basic properties, such as publication date and title, we also added a keywords property, which was filled with the most important keywords on the page. As well as the keywords, the theme property was also of great importance for the new website. This theme property could be derived from the URL of the existing content. A small snippet of the final XML is displayed below:
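
(The snippet below is a sketch of such an export file; the element names are assumptions for illustration, not the exact export schema.)

    <page>
      <url>/news/2006/some-article.htm</url>
      <title>Some article title</title>
      <publicationDate>2006-05-12</publicationDate>
      <!-- the most important keywords found on the page -->
      <keywords>keyword1;keyword2;keyword3</keywords>
      <!-- derived from the URL of the existing content -->
      <theme>News</theme>
      <content><![CDATA[<p>The original HTML body of the page...</p>]]></content>
    </page>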

Executing the export tool with the settings and configuration described earlier resulted in a total of ~105,000 XML files. These files could then be used by the content creation tool to create new content in our SharePoint 2013 environment.

Creating new content in SharePoint

The next step in the process was to create new content in our SharePoint 2013 environment using the exported XML files. For this process we created custom tooling with a dynamic approach, giving our customer the ability to easily control the output generated by the content creation tool. To achieve this dynamic behavior, we combined two XML files that form the heart of the content creation tool.

The first XML file describes a mapping that, based on the URL property of the export, determines which content type is used for creating the new content, as well as which tag is added for search optimization. Below is a snippet of this XML file:
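
(The snippet below is a sketch; the element and attribute names are assumptions.)

    <urlMappings>
      <!-- Each rule matches part of the old URL and selects the target content type and search tag. -->
      <mapping urlContains="/news/"      contentType="NewsArticlePage"  tag="News" />
      <mapping urlContains="/press/"     contentType="PressReleasePage" tag="Press" />
      <mapping urlContains="/documents/" contentType="Document"         tag="Publication" />
    </urlMappings>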

The second XML file describes which property in the exported XML file is mapped to which SharePoint field. Below is a snippet of this XML file:
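
(Again a sketch; the element, attribute and field names are assumptions.)

    <fieldMappings contentType="NewsArticlePage">
      <!-- Maps a property from the export XML to a SharePoint field on the target content type. -->
      <fieldMapping exportProperty="title"           sharePointField="Title" />
      <fieldMapping exportProperty="publicationDate" sharePointField="ArticleStartDate" />
      <fieldMapping exportProperty="keywords"        sharePointField="Keywords" />
      <fieldMapping exportProperty="theme"           sharePointField="Theme" />
    </fieldMappings>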

The flow of the content creation tool, which is executed for every input XML file, is displayed below:

[Figure: flow of the content creation tool]

First, the content type was chosen based on the URL mapping file shown earlier. Each content type has its own collection of fields that need to be supplied with information. For each field in this collection, we retrieved the appropriate value, based on the field mapping shown in the previous snippet. To read the data from the XML mapping files, we created a generic method, which is displayed in the code sample below. After all field values were retrieved, a new page was created based on the mapped content type, and its fields were set to the retrieved values. The page was then saved and published. Finally, an item was added to the URLRedirect list, which could then be used to perform user redirection as described in the next paragraph.
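
(The sample below is a sketch of such a generic method, assuming the mapping structure from the previous snippet.)

    using System.Linq;
    using System.Xml.Linq;

    public static class MappingReader
    {
        // Returns the value for a SharePoint field by looking up which export
        // property is mapped to it, and then reading that property from the export XML.
        public static string GetFieldValue(XDocument exportXml, XDocument fieldMapping,
            string sharePointFieldName)
        {
            XElement mapping = fieldMapping
                .Descendants("fieldMapping")
                .FirstOrDefault(m => (string)m.Attribute("sharePointField") == sharePointFieldName);

            if (mapping == null)
                return null;

            string exportProperty = (string)mapping.Attribute("exportProperty");

            // Read the corresponding element from the exported page XML.
            XElement value = exportXml.Descendants(exportProperty).FirstOrDefault();
            return value != null ? value.Value : null;
        }
    }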

Search provider friendly URL redirection

Migrating the website into SharePoint 2013 meant that all existing URLs would become invalid, since SharePoint stores pages and documents in a completely different directory structure. This would have been easy enough to handle in the import tooling for all internal links; however, we also had to consider the hundreds of thousands of external links to the website that had been created by various sites across the internet. We had to make sure that all of these links would remain valid, and we wanted to perform the redirection to the new pages in such a way that search providers would maintain the page ranking.

To tackle this problem, we developed a simple, yet very powerful, solution. We created a SharePoint list containing two columns: the old URL and the new URL. As discussed earlier, each time a new page or document was created during the execution of the content creation tool, a new item was added to this list. After the content creation was completed, we had a full list of the URLs to which users should be redirected when they visit an old URL.
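
For illustration, adding such an entry from the content creation tool could look like this (Microsoft.SharePoint types; the list and column names are assumptions):

    using (SPSite site = new SPSite("http://publishing.example.com"))
    using (SPWeb web = site.RootWeb)
    {
        // One entry per migrated page or document: old MCMS URL -> new SharePoint URL.
        SPList redirectList = web.Lists["URLRedirect"];
        SPListItem item = redirectList.AddItem();
        item["OldURL"] = "/news/2006/some-article.htm";   // hypothetical old URL from the export XML
        item["NewURL"] = "/pages/some-article.aspx";      // hypothetical URL of the newly created page
        item.Update();
    }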

To make sure that we could intercept a request before the user was returned a 404 error code, we had to create a custom HttpModule. The downside to this was that HttpModules execute on every request, even if the resulting page would not be a 404 error. To keep the HttpModule as lightweight as possible, we first checked whether the request would end in a 404 status. Because we knew that all existing URLs ended in the .htm extension, this was our second check. Only if both checks passed did we query the URLRedirect list to see whether the user had requested an old URL.

The end result of the HttpModule is shown below. Please note that URLRedirectBE is a business entity class used for working with the URLRedirect list items in a strongly typed manner. It retrieves a SharePoint list item based on the old URL column and maps the field values to public properties.
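
(The module below is a sketch; the ASP.NET event used and the URLRedirectBE lookup helper are assumptions, not the original code.)

    using System;
    using System.Web;

    public class UrlRedirectModule : IHttpModule
    {
        public void Init(HttpApplication application)
        {
            // EndRequest fires after SharePoint has handled the request,
            // so the 404 status code is known at this point.
            application.EndRequest += OnEndRequest;
        }

        private void OnEndRequest(object sender, EventArgs e)
        {
            HttpContext context = ((HttpApplication)sender).Context;

            // Check 1: only act when the request ends in a 404.
            if (context.Response.StatusCode != 404)
                return;

            // Check 2: all legacy MCMS URLs ended in the .htm extension.
            string oldUrl = context.Request.Url.AbsolutePath;
            if (!oldUrl.EndsWith(".htm", StringComparison.OrdinalIgnoreCase))
                return;

            // Look up the new location via the business entity described above
            // (GetByOldUrl is an assumed helper that queries the URLRedirect list).
            URLRedirectBE entry = URLRedirectBE.GetByOldUrl(oldUrl);
            if (entry == null || string.IsNullOrEmpty(entry.NewURL))
                return;

            // Issue a permanent (301) redirect so search providers keep the ranking.
            context.Response.Clear();
            context.Response.StatusCode = 301;
            context.Response.RedirectLocation = entry.NewURL;
        }

        public void Dispose() { }
    }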

By redirecting with a 301 (permanent) status, we not only made sure that any existing ranking with search providers remained in place, but also that, once a search provider has processed the redirect, visitors coming from that provider are sent straight to the new URL and no longer pass through the HttpModule.

To register the new HttpModule in SharePoint, we created a web application scoped Feature with a feature event receiver, which is displayed below.
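
(The receiver below is a sketch that registers the module through SPWebConfigModification; the class, type and assembly names are assumptions.)

    using Microsoft.SharePoint;
    using Microsoft.SharePoint.Administration;

    public class UrlRedirectFeatureReceiver : SPFeatureReceiver
    {
        private const string OwnerKey = "UrlRedirectModule";

        public override void FeatureActivated(SPFeatureReceiverProperties properties)
        {
            SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

            // Add the HttpModule registration to web.config on every server in the farm.
            SPWebConfigModification modification = new SPWebConfigModification
            {
                Name = "add[@name='" + OwnerKey + "']",
                Path = "configuration/system.webServer/modules",
                Owner = OwnerKey,
                Sequence = 0,
                Type = SPWebConfigModification.SPWebConfigModificationType.EnsureChildNode,
                // The type and assembly names below are placeholders for the actual module.
                Value = "<add name='" + OwnerKey + "' type='Example.Web.UrlRedirectModule, Example.Web, " +
                        "Version=1.0.0.0, Culture=neutral, PublicKeyToken=...' />"
            };

            webApp.WebConfigModifications.Add(modification);
            webApp.Update();
            SPWebService.ContentService.ApplyWebConfigModifications();
        }

        public override void FeatureDeactivating(SPFeatureReceiverProperties properties)
        {
            SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

            // Remove the web.config modifications owned by this feature again.
            for (int i = webApp.WebConfigModifications.Count - 1; i >= 0; i--)
            {
                if (webApp.WebConfigModifications[i].Owner == OwnerKey)
                    webApp.WebConfigModifications.RemoveAt(i);
            }

            webApp.Update();
            SPWebService.ContentService.ApplyWebConfigModifications();
        }
    }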

The result is a user-friendly redirection mechanism that preserves the existing pages' search provider ranking.

Conclusion

Migrating a website into a SharePoint 2013 environment is never an easy task. In this article we have discussed our dynamic approach, using custom tooling and configuration XML files. The key point for every migration is to focus on what is important for the customer, and to make sure that is translated into the best possible migration approach. For us, the generation of metadata throughout the content creation process was of vital importance. It ultimately allowed us to create a new website that completely changed the visitor's perspective: from a website based on a hierarchical structure to a search-driven website built on metadata.


This post is also published as an article in issue #13 of the DIWUG magazine. Be sure to visit the DIWUG site for more interesting articles.

I did not do all the work myself, so I would like to thank the entire team. Be sure to check out the blogs from some of my team members as well:
Garima Agrawal – http://mysharepointlearnings.wordpress.com/
Sachin Sade – http://sachinvzade.blogspot.in/