Migration from Concrete5 to Jekyll

When I first set up boredwookie.net, Concrete5 powered the site. In the years since, I've grown tired of using a heavyweight CMS to post a few pages. Last month I made the switch to Jekyll, and the transition was anything but painless: it required a lot of manual effort and fine-tuning to get right. Along the way I created a Ruby script to take some of the busy work out of doing a bulk migration.

The script takes XML files generated by the Concrete5 Legacy Migration Tool and creates Jekyll-style posts with YAML front matter that can be massaged into a working site. While there are gaps in what I could script, it was a useful tool in the migration effort.


Purpose

Migrating from a CMS to a flat-file/static site generator is not the easiest thing in the world. I was fortunate enough to find myself wanting to migrate away from a CMS around the same time that Concrete5 implemented breaking changes in their software. Those changes required them to create an exporter for the older version (the one I used) to provide a migration path to their new system.

My Ruby script parses the export files created by the Concrete5 Legacy Migration Tool and creates Jekyll-style posts that you can use as a base when migrating your site.
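
To give a feel for that transformation, here is a minimal sketch. It is not the actual cc5_to_jekyll.rb. It uses REXML instead of Nokogiri so that it runs with only the libraries that ship with Ruby, and the element and attribute names (`page`, `name`, `path`, `content`) are my assumptions about the export format, so check them against your own XML files.

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical sample input. The element and attribute names are assumptions,
# not the confirmed schema of the Legacy Migration Tool export.
SAMPLE_XML = <<~XML
  <pages>
    <page name="Hello World" path="/blog/hello-world">
      <content><![CDATA[<p>First post.</p>]]></content>
    </page>
  </pages>
XML

# Turn one <page> element into the text of a Jekyll post:
# YAML front matter, a closing "---", then the page body.
def page_to_post(page)
  front_matter = {
    'layout'    => 'post',
    'title'     => page.attributes['name'],
    'permalink' => page.attributes['path']
  }
  # Join the text of every <content> node found under this page.
  body = REXML::XPath.match(page, './/content').map(&:text).join("\n\n")
  "#{front_matter.to_yaml}---\n\n#{body}\n"
end

# Convert a whole export document into an array of post strings.
def convert(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//page').map { |page| page_to_post(page) }
end

posts = convert(SAMPLE_XML)
puts posts.first
```

Each string returned by `convert` could then be written to its own file in the posts directory.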


How do I use the script?

The cc5_to_jekyll.rb script can be used like this:

  • Download the Ruby script from GitHub
  • Use the Concrete5 Legacy Migration Tool to create XML exports of concrete5 pages (You will likely have to apply this patch to get the tool to work correctly)
  • Install Nokogiri (gem install nokogiri)
  • Edit the year_month variable to match the prefix for your set of XML files
  • Edit the bw_cc5 line to point to your xml file
  • Run the cc5_to_jekyll.rb script
  • You should now have a bunch of stubbed-out posts in the posts directory next to the Ruby script

NOTE: The script does not handle files or attachments (which also get exported using the legacy migration tool). You will need to ensure that URLs in your posts reference images, assets and other files correctly after migration.
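
A rough sketch of how that URL cleanup could be scripted is below. The prefix mapping is entirely hypothetical (I made up both the old and new paths for illustration), so substitute whatever your old site used and wherever your assets now live.

```ruby
# Hypothetical mapping from old Concrete5 URL prefixes to new Jekyll
# locations. Both sides are placeholders; adjust them for your own site.
URL_PREFIXES = {
  '/files/'     => '/assets/files/',
  '/index.php/' => '/'
}.freeze

# Rewrite src="..." and href="..." values that start with a known prefix.
# URLs that match no prefix are left untouched.
def rewrite_urls(html, prefixes = URL_PREFIXES)
  html.gsub(/\b(src|href)=(["'])(.*?)\2/) do
    attr, quote, url = $1, $2, $3
    old, replacement = prefixes.find { |prefix, _| url.start_with?(prefix) }
    url = replacement + url[old.length..-1] if old
    "#{attr}=#{quote}#{url}#{quote}"
  end
end

sample = '<img src="/files/1/cat.png"> <a href="/index.php/blog/x">x</a>'
puts rewrite_urls(sample)
```

A regular expression is good enough for a one-off pass like this, but it will not catch URLs in inline CSS or JavaScript, so a manual check afterwards is still worthwhile.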

NOTE 2: The Concrete5 Legacy Migration Tool does **NOT** export date/time stamps for the pages it exports. I had to manually specify the dates on over 200 pages after the migration script completed.
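
If you keep the dates in a lookup table as you work them out, a small helper can at least generate the `YYYY-MM-DD-title.md` filenames that Jekyll expects for posts. This is a sketch under that assumption; the `PUBLISHED_ON` table and the slugs in it are hypothetical.

```ruby
require 'date'

# Hypothetical, hand-maintained lookup of post slug => publication date.
PUBLISHED_ON = {
  'hello-world' => '2012-03-04',
  'second-post' => '2013-11-20'
}.freeze

# Build the YYYY-MM-DD-slug.md name Jekyll expects for a post,
# or return nil when no date has been recorded for the slug yet.
def dated_filename(slug, dates = PUBLISHED_ON)
  raw = dates[slug]
  return nil unless raw

  date = Date.iso8601(raw) # raises an error on a malformed date
  "#{date.strftime('%Y-%m-%d')}-#{slug}.md"
end

# List the slugs that still need a date entered by hand.
def missing_dates(slugs, dates = PUBLISHED_ON)
  slugs.reject { |slug| dates.key?(slug) }
end

puts dated_filename('hello-world')
puts missing_dates(%w[hello-world draft-page]).inspect
```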


Important Considerations

You will likely find that a number of your pages have duplicate content after running this script. About 20% of my posts ended up stringing the same content together twice in the same file. My migration script is not perfect as the objective was to speed the bulk migration of hundreds of CC5 pages. This was still quicker for me than having to migrate the content manually using other methods.
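
Some of that cleanup can probably be automated. The sketch below assumes the simplest form of the problem: the whole body is the same run of paragraphs repeated back to back. It will not catch partial or reordered duplicates, so treat it as a first pass rather than a fix.

```ruby
# Return the text with a back-to-back repeat removed when the whole body is
# the same paragraphs twice in a row; otherwise return the text unchanged.
def collapse_doubled(text)
  body = text.strip
  return text if body.empty?

  # Split into paragraphs and compare the first half against the second.
  paragraphs = body.split(/\n{2,}/)
  return text if paragraphs.length.odd?

  half = paragraphs.length / 2
  first  = paragraphs[0...half]
  second = paragraphs[half..-1]
  first.map(&:strip) == second.map(&:strip) ? first.join("\n\n") + "\n" : text
end

doubled = "Intro.\n\nDetails.\n\nIntro.\n\nDetails.\n"
puts collapse_doubled(doubled)
```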

I would suggest exporting your posts by year to ensure that you don't encounter PHP timeout issues. Going by year ensured that I could export my 200+ pages without any errors.

As part of the migration away from cc5 I ended up writing a set of Apache RedirectMatch and mod_rewrite rules to unify the URL scheme into something that makes sense and that I can easily work with if I have to migrate the site to another system sometime in the future. The permalink option in the YAML front-matter is your friend, especially if you want to drop the .html extension from your pages without resorting to excessive folder nesting.
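
If you have a list of old paths and the permalinks you chose for them, the RedirectMatch lines can be generated rather than typed. The sketch below shows one way to do that; the paths in `REDIRECTS` are hypothetical and the rules I actually deployed were written by hand.

```ruby
# Build an Apache RedirectMatch line that sends an old path (with or without
# a trailing slash) to its new permalink using a permanent redirect.
def redirect_rule(old_path, new_path)
  pattern = Regexp.escape(old_path.chomp('/'))
  "RedirectMatch 301 ^#{pattern}/?$ #{new_path}"
end

# Hypothetical old => new mapping, for illustration only.
REDIRECTS = {
  '/index.php/about/' => '/about'
}.freeze

REDIRECTS.each { |old, target| puts redirect_rule(old, target) }
```

Escaping the old path matters because characters such as `.` are regex metacharacters in a RedirectMatch pattern.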


The Rest of the Process

These are the high-level steps I took to migrate from CC5:

  • Export content from Concrete5 using the Migration Tool (in segments)
  • Bulk-transform the XML files generated from the Tool
  • Remove duplicate content from migrated pages (about 20% of pages had dup data)
  • Manually enter publication date information for my 200+ pages
  • Move images and other assets exported from CC5 to the Jekyll folder structure
  • Update all URLs in all pages/posts to reference assets and other pages correctly
  • Migrate the comments section for old pages that would benefit from this
  • For posts that have code samples, update them to use Jekyll Rouge
  • Create Apache RedirectMatch and mod_rewrite rules to clean up the URLs
  • Create Apache RedirectMatch rules to hide .git, .gitignore and other typical files that can be found in git repos
  • Set up an Ansible deploy process to make deploying site changes quick