Document Migration Tips and Tricks
Martin Bergljung, Consultant, Tuesday 11th October 2011
In this blog entry I want to document and recommend a few ways of migrating content into Alfresco. Depending on whether you are importing content from the file system, another CMS, or another Alfresco instance, there are a number of ways to go about it.
The trick is to know which migration method to choose so you don’t have to do more work than necessary. When I talk about the different migration methods, I use a flowchart as a way of deciding what method to use in a particular situation. The flowchart could also be used separately to decide which migration method to use.
This is a work in progress, and I would appreciate comments from anyone who has experience with content migration in Alfresco. Also, if you think that a migration method I recommend at a certain point is not really going to work, or is not suitable in that particular situation, please let me know.
IS THERE ENOUGH DISK SPACE, SERIOUSLY?
The first thing you need to check is whether your new target Alfresco system has enough disk space to store all the content produced by a couple of test migrations plus the final content migration.
As an example, a test migration could consist of the following steps:
- All content is copied to the disk on the machine where Alfresco is running
- Content is imported into Alfresco
- A test group checks the imported content over a couple of days and makes sure everything looks ok
- Then all content is deleted from Alfresco followed by a purge of the content by an administrator
So aren’t we back to square one on disk space after a completed test migration? Not really; the content is still taking up space on disk. When you delete content it ends up in archive://SpacesStore, and once it has been purged from there it is moved from alf_data/contentstore to alf_data/contentstore.deleted after 14 days. Alfresco never removes it from disk itself, so it stays in contentstore.deleted until someone deletes it manually.
So let’s say you are migrating 12 GB of content into Alfresco and plan to do at least 2 test migrations before the final migration. You also intend to copy all content to the Alfresco box before doing a migration. How much disk space are you going to need? Something in the region of 12 GB × 3 (two test migrations plus the final one) + index space + 12 GB (the copied content) ≈ 50 GB.
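The arithmetic above can be turned into a quick pre-flight check. The sketch below is a minimal Python version under a few assumptions: the function names (dir_size, required_bytes, has_enough_space) are hypothetical, the index size is a figure you estimate yourself, and every migration run is assumed to leave a full copy of the content on disk, as described above.

```python
import os
import shutil

GB = 1024 ** 3


def dir_size(path: str) -> int:
    """Total size in bytes of the regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def required_bytes(content_bytes: int, test_migrations: int, index_bytes: int,
                   staged_copy: bool = True) -> int:
    """Rough disk space needed on the target Alfresco machine.

    Each migration run (the test runs plus the final one) is assumed to leave a
    full copy of the content on disk, because deleted and purged content ends
    up in contentstore.deleted instead of being removed.
    """
    runs = test_migrations + 1
    total = content_bytes * runs + index_bytes
    if staged_copy:
        # The source files copied onto the Alfresco machine before importing.
        total += content_bytes
    return total


def has_enough_space(target_dir: str, needed: int) -> bool:
    """True if the file system holding target_dir has at least `needed` bytes free."""
    return shutil.disk_usage(target_dir).free >= needed
```

With 12 GB of content, two test migrations and an assumed 2 GB of index space, required_bytes(12 * GB, 2, 2 * GB) comes to 50 GB, matching the estimate above; has_enough_space can then be pointed at the directory that will hold alf_data. dir_size can also be run against alf_data/contentstore.deleted to see how much space earlier test runs are still occupying.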
The following picture is the start of the content migration flowchart:

You might ask why we need to do all these test migrations: can’t we just do one migration and be done? Most of the time you will find that the client wants at least one test migration to verify that everything works as it should, that the imported files are intact, and that they carry the metadata they should.
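One way to make that verification less manual is to record a manifest of the source files before migrating and compare it with the same manifest taken from the imported content. The Python sketch below assumes you can read the imported content back as ordinary files (for example through a mapped WebDAV drive); the function names are hypothetical, and it only compares relative path, size and SHA-256 checksum, not metadata such as created or modified dates.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict:
    """Map each file's path relative to root (with / separators) to (size, sha256)."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = (os.path.getsize(full), file_digest(full))
    return manifest


def compare_manifests(source: dict, imported: dict) -> dict:
    """List files missing from the import, unexpected extra files, and changed files."""
    common = source.keys() & imported.keys()
    return {
        "missing": sorted(source.keys() - imported.keys()),
        "unexpected": sorted(imported.keys() - source.keys()),
        "changed": sorted(p for p in common if source[p] != imported[p]),
    }
```

Build one manifest from the source location and one from the mapped drive after the import (both paths are yours to choose), then pass them to compare_manifests. Three empty lists mean every file arrived with the same size and content.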
In the meantime users keep adding and updating content in the source system, so as the go-live date approaches we have to do a final migration to make sure all the latest content is imported into Alfresco.
MIGRATING CONTENT BETWEEN ALFRESCO SYSTEMS USING BACKUP & RESTORE
So we’ve got the disk space sorted, and now we need to figure out which content migration method to use. There are quite a few methods out there and it can be tricky to know which one to pick. The first thing we want to find out is whether we are migrating content between Alfresco systems. If so, we can narrow down the number of migration methods we need to look at.
When you migrate content between Alfresco systems, the client will want more than just the content migrated to the new Alfresco system. They will most likely also want all the metadata, categories, groups, Web Projects, Share sites etc. migrated over. So the first method I recommend is the Backup & Restore technique described on the Alfresco Wiki. If you can get this to work, you can be sure that everything gets migrated over to the new system in an easy way.

Now, if the Backup & Restore method failed for some reason, for example:
- You had a source Alfresco version that was older than the target Alfresco version and you could not get the upgrade to work
- You were migrating between different database systems (e.g. from MySQL to MSSQL) and could not get that to work
Then you will have to resort to other migration methods, which we will cover later on.
An example of a backup and restore process with MySQL and Ubuntu environments looks something like this:
- Stop Alfresco on the source system
- Take a backup of the MySQL Alfresco database on the source system (e.g. mysqldump -u root -ppword alfresco > alfrescodump.sql)
- Back up the source system’s dir.root directory as specified in the alfresco-global.properties file (e.g. tar -cvzf /opt/alfrescocontent.gz /opt/alfresco/alf_data)
- FTP both files in binary mode over to the new target system, on which you have already installed the same version of Alfresco and can successfully log in to http://host:8080/share
- Stop Alfresco on the target system (/opt/alfresco/alfresco.sh stop)
- On the target system, extract the alf_data directory over the existing one (tar -xzvf /opt/alfrescocontent.gz -C / --overwrite). In this case the alf_data path is the same on the source and target system; the -C / is needed because tar strips the leading / from the archived paths
- Create an empty alfresco database in the target system’s MySQL server, dropping the one created by the fresh install first (mysql -u root -ppword, then DROP DATABASE alfresco; CREATE DATABASE alfresco;)
- Then restore the alfresco database on the target system (mysql -u root -ppword alfresco < alfrescodump.sql)
- And finally start Alfresco on the target system (/opt/alfresco/alfresco.sh start)
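If you run this procedure more than once (for example for each test migration), it can help to script it. The Python sketch below only prints the commands by default (dry run) and is an illustration under stated assumptions, not a tested procedure: the helper names and the dump file location are hypothetical, the other paths follow the example above, MySQL credentials are assumed to live in an option file such as ~/.my.cnf so no password appears on the command line, and the FTP transfer between the machines is left out.

```python
import shlex
import subprocess

ALFRESCO_HOME = "/opt/alfresco"        # install directory, as in the example above
DB_NAME = "alfresco"
DUMP_FILE = "/opt/alfrescodump.sql"    # hypothetical location for the dump
ARCHIVE = "/opt/alfrescocontent.gz"


def source_backup_steps():
    """Commands to run on the source machine."""
    return [
        [f"{ALFRESCO_HOME}/alfresco.sh", "stop"],
        # --result-file writes the dump directly instead of relying on shell redirection.
        ["mysqldump", f"--result-file={DUMP_FILE}", DB_NAME],
        ["tar", "-czvf", ARCHIVE, f"{ALFRESCO_HOME}/alf_data"],
    ]


def target_restore_steps():
    """Commands to run on the target machine after copying both files over."""
    return [
        [f"{ALFRESCO_HOME}/alfresco.sh", "stop"],
        # tar stored the paths without the leading "/", so extract relative to /.
        ["tar", "-xzvf", ARCHIVE, "-C", "/", "--overwrite"],
        # Destructive: replaces the database created by the fresh install.
        ["mysql", "-e", f"DROP DATABASE IF EXISTS {DB_NAME}; CREATE DATABASE {DB_NAME};"],
        ["mysql", DB_NAME, "-e", f"source {DUMP_FILE}"],
        [f"{ALFRESCO_HOME}/alfresco.sh", "start"],
    ]


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False, stopping on the first failure."""
    for argv in steps:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Call run(source_backup_steps()) on the source machine to review the commands, and pass dry_run=False only once they look right; do the same with target_restore_steps() on the target machine. Building argument lists rather than shell strings keeps paths with spaces safe and avoids shell redirection.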
MIGRATING CONTENT BETWEEN ALFRESCO SYSTEMS WHEN BACKUP & RESTORE IS NOT POSSIBLE
Okay, so we could not get the backup and restore process to work and we now need another migration method to solve the problem. There are a couple of migration methods we could look at:
- Using ACPs to import and export content
- Mapping WebDAV drive to export content and using Alfresco Bulk Import Tool to import content
- Mapping a WebDAV drive (e.g. with WebDrive) to both export and import content
- Using the Share import & export tool to export and import Alfresco Share sites
The first method we should look at is exporting and importing with Alfresco Content Packages (ACPs), as this preserves all the metadata (such as created and modified dates), which is usually a requirement from the client. This method usually works fine if you can keep each ACP package under 4 GB. If that is not possible, you could try starting both Alfresco systems with JDK7, which should allow ZIP files larger than 4 GB (ZIP64 format extensions).
If JDK7 is not an option for you, you can first try to temporarily reorganize the folder structure (just for the migration) so that no single ACP package exceeds 4 GB. If that is not possible either, you have to resort to WebDAV drive mapping or custom coding. If you go with drive mapping for the export, do the import with the Alfresco Bulk Import tool for speed and to preserve the modified date.
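Working out how to regroup the folders is a small bin-packing problem over their sizes. The Python sketch below uses a greedy first-fit-decreasing heuristic (a swapped-in technique, not something the ACP export does for you); the function name and the 3.5 GB default limit are assumptions, the latter chosen to leave some headroom below 4 GB for the XML metadata in each ACP. Folder sizes could be gathered beforehand with something like du -sb on each folder.

```python
# Assumed safety margin below the 4 GB ZIP limit, leaving room for ACP metadata.
DEFAULT_LIMIT = 3_500_000_000


def plan_acp_batches(sizes, limit=DEFAULT_LIMIT):
    """Group folders into batches whose total size stays within limit.

    sizes maps folder name -> size in bytes. Uses first-fit decreasing:
    the largest folders are placed first, each into the first batch with room.
    Returns (batches, too_big); too_big lists folders that exceed the limit on
    their own and therefore need to be split before exporting.
    """
    totals = []   # running size of each batch
    batches = []  # folder names in each batch
    too_big = []
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit:
            too_big.append(name)
            continue
        for i, total in enumerate(totals):
            if total + size <= limit:
                totals[i] += size
                batches[i].append(name)
                break
        else:
            totals.append(size)
            batches.append([name])
    return batches, too_big
```

Each batch is then a set of folders you would temporarily move under one parent space and export as a single ACP. For example, folders A, B, C, D and E with sizes 3, 2, 2, 1 and 9 and a limit of 4 give the batches [A, D] and [B, C], with E reported as too big.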

Even if you manage to export and import all content with ACPs, that still does not cover WCM (i.e. the AVM store) or the Share site configuration (also kept in the AVM store). For Share sites you are better off using the http://code.google.com/p/share-import-export tool than trying with ACPs. And for WCM projects (i.e. the old WCM product, not WQS) you can use a combination of ACPs and deploying content to a web server, then importing it again into the Web Project.
MIGRATING CONTENT FROM THE FILE SYSTEM OR OTHER CMS INTO ALFRESCO
The most common content migration case will probably be importing content from a network drive or from another CMS such as SharePoint, FileNet, OpenText, or Documentum. When importing from the file system it is important to import quickly and to preserve each file’s last modified date.
A very good tool for file system import is the Alfresco Bulk Import tool. There is also a section in my book about how to use this tool; the chapter containing it is free to download from my blog site http://ecmstuff.blogspot.com.
If you do not need to preserve the last modified date or use any of the other metadata features of the Alfresco Bulk Import tool, or you simply do not have access to install the AMP the tool requires, then you could resort to using WebDAV drive mappings to export and import content. This is very slow, but it will get you there in the end. Make sure to use WebDrive; I have had problems with NetDrive not importing large PDFs properly (maybe that is fixed now).

If your content lives in a CMS other than Alfresco, you could have a look at the OpenMigrate tool. It might be able to help you with the export and import process, including custom processing if needed. If that tool does not work for you, see if you can map a network drive to the CMS and export the content that way; keep in mind, though, that you will most likely want to preserve the created and modified dates of the content you are importing.






