Archive data migration

My current task is to migrate the archive data from the various databases that exist around the University into iPaMS. This will allow the older databases and systems to be phased out, and the iPaMS web service to publish archive data to the intranets and other places where it is needed by existing students, staff and alumni.

Most of this data is in flat tables or XML, which should keep the import relatively straightforward. The challenging part comes with the more complex structure of iPaMS – each data field, such as module aims, is stored as a separate record (which facilitates the flexible templates) and is linked to the yearly module descriptor with a unique module_descriptor_id, which in turn is linked to the module with a unique module_id.
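
To illustrate the shape of these relationships, here is a rough sketch in MySQL-style SQL – the table and column names are my own placeholders, not the real iPaMS schema:

    -- Rough sketch only: illustrative names, not the actual iPaMS tables.
    CREATE TABLE module (
        module_id   INT PRIMARY KEY AUTO_INCREMENT,
        module_code VARCHAR(20),
        college     VARCHAR(50)
    );

    CREATE TABLE module_descriptor (
        module_descriptor_id INT PRIMARY KEY AUTO_INCREMENT,
        module_id            INT REFERENCES module(module_id),
        academic_year        VARCHAR(9)   -- e.g. '2013/14'
    );

    -- Each piece of descriptor data (e.g. module aims) is a separate record,
    -- which is what makes the flexible templates possible.
    CREATE TABLE module_descriptor_field (
        field_id             INT PRIMARY KEY AUTO_INCREMENT,
        module_descriptor_id INT REFERENCES module_descriptor(module_descriptor_id),
        field_name           VARCHAR(50),  -- e.g. 'module_aims'
        field_value          TEXT
    );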

My initial attempts at the migration ran into problems when I tried to assign all these id numbers in one go: I didn't know what data was being added manually while I was busy working on the import, so we could potentially end up with duplicate live data. I then realised it would be better to do the migration in layers, like this:

  • Week 1: Add all the module and programme top level records (we used data from SITS to find all these)
  • Week 2: Now that we have the module and programme ids, add all the yearly records we have to import
  • Week 3: Now that we have the module_descriptor_ids, we can start to add all the individual pieces of data (sketched after this list)
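
To make the layering concrete, the week 2 and week 3 steps might look roughly like this – again using the placeholder names from the sketch above, and assuming the archive data has already been loaded into a staging table:

    -- Week 2 (sketch): create a yearly descriptor for each archived year,
    -- looking up the module_id created in week 1 via the module code.
    INSERT INTO module_descriptor (module_id, academic_year)
    SELECT m.module_id, a.academic_year
    FROM archive_staging a
    JOIN module m ON m.module_code = a.module_code;

    -- Week 3 (sketch): add the individual pieces of data, now keyed on the
    -- module_descriptor_ids created in week 2.
    INSERT INTO module_descriptor_field (module_descriptor_id, field_name, field_value)
    SELECT d.module_descriptor_id, 'module_aims', a.module_aims
    FROM archive_staging a
    JOIN module m ON m.module_code = a.module_code
    JOIN module_descriptor d ON d.module_id = m.module_id
                            AND d.academic_year = a.academic_year;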

In reality this is slightly more complex (and will take a little longer than three weeks!), mainly because I will deal with each of the colleges separately, so steps 2 and 3 will be repeated for each college. There is also some existing data for Humanities which is due to be imported, so we need to identify the ‘correct’ version and remove and archive any incorrect data, such as records imported on the wrong template.
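
A query along these lines (placeholder names again) is one way of spotting where more than one version of a descriptor exists, so the incorrect copies can be picked out and archived:

    -- Find Humanities modules with more than one descriptor for the same year,
    -- i.e. candidates where the 'correct' version needs to be identified.
    SELECT m.module_code, d.academic_year, COUNT(*) AS copies
    FROM module_descriptor d
    JOIN module m ON m.module_id = d.module_id
    WHERE m.college = 'Humanities'
    GROUP BY m.module_code, d.academic_year
    HAVING COUNT(*) > 1;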

All the weekly import scripts are written purely in SQL. I try them out on a local copy of the database, then run the same scripts on the test site, and finally run them on the live server during a scheduled update period. This should give enough opportunity to spot any potential issues before the live data is affected. My top priority throughout is to protect the live data and avoid causing any issues or adding duplicate data, so hopefully this can all go as smoothly as possible!
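
As a simple illustration of the kind of safeguard a live run can include (not the actual scripts), the inserts can be wrapped in a transaction and guarded so that rows which already exist are skipped:

    START TRANSACTION;

    -- Only insert descriptors that do not already exist, so a re-run or a
    -- manually added record cannot produce duplicates.
    INSERT INTO module_descriptor (module_id, academic_year)
    SELECT m.module_id, a.academic_year
    FROM archive_staging a
    JOIN module m ON m.module_code = a.module_code
    WHERE NOT EXISTS (
        SELECT 1
        FROM module_descriptor d
        WHERE d.module_id = m.module_id
          AND d.academic_year = a.academic_year
    );

    -- Check the affected row count looks sensible; COMMIT if so, ROLLBACK if not.
    COMMIT;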

Helen Connole (iPaMS Lead Developer)
