Using SWORD v2 to upload large data files

Our project uses DSpace as its online repository, with ATMOS storing the actual files. This gives researchers a very large storage area to which they can submit their research data.

However, while the DSpace submission interface works well with small files, it struggles with larger ones or with multiple files.

So, to address this, we’ve been looking at creating a new SWORD-based submission tool. We started by exploring ‘sworduploader’, a Python script created by QMUL.

This works well for small files and for multiple files: it zips the files up, submits the package to DSpace using the SWORD service document, and DSpace then unzips it. However, as we developed the script to handle larger data, we started to uncover some issues.
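As a rough sketch of the deposit step (not the sworduploader code itself), a SWORD v2 binary deposit is an HTTP POST of the zip file to a collection URI listed in the service document, with a few headers describing the package. The helper name `sword_zip_headers` is hypothetical; it assumes the SimpleZip packaging format, which DSpace’s SWORD v2 module can be configured to accept, and leaves the actual POST and authentication to whichever HTTP library you use.

```python
import os

# SimpleZip packaging identifier defined by the SWORD 2.0 profile.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def sword_zip_headers(zip_path, in_progress=False):
    """Build headers for POSTing a zip package to a SWORD v2 collection URI.

    Hypothetical helper: the request body is the raw bytes of zip_path, and
    Content-Length is left for the HTTP library to fill in.
    """
    filename = os.path.basename(zip_path)
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": "attachment; filename=%s" % filename,
        "Packaging": SIMPLE_ZIP,
        # "true" tells the server more content will follow before the
        # deposit is complete; "false" marks the deposit as finished.
        "In-Progress": "true" if in_progress else "false",
    }
```

For example, `sword_zip_headers("/tmp/dataset.zip")["Content-Disposition"]` gives `attachment; filename=dataset.zip`.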

1) By default, Python’s zip module writes 32-bit size fields, so it failed when zipping anything beyond the 32-bit limit (4GB). We solved this by forcing Python to use the ZIP64 extensions, which raise the file size limit far beyond that.
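A minimal sketch of the ZIP64 fix, assuming the standard library `zipfile` module: the key is the `allowZip64=True` argument. The helper name `zip_files` is hypothetical and not taken from sworduploader.

```python
import os
import zipfile


def zip_files(paths, zip_path):
    """Zip the given files into zip_path with ZIP64 enabled, so archives and
    members over 4GB do not raise zipfile.LargeZipFile."""
    # allowZip64=True is the default from Python 3.4 onwards; in Python 2.7
    # it defaulted to False, which kept archives inside the 32-bit range.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for path in paths:
            # Store each file under its base name, without directory prefixes.
            zf.write(path, arcname=os.path.basename(path))
    return zip_path
```

For research data that is already compressed, `zipfile.ZIP_STORED` may be a faster choice than `ZIP_DEFLATED`, since deflating it again gains little.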

2) The DSpace XMLUI’s Cocoon configuration (in core.properties) stores the maximum upload file size as an integer. It is currently set to just over 2GB, the largest value that integer can hold, so it cannot be increased. We’re working on this.

http://permalink.gmane.org/gmane.comp.db.dspace.user/15901
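To avoid starting an upload that is bound to fail, a client can check sizes first. This sketch assumes the limit is 2^31 - 1 bytes (2,147,483,647), the largest value a signed 32-bit integer can hold, which matches “just over 2GB”; the function names are hypothetical. Since the zipped package is what actually gets uploaded, it is the package size that matters.

```python
import os

# Assumed limit: the largest signed 32-bit (Java int) value, 2**31 - 1 bytes.
JAVA_INT_MAX = 2**31 - 1  # 2,147,483,647 bytes, about 2.15 GB (2 GiB minus 1 byte)


def fits_upload_limit(path, limit=JAVA_INT_MAX):
    """Return True if the file at path is no larger than the upload limit."""
    return os.path.getsize(path) <= limit


def too_large(paths, limit=JAVA_INT_MAX):
    """List the files that would exceed the limit, so they can be handled
    another way before a long upload is attempted."""
    return [p for p in paths if not fits_upload_limit(p, limit)]
```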

And we look forward to…

3) A single HTTP upload is not a very robust way to move large amounts of data across the web. What happens if the submission of a large file is interrupted?
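A script cannot make the network reliable, but two things help. First, SWORD v2 lets the client send a Content-MD5 header with a deposit, so the server can reject a truncated or corrupted upload. Second, the client can retry a failed deposit. As far as we know, SWORD v2 has no way to resume a partial upload, so each retry resends the whole file. A sketch under those assumptions, with hypothetical function names; `send` stands for whatever code performs one complete POST.

```python
import hashlib
import time


def content_md5(path, chunk_size=8 * 1024 * 1024):
    """Hex MD5 of a file, read in chunks so multi-GB files never sit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def deposit_with_retries(send, attempts=3, delay=5.0):
    """Call send() until it succeeds or attempts run out.

    send is any callable that performs one complete deposit and raises
    IOError/OSError if the connection drops. The last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except (IOError, OSError):
            if attempt == attempts:
                raise
            # Back off before retrying: delay, 2 * delay, 4 * delay, ...
            time.sleep(delay * 2 ** (attempt - 1))
```

The digest here is hex-encoded; RFC 1864 defines Content-MD5 as base64, so it is worth checking which form your SWORD server expects.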

We are also working on a bespoke Java-based tool for submitting files to the repository. It uses DSpace’s ‘batch import’ script and seems to work well. However, we’d like to use SWORD. What we don’t want is separate submission tools for different kinds of data, since this would confuse users, so we’d like to make it all work through one interface.
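For comparison, DSpace’s batch import reads items in the Simple Archive Format: one directory per item containing a `dublin_core.xml` metadata file, a `contents` file listing the bitstreams, and the files themselves. Our tool is written in Java; the Python sketch below (a hypothetical `make_saf_item` with title-only metadata) only illustrates the layout the import script expects.

```python
import os
import shutil
from xml.sax.saxutils import escape


def make_saf_item(archive_dir, item_name, title, files):
    """Create one item in Simple Archive Format:
    archive_dir/item_name/{dublin_core.xml, contents, <bitstreams>}."""
    item_dir = os.path.join(archive_dir, item_name)
    os.makedirs(item_dir)
    # Minimal Dublin Core metadata: just dc.title, XML-escaped.
    with open(os.path.join(item_dir, "dublin_core.xml"), "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                '<dublin_core>\n'
                '  <dcvalue element="title" qualifier="none">%s</dcvalue>\n'
                '</dublin_core>\n' % escape(title))
    # The contents file lists one bitstream filename per line; each file
    # is copied into the item directory alongside it.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as f:
        for path in files:
            name = os.path.basename(path)
            shutil.copy(path, os.path.join(item_dir, name))
            f.write(name + "\n")
    return item_dir
```

The resulting directory can then be passed to the import script, in DSpace versions of this era with a command along the lines of `[dspace]/bin/dspace import --add --eperson=user@example.org --collection=123456789/1 --source=archive_dir --mapfile=mapfile`.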

So, we persevere.

Posted under Technical development

This post was written by Ian Wellaway on August 28, 2012
