– Invader Zim (cartoon)
Not when it comes to repositories. As DSPACE is now live, I thought it was time I finally wrote the metadata post I said I’d do in my last post.
The notion of metadata being ‘data about data’ perhaps belies how complex it is, reinforcing JISC Digital Media’s preference for defining it as ‘structured data about data’. This structure is the key part in making the data actually work, making it interoperable with other data and ensuring that it is complete enough that no educated guesses are needed!
DSPACE uses Dublin Core, one of the simplest metadata schemas, with only 15 elements. However, the material being digitized comes from three different sources, each with its own cataloguing standard (MARC21, ISAD(G) and SPECTRUM), so each needed mapping to Dublin Core. Dublin Core has a number of qualifiers which are used to refine the elements and are particularly useful when mapping other standards. We have used these qualifiers to ensure that the Dublin Core elements are a true representation of those in the original standard.
It is this simplicity and adaptability that makes the Dublin Core schema so widely used, which in turn means that help exists online for mapping many different standards to it. However, there was still a lot of tweaking to be done to make it work for us, especially as we did not need the entire standards mapped for CHARTER, only the parts we were going to be using. Practical testing proved to be very important, with a great deal of backwards and forwards between Ahmed and myself.
I found the MARC21 mapping the hardest to get to grips with due to the numerical and positional nature of the standard. Because of this, I developed a cheat sheet of three columns: one with our DC labels, one with the MARC21 field and one for my own notes on how to complete the field. I found it such a useful tool that I created cheat sheets for SPECTRUM and ISAD(G) too.
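To give a flavour of the three-column idea, here is a minimal sketch in Python. The rows are illustrative only: they follow commonly published MARC-to-Dublin-Core crosswalks rather than the actual CHARTER cheat sheet, and the `dc.*` labels, the flat record layout and the `to_dublin_core` helper are all assumptions made for the example.

```python
# Illustrative cheat sheet rows: (Dublin Core label, MARC21 field, note).
# ASSUMPTION: these rows are examples taken from common crosswalks,
# not the mapping actually used in the project.
CHEAT_SHEET = [
    ("dc.title", "245", "Title statement"),
    ("dc.creator", "100", "Main entry, personal name"),
    ("dc.publisher", "260$b", "Name of publisher"),
    ("dc.date", "260$c", "Date of publication"),
    ("dc.format.extent", "300", "Physical description"),
]


def to_dublin_core(marc_record):
    """Map a flat {marc_field: value} dict to a {dc_label: value} dict.

    Fields that are missing, empty, or not on the cheat sheet are skipped.
    """
    dc = {}
    for dc_label, marc_field, _note in CHEAT_SHEET:
        value = marc_record.get(marc_field)
        if value:
            dc[dc_label] = value.strip()
    return dc


if __name__ == "__main__":
    # Hypothetical record, with one field (999) that is not mapped.
    record = {"245": " An example title ", "260$c": "1887", "999": "local"}
    print(to_dublin_core(record))
```

Keeping the notes column in the same structure as the mapping means the guidance for cataloguers and the crosswalk itself cannot drift apart.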
Further structure is needed for the information that is going into the Dublin Core fields. This is the small but important stuff, such as ‘What format will the date take?’, ‘What about in free text?’ and ‘What order will the dimensions go in?’ It’s all about standards, uniformity and interoperability (as well as, of course, describing the material and aiding users in discovering it).
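As an example of the date question, the sketch below normalises a few ways a date might be typed into a single form. It assumes an ISO 8601 style target (YYYY, YYYY-MM or YYYY-MM-DD), which is a common choice for Dublin Core dates but is not stated here as the format we settled on; the list of accepted input styles and the `normalise_date` name are likewise hypothetical.

```python
from datetime import datetime

# ASSUMPTION: input styles a cataloguer might type, most specific first.
# Month names (%B) are parsed using the default English locale.
INPUT_FORMATS = ["%d/%m/%Y", "%d %B %Y", "%B %Y", "%Y"]


def normalise_date(text):
    """Return the date as YYYY, YYYY-MM or YYYY-MM-DD.

    The precision of the output matches the precision of the input,
    so a bare year is not padded out with an invented month and day.
    """
    text = text.strip()
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next accepted style
        if fmt == "%Y":
            return f"{parsed.year:04d}"
        if fmt == "%B %Y":
            return f"{parsed.year:04d}-{parsed.month:02d}"
        return f"{parsed.year:04d}-{parsed.month:02d}-{parsed.day:02d}"
    raise ValueError(f"Unrecognised date format: {text!r}")


if __name__ == "__main__":
    for raw in ["3 March 1887", "03/03/1887", "March 1887", "1887"]:
        print(raw, "->", normalise_date(raw))
```

Rejecting anything unrecognised, rather than guessing, keeps the ‘no educated guesses’ principle intact: an odd date gets flagged for a person to look at.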
And I haven’t even covered the technical metadata, so I’ll keep it short. Our technical metadata is embedded in the image files themselves, as well as being exported, and is managed through Adobe Bridge. This metadata is the EXIF (Exchangeable Image File Format) information which is automatically generated when the image is captured. In addition to this, using the IPTC (International Press Telecommunications Council) schema which is bundled with Bridge, we embed the minimum information required to identify the file and its contents in the event that it becomes removed from its context.
They say ‘the devil is in the details’, which seems to be true of metadata. I came out with the phrase ‘small spanners cause big problems’. One thing I have noticed is that what seems to be a small issue ends up having big repercussions, yet something that seems big ends up being easy to deal with.
And so, with nearly all the ‘t’s crossed and ‘j’s dotted, it will be full steam ahead on filling up the repository.
[Metadata Workflow Guidelines are being produced]