DSpace submission using Globus and SWORD2 – Update

We’ve made huge progress on our submission tool recently. We now have a prototype web app that collects metadata from users and uses Globus to transfer the files to our ATMOS storage facility before submitting them to DSpace.

I demonstrated this at the IDCC (International Digital Curation Conference) in Amsterdam last week and found that many other delegates were either interested in using Globus or already had it at their organization but didn’t have it hooked into DSpace. Globus allows the user to create ‘endpoints’: data locations, such as your laptop or PC, that you can then transfer files to and from. The transfer happens asynchronously, and as long as the endpoint hardware is on with Globus running, it will eventually complete and submit the entry to DSpace.

All of this is integrated into the web app via an API, and the deposit to DSpace is made as an Atom request via SWORDv2. We have also implemented our Single Sign On (SSO) service.
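To give a feel for what an Atom request for a SWORDv2 deposit can look like, here is a minimal sketch in Python. It is not our actual code: the function names, the choice of metadata fields and the sample values are all assumptions made for illustration, and it only builds the request body and headers without sending anything to a server.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for Atom and Dublin Core terms.
ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"


def build_atom_entry(title, author, abstract):
    """Build a metadata-only Atom entry (hypothetical helper for illustration)."""
    ET.register_namespace("atom", ATOM_NS)
    ET.register_namespace("dcterms", DCTERMS_NS)

    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    # Descriptive metadata is carried as Dublin Core terms inside the entry.
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}abstract").text = abstract
    return ET.tostring(entry, encoding="unicode")


def sword_headers(in_progress):
    """Headers for an Atom entry deposit (hypothetical helper).

    In-Progress tells the server whether more content will follow
    before the deposit should be treated as complete.
    """
    return {
        "Content-Type": "application/atom+xml;type=entry",
        "In-Progress": "true" if in_progress else "false",
    }


if __name__ == "__main__":
    # Sample values only; a real deposit would POST this body to a
    # collection URL advertised by the repository's service document.
    print(build_atom_entry("Field recordings", "A. Researcher", "Sample abstract"))
    print(sword_headers(in_progress=True))
```

In a workflow like ours, the In-Progress flag is the natural way to say "the metadata is here, but the files are still on their way", since the Globus transfer completes asynchronously.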

We hope to have a finished prototype next month and aim to share our DSpace development with the wider community.

 

Posted under Technical development

This post was written by Ian Wellaway on January 25, 2013

The Holistic Librarian – Thing 4: sharing means caring

Hi, I’m Afzal, your Subject Librarian for Arabic, and Politics.

Task 4: If a researcher came to you asking how they could share their research data with somebody external to the University what would you recommend?

What I knew about the topic beforehand:

I knew that there are Facebook-type networking sites for researchers of all disciplines, such as www.researchgate.net, where researchers can get to know one another’s work, share metadata and even collaborate to a limited extent, though not in real time. Collaboration happens through the usual means such as email, file attachments, web page sharing, and perhaps YouTube or LinkedIn. Given the nature of research and its links to funding, there is a reluctance to share data externally in the way it is shared within a department or institutional project.

What I know now:

Whilst researchers can exchange their data via USB sticks, external hard drives and even conventional post, there aren’t any collaborative data forums or ‘data internets’. This is because of a lack of standards, policies and agreed sharing guidelines that would satisfy researchers, as well as differing copyright laws and university and institutional regulations.

Sharing is practically limited to the researcher’s institution, where data sharing contracts would be in place.

How did you obtain this knowledge?

I came to the above answer by entering ‘data sharing researchers’ in Google. I didn’t really find ‘Data Rooms’ where the world’s researchers come together and ‘chat data’ (of course, these data rooms would have protective security locks). The results invariably focused on individual organisations, e.g. Edinburgh. I was expecting some kind of inter-institutionally owned ‘data labs’.

What else would you like to know about the topic?

Where do we go from here, given the emphasis being placed on collaborative and group research at the funding level and on Open Access at the output level? This surely cannot be limited to an individual institution any more: for instance, how do researchers from Oxford, Bristol and the Max Planck Institute share data, and is progress being made to enable this?

How did you find this task? How would you improve it?

The answers were disappointing, owing to the lack of a developed national strategy, which will not help us progress. To improve: provide ‘model’ responses to all the tasks.

Posted under Holistic Librarian, Training

This post was written by Afzal Hasan on January 25, 2013

The Holistic Librarian – Thing 10

Hi, I’m Diane Workman and I’m the Subject Librarian for Drama, English, Film Studies and Theology & Religion.

As part of the Holistic Librarian project I was asked to research three tasks of the ‘23 Things (+1) for Research Data Management’.

Task 10 was to research the answer to the question: “A researcher has used a secondary data set in their research. In which circumstances would she be able to put this on Open Access?”

What I knew about the topic beforehand:

I was unsure where to begin with this one, but felt that I should explore issues of Intellectual Property Rights (in relation to the original data creator) and ethics (in relation to the consent provided by participants in the original data collection).

What I know now:

It’s important to know the copyright situation for the data set that is being re-used. The researcher should establish whether the data set is still covered by copyright, and who the copyright owner is. Once they know this, they can contact the copyright owner to seek permission to publish their data set on an Open Access (OA) basis. It’s possible that the original creator may have made their material freely available for re-use in one or both of the following ways:

  • By applying a Creative Commons licence
  • By depositing in a data centre with an OA policy

It’s also important to be clear about what informed consent was obtained from the participants in the original data collection by the data creator. Any consent form signed by them should have outlined any likely re-uses of the data, ideally specifying publication of the data set in an OA repository. This relies heavily on the original data creator making good provisions for the sharing and future use of the data that they collect, something that researchers should consider when developing their Data Management Plan. The University of Glasgow acknowledges that the process of placing data sets on OA may not always be straightforward:

“There can be a tension between abiding by data protection legislation and ethical guidelines, whilst fulfilling funder and public expectations to make research results available.” [Source: University of Glasgow website; section on data protection legislation and ethics].

How did I obtain this knowledge?:

I consulted several websites during the research for this question. It required reading around research data management more generally, as it relates to several issues. The following websites were all useful:

The Digital Curation Centre, and in particular their section on digital curation
http://www.dcc.ac.uk/digital-curation/what-digital-curation

The Incremental Project, part of the JISC Managing Research Data programme
http://www.lib.cam.ac.uk/preservation/incremental/index.html

The University of Cambridge Support for Managing Research Data web pages, one of the Incremental Project partners
http://www.lib.cam.ac.uk/dataman/

The University of Glasgow Data Management Support for Researchers web pages, the other Incremental Project partner
http://www.gla.ac.uk/services/datamanagement/

The UK Data Archive at the University of Essex, and in particular their section on consent and ethics
http://www.data-archive.ac.uk/create-manage/consent-ethics

What else would I like to know about this topic:

It would be useful to know more about the implications of the Data Protection Act on the re-use of data by researchers.

How did I find this task? How would I improve it?

This was the most difficult of the tasks that I was set, as there is no clear answer. Anything relating to IPR is usually not straightforward, and was further complicated when combined with the ethical aspect of the question. No single source provided an answer to the question, so it was necessary to draw my own conclusions based on the range of information sources consulted.

Posted under Holistic Librarian

This post was written by Diane Workman on January 25, 2013

The Holistic Librarian – Thing 9: a date with meta…

Hi, I’m Afzal, your Subject Librarian for Arabic, and Politics.

Task 9: What is the importance of documenting research data and metadata? Where can you find useful information on data documentation and metadata?

What I knew about the topic beforehand:

I understood Research Data Documentation to mean the same as creating metadata. This is data about the data, giving the research creator an organisational skeleton to keep track of progress themselves and letting others know that the research is, or will be, in the domain.

What I know now:

Creating metadata is highly emphasised and de rigueur. It’s as important as the research data itself. Rigorous standards mean greater impact because they make the data easy to cite. Good metadata also allows other researchers to verify and repeat the experiments, for example. This has major ethical dimensions.

Universities now have webpages emphasising the importance of metadata:

E.g. MIT http://libraries.mit.edu/guides/subjects/data-management/metadata.html

Cambridge: http://www.lib.cam.ac.uk/dataman/pages/metadata.html

Oxford: http://www.admin.ox.ac.uk/rdm/dmp/documentation/

A good introduction to the sine qua non of Metadata is found at the UK Data Archive site:

http://data-archive.ac.uk/create-manage/document

How did you obtain this knowledge?

Googling succeeded.

What else would you like to know about the topic?

Whether there’s a standard Exeter University toolkit for ensuring rigorous metadata is created.

How did you find this task? How would you improve it?

Straightforward and educative.

Posted under Holistic Librarian, Training

This post was written by Afzal Hasan on January 24, 2013

The Holistic Librarian – Thing 11

Hi, I’m Diane Workman and I’m the Subject Librarian for Drama, English, Film Studies and Theology & Religion.

As part of the Holistic Librarian project I was asked to research three tasks of the ‘23 Things (+1) for Research Data Management’.

Task 11 was to research the answer to the question “What advice could you give to a researcher about backing-up his research data?”

What I knew about the topic beforehand:

I was aware of the importance of making regular back-ups of data to protect against accidental or malicious data loss, and of ensuring that one copy was stored at a different location to the rest. I was also vaguely aware that Exeter IT provide a back-up service, but was unfamiliar with the detail.

What I know now:

Making back-ups of files is an essential element of data management and should be built into a researcher’s Data Management Plan. The back-up procedure followed, and the regularity with which it is carried out, will depend upon the perceived value of the data (e.g. is it unique?) and the levels of risk considered appropriate by the researcher.

Some things that need to be considered include:

  • Institutional back-up policy
    This will determine what the researcher will be responsible for backing-up themselves, e.g. personal laptops, home PCs. Exeter IT provide more advice on the back-up options available to staff and students of the University on their web pages. See http://as.exeter.ac.uk/it/files/backup/

Researchers at Exeter can also deposit their data in the Exeter Data Archive (EDA) for long-term preservation.

  • Which storage media to use
    This will depend on the quantity and type of data. Memory sticks, CD/DVD or remote, online back-up services are considered better for small amounts of data. Hard drives (networked and removable) and magnetic tapes are considered more suitable for large volumes of data.
  • Which file format to use
    The format chosen should be suitable for long-term digital preservation.
  • Regular validation of back-ups
    It’s important to test back-ups at regular intervals for completeness and integrity. They can be checked by matching file size, dates and checksum/hash tags against the original file. MD5 is a widely used checksum which can be used to verify whether two files are identical. More information on MD5 is available on the UK Data Archive website.
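As an illustration of the validation step above, here is a minimal sketch in Python that compares the MD5 checksum of an original file with that of its back-up. The function names and file paths are assumptions made for this example, not part of any tool mentioned in this post. A checksum is a short fingerprint calculated from every byte in a file, so if even a single byte differs between the two copies, the fingerprints will differ too.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a hexadecimal string.

    The file is read in chunks so that large data files do not
    have to fit into memory all at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original_path, backup_path):
    """Return True if the back-up has the same checksum as the original."""
    return md5_of_file(original_path) == md5_of_file(backup_path)


if __name__ == "__main__":
    # Hypothetical file names, used only to show how the check is called.
    if backup_matches("interviews.csv", "backup/interviews.csv"):
        print("Back-up verified: checksums match")
    else:
        print("Warning: back-up differs from the original")
```

MD5 is well suited to spotting accidental corruption or incomplete copies. It is not considered secure against deliberate tampering, so where that matters a stronger algorithm such as SHA-256 (`hashlib.sha256`) can be swapped in the same way.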

Further information on the majority of these points can be found on the websites mentioned in the section below.

How did I obtain this knowledge?:

The UK Data Archive at the University of Essex is something of an authority on this topic, and had a useful webpage devoted to backing-up data.

The University of Glasgow has a set of useful webpages on data management support for researchers, including one dealing with data back-up.

Information about arrangements local to the University of Exeter can be found on the following Exeter IT web page: http://as.exeter.ac.uk/it/files/backup/

What else would I like to know about this topic:

I don’t begin to understand how the MD5 checksum works! I would like to find out more about this and perhaps see it in action.

How did I find this task? How would I improve it?

It was fine. There’s enough specific advice available on authoritative websites – much more than I could convey in this post.

Posted under Holistic Librarian, Training

This post was written by Diane Workman on January 24, 2013

The Holistic Librarian – Thing 8

Hi, I’m Afzal, your Subject Librarian for Arabic, and Politics.

Task 8: A researcher asks you about her funder requirements on research data. Where could you find this information?

What I knew about the topic beforehand:

The first thing that occurred to me was to ask the funder.

What I know now:

An excellent overview of funder requirements is found here with a more detailed look in a document called Funder Requirements for Data Management and Sharing by Gareth Knight. He covers the following funders:

1. Action Medical Research (AMR)
2. Biotechnology and Biological Sciences Research Council (BBSRC)
3. Bill & Melinda Gates Foundation
4. Breast Cancer Campaign (BCC)
5. Cancer Research UK (CRUK)
6. Department of Health, UK (DoH)
7. Department for International Development (DfID)
8. Drugs for Neglected Diseases Initiative (DNDi)
9. Economic and Social Research Council (ESRC)
10. Engineering and Physical Sciences Research Council (EPSRC)
11. GlaxoSmithKline (GSK)
12. Medical Research Council (MRC)
13. Natural Environment Research Council (NERC)
14. Wellcome Trust
15. WHO – World Health Organization – TDR
16. World Cancer Research Fund (WCRF)
17. National Health Service Technology Assessment (NHS HTA)

How did you obtain this knowledge?

I used Google.

What else would you like to know about the topic?

Do any of the funders’ policies represent a potential conflict with the University of Exeter’s policies? If so, what are our strategies for resolving this?

How did you find this task? How would you improve it?

Straightforward. But why would a researcher – worthy of their name – ask me about their funder?!

Posted under Holistic Librarian, Training

This post was written by Afzal Hasan on January 24, 2013

The Holistic Librarian – Thing 5

Hi, I’m Afzal, your Subject Librarian for Arabic, and Politics.

Task 5: What is our institutional policy on OA and RDM and how does it compare with other institutions’ policies? Are there any other institutional policies that affect research data management?

What I knew about the topic beforehand:

OA and RDM are major components at Russell Group Universities, and I therefore assumed that there must be a university body affiliated with a national organisation that sets or coordinates national policy. I did not think there should be so much a ‘comparison’ as a ‘coordination’.

What I know now:

Policy authorship at Exeter is the responsibility of the Open Access and Research Data Management Policy Task and Finish Group, formed in March 2012. I understand there’s one approved policy for PGR students and one for researchers at an advanced draft stage. These can be read here.

Comparison of inter-institutional policies

The Digital Curation Centre http://www.dcc.ac.uk/ gives guidelines on policy formation. The DCC has collated data management policies from various UK institutions, which can be seen here:

http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies/uk-institutional-data-policies

There are policies from non-UK bodies as well.

How did you obtain this knowledge?

I ran a search for OA policies on the Exeter University homepage. I googled OA RDM policies to find the DCC page.

What else would you like to know about the topic?

Is there a coordinated national objective?

How did you find this task? How would you improve it?

I found it straightforward. There might be copyright issues/conflicts involved. To improve the task, I’d put it more simply. It’s rather beyond our means to produce a comparison of institutional policies across the world.

Posted under Holistic Librarian, Training

This post was written by Afzal Hasan on January 24, 2013

The Holistic Librarian – Thing 12

Hi, I’m Diane Workman and I’m the Subject Librarian for Drama, English, Film Studies and Theology & Religion.

As part of the Holistic Librarian project I was asked to research three tasks of the ‘23 Things (+1) for Research Data Management’.

Task 12 was to research the answer to the question “What evidence can you cite that research made available on Open Access has more impact than research that is not available on Open Access?”

What I knew about the topic beforehand:

I attended Alma Swan’s presentation during Open Access (OA) Week in October, and remembered her citing some examples where placing research papers on OA had a positive impact on their use and/or citation. The concept of Journal Impact Factor (JIF) is one that takes me out of my comfort zone, as it has traditionally been of less importance to the Humanities disciplines that I provide library support for.

What I know now:

Since 2001 many studies have taken place, largely in the Sciences, to test the hypothesis that OA provides a citation advantage “by increasing visibility, findability and accessibility for research articles” (Swan, 2010). Swan summarizes 31 studies that were published between 2001 and 2010, concluding that 27 studies found a positive citation advantage to placing a research article on OA. She also provides data on the size of the OA citation advantage found (as a % increase in citations) by discipline.

There is a move away from reliance on that traditional citation metric, the JIF, in measuring the impact of research articles. Notice is also being taken of social media-based metrics and ‘Altmetrics’ in attempting to determine the impact of research made available on OA.

How did I obtain this knowledge?:

Swan’s report The Open Access citation advantage: Studies and results to date provides a useful summary of the relevant studies carried out up to 2010.

There is a useful web-based bibliography provided by the OpCit Project, which you can use to bring your knowledge up to date. This provides abstracts for a variety of studies, including a number published as recently as 2012.

What else would I like to know about this topic:

I would like to have more information about the impact of OA on research in HASS disciplines, as so many of the published studies focus upon STEMM subjects, where OA is more established. There is an interesting article by Melissa Terras, published in the OA Journal of Digital Humanities, called The impact of social media on the dissemination of research: results of an experiment, which deals with the topic from a HASS perspective.

How did I find this task? How would I improve it?

I enjoyed finding out more about this topic. It was well-documented in sources available on OA, I’m pleased to say!

Posted under Holistic Librarian, Training

This post was written by Diane Workman on January 23, 2013

IDCC 2013 – Workshop Tweets

The Open Exeter and Research360 JISC MRD projects ran a workshop at IDCC 2013, Designing Data Management Training Resources, on January 14th.  The following are some of the comments tweeted during the event:

  • In the training workshop at idcc13. Research360 & OpenExeter from the jiscmrd programme will tell us how to design RDM training
  • OpenExeter project started by doing an RDM survey to gather researcher requirements. For results see: https://eric.exeter.ac.uk/repository/handle/10036/3689 … jiscmrd idcc13
  • idcc13 Hannah Lloyd-Jones from OpenExeterRDM giving us an intro to RDM work at Exeter. Started off with a survey and 50 interviews!
  • idcc13 OpenExeterRDM recognised that diff disciplines/groups/individuals like different approaches to training, feedback collection etc.
  • OpenExeter training reused existing materials e.g. UKDA – http://data-archive.ac.uk/create-manage/training-resources … idcc13 jiscmrd
  • Tips for training materials: clear objectives, know audience, use researchers to create materials, concise, easy to read, jargon free
  • Tips for training materials pt2: glossaries, take away materials (easy to pin up), online guidance should compliment sessions
  • Lots of interesting discussions taking place during OpenExeterRDM’s “Speed Dating” exercise! idcc13
  • Hannah L-J from jiscmrd Open Exeter mentions (excellent) holistic librarian blog posts ’23 things for RDM’ http://bit.ly/TUPmsg  idcc13
  • Just been speed data dating at idcc13. Didn’t pull.
  • so much good info at Designing DM training coming too fast to tweet it! […] Bringing home lots of ideas idcc13
  • Data Mgmt Speed Dating: share skills you need for training; skills your researchers need. Need to do this in upcoming workshop! idcc13
  • First exercise at idcc13 training workshop ‘speed data dating’ was a bit frenetic but fun – good icebreaker ukdcc jiscmrd
  • OpenExeterRDM PhD students developed a research data management survival guide – jiscmrd idcc13 #ukdcc http://bit.ly/13uj5ev

Posted under Training

This post was written by Jill Evans on January 21, 2013

Follow the Data – end of project feedback from Ruth Farrar

The Open Exeter project began with a structured method of involvement: weekly data management audit forms and face-to-face meetings with Dr. Gareth Cole every two to three weeks. Initially, I found this approach beneficial for three main reasons. First, as I work remotely from campus, it was nice to have a regular check in with the Open Exeter team as it encouraged open lines of communication and getting to know the team better. Second, the data management audits enabled me to regularly reflect on my own data management practices which in turn helped me understand the project’s wider issues of data management. Third, I found the face-to-face meetings with Dr. Cole useful at the start of the project as it provided a friendly space to discuss data management issues, gain advice and ask any questions about the project.

As the months progressed, though, I felt I was repeating some of the same information on my audit form, which in turn meant there was little new material to add in the meetings. Perhaps the audit form phase could have stopped sooner in the process, as it may have been more productive to get us to move on to another structured project. However, I understand the audit forms needed to be carried out over a significant period of time.

Throughout the project, I liked how we were invited to numerous events by the Open Exeter team as this promoted a sense of inclusion and better awareness of data issues throughout the entire university. For instance, being given a table at the Digital Scholarship Showcase on 28th May, 2012 proved a useful platform to share my research and data management issues.

The Open Exeter project also gave me an opportunity to hone my communication, organisational and pedagogic skills. During a PhD researchers workshop on 22nd June, 2012, I helped lead a ‘Speed Data Dating’ session which was equally fun and informative.

I also practised speaking and listening skills in meetings with fellow PhD Open Exeter researchers. I found these meet ups invaluable. The group meetings were also evenly spaced throughout the project. I started the Open Exeter project in the first year of my doctoral studies. I benefitted greatly from listening to data management advice from researchers in their second and third years. I understand our newly created survival guide helps fill this need for advice. However, I still found the face-to-face meet ups the most helpful part of the Open Exeter project. I wonder if first year students would benefit from talking to fellow students from second and third year in their department about data management issues. I am sure there are many students who would volunteer for a buddy/mentoring system to add another credential to their CVs. First year PhD researchers may also take a deeper interest in important data management issues if they were communicated by a fellow student, as they may be eager to learn how to avoid common data management pitfalls.

From my perspective, the meetings with Jill, Gareth, Hannah and the PhD researchers helped me consider how the way I manage data now will have an impact on my research in my final year. I absorbed so much new information I would not have previously considered let alone have known the specific questions to ask. I enjoyed learning about data management topics ranging from the Freedom of Information Act to the advantages of academic depositories like ERIC.

Assisting at the Information Stand and workshops during Open Access Week in October, 2012, marked another highlight of my involvement with the project. When I was able to confidently explain data management issues to students who came to the stand, it made me realise how much I had learned throughout the Open Exeter Project. Attending the workshops also highlighted the impact social media networks can have on disseminating research data to the public.

The provision of an iPad on the Open Exeter project also introduced me to effective methods for disseminating data online and between apps. My iPad rapidly became an essential tool for managing data, particularly when working remotely. The iPad proved invaluable as it afforded me a new-found workflow freedom to edit, store and back up my field recordings and share data with users in the UK while simultaneously carrying out research on site on a project in America.

Overall, the Open Exeter project generously provided me with practical tools, useful advice and an excellent introduction to issues surrounding open access research and data management. Jill, Gareth and Hannah were a real pleasure to work with as their friendliness, enthusiasm and sincere kindness remained consistent throughout my time on the project. Ultimately, my involvement in the project has positively shaped the ways I consider saving, storing and sharing my doctoral research data.

Posted under Follow the Data, PGR students

This post was written by Jill Evans on January 21, 2013