Follow the Data – end of project feedback from Annie Blanchette

Collaborating on the Open Exeter project served first and foremost as a great opportunity to reflect on and develop my research data management process according to some of the best practices and solutions out there.

As a foreign student, I had never heard of the Data Protection Act until the Open Exeter project leaders invited Caroline Dominey to present the DPA and Freedom of Access policy in a workshop. This alerted me to some of these requirements and prompted me to ensure my data process was adequate. Given the sensitive nature of my data – dealing with real-life subjects who can be recognised – I met with Ms Dominey in order to review my intended process. This has also been very helpful for the approval of my ethical research protocol. I have also learned about Open Access policies, some of the benefits of sharing, as well as measures to control diffusion in order to suit the sensitive nature of my data.

Through discussions with the project leaders and fellow students involved in Open Exeter, I have learned a lot about data management processes and tools. For instance, I have found tools for encryption and syncing, for storing data online with a good level of security and for managing research references efficiently. I have learned about sites for building a Data Management Plan (such as DMPonline). Undertaking such a process proved very helpful in my ongoing data management, and facilitated my approval by the ethics committee, as it allowed me to document potential ethical issues at every step of the process and the measures envisioned to minimise them. The creation of the survival guide for new students was also a great opportunity to reflect on and share my own practices, as well as to learn about best practices from others. Learning about good practices in terms of folder structures, versioning and naming conventions was very helpful too, although I wish we had covered this earlier so that I could have implemented this system sooner. While my submission deadline does not allow me to adjust all my files at the moment, the system seems very promising and easy to implement, so I am hopeful it will work well for my future projects. The project was also an opportunity to learn to work with the iPad as a research tool, while benefiting from others’ exploration of this tool.

What worked:

Although I am still struggling to do the actual writing up of my thesis with the iPad, it has been playing a crucial role at every step of my data collection (managing field appointments, recording interviews and field notes, conducting photo reviews with participants and accessing/sharing my data for the purpose of interpretation).

Monitoring my data management throughout the project was also a great help because it pushed me to reflect on the nature of the data, as well as proper ways of handling it. Although reporting on a weekly basis was at times difficult given the diverse nature of my data, it also helped me create a much more detailed account for my thesis, which increased its credibility.

What didn’t work as well:

I was able to put together a process of syncing and encryption; however, I am still struggling a bit with overwriting issues (especially with Dropbox). I believe this will be resolved once I implement an appropriate versioning system.

Other suggestions:

I think offering students the opportunity to take part in research data management groups (with data monitoring activities, group discussions and workshops) could be of great benefit to interested postgraduate researchers. I would have liked a little more group interaction, because I found it was great to interact with a team of people committed to reflecting on data management. This and the one-to-one meetings often helped me fix issues that would otherwise have blocked me for much longer. Thank you for giving me the opportunity to participate!

Posted under Follow the Data, PGR students

This post was written by Jill Evans on January 21, 2013

Follow the Data – end of project feedback from Duncan Wright

Firstly, I’d like to say that I have really enjoyed the time that I have spent working on the Open Exeter Project. It has been a pleasure working with all of the project team, staff and PhD researchers alike, and I feel that it has been a productive experience. I feel the project has been well managed throughout, with exemplary correspondence and engagement with researchers: at no point was it unclear what was required from us, and the aims and objectives of each task were consistently communicated effectively. I have particularly appreciated the time spent in workshops, and it is probably from this format that I gained the most. Conversely, I found tasks/activities online more difficult to undertake, although this is probably as much a reflection on the manner in which I operate as anything else. The times when specialists were invited into the workshops seemed to work especially well, and the level at which the information was pitched was far better than that of the PGR development workshops.

I found particular aspects of the project especially informative for my own research, and I would certainly have benefitted further still if I’d encountered some of the information nearer the beginning of my studies. The workshop/discussion on academic referencing software, such as Mendeley and EndNote, was really informative and should be considered for the future – I’m aware that there is an EndNote workshop, but researchers should be made aware of the wider range of resources available where possible. Creating a data management plan also provided a useful insight: I’ve actually mentioned data management in a few of the research fellowships that I’ve been applying for, and it is experience that I hope to benefit from in the future. It has been good too to learn about Open Access and its implications for research, even if some of those are potentially difficult/challenging – it seems a shame that more academics in particular are not engaging with Open Access, or at least involved in discussing its implementation.

I think that the lack of interest from academics was, sadly, reflected during Open Access Week. It was also disappointing that the University didn’t deem it important enough to warrant a more suitable location for the stall – this is in no way a criticism of the team, as I am aware that a better position was sought, but it did negatively impact the visibility of the stall and what we had to offer. Open Access Week was a well-publicised, efficiently organised and very informative event, and all of the sessions that I went to were interesting, pitched at a good level and useful. It was just a shame that it wasn’t enjoyed by more people!

That’s probably the sum of most of my thoughts. As I said before, my main thought about Open Exeter is that it has been a very positive experience, and it has been a real pleasure to work with you all.

Posted under Follow the Data, PGR students

This post was written by Jill Evans on January 21, 2013

Follow the Data – end of project feedback from Philip Bremner

I have found working on the JISC-funded Open Exeter project an invaluable experience. It has greatly enhanced my understanding of the need for and methods of rigorous research data management. It has encouraged me to consider issues such as: what constitutes research data? What is the value of making research data openly available? Who is responsible for research data management issues and long-term archiving? Speaking personally, participation in the project has contributed to my personal development as a researcher. In the future I will feel more confident about discussing research data management procedures in the form of a research data management plan, which is required by the research councils when applying for research funding.

In addition to this I felt that the project team valued my contribution to the project, along with that of the other PGRs. As PGRs, we participated in a number of useful workshops, reports of which can be seen on the project blog: http://blogs.exeter.ac.uk/openexeterrdm/, along with many other interesting articles about the progress of the project. Various topics were discussed at these workshops such as data protection, reference management, the creation of an institutional repository etc. We also ran an event where PGRs took the lead in facilitating discussion of research management issues more generally amongst postgraduate researchers in the University. A number of other events have been organised under the auspices of the project, such as the very successful University-wide Open Access Week, which ran as part of international Open Access Week. Within my own College, the project team participated in the PGR induction programme where we tried to raise awareness of research data management issues amongst new PhD students.

The Open Exeter project has produced a number of very useful outputs, which we PGRs have been involved with in one way or another. Principal among these, of course, is the creation of a robust institutional repository to make research data available on open access. We were fortunate enough to be able to test drive the repository (and thereby gain a sneak preview of it). I have to admit, it seemed able to handle all the different shapes and sizes of research data that we could think of throwing at it. Another significant output is the data management policies for researchers at the University, which we had input on. As a result of these policies, PhD students (and other researchers) will be required to submit their research data for long-term archiving in the institutional repository, thereby making the data available for other researchers to use. This is in line with the requirements of many of the research councils, which now make data archiving compulsory.

Looking back at the report I wrote following the initial project workshop, I wrote: ‘research data management is not something I had given much thought to…’ I can safely say that I have now given the matter some considerable thought and feel that it is one that is relevant to most researchers in the University. I feel that the Open Exeter project has been instrumental in raising awareness of research data management issues amongst researchers at the University and the project’s Advocacy and Governance Officer deserves special thanks in that regard. What is more, the project has produced some excellent training materials, which are already being delivered as part of the researcher development programme.

Looking to the future, I feel that there is still work to be done in relation to promoting open access to research data in terms of advocacy, training and data curation. My concern is that the excellent work achieved by the project will not continue beyond the conclusion of the project if sufficient funding is not in place. In my view, it is essential to ensure that there are dedicated personnel within the University whose main concern is dealing with research data management issues. The data repository, although a fantastic achievement, cannot be considered as a static system. It requires proper curation by specially trained staff who are willing and able to deal with any concerns or queries that are raised in relation to its operation.

I am very grateful for the opportunity to participate in this project and would like to thank the project team for their enthusiasm and support.

Posted under Follow the Data, PGR students

This post was written by Jill Evans on January 21, 2013

IDCC 2013

Just a quick note to say that Ian and I will be attending the 8th International Digital Curation Conference in Amsterdam next week.

On Monday 14th January, in conjunction with Cathy Pink and Jez Cope from the University of Bath, who work on the JISC-funded Research 360 project, we will be holding a “Designing Data Management Training Resources Workshop” – for more details, see the workshop programme.

Ian will also be doing a demo on “Submitting BIG data to a DSpace repository” and we will be showing our lovely poster on “Encouraging Junior Researchers to Value and Share Data Management Skills” – Do come and talk to us if you are going to be there!

We’ll try to tweet about the conference (#IDCC13) – but since my iPad has suffered an unfortunate smashing incident, we can’t promise constant tweets! Remember you can follow us on Twitter too (@OpenExeterRDM)!

Posted under News

This post was written by Hannah Lloyd-Jones on January 11, 2013

The Holistic Librarian – Thing 21

Hello. I’m Caroline Huxtable, the Subject Librarian for Computer Science, Engineering, Mathematics, Medical Imaging and Physics.

As part of the Holistic Librarian project I was asked to research three tasks of the ‘23 Things (+1) for Research Data Management’.

Task 21 was to research the answer to the question: Which criteria could a researcher use to select which research data he needs to preserve in the long-term?

What I knew about the topic before the task:

I knew that it is important for researchers to preserve their data long-term for potential future use by the wider research community, in order to contribute to the advancement of knowledge. I also knew that increasingly it is a requirement of funding bodies that data arising from publicly-funded research is preserved and made openly available as a public good, with ‘as few restrictions as possible in a timely and responsible manner that does not harm intellectual property’ (Source of quote: RCUK Common Principles on Data Policy). Additionally, I was aware that it is vital to select which data should/can be preserved, as it is impractical to preserve and provide access to all data, not least because of cost implications.

What I know now:

It is good practice, and increasingly a requirement of funders, that – at the outset of a project or even at the grant application stage – researchers create and implement a Data Management Plan, which typically includes information on ‘what data will be created and how, and outlines the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied’. [Source: Digital Curation Centre (DCC) website; section on Data Management Plans].

The plan should specify, for example:

  • which data will be preserved.
  • whether any of the data will be deleted prior to archiving (see Things 19 and 20 for further comments on this issue).
  • which type(s) of data will be preserved (raw data, derived data, samples etc.).

As well as stating the above in the Data Management Plan, the appraisal and selection of data for preservation will be an ongoing, iterative process as the research from which it derives progresses.

The criteria which the researcher could use to select which data to preserve long-term include:

  • whether the confidentiality and/or sensitivity of any of the data means that it cannot be archived, or can only be preserved in a dark archive.
  • any funding body or other legal requirements on which data must be archived and/or made accessible.
  • whether any non-disclosure agreements apply to any of the data.
  • does the data fit into the archive or repository’s selection policy?
  • the likely costs of preservation.
  • how significant is the data for research and/or scientific or social progress?
  • is the data unique?
  • is the data usable by others?
  • the volume of data and the available storage capabilities.
  • copyright and other legal rights pertaining to the data and its ability to be preserved and/or made accessible.
  • does the data constitute the ‘vital records’ of a project, and therefore need to be retained indefinitely?
  • technical issues, e.g. can the file format be maintained or transferred for future use?
  • is there adequate metadata to describe the data and ensure its discoverability?

How I obtained this knowledge:

What else I would like to know about the topic:

Much of the existing information that I discovered seems to be written for use by repository managers, librarians, data curators etc. to assist in the creation of data appraisal, selection and curation policies. So it would be beneficial to have available a checklist of criteria from the researcher’s perspective, to which I could point researchers when answering enquiries.

How I found the task and how I would improve it:

I found the task interesting and somewhat easier to research than Things 19 and 20, perhaps because there seems to have been more written and collected together than for those earlier tasks.

Posted under Holistic Librarian, Training

This post was written by Caroline Huxtable on January 7, 2013

The Holistic Librarian – Thing 20

Hello. I’m Caroline Huxtable, the Subject Librarian for Computer Science, Engineering, Mathematics, Medical Imaging and Physics.

As part of the Holistic Librarian project I was asked to research three tasks of the ‘23 Things (+1) for Research Data Management’.

Task 20 was to research the answer to the question: A researcher wants to archive sensitive research data securely for long-term preservation. What options does she have?

What I knew about the topic before the task:

I was aware that research outputs may contain sensitive data that must be securely stored and preserved. I understood sensitive data to be:

  • personal data relating to individuals.
  • data which allows individuals to be identified.
  • confidential data.
  • data which is considered commercially sensitive such as trade secrets.

What I know now:

The UK Data Protection Act 1998 defines sensitive personal data as relating to matters including racial or ethnic origin, political and/or religious beliefs, and physical or mental health. The Act applies only to personal or sensitive personal data, and not to all research data in general, nor to anonymised data. I found it difficult to find any satisfactory definition of sensitive data more generally (rather than specifically personal data).

Sensitive data may exist in both digital and non-digital formats.

Issues to consider in relation to the secure archiving of sensitive research data include:

  • Physical security of the data, such as controlling access to rooms and buildings where data is stored and transporting sensitive (non-digital) data only when absolutely necessary.
  • Network security, e.g. not storing sensitive data on servers or computers connected to an external network, and ensuring that firewalls and other security protection are in place and kept updated.
  • Security of computer systems and files, including:
    • password protection, using complex passwords and changing them regularly.
    • implementing administrator-only permissions to access some or all of the data as appropriate (the fine detail of these permissions would need to be considered in the light of how many individuals are working with the data, and whether each person requires access to all the data or just sub-sets thereof).
    • encryption of files.
    • imposition of non-disclosure agreements for users of confidential data.
    • not exchanging sensitive data via the cloud or email unless it has been encrypted.
    • ensuring that secure measures are used when data is destroyed or devices that hold or have accessed such data are disposed of.
  • Data containing personal information is governed by the Data Protection Act, which allows personal data to be accessible only to authorised persons. It must be treated with higher security than data that does not contain personal information. This can be achieved by, for example, anonymising or aggregating data pertaining to individuals, or by storing personal information separately from the rest of the related data.
  • Whether certain data should be permanently deleted from the archived datasets so as to avoid accidental discoverability.
  • Whether data should be archived in a dark archive, i.e. preserved long-term, but not publicly discoverable or accessible.

How I obtained this knowledge:

I primarily consulted the website of the UK Data Archive, in particular the section on data security. I also got some tips from a presentation given as part of the Open Exeter project by Caroline Dominey, the University Records Manager, on data protection, storage and sharing. Additionally I looked at the University’s Information Security Policy.

What else I would like to know about the topic:

I would like to have a greater understanding of what constitutes sensitive data. As stated above, I struggled to find a good definition, and am therefore unsure whether my answer is as comprehensive as it could be in relation to the secure preservation of such data. I would welcome expert guidance on where to find further information; a training session would be ideal, as I learn better in such an environment rather than via the self-directed learning method.

How I found the task and how I would improve it:

I found the question ‘What options does she have?’ slightly ambiguous in meaning. I took it to mean (and answered) ‘What issues does she need to consider in relation to the security of sensitive data?’

Posted under Holistic Librarian, Training

This post was written by Caroline Huxtable on January 7, 2013

The Holistic Librarian – Thing 19

Hello. I’m Caroline Huxtable, the Subject Librarian for Computer Science, Engineering, Mathematics, Medical Imaging and Physics.

As part of the Holistic Librarian project I was asked to research three tasks of the ‘23 Things (+1) for Research Data Management’.

Task 19 was to research the answer to the question: A researcher is working with a commercial partner on a research project. In which circumstances could the researcher make the research data from this project available on Open Access?

What I knew about the topic before the task:

I was aware that researchers in the College of Engineering, Mathematics and Physical Sciences (whose subjects I support) collaborate on projects with a range of regional, national and international external organisations including multinational companies, and that this work is often therefore commercially sensitive. However, I have not had to deal with any queries about the use of research data arising from such work, so this was a new topic for me.

What I know now:

The researcher must ensure that s/he abides by the conditions of any agreements entered into with the commercial partner, including in respect of the use of research data arising from the joint research project, such as whether it can be made available on Open Access (for example in a subject or institutional repository). The researcher must also ensure that the content of any Open Access data does not infringe copyright, e.g. that it is not derived from a licensed or commercial product. If the data does contain copyrighted material, the researcher must ensure that permission has been sought from and granted by the rights holder to include it in the Open Access dataset. Any material for which such permissions have not been granted must be deleted from the dataset prior to it being made Open Access. If the dataset has been sponsored or funded by any organisation other than the researcher’s employer, the researcher must ensure that s/he has fulfilled all obligations to that institution or organisation regarding Open Access publication.

Additionally, it is good practice, and increasingly a requirement of funders, that – at the outset of a project or even at the grant application stage – researchers create and implement a Data Management Plan, which typically includes information on ‘what data will be created and how, and outlines the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied’. [Source: Digital Curation Centre (DCC) website; section on Data Management Plans]. Such a plan will need to consider whether there are ethical, privacy or commercial issues which may prohibit making some or all of the data publicly available on Open Access. Any restrictions on access to any of the data should be justified in the plan, for example due to the terms of a commercial partnership agreement, which may include a non-disclosure agreement or an expectation that the data will be exploited commercially or has the potential to be patented.

These are the considerations that a researcher must take into account when deciding whether such research data could be made Open Access.

How I obtained this knowledge:

The Digital Curation Centre website contains some useful information. For example, I looked at its document ‘Policy-making for Research Data in Repositories: a Guide’, and also consulted the section on Data Management Plans, e.g. the ‘Checklist for a Data Management Plan’. I also consulted the University of Exeter’s Research and Knowledge Transfer webpages, in particular the ‘Intellectual property and commercialisation’ section of their Research Toolkit.

What else I would like to know about the topic:

I feel that I have barely scratched the surface of this topic, and would like to know more. I do not feel confident that I have got a clear understanding of the subject, nor that I would be able to help a researcher who asked me this question. I would refer a query on this subject to a member of the Open Exeter team, or to Research and Knowledge Transfer.

I would welcome expert guidance on where to find further information; a training session would be ideal, as I learn better in such an environment rather than via the self-directed learning method.

How I found the task and how I would improve it:

I found this task very difficult to research, as it is not an area of which I had any prior knowledge, nor have I had any enquiries from researchers about it.

It would have been much more helpful to have had a list of links to relevant resources to refer to as I performed the task. I really needed to be pointed in the right direction, at least to get me started; I don’t learn well when faced with a bare question with no context or background.

Posted under Holistic Librarian, Training

This post was written by Caroline Huxtable on January 4, 2013

The Holistic Librarian – Thing 7

Hi, I’m Natasha Bayliss and I’m the Subject Librarian for Biosciences, Geography, Psychology, Sports and Health Science, Clinical sciences and Medicine.

Task 7 was to document “If a researcher asked you how to cite a data set, which resources could you point him to?”

What I knew about the topic beforehand:

I was aware that data required citations (just as any other source would) but it wasn’t always clear exactly how to do this as the standards for data citations are not universally agreed upon.

What I know now:

Generally speaking, there are still debates about what should make up a complete citation, although most citation methods include the following:

  • author,
  • title,
  • year of publication,
  • publisher (for data this is often the archive where it is housed),
  • edition or version, and
  • access information (a URL or other persistent identifier).

There are a number of reasons for this, including the fact that it allows you to clearly identify the creator and the nature/type of the data, and provides the means to access the information. However, the nature of the data produced can sometimes make it more complicated to cite. For example, a dataset may have multiple parts or contain different types of data outputs. It is worth noting that the final citation will depend on the referencing style the author is using within their publication, and some referencing styles require additional fields.
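As a rough illustration of how the elements listed above fit together, here is a minimal Python sketch that assembles them into a single citation string. The field names, the ordering and the punctuation are my own assumptions rather than any official referencing style, and the example values are placeholders, not a real dataset.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """The six elements listed above; field names are illustrative."""
    author: str
    title: str
    year: int
    publisher: str   # for data, often the archive where it is housed
    version: str     # edition or version
    identifier: str  # URL or other persistent identifier


def format_citation(c: DatasetCitation) -> str:
    """Join the elements in one plausible order (not an official style)."""
    return (f"{c.author} ({c.year}). {c.title} [dataset]. "
            f"Version {c.version}. {c.publisher}. {c.identifier}")


# Placeholder values only: this is not a real dataset.
example = DatasetCitation(
    author="Researcher, A.",
    title="Example Survey Data",
    year=2013,
    publisher="Example Data Archive",
    version="1.0",
    identifier="https://example.org/dataset/123",
)
print(format_citation(example))
```

A real referencing style may reorder these elements or require extra fields, so the formatting function is the part you would expect to change.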

How to Cite Datasets and Link to Publications is a really useful guide produced by the DCC that will help you with citing your data. If you are using data from a data archive, you may find that it produces citation guides that will help you as well, for example How to cite ESDS data and How to cite census data.

How did I obtain this knowledge?

The ESRC produce a helpful guide called Data Citation: what you need to know.

Ball, A. & Duke, M. (2012). ‘How to Cite Datasets and Link to Publications’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides

What else would I like to know about this topic:

Given the on-going debates surrounding dataset citation standards it would be interesting to see if a standardized method is developed in the future.

How did I find this task? How would I improve it?

I found the task useful and hope that it will enhance the support I provide to researchers.

Posted under Holistic Librarian, Training

This post was written by Hannah Lloyd-Jones on January 4, 2013

The Holistic Librarian – Thing 6

Hi, I’m Natasha Bayliss and I’m the Subject Librarian for Biosciences, Geography, Psychology, Sports and Health Science, Clinical sciences and Medicine.

Task 6 was to document “Where can a University of Exeter researcher store her live research data?”

What I knew about the topic beforehand:

Research data takes many forms, ranging from measurements, numbers and images to documents and publications. Therefore there are many different ways in which you can choose to store your research data. Often when creating and storing data you need to address issues surrounding ethics and data protection.

What I know now:

It’s essential to consider how you are going to store your research data from the beginning. There are a number of different ways that you can store your data. The nature of your data may determine which option you select.

All University of Exeter members have an allocation of secure filespace on a central server that can be used for storing work for access from any computer connected to the University network. This is known as the U: Drive because that is how it is identified in the open-access PC cluster rooms. More information about the U: Drive can be found at http://as.exeter.ac.uk/it/files/udrive/

You could store your data on a PC, laptop or on storage devices such as external hard drives, USB memory sticks, CDs or DVDs. Cloud storage is an alternative way of storing data. It involves storing data on servers that are generally hosted by third parties, such as Google Docs, Dropbox and SkyDrive. Be wary of using cloud storage for confidential data.

Backing up your data (i.e. saving it in more than one location) is critical. Useful advice on this topic can be found at http://www.gla.ac.uk/services/datamanagement/lookingafteryourdata/back-up/
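To make the idea of saving data in more than one location concrete, here is a minimal Python sketch that copies a file to a second folder and then checks that the copy matches the original. The function names and the choice of a SHA-256 checksum are my own assumptions for illustration; it is not a substitute for the advice linked above or for a managed backup service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and confirm the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

For example, `back_up(Path("interviews.csv"), Path("E:/backup"))` would place a verified copy on a second drive; both paths here are hypothetical. Note that a copy in a folder on the same disk does not protect against that disk failing.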

How did I obtain this knowledge?

I looked at the following websites:

  • UK Data Archive
  • University of Glasgow, Data Management
  • Guides and Help Sheets for Researchers
  • Cloud storage

What else would I like to know about this topic?

Advice on storing confidential data and data security for researchers.

How did I find this task? How would I improve it?

I found the task useful and hope that it will enhance the support I provide to researchers.

Posted under Holistic Librarian, Training

This post was written by Hannah Lloyd-Jones on January 4, 2013
