What is Research Data? Some end of term musings…

Following on from our workshop last week I have been pondering our use of language in the Open Exeter project. I won’t say it has been keeping me up at night but…

As we are funded under the Managing Research Data 2011-13 programme, I thought I would briefly blog on Research Data. This may seem strange, but it has become increasingly clear that this phrase means different things to different people. I should state here that in a past life I trained as a historian and, although I did use an element of quantitative analysis, at the time I would have said that I didn’t use data in my PhD; that was used by my friends in the sciences. I used text-based sources; they used numbers, and data was numerical! Clearly I did use data; I just didn’t realise it.

Is it important that we define the term though? Is “Research Data” just one of those phrases where both the speaker and the listener know what is meant, but the conversation may be unintelligible to outside ears? Personally, I believe that we should find common ground. How can we create generic training materials on Research Data Management if the very title holds differing meanings for different researchers and librarians?

Looking at training materials and policies that have already been created it is clear that the term has been defined slightly differently in each case. To take three examples:

  • In the glossary to the Cambridge Data Management materials they state, “We refer to ‘research data’ a lot on the Cambridge data management pages. We put nearly all research materials under this umbrella – so, yes, spreadsheets of statistics and equipment outputs are ‘data,’ but so might be research-related e-mails, drafts, interviews, analyses, footnotes, and references.”
  • On the Oxford 101 flyer ‘Managing your research data at the University of Oxford’ they state that “Research data can be textual, numerical, qualitative, quantitative, final, preliminary, physical, digital or print”, without actually defining the term.
  • The University of Hertfordshire Policy states that data is, “distinct units of information such as facts, numbers, letters, symbols, usually formatted in a specific way, stored in a database and suitable for processing by a computer”. Does this mean that data has to be electronic? Surely not.

The Australian National Data Service has an interesting guide on this very topic entitled “What is Research Data?” In it they state that there are “recognised definitions of research data available”. Does this mean there is more than one definition? If so, what should be included? Does one researcher’s output (be it an article, conference proceeding, piece of software, law report etc.) become another researcher’s data? Is it too simplistic to think of inputs and outputs as different entities? Are they one and the same, just at different stages of a research lifecycle?

I certainly don’t have answers to the questions I have posed here but hopefully by the end of the project I will have a better idea of what the answers may be. These are only initial thoughts and I would be interested to hear what other people think. Can we define the term “Research Data” and is it actually important that we do?

Posted under Follow the Data

This post was written by Gareth Cole on December 21, 2011

First PGR workshop

Having returned from the JISC programme meeting and IDCC7 at Bristol we were straight back into our project work. On Monday we held our first workshop with our PGR students. This proved to be more successful than we could have hoped.

After a buffet lunch, introductions and iPad set up we started with a discussion on terminology. As we are working with students from Biosciences, Archaeology, Law, Sports Science, Film Studies, Engineering and Business, this led to some interesting points being raised. Even the term “Research Data” meant different things to different students. However, there was common agreement that, whatever you call them, data can be split into three stages.

  • First there is the raw/unprocessed/primary/unconstructed data
  • Secondly there is the processed/analysed/constructed data
  • Thirdly there is published data

However, even this is fraught with difficulties as one of the students said that in their discipline students would not consider themselves to use “data” as they base their research on published reports. It certainly gave us food for thought.

The discussion then moved on to the term “metadata”. Again, this meant different things to each student and some had not heard of the term.

This discussion reinforced what I had heard at various workshops I have attended: careful use of language is essential when discussing RDM with researchers, and it will be critical when we start to create training materials.

Talking of which… Following a much-needed coffee break we asked the PGRs to evaluate a number of training materials which various institutions have already created. I won’t name names, but there was general agreement that one was better than the rest! Again, the resultant discussion raised issues that we as a project need to bear in mind:

  • The use of bullet points
  • An easily navigable site
  • Embedded videos (although not universally popular)
  • Clear site map
  • Clear audience in mind when creating materials
  • Everybody said that they preferred face-to-face training

I will blog these points in more detail at a later date. Following the workshop, we are confident that we have a group of engaged and very articulate students who will help the project achieve its aims.

Posted under Follow the Data

This post was written by Gareth Cole on December 14, 2011