The purpose, practicalities, pitfalls
and policies of managing and
sharing data in the UK
AAMG-CICAG Measurement,
Information and Innovation meeting
20 October 2015
Dr Danny Kingsley
Can we cover this in 15 minutes
(allowing 5 min for questions)?
• UK policy landscape
• Places to share data
• What are we trying to achieve?
• Let’s start at the beginning
• Basics of Research Data Management
• Issues with sharing (or not) data
The data policy landscape
Lots of slightly different rules in the UK
Policies
• Funder
– RCUK Common Principles on Data Policy
• Government
– Draft Concordat on Open Research Data released by the RCUK
for consultation which ended on 28 September
• http://www.rcuk.ac.uk/research/opendata/
– Cambridge coordinated a joint response with other universities
• https://unlockingresearch.blog.lib.cam.ac.uk/?p=285
• Publishers
• Institutional
– Cambridge University Research Data Management Policy
Framework. http://www.data.cam.ac.uk/university-policy
RCUK Common Principles on Data
– “Publicly funded research data are
a public good (…), which should be
made openly available with as few
restrictions as possible”
– http://www.rcuk.ac.uk/research/datapolicy/
The principles might be common…
What the researcher hears
From Bill Hubbard, Getting the rights right: when policies collide
http://www.slideshare.net/UKSG/hubbard-uksg-may2015-public
Places to share data
There are lots of options
Open repositories
• (some are free, some charge)
Discipline-specific repositories
• Gene Expression Omnibus
– Public functional genomics data repository
• http://www.ncbi.nlm.nih.gov/geo/
• arXiv
– e-prints in Physics, Mathematics, Computer Science, Quantitative
Biology, Quantitative Finance and Statistics
• http://arxiv.org/
• Oxford Text Archive
– Literary and linguistic texts for higher education
• http://ota.ox.ac.uk/
• UK Data Service
– Social science data
• http://ukdataservice.ac.uk/
• Natural Environment Research Council (NERC) run 7 repositories
• http://www.nerc.ac.uk/research/sites/data/
Journals
• Either as supplementary data, or in data-only
journals
– PLOS data sharing policy (Dec 2013)
• https://www.plos.org/plos-data-policy-faq/
– Nature’s journal Scientific Data
• http://www.nature.com/sdata/about
We are a long way from there
So what’s it all about then?
What are we actually trying to
achieve with open data policies?
In conversation with Ben Ryan, EPSRC
• Please share:
– the data that underpins publications
– the data that validates research findings
– the data that is worth keeping
• The default position is ‘data should be open’
• Published research findings should be testable
• Maximise the impact of publicly funded research
• Maintain public trust in science and research
• They are trying to create a new research culture
• https://unlockingresearch.blog.lib.cam.ac.uk/?p=151
Responses to data sharing policies
• What’s the minimum we can get away with?
• This is crap
• ‘They’ are just doing this because ‘they’ can
• But it will take a huge effort to get the data in
a useable form
• No-one will look at it
• What a waste of time
Data excuse bingo
We are trying to start at the end
We should begin at the beginning - a
stitch in time and all that…
In conversation with Michael Ball, BBSRC
• Disciplines themselves must establish ways of
dealing with data
– This is the beginning of an ongoing process
• Researchers need to consider how to deal
with data from the beginning of a research
project
• You can ask for money to manage data in the
grant application
• https://unlockingresearch.blog.lib.cam.ac.uk/?p=337
Research data management
• The practice of sharing data requires the data
to be:
– Accessible
– Intelligible
– Assessable
– Reusable
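A minimal sketch of how the four requirements above could be checked before a dataset is deposited. The field names, and the mapping of one field to each requirement, are illustrative assumptions and are not taken from any funder policy or repository schema.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetRecord:
    # Hypothetical fields: one per requirement, chosen for illustration only.
    location: str = ""     # accessible: where the data can be obtained
    description: str = ""  # intelligible: what the data are and how to read them
    methods: str = ""      # assessable: how the data were produced
    licence: str = ""      # reusable: terms under which others may use the data


def missing_fields(record: DatasetRecord) -> list[str]:
    """Return the names of fields that are empty or contain only whitespace."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Example: a record with a location and description but nothing else.
record = DatasetRecord(
    location="https://example.org/dataset/1",
    description="Hourly temperature readings, CSV, one row per reading",
)
print(missing_fields(record))
```

A real deposit would normally follow the metadata schema of the chosen repository; the point here is only that each requirement can be turned into something checkable.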
Some of it is really obvious
• How many of you:
– Use a file naming protocol?
– Ensure all your laptops are backed up?
– Have written a data management plan for your
current project?
– Have determined who in the team owns the data?
• PS: this last one REALLY matters
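As an illustration of the first question, here is a sketch of what checking a file naming protocol might look like. The convention itself (project, ISO 8601 date, description, two-digit version) is a hypothetical example, not one recommended in this talk; any consistent scheme agreed by the team would do.

```python
import re
from datetime import datetime

# Hypothetical convention: <project>_<YYYY-MM-DD>_<description>_v<NN>.<ext>
PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})\."
    r"(?P<ext>[a-z0-9]+)$"
)


def check_name(filename: str) -> bool:
    """Return True if filename follows the hypothetical convention above."""
    match = PATTERN.match(filename)
    if match is None:
        return False
    try:
        # Reject strings that look like dates but are not real calendar dates.
        datetime.strptime(match.group("date"), "%Y-%m-%d")
    except ValueError:
        return False
    return True


for name in ["xrd_2015-10-20_calibration-run_v01.csv", "final data (2).xlsx"]:
    print(name, check_name(name))
```

Putting the date first in year-month-day order means files sort chronologically in a directory listing, which is one reason this kind of pattern is often chosen.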
Skillsets required for managing and
curating data
http://www.dcc.ac.uk/sites/default/files/documents/RDMF/RDMF2/coreSkillsDiagram.gif
Lots of jobs…
Issues with sharing data
Both with sharing and not sharing
Issues raised by researchers
• There is a very real concern that the UK will
become unattractive for collaborations
• Researchers are discussing changing the type of
research being done to reduce the amount of
data being produced
• There is discussion in some circles about whether
applying for EPSRC funding is worth the hassle
Consequences of not sharing data
• Medicine
– Having the data publicly available in two trials of deworming pills
demonstrated that a population-wide deworming program did not improve
school performance
– http://www.buzzfeed.com/bengoldacre/deworming-trials
• Economics
– A study widely cited to justify budget cutting in the US had a mistake in the
calculations which was only revealed when the Excel file was released
– http://www.bloomberg.com/bw/articles/2013-04-18/faq-reinhart-rogoff-and-the-excel-error-that-changed-history
• Physics
– It took 12.5 years to withdraw Jan Hendrik Schön’s work on ‘organic
semiconductors’ because the reviewers were unable to replicate the results
without access to the original data or lab books
– http://www.science20.com/science_20/jan_hendrik_sch%C3%B6n_world_class_physics_fraud_gets_last_laugh_whole_book_about_himself
Questions?
Dr Danny Kingsley
Head of Scholarly Communication
University of Cambridge
Email: dak45@cam.ac.uk
Blog: https://unlockingresearch.blog.lib.cam.ac.uk/
Website: http://osc.cam.ac.uk
Twitter: @dannykay68
