RESEARCH DATA SHARING:
A BASIC FRAMEWORK
Paul Groth @pgroth
pgroth.com
Elsevier Labs @elsevierlabs
LERU Summer School 2016
Data Stewardship for Scientific Discovery and Innovation
WHAT IS DATA?
“Data refers to entities used as evidence of phenomena for the purposes of research or scholarship”
[Borgman, Big Data, Little Data, No Data, 2015, p. 29]
WHY COLLECT DATA?
Borgman, C. L. (2012). The conundrum of sharing
research data. Journal of the American Society for
Information Science and Technology.
HOW IS DATA OBTAINED?
Borgman, C. L. (2012). The conundrum of sharing
research data. Journal of the American Society for
Information Science and Technology.
WHY SHARE DATA?
• R1: reproduce or verify research
• R2: make results of publicly funded research available to the public
• R3: enable others to ask new questions of extant data
• R4: advance the state of research and innovation
Borgman, C. L. (2012). The conundrum of sharing research data. Journal of the American Society for Information Science and Technology.
• All empirical papers must archive their data upon acceptance in order to be published unless the authors provide
a compelling reason why they cannot (e.g., expense, confidentiality). The action editor will be the final arbiter of whether the reason is
sufficiently compelling.
• “Data” refers to an electronic file containing nonidentified responses that are potentially already coded. Normally, the data would
represent an early stage of electronic processing, before individual responses have been aggregated. The data must be in
a form that allows all reported statistical analyses to be reproduced
while retaining the confidentiality of individual participants. This entails that the data are formatted and documented in a way that makes
the structure of the data set readily apparent.
• Archiving consists either of submitting the data to the journal (to be displayed as supplementary material at the end of the article),
sending it to some other archive that is accessible to established researchers and maintained by a substantial established institution, or
authors making the data available on their own website, assuming that they can assure us the site will be maintained by a recognized
institution for a reasonable period of time. Again, action editors will be the final arbiters of the appropriateness of an archive.
• Any publication that reports analyses of or refers to archived data will be expected to cite the original
publication in which the data were reported.
• This policy is new and therefore open to modification. Our aim is to implement a policy that maximizes transparency while minimizing the
burden on authors.
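The policy above asks for nonidentified, item-level data whose structure is readily apparent. A minimal sketch of what that could look like in practice follows. The column names, sample rows, and codebook wording are hypothetical, and replacing names with labels is only the simplest form of de-identification; real data sets usually need a more careful review.

```python
import csv
import io

# Hypothetical raw responses; names and columns are illustrative only.
RAW = [
    {"name": "Ann Example", "condition": "A", "rt_ms": 512},
    {"name": "Bob Example", "condition": "B", "rt_ms": 634},
]

# Codebook: one description per shared column, so the structure of the
# data set is documented alongside the data itself.
CODEBOOK = {
    "participant": "Arbitrary participant label (P001, P002, ...)",
    "condition": "Experimental condition (A or B)",
    "rt_ms": "Response time in milliseconds",
}


def deidentify(rows):
    """Drop the name column and substitute sequential participant labels."""
    return [
        {
            "participant": f"P{i:03d}",
            "condition": row["condition"],
            "rt_ms": row["rt_ms"],
        }
        for i, row in enumerate(rows, start=1)
    ]


def to_csv(rows, columns):
    """Serialize rows to CSV text with a header row in the given column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


shared_rows = deidentify(RAW)
shared_csv = to_csv(shared_rows, list(CODEBOOK))
```

Keeping the codebook keys and the CSV header in the same order means the documentation and the file cannot silently drift apart.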
THE IMPORTANCE OF CITING DATA
Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014
[https://www.force11.org/group/joint-declaration-data-citation-principles-final].
1. Importance
2. Credit and Attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and Verifiability
8. Interoperability and Flexibility
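Several of these principles (unique identification, persistence, specificity) come down to a citation that carries a persistent identifier and a version. The sketch below shows one way to assemble such a citation. The principles do not prescribe a citation format, so the field names and layout here are assumptions, and the DOI in the example is a placeholder rather than a real record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    """Illustrative container for the elements of a data citation."""

    authors: str      # credit and attribution
    year: int
    title: str
    repository: str   # where the data can be accessed
    version: str      # specificity: which version was used
    doi: str          # unique, persistent identifier

    def format(self) -> str:
        """Render the citation as a single line of text."""
        return (
            f"{self.authors} ({self.year}). {self.title} [Data set], "
            f"version {self.version}. {self.repository}. "
            f"https://doi.org/{self.doi}"
        )


# Placeholder values only; this DOI does not point at a real data set.
example = DatasetCitation(
    authors="Doe, J.",
    year=2016,
    title="Example measurements",
    repository="Example Repository",
    version="2",
    doi="10.1234/example.2",
)
citation_text = example.format()
```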
10 ASPECTS OF HIGHLY EFFECTIVE RESEARCH DATA
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
https://storify.com/chenghlee/dataformathell
http://isps.yale.edu/sites/default/files/files/IDCC14_DQR_PeerGreenStephenson.pdf
NOT ALL DATA IS SUCCESSFUL
BARRIERS TO REACHING SUCCESSFUL
DATA?
Common practice: data is very fragmented
Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook.
The PI then tries to make sense of their slides, and writes a paper.
End of story.
NOT ALL DATA IS CURATED
Cost of documentation
http://www.indoition.com/en/services/costs-prices-software-documentation.htm
Yolanda Gil, USC Information Sciences Institute, gil@isi.edu
Measuring Time Savings with “Reproducibility Maps” [Garijo et al PLOS CB12]
2 months of effort in reproducing published method (in PLoS’10)
Authors’ expertise was required
Comparison of ligand binding sites
Comparison of dissimilar protein structures
Graph network generation
Molecular Docking
Work with D. Garijo of UPM and P. Bourne of UCSD
CURRENT STRATEGIES FOR DATA SHARING
SUBJECT-SPECIFIC REPOSITORIES
COMMUNITY-SPECIFIC REPOSITORIES
GENERIC REPOSITORIES
http://data.mendeley.com/
Each dataset receives a versioned DOI, so it can be cited
The citation for the associated article is displayed
DATA PUBLICATION
BENEFITS OF MACHINE READABILITY
HOW DO WE MOVE UP THE PYRAMID?
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
60% OF TIME IS SPENT ON DATA PREPARATION
CURATED DATA SETS
http://ivory.idyll.org/blog/replication-i.html
MORE SEMANTICS
A FRAMEWORK FOR HELPING
RESEARCHERS SHARE DATA
• What data?
• Determine the context
• Why is data being collected?
• How is data obtained?
• What is the researchers’ reason for sharing?
• Document
• Understand Cost/benefit tradeoffs
• Target audience
• Automation
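The framework above can be treated as a checklist to walk through with a researcher. The sketch below is one hypothetical way to track which questions still need an answer. The last three questions are paraphrases of the "Target audience", "Understand Cost/benefit tradeoffs", and "Automation" bullets, and the function name is an assumption, not part of the talk.

```python
# Questions drawn from the framework slide; the last three are paraphrased
# from its shorter bullet points.
FRAMEWORK = [
    "What data?",
    "Why is data being collected?",
    "How is data obtained?",
    "What is the researchers' reason for sharing?",
    "Who is the target audience?",
    "What are the cost/benefit tradeoffs?",
    "What can be automated?",
]


def open_questions(answers):
    """Return the framework questions that have no non-empty answer yet."""
    return [q for q in FRAMEWORK if not answers.get(q, "").strip()]


# Hypothetical partial set of answers from a first conversation.
answers = {
    "What data?": "Microscopy images and the lab notebook entries for them",
    "Why is data being collected?": "To test a hypothesis for a paper",
    "How is data obtained?": "   ",  # blank answers count as unanswered
}
remaining = open_questions(answers)
```

A blank or missing answer keeps the question on the list, which makes the remaining documentation work visible early.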
FURTHER READING
• Christine L. Borgman. Syllabus for Data Management and Practice, Part I, Winter 2016.
https://works.bepress.com/borgman/381/
• Christine L. Borgman. “Big Data, Little Data, No Data”
• Reference list
www.zotero.org/groups/borgman_big_data_little_data_no_data
• Borgman, C. L. (2012). The conundrum of sharing research data. Journal of
the American Society for Information Science and Technology.
• Goodman A, Pepe A, Blocker AW, Borgman CL, Cranmer K, et al. (2014)
Ten Simple Rules for the Care and Feeding of Scientific Data. PLoS Comput
Biol 10(4): e1003542. doi: 10.1371/journal.pcbi.1003542







Editor's Notes

  • #19: http://www.tamr.com/piketty-revisited-improving-economics-data-science/
  • #30: NASA, A.40 Computational Modeling Algorithms and Cyberinfrastructure, tech. report, NASA, 19 Dec. 2011