Open Data 
Open Notebook Science 
Peter Murray-Rust, 
Open Science, Rio, BR, 2014-08-22
Lancet 2011 article: 31 USD for 1 day of access (retrieved 2014-08-08)
PMR: Closed Access Means People Die
Overview 
• Most scientific data is lost; costs many billions… 
• … AND LIVES. 
• Human problem; lack of vision + active opposition.
• Born-open data and Open Notebook Science 
• Jean-Claude Bradley 
• Panton Principles and Fellows (OKFN) 
• Digital Enlightenment or Digital Darkness?
Reasons for Open Data/Science 
• Moral: Closed can be unjust 
• Ethical: Community norms expect it 
• Utilitarian: Greater communal good
• Personal: Greater personal benefit
RCUK, Wellcome, ERC, NSF, FWF… require fully OPEN
[Neelie Kroes, EU Vice-President, at the Research Data Alliance:] we are entering a new “era of open science”, which will be “good for citizens, good for scientists and good for society”.
She explicitly highlighted the transformative potential of open access, open data, open software and open educational resources, mentioning the EU’s policy requiring open access to all publications and data resulting from EU-funded research.
http://blog.okfn.org/2013/03/21/we-are-entering-an-era-of-open-science-says-eu-vp-neelie-kroes/
Scientific and Medical publication (STM)[+]
• World citizens pay $400,000,000,000…
• … for research in 1,500,000 articles …
• … costing $300,000 each to create …
• … and $7,000 each to “publish” [*] …
• … $10,000,000,000 from academic libraries …
• … to “publishers” who forbid access to 99.9% of the world's citizens …
[+] Figures probably ±50%
[*] The arXiv preprint server costs $7 USD per paper
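The STM figures above hang together arithmetically. A minimal back-of-envelope sketch that simply multiplies out the slide's own estimates (no outside data; the slide itself flags them as roughly ±50%):

```python
# Back-of-envelope check of the STM figures on this slide.
# All inputs are the slide's own estimates (quoted there as +/- 50%).
ARTICLES = 1_500_000            # research articles
COST_TO_CREATE_USD = 300_000    # research cost per article
COST_TO_PUBLISH_USD = 7_000     # "publishing" cost per article
ARXIV_COST_USD = 7              # arXiv preprint cost per paper

research_total = ARTICLES * COST_TO_CREATE_USD
publishing_total = ARTICLES * COST_TO_PUBLISH_USD
arxiv_total = ARTICLES * ARXIV_COST_USD
ratio = COST_TO_PUBLISH_USD // ARXIV_COST_USD

print(f"research:           ${research_total:,}")
print(f"publishing:         ${publishing_total:,}")
print(f"same papers, arXiv: ${arxiv_total:,}")
print(f"publishing costs {ratio}x the arXiv cost per paper")
```

Multiplying out gives about $450 billion for research (consistent with the $400 billion headline within the stated margin), about $10.5 billion for “publishing” (matching the $10 billion paid by academic libraries), and about $10.5 million if the same papers cost what arXiv does: a factor of 1000.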
US taxpayers spend 139 billion USD/yr
on scientific research
4 billion USD on the human genome
yielded 800 billion USD and 4 M job-years
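Taken at face value, the human genome figures above imply a 200-fold return and roughly one job-year per 1000 USD invested. This only divides the slide's numbers; it says nothing about how the 800 billion USD estimate was derived:

```latex
\frac{800 \times 10^{9}\ \text{USD}}{4 \times 10^{9}\ \text{USD}} = 200,
\qquad
\frac{4 \times 10^{9}\ \text{USD}}{4 \times 10^{6}\ \text{job-years}} = 1000\ \text{USD per job-year}
```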
Bad publication wastes science 
…three problems—flawed design, non-publication, 
and poor reporting—together 
meant >85% of research funds were wasted, a 
global total loss >100 billion USD per year. [Lancet
2009, http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2809%2960329-9/fulltext]
[Even more] waste clearly occurs after 
publication: from poor access, poor 
dissemination, and poor uptake of the findings 
of research. 
[PLOS Medicine 2014-05-27 DOI: 10.1371/journal.pmed.1001651]
Authors don’t deposit data (Ross Mounce)
C) What’s the problem with this spectrum? 
Original thanks to ChemBark 
Org. Lett., 2011, 13 (15), pp 4084–4087
After AMI2 processing…
… AMI2 has detected a square
Open data and Open Science
PM-R writes about how Open gave him 5 jobs
August 2014
Marcus Hanwell
http://opensource.com/tags/open-science
Ross Mounce
Traditional Research and Publication
[Diagram: “Lab” work; write, rewrite; paper/thesis; re-experiment; publish (process “belongs” to publisher); ??? Validation??; DATA (output “belongs” to publisher); walls of academia]
Free/Open Software Development
[Diagram: CODE REPOSITORY; world community; CODE: validate, rewrite, fork, re-use; GitHub, BitBucket, StackOverflow, Apache; inspires; OSI; NO WALLS]
BORN-OPEN-SOURCE
Example: ContentMine at http://github.com/ContentMine/quickscrape
Born-open-source: commits in 4 hours
Continuous integration in the PMR group: does the code still work?
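The question “does the code still work?” is what continuous integration answers automatically on every commit. A minimal sketch of the kind of regression test a CI server would rerun, using Python's standard `unittest`; `find_dois` and its DOI pattern are hypothetical illustrations, not code from the PMR group:

```python
import re
import unittest

# Hypothetical project function (not PMR group code): extract DOIs from text.
# The pattern follows the common Crossref-style DOI shape:
# "10." + 4-9 digit registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text: str) -> list[str]:
    """Return every DOI-like string in text, trailing punctuation removed."""
    return [m.rstrip(".;,") for m in DOI_PATTERN.findall(text)]


class FindDoisRegressionTest(unittest.TestCase):
    """Checks a CI server would rerun on every commit."""

    def test_finds_doi_in_citation(self):
        text = "[PLOS Medicine 2014-05-27 DOI: 10.1371/journal.pmed.1001651]"
        self.assertEqual(find_dois(text), ["10.1371/journal.pmed.1001651"])

    def test_strips_trailing_full_stop(self):
        text = "See 10.1371/journal.pmed.1001651."
        self.assertEqual(find_dois(text), ["10.1371/journal.pmed.1001651"])

    def test_no_doi_gives_empty_list(self):
        self.assertEqual(find_dois("Open Notebook Science"), [])


if __name__ == "__main__":
    # exit=False lets a larger CI step continue after the test run.
    unittest.main(argv=["find_dois_ci"], exit=False)
```

A CI service would run this file on each push and flag the commit if any assertion fails, so breakage is caught within hours rather than at “publication” time.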
Open data
Restrictions on Re-use of Crystallographic data 
NOTE: The CCDC is based on data contributed by 
scientists as part of publication and validation
Elsevier wants to control Open Data
Vice-Chancellor, Cambridge
[asked by Michelle Brook]
Licences destroy Content Mining 
WE WALKED OUT 
• British Library
• JISC 
• RLUK 
• OKFN 
• … 
• Ross Mounce 
• PM-R 
STM Publishers Licence 
2012_03_15_Sample_Licence_Text_Data_Mining.pdf 
(Summary: PMR has NO rights) 
• [cannot publish to:] “libraries, repositories, or archives”
• [cannot] “Make the results of any TDM Output available on an externally facing server or website”
• “Subscriber shall pay a […] fee” 
Heather Piwowar: “negotiating with publishers [made me physically ill]”
Human Genome Project
https://en.wikipedia.org/wiki/Bermuda_Principles
• Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours).
• Immediate publication of finished annotated sequences.
• Aim to make the entire sequence freely available in the public domain for both research and development in order to maximise benefits to society.
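The first Bermuda rule above is mechanical enough to automate. A minimal sketch, assuming each assembly is recorded with a length in base pairs and a UTC timestamp; the `Assembly` record, `release_queue` function and contig names are hypothetical, and only the 1 kb threshold and 24-hour window come from the slide:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Thresholds taken from the Bermuda Principles as summarised on the slide.
MIN_LENGTH_BP = 1_000                   # release assemblies "larger than 1 kb"
RELEASE_WINDOW = timedelta(hours=24)    # "preferably within 24 hours"


@dataclass
class Assembly:
    """Hypothetical record for one sequence assembly."""
    name: str
    length_bp: int
    assembled_at: datetime


def release_queue(assemblies, now):
    """List (name, deadline, overdue) for every assembly that must be released."""
    queue = []
    for a in assemblies:
        if a.length_bp > MIN_LENGTH_BP:
            deadline = a.assembled_at + RELEASE_WINDOW
            queue.append((a.name, deadline, now > deadline))
    return queue


t0 = datetime(2014, 8, 22, 9, 0, tzinfo=timezone.utc)
batch = [
    Assembly("contig-001", 850, t0),                           # too short
    Assembly("contig-002", 4_200, t0),                         # just assembled
    Assembly("contig-003", 12_000, t0 - timedelta(hours=30)),  # 30 h old
]
for name, deadline, overdue in release_queue(batch, now=t0):
    print(name, deadline.isoformat(), "OVERDUE" if overdue else "on time")
```

Here contig-001 is skipped (under 1 kb), contig-002 is due by 2014-08-23T09:00:00+00:00, and contig-003 is already overdue because it was assembled 30 hours before `now`.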
Panton Principles for Open Data in Science (2010)
• PUBLISH YOUR DATA OPENLY
• …make an explicit and robust statement of your wishes.
• Use a recognized waiver or license that is appropriate for data.
• Open as defined by the Open Knowledge/Data Definition (… NOT non-commercial)
• Explicit dedication of data … into the public domain via PDDL or CCZero
Peter Murray-Rust, Cameron Neylon, Rufus Pollock, John Wilbanks
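The Panton Principles above can be applied as a simple metadata check before data are deposited. A minimal sketch, assuming each dataset records its licence as an SPDX identifier (or nothing at all); `panton_check` is a hypothetical helper, and the licence sets are illustrative examples rather than the Open Definition's full list of conformant licences:

```python
from typing import Optional

# Illustrative SPDX identifiers; not an exhaustive list of open licences.
PUBLIC_DOMAIN = {"CC0-1.0", "PDDL-1.0"}
OPEN_WITH_CONDITIONS = {"CC-BY-4.0", "CC-BY-SA-4.0", "ODC-By-1.0", "ODbL-1.0"}


def panton_check(license_id: Optional[str]) -> str:
    """Classify a dataset licence against the Panton Principles."""
    if license_id is None:
        return "fail: no explicit licence or waiver"
    if "-NC" in license_id:
        return "fail: non-commercial restrictions are not open"
    if license_id in PUBLIC_DOMAIN:
        return "pass: explicit public-domain dedication"
    if license_id in OPEN_WITH_CONDITIONS:
        return "pass: open, though PDDL or CC0 is recommended"
    return "review: not a recognised open data licence"


for lic in ["CC0-1.0", "CC-BY-4.0", "CC-BY-NC-4.0", None]:
    print(lic, "->", panton_check(lic))
```

The order of the checks mirrors the principles: first insist on an explicit statement, then reject non-commercial terms, then prefer a public-domain dedication.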
Panton Authors and Fellows
Open data and Open Science
Open Notebook Science
Open notebook science is the practice of 
making the entire primary record of a research 
project publicly available online as it is 
recorded. (WP) 
Jean-Claude Bradley was a chemist who 
actively promoted Open Science in 
chemistry,… He coined the term Open 
Notebook Science. … A memorial 
symposium was held July 14, 2014 at 
Cambridge University, UK.[9]
Open data and Open Science
Open Source software inspires Open Science 
Jean-Claude Bradley 2006
Open Notebook Science, ONS 
Jean-Claude Bradley 2006
Volunteer community in chemistry: Open Data/Source/Standards
Award of Blue Obelisk 
Jean-Claude Bradley, Egon Willighagen
Realising OpenNotebookScience 
When a distinguished but elderly scientist states that something is 
possible, he is almost certainly right. When he states that something is 
impossible, he is very probably wrong. 
http://en.wikipedia.org/wiki/Clarke's_three_laws
Open Inspirations (some are zero budget) 
• Open Street Map 
• Journal of Machine Learning Research
• Blue Obelisk
• arXiv
• Protein Data Bank 
• Galaxy Zoo
Self-benefit drives Open 
• I put my data/papers in a repository because I HAVE TO
• I commit my code to GitHub because I WANT TO:
– It’s safe 
– It’s validated 
– I know it works 
– There are tools to search it 
– Other coders improve and add to it
http://en.wikipedia.org/wiki/Reinventing_Discovery
http://michaelnielsen.org/blog/reinventing-discovery/
The Polymath project 
Tim Gowers and the world 
http://polymathprojects.org/2013/11/04/polymath9-pnp/#comments
http://gowers.wordpress.com/2013/11/03/dbd1-initial-post/
Open Notebook Science
[Diagram: TOOLS; open, engineered repository; INSTRUMENT; world community; validate; merge; MODEL; CODE; DATA; knowledge; calibrate; machines and humans working together]
Problems are solved communally; nothing is needlessly duplicated; “publication” is continuous; data are SEMANTIC
Sophie Kershaw, Panton Fellow
Benefits of OpenNotebookScience 
• Fraud is virtually impossible 
• Priority and credit are algorithmically established 
• It is difficult to be scooped… 
• Data and ideas cannot be lost 
• The world discovers you and you the world 
• Time to announcement is much advanced (?years)
• The “publication process” is vastly less onerous 
• … but others may use your work in other ways
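Claims such as “priority and credit are algorithmically established” and “fraud is virtually impossible” rest on every entry being timestamped and visible as it is recorded. One common way to make such a record tamper-evident is a hash chain; the sketch below swaps in that technique as an illustration, using Python's `hashlib`, with a notebook held as a list of dicts. The structure, field names and sample entries are hypothetical, not those of any particular ONS platform:

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of an entry's fields."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def add_entry(notebook: list, text: str) -> str:
    """Append a timestamped entry chained to the previous entry's hash."""
    prev = notebook[-1]["hash"] if notebook else GENESIS
    body = {"time": datetime.now(timezone.utc).isoformat(), "text": text, "prev": prev}
    entry = dict(body, hash=_digest(body))
    notebook.append(entry)
    return entry["hash"]


def verify(notebook: list) -> bool:
    """Recompute the chain; editing any earlier entry breaks it."""
    prev = GENESIS
    for entry in notebook:
        body = {k: entry[k] for k in ("time", "text", "prev")}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


nb = []
add_entry(nb, "Dissolved 0.50 g of sample in 10 mL ethanol")
add_entry(nb, "Sample fully dissolved at 25 C")
print(verify(nb))  # True
nb[0]["text"] = "Dissolved 0.60 g of sample in 10 mL ethanol"  # after-the-fact edit
print(verify(nb))  # False
```

The chain only proves that entries were not silently altered; establishing *when* they were made also requires publishing each hash openly as it is created, which is what recording in the open provides.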
http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. … 
… completely free and unrestricted access to [peer-reviewed 
literature] by all scientists, scholars, teachers, 
students, and other curious minds. … 
…Removing access barriers to this literature will 
accelerate research, enrich education, share the 
learning of the rich with the poor and the poor with 
the rich, make this literature as useful as it can be, and 
lay the foundation for uniting humanity in a common 
intellectual conversation and quest for knowledge. 
(Budapest Open Access Initiative, 2002)
Open Notebook Science
[Diagram: TOOLS; ONS repository; world community; INSTRUMENT; validate; merge; MODEL; CODE; DATA; knowledge; calibrate; machines and humans working together; CC-BY]
Problems are solved communally; nothing is needlessly duplicated; “publication” is continuous and immediate
Traditional Research and Publication
[Diagram: “Lab” work; write, rewrite; paper/thesis; re-experiment; publish; ??? Validation??; DATA (output “belongs” to publisher)]
Is there anything we can do with this?
Open Notebook Science
[Diagram: TOOLS; ONS repository; world community; INSTRUMENT; validate; merge; MODEL; CODE; DATA; knowledge; calibrate; machines and humans working together; CC-BY/0]
Problems are solved communally; nothing is needlessly duplicated; “publication” is continuous and immediate
