Andrew Lang
Professor of Mathematics
Oral Roberts University
February 17, 2014
OSU Research Week
Open Notebooks Science
-Cameron Neylon
Eight committees investigated the allegations and
published reports, finding no evidence of fraud or scientific
misconduct.
However, the reports* called on the scientists to avoid any
such allegations in the future by taking steps to regain
public confidence in their work, for example by opening up
access to their supporting data, processing methods and
software, and by promptly honouring freedom of
information requests.
* Archana Venkatraman, "Data Without the Doubts". Information World Review
Andrew Wakefield’s study linked the measles, mumps and rubella vaccine to autism.
Vaccination rates in the developed world plummeted after the study’s publication,
and a heated anti-vaccination movement persists today.
http://www.cfr.org/interactives/GH_Vaccine_Map/#map
Science has lost its way, at a big cost to humanity
Researchers are rewarded for splashy findings, not for double-checking
accuracy. So many scientists looking for cures to diseases have been
building on ideas that aren't even true.
A few years ago, scientists at the Thousand Oaks biotech
firm Amgen set out to double-check the results of 53 landmark papers in
their fields of cancer research and blood biology.
The idea was to make sure that research on which Amgen was spending
millions of development dollars still held up. They figured that a few of
the studies would fail the test — that the original results couldn't be
reproduced because the findings were especially novel or described
fresh therapeutic approaches.
But what they found was startling: Of the 53 landmark papers, only six
could be proved valid.
http://www.latimes.com/business/la-fi-hiltzik-20131027,0,1228881.column#axzz2ix1w9zGf
A special challenge for science writers covering research
today arises from science’s growing credibility problem. It
stems from the cumulative effect of errors and exaggerations
that has fueled a recent rise in retractions, misconduct, and
fraud among peer-reviewed researchers.
For reporters covering major scientific developments – from
the search for alien life and genomics, to particle physics,
climate change and cancer — it can be difficult to distinguish
error from fraud, sloppiness from deception, eagerness from
greed or, increasingly, scientific conviction from partisan
passion. Findings in fields from climate change to vaccines
can also be deceptively cherry-picked in service of a political
cause.
trust: evidence, documentation, confidence, reproducibility
Anything produced is released under a CC0 license:
Open Data, Open Access, Open Source.
Faster Science
failed experiments
discoverable
unexpected collaborations
real-time data and results
no insider information
reusability
reproducibility
transparency
Open Drug Discovery for Neglected Diseases
malaria
schistosomiasis
Gram-positive bacteria
breast cancer
Drugs for neglected diseases need to be cheap and easy to make.
The big picture
docking
combinatorial library
synthesis
solvent selection
recrystallization
biological assay
solubility models
solubility data
melting point models
melting point data
Let’s focus
Early models, before 2005, were… specialized
1979 Martin – disubstituted benzenes
1987 Hanson – normal alkanes
1988 Needham – normal and branched alkanes
1990 Abramowitz – non-hydrogen bonded benzenes
1991 Dearden – anilines
1993 Katritzky – aldehydes, amines, and ketones
1994 Simamora – rigid aromatic
1996 Charlton – alkanes
1996 Katritzky – pyridines
1999 Zhao – aliphatic
2001 Chickos – homologous series
2003 Bergstrom – druglike (N = 277, r² = 0.54)
In 2005… everything changed
MDPI (cheminformatics.org): Karthikeyan 2005, N = 4173, r² = 0.65
PHYSPROP: Clark 2005, N = 6257, r² = 0.61
Recent melting point models use these datasets… but never reproduce r² = 0.65 (r² = 0.47–0.56)
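The r² values above measure how much of the variance in measured melting points a model explains. As a minimal sketch, assuming r² here means the coefficient of determination, 1 - SS_res/SS_tot (some QSPR papers instead report the squared Pearson correlation, which can differ when predictions are biased), it can be computed as follows. The melting points in the example are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot, for paired values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical; r² is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical melting points (°C), not taken from any dataset in this talk.
measured = [10.0, 55.0, 120.0, 180.0]
model = [20.0, 50.0, 110.0, 190.0]
print(round(r_squared(measured, model), 3))
```

A model that predicts the dataset mean for every compound scores 0, and a worse one goes negative, so r² values are most meaningful when compared on the same test set.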
Even though [a] melting point can be measured accurately, its prediction has been a notoriously difficult problem.
We began measuring, collecting, and curating melting points in the Fall of 2010.
Jean-Claude Bradley’s Chemical Information Retrieval Course at Drexel
567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course
Most popular data sources… chemical vendors
Alfa Aesar donates ~13,000 melting points to the public domain
ONS melting point workflow: collection, curation, modeling, validation, measurement
Collection: Open Data
source              data points   curated values   source year   data type
Bell                       2483             1631          1995   donated-CC0
Bergstrom                   277              277          2003   open
MDPI-Karthikeyan           4450             4084          2005   open
Hughes                      287              262          2008   open
Oxford-MSDS                3217             1481          2010   open
Drugbank                    875              875          2011   open
Griffiths                  3757              278          2011   donated-CC0
Alfa Aesar                12986             8739          2011   donated-CC0
PHYSPROP                  11645             9694          2011   donated-CC0
ONS                         471              471          2012   open
27792 curated measurements for 19515 compounds
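The curated column of the Collection: Open Data table sums to the 27792 curated measurements stated above. A quick check in Python, with the rows transcribed by hand from that table; the per-source retention ratios it prints are simple arithmetic, not figures reported in the talk.

```python
# Rows transcribed from the Collection: Open Data table:
# (source, data points, curated values, source year, data type)
SOURCES = [
    ("Bell", 2483, 1631, 1995, "donated-CC0"),
    ("Bergstrom", 277, 277, 2003, "open"),
    ("MDPI-Karthikeyan", 4450, 4084, 2005, "open"),
    ("Hughes", 287, 262, 2008, "open"),
    ("Oxford-MSDS", 3217, 1481, 2010, "open"),
    ("Drugbank", 875, 875, 2011, "open"),
    ("Griffiths", 3757, 278, 2011, "donated-CC0"),
    ("Alfa Aesar", 12986, 8739, 2011, "donated-CC0"),
    ("PHYSPROP", 11645, 9694, 2011, "donated-CC0"),
    ("ONS", 471, 471, 2012, "open"),
]

total_points = sum(points for _, points, _, _, _ in SOURCES)
total_curated = sum(curated for _, _, curated, _, _ in SOURCES)

for name, points, curated, _, _ in SOURCES:
    # Fraction of each source's records that survived curation.
    print(f"{name:18s} {curated / points:6.1%}")

print(f"total: {total_points} data points -> {total_curated} curated values")
```

The compound count (19515) cannot be checked this way, since the table lists measurements rather than unique structures.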
Curation is… lots of hard, tedious work (Jean-Claude Bradley and Antony Williams)
Antony Williams – RSC ChemSpider
Inconsistencies and SMILES problems within the “high trust level” MDPI dataset
PHYSPROP structure errors (incorrect valence): 2315 out of 43543 contained pentavalent nitrogens
PHYSPROP errors: the structure displayed is for the neutral compound dopamine, but the associated CAS Number and chemical name in the file are for the hydrobromide salt.
unit errors: Kelvin/Celsius, Fahrenheit/Celsius
bad SMILES (non-rendering, hypervalency)
salts associated with SMILES for free base
using boiling point for melting point
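Two of the error types above lend themselves to simple automated flags. The sketch below is illustrative only: the 2 °C tolerance, the idea of comparing each value against a trusted reference, and the short list of salt words are assumptions, not the curation procedure used for this dataset. A value sitting about 273.15 above the reference looks like Kelvin recorded as Celsius, one matching the Fahrenheit conversion looks like Fahrenheit, and a name that says "hydrobromide" while the SMILES has no '.' (no separate counterion component) looks like a salt paired with a free-base structure, as in the dopamine example.

```python
KELVIN_OFFSET = 273.15
# Hypothetical, deliberately short list of salt suffixes to look for in names.
SALT_WORDS = ("hydrochloride", "hydrobromide", "hydroiodide")


def celsius_to_fahrenheit(c):
    return c * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(f):
    return (f - 32.0) * 5.0 / 9.0


def suspected_unit_error(value_c, reference_c, tol=2.0):
    """Label value_c if it looks like reference_c written in the wrong unit.

    value_c: melting point as recorded (claimed to be °C).
    reference_c: a trusted melting point for the same compound, in °C.
    tol: hypothetical matching tolerance in degrees.
    """
    if abs(value_c - reference_c) <= tol:
        return None  # agrees with the reference; nothing to flag
    if abs(value_c - (reference_c + KELVIN_OFFSET)) <= tol:
        return "kelvin recorded as celsius"
    if abs(value_c - celsius_to_fahrenheit(reference_c)) <= tol:
        return "fahrenheit recorded as celsius"
    return None


def salt_name_with_free_base_smiles(name, smiles):
    """True if the name describes a salt but the SMILES is a single component.

    In SMILES, '.' separates disconnected components such as a counterion.
    """
    named_as_salt = any(word in name.lower() for word in SALT_WORDS)
    return named_as_salt and "." not in smiles


print(suspected_unit_error(395.15, 122.0))  # -> kelvin recorded as celsius
print(salt_name_with_free_base_smiles("dopamine hydrobromide", "NCCc1ccc(O)c(O)c1"))  # -> True
```

Flags like these only queue records for a human to inspect; they cannot decide which of two conflicting values is right.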
Some melting points can’t be resolved with the literature alone: 4-benzyltoluene
Open lab notebook page: measuring the melting point of 4-benzyltoluene
[Figure: Melting Point Model built from melting point data (doubleplusgood and single) with the CDK descriptor calculator and R statistical computing; use this model for compounds]
Straight-chain carboxylic acids from 1 to 10 carbons
Straight-chain alcohols from 1 to 10 carbons
Comparison of model with double+ validated measurements
Cyclic primary amines from 3 to 6 carbons
cyclobutylamine flagged for measurement: only a single source available
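Flagging cyclobutylamine because only a single source exists suggests a simple triage rule. A minimal sketch, assuming "doubleplusgood" means at least two independent sources whose values agree within a tolerance; the 5 °C tolerance, the "conflicting" label, and the compound and source names are hypothetical, and the criteria used for the published double+ dataset may differ.

```python
from statistics import mean


def classify(measurements, tolerance=5.0):
    """Classify one compound's melting point record.

    measurements: list of (source, melting_point_celsius) pairs.
    Returns (label, consensus): label is 'doubleplusgood', 'single', or
    'conflicting' (a hypothetical label for sources that disagree).
    """
    if not measurements:
        raise ValueError("no measurements")
    sources = {source for source, _ in measurements}
    values = [mp for _, mp in measurements]
    if len(sources) < 2:
        return "single", mean(values)
    if max(values) - min(values) <= tolerance:
        return "doubleplusgood", mean(values)
    return "conflicting", None


def flag_for_measurement(dataset, tolerance=5.0):
    """Return (compound, label) for every compound that is not doubleplusgood."""
    flagged = []
    for compound, measurements in dataset.items():
        label, _ = classify(measurements, tolerance)
        if label != "doubleplusgood":
            flagged.append((compound, label))
    return flagged


# Hypothetical records: made-up sources and values for illustration only.
dataset = {
    "compound-A": [("source-1", 100.0), ("source-2", 102.0)],
    "compound-B": [("source-1", -40.0)],
    "compound-C": [("source-1", 50.0), ("source-3", 80.0)],
}
print(flag_for_measurement(dataset))  # -> [('compound-B', 'single'), ('compound-C', 'conflicting')]
```

Counting distinct sources rather than raw values keeps a vendor that repeats its own number from promoting a compound to doubleplusgood.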
Publication of double+ validated melting point dataset… as a preprint
Publication of double+ validated melting point dataset… as a book
Data and model deployed… on the web (web service), in Google spreadsheets, and as an app
• Can the solvents used to recrystallize compounds in organic teaching labs be improved?
• Trans-dibenzalacetone
• Aldol condensation between two molecules of benzaldehyde and one molecule of acetone
[Matthew McBride: Undergraduate Research Assistant - Drexel]
• First recrystallized in ethyl acetate in 1906: Straus and Ecker, Ber. 39, 2988 (1906)
• Recrystallized in ethyl acetate in Organic Syntheses
• Recommended recrystallization solvent: ethyl acetate
(http://classes.kvcc.edu/chm230/mixed%20aldol%20condensation.pdf)
(http://www.xula.edu/chemistry/documents/orgleclab/Aldol_notes.pdf)
Enter compound identification and desired parameters
How does it work?
1. Look up the solvent boiling point
2. Look up the room temperature solubility or predict it via measured or
predicted Abraham descriptors
3. Look up the solute melting point or predict it via a model
4. Use the melting point and the solubility at room temperature to predict
the solubility at boiling
5. Calculate the predicted recrystallization yield
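Steps 4 and 5 can be made concrete with a textbook approximation. The sketch below is one possible implementation, not necessarily the app's: it assumes the solute follows the ideal-solubility temperature dependence (van 't Hoff form with Walden's rule, an entropy of fusion of about 56.5 J/(mol·K)), so the melting point sets how steeply solubility rises, and it treats the yield as the fraction of solute that crystallizes when a solution saturated at the solvent's boiling point cools to room temperature. The inputs in the usage line are hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol K)
# Walden's rule: a rough, assumed entropy of fusion for rigid organic solids.
FUSION_ENTROPY = 56.5  # J/(mol K)


def log10_ideal_solubility(t_c, mp_c):
    """log10 of the ideal mole-fraction solubility of a solid at t_c (°C).

    Uses log10 x = -dS_fus * (Tm - T) / (ln(10) * R * T) with T in kelvin.
    At or above the melting point the solute is liquid, so x is capped at 1.
    """
    t = t_c + 273.15
    tm = mp_c + 273.15
    if t >= tm:
        return 0.0
    return -FUSION_ENTROPY * (tm - t) / (math.log(10) * GAS_CONSTANT * t)


def predict_hot_solubility(s_room, mp_c, bp_c, room_c=25.0):
    """Step 4: scale room-temperature solubility to the solvent's boiling point.

    Assumes the ratio of actual to ideal solubility is constant with temperature
    and that molar concentration is proportional to mole fraction (dilute limit).
    """
    shift = log10_ideal_solubility(bp_c, mp_c) - log10_ideal_solubility(room_c, mp_c)
    return s_room * 10 ** shift


def recrystallization_yield(s_room, s_hot):
    """Step 5: fraction recovered when a hot saturated solution cools to room temperature."""
    if s_hot <= 0:
        raise ValueError("hot solubility must be positive")
    return max(0.0, 1.0 - s_room / s_hot)


# Hypothetical inputs: 0.1 M at 25 °C, solute mp 111 °C, solvent bp 78 °C.
s_hot = predict_hot_solubility(0.1, mp_c=111.0, bp_c=78.0)
print(f"hot solubility {s_hot:.2f} M, predicted yield {recrystallization_yield(0.1, s_hot):.0%}")
```

Under this approximation a higher-melting solute gets a steeper solubility curve between room temperature and the boiling point, which is one way the melting point can enter step 4.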
Lists solvents and their predicted recrystallization yields. The prediction is generated from the temperature-dependent solubility curves.
• ethyl acetate (predicted yield of 72%) vs. ethanol (predicted yield of 93%)
[Figure: temperature-dependent solubility curves for ethyl acetate and ethanol, with labeled concentrations 0.09 M, 1.1 M, 0.62 M, and 2.06 M]
Dibenzalacetone derivatives docking against tubulin (paclitaxel site)
• Derivatives of dibenzalacetone may be synthesized by altering the aldehyde used
• From a library of derivatives, the following compound was the top hit for the docking site of Taxol
• Uses phenanthrene-9-carboxaldehyde
• A Reaxys search was performed to determine whether synthesis procedures were available
• No results
[Matthew McBride: Undergraduate Research Assistant - Drexel]
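Because each derivative differs only in the aldehyde, a combinatorial library can be enumerated by string templating. A minimal sketch, assuming the symmetric trans,trans 2:1 aldol product and hand-written aryl SMILES fragments whose first atom is the one bonded to the former aldehyde carbon (the template and fragments are illustrative assumptions); a real workflow would validate each structure with a cheminformatics toolkit such as the CDK before docking.

```python
def dibenzalacetone_smiles(aryl):
    """SMILES for the symmetric 2:1 aldol product of ArCHO with acetone.

    Template: Ar-CH=CH-C(=O)-CH=CH-Ar, with '/' bonds marking both C=C as trans.
    aryl: SMILES of the aryl group, written so its first atom is the attachment
    point (an assumption of this hypothetical template).
    """
    return f"O=C(/C=C/{aryl})/C=C/{aryl}"


# Aryl fragments of the aldehydes; the phenanthrene entry corresponds to
# phenanthrene-9-carboxaldehyde, the aldehyde behind the top docking hit.
ALDEHYDE_ARYLS = {
    "benzaldehyde": "c1ccccc1",
    "phenanthrene-9-carboxaldehyde": "c1cc2ccccc2c2ccccc12",
}

library = {name: dibenzalacetone_smiles(aryl) for name, aryl in ALDEHYDE_ARYLS.items()}
for name, smiles in library.items():
    print(name, smiles)
```

Reusing ring-closure digits inside each copy of the fragment is legal SMILES because every ring is closed before the next one is opened.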
• Used methanol and benzene
• Melting point: 264–265 °C
(http://usefulchem.wikispaces.com/EXP286)
[Matthew McBride: Undergraduate Research Assistant - Drexel]
trust
reproducibility
open notebook science
Acknowledgements
Jean-Claude Bradley (Drexel)
Cameron Neylon (Advocacy Director at PLOS)
Antony Williams (RSC ChemSpider)
Drexel research assistants: Evan Curtin and Matthew
McBride
ORU research assistants: David Bulger, Daryl Charron,
Lizzie Clark, Lacey Condron, Samantha Gaines, Alejandro
Hernandez, Maria Hernandez, Jesse Patsolic, and
Matthew Wilson

Editor's Notes

  • #85: http://usefulchem.wikispaces.com/D-EXP022 From a library of derivatives, it was the top hit for the docking site of Taxol.