Integration of oreChem with the eCrystals repository for crystal structures
Mark Borkum, Simon Coles and Jeremy Frey
15 September 2010

Overview
  • Motivation
  • Implementation
  • Discussion and Summary

Current Practice in Crystallography
  • Crystallography data is highly structured
  • The de facto standard adopted by the community is the CIF (Crystallographic Information File)
  • Relatively few crystal structures are openly published
http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs

Open Access Journals
Advantages:
  • Rapid publication
  • Highly cited
  • Data is available to download
Disadvantages:
  • Electronic only
  • Not all data is of primary importance to the underlying chemistry
      • By-products, unexpected results, tracking reactions, etc.

Crystallography and Fraud

The eCrystals Federation
  • JISC project to establish a network of crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services
  • Led by the UK National Crystallography Service (NCS)
  • With core partners at UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics

eCrystals – University of Southampton
  • Located @ http://ecrystals.chem.soton.ac.uk
  • Archive for crystal structures that are generated by:
      • Southampton Chemical Crystallography Group
      • UK National Crystallography Service (NCS)
  • Modified version of EPrints 3.1
  • OAI-PMH compliant
  • Extensible platform (with plug-in architecture)

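Because the repository is OAI-PMH compliant, an aggregation service can harvest its metadata with plain HTTP GET requests. The sketch below only builds such a request URL. The `verb`, `metadataPrefix` and `from` arguments are standard OAI-PMH; the `/cgi/oai2` endpoint path is an assumption based on common EPrints installations and is not stated in the slides.

```python
from urllib.parse import urlencode

# Assumption: EPrints installations commonly expose OAI-PMH at /cgi/oai2;
# the slides do not give the actual endpoint path.
BASE_URL = "http://ecrystals.chem.soton.ac.uk/cgi/oai2"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2010-09-01"))
```

A harvester would fetch this URL and follow any resumption tokens in the XML response; that part is omitted here.
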
What is an eCrystal?
  • “all the fundamental and derived data resulting from a single crystal X-ray structure determination”
  • “the information supplied should enable any reader to check the reliability and validity”
http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif

The Scientific Web

The Data Deluge
In Haiku:
    Lots of producers;
    Generating more data
    than ever before.
  • 40 years ago, a PhD student would determine 3 structures over the entire course of their study!
The Great Wave off Kanagawa by Katsushika Hokusai

Provenance
  • The 7 W’s [Goble 2002]
      • Who, What, Where, Why, When, Which, & (W)How
  • The Why aspect is usually ignored
      • Rationale, intent, hypothesis, protocol, methodology, workflow, etc.
“Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four countries since it was painted for Philip II of Spain in the 1550s.”
Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29

“In theory, there is no difference between theory and practice. But, in practice, there is.”
Unknown (possibly Yogi Berra)

Why “Why” Matters
  • It is the reason for the data’s existence
  • It gives us the ability to interpret the data in the correct context
  • It allows us to align the data with the big picture
http://www.myexperiment.org/workflows/16.html

The oreChem Core Ontology
Describes three concepts:
  • The methodology (planned method) of a scientific experiment
  • The enactment of methodologies
  • The provenance of realised artefacts

Methodology (Planned Method)
  • The “plan” is modelled as a directed graph
  • Two node types:
      • Plan Stage: description of an activity that will be enacted
      • Plan Object: description of an artefact that will be realised

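As a rough illustration of the plan-as-directed-graph idea, here is a minimal sketch in plain Python. The class names, the stage names and the meaning given to each edge are assumptions made for this example; only HKL and CIF appear as plan objects elsewhere in this deck, and the oreChem ontology itself is expressed in RDF rather than Python classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlanNode:
    name: str
    kind: str  # "stage" (an activity) or "object" (an artefact)


@dataclass
class Plan:
    """A plan as a directed graph over plan stages and plan objects."""
    edges: dict = field(default_factory=dict)  # node -> list of successor nodes

    def add_edge(self, source, target):
        self.edges.setdefault(source, []).append(target)
        self.edges.setdefault(target, [])

    def successors(self, node):
        return list(self.edges[node])


# Hypothetical fragment of a crystallography plan (stage names are illustrative).
collect = PlanNode("Collect", "stage")
hkl = PlanNode("HKL", "object")
refine = PlanNode("Refine", "stage")
cif = PlanNode("CIF", "object")

plan = Plan()
plan.add_edge(collect, hkl)  # the stage will realise the HKL file
plan.add_edge(hkl, refine)   # the HKL file will be used by refinement
plan.add_edge(refine, cif)   # refinement will realise the CIF

if __name__ == "__main__":
    for node, targets in plan.edges.items():
        print(node.kind, node.name, "->", [t.name for t in targets])
```
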
Enactment (of a Methodology)
  • Each “run” (of a plan) is modelled as a directed graph
  • Two node types:
      • Stage: description of an activity that has been enacted
      • Object: description of an artefact that has been realised

Provenance
  • Prospective
      • The plan describes a scientific experiment that will be enacted
  • Retrospective
      • The run describes a scientific experiment that has been enacted
  • Every ‘run thing’ is linked to exactly one ‘plan thing’

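The rule that every ‘run thing’ is linked to exactly one ‘plan thing’ can be checked mechanically. The following is a small sketch of such a check, assuming the links are available as simple (run node, plan node) pairs; the identifiers are invented for illustration and do not come from eCrystals.

```python
from collections import Counter


def check_run_links(run_nodes, has_plan_links):
    """Return the run nodes that are not linked to exactly one plan node.

    run_nodes: iterable of run-level identifiers (stages or objects).
    has_plan_links: iterable of (run_node, plan_node) pairs.
    """
    # Identical pairs are collapsed first so a repeated link is not counted twice.
    counts = Counter(run for run, _plan in set(has_plan_links))
    return sorted(node for node in run_nodes if counts[node] != 1)


# Hypothetical identifiers, for illustration only.
runs = ["run643/hkl", "run643/cif", "run643/notes"]
links = [
    ("run643/hkl", "plan/HKL"),
    ("run643/cif", "plan/CIF"),
    ("run643/cif", "plan/HKL"),  # a second link violates the constraint
]

if __name__ == "__main__":
    print(check_run_links(runs, links))
```

Here `run643/cif` is reported because it has two plan links, and `run643/notes` because it has none.
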
oreChem Plug-in for eCrystals
Three components:
  • orechem:Plan (the eCrystals methodology)
  • “eCrystal → orechem:Run” mapping
  • “orechem:Run → provenance graph” pipeline

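To make the mapping component more concrete, here is a sketch of how one eCrystal record might be turned into run-level triples. It is not the plug-in's actual code. The property names (`orechem:hasPlan`, `orechem:containsObject`, `orechem:hasPlanObject`, `orechem:derivedFrom`) are the ones used in the SPARQL request in this deck; the `run:` and `file:` identifiers, the file names and the `Intermediate` plan object are hypothetical.

```python
ORECHEM = "orechem:"      # prefixes as used in the SPARQL request slide
ECRYSTALS = "ecrystals:"


def ecrystal_to_run(record_id, files):
    """Map one eCrystal record to a list of (subject, predicate, object) triples.

    files: list of (file_name, plan_object, derived_from) tuples, where
    derived_from is the name of another file in the record, or None.
    """
    run = f"run:{record_id}"
    triples = [
        (run, "rdf:type", ORECHEM + "Run"),
        (run, ORECHEM + "hasPlan", ECRYSTALS + "Ecrystals"),
    ]
    for name, plan_object, derived_from in files:
        node = f"file:{record_id}/{name}"
        triples.append((run, ORECHEM + "containsObject", node))
        triples.append((node, "rdf:type", ORECHEM + "File"))
        # Each run object points at exactly one plan object.
        triples.append((node, ORECHEM + "hasPlanObject", ECRYSTALS + plan_object))
        if derived_from is not None:
            source = f"file:{record_id}/{derived_from}"
            triples.append((node, ORECHEM + "derivedFrom", source))
    return triples


# Hypothetical file listing; the real contents of eCrystal #643 are not given here.
triples = ecrystal_to_run(643, [
    ("data.hkl", "HKL", None),
    ("model.res", "Intermediate", "data.hkl"),  # "Intermediate" is an invented plan object
    ("final.cif", "CIF", "model.res"),
])

if __name__ == "__main__":
    for triple in triples:
        print(*triple)
```
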
The eCrystals Methodology
[Figure: Before / After]

Example: eCrystal #643
[Figure: Before / After]

SPARQL Request

    PREFIX orechem:   <http://www.openarchives.org/2010/05/24-orechem-ns#>
    PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>

    SELECT ?run ?raw ?derived ?reported
    WHERE {
      ?run a orechem:Run ;
           orechem:hasPlan ecrystals:Ecrystals ;
           orechem:containsObject ?raw ;
           orechem:containsObject ?derived ;
           orechem:containsObject ?reported .
      ?raw a orechem:File ;
           orechem:hasPlanObject ecrystals:HKL .
      ?derived a orechem:File ;
               orechem:derivedFrom ?raw .
      ?reported a orechem:File ;
                orechem:hasPlanObject ecrystals:CIF ;
                orechem:derivedFrom ?derived .
    }

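To show what the request matches, the sketch below evaluates the same graph pattern by hand over a tiny in-memory set of triples, using only the Python standard library. It is a stand-in for a real SPARQL engine, not a description of how eCrystals answers queries, and every identifier in `DATA` is hypothetical.

```python
def query(triples):
    """Yield (run, raw, derived, reported) tuples matching the request's pattern."""
    t = set(triples)

    def objects(subject, predicate):
        return [o for (s, p, o) in t if s == subject and p == predicate]

    def is_file(node):
        return (node, "rdf:type", "orechem:File") in t

    runs = [s for (s, p, o) in t if p == "rdf:type" and o == "orechem:Run"]
    for run in sorted(runs):
        if (run, "orechem:hasPlan", "ecrystals:Ecrystals") not in t:
            continue
        contained = sorted(objects(run, "orechem:containsObject"))
        for raw in contained:
            if not is_file(raw) or (raw, "orechem:hasPlanObject", "ecrystals:HKL") not in t:
                continue
            for derived in contained:
                if not is_file(derived) or (derived, "orechem:derivedFrom", raw) not in t:
                    continue
                for reported in contained:
                    if (is_file(reported)
                            and (reported, "orechem:hasPlanObject", "ecrystals:CIF") in t
                            and (reported, "orechem:derivedFrom", derived) in t):
                        yield (run, raw, derived, reported)


# Hypothetical data: one run with a raw, a derived and a reported file.
DATA = [
    ("run:643", "rdf:type", "orechem:Run"),
    ("run:643", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("run:643", "orechem:containsObject", "file:hkl"),
    ("run:643", "orechem:containsObject", "file:res"),
    ("run:643", "orechem:containsObject", "file:cif"),
    ("file:hkl", "rdf:type", "orechem:File"),
    ("file:hkl", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("file:res", "rdf:type", "orechem:File"),
    ("file:res", "orechem:derivedFrom", "file:hkl"),
    ("file:cif", "rdf:type", "orechem:File"),
    ("file:cif", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("file:cif", "orechem:derivedFrom", "file:res"),
]

if __name__ == "__main__":
    for row in query(DATA):
        print(row)
```

The pattern follows a two-step derivation chain: the reported CIF must derive from an intermediate file, which in turn must derive from the raw HKL file, and all three must belong to the same run.
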
SPARQL Response (for eCrystal #643)
[Figure labels: ?run, ?reported, ?derived, ?raw]

Summary
<summary/>

Acknowledgments
  • oreChem is funded by Microsoft External Research
  • eCrystals is funded by both EPSRC and JISC
  • The oreChem project team: Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Meuller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.

#ahm2010 #ahm #ahm10 #pch2010
http://pegasus.chem.soton.ac.uk
#ahm2010 until 11am Wed 15 Sept 2010
