Making Linked Data SPARQL with the
InterMine Biological Data Warehouse
9th International SWAT4LS Conference
5-8 December 2016, Amsterdam
Justin Clark-Casey, Software Engineer @InterMine
Daniela Butano, Software Engineer @InterMine
Today’s Talk
● InterMine
● MOLD (Model Organism Linked Database)
● Providing RDF and SPARQL from all mines: the challenges ahead
What is InterMine?
● A biological data warehouse.
● Initially built for Drosophila.
● But with a flexible and extensible data model.
● Now used as infrastructure by many model organism databases (MODs) and other life sciences projects.
● Open source, in continuous development for over 15 years.
● 7 software engineers, 1 biologist, 1 PI.
InterMine Buildtime Architecture
InterMine Runtime Architecture
Extracting Data
Getting data via a script (Python example)
from intermine.webservice import Service

# Connect to the FlyMine web service
service = Service("http://www.flymine.org/flymine/service")

# Query genes, returning each gene's name and the UniProt names of its proteins
query = service.new_query("Gene")
query.add_view("name", "proteins.uniprotName")
query.add_constraint("name", "=", "zerknullt", code="A")

for row in query.rows():
    print(row["name"], row["proteins.uniprotName"])
Results
zerknullt ZEN1_DROME
zerknullt B4LZ31_DROVI
Great, but...
These export mechanisms have served us well and continue to do so.
But...
● Queries require a bespoke language (InterMine PathQuery).
● Exported data may require transformation.
● Whole biological objects (e.g. a gene report page) only have a human-oriented view, not a machine-readable one.
A core aim of InterMine is to make its data provision FAIR (Findable, Accessible, Interoperable, Reusable), so we are always looking for ways to facilitate this.
MOLD
● Created by the Dumontier Lab at Stanford.
● Model Organism Linked Database.
● Aims to create Linked Open Data (LOD) from model organism data.
○ With links to ontologies and other LOD (e.g. Bio2RDF).
● Publishes tools to access and explore the data.
InterMine RDFization Process
Example of Generated RDF
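The generated RDF shown on this slide did not survive extraction. As a rough illustration of the general idea only, the sketch below turns one InterMine-style object (a plain dict) into N-Triples, minting class and property IRIs automatically from data model names. The namespaces (`http://example.org/...`) and the helper names `rdfize`, `iri` and `literal` are hypothetical; they are not the InterMine-RDFizer's actual vocabulary or code.

```python
"""Minimal RDFization sketch: InterMine-style object -> N-Triples.

Hypothetical: VOCAB_BASE, DATA_BASE and all function names are
illustrative only, not the InterMine-RDFizer's actual choices.
"""

VOCAB_BASE = "http://example.org/vocab/"  # hypothetical auto-generated vocabulary
DATA_BASE = "http://example.org/data/"    # hypothetical resource namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Serialise a plain string literal, escaping characters N-Triples forbids."""
    text = str(value)
    # Escape backslashes first so later escapes are not doubled.
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return '"' + text + '"'


def rdfize(cls, obj_id, attributes):
    """Return N-Triples lines for one object.

    cls        -- data model class name, e.g. "Gene"
    obj_id     -- identifier used to mint the subject IRI
    attributes -- mapping of attribute name -> value; None values are skipped
    """
    subject = iri(DATA_BASE + cls.lower() + "/" + obj_id)
    lines = [f"{subject} {iri(RDF_TYPE)} {iri(VOCAB_BASE + cls)} ."]
    for name, value in sorted(attributes.items()):
        if value is None:
            continue
        # One auto-generated predicate per model attribute, e.g. vocab:Gene.name
        predicate = iri(VOCAB_BASE + cls + "." + name)
        lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


if __name__ == "__main__":
    # Illustrative values only.
    gene = {"name": "zerknullt", "primaryIdentifier": "FBgn0004053"}
    for line in rdfize("Gene", "FBgn0004053", gene):
        print(line)
```

Deriving the vocabulary mechanically from the data model keeps the output complete for any mine, at the cost of predicates that no external tool understands; that trade-off is what the ontology challenge later in the talk addresses.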
What next?
● Incorporate and extend MOLD components to allow any mine operator to:
○ Generate and publish RDF dumps.
○ Make biological objects available as RDF resources.
○ Provide a SPARQL endpoint.
○ Explore emerging approaches such as Triple Pattern Fragments.
● Mine operators may not be software engineers
○ Software and processes need to be consumable.
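Once a mine provides a SPARQL endpoint, clients could ask the earlier PathQuery question in standard SPARQL instead of a bespoke language. The sketch below follows the SPARQL 1.1 Protocol (query sent as a `query` GET parameter, JSON results requested via the `Accept` header) and parses the standard SPARQL JSON results format. The endpoint URL and the `im:` vocabulary are hypothetical placeholders; the slides do not define either.

```python
"""Sketch of a SPARQL 1.1 Protocol client for a hypothetical mine endpoint.

Assumptions: ENDPOINT and the im: vocabulary are placeholders; no real
InterMine endpoint or vocabulary is implied.
"""
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/flymine/sparql"  # hypothetical endpoint URL

# Roughly the same question as the PathQuery example: names of genes
# called "zerknullt" and the UniProt names of their proteins.
QUERY = """
PREFIX im: <http://example.org/vocab/>
SELECT ?name ?uniprotName WHERE {
  ?gene a im:Gene ;
        im:name ?name ;
        im:proteins ?protein .
  ?protein im:uniprotName ?uniprotName .
  FILTER (?name = "zerknullt")
}
"""


def build_request(endpoint, query):
    """Build a query-via-GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_results(body):
    """Turn a SPARQL JSON results document into a list of {var: value} dicts."""
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are absent from a binding; report them as None.
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


def run_query(endpoint, query):
    """Send the query and return parsed rows (performs network I/O)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return parse_results(response.read().decode("utf-8"))
```

Against a real endpoint, `run_query(ENDPOINT, QUERY)` would return one dict per result row. Splitting request building and parsing from the network call keeps both parts testable offline.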
Data Challenge : Stable URIs
● InterMine navigation URIs use an internal object ID, which is not stable
○ http://www.flymine.org/flymine/report.do?id=1007741
● InterMine ‘shareable’ URIs are better but still have issues
○ http://www.flymine.org/flymine/portal.do?class=Gene&externalids=FBgn0004053
● URIs must persist in the face of
○ Name changes
○ Scientific changes
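One possible direction, sketched here under stated assumptions and not as InterMine's design: mint URIs from the class plus a stable external identifier (rather than the internal object ID), and keep a table of retired identifiers so that old URIs still resolve after renames, merges or splits. The base URL, the path layout and the `UriResolver` class are hypothetical, and the identifiers in the usage note are made up.

```python
"""Sketch: minting stable URIs from external identifiers.

Assumptions (hypothetical): the BASE URL, the class/identifier path
layout and the in-memory alias table.
"""
from collections import deque
from urllib.parse import quote

BASE = "http://example.org/flymine/"  # hypothetical persistent base


def mint_uri(cls, external_id):
    """Build a URI from the class and a stable external ID, not the internal DB id."""
    return BASE + quote(cls.lower()) + "/" + quote(external_id, safe="")


class UriResolver:
    """Resolve identifiers that were later renamed, merged or split."""

    def __init__(self):
        self._aliases = {}  # retired ID -> list of current IDs

    def retire(self, old_id, *current_ids):
        """Record that old_id now maps to one ID (rename/merge) or several (split)."""
        self._aliases[old_id] = list(current_ids)

    def resolve(self, cls, external_id):
        """Return the current URI(s) for an ID, following alias chains."""
        pending, seen, result = deque([external_id]), set(), []
        while pending:
            ident = pending.popleft()
            if ident in seen:
                continue  # guard against accidental alias cycles
            seen.add(ident)
            if ident in self._aliases:
                pending.extend(self._aliases[ident])
            else:
                result.append(mint_uri(cls, ident))
        return result
```

For example, after `retire("FBgn0000001", "FBgn0000002")`, resolving the old (made-up) ID yields the URI for the new one, which a web server could answer with a redirect.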
Data Challenge : Ontologies
● Currently, InterMine's data model has no attached ontologies.
○ Sequence Ontology is a partial exception.
● InterMine-RDFizer generates a vocabulary automatically for the data model.
● But we want to emit RDF that uses existing ontologies, such as:
○ Gene Ontology
○ FALDO
○ etc.
● Open issues:
○ Need a mechanism to attach arbitrary ontologies to the core data model and any extensions.
○ Which ontologies?
○ How do we facilitate user selection?
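A minimal sketch of the first issue, attaching arbitrary ontology terms to the core data model and its extensions, could be a registry that maps model names to ontology IRIs and falls back to the automatically generated vocabulary for anything unmapped. The registry design and fallback namespace are hypothetical; the two mappings use real terms (Sequence Ontology "gene", FALDO `Region`) purely as illustrations, not as the project's chosen mappings.

```python
"""Sketch: attaching ontology terms to data model classes, with a fallback.

Assumptions: the registry design and FALLBACK_BASE are hypothetical; the
mappings below are illustrative choices, not agreed project mappings.
"""

FALLBACK_BASE = "http://example.org/vocab/"  # hypothetical auto-generated vocabulary


class OntologyRegistry:
    """Map data model names (e.g. "Gene", "Gene.symbol") to ontology IRIs."""

    def __init__(self):
        self._terms = {}

    def attach(self, model_name, term_iri, overwrite=False):
        """Attach a term; refuse to silently replace an existing mapping."""
        if model_name in self._terms and not overwrite:
            raise ValueError(f"{model_name} already mapped to {self._terms[model_name]}")
        self._terms[model_name] = term_iri

    def term_for(self, model_name):
        """Return the attached term, or an auto-generated vocabulary IRI."""
        return self._terms.get(model_name, FALLBACK_BASE + model_name)


registry = OntologyRegistry()
# Core model mappings (illustrative choices).
registry.attach("Gene", "http://purl.obolibrary.org/obo/SO_0000704")  # SO: gene
registry.attach("Location", "http://biohackathon.org/resource/faldo#Region")
```

A mine extension could call `attach()` for its own classes, while the `overwrite` flag makes any change to a core mapping explicit rather than accidental.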
Tech Challenge : Performance
● Other projects (e.g. MODs) rely on us.
● Open questions around SPARQL performance.
● Proposed solution: adapt MOLD’s Dockerization approach, with a separate triplestore for the data.
○ Pros:
■ Easier deployment.
■ Performance issues can be contained.
■ Decoupled iteration.
○ Cons:
■ Multiple systems.
■ Maturity of Docker?
Maxime Déraspe Department of Molecular Medicine, Université Laval, Québec, CA
Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, US
Gail Binkley Department of Genetics, Stanford University, Stanford, US
Daniela Butano Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
Justin Clark-Casey Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
Kalpana Karra Department of Genetics, Stanford University, Stanford, US
Julie Sullivan Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
J. Michael Cherry Department of Genetics, Stanford University, Stanford, US
Jacques Corbeil Department of Molecular Medicine, Université Laval, Québec, CA
Gos Micklem Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
Michel Dumontier Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, US
THANK YOU!
Justin Clark-Casey
justincc@intermine.org
@justincc
Daniela Butano
daniela@intermine.org
MOLD
http://mo-ld.org/
InterMine
http://intermine.org
@intermineorg
Presentation licensed under Creative
Commons Attribution 4.0 International
goo.gl/IsjPzh
