Building a repository of biomedical
ontologies with Neo4j
Simon Jupp jupp@ebi.ac.uk, @simonjupp
Samples, Phenotypes and Ontologies Team
European Bioinformatics Institute
Cambridge, UK.
Biological data heavily interlinked
[Figure: labels shown are Genome, Proteome, Metabolome, tissue, miRNA array, mRNA array, antibody array, CE-MS, LC-MS/MS (example mass spectrum, Intensity against m/z), Pathways, Protein Interaction, Drug targets]
We need terminology standards
Dyschromatopsia
Search PubMed for “color blindness”
Search PubMed for “Dyschromatopsia”
Search PubMed for "abnormality of the eye"
The ontology of color blindness
HP:0011518 (Dichromacy)
HP:0011518 (Eye)
HP:0000551 (Abnormality of color vision)
HP:0007641 (Dyschromatopsia)
Is-a
Is-a
Disease-location
synonym: “Colorblindness”
definition: “A form of colorblindness in which only two of the three fundamental colors can be distinguished due to a lack of one of the retinal cone pigments.”
Genotype Phenotype
Sequence
Proteins
Gene products Transcript
Pathways
Cell type
BRENDA tissue / enzyme source
Development
Anatomy
Phenotype
Plasmodium life cycle
- Sequence types and features
- Genetic Context
- Molecule role
- Molecular Function
- Biological process
- Cellular component
- Protein covalent bond
- Protein domain
- UniProt taxonomy
- Pathway ontology
- Event (INOH pathway ontology)
- Systems Biology
- Protein-protein interaction
- Arabidopsis development
- Cereal plant development
- Plant growth and developmental stage
- C. elegans development
- Drosophila development
- Human developmental anatomy, abstract version
- Human developmental anatomy, timed version
- Mosquito gross anatomy
- Mouse adult gross anatomy
- Mouse gross anatomy and development
- C. elegans gross anatomy
- Arabidopsis gross anatomy
- Cereal plant gross anatomy
- Drosophila gross anatomy
- Dictyostelium discoideum anatomy
- Fungal gross anatomy
- Plant structure
- Maize gross anatomy
- Medaka fish anatomy and development
- Zebrafish anatomy and development
- NCI Thesaurus
- Mouse pathology
- Human disease
- Cereal plant trait
- PATO (attribute and value)
- Mammalian phenotype
- Human phenotype
- Habronattus courtship
- Loggerhead nesting
- Animal natural history and life history
- eVOC (Expressed Sequence Annotation for Humans)
Ontologies for life sciences
Ontology Lookup Service
• Ontology search engine (Solr)
• Graph database of terms (Neo4j)
• Powerful RESTful API (built with Spring Data Neo4j and Spring Data REST)
• Open source project
• Generic infrastructure (can load any ontology represented in OWL)
https://github.com/EBISPOT/OLS
Repository of over 140 biomedical ontologies (4.5 million terms, 11 million relations)
http://www.ebi.ac.uk/ols/beta
Web Ontology Language (OWL)
• W3C standard vocabulary for describing ontologies
• Powerful knowledge representation
However
• OWL ontologies aren’t graphs, but…
… they can be represented as an RDF graph
… people want to use them as graphs
• Plenty of RDF databases around
• But they are incomplete w.r.t. OWL semantics
• SPARQL is an acquired taste
OWL to Neo4j schema
• Each node is labelled with one of {Class, Property, Individuals} AND the ontology name
• All OWL annotations become node properties (labels, id, descriptions, etc.)
• Superclass axioms (named classes and simple existential restrictions) become edges in Neo4j
• E.g. in OWL: “heart” subclassOf (part-of some “cardiovascular system”)
  in Neo4j: “heart” part-of “cardiovascular system”
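The mapping above can be sketched in a few lines of Python. This is an illustration only, not the OLS loader: the `SubClassAxiom` and `Edge` shapes, the `to_edges` function, the `SUBCLASSOF` edge name for named superclasses, and the "heart subclassOf organ" axiom are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class SubClassAxiom(NamedTuple):
    """A simplified OWL subclass axiom (hypothetical shape for this sketch)."""
    sub: str                    # the subclass, e.g. "heart"
    sup: str                    # a named superclass, or the filler of an existential
    prop: Optional[str] = None  # property of a "prop some sup" restriction, if any


class Edge(NamedTuple):
    source: str
    rel_type: str
    target: str


def to_edges(axioms):
    """Turn named superclasses and simple existentials into direct edges."""
    edges = []
    for ax in axioms:
        # Named superclass -> subclass edge; existential -> edge named after the property.
        rel_type = "SUBCLASSOF" if ax.prop is None else ax.prop
        edges.append(Edge(ax.sub, rel_type, ax.sup))
    return edges


axioms = [
    SubClassAxiom("heart", "organ"),  # illustrative named superclass
    SubClassAxiom("heart", "cardiovascular system", prop="part-of"),
]
edges = to_edges(axioms)
for e in edges:
    print(f"{e.source} -[{e.rel_type}]-> {e.target}")
```

Anything more complex than a named class or a single existential restriction is simply not covered by this sketch, which matches the "simplified OWL" framing of the talk.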
What are the subtypes of “colorblindness”?
MATCH (n:Class {obo_id: 'HP:0007641'})<-[r*]-(types:Class)
RETURN n, r, types
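To make the variable-length pattern `<-[r*]-` concrete, here is a small in-memory analogue in Python. It is a sketch under assumptions: the two is-a edges are taken from the colour-blindness example earlier in the talk, their direction (Dichromacy is-a Dyschromatopsia is-a Abnormality of color vision) is assumed, and `descendants` is a name invented for this example. Note that the Cypher query follows incoming relationships of any type; restricting it to the subclass relationship would probably be closer to "subtypes" in the strict sense.

```python
from collections import deque

# (child, parent) is-a edges; the direction is an assumption for this sketch.
IS_A = [
    ("HP:0011518", "HP:0007641"),  # Dichromacy is-a Dyschromatopsia
    ("HP:0007641", "HP:0000551"),  # Dyschromatopsia is-a Abnormality of color vision
]


def descendants(root, edges):
    """Return every term that reaches root by following one or more edges."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guards against revisiting shared subtrees
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(descendants("HP:0007641", IS_A)))
```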
What parts of the eye are related to diseases?
MATCH (eye:Class {obo_id: 'UBERON:0000970'})
      <-[r:Related {label: "part_of"}]-(eye_part:Class)
      <-[r1:Related {label: "has_disease_location"}]-(disease:Class)
RETURN eye, r, r1, eye_part, disease
Finding common ancestors via shortest path
MATCH p = shortestPath( (a:Class)-[r:SUBCLASSOF*]-(b:Class) )
RETURN nodes(p)
What is the common taxonomic superfamily of Gibbons and Chimpanzees?
(or Hylobatidae and Pan troglodytes!)
Image: https://commons.wikimedia.org/wiki/File:Hylobates_lar_pair_of_white_and_black_01.jpg
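The idea behind the shortest-path query can be sketched in plain Python: ignore edge direction, find the shortest path between the two taxa, and read the common ancestor off the top of that path. The small `PARENT` map below is hand-written for this example rather than loaded from OLS, and `shortest_path` and `apex` are hypothetical helper names.

```python
from collections import deque

# child -> parent links, written by hand for this sketch (not loaded from OLS).
PARENT = {
    "Pan troglodytes": "Pan",
    "Pan": "Homininae",
    "Homininae": "Hominidae",
    "Hominidae": "Hominoidea",
    "Hylobatidae": "Hominoidea",
}


def shortest_path(start, goal, parent):
    """Breadth-first search over the subclass links, ignoring their direction."""
    adjacent = {}
    for child, par in parent.items():
        adjacent.setdefault(child, set()).add(par)
        adjacent.setdefault(par, set()).add(child)

    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacent.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # the two nodes are not connected


def apex(path, parent):
    """The node on the path whose own parent is not on the path."""
    return next(n for n in path if parent.get(n) not in path)


path = shortest_path("Hylobatidae", "Pan troglodytes", PARENT)
print(path)
print(apex(path, PARENT))
```

In a strict tree the undirected shortest path always passes through the lowest common ancestor; in an ontology with multiple inheritance this is only a heuristic.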
OLS visualisations
• Partonomy for heart from the UBERON anatomy ontology
MATCH path = (n:Class)-[r:SUBCLASSOF|PartOf*]->(ancestor)
REST API (Spring Data REST + Neo4j)
• Crawlable API - Hypermedia driven (HAL)
• Get ontology and term metadata
• /ontologies
• /ontologies/{name}
• /ontologies/{name}/terms
• /ontologies/{name}/terms/{termid}
• Get related terms and navigate ontology structure
• /ontologies/{name}/terms/{termid}/parent
• /ontologies/{name}/terms/{termid}/children
• /ontologies/{name}/terms/{termid}/descendants
• /ontologies/{name}/terms/{termid}/ancestors
• /ontologies/{name}/terms/{termid}/{relation} e.g. part_of
http://www.ebi.ac.uk/ols/beta/api
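The endpoint templates above can be assembled with a small helper. This sketch only builds URLs and makes no request. The function name `api_url` and the ontology name `hp` are assumptions, and so is the percent-encoding of `{termid}`: the slide does not say how the service expects term identifiers to be encoded, so check the API documentation before relying on it.

```python
from urllib.parse import quote

# Base URL as given on the slide (beta deployment at the time of the talk).
BASE = "http://www.ebi.ac.uk/ols/beta/api"


def api_url(*segments, base=BASE):
    """Join path segments onto the base URL, percent-encoding each one."""
    encoded = [quote(str(segment), safe="") for segment in segments]
    return "/".join([base] + encoded)


# /ontologies/{name}/terms/{termid}/children
children_url = api_url("ontologies", "hp", "terms", "HP:0007641", "children")
# /ontologies/{name}/terms/{termid}/{relation}
part_of_url = api_url("ontologies", "uberon", "terms", "UBERON:0000970", "part_of")

print(children_url)
print(part_of_url)
```

Because the API is hypermedia driven, a client would normally follow the links returned in each HAL response rather than construct every URL by hand; the helper is mainly useful for the entry points.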
Building the index
• We check all of the 140+ external ontology files nightly for changes
• We have a master build index
• When an ontology is updated, we remove the old version and reload it using the Neo4j BatchInserter (potentially fragile)
• We push the master index to various production data centers
• Provides load balancing
[Figure: nightly crawl of all >140 registered ontologies]
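The slide does not say how a change in an ontology file is detected. One plausible approach, shown here purely as an assumption, is to keep a checksum per ontology from the previous crawl and reload only the ontologies whose checksum differs. The names `digest` and `changed_ontologies` and the sample data are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 checksum of a downloaded ontology file."""
    return hashlib.sha256(content).hexdigest()


def changed_ontologies(previous, current):
    """Names whose content is new or differs from the stored checksum.

    previous: mapping of ontology name -> checksum from the last crawl
    current:  mapping of ontology name -> bytes fetched in this crawl
    """
    return sorted(
        name
        for name, content in current.items()
        if previous.get(name) != digest(content)
    )


# Stand-in file contents; a real crawl would download each registered ontology.
last_night = {"hp": digest(b"version 1"), "uberon": digest(b"version 7")}
tonight = {"hp": b"version 2", "uberon": b"version 7", "efo": b"version 1"}

to_reload = changed_ontologies(last_night, tonight)
print(to_reload)
```

Only the ontologies returned here would then go through the remove-and-reload step described above.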
Conclusion
• We’ve built a scalable repository of biomedical ontologies with Neo4j
• Generic OWL indexer (simplified OWL)
• Powerful REST API built with Spring
• Acts as a standalone OWL ontology server
• Now being deployed externally
• Beta: ~2000 users / 10 million requests per month
• Would like to discuss
• Batch Inserter
• Migrating to Spring Data Neo4j 4
Acknowledgements
• Samples, Phenotypes and Ontologies Team - Tony Burdett, James Malone, Dani Welter, Catherine Leroy, Sira Sarntivijai, Ilinca Tudose, Helen Parkinson
• Matt Pearce – Flax (BioSOLR project)
• Michal Bachman and GraphAware team (Neo4j training)
• Funding
• European Molecular Biology Laboratory (EMBL)
• European Union projects: DIACHRON, BioMedBridges and
CORBEL

GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp
