Reasoning with Biological
Knowledge
Benjamin Good
Gene Ontology Consortium
Lawrence Berkeley National Labs
@bgood
U.S. Semantic Technologies Symposium Series
March 11-13, 2019 at Duke University in Durham, NC
Biological Knowledge
• There is a lot: PubMed grows by more than 1 million articles a year
Human Gene
Hard to answer
important questions
What are the functions of
Fibronectin?
My experiment suggested that 238
genes are doing something
important.
What are they doing?
What processes are they
enabling?
Vital to Structure
Biological Knowledge
Knowledge Bases convert textual and/or expert knowledge into a form suitable for question answering
Knowledge Bases
Gene Ontology (GO)
Mission
“The mission of the GO Consortium is
to develop a comprehensive,
computational model of biological
systems, ranging from the molecular to
the organism level, across the
multiplicity of species in the tree of life.”
http://geneontology.org
The Gene Ontology
Ontology
- 45k terms
- 106k edges
Biological
Process
Cellular
Component
Molecular
Function
Genes
G1 G2 G3 G4 G5 G6
Gene “Annotations”
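Because terms are linked by edges, a gene annotated to a specific term can also be retrieved through that term's ancestors. The following is a minimal sketch of that idea, assuming a toy hierarchy: the is_a links, the gene-to-term assignments, and the function names are illustrative and are not taken from the real GO.

```python
from collections import deque

# Toy fragment of an ontology: child term -> parent terms (is_a edges).
# Illustrative only; not copied from the real Gene Ontology.
IS_A = {
    "ubiquitin-protein ligase activity": ["ligase activity"],
    "ligase activity": ["molecular function"],
    "nucleus": ["cellular component"],
}

# Direct gene "annotations": gene -> terms asserted for it (made up here).
ANNOTATIONS = {
    "G1": {"ubiquitin-protein ligase activity", "nucleus"},
    "G2": {"ligase activity"},
}


def ancestors(term):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    queue = deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen


def genes_for(term):
    """Genes annotated to `term` directly or via a more specific term."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(t == term or term in ancestors(t) for t in terms)
    )


print(genes_for("ligase activity"))  # ['G1', 'G2']
```

G1 is returned for "ligase activity" even though it was only annotated to the more specific term, which is the kind of question answering a structured knowledge base makes cheap.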
GO Knowledge Base
2018
• More than 140,000 published papers
referenced in …
• More than 750,000 experimentally supported
annotations ...
• Used to infer more than 7,000,000 functional
annotations for more than 3000 organisms.
• >22,000 citations of original paper.
GO Consortium. (2018) The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. gky1055
of reasoning with the
GO
GO evolution timeline
1998 - flatfile
2000 - SQL database
2006 - OBO format, OBO-Edit
2012 - Ontology editors start working in OWL
2017 - GO annotations done natively in OWL as
“GO-CAMs”. OWL reasoning now used throughout.
● Most people involved are
biologists by training and
objective.
● Adoption of Semantic Web
approaches purely driven
by application need.
From unconnected
gene annotations to
models of function
NEDD4 Ubiquitin-protein ligase activity
NEDD4 Nucleus
NEDD4
Ubiquitin-dependent protein
catabolic process
GO-CAM
NEDD4
Causal Activity Models
“GO-CAMs”
OWL framework for developing models of gene
function of arbitrary complexity
Ontology of a GO-CAM
Classes from GO (and other ontologies) are used to
describe the types of nodes (OWL instances) in the
model.
Relations from the Relation Ontology are used to
describe the relations linking the nodes.
Evidence classes from the Evidence Code Ontology
justify the asserted relationships
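The three ingredients above (typed nodes, relations between nodes, evidence on each relation) can be pictured as a small data structure. This is a sketch under stated assumptions, not the GO-CAM data model itself: the class names `Node`, `Edge`, and `Evidence`, the `ex:` identifiers, and the way the NEDD4 nodes are wired together are all hypothetical, and the evidence code is reused purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Node:
    """An OWL individual in the model, typed by an ontology class."""
    iri: str         # hypothetical identifier for the individual
    type_label: str  # label of the class that types the node


@dataclass(frozen=True)
class Evidence:
    """An Evidence Code Ontology class plus provenance for one edge."""
    eco_class: str
    contributor: str
    date: str


@dataclass
class Edge:
    """A relation (by label) linking two nodes, justified by evidence."""
    subject: Node
    relation: str
    target: Node
    evidence: List[Evidence] = field(default_factory=list)


# Nodes for the NEDD4 example; the ex: identifiers are made up.
activity = Node("ex:activity1", "ubiquitin-protein ligase activity")
gene_product = Node("ex:NEDD4", "NEDD4")
location = Node("ex:location1", "nucleus")
process = Node("ex:process1", "ubiquitin-dependent protein catabolic process")

# Evidence values reused from the slides for illustration only.
imported = Evidence(
    "ECO_0000313", "http://orcid.org/0000-0002-7757-3347", "2019-03-05"
)

# Assumed wiring: one activity node connected to the other three.
model = [
    Edge(activity, "enabled by", gene_product, [imported]),
    Edge(activity, "occurs in", location, [imported]),
    Edge(activity, "part of", process, [imported]),
]


def describe(edges):
    """Render each edge as one readable line, including its evidence."""
    return [
        f"{e.subject.type_label} --{e.relation}--> {e.target.type_label} "
        f"[{', '.join(ev.eco_class for ev in e.evidence)}]"
        for e in edges
    ]


for line in describe(model):
    print(line)
```

Keeping evidence on the edge rather than on the node mirrors the statement that evidence classes justify the asserted relationships.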
OWL Representation
<http://model.geneontology.org/R-HSA-112409/007>
    a <http://purl.obolibrary.org/obo/GO_0005634> ,
        <http://www.w3.org/2002/07/owl#NamedIndividual> ;
    <http://purl.org/dc/elements/1.1/contributor>
        "http://orcid.org/0000-0002-7757-3347" ;
    <http://purl.org/dc/elements/1.1/date>
        "2019-03-05" ;
    <http://purl.org/pav/providedBy>
        "https://reactome.org" ;
Individuals: <http://model.geneontology.org/R-HSA-112409/011>
Classes: Ubiquitin-protein ligase activity, Ligase activity, Nucleus, Cell Component
Object Properties: Occurs in <http://purl.obolibrary.org/obo/BFO_0000066>
OWL 2: annotatable axioms
ECO_0000313: imported information used in automatic assertion
<http://geneontology.org/lego/evidence>
<http://model.geneontology.org/R-HSA-112409/0088>
ECO_0000311: imported information
● All claims annotated with terms from the Evidence Code Ontology
● And provenance: contributor, date, source, provider
<dc:contributor> "http://orcid.org/0000-0002-7757-3347"
<dc:date> "2019-03-05"
<pav:providedBy> "https://reactome.org"
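In the OWL 2 mapping to RDF, an annotated axiom is written as the plain triple plus an `owl:Axiom` node that points back to it with `owl:annotatedSource`, `owl:annotatedProperty`, and `owl:annotatedTarget`. The sketch below builds those triples by hand. Which axiom the slide annotates is not recoverable from the text, so the choice of "011 occurs in 007" is an assumption, as are the function name and the blank-node label.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"
MODEL = "http://model.geneontology.org/R-HSA-112409/"


def annotate_axiom(subject, prop, target, annotations, axiom_node="_:axiom1"):
    """Return the asserted triple plus the owl:Axiom node that annotates it."""
    triples = [
        (subject, prop, target),
        (axiom_node, RDF_TYPE, OWL + "Axiom"),
        (axiom_node, OWL + "annotatedSource", subject),
        (axiom_node, OWL + "annotatedProperty", prop),
        (axiom_node, OWL + "annotatedTarget", target),
    ]
    # Each annotation hangs off the axiom node, not off the subject.
    triples.extend((axiom_node, p, v) for p, v in annotations)
    return triples


# Assumed example axiom: individual 011 occurs in individual 007.
triples = annotate_axiom(
    MODEL + "011",
    "http://purl.obolibrary.org/obo/BFO_0000066",
    MODEL + "007",
    [
        ("http://geneontology.org/lego/evidence", MODEL + "0088"),
        ("http://purl.org/dc/elements/1.1/contributor",
         "http://orcid.org/0000-0002-7757-3347"),
        ("http://purl.org/dc/elements/1.1/date", "2019-03-05"),
        ("http://purl.org/pav/providedBy", "https://reactome.org"),
    ],
)

for t in triples:
    print(t)
```

Attaching evidence and provenance to the axiom node keeps them tied to one specific claim, so two claims about the same individual can carry different evidence.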
Everything is OWL
Makes it possible to use general purpose OWL
reasoners with thousands of axioms encoded in GO,
RO, and other ontologies to :
a. Find errors
b. Infer relationships
Uses of OWL
Reasoning
• Editorial interface
• Very large and complex set of ontologies
• Automated consistency checking helps keep
distributed and diverse curation community
using same basic OWL representation
• Type inference reduces curator burden
• Import pipelines
• Quality control
• Type inference increases automated integration
• Export pipelines
• Inferred relations facilitate query
Consistency Checking
• Prevent invalid
inferences
• Helps keep a
distributed
curation
community using
the same patterns
• QC for import
code
Interface turns red when an OWL inconsistency is detected.
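One common source of an OWL inconsistency is an individual that ends up typed by two classes declared disjoint. The following toy check illustrates only that one case; a real OWL reasoner covers far more. The subclass links, the disjointness declaration between the two branches, and all names are assumptions made for the sketch.

```python
# Toy subclass links: class -> direct superclass (illustrative only).
SUBCLASS_OF = {
    "nucleus": "cellular component",
    "ubiquitin-protein ligase activity": "molecular function",
}

# Assumed disjointness axiom for the sketch.
DISJOINT_PAIRS = [("cellular component", "molecular function")]


def with_superclasses(cls):
    """Return `cls` together with every superclass above it."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def find_inconsistencies(asserted_types):
    """List (individual, class_a, class_b) where disjoint classes collide."""
    problems = []
    for individual, classes in sorted(asserted_types.items()):
        expanded = set()
        for cls in classes:
            expanded |= with_superclasses(cls)
        for a, b in DISJOINT_PAIRS:
            if a in expanded and b in expanded:
                problems.append((individual, a, b))
    return problems


model = {
    "node1": {"nucleus"},
    "node2": {"nucleus", "ubiquitin-protein ligase activity"},
}

print(find_inconsistencies(model))
# [('node2', 'cellular component', 'molecular function')]
```

An editor could run a check like this after every change and flag the model whenever the returned list is non-empty.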
Cellular Component Inference
Translesion synthesis by POLI
OWL Inference example
[Figure: a GO-CAM model (OWL instances ?X and ?Y, linked by RO: enabled by and typed with RDF:type) alongside OWL axioms over the classes Protein Complex, DNA polymerase complex, DNA-Directed Polymerase Activity and DNA Polymerase Activity (rdfs:subClassOf, RO: capable of).]
?X type protein-containing complex
?X capable of ?Y
?Y type DNA polymerase activity
→ ?X type DNA polymerase complex
Arachne Reasoner
https://github.com/balhoff/arachne
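The rule on this slide can be tried out with a few lines of forward chaining over triples. This is a toy sketch and not how Arachne is implemented. It assumes that "?Y enabled by ?X" lets us derive "?X capable of ?Y", and it types ?Y directly as DNA polymerase activity instead of walking the subclass hierarchy; the individual names are made up.

```python
def infer_complex_types(triples):
    """Apply the slide's rule once and return only the newly inferred types."""
    facts = set(triples)

    # Assumed bridge: "Y enabled by X" implies "X capable of Y".
    facts |= {
        (x, "capable of", y) for (y, p, x) in triples if p == "enabled by"
    }

    # Rule: X type protein-containing complex, X capable of Y,
    #       Y type DNA polymerase activity => X type DNA polymerase complex.
    inferred = {
        (x, "type", "DNA polymerase complex")
        for (x, p, y) in facts
        if p == "capable of"
        and (x, "type", "protein-containing complex") in facts
        and (y, "type", "DNA polymerase activity") in facts
    }
    return inferred - facts


# Hypothetical model fragment: an activity enabled by a generic complex.
model = [
    ("activity1", "type", "DNA polymerase activity"),
    ("complex1", "type", "protein-containing complex"),
    ("activity1", "enabled by", "complex1"),
]

print(sorted(infer_complex_types(model)))
# [('complex1', 'type', 'DNA polymerase complex')]
```

The curator only asserted a generic complex; the more specific cellular component type falls out of the rule, which is the burden reduction the slides describe.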
Molecular Function
Inference
Connecting to the
Gene Ontology
automatically
Showing imported data
without any type
assignment. Type (and
thus connection to the
ontology) is inferred
automatically based on
properties of the instance
Biological Process Inference
Relation Inference
Positively
regulates
Relation
Ontology
Causal Relation Query
Reasoning challenges
● Many challenges are rooted in the very large T-box
○ In 2017, no open-source OWL reasoner could provide fast enough inferences for the Noctua application. Jim Balhoff had to write Arachne for it to work
■ = very high barrier to entry.
○ Explanations
■ Challenging UX even when “just” OWL inferences.
● We reason in OWL, query in RDF, and share as CSV, JSON, Neo4j, GPAD, GAF, …; it would be nice to have a unified platform for the KB.
● NLP, ML for knowledge acquisition, curator assist
Acknowledgements
A very long list of people that created and contributed to
the Gene Ontology Consortium over the past 20 years
Specifically:
Chris Mungall, Paul Thomas, David Hill, Huaiyu Mi,
Peter D’Eustachio, Kimberly Van Auken, Seth
Carbon, Jim Balhoff, Laurent-Philippe Albou
Any questions?
http://geneontology.org/go-cam
@bgood
bgood@lbl.gov
Representing and reasoning with biological knowledge
Building
monsters
GO:0086094
positive regulation of ryanodine-sensitive calcium-release channel
activity by adrenergic receptor signaling pathway involved in positive
regulation of cardiac muscle contraction
GO:0010768
negative regulation of transcription from RNA polymerase II promoter in
response to UV-induced DNA damage
By Philippe Semeria
Pathway to GO-CAM
Reactome → BioPAX → series of rule-based transformations → GO-CAM (OWL)
