ONTO-ToolKit: enabling bio-ontology engineering via Galaxy
Aravind Venkatesan, Systems Biology group, Department of Biology, NTNU, Trondheim [email_address]
Overview
  • Galaxy
  • Ontology for Life Sciences
  • ONTO-Toolkit
  • Use Cases
  • Conclusion
  • Future Directions
  • Acknowledgment
  • References
Galaxy
  • A web application that allows flexible retrieval and analysis of data.
  • Integrated with other resources such as the UCSC Genome Browser and BioMart.
  • The Galaxy environment helps biologists manipulate and analyse data and build workflows.
  • An open-source, scalable framework for tool and data integration, suitable for tool developers.
  • Tool pane: provides various functionality to handle data
  • Data display area
  • History pane: manipulate uploaded data and build workflows
Visit Galaxy: http://galaxy.psu.edu/
Ontology for Life Sciences
  • Ontologies aid in knowledge formalisation and machine interoperability.
  • The success of ontologies in the Life Sciences is marked by the widespread use of the Gene Ontology [1] (GO).
  • Application ontologies, such as the Cell Cycle Ontology [2], build on this success.
  • The OBO flat file format [3] (OBOF) and the Web Ontology Language [4] (OWL) have gained wide acceptance as knowledge representation languages.
ONTO-Toolkit
  • A collection of tools to manage ontologies represented in the OBO file format within the Galaxy environment.
  • The tools are wrappers for commonly used functions provided by ONTO-PERL [5].
  • ONTO-PERL was developed as part of the Semantic Systems Biology [6] (SSB) initiative.
  • ONTO-PERL (an OBOF-centred Perl API) comprises an extensible set of object-oriented Perl modules.
  • The modules provide an organised set of subroutines for handling ontologies and are fully compatible with the current OBO specification (ver. 1.2).
  • The latest version (ver. 1.22) of ONTO-PERL can be downloaded directly from CPAN: http://search.cpan.org/dist/ONTO-PERL/
ONTO-PERL: An API supporting the development and analysis of bio-ontologies. Antezana E, Egana M, De Baets B, Kuiper M, Mironov V. Bioinformatics 2008; doi: 10.1093/bioinformatics/btn042
Examples of ONTO-PERL functionalities
Script: Functionality
  • get_ancestor_terms.pl: Collects the ancestor terms (list of IDs) of a given term (existing ID) in the given OBO ontology.
  • get_child_terms.pl: Collects the child terms (list of term IDs and their names) of a given term (existing ID) in the given OBO ontology.
  • get_descendent_terms.pl: Collects the descendant terms (list of IDs) of a given term (existing ID) in the given OBO ontology.
  • get_subontology_from.pl: Extracts a sub-ontology (in OBO format) of a given ontology, with the given term ID as the root.
  • get_intersection_ontology.pl: Provides the intersection of the given ontologies (in OBO format).
  • obo2owl.pl: OBO to OWL translator.
  • obo2rdf.pl: OBO to RDF translator.
  • obo_trimming.pl: Trims a given branch of an OBO ontology.
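To make the ancestor and child lookups listed above concrete, here is a minimal Python sketch of the idea. It is not ONTO-PERL code and does not reproduce its API: the toy ontology, the `EX:` term IDs and the function names `parse_is_a`, `ancestors` and `children` are all hypothetical, and the sketch assumes that only `is_a` links in `[Term]` stanzas are followed.

```python
from collections import defaultdict

# Hypothetical toy ontology in OBO flat file syntax (not taken from CCO).
TOY_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: root activity

[Term]
id: EX:0000002
name: transferase-like activity
is_a: EX:0000001 ! root activity

[Term]
id: EX:0000003
name: specific transferase activity
is_a: EX:0000002 ! transferase-like activity

[Term]
id: EX:0000004
name: oxygenase-like activity
is_a: EX:0000001 ! root activity
"""


def parse_is_a(obo_text):
    """Return {term_id: set of direct is_a parent IDs} for [Term] stanzas."""
    parents = defaultdict(set)
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            parents[current]  # register the term even if it has no parent
        elif in_term and current and line.startswith("is_a:"):
            # Drop the trailing "! name" comment and keep only the parent ID.
            parents[current].add(line[5:].split("!")[0].strip())
    return dict(parents)


def ancestors(parents, term_id):
    """Collect every term reachable by following is_a links upwards."""
    seen = set()
    stack = list(parents.get(term_id, ()))
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def children(parents, term_id):
    """Collect the terms that name term_id as a direct is_a parent."""
    return {term for term, ps in parents.items() if term_id in ps}


toy_parents = parse_is_a(TOY_OBO)
```

Under these assumptions, `ancestors(toy_parents, "EX:0000003")` walks up to the root, while `children(toy_parents, "EX:0000001")` returns only the direct children. A real OBO file may also carry other relationship types, which this sketch ignores.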
ONTO-Toolkit - GALAXY
  • Define arguments
ONTO-Toolkit - GALAXY
Use Cases
Motivation:
  • To demonstrate the functionality of ONTO-Toolkit in Galaxy
  • To demonstrate the usefulness of ontology engineering in the biological domain
Use Case I: investigating similarities between given molecular functions
  • Collect all the upstream terms (ancestors) of two given molecular function terms and identify their common ancestor terms.
  • Chosen ontology: Cell Cycle Ontology
  • Term 1: id: CCO:F0000004, name: trans-hexaprenyltranstransferase activity
  • Term 2: id: CCO:F0000820, name: homogentisate 1,2-dioxygenase activity
Use Case I
  • Upload an OBO ontology file, e.g. cco_S_pombe
Use Case I (continued)
  • Molecular function term ID: CCO:F0000004
  • This step is repeated for the second term, CCO:F0000820.
  • List of ancestor terms for the given molecular function Term 1
  • List of ancestor terms for Term 2
Common ancestor terms
  • Gets the overlapping ancestor terms
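The overlap step of Use Case I amounts to a set intersection of the two ancestor lists. The sketch below illustrates that idea in Python under stated assumptions: the parent map, the `EX:` term IDs and the function names are hypothetical, only `is_a`-style parent links are modelled, and none of this is the ONTO-Toolkit implementation.

```python
# Hypothetical parent map: each term ID maps to its direct parents.
TOY_PARENTS = {
    "EX:F0": set(),        # root of the toy hierarchy
    "EX:F1": {"EX:F0"},
    "EX:F2": {"EX:F0"},
    "EX:F3": {"EX:F1"},    # stands in for Term 1
    "EX:F4": {"EX:F2"},    # stands in for Term 2
}


def ancestors(parents, term_id):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = list(parents.get(term_id, ()))
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def common_ancestors(parents, term_a, term_b):
    """Ancestor terms shared by both input terms."""
    return ancestors(parents, term_a) & ancestors(parents, term_b)


shared = common_ancestors(TOY_PARENTS, "EX:F3", "EX:F4")
```

In this toy hierarchy the two terms share only the root, which is the kind of result that suggests two functions are related only at a high level.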
Use Case II: identifying overlapping annotations for a given pair of distinct biological process terms
  • Chosen ontology: Cell Cycle Ontology
  • Term 1: id: CCO:P0000005, name: cell cycle checkpoint
  • Term 2: id: CCO:P0000069, name: mitosis
Use Case II
  • Gets the sub-ontology for each of the given terms
  • Generated sub-ontology of Term 1: CCO:P0000005
  • Generated sub-ontology of Term 2: CCO:P0000069
Gets the intersection of the two sub-ontologies
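The two steps of Use Case II (extract a sub-ontology rooted at each term, then intersect the results) can be sketched as follows. This is an illustration only: the slides do not specify how get_intersection_ontology.pl defines the intersection, so the sketch assumes it means the terms present in both sub-ontologies, plus the parent links between them. The parent map, `EX:` IDs and function names are hypothetical.

```python
# Hypothetical parent map: each term ID maps to its direct parents.
TOY_PARENTS = {
    "EX:P1": set(),                 # stands in for Term 1 (root A)
    "EX:P2": set(),                 # stands in for Term 2 (root B)
    "EX:P3": {"EX:P1"},
    "EX:P4": {"EX:P1", "EX:P2"},    # sits under both roots
    "EX:P5": {"EX:P4"},
    "EX:P6": {"EX:P2"},
}


def subontology_terms(parents, root_id):
    """Return the root plus every term that reaches it via parent links."""
    # Invert the parent map so the hierarchy can be walked downwards.
    children = {}
    for term, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(term)
    selected = {root_id}
    stack = [root_id]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in selected:
                selected.add(child)
                stack.append(child)
    return selected


def intersect_subontologies(parents, root_a, root_b):
    """Terms found under both roots, keeping only links between shared terms."""
    shared = subontology_terms(parents, root_a) & subontology_terms(parents, root_b)
    return {term: parents.get(term, set()) & shared for term in shared}


overlap = intersect_subontologies(TOY_PARENTS, "EX:P1", "EX:P2")
```

A non-empty result, as in this toy case, would indicate terms that belong to both processes; an empty one would indicate no overlap under the stated assumptions.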
Conclusion
  • Use Case I: the results provide evidence that the two molecular functions are unrelated, as they share only high-level terms.
  • Use Case II: the results suggest a possible overlap between the two distinct biological processes.
  • ONTO-Toolkit provides rich, ontology-driven solutions within the Galaxy framework.
Future Directions
  • Provide an interface to perform SPARQL queries within Galaxy
  • Provide a visualisation module
Acknowledgment
  • Dr. Erick Antezana, NTNU
  • Dr. Vladimir Mironov, NTNU
  • Dr. Martin Kuiper, NTNU
References
  [1] M. Ashburner, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25:25–29, May 2000.
  [2] The Cell Cycle Ontology, http://www.semantic-systems-biology.org/cco
  [3] The OBO Flat File Format Specification (ver. 1.2), http://www.geneontology.org/GO.format.obo-1_2.shtml
  [4] OWL Web Ontology Language, http://www.w3.org/TR/owl-semantics/
  [5] ONTO-PERL: An API supporting the development and analysis of bio-ontologies. Antezana E, Egana M, De Baets B, Kuiper M, Mironov V. Bioinformatics 2008; doi: 10.1093/bioinformatics/btn042
  [6] Semantic Systems Biology, http://www.semantic-systems-biology.org/

Venkatesan bosc2010 onto-toolkit


Editor's Notes

  • #4 Explains the motivation of Galaxy – services it provides
  • #5 Explains the basic features of Galaxy
  • #13 The step is repeated for the second OBO term