Gene Annotation And Ontology
Marcus C. Chibucos, Ph.D.
Ontology, Evidence, Annotation
Arabidopsis thaliana ATPase HMA4 zinc binding domain
GO:0006829 : zinc ion transport (BP)
GO:0005886 : plasma membrane (CC)
GO:0005515 : protein binding (MF)

Outline of this talk
Background: the language of biology
Gene Ontology: overview, terms & structure
Annotating with GO and Evidence
Using annotation to facilitate your research

About screenshots in this talk
AmiGO web-based ontology browser: http://amigo.geneontology.org
OBO-Edit stand-alone editor: http://oboedit.org

Background: the language of biology
What is annotation? Who is involved?
Term confusion (what’s in a name?)
Scale: the sea of data
Controlled vocabularies & ontologies
The Gene Ontology Consortium

Annotation
annotate – to make or furnish critical or explanatory notes or comment. (Merriam-Webster dictionary)
genome annotation – the process of taking the raw DNA sequence produced by the genome-sequencing projects and adding the layers of analysis and interpretation necessary to extract its biological significance and place it into the context of our understanding of biological processes. (Lincoln Stein, PMID 11433356)
Gene Ontology annotation – the process of assigning GO terms to gene products… according to two general principles: first, annotations should be attributed to a source; second, each annotation should indicate the evidence on which it is based. (http://www.geneontology.org)
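
The two principles in the GO annotation definition above (every annotation has a source, every annotation states its evidence) can be sketched as a small record type. This is only an illustration under stated assumptions: the class and field names are hypothetical and do not follow any GO file format, the GO ids are the ones from the title slide's HMA4 example, and the reference and evidence strings are placeholders rather than real curated values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One GO annotation; field names are illustrative, not a GO file format."""
    gene_product: str
    go_id: str
    aspect: str      # "BP", "CC" or "MF"
    reference: str   # where the assertion comes from (principle 1)
    evidence: str    # what kind of evidence supports it (principle 2)

    def __post_init__(self):
        # Enforce both principles: no annotation without source and evidence.
        if not self.reference.strip():
            raise ValueError("annotation must be attributed to a source")
        if not self.evidence.strip():
            raise ValueError("annotation must indicate its evidence")


# GO ids are from the title slide; reference and evidence are placeholders.
hma4 = [
    Annotation("HMA4", "GO:0006829", "BP", "REF:placeholder", "EVIDENCE:placeholder"),
    Annotation("HMA4", "GO:0005886", "CC", "REF:placeholder", "EVIDENCE:placeholder"),
    Annotation("HMA4", "GO:0005515", "MF", "REF:placeholder", "EVIDENCE:placeholder"),
]

aspects = sorted(a.aspect for a in hma4)
```

Constructing an Annotation with an empty reference or evidence string raises ValueError, which is one simple way to make the two principles hard to skip.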

Diverse parties involved
End-users, including various researchers
  Small-scale laboratory projects
  Whole genome sequencing projects
Annotators
  From reading papers to computational analysis
Ontology developers
  Create terms that reflect scientific knowledge
  Make interoperable ontologies, database links
Developers of tools & resources
  Standards for storing & sharing data
  Web interfaces for data analysis & sharing
Many areas of expertise
  Laboratory sciences – biology, chemistry, medicine, and many other disciplines
  Computational science – bioinformatics, genomics, statistics
  Software development & web design
  Philosophy – ontology & logic

Term confusion: synonyms
Do biologists use precise & consistent language?
Mutually understood concepts – DNA, RNA, or protein
Synonym (one thing known by more than one name) – translation and protein synthesis
Enzyme Commission reactions
  Standardized id, official name & alternative names
  http://www.expasy.ch/enzyme/2.7.1.40

Term confusion: homonyms
Homonyms common in biology – different things known by the same name
  Sporulation
  Vascular (plant vasculature, i.e. xylem & phloem, or vascular smooth muscle, i.e. blood vessels?)
“Sporulation”
  Endospore formation, Bacillus anthracis. http://www.microbelibrary.org/ASMOnly/details.asp?id=1426&Lang= ©L Stauffer 2003 (accessed 17-Sep-09)
  Reproductive sporulation: asci & ascospores, Morchella elata (morel). http://en.wikipedia.org/wiki/File:Morelasci.jpg ©PG Warner 2008 (accessed 17-Sep-09)

Term confusion: homonyms and biological complexity
AmiGO query “vascular” returns 51 terms
In biology, many related phenomena are described with similar terminology

The problem of scale
Small data sets, small experiments & isolated scientific communities?
Enormous data sets
Microarray experiments
Whole genome sequencing projects
Comparative genomics of multiple diverse taxa
Computers don’t understand nuance
Millions of proteins to annotate
How to effectively search?
How to draw meaningful comparisons?
http://en.wikipedia.org/wiki/File:Microarray2.gif (accessed 17-Sep-09)

The Gene Ontology (GO)
Way to address the problems of synonyms, homonyms, biological complexity, increasing glut of data
GO provides a common biological language for protein functional annotation
www.geneontology.org

Controlled vocabulary (CV)
An official list of precisely defined terms that can be used to classify information and facilitate its retrieval
  Think of flat list like a thesaurus or catalog
Benefits of CVs
  Allow standardized descriptions of things
  Remedy synonym & homonym issues
  Can be cross-referenced externally
  Facilitate electronic searching
A CV can be “…used to index and retrieve a body of literature in a bibliographic, factual, or other database. An example is the MeSH controlled vocabulary used in MEDLINE and other MEDLARS databases of the NLM.” http://www.nlm.nih.gov/nichsr/hta101/ta101014.html
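
How a flat controlled vocabulary remedies synonym and homonym issues can be sketched in a few lines. The class below is a hypothetical illustration, not the data model of GO or MeSH: each term gets one id and one official name, synonyms resolve to that id, and a label that already names another term is rejected. The ids ("CV:0001" and so on) are made up.

```python
class ControlledVocabulary:
    """Flat list of terms; each term has one id, one official name, and synonyms."""

    def __init__(self):
        self._names = {}    # term id -> official name
        self._lookup = {}   # lower-cased name or synonym -> term id

    def add_term(self, term_id, name, synonyms=()):
        self._names[term_id] = name
        for label in (name, *synonyms):
            key = label.lower()
            # Refuse homonyms: one label may point at only one term.
            if self._lookup.get(key, term_id) != term_id:
                raise ValueError(f"{label!r} already names {self._lookup[key]}")
            self._lookup[key] = term_id

    def resolve(self, label):
        """Return (id, official name) for a name or synonym, or None if unknown."""
        term_id = self._lookup.get(label.lower())
        return None if term_id is None else (term_id, self._names[term_id])


cv = ControlledVocabulary()
# Ids are invented for illustration; they are not real GO or MeSH identifiers.
cv.add_term("CV:0001", "translation", synonyms=["protein synthesis"])
cv.add_term("CV:0002", "endospore formation")
cv.add_term("CV:0003", "ascospore formation")
```

With this setup, "translation" and "protein synthesis" resolve to the same id, while the ambiguous word "sporulation" resolves to nothing until someone decides which precisely defined term it should point at.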

Ontology is a type of CV with defined relationships
Ontology – formalizes knowledge of a subject with precise textual definitions
Networked terms where child more specific (“granular”) than parent
GO terms describe biological attributes of gene products…
Figure labels: less specific, more granular

How GO works
GO Consortium develops & maintains:
  Ontologies and cross-links between ontologies and different resources
  Tools to develop and use the ontologies
  SourceForge tracker for development
People studying organisms at databases annotate gene products with GO terms
Groups share files of annotation data about their respective organisms
Because a common language was used to describe gene products and this information was shared amongst databases…
  We can search uniformly across databases
  Do comparative genomics of diverse taxa

GO on SourceForge
sourceforge.net/projects/geneontology

The Gene Ontology Consortium
Collaboration began 1998 among model organism databases mouse (MGI), fruit fly (FlyBase) and baker’s yeast (SGD)
Michael Ashburner of FlyBase contributed the base vocabulary
Today > 20 members & associates
First publication 2000 (PMID 10802651)
Today, PubMed query “gene ontology” yields 3,347 papers (27-Jun-2011)
Organisms represented by GO annotations from every kingdom of life
Many groups use GO in many different ways for their research
Among eight OBO-Foundry ontologies
ZFIN, Reactome, IGS

OBO Foundry ontologies
www.obofoundry.org
Collaboration among developers of science-based ontologies
Establish principles for ontology development
Goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain.
many others…

Gene Ontology: overview, terms & structure
What the GO is not
GO comprises three ontologies
Anatomy & storage of GO terms
Ontology structure
Detail of a term in AmiGO
True path rule

Caveats – what GO is not
Not gene naming system or gene catalog
  GO describes attributes of biological objects – “oxidoreductase activity” not “cytochrome c”
The three ontologies have limitations
  No sequence attributes or structural features
  No characteristics unique to mutants or disease
  No environment, evolution or expression
  No anatomy features above cellular component
Not dictated standard or federated solution
  Databases share annotations as they see fit
  Curators evaluate differently
GO is evolving as our knowledge evolves
  New terms added on daily basis
  Incorrect/poorly defined terms made obsolete
  Secondary ids – terms with same meaning merged

GO comprises three ontologies
Cellular component ontology (CC)
  “cytoplasm”
Molecular function ontology (MF)
  “protein binding”
  “peptidase activity”
  “cysteine-type endopeptidase activity”
Biological process ontology (BP)
  “proteolysis”
  “apoptosis”
Terms describe attributes of gene products (GPs)
  Any protein or RNA encoded by a gene
  Species-independent context, e.g. “ribosome”
  Could describe GPs found in limited taxa, e.g. “photosynthesis” or “lactation”
One GP can be associated with ≥ 1 CC, BP, MF
Example: Caspase-6 from Bos taurus

Cellular component ontology
Describes location at level of subcellular structure & macromolecular complex
GP subcomponent of or located in particular cellular component, with some exceptions:
  No individual proteins or nucleic acids
  No multicellular anatomical terms
For annotation purposes, a GP can be associated with or located in ≥ one cellular component
Multi-subunit enzyme or protein complex
ribosome
proteasome
ubiquitin ligase complex
Anatomical structure
rough endoplasmic reticulum
nucleus
nuclear inner membrane

Molecular function ontology
Describes gene product activity at molecular level
Describes attributes of entities
  Adenylate cyclase (E.C. 4.6.1.1)
  Catalyzes a specific reaction: ATP = 3',5'-cyclic AMP + diphosphate
  Described by the Gene Ontology term: “adenylate cyclase activity” (GO:0004016)
  http://www.ebi.ac.uk/pdbsum/1ab8 [accessed 4-Feb-2010]
Usually single GP, sometimes a complex
“ferritin receptor activity”
Definition: “combining with ferritin, an iron-storing protein complex, to initiate a change in cell activity”
Broad functions
“catalytic activity”
“transporter activity”
“binding”
Specific functions
“adenylate cyclase activity”
“protein-DNA complex transmembrane transporter activity”
“Fc-gamma receptor I complex binding”

Biological process ontology
Describes recognized series of events or molecular functions with a defined beginning and end
“GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway” (from GO documentation)
Mutant phenotypes often reflect disruptions in BP
Specific process
“pyrimidine metabolism”
“α-glucosidase transport”
Broad process
“cellular physiological process”
“signal transduction”
Process guideline topics shown: General considerations; The Cell Cycle; The Development Node; Multi-Organism Process; Metabolism; Regulation; Detection of and Response to Stimuli; Sensory Perception; Signaling Pathways; Transport and Localization; Transporter activity (molecular function); Other Misc. Standard Defs
http://www.geneontology.org/GO.process.guidelines.shtml

Anatomy of a GO term
Term name
goid (unique numerical identifier)
Synonyms (broad or narrow) for searching, alternative names, misspellings…
Precise textual definition with reference stating source
GO slim
Ontology placement

Storage and cross referencing of GO terms
Storage in flat file (text)
Database cross reference for mappings to GO
GO term identical to object in other database

Ontology structure: parent-child relationship
Parent term (broader)
Child term (specialized)
Example terms: hexose metabolism, monosaccharide biosynthesis, hexose biosynthesis
Up in the tree is more general; down in the tree is more specific:
Annotation of genes
Start with terms denoting broad functional categories
Use more specific term as knowledge warrants

Ontology structure: terms arranged in DAGs
GO terms structured as hierarchical-like directed acyclic graphs (DAGs)
Tree-like, but each term can have more than one parent (pseudo-hierarchy)
Each term may have one or more child terms (“siblings” share same parent)
Figure labels: parents, child term; parent, child terms (“siblings”)
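
The DAG structure described above can be sketched as a mapping from each term to its set of parents. This is a minimal illustration, not GO's own data model. It assumes one reading of the hexose example from the parent-child slide, in which "hexose biosynthesis" sits under both broader terms; the "monosaccharide metabolism" node and its two edges are added here only to give the graph more than one level.

```python
# child -> set of parents. The hexose edges are one reading of the slide's
# example; the edges up to "monosaccharide metabolism" are illustrative only.
PARENTS = {
    "hexose biosynthesis": {"hexose metabolism", "monosaccharide biosynthesis"},
    "hexose metabolism": {"monosaccharide metabolism"},
    "monosaccharide biosynthesis": {"monosaccharide metabolism"},
    "monosaccharide metabolism": set(),
}


def ancestors(term, parents=PARENTS):
    """Every term reachable by following parent links upward from `term`."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def siblings(term, parents=PARENTS):
    """Other terms that share at least one parent with `term`."""
    mine = parents.get(term, set())
    return {t for t, ps in parents.items() if t != term and ps & mine}
```

Because a term may have several parents, the walk keeps a `seen` set so that a shared ancestor is visited once even when it is reachable along more than one path. That is the practical difference from walking a strict tree.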

GO has three term relationships
is_a - child is instance of parent (“A is_a B”)
  Class-subclass relationship
part_of - child part of parent (“C part_of D”)
  When C present, part of D; but C not always present
  Nucleus always part_of cell; not all cells have nuclei
regulates
  Child term regulates parent term
(Zoomed in view of biological process ontology depicted here.)
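
One way to picture the three relationship types is as labels on the edges between child and parent terms. The sketch below is hypothetical: the triple layout and function name are invented here, the nucleus/cell pair and the A, B, C, D placeholders come from the slide, and the "regulation of X" pair is a made-up stand-in for a regulates edge.

```python
from collections import defaultdict

ALLOWED = {"is_a", "part_of", "regulates"}

# (child, relation, parent) triples; term names are placeholders except
# nucleus/cell, which is the slide's own part_of example.
EDGES = [
    ("nucleus", "part_of", "cell"),
    ("A", "is_a", "B"),
    ("C", "part_of", "D"),
    ("regulation of X", "regulates", "X"),
]


def parents_by_relation(edges):
    """Index edges as child -> relation -> set of parents, rejecting unknown relations."""
    index = defaultdict(lambda: defaultdict(set))
    for child, relation, parent in edges:
        if relation not in ALLOWED:
            raise ValueError(f"unknown relationship type: {relation!r}")
        index[child][relation].add(parent)
    return index


index = parents_by_relation(EDGES)
```

Keeping the relation on the edge matters because the types read differently: an is_a edge says the child is a subclass of the parent, while a part_of edge only says that the child, when present, is part of the parent.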

AmiGO for viewing terms
Open source HTML-based application developed by the GO Consortium
Interface for browsing, querying and visualizing OBO data
Users can search GO terms or annotations
Available via website or download for local install
http://amigo.geneontology.org
Example query with keyword “hemolysis” or goid GO:0019836

AmiGO search results
Screenshot callout: Click

Term information in AmiGO
Webpage continues…

AmiGO view continued
Several informative views
Screenshot callout: Click
Number of gene products in GO annotation collection annotated to that term or one of its child terms
Relationship between term and its parent
Our term is much further down…
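
The gene product number described above (products annotated to a term or to one of its child terms) can be sketched as a count over a term and everything below it. This is not AmiGO's implementation. It assumes "child terms" means all descendants, and the mini-ontology, term names and gene product names are invented for illustration.

```python
# Hypothetical mini-ontology (child -> parents) and direct annotations
# (term -> gene products); all names are invented for illustration.
PARENTS = {
    "term_c": {"term_a", "term_b"},
    "term_a": {"root"},
    "term_b": {"root"},
    "root": set(),
}
DIRECT = {
    "term_c": {"gp1", "gp2"},
    "term_a": {"gp2", "gp3"},
    "term_b": set(),
    "root": set(),
}


def descendants(term):
    """All terms below `term`, found by repeatedly collecting children."""
    found, frontier = set(), {term}
    while frontier:
        frontier = {c for c, ps in PARENTS.items() if ps & frontier} - found
        found |= frontier
    return found


def gene_product_count(term):
    """Distinct gene products annotated to `term` or to any term below it."""
    products = set(DIRECT.get(term, ()))
    for child in descendants(term):
        products |= DIRECT.get(child, set())
    return len(products)
```

Collecting products in a set means a gene product annotated both to a term and to one of its descendants is counted once. A term with no direct annotations, such as `term_b` here, still gets a non-zero count from the terms beneath it.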

Graph view
Alternative view of network of terms

A term with two parents
Figure labels: amine group, carboxylic acid group, generic amino acid
Name:  amino acid transmembrane transporter activity
ID number:  GO:0015171
Definition:  Catalysis of the transfer of amino acids from one side of a membrane to the other. Amino acids are organic molecules that contain an amino group and a carboxyl group. [source: GOC:ai, GOC:mtg_transport, ISBN:0815340729]
parent term:  amine transmembrane transporter activity (GO:0005275)
relationship to parent:  “is_a”
parent term:  carboxylic acid transmembrane transporter activity (GO:0046943)
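
The term record above can be read into a simple structure to check that it really has two parents. The sketch below copies the field layout from the slide text, with the definition left out for brevity. The parsing function is a hypothetical helper, not part of any GO tool, and this layout is the slide's presentation rather than an official GO file format.

```python
import re

# Field layout copied from the slide text above (definition omitted for brevity).
RECORD = """\
Name: amino acid transmembrane transporter activity
ID number: GO:0015171
parent term: amine transmembrane transporter activity (GO:0005275)
relationship to parent: "is_a"
parent term: carboxylic acid transmembrane transporter activity (GO:0046943)
"""

GO_ID = re.compile(r"GO:\d{7}")


def parse_record(text):
    """Collect repeated keys into lists so multiple parents are preserved."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so "GO:0015171" stays intact.
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


fields = parse_record(RECORD)
parent_ids = [GO_ID.search(p).group() for p in fields["parent term"]]
has_multiple_parents = len(parent_ids) > 1
```

Storing every key as a list is the design choice that matters here: a plain dict assignment would silently keep only the last "parent term" line and hide the second parent.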

Chibucos annot go_final

  • 1. Marcus C. Chibucos, Ph.D.OntologyEvidenceAnnotationArabidopsis thaliana ATPaseHMA4 zinc binding domainGO:0006829 : zinc ion transport (BP)GO:0005886 : plasma membrane (CC)GO:0005515 : protein binding (MF)Gene Annotation And Ontology
  • 2. Outline of this talk2Background: the language of biology
  • 3. Gene Ontology: overview, terms & structure
  • 4. Annotating with GO and Evidence
  • 5. Using annotation to facilitate your researchAbout screenshots in this talk3AmiGO web-based ontology browserhttp://amigo.geneontology.orgOBO-Edit stand-alone editorhttp://oboedit.org
  • 6. What is annotation? Who is involved?Term confusion (what’s in a name?)Scale: the sea of dataControlled vocabularies & ontologiesThe Gene Ontology ConsortiumBackground: the language of biology4
  • 7. Annotation5annotate – to make or furnish critical or explanatory notes or comment. (Merriam-Webster dictionary)genome annotation – the process of taking the raw DNA sequence produced by the genome-sequencing projects and adding the layers of analysis and interpretation necessary to extract its biological significance and place it into the context of our understanding of biological processes. (Lincoln Stein, PMID 11433356)Gene Ontology annotation – the process of assigning GO terms to gene products… according to two general principles: first, annotations should be attributed to a source; second, each annotation should indicate the evidence on which it is based. (http://www.geneontology.org)
  • 8. Diverse parties involved6End-users, including various researchersSmall-scale laboratory projectsWhole genome sequencing projectsAnnotatorsFrom reading papers to computational analysis Ontology developersCreate terms that reflect scientific knowledgeMake interoperable ontologies, database linksDevelopers of tools & resourcesStandards for storing & sharing dataWeb interfaces for data analysis & sharingMany areas of expertiseLaboratory sciences – biology, chemistry, medicine, and many other disciplinesComputational science – bioinformatics, genomics, statisticsSoftware development & web designPhilosophy – ontology & logic
  • 9. Term confusion: synonyms7Do biologists use precise & consistent language?Mutually understood concepts – DNA, RNA, or proteinSynonym (one thing known by more than one name) – translation and protein synthesisEnzyme Commission reactionsStandardized id, official name & alternative nameshttp://www.expasy.ch/enzyme/2.7.1.40
  • 10. Term confusion: homonyms8Homonyms common in biology – different things known by the same nameSporulationVascular (plant vasculature, i.e. xylem & phloem, or vascular smooth muscle, i.e. blood vessels?)Endospore formation Bacillus anthracis“Sporulation”Reproductive sporulationAsci & ascospores, Morchellaelata(morel)http://www.microbelibrary.org/ASMOnly/details.asp?id=1426&Lang=©L Stauffer 2003 (accessed 17-Sep-09)http://en.wikipedia.org/wiki/File:Morelasci.jpg©PG Warner 2008 (accessed 17-Sep-09)
  • 11. Term confusion: homonyms and biological complexity 9AmiGO query “vascular”  51 termsIn biology, many related phenomena are described with similar terminology
  • 12. The problem of scale10Small data sets, small experiments & isolated scientific communities?
  • 16. Comparative genomics of multiple diverse taxa
  • 18. Millions of proteins to annotate
  • 20. How to draw meaningful comparisons?http://en.wikipedia.org/wiki/File:Microarray2.gif(accessed 17-Sep-09)
  • 21. The Gene Ontology (GO)11Way to address the problems of synonyms, homonyms, biological complexity, increasing glut of dataGO provides a common biological language for protein functional annotationwww.geneontology.org
  • 22. Controlled vocabulary (CV)12An official list of precisely defined terms that can be used to classify information and facilitate its retrievalThink of flat list like a thesaurus or catalog Benefits of CVsAllow standardized descriptions of thingsRemedy synonym & homonym issuesCan be cross-referenced externallyFacilitate electronic searchingA CV can be “…used to index and retrieve a body of literature in a bibliographic, factual, or other database. An example is the MeSH controlled vocabulary used in MEDLINE and other MEDLARS databases of the NLM.”http://www.nlm.nih.gov/nichsr/hta101/ta101014.html
  • 23. Ontology is a type of CV with defined relationships13Ontology – formalizes knowledge of a subject with precise textual definitionsNetworked terms where child more specific (“granular”) than parentLess specificGO terms describe biological attributes of gene products…More granular
  • 24. How GO works14GO Consortium develops & maintains:Ontologies and cross-links between ontologies and different resourcesTools to develop and use the ontologiesSourceForge tracker for developmentPeople studying organisms at databases annotate gene products with GO termsGroups share files of annotation data about their respective organismsBecause a common language was used to describe gene products and this information was shared amongst databases…We can search uniformly across databasesDo comparative genomics of diverse taxa
  • 26. The Gene Ontology Consortium16Collaboration began 1998 among model organism databases mouse (MGI), fruit fly (FlyBase) and baker’s yeast (SGD)Michael Ashburner of FlyBase contributed the base vocabularyToday > 20 members & associatesFirst publication 2000 (PMID 10802651)Today, PubMed query “gene ontology” yields 3,347 papers (27-Jun-2011)Organisms represented by GO annotations from every kingdom of lifeMany groups use GO in many different ways for their researchAmong eight OBO-Foundry ontologiesZFINReactomeIGS
  • 27. OBO Foundry ontologieswww.obofoundry.org17Collaboration among developers of science-based ontologiesEstablish principles for ontology developmentGoal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain.many others…
  • 28. What the GO is notGO comprises three ontologiesAnatomy & storage of GO termsOntology structureDetail of a term in AmiGOTrue path ruleGene Ontology:overview, terms & structure18
  • 29. Caveats – what GO is not19Not gene naming system or gene catalogGO describes attributes of biological objects – “oxidoreductase activity” not “cytochromec”The three ontologies have limitationsNo sequence attributes or structural featuresNo characteristics unique to mutants or diseaseNo environment, evolution or expressionNo anatomy features above cellular componentNot dictated standard or federated solutionDatabases share annotations as they see fitCurators evaluate differentlyGO is evolving as our knowledge evolvesNew terms added on daily basisIncorrect/poorly defined terms made obsoleteSecondary ids – terms with same meaning merged
  • 30. GO comprises three ontologies20Cellular component ontology (CC) “cytoplasm”Molecular function ontology (MF)“protein binding”“peptidase activity”“cysteine-type endopeptidase activity”Biological process ontology (BP)“proteolysis”“apoptosis”Terms describe attributes of gene products (GPs)Any protein or RNA encoded by a geneSpecies-independent context, e.g. “ribosome”Could describe GPs found in limited taxa, e.g. “photosynthesis” or “lactation”One GP can be associated with ≥ 1 CC, BP, MFExample: Caspase-6 from Bostaurus
  • 31. Cellular component ontology. Describes location at the level of subcellular structure & macromolecular complex. A GP is a subcomponent of or located in a particular cellular component, with some exceptions: no individual proteins or nucleic acids; no multicellular anatomical terms. For annotation purposes, a GP can be associated with or located in ≥ 1 cellular component. Multi-subunit enzyme or protein complex
  • 38. nuclear inner membrane. Molecular function ontology. Describes gene product activity at the molecular level; describes attributes of entities. Adenylate cyclase (E.C. 4.6.1.1) catalyzes a specific reaction: ATP = 3',5'-cyclic AMP + diphosphate. Described by the Gene Ontology term “adenylate cyclase activity” (GO:0004016). http://www.ebi.ac.uk/pdbsum/1ab8 [accessed 4-Feb-2010]. Usually a single GP, sometimes a complex
  • 40. Definition: “combining with ferritin, an iron-storing protein complex, to initiate a change in cell activity”
  • 47. “protein-DNA complex transmembrane transporter activity”
  • 48. “Fc-gamma receptor I complex binding”. Biological process ontology. Describes a recognized series of events or molecular functions with a defined beginning and end. “GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway” (from GO documentation). Mutant phenotypes often reflect disruptions in BP. Specific process
  • 50. “α-glucosidase transport”. General considerations: The Cell Cycle; The Development Node; Multi-Organism Process; Metabolism; Regulation; Detection of and Response to Stimuli; Sensory Perception; Signaling Pathways; Transport and Localization; Transporter activity (molecular function); Other Misc. Standard Defs. Broad process
  • 53. Anatomy of a GO term. Term name; goid (unique numerical identifier); synonyms (broad or narrow) for searching, alternative names, misspellings…; precise textual definition with reference stating source; GO slim; ontology placement.
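The fields listed above map naturally onto a small record. The following is a minimal sketch, assuming an OBO-style flat-file stanza; the `GOTerm` class and `parse_stanza` helper are hypothetical names, and the stanza reuses the amino acid transporter term and its two parents that appear later in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class GOTerm:
    """Hypothetical record holding the parts of a GO term named on the slide."""
    go_id: str = ""
    name: str = ""
    namespace: str = ""
    definition: str = ""
    synonyms: list = field(default_factory=list)
    parents: list = field(default_factory=list)  # targets of is_a lines


def parse_stanza(text):
    """Parse one OBO-style [Term] stanza into a GOTerm (simplified)."""
    term = GOTerm()
    for line in text.strip().splitlines():
        key, _, value = line.partition(": ")
        if key == "id":
            term.go_id = value
        elif key == "name":
            term.name = value
        elif key == "namespace":
            term.namespace = value
        elif key == "def":
            # The definition text sits between the first pair of double quotes.
            term.definition = value.split('"')[1]
        elif key == "synonym":
            term.synonyms.append(value.split('"')[1])
        elif key == "is_a":
            # Drop the human-readable comment after " ! ".
            term.parents.append(value.split(" ! ")[0])
    return term


STANZA = """
[Term]
id: GO:0015171
name: amino acid transmembrane transporter activity
namespace: molecular_function
def: "Catalysis of the transfer of amino acids from one side of a membrane to the other." [GOC:ai, GOC:mtg_transport]
is_a: GO:0005275 ! amine transmembrane transporter activity
is_a: GO:0046943 ! carboxylic acid transmembrane transporter activity
"""

term = parse_stanza(STANZA)
print(term.go_id, term.name, term.parents)
```

A real OBO file carries more tags (cross references, obsolescence flags, subsets), so a dedicated parser is the safer choice for production use.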
  • 54. Storage and cross referencing of GO terms. Storage in flat file (text)
  • 55. Database cross reference for mappings to GO
  • 56. GO term identical to object in other database. Ontology structure: parent-child relationship. Parent term (broader); child term (specialized). Example terms: hexose metabolism and monosaccharide biosynthesis (parents); hexose biosynthesis (child). Up in the tree is more general; down in the tree is more specific:
  • 58. Start with terms denoting broad functional categories
  • 59. Use a more specific term as knowledge warrants. Ontology structure: terms arranged in DAGs. GO terms are structured as hierarchical-like directed acyclic graphs (DAGs): tree-like, but each term can have more than one parent (pseudo-hierarchy). Each term may have one or more child terms (“siblings” share the same parent).
  • 60. GO has three term relationships. is_a: child is a subclass of parent (“A is_a B”); class-subclass relationship. part_of: child is part of parent (“C part_of D”); when C is present, it is part of D, but C is not always present; a nucleus is always part_of a cell, but not all cells have nuclei. regulates: child term regulates parent term. (Zoomed in view of biological process ontology depicted here.)
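One way to picture the DAG and its relationship types is as a list of typed edges. This is a small sketch under stated assumptions: term names stand in for GO IDs, the edge list is a hand-picked fragment rather than the real ontology, and `build_graph` and `ancestors` are illustrative helpers, not part of any GO tool.

```python
from collections import defaultdict

# (child, relationship, parent) triples; a tiny hand-made fragment for illustration.
EDGES = [
    ("hexose biosynthesis", "is_a", "hexose metabolism"),
    ("hexose biosynthesis", "is_a", "monosaccharide biosynthesis"),
    ("hexose metabolism", "is_a", "monosaccharide metabolism"),
    ("monosaccharide biosynthesis", "is_a", "monosaccharide metabolism"),
    ("nucleus", "part_of", "cell"),
]


def build_graph(edges):
    """Index edges by child so each term lists its (relationship, parent) pairs."""
    graph = defaultdict(list)
    for child, rel, parent in edges:
        graph[child].append((rel, parent))
    return graph


def ancestors(graph, term, follow=("is_a", "part_of")):
    """Collect every term reachable upward through the chosen relationship types."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for rel, parent in graph.get(current, []):
            if rel in follow and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


GRAPH = build_graph(EDGES)
print(sorted(ancestors(GRAPH, "hexose biosynthesis")))
print(sorted(ancestors(GRAPH, "nucleus", follow=("is_a",))))
```

Keeping the relationship type on each edge lets a caller decide which relationships to traverse; following only `is_a`, for instance, leaves "nucleus" with no ancestors in this fragment.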
  • 61. AmiGO for viewing terms. Open source HTML-based application developed by the GO Consortium. Interface for browsing, querying and visualizing OBO data. Users can search GO terms or annotations. Available via website or download for local install: http://amigo.geneontology.org. Example query with keyword “hemolysis” or goid GO:0019836.
  • 63. Term information in AmiGO. Webpage continues…
  • 64. AmiGO view continued. Several informative views (click). Number of gene products in the GO annotation collection annotated to that term or one of its child terms. Relationship between term and its parent. Our term is much further down…
  • 65. Graph view: alternative view of network of terms. A term with two parents. Generic amino acid: amine group, carboxylic acid group. Name: amino acid transmembrane transporter activity
  • 66. ID number: GO:0015171
  • 67. Definition: Catalysis of the transfer of amino acids from one side of a membrane to the other. Amino acids are organic molecules that contain an amino group and a carboxyl group. [source: GOC:ai, GOC:mtg_transport, ISBN:0815340729]
  • 68. parent term: amine transmembrane transporter activity (GO:0005275)
  • 70. parent term: carboxylic acid transmembrane transporter activity (GO:0046943)
  • 71. relationship to parent: “is_a”. Multiple paths to root: graphical view in OBO-Edit
  • 72. “True path rule”. The pathway from a term all the way up to its top-level parent(s) must always be true for any gene product that could be annotated to that term (“if true for the child, then true for the parent”). Incorrect for Bacteria: cell; organelle; mitochondrion; proton-transporting ATP synthase complex. Correct for Bacteria (and Eukaryotes): cell; intracellular; proton-transporting ATP synthase complex; plasma membrane proton-transporting ATP synthase complex; mitochondrial proton-transporting ATP synthase complex; membrane; plasma membrane; plasma membrane proton-transporting ATP synthase complex; organelle; mitochondrion; mitochondrial inner membrane; mitochondrial proton-transporting ATP synthase complex. (Abbreviated versions of the actual trees.)
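The true path rule can be checked mechanically: walk from a term to the root and confirm that no ancestor is impossible for the organism. The sketch below is illustrative only; the two parent maps are simplified guesses at the abbreviated trees on the slide, and the set of components absent from Bacteria is an assumption made for the example.

```python
# Flawed design: the generic complex hangs under "mitochondrion".
FLAWED = {
    "proton-transporting ATP synthase complex": ["mitochondrion"],
    "mitochondrion": ["organelle"],
    "organelle": ["cell"],
}

# Revised design: location-specific child terms, each with its own path upward.
REVISED = {
    "plasma membrane proton-transporting ATP synthase complex": [
        "proton-transporting ATP synthase complex", "plasma membrane"],
    "mitochondrial proton-transporting ATP synthase complex": [
        "proton-transporting ATP synthase complex", "mitochondrial inner membrane"],
    "proton-transporting ATP synthase complex": ["cell"],
    "plasma membrane": ["membrane"],
    "membrane": ["cell"],
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": ["organelle"],
    "organelle": ["cell"],
}

# Assumption for this example: components that a bacterial cell does not have.
ABSENT_IN_BACTERIA = {"mitochondrion", "mitochondrial inner membrane"}


def all_ancestors(parents, term):
    """Every term on any path from `term` up to the root."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def true_path_violations(parents, term, absent_in_taxon):
    """Ancestors that cannot be true for the organism; empty means the path holds."""
    return sorted(all_ancestors(parents, term) & absent_in_taxon)


print(true_path_violations(
    FLAWED, "proton-transporting ATP synthase complex", ABSENT_IN_BACTERIA))
print(true_path_violations(
    REVISED, "plasma membrane proton-transporting ATP synthase complex",
    ABSENT_IN_BACTERIA))
```

In the flawed layout, annotating a bacterial ATP synthase drags in "mitochondrion"; in the revised layout the plasma membrane child term has a clean path to the root.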
  • 73. Annotating with GO and Evidence. What is GO annotation?; literature curation at model organism databases; the annotation file; evidence (critical for annotation); sequence similarity-based annotation; annotation specificity.
  • 74. GO annotation overview. Associating a GO term with a gene product. Goal is to select GO terms from all three ontologies to represent what, where, and how. Linking a GO term to a gene product asserts that it has that attribute. For example, 6-phosphofructokinase: molecular function GO:0003872 6-phosphofructokinase activity; biological process GO:0006096 glycolysis; cellular component GO:0005737 cytoplasm. Annotation, whether based on literature or computational methods, always involves: learning something about a gene product; selecting an appropriate GO term; providing an appropriate evidence code; citing a [preferably open access] reference; entering information into the GO annotation file.
  • 75. Chaperone DnaK, one protein/multiple annotations. Molecular function: ATP binding (GO:0005524); ATPase activity (GO:0016887); unfolded protein binding (GO:0051082); misfolded protein binding (GO:0051787); denatured protein binding (GO:0031249). Biological process: protein folding (GO:0006457); protein refolding (GO:0042026); protein stabilization (GO:0050821); response to stress (GO:0006950). Cellular component: cytoplasm (GO:0005737).
  • 76. Literature curation performed at model organism databases. From the abstract:
  • 77. Results section indicates a “direct assay” annotation. They document the findings of a direct assay performed on purified protein. They further document the methods used, and evaluate the findings in the Discussion section…
  • 78. Query AmiGO with “DNA ligase” & “DNA ligation”. All “ligation” in biological process ontology
  • 79. Resulting annotations. Name: DNA ligase (stated in paper). Gene symbol: ligA (stated in paper). EC: 6.5.1.2 (queried enzyme for “DNA ligase”)
  • 80. Gene annotation file captures annotations. Evidence
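An annotation file of this kind is typically tab-delimited, one annotation per row. The sketch below assumes the GAF 2.x column order as it is commonly documented (worth verifying against the current specification); the example row is entirely made up apart from the GO ID and term for 6-phosphofructokinase activity quoted earlier in the talk.

```python
# Column names follow the GAF 2.x layout as commonly documented; treat as an assumption.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def parse_gaf_line(line):
    """Return a dict for one annotation row, or None for a '!' comment line."""
    if line.startswith("!"):
        return None
    fields = line.rstrip("\n").split("\t")
    # Older files carry fewer columns; pad so that every key is present.
    fields += [""] * (len(GAF_COLUMNS) - len(fields))
    return dict(zip(GAF_COLUMNS, fields))


# Hypothetical row: database, accession, symbol, reference, taxon and date are placeholders.
EXAMPLE = "\t".join([
    "ExampleDB", "GP0001", "exampleGene", "", "GO:0003872",
    "PMID:0000000", "IDA", "", "F",
    "6-phosphofructokinase", "", "protein",
    "taxon:0000", "20100204", "ExampleDB",
])

record = parse_gaf_line(EXAMPLE)
print(record["db_object_symbol"], record["go_id"], record["evidence_code"])
```

Each row ties together the three things the talk stresses: the gene product, the GO term, and the evidence code with its reference.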
  • 81. Evidence. Essential to base annotation on evidence: conclusions are more robust and traceable; with evidence, a GO annotation is standard operating procedure (SOP)-independent. Many types of evidence exist. For example, an experiment described in the literature: what method (e.g. direct assay, mutant phenotype, et cetera) was used? Did the author cite references? Did the author provide details of analyses? Perhaps you used a sequence-based method: what were the methods of manual curation? Give accession numbers of similar sequences; provide any references describing methods. Controlled vocabularies help here, too!
  • 82. GO standard references. GO_REF:0000011: A Hidden Markov Model (HMM) is a statistical representation of patterns found in a data set. When using HMMs with proteins, the HMM is a statistical model of the patterns of the amino acids found in a multiple alignment of a set of proteins called the "seed". Seed proteins are chosen based on sequence similarity to each other. Seed members can be chosen with different levels of relationship to each other. They can be members of a superfamily (ex. ABC transporter, ATP-binding proteins), they can all share the same exact specific function (ex. biotin synthase) or they could share another type of relationship of intermediate specificity (ex. subfamily, domain). New proteins can be scored against the model generated from the seed according to how closely the patterns of amino acids in the new proteins match those in the seed. There are two scores assigned to the HMM which allow annotators to judge how well any new protein scores to the model. Proteins scoring above the "trusted cutoff" score can be assumed to be part of the group defined by the seed. Proteins scoring below the "noise cutoff" score can be assumed to NOT be a part of the group. Proteins scoring between the trusted and noise cutoffs may be part of the group but may not. One of the important features of HMMs is that they are built from a multiple alignment of protein sequences, not a pairwise alignment.
This is significant, since shared similarity between many proteins is much more likely to indicate shared functional relationship than sequence similarity between just two proteins. The usefulness of an HMM is directly related to the amount of care that is taken in choosing the seed members, building a good multiple alignment of the seed members, assessing the level of specificity of the model, and choosing the cutoff scores correctly. In order to properly assess what functional relevance an above-trusted scoring HMM match has to a query, one must carefully determine what the functional scope of the HMM is. If the HMM models proteins that all share the same function then it is likely possible to assign a specific function to high-scoring match proteins based on the HMM. If the HMM models proteins that have a wide variety of functions, then it will not be possible to assign a specific function to the query based on the HMM match; however, depending on the nature of the HMM in question, it may be possible to assign a more general (family or subfamily level) function. In order to determine the functional scope of an HMM, one must carefully read the documentation associated with the HMM. The annotator must also consider whether the function attributed to the proteins in the HMM makes sense for the query based on what is known about the organism in which the query protein resides and in light of any other information that might be available about the query protein. After carefully considering all of these issues the annotator makes an annotation.
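The trusted and noise cutoff logic described in GO_REF:0000011 reduces to a three-way decision. This is a minimal sketch: the function name, the labels, and the example scores are hypothetical, and treating a score exactly equal to a cutoff as "uncertain" is an assumption, since the text only speaks of scores above and below the cutoffs.

```python
def classify_hmm_hit(score, trusted_cutoff, noise_cutoff):
    """Place an HMM score relative to the model's trusted and noise cutoffs."""
    if trusted_cutoff < noise_cutoff:
        raise ValueError("trusted cutoff should not be below the noise cutoff")
    if score > trusted_cutoff:
        return "member"        # can be assumed to belong to the seed-defined group
    if score < noise_cutoff:
        return "non-member"    # can be assumed NOT to belong to the group
    return "uncertain"         # between the cutoffs: needs closer manual review


# Hypothetical cutoffs and scores, for illustration only.
for score in (350.0, 200.0, 40.0):
    print(score, classify_hmm_hit(score, trusted_cutoff=300.0, noise_cutoff=150.0))
```

Even a "member" call only says the protein belongs to the group; how specific a function can be assigned still depends on the functional scope of the HMM, as the reference text explains.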
  • 83. GO evidence codes (www.geneontology.org/GO.evidence.shtml). EXP - inferred from experiment; IDA - inferred from direct assay; IEP - inferred from expression pattern; IGI - inferred from genetic interaction; IPI - inferred from physical interaction; IMP - inferred from mutant phenotype; ISS - inferred from sequence or structural similarity; ISA - inferred from sequence alignment; ISO - inferred from sequence orthology; ISM - inferred from sequence model; IGC - inferred from genomic context; ND - no biological data available; IC - inferred by curator; TAS - traceable author statement; NAS - non-traceable author statement; IEA - inferred from electronic annotation. GO codes are a subset of yet another ontology!
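Because evidence codes form a controlled vocabulary, they are easy to validate and filter on. The sketch below copies the code list from the slide; the `EXPERIMENTAL` grouping reflects the GO documentation's experimental category, and the annotation rows are illustrative tuples loosely based on the examples in this talk, not records from a real file.

```python
EVIDENCE_CODES = {
    "EXP": "inferred from experiment",
    "IDA": "inferred from direct assay",
    "IEP": "inferred from expression pattern",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "IMP": "inferred from mutant phenotype",
    "ISS": "inferred from sequence or structural similarity",
    "ISA": "inferred from sequence alignment",
    "ISO": "inferred from sequence orthology",
    "ISM": "inferred from sequence model",
    "IGC": "inferred from genomic context",
    "ND": "no biological data available",
    "IC": "inferred by curator",
    "TAS": "traceable author statement",
    "NAS": "non-traceable author statement",
    "IEA": "inferred from electronic annotation",
}

# Codes that rest directly on an experiment.
EXPERIMENTAL = {"EXP", "IDA", "IEP", "IGI", "IPI", "IMP"}


def keep_experimental(annotations):
    """Keep (gene, term, code) rows backed by experimental evidence; reject unknown codes."""
    for _, _, code in annotations:
        if code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code}")
    return [row for row in annotations if row[2] in EXPERIMENTAL]


# Illustrative rows only.
ROWS = [
    ("ligA", "DNA ligation", "IDA"),
    ("tpiA", "glycolysis", "ISS"),
    ("tpiA", "cytoplasm", "IC"),
]
print(keep_experimental(ROWS))
```

Filtering this way is a common first step when an analysis should rest only on experimentally supported annotations.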
  • 84. Types of sequence similarity-based annotations. Find similarity between a gene product & one that is experimentally characterized: BLAST-type alignments; shared synteny to establish orthology of genomic regions between species. Find similarity between a gene product and a defined protein family: HMMs (Pfam, TIGRFAMS); Prosite; InterPro. Find motifs in a gene product with prediction tools: TMHMM; SignalP. Much (most?) of the information you find is based on transitive annotation, and much of it has never been looked at by a human being!
  • 85. Evaluation of sequence similarity-based information. Visually inspect alignments & criteria: length & identity; conservation of catalytic sites. Check HMM scores with respect to cutoff. Look at available metabolic analysis: pathways, complexes? Information from neighboring genes: a gene in an operon (common in prokaryotes) can supplement weak similarity evidence. Sequence characteristics: transmembrane regions? Signal peptide? Known motifs that give a clue to function? Paralogous family member
  • 86. An example: HI0678, a protein from H. influenzae… high quality alignment to experimentally characterized triosephosphate isomerase from Vibrio marinus
  • 87. Information from Swiss-Prot database on experimentally characterized match protein (further down the page)
  • 88. High quality: full-length match, high percent identity (67.8%), conserved active and binding sites (boxed in red).
  • 89. Resulting annotations. Name: triosephosphate isomerase. Gene symbol: tpiA. EC: 5.3.1.1. (This, and the following annotations, came from the match protein.)
  • 90. KEGG pathway for glycolysis core
  • 91. KEGG pathway for glycolysis core
  • 93. And another annotation. The biologist knows that glycolysis takes place in the cytoplasm in bacteria, and so infers a cytoplasmic location for that protein (“inferred by curator” evidence code).
  • 94. Annotation specificity should reflect knowledge. GO trees (very abbreviated). Function: catalytic activity > kinase activity > carbohydrate kinase activity > ribokinase activity, glucokinase activity, fructokinase activity. Process: metabolism > carbohydrate metabolism > monosaccharide metabolism > hexose metabolism (> glucose metabolism, fructose metabolism) and pentose metabolism (> ribose metabolism). Available evidence for three genes. #1: good match to an HMM for “kinase”. #2: good match to an HMM for “kinase”; a high-quality BER match to an experimentally characterized “glucokinase” AND a “fructokinase”. #3: good match to an HMM specific for “ribokinase”; a high-quality BER match to an experimentally characterized ribokinase.
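The three-gene example suggests a simple rule of thumb: annotate to the most specific term that all of the evidence supports. The sketch below is one hedged reading of that idea, using the abbreviated function tree from the slide; `choose_term` is a hypothetical helper, and real curation weighs evidence quality in ways this does not capture.

```python
# Single-parent view of the abbreviated molecular function tree on the slide.
PARENT = {
    "kinase activity": "catalytic activity",
    "carbohydrate kinase activity": "kinase activity",
    "ribokinase activity": "carbohydrate kinase activity",
    "glucokinase activity": "carbohydrate kinase activity",
    "fructokinase activity": "carbohydrate kinase activity",
}


def path_to_root(term):
    """The term followed by its ancestors, most specific first."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def choose_term(evidence_terms):
    """Most specific term consistent with every piece of evidence."""
    paths = {term: path_to_root(term) for term in evidence_terms}
    # A hit that is an ancestor of another hit adds no conflict, so set it aside.
    leaves = [t for t in paths if not any(t in paths[o][1:] for o in paths)]
    shared = set(paths[leaves[0]]).intersection(*(paths[t] for t in leaves[1:]))
    # Walking up from the first leaf, the first shared term is the deepest one.
    return next(t for t in paths[leaves[0]] if t in shared)


print(choose_term(["kinase activity"]))
print(choose_term(["kinase activity", "glucokinase activity", "fructokinase activity"]))
print(choose_term(["ribokinase activity", "ribokinase activity"]))
```

Gene #2 is the instructive case: the glucokinase and fructokinase matches disagree at the most specific level, so the annotation backs off to their shared parent rather than picking one.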
  • 95. Using annotation to facilitate your research. Using shared annotations; search for GO terms at databases; slims for broad classification; GO tools; working with GO-limited data sets; summary.
  • 96. Sharing annotations. Annotation file sent to GO, put in repository. All these data are free to anyone. Hundreds of thousands of GP annotations. Annotation files are all in the same format, which facilitates easy use of the data by everyone. Most of your favorite organism databases use these annotation files.
  • 97. Searching for GO terms at EuPathDB
  • 98. Ontology slim (www.geneontology.org/GO.slims.shtml). A slim is a distilled (reduced) ontology: made by manually pruning low-level terms with an ontology editor; selected high-level terms remain. Slims reduce ontology complexity. Reduce clutter & see general trends: microarray experiments; comparative whole genome analyses. Remove irrelevant terms: looking at specific taxa, such as yeast or plant. GO offers a script to bin more granular annotations up to higher levels.
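Binning granular annotations up to a slim amounts to asking, for each annotated term, which slim terms lie on its paths to the root. The following sketch is a simplification of that idea and not the GO script itself; the parent map is heavily abbreviated, and the slim set and gene names are hypothetical.

```python
from collections import Counter

# Abbreviated parent map (term names instead of IDs, intermediate terms omitted).
PARENTS = {
    "glucose metabolism": ["hexose metabolism"],
    "hexose metabolism": ["monosaccharide metabolism"],
    "ribose metabolism": ["pentose metabolism"],
    "pentose metabolism": ["monosaccharide metabolism"],
    "monosaccharide metabolism": ["carbohydrate metabolism"],
    "carbohydrate metabolism": ["metabolism"],
    "zinc ion transport": ["transport"],
}

# Hypothetical slim: the high-level terms we choose to keep.
SLIM = {"carbohydrate metabolism", "transport"}

# Hypothetical gene-to-term annotations.
ANNOTATIONS = [
    ("gene1", "glucose metabolism"),
    ("gene2", "ribose metabolism"),
    ("gene3", "zinc ion transport"),
]


def slim_terms_for(term, parents, slim):
    """Slim terms found among the term itself and all of its ancestors."""
    hits, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            hits.add(current)
        stack.extend(parents.get(current, []))
    return hits


counts = Counter()
for gene, term in ANNOTATIONS:
    for slim_term in slim_terms_for(term, PARENTS, SLIM):
        counts[slim_term] += 1

print(dict(counts))
```

Counting genes per slim term in this way gives the kind of broad summary used to compare genomes or summarize an expression experiment.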
  • 99. Comparing genomes with a GO slim. High-level biological process terms used to compare Plasmodium and Saccharomyces. MJ Gardner, et al. (2002) Nature 419:498-511
  • 100. GO slim: manual/orthology-based gene annotations. Nucleic Acids Res. 2010 January; 38(Database issue): D420–D427.
  • 101. GO tools (www.geneontology.org/GO.tools.shtml). The real challenge is finding the right one for your needs. For example, statistical representation of GO terms: http://go.princeton.edu/cgi-bin/GOTermFinder
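A common way to assess the statistical representation of a GO term in a gene list is a one-sided hypergeometric test. The sketch below shows that calculation with the standard library only; it is an illustration of the general technique under that assumption, not a description of how any particular tool is implemented, and it omits the multiple-testing correction a real analysis would need. The example numbers are made up.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, sample_annotated):
    """P(X >= sample_annotated) when drawing `sample` genes without replacement.

    population       -- genes in the background set
    annotated        -- background genes annotated to the GO term
    sample           -- genes in the list of interest
    sample_annotated -- genes in that list annotated to the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(sample_annotated, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 40 of 5000 background genes carry the term,
# and 8 of the 100 genes of interest do.
p = enrichment_p_value(population=5000, annotated=40, sample=100, sample_annotated=8)
print(f"p = {p:.3g}")
```

A small p-value suggests the term is over-represented in the list relative to the background; with many terms tested at once, the raw values should be adjusted before drawing conclusions.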
  • 102. GO & analysis of RNA-seq data. Young et al. Genome Biology 2010, 11:R14, http://genomebiology.com/2010/11/2/R14. “We present GOseq, an application for performing Gene Ontology (GO) analysis on RNA-seq data. GO analysis is widely used to reduce complexity and highlight biological processes in genome-wide expression studies, but standard methods give biased results on RNA-seq data due to over-detection of differential expression for long and highly expressed transcripts. Application of GOseq to a prostate cancer data set shows that GOseq dramatically changes the results, highlighting categories more consistent with the known biology.”
  • 103. When GO is limited. Food for thought: what happens when we have limited GO (or other) annotation data? New and interesting genomes often see this problem.
  • 104. Comparative analysis of orthologs in syntenic blocks. The more genomes we have at our disposal, the better. Structural rearrangements, absence of intron, gene duplication, intron structure, gene deletion/creation. Nucleic Acids Res. 2010 January; 38(Database issue): D420–D427.
  • 105. Summary: GO analyses. GO remedies problems of synonyms & homonyms in biological nomenclature: queries are based on IDs linked to precise definitions, not less reliable text-matching. GO can help you to: find all genes that share a particular function regardless of sequence; do comparisons across any species annotated with GO; summarize major classes of genes in a newly sequenced genome; characterize expressed genes in a study; drive hypotheses to test in the laboratory. GO is not a panacea, but it should be a valuable tool in your genomics toolbox.
  • 106. The title slide revisited… Ontology, Evidence, Annotation. Arabidopsis thaliana ATPase HMA4 zinc binding domain. GO:0006829: zinc ion transport (BP); GO:0005886: plasma membrane (CC); GO:0005515: protein binding (MF). Thank you.

Editor's Notes

  • #41: In this report, we describe the cloning and expression of a Deinococcus radiodurans DNA ligase in Escherichia coli. This enzyme efficiently catalyses DNA ligation in the presence of Mn(II) and NAD+ as cofactors…