UNIVERSITY OF CALIFORNIA

Web Apollo: lessons learned from
community-based biocuration efforts.
Monica Munoz-Torres, PhD
Biocurator & Bioinformatics Analyst | @monimunozto
Genomics Division, Lawrence Berkeley National Laboratory

31 October, 2013
Outline
1. What is Web Apollo?
• Definition & working concept.
2. Community-based curation from our experience:
• Scope, stories to highlight.
3. Lessons Learned.
4. The power behind community-based curation of biological data:
• International group efforts to create, collect, maintain, and use curation tools.

What is Web Apollo?
• Web Apollo is a web-based genomic annotation editing platform.
We need annotation editing tools to modify and refine the
precise location and structure of the genome elements that
predictive algorithms cannot yet resolve automatically.

Find out more about Web Apollo at http://GenomeArchitect.org and in Genome Biol. 14:R93 (2013).

1. What is Web Apollo?

A little history about Apollo:
With Apollo, biologists could finally visualize computational analyses and experimental evidence for genomic features and build manually curated consensus gene structures. Apollo became a very popular open-source tool, used for insects, fish, mammals, birds, and more.
a. Desktop: one person at a time edited a specific region, and annotations were saved in local files, which slowed down collaboration.
b. Java Web Start: users saved annotations directly to a centralized database, but potential issues with stale annotation data remained.
Web Apollo
• Browser-based; a plugin for JBrowse.
• Allows intuitive annotation creation and editing, with gestures and pull-down menus to create transcripts, add/delete/resize exons, merge/split exons or transcripts, insert comments (controlled vocabulary [CV] or free-form text), etc.
• Edits in one client are instantly
pushed to all other clients.
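To make the editing operations and instant client updates above concrete, here is a minimal Python sketch. It is not Web Apollo's implementation or API: the Transcript and AnnotationSession classes, the 0-based half-open coordinates, and the callback-based push are hypothetical stand-ins chosen for illustration (a real browser client would receive updates over a network channel, not through a Python callback).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Exon = Tuple[int, int]  # assumption: 0-based, half-open [start, end) coordinates
Listener = Callable[[str, List[Exon]], None]  # hypothetical client callback


@dataclass
class Transcript:
    name: str
    exons: List[Exon] = field(default_factory=list)

    def _commit(self, candidate: List[Exon]) -> None:
        """Validate a candidate exon list; replace the stored one only if it is valid."""
        candidate = sorted(candidate)
        for start, end in candidate:
            if start >= end:
                raise ValueError(f"empty exon ({start}, {end})")
        for left, right in zip(candidate, candidate[1:]):
            if left[1] > right[0]:
                raise ValueError(f"overlapping exons {left} and {right}")
        self.exons = candidate

    def add_exon(self, start: int, end: int) -> None:
        self._commit(self.exons + [(start, end)])

    def delete_exon(self, i: int) -> None:
        self._commit(self.exons[:i] + self.exons[i + 1:])

    def resize_exon(self, i: int, start: int, end: int) -> None:
        self._commit(self.exons[:i] + [(start, end)] + self.exons[i + 1:])

    def merge_exons(self, i: int) -> None:
        # Join exon i and exon i + 1, absorbing the intron between them.
        (start, _), (_, end) = self.exons[i], self.exons[i + 1]
        self._commit(self.exons[:i] + [(start, end)] + self.exons[i + 2:])

    def split_exon(self, i: int, intron_start: int, intron_end: int) -> None:
        # Carve a new intron [intron_start, intron_end) out of exon i.
        start, end = self.exons[i]
        if not start < intron_start < intron_end < end:
            raise ValueError("new intron must lie strictly inside the exon")
        pieces = [(start, intron_start), (intron_end, end)]
        self._commit(self.exons[:i] + pieces + self.exons[i + 1:])


class AnnotationSession:
    """One authoritative copy of the annotations, shared by all connected clients."""

    def __init__(self) -> None:
        self.transcripts: Dict[str, Transcript] = {}
        self.clients: Dict[str, Listener] = {}

    def connect(self, client_id: str, on_update: Listener) -> None:
        self.clients[client_id] = on_update

    def edit(self, client_id: str, name: str, op: str, *args: int) -> None:
        transcript = self.transcripts.setdefault(name, Transcript(name))
        getattr(transcript, op)(*args)  # e.g. op="add_exon"; invalid edits raise here
        for other_id, on_update in self.clients.items():
            if other_id != client_id:  # the editing client already shows its own change
                on_update(name, list(transcript.exons))


session = AnnotationSession()
session.connect("alice", lambda name, exons: None)
session.connect("bob", lambda name, exons: print("bob sees", name, exons))
session.edit("alice", "gene1-RA", "add_exon", 100, 200)
session.edit("alice", "gene1-RA", "add_exon", 300, 400)
session.edit("alice", "gene1-RA", "merge_exons", 0)
```

Because each edit is validated before the stored exon list is replaced, a rejected edit (for example, an exon overlapping its neighbor) is never pushed to the other clients.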

Our Working Concept

In the context of manual gene annotation, curation seeks to find the best examples and/or eliminate (most) errors.
To conduct manual annotation efforts, we combine:
• Automated gene models.
• Evidence: cDNAs, HMM domain searches, and alignments with assemblies or genes from other species.
These inputs feed into manual annotation & curation.

2. In our experience.

Gather and evaluate all available evidence
using quality-control metrics to
corroborate or modify automated
annotation predictions.
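One way to make "quality-control metrics" concrete is to score a gene model against aligned evidence. The sketch below computes two illustrative metrics, exon coverage by evidence and exact intron (splice-junction) support. The metric names, the 0.8 coverage threshold, the review rule, and the coordinate convention are all assumptions for illustration, not the criteria used in these projects.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # assumption: 0-based, half-open [start, end) on one scaffold/strand
Model = List[Block]      # a transcript represented by its non-overlapping exon blocks


def merge(blocks: List[Block]) -> List[Block]:
    """Collapse overlapping or touching blocks so bases are not double-counted."""
    merged: List[Block] = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def introns(exons: Model) -> List[Block]:
    ordered = sorted(exons)
    return [(left[1], right[0]) for left, right in zip(ordered, ordered[1:])]


def exon_coverage(model: Model, evidence: List[Model]) -> float:
    """Fraction of the model's exonic bases overlapped by any evidence block."""
    blocks = merge([b for transcript in evidence for b in transcript])
    total = sum(end - start for start, end in model)
    covered = sum(
        max(0, min(end, e) - max(start, s))
        for start, end in model
        for s, e in blocks
    )
    return covered / total if total else 0.0


def intron_support(model: Model, evidence: List[Model]) -> float:
    """Fraction of model introns whose exact boundaries occur in some evidence alignment."""
    wanted = introns(model)
    if not wanted:
        return 1.0  # single-exon model: no splice junctions to check
    observed = {i for transcript in evidence for i in introns(transcript)}
    return sum(i in observed for i in wanted) / len(wanted)


def needs_review(model: Model, evidence: List[Model], min_coverage: float = 0.8) -> bool:
    # Hypothetical rule: flag weak exon coverage or any unsupported intron.
    return exon_coverage(model, evidence) < min_coverage or intron_support(model, evidence) < 1.0


model = [(100, 200), (300, 400)]
cdna = [[(90, 200), (300, 380)]]             # agrees with the model's splice junction
shifted = [[(100, 210), (300, 400)]]         # evidence puts the donor site 10 bp downstream
print(exon_coverage(model, cdna), intron_support(model, cdna), needs_review(model, cdna))
print(exon_coverage(model, shifted), intron_support(model, shifted), needs_review(model, shifted))
```

In the second case every exonic base is covered, yet the intron boundary disagrees with the evidence, which is exactly the kind of model a curator would want to inspect by hand.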
Perform sequence similarity searches within a phylogenetic framework, and use literature and public databases to:
• Predict functional assignments from
experimental data.
• Distinguish orthologs from
paralogs, classify gene membership in
families and networks.
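Distinguishing orthologs from paralogs ultimately relies on phylogenetic analysis, but a common first-pass screen is the reciprocal best hit (RBH) criterion: gene a from species A and gene b from species B are called putative orthologs when each is the other's top-scoring similarity hit. The sketch below assumes pairwise similarity scores (for example, BLAST bit scores) have already been computed; it is not the procedure used in these projects, and the gene names and scores are made up.

```python
from typing import Dict, Set, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical scores: A1 and A2 are similar genes in species A; B1 and B2 are in species B.
a_vs_b = {("A1", "B1"): 500, ("A1", "B2"): 450, ("A2", "B1"): 320, ("A2", "B2"): 300}
b_vs_a = {("B1", "A1"): 495, ("B1", "A2"): 310, ("B2", "A1"): 460, ("B2", "A2"): 290}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy data, A2's best hit (B1) is already paired with A1, so A2 gets no reciprocal partner; that is the pattern one would expect if A2 were a paralog of A1, something the phylogenetic analysis would then need to confirm.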
Dispersed, community-based manual gene annotation efforts.
Using Web Apollo, we* trained geographically dispersed scientific communities to perform biologically supported manual annotations and monitored their findings. These communities spanned ~80 institutions and 14 countries, with hundreds of scientists and gatekeepers.
Education and training were delivered through:
– Training workshops and geneborees.
– Tutorials with detailed instructions.
– Personalized user support.
*Elsik Lab, Hymenoptera Genome Database, Georgetown University.

What did we learn?
By harvesting expertise from dispersed researchers who assigned functions to predicted and curated peptides, we developed more interactive and responsive tools, as well as better visualization, editing, and analysis capabilities.
Assessment:
1. Was it helpful and productive to work together?
2. Were manual annotations improved?
3. Did the shared and distributed annotation effort help
improve the quality of scientific findings?
3. Lessons Learned

It was helpful to work together.
Scientific community efforts brought together domain-specific and natural history expertise that would otherwise have remained disconnected.
Warning: frequent, periodic community-wide updates are necessary, as is coordination in organizing and reporting (publishing) findings.

Automated annotations were improved
In many cases, automated annotations were improved. We also learned about the challenges posed by new sequencing technologies, e.g.:
– Frameshifts and indel errors
– Split genes across scaffolds
– Highly repetitive sequences
To face these challenges, we trained annotators in
recovering coding sequences in agreement with all
available biological evidence.
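Frameshifts, indels, and premature stops often surface as a coding sequence that no longer reads cleanly. A quick screen an annotator or a script might run on a spliced CDS is sketched below. It assumes the standard genetic code and a CDS already extracted on the coding strand, 5' to 3'; it is only a screen (passing it does not prove a model correct), and for genes split across scaffolds the CDS pieces would have to be joined first.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def cds_problems(cds: str) -> List[str]:
    """Screen a spliced CDS (coding strand, 5'->3') for common signs of a broken model."""
    seq = cds.upper()
    problems: List[str] = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3 (possible frameshift or indel)")
    # Read only complete codons in frame 0.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon index {internal}")
    return problems


print(cds_problems("ATGAAATAA"))     # a clean open reading frame
print(cds_problems("ATGAAAGTAA"))    # one inserted base shifts the frame
print(cds_problems("ATGTGAAAATAA"))  # premature stop codon
```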

Horizontal gene transfer from virus to
bacteria to insect.

Scientific findings lead to untold stories…

The genome of the jewel wasp (Nasonia vitripennis) contains the highest number of ankyrin (ANK) domains reported in insects. Thirteen ANK repeat-bearing proteins also contain C-terminal PRANC (Pox protein repeats of ankyrin, C-terminal) domains.
Lateral gene transfer (LGT) between bacteria and animals is a source of evolutionary innovation.

Nasonia Genome Working Group. 2010. Science 327:343-347.

Characterizing adaptive radiation.

… and lead to uncovering old stories…

Following up on candidates suspected in 30-year-old research, the Heliconius melpomene community characterized the few Mendelian loci that control divergence in wing patterns, along with the islands of genome divergence underlying adaptive radiation.
The community developed a de novo reference genome on a small budget, and its findings were a milestone for understanding the basics of ecological adaptation.
Heliconius Genome Consortium. 2012. Nature, doi:10.1038/nature11041.

Understanding the evolution of sociality.
… and groups of communities told us even more!

Comparison of the genomes of seven ant species contributed to a better understanding of the evolution and organization of insect societies at the molecular level.
Insights were drawn mainly from six core aspects of ant biology:
1. Alternative morphological castes
2. Division of labor
3. Chemical communication
4. Alternative social organization
5. Social immunity
6. Mutualism

Libbrecht et al. 2013. Genome Biology 14:212.

More lessons learned
1. Next-generation sequencing technologies brought many new challenges.
2. You must enforce strict rules and formats; they are necessary to maintain consistency.
3. Be flexible and adaptable: study and incorporate new data, and adapt to support new platforms to keep pace and maintain the interest of the scientific community. Evolve with the data!
4. A little training goes a long way! With the right
tools, wet lab scientists make exceptional curators
who can easily learn to maximize the generation of
accurate, biologically supported gene models.
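As one example of enforcing strict rules and formats, annotations exchanged as GFF3 can be checked line by line before they are accepted. This is a minimal sketch that assumes GFF3 as the exchange format; it checks only a subset of the GFF3 specification (nine tab-separated columns, 1-based start <= end, allowed score, strand, and phase values, tag=value attributes) and is no substitute for a full validator.

```python
from typing import List

VALID_STRANDS = {"+", "-", ".", "?"}


def validate_gff3_line(line: str) -> List[str]:
    """Return the rule violations for one GFF3 feature line (an empty list means OK)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    seqid, _source, ftype, start, end, score, strand, phase, attrs = cols
    errors: List[str] = []
    if not seqid or seqid == ".":
        errors.append("missing seqid")
    if not (start.isdecimal() and end.isdecimal()):
        errors.append("start/end must be positive integers")
    elif not 1 <= int(start) <= int(end):
        errors.append("require 1 <= start <= end")
    if score != ".":
        try:
            float(score)
        except ValueError:
            errors.append(f"score {score!r} is neither '.' nor a number")
    if strand not in VALID_STRANDS:
        errors.append(f"invalid strand {strand!r}")
    if ftype == "CDS" and phase not in {"0", "1", "2"}:
        errors.append("CDS features require phase 0, 1 or 2")
    elif ftype != "CDS" and phase not in {".", "0", "1", "2"}:
        errors.append(f"invalid phase {phase!r}")
    if attrs != ".":
        for pair in filter(None, attrs.split(";")):  # tolerate a trailing ';'
            if "=" not in pair:
                errors.append(f"attribute {pair!r} is not tag=value")
    return errors


print(validate_gff3_line("ctg1\texample\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=abc"))
print(validate_gff3_line("ctg1\texample\tCDS\t100\t200\t.\t+\t.\tParent=mRNA1"))
```

Returning every violation at once, rather than stopping at the first, lets a curator fix a rejected line in a single pass.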
i5K
Collaborative efforts for genome sequencing projects led to the birth of i5K.
Small groups became large networks, which grew into an international, multi-institutional effort to sequence the genomes of 5,000 arthropods.
Up next: four species ready for community manual
annotation using Web Apollo.

4. The power of biocuration

Gene Ontology Consortium
This community-based effort seeks to standardize the
representation of gene and gene product attributes
across species and databases.

Collaborative efforts are focused on:
1. Capturing all available information to describe gene products (biological process, BP; cellular component, CC; molecular function, MF) in a species-independent manner.
2. Reviewing and updating the relationships between ontology terms.
3. Continuously developing tools to facilitate
creation, maintenance and use of ontologies.
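Keeping relationships up to date matters because GO annotations are interpreted through the ontology graph: under GO's true path rule, a gene annotated to a term is implicitly annotated to all of that term's ancestors. The sketch below shows that propagation over is_a edges only, using a toy hierarchy whose term names are placeholders rather than real GO terms (apart from the root name biological_process); real GO also uses part_of, regulates, and other relations.

```python
from typing import Dict, List, Set

# Toy is_a hierarchy (child -> parents). Term names are hypothetical placeholders.
IS_A: Dict[str, List[str]] = {
    "biological_process": [],
    "process_A": ["biological_process"],
    "process_B": ["biological_process"],
    "process_C": ["process_A", "process_B"],  # a DAG: one term may have several parents
}


def ancestors(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following is_a edges upward, excluding the term itself."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def propagate(annotations: Dict[str, Set[str]], is_a: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Expand each gene's direct annotations with every ancestor term."""
    return {
        gene: terms | set().union(*(ancestors(t, is_a) for t in terms))
        for gene, terms in annotations.items()
    }


print(propagate({"geneX": {"process_C"}}, IS_A))
```

If a curator later changes an edge (say, removing process_C's link to process_B), every gene annotated to process_C silently gains or loses inferred terms, which is why relationship review is an ongoing task.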
Integrating biocuration practices
Frequently, groups incorporate both literature curation and manual gene annotation into their work to improve the quality and informative power of their data.

E.g., dictyBase, which has curated the literature for the slime mold Dictyostelium since 2004, serves researchers interested in cellular development, chemotaxis, cytokinesis defects, etc. dictyBase is new to Web Apollo and ready to harness its community's knowledge.

International Society for Biocuration
ISB provides a forum for
biocurators, developers, researchers, and students
who are interested in:
• Using common tools (GO, genotype & phenotype curation, GMOD tools, etc.).
• Generating gold standards for databases, ensuring that databases and tools meet specific user needs.
• Defining the profession of biocuration with respect to the scientific community and the granting agencies.
ISB is a member of BioDBCore, GOBLET, and BioCreative.
The power behind community-based curation of biological data.

Thanks!
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna Lewis (PI).
• The team at Hymenoptera Genome Database. §U. of Missouri. Christine G. Elsik (PI).
• Ian Holmes Lab (PI). *U. of California Berkeley.
• Arthropod genomics community (e.g. Gene Robinson, Juergen Gadau, Chris R Smith, Owen McMillan, Owain Edwards, Kevin Hackett, and a few hundred more) and i5K: Org. Committee, NAL (USDA), HGSC-BCM, BGI, 1KITE.
• International Society for Biocuration.
• Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Images used with permission: AlexanderWild.com

Web Apollo team: Gregg Helt, Ed Lee, Justin Reese §, Chris Childers §, Rob Buels *, Mitch Skinner *
Gene Ontology team: Chris Mungall, Seth Carbon, Heiko Dietze

Web Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K
ISB: http://biocurator.org

Thank you for your attention!
