Toward Meaningful Whole-Genome
   Interpretation with Open Access Tools
   From the Genome Commons
   BioIT World Expo
   2010-04-22

   Reece Hart, Ph.D.
   Chief Scientist, Genome Commons
   QB3 / Center for Computational Biology
   UC Berkeley
   reece@berkeley.edu
                                            1
2010-04-22 11:43
What did we learn from their genomes?




             Not much.
                                        2
Can we agree to disagree? Probably not.




    Heart Attack Risk Prediction
    from Experimental Man, DE Duncan
    Gene              Marker       Risk Allele   Genotype   Risk   Company
    CELSR2/PSRC1      rs599839     G             AG         0.86   deCODEme
    CDKN2A/CDKN2B?    rs10116277   T             GT         1      deCODEme
    CDKN2A/CDKN2B?    rs1333049    C             CC         1.72   Navigenics
    MTHFD1L           rs6922269    A             AA         1.53   Navigenics
    CDKN2A/CDKN2B?    rs2383207    G             GG         1.22   23andMe
                                                                                3
Trouble for direct-to-consumer testing.




                      http://blog.navigenics.com/articles/comments/an_open_letter_to_nature/   4
There's lots of good news, too.

➢   Disease diagnosis & prognosis

➢   Drug dosing and side effects

➢   Disease variant/gene identification

➢   Technological advances




                                           5
The Genome Commons seeks to build
open access, open source tools that
maximize the predictive, preventative,
and personalized value of genomic data.

  ●   Technical – organize data and streamline
       tools
  ●   Scientific – improve predictive accuracy
  ●   Clinical – engage clinicians and counselors
  ●   ELSI – address ineluctable ethical, legal, and
       social dilemmas

                                                       6
Collect data
in one place.


                7
Database isolation impedes effective use.

   Data are studied, compiled, and stored gene-wise.
   That makes sense for collection, but not for genome-wide use.



                                                                        OMIM
                                                                                        GeneTests/
                                                                                       GeneReviews
                        935 genes
                                                                        LSDBs
1177 Locus-Specific Databases                                                            NHGRI GWAS
Source: http://www.hgvs.org/dblist/glsdb.html on Oct 15.
Some genes have multiple LSDBs.



                                                                                          PharmGKB

                                                           Literature
                                                           Literature
                                                                               dbSNP


                                                                                                      8
GCdb will be a repository of variants and traits.
 [Diagram: source databases (OMIM, dbSNP, LSDBs, GeneTests, PharmGKB, ...) on one side; the Genome Commons Database (variants, phenotypes) in the center; GO, ICD-10, and UMLS on the other side.]

 Automated bulk loading of structured data
 Curated, high-quality, and traceable association data

 ➢   Genotypes in standard coordinates
 ➢   Phenotype ontologies
 ➢   Associations with likelihood, confidence, evidence, and severity

 ➢   Up-to-date
 ➢   Quality-controlled
 ➢   Open access
 ➢   Based on Unison
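
To make the planned record shape concrete, here is a minimal sketch of an association carrying the four attributes named above (likelihood, confidence, evidence, severity). The slides do not show a GCdb schema, so every class name, field name, and value range below is an assumption for illustration, and the example record is a placeholder rather than real data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Variant:
    """A variant in standard genomic coordinates (field names are assumptions)."""
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


@dataclass(frozen=True)
class Association:
    """One variant-phenotype link with the four attributes listed on the slide."""
    variant: Variant
    phenotype_id: str            # term from a phenotype ontology
    likelihood: float            # assumed here to be a probability in [0, 1]
    confidence: float            # assumed here to be a score in [0, 1]
    evidence: Tuple[str, ...]    # traceable sources, e.g. citations
    severity: str

    def __post_init__(self) -> None:
        # Reject out-of-range scores early so bad records never get stored.
        for name in ("likelihood", "confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


# Placeholder values only; not a real association.
example = Association(
    variant=Variant("chr1", 12345, "A", "G"),
    phenotype_id="EXAMPLE:0001",
    likelihood=0.5,
    confidence=0.9,
    evidence=("placeholder citation",),
    severity="unknown",
)
print(example.variant, example.phenotype_id)
```

Keeping the evidence as an explicit tuple of sources is one way to honor the "traceable" goal: each stored association can point back to where it came from.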
                                                                              9
Make genomic data
usable and useful.


                     10
The Navigator will integrate data and tools.


  Inputs
  ●   Genotypes (e.g., by hybridization)
  ●   Whole Genome/Exome Sequences

  Genome Commons Navigator
  ●   Imputer – infer variants in LD with typed markers
  ●   Remapper – align variants to specified genome
  ●   Assembler/Aligner and Variant Caller – assemble genome sequence and call variants (separately or jointly)
  ●   Variants – phased, aligned variants, from genotyping, imputation, or sequencing
  ●   Annotator – identify variants with known phenotypic impact
  ●   Impact Predictor – infer effect of unclassified genetic variants
  ●   Variant Annotation Integrator – integrate and reconcile all classified variants into a comprehensive report
  ●   Facile user interfaces for basic research, clinical application, drug development, epidemiology, and other uses

  Underlying resources
  ●   External Data and Tools
  ●   Genome Commons Database
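
As a rough illustration of how the Annotator, Impact Predictor, and Variant Annotation Integrator stages might be chained, the sketch below treats each stage as a function from a list of variants to a list of variants. This is not the Navigator's actual design: the function names, the dictionary-based variant records, and the lookup table standing in for the Genome Commons Database are all hypothetical, and the prediction step is a stub.

```python
from typing import Callable, Dict, List

Variant = Dict[str, str]
Stage = Callable[[List[Variant]], List[Variant]]

# Made-up lookup table standing in for the Genome Commons Database.
KNOWN_IMPACT = {"var-A": "example phenotype"}


def annotator(variants: List[Variant]) -> List[Variant]:
    """Attach a known phenotypic impact where the lookup table has one."""
    return [
        {**v, "impact": KNOWN_IMPACT[v["id"]], "source": "annotator"}
        if v["id"] in KNOWN_IMPACT else v
        for v in variants
    ]


def impact_predictor(variants: List[Variant]) -> List[Variant]:
    """Fill in a placeholder prediction for variants that are still unclassified."""
    return [
        v if "impact" in v
        else {**v, "impact": "unclassified (prediction stub)", "source": "predictor"}
        for v in variants
    ]


def integrator(variants: List[Variant]) -> List[Variant]:
    """Order the classified variants so the final report is deterministic."""
    return sorted(variants, key=lambda v: v["id"])


def run(variants: List[Variant], stages: List[Stage]) -> List[Variant]:
    """Apply each stage in turn, feeding its output to the next stage."""
    for stage in stages:
        variants = stage(variants)
    return variants


report = run(
    [{"id": "var-B"}, {"id": "var-A"}],
    [annotator, impact_predictor, integrator],
)
for row in report:
    print(row["id"], row["source"], row["impact"])
```

Because every stage shares one signature, stages such as an imputer or remapper could be slotted in ahead of the annotator without changing `run`.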



                                                                                                                                                               11
Improve variant
impact predictions.


                      12
CAGI – Critical Assessment of Genome Interpretation
A community assessment of the state-of-the-art in phenotype prediction.


➢   Follow the successful CASP framework
     ●   Solicit unpublished data
     ●   Collect blind predictions from participants
     ●   Assess against revealed annotations,
          mechanisms, and phenotypes

➢   Prediction Domains:
     ●   Molecular phenotype
     ●   Cellular phenotype
     ●   Organismal phenotype




                                                With John Moult & Steven Brenner   13
MTHFR and Methylation
[Figure: folate pathway diagram, with labels for exogenous folate, fol3, and met13]




5,10-Methylene tetrahydrofolate (TH4) is required for the synthesis of nucleic acids, while 5-methyl TH4 is required for the formation of methionine from homocysteine. Methionine, in the form of S-adenosylmethionine, is required for many biological methylation reactions, including DNA methylation. Methylene TH4 reductase is a flavin-dependent enzyme required to catalyze the reduction of 5,10-methylene TH4 to 5-methyl TH4.
                                                                               Linus Pauling Institute
                                                                               http://lpi.oregonstate.edu 14
Sequencing 18 Genes of Folate Pathway
                   Guthrie-Spot Sequencing Protocol

➢   250 NTD children and 250 case-matched
    controls

➢   Protocol
    ●   2 mm punch
    ●   Isolate genomic DNA
    ●   Amplification
    ●   Purification
    ●   Sequencing by JGI

➢   Variant calls of 238 exons in 18 genes
    ●   Analysis
    ●   Curate
    ●   QC
                                                      Jasper Rine 15
MTHFR variants exhibit 3 classes of effects.
[Figure: S. cerevisiae growth with MTHFR knock-in mutants. Growth curves of OD600 versus time (0 to 60 hours) at two folinic acid concentrations, 50 µg/ml and 25 µg/ml, with curves labeled MTHFR, met13, and the individual mutants.]

Severely Impaired: e.g., R134C
Folate Remedial: e.g., M110I, D223N
No Effect: e.g., R519C

                                                                                                                                                          Jasper Rine 16
Step 1: Collect predictions.




mutation     Team 1       Team 2
M110I        No Effect    Remediable
R134C        Impaired     Remediable
D223N        Remediable
R519C        No Effect    No Effect




                                          17
Step 2: Assess predictions.




mutation    Team 1        Team 2        Experiment
M110I       No Effect     Remediable    Remediable
R134C       Impaired      Remediable    Impaired
D223N       Remediable                  Remediable
R519C       No Effect     No Effect     No Effect
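
The assessment step can be sketched as a simple comparison of each team's calls against the experimental result. The data below are the toy values from this slide; the scoring rule (count exact matches among submitted predictions, and treat the blank Team 2 entry for D223N as "no prediction") is an assumption for illustration, not CAGI's actual assessment procedure.

```python
from typing import Dict, Optional, Tuple

# Experimental classes from the slide.
EXPERIMENT: Dict[str, str] = {
    "M110I": "Remediable",
    "R134C": "Impaired",
    "D223N": "Remediable",
    "R519C": "No Effect",
}

# Team predictions from the slide; None marks a blank cell (assumed: not submitted).
PREDICTIONS: Dict[str, Dict[str, Optional[str]]] = {
    "Team 1": {"M110I": "No Effect", "R134C": "Impaired",
               "D223N": "Remediable", "R519C": "No Effect"},
    "Team 2": {"M110I": "Remediable", "R134C": "Remediable",
               "D223N": None, "R519C": "No Effect"},
}


def score(predictions: Dict[str, Optional[str]],
          truth: Dict[str, str]) -> Tuple[int, int]:
    """Return (number correct, number submitted) for one team."""
    submitted = {m: call for m, call in predictions.items() if call is not None}
    correct = sum(1 for m, call in submitted.items() if truth.get(m) == call)
    return correct, len(submitted)


for team, calls in PREDICTIONS.items():
    correct, submitted = score(calls, EXPERIMENT)
    print(f"{team}: {correct} correct of {submitted} submitted")
```

Reporting correct and submitted counts separately keeps a team that abstains on a hard variant distinguishable from one that guesses wrong.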




                                                    18
Step 3: Celebrate and learn.
               It's not whether you win or lose...




mutation    Team 1        Team 2        Experiment
M110I       No Effect     Remediable    Remediable
R134C       Impaired      Remediable    Impaired
D223N       Remediable                  Remediable
R519C       No Effect     No Effect     No Effect




                                                                  19
Be clinically relevant.


                          20
Sequencing identifies clinically important associations.


[Figure: plot with axes labeled "Concurrence among cases" and "Intersection among databases"]




                                                             21
Do it
ethically.


             22
A few ineluctable ethical issues.

➢   How to fairly acknowledge aggregated
    data?
➢   Should scientifically suggestive results be
    used for clinical care?
➢   What is the balance between openness and
    preventing misinterpretation?
➢   What happens to confidentiality
    agreements during bankruptcy?
➢   How do we balance personal privacy with
    opportunities for public health advances?


                                             Bernard Lo 23
The Genome Commons




Jasper Rine    Steven Brenner   Bernie Lo   Robert Nussbaum




                                                              24
Nature. 2008 Mar 13;452(7184):151. 25
