ASHG - GRC Workshop 
Tina Lindsay 
ASHG Oct 18, 2014
The Human Reference is Not Complete 
• Reference has been found to not be optimal in some regions
• Structural variation makes it difficult to assemble a truly representative genome when using a diploid sample
• Some regions were recalcitrant to closure with technology and resources available at the time
• Additional sequences are needed to capture the full range of diversity in humans
UGT2B17 – Conflicting Alleles
NCBI36 NC_000004.10 (chr4) Tiling Path: AC074378.4, AC079749.5, AC147055.2, AC134921.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7 (genes: TMPRSS11E, TMPRSS11E2)
Xue Y et al., 2008
GRCh37 NC_000004.11 (chr4) Tiling Path: AC074378.4, AC079749.5, AC147055.2, AC134921.1, AC093720.2, AC021146.7 (gene: TMPRSS11E)
GRCh37 NT_167250.1 (UGT2B17 alternate locus): AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7 (gene: TMPRSS11E2)
(Figure label: GAP)
Allelic Diversity vs. Segmental Duplication
Diploid Genome (figure: allelic copies and repeat copies, with repeat copies noted by color difference; example bases A, A, C, T, C, G, C, C)
With a diploid genome, there is significant ambiguity in sorting allelic copies from repeat copies
Haploid Genome (figure: repeat copies only, noted by color difference; example bases A, C, C, C)
With a haploid genome, allelic differences are eliminated, and base differences are likely indicative of repeat copies
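The diploid versus haploid contrast can be illustrated with a toy sketch. The four-base sequences, the names `hap_a` and `hap_b`, and the helper `variable_positions` are all hypothetical and are not taken from the slides; the sketch only shows why pooling two haplotypes mixes allelic and repeat-copy differences, while a single haplotype leaves only the repeat-copy differences.

```python
from typing import Dict, List

Haplotype = Dict[str, str]  # repeat-copy name -> aligned sequence (toy data)


def variable_positions(haplotypes: List[Haplotype]) -> List[int]:
    """Return aligned positions where more than one base is observed
    once every repeat copy from every haplotype is pooled together."""
    sequences = [seq for hap in haplotypes for seq in hap.values()]
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all toy sequences must be aligned to the same length")
    return [i for i in range(length) if len({seq[i] for seq in sequences}) > 1]


# Hypothetical example: copy1 and copy2 are repeat copies that differ at
# position 1 (C vs T); hap_b also carries an allelic change at position 3.
hap_a = {"copy1": "ACGA", "copy2": "ATGA"}
hap_b = {"copy1": "ACGA", "copy2": "ATGC"}

diploid_sites = variable_positions([hap_a, hap_b])  # repeat and allelic differences mixed
haploid_sites = variable_positions([hap_a])         # only the repeat-copy difference remains
print(diploid_sites, haploid_sites)
```

In the pooled diploid case, the two variable positions look alike, so nothing in the bases alone says which one is allelic. With one haplotype, the single remaining variable position can only come from the repeat copies.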
Hydatidiform mole 
1. Fertilization of an oocyte without a nucleus 
2. Post-zygotic diploidization of triploid zygotes 
(Figure labels: 23X, 23X 23X, ?, Oocyte, Androgenetic HM)
Initial Use Of CHM1 Source 
• CHORI-17 BAC Library 
• CHORI-17 BAC end sequences (n=325,659) 
• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs) 
• CHORI-17 BACs (>750 have been sequenced, with 590 of them in GenBank as phase 3)
SRGAP2 Homology between genes 
Shows nearly identical segments between SRGAP2A and SRGAP2 paralogs 
Shows homology between SRGAP2B and SRGAP2C 
(Figure labels: SRGAP2A, SRGAP2B, SRGAP2C)
Dennis et al., 2012
1q21 patch alignment to chromosome 1
(Figure labels: 1q21, 1q32, 1p21)
IGH Region Highlights Allelic Differences 
Watson et al., 2013
Williams-Beuren Syndrome region 
Slide courtesy of Megan Dennis
Current status of CHM1 resources 
• CHORI-17 BAC Library (created from CHM1 cell line) 
• CHORI-17 BAC end sequences (n=325,659) 
• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs) 
• CHORI-17 BACs (>750 have been sequenced, with 592 of them in GenBank as phase 3)
• Active cell line
• >100X coverage Illumina 100 bp reads (300 bp, 500 bp, and 3 kb inserts)
• Reference assisted assembly CHM1_1.1 
• BioNano genome map 
• >50X coverage of PacBio long read data
CHM1_1.1 Assembly 
• Reference-guided assembly – SRPRISM v2.3, R. Agarwala 
• Alignment of Illumina reads to GRCh37 primary assembly 
• CHORI-17 BAC clone tilepaths were then incorporated (428 total clones: 324 clones in 45 tilepaths, 104 clones as singletons)
• Comparison back to GRCh37 reference to provide appropriate gap sizes
• Assembly submitted to GenBank (http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.2)
• Paper to be published soon (Genome Research, in press; bioRxiv doi: http://dx.doi.org/10.1101/006841)
CHM1_1.1 Assembly
Total Sequence Length: 3,037,866,619 bp
Total Assembly Gap Length: 210,229,812 bp
Number of Scaffolds: 163
Scaffold N50: 50,362,920 bp
Number of Contigs: 40,828
Contig N50: 143,936 bp
(Figure labels: CHM1_1.1, GRCh37)
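For readers less familiar with the N50 statistic quoted above, here is a minimal sketch of how an N50 is conventionally computed, plus the simple arithmetic relating total length to gap length. The function name `n50` and the toy length lists are illustrative assumptions; the ungapped-length calculation also assumes that the reported total sequence length includes gap bases, which the slide does not state explicitly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for completeness


# Figures from the slide.
total_length = 3_037_866_619  # Total Sequence Length (bp)
gap_length = 210_229_812      # Total Assembly Gap Length (bp)

# Assumption: total length counts gap bases, so subtracting gives sequenced bases.
ungapped_length = total_length - gap_length
gap_fraction = gap_length / total_length

print(n50([8, 8, 4, 3, 3, 2, 2, 2]))  # toy contig lengths, not CHM1 data
print(ungapped_length, round(gap_fraction, 3))
```

Under that assumption, roughly 7% of the assembly length is gap, which is one reason the contig N50 is much smaller than the scaffold N50.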
Incorporation of CHM1_1.1 Assembly Data in GRCh38
PacBio CHM1 Assembly potentially fills GRCh38 Gaps 
GRCh38 
PacBio CHM1
PacBio CHM1 Assembly Shows Data Not in GRCh38
GRCh38 
PacBio CHM1 
Second Pass Alignment
CHM1 BioNano Genome Map Aligned to GRCh38 
GRCh38 
CHM1 BioNano Map 
~15kb additional data
BioNano SV Calls Identified Assembly Problems
(Figure labels: Collapse, Expansion in Assembly, CHM1_1.1 Assembly, Gap in Sequence, CHM1 BioNano Map)
Collapse in Sequence Data 
Thought to be missing ~100kb in sequenced clones 
GRCh38
Gap Sizing
Chr8 – Stalled Gap
Estimated at ~150 kb in GRCh38
Sized using CHM1 genome map: >500 kb
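One way a genome map can size a gap is by comparing the distance between two labels that flank the gap in the reference with the distance between the same labels on the map. The sketch below is only an illustration of that arithmetic: the function `estimate_gap_size`, its parameters, and every coordinate in the example are hypothetical, and it is not a description of the procedure actually used for the chr8 gap.

```python
def estimate_gap_size(
    ref_left: int,
    ref_right: int,
    ref_gap_len: int,
    map_left: int,
    map_right: int,
) -> int:
    """Estimate a gap's length from two labels that flank it.

    ref_left, ref_right: label positions in the reference (bp), spanning the gap
    ref_gap_len: gap length currently placed in the reference (bp)
    map_left, map_right: positions of the same labels on the genome map (bp)
    """
    if ref_right <= ref_left or map_right <= map_left:
        raise ValueError("right-hand label must lie beyond the left-hand label")
    # Bases between the labels that are real sequence, not gap filler.
    sequenced_between = (ref_right - ref_left) - ref_gap_len
    # Whatever the map distance cannot attribute to known sequence is gap.
    return (map_right - map_left) - sequenced_between


# Hypothetical coordinates: a gap entered as 150 kb in the reference,
# with flanking labels 610 kb apart on the genome map.
estimate = estimate_gap_size(
    ref_left=1_000_000,
    ref_right=1_250_000,
    ref_gap_len=150_000,
    map_left=40_000,
    map_right=650_000,
)
print(estimate)
```

With these made-up numbers the estimate is 510,000 bp, the same kind of revision (from ~150 kb to >500 kb) that the slide reports.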
Future of CHM1 Assembly 
• Plan to make as contiguous and accurate as possible 
• Incorporate PacBio assembly where possible 
• Additional CH17 clones being sequenced through segmentally duplicated and structurally variant regions to provide local assembly benefits (isolates the repeats)
CYP2D6 – Providing Alternate Alleles 
ABC7 (NA18517)
ABC8 (NA18507)
ABC9 (NA18956)
ABC11 (NA18555)
Future Directions 
• Continued improvement of the CHM1 genome (integration of the Pacific Biosciences whole genome assembly and BioNano genome map data)
• Continue to add diversity to the reference by sequencing new samples that provide diversity beyond what is currently represented in GRCh38
• Continued sequencing of CH17 single haplotype BAC tilepaths to better represent segmentally duplicated regions
• Additional collaborations with the community to develop tools to more fully utilize the full reference assembly (alternate haplotypes)
Acknowledgements 
The Genome Institute at Washington 
University in St. Louis 
Rick Wilson 
Bob Fulton 
Wes Warren 
Karyn Meltz Steinberg 
Vince Magrini 
Derek Albracht 
Milinn Kremitzki 
Susan Rock 
Debbie Scheer 
Aye Wollam 
The Finishing and Bioinformatics Teams 
at The Genome Institute 
University of Washington 
Evan Eichler 
Megan Dennis 
Xander Nuttle
NCBI 
Richa Agarwala
Valerie Schneider 
University of Pittsburgh 
School of Medicine (CHM1 cell line) 
Urvashi Surti 
Personalis 
Deanna Church 
BioNano Genomics 
Pacific Biosciences 
UCSF 
Pui-Yan Kwok 
Yvonne Lai 
Chin Lin 
CHORI Catherine Chu 
Pieter de Jong