hgvs and uta:
Practical tools for manipulating sequence variants
Reece Hart¹, Tim Chiu¹, Vincent A Fusaro¹, John Garcia¹, Kevin B Jacobs², Geoffrey B Nilsen¹,
Veena Rajaraman¹, Meng Wang³
¹ Invitae, 475 Brannan St, San Francisco, CA 94107
² 23andMe, 899 W Evelyn Ave, Mountain View, CA 94041
³ Peking University, 5 Yiheyuan Rd, Haidian, Beijing, China
reece@invitae.com
Abstract
The widespread use of HGVS recommendations to report and share sequence variants
highlights the need for freely available software libraries and readily accessible,
well-managed transcript definitions. We developed two complementary tools, hgvs and
UTA, that facilitate the use of HGVS variant nomenclature for literature mining,
clinical reporting, and data aggregation. The tools are released under the Apache
open source license and are available for local installation.
[Figure: overview diagram. Labels: NCBI, UCSC, Ensembl; UTA (Universal Transcript Archive);
hgvs Python package; CLINVITAE, Variant2Pubmed, Reporting, hgvsWeb]
Show me the code!
(Stephen Hawking once joked that every formula in A Brief History of Time would
halve his readership. Let's hope this doesn't apply to code examples on posters!)
Installation:
$ pip install hgvs
Well, that was easy. (See docs for details and local UTA installation.)
Variants are identified and computed on using genome coordinates.
Variants are communicated and interpreted using transcript coordinates.
[Figure: mapping in both directions, genome to transcript (g. to c./n.) and
transcript to genome (c./n. to g.), with panels labeled exons, exon sets, and
exon alignments]

exon alignments
NM_01234.4   NC_000012.3   0   50=
NM_01234.4   NC_000012.3   1   100=3I49=
NM_01234.4   NC_000012.3   2   5=1I44=

exon sets
transcript   reference     method
NM_01234.4   NM_01234.4    self
NM_01234.4   NC_000012.3   splign
NM_01234.5   NM_01234.5    self
NM_01234.5   NC_000012.3   splign
NM_01234.5   AC_45678.9    splign
NM_01234.5   NC_000012.3   blat
ENST012345   ENST012345    self
ENST012345   NC_000012.3   genebuild

Needleman-Wunsch alignments use coordinates from source databases.
What's in UTA?
UTA is the Universal Transcript Archive. It stores versioned transcript and genomic
exon definitions and the alignments between them. Exon coordinates in UTA are
exactly as provided by the data source, not inferred or regenerated.
There are many cases where the transcript and genomic exon sequences differ due to
sequencing errors or polymorphisms, including indels. Indel differences confound
simple mapping between c. and g. coordinates and occur in several clinically
important genes. Having these alignments in UTA facilitates mapping variants even
when an indel or sequence variant is present.
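Why an indel breaks simple offset arithmetic can be shown with a minimal sketch. It assumes exon alignments are CIGAR-like strings such as 100=3I49= from the exon alignments figure, and it assumes a convention in which `I` marks bases present only in the transcript and `D` marks bases present only in the genomic reference. The function names are hypothetical; this is an illustration of the idea, not the mapper used by hgvs or UTA.

```python
import re
from typing import List, Optional, Tuple

CIGAR_OP = re.compile(r"(\d+)([=XID])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '100=3I49=' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    return ops


def transcript_to_genome_offset(cigar: str, t_off: int) -> Optional[int]:
    """Map a 0-based transcript offset within one exon to a genome offset.

    Assumed convention: 'I' = bases in the transcript only,
    'D' = bases in the genome only. Returns None when the transcript
    base falls inside an insertion and has no genomic counterpart.
    """
    if t_off < 0:
        raise IndexError("offset must be non-negative")
    t_pos = g_pos = 0
    for length, op in parse_cigar(cigar):
        if op in "=X":  # aligned bases advance both coordinates
            if t_off < t_pos + length:
                return g_pos + (t_off - t_pos)
            t_pos += length
            g_pos += length
        elif op == "I":  # transcript-only bases
            if t_off < t_pos + length:
                return None
            t_pos += length
        else:  # "D": genome-only bases
            g_pos += length
    raise IndexError("offset lies beyond the aligned exon")


# Before the 3-base insertion the two coordinates agree; after it they
# differ by 3, which a naive "same offset" mapping would miss.
before = transcript_to_genome_offset("100=3I49=", 99)   # 99
after = transcript_to_genome_offset("100=3I49=", 103)   # 100
```

A full mapper would also need exon start coordinates, strand, and CDS offsets; the sketch covers only the within-exon step where the stored alignment matters.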
Historical transcripts and their alignments are necessary to interpret older published
data. Multiple alignment methods allow detailed and comprehensive comparisons of
NCBI (splign) and UCSC (blat) exon structures, for example. UTA uses hashes of
sequences, exon sets and other entities to enable rapid determination of
equivalence within and across sources.
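The use of hashes for equivalence checks can be sketched as follows. The poster does not describe the actual digest scheme, so the choice of truncated SHA-512, the canonical string layout, and the function names here are all assumptions made for illustration, and the exon coordinates are invented.

```python
import hashlib
from typing import Iterable, Tuple


def sequence_digest(seq: str) -> str:
    """Digest of a sequence, ignoring case (hypothetical scheme)."""
    return hashlib.sha512(seq.upper().encode("ascii")).hexdigest()[:32]


def exon_set_digest(exons: Iterable[Tuple[int, int]]) -> str:
    """Digest of an exon structure, independent of listing order."""
    canonical = ";".join(f"{start},{end}" for start, end in sorted(exons))
    return hashlib.sha512(canonical.encode("ascii")).hexdigest()[:32]


# Invented coordinates: two sources listing the same exons in a
# different order produce the same digest, so equivalence becomes a
# string comparison instead of an exon-by-exon check.
source_a_exons = [(0, 50), (1200, 1352), (3000, 3050)]
source_b_exons = [(3000, 3050), (0, 50), (1200, 1352)]
same_structure = exon_set_digest(source_a_exons) == exon_set_digest(source_b_exons)
```

Storing such digests alongside each record would let a database index them, which is one plausible way to make cross-source comparison fast.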
Features
⨀ Parse and generate most HGVS-formatted variants
⨀ Project variants between g., c., n., and p. coordinates using an indel-aware mapper
⨀ Normalize variants to standard forms
⨀ Validate variants
⨀ Easy local installation
⨀ Extensively tested using manually mapped variants and comparisons with dbSNP and Mutalyzer
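One aspect of normalization can be illustrated with a small sketch: HGVS recommends describing a deletion in a repeated region at its most 3' position. The function below works on plain strings with 0-based, end-exclusive intervals and handles deletions only; its name and interface are hypothetical, and it is not the normalizer shipped in the hgvs package.

```python
from typing import Tuple


def shift_deletion_3prime(ref: str, start: int, end: int) -> Tuple[int, int]:
    """Shift the deletion ref[start:end] as far 3' as possible.

    Moving the interval one base to the right leaves the resulting
    sequence unchanged exactly when the first deleted base equals the
    base immediately after the deleted interval.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("deletion interval outside reference")
    while end < len(ref) and ref[start] == ref[end]:
        start += 1
        end += 1
    return start, end


# Deleting any single T from the run in GATTTTC gives the same result,
# so the deletion is reported at the last T of the run.
ref = "GATTTTC"
shifted = shift_deletion_3prime(ref, 2, 3)  # (5, 6)
```

The same loop shifts multi-base deletions through tandem repeats, for example a CG unit deleted from ACGCGCGT.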
Interface for HGVS name analysis
• Utilises the hgvs Python package
• Input HGVS name into text-box
• Select alignment method
• Click Submit
Tabular output of validated descriptions
• Links to fasta sequences and alignments
• Automated mapping between transcripts
• Intronic variant descriptions accepted
• Optional re-align with alternative method
Code, documentation, issue tracker:
https://bitbucket.org/biocommons/hgvs/
Mailing list:
hgvs-discuss@groups.google.com
Citation:
A Python Package for Parsing, Validating, Mapping, and
Formatting Sequence Variants Using HGVS Nomenclature.
Hart RK, Rico R, Hare E, Garcia J, Westbrook J, Fusaro VA.
Bioinformatics. 2014 Sep 30.
CLINVITAE
http://clinvitae.invitae.com/
hgvsWeb VariantAnalyser
Peter J. Freeman, Anthony J. Brookes, Raymond Dalgleish
Department of Genetics, University of Leicester, Leicester, UK
https://www22.lamp.le.ac.uk/hgvs/variantanalyser
hgvs functionality
⨀ Object model – facile interface to HGVS variant elements
⨀ Parser and Formatter – converts HGVS text to Python objects, and vice versa
⨀ Variant mapper – indel-aware mapping of g. ↔ n. ↔ c. → p.
⨀ Validator – verifies variant syntax and limited semantics
⨀ Normalizer – rewrites variants in recommended forms
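The relationship between an object model, a parser, and a formatter can be shown with a toy example. It handles only simple substitutions such as NM_01234.5:c.22A>T, using a regular expression rather than a full grammar; the class and function names are hypothetical and do not reflect the hgvs package's own classes or parser.

```python
import re
from dataclasses import dataclass

# Accession, coordinate type (g, c, or n), position, reference and
# alternate base. Deliberately narrow: real HGVS syntax is far richer.
_SUBSTITUTION = re.compile(
    r"^(?P<ac>[A-Z]+_?\d+(?:\.\d+)?):(?P<kind>[gcn])\.(?P<pos>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class SimpleSubstitution:
    """Toy object model: each element of the variant is an attribute."""

    ac: str
    kind: str
    pos: int
    ref: str
    alt: str

    def __str__(self) -> str:
        # Formatter: turn the object back into HGVS-style text.
        return f"{self.ac}:{self.kind}.{self.pos}{self.ref}>{self.alt}"


def parse_substitution(text: str) -> SimpleSubstitution:
    """Parser: turn HGVS-style text into an object, or raise ValueError."""
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return SimpleSubstitution(
        match["ac"], match["kind"], int(match["pos"]), match["ref"], match["alt"]
    )


variant = parse_substitution("NM_01234.5:c.22A>T")
round_trip = str(variant)
```

Once a variant is an object, checks such as "does the stated reference base match the transcript sequence" become attribute lookups rather than string manipulation, which is the kind of validation the list above refers to.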

HGVS 2015 poster: hgvs, uta, variantanalyzer
