A bioinformatics approach for the prioritization of
disease candidate human mtDNA mutations
BiP-Day 2014 Seconda Giornata della Bioinformatica Pugliese
Bari, 19 December 2014
Mariangela Santorsola
mar. ’15 Mariangela Santorsola
Mitochondrial mutations
1 Mutations spreading in one or more populations and/or
haplogroup-associated events
2 Rare mutations occurring in highly conserved sites subject to
functional and selective constraints (somatic and/or germline):
mitochondrial mutations potentially affecting function
MToolBox(1) functional annotation
Patho Table of all possible
non-synonymous mutations
• Nucleotide variability values (SiteVar algorithm (2))
• Six pathogenicity predictions from:
1 MutPred (3)
2 Polyphen-2 (4): HumDiv and HumVar
3 SNPs&GO (5): PANTHER, PhD-SNP and SNPs&GO
1 Calabrese et al., 2014
2 Pesole and Saccone, 2001
3 Li et al., 2009
4 Adzhubei et al., 2010
5 Capriotti et al., 2013
Because different methods give discrepant pathogenicity predictions, a single
score is needed that summarizes them and classifies a mitochondrial
non-synonymous mutation as 'disease' or 'benign'.
Disease Score
The weighted mean of the probabilities of being deleterious provided by the
pathogenicity predictors for each i-th non-synonymous mutation:

$$DS_i = \frac{P^i_{MP} W_{MP} + P^i_{PPD} W_{PPD} + P^i_{PPV} W_{PPV} + P^i_{PT} W_{PT} + P^i_{PS} W_{PS} + P^i_{SG} W_{SG}}{W_{MP} + W_{PPD} + W_{PPV} + W_{PT} + W_{PS} + W_{SG}}$$

P_MP = MutPred probability; W_MP = MutPred weight
P_PPD = Polyphen-2 HumDiv probability; W_PPD = Polyphen-2 HumDiv weight
P_PPV = Polyphen-2 HumVar probability; W_PPV = Polyphen-2 HumVar weight
P_PT = PANTHER probability; W_PT = PANTHER weight
P_PS = PhD-SNP probability; W_PS = PhD-SNP weight
P_SG = SNPs&GO probability; W_SG = SNPs&GO weight
The disease score ranges between 0 and 1.
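The weighted mean above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary keys and all numeric values below are hypothetical and are not taken from MToolBox or from the slides.

```python
from typing import Mapping


def disease_score(probabilities: Mapping[str, float],
                  weights: Mapping[str, float]) -> float:
    """Weighted mean of the per-predictor probabilities of being deleterious."""
    if set(probabilities) != set(weights):
        raise ValueError("probabilities and weights must cover the same predictors")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("the sum of the weights must be positive")
    weighted_sum = sum(probabilities[m] * weights[m] for m in weights)
    return weighted_sum / total_weight


# Hypothetical weights and probabilities for the six predictors (illustration only).
example_weights = {"MP": 0.5, "PPD": 0.4, "PPV": 0.4, "PT": 0.3, "PS": 0.3, "SG": 0.6}
example_probs = {"MP": 0.8, "PPD": 0.9, "PPV": 0.7, "PT": 0.6, "PS": 0.5, "SG": 0.9}
example_score = disease_score(example_probs, example_weights)
```

Since every probability lies in [0, 1] and the weights are non-negative, the result also lies in [0, 1], which matches the stated range of the score.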
Weight
$$W = \frac{hp + rp}{2n}$$
• hp = number of times the method provides the highest probability
• rp = number of times the method provides the right prediction
("affecting function or disease")
• n = number of training mutations
• An ideal method that provides the highest probability n times and the
right prediction n times for n mutations would have weight 1
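A minimal sketch of the weight computation follows. It assumes that every training mutation is truly function-affecting, so a "right prediction" is taken to be a probability above 0.5, and that all methods tied for the highest probability receive credit. Both choices, like the function name and the toy data, are assumptions made for illustration and are not stated in the slides.

```python
from typing import Dict, List, Mapping


def method_weights(training: List[Mapping[str, float]],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Compute W = (hp + rp) / 2n for each method over n training mutations."""
    n = len(training)
    if n == 0:
        raise ValueError("at least one training mutation is required")
    methods = list(training[0])
    hp = dict.fromkeys(methods, 0)  # times the method gave the highest probability
    rp = dict.fromkeys(methods, 0)  # times the method gave the right prediction
    for probs in training:
        top = max(probs[m] for m in methods)
        for m in methods:
            if probs[m] == top:        # assumption: ties credit every tied method
                hp[m] += 1
            if probs[m] > threshold:   # assumption: "right" means above threshold
                rp[m] += 1
    return {m: (hp[m] + rp[m]) / (2 * n) for m in methods}


# Toy training set with two hypothetical methods "A" and "B".
toy_training = [{"A": 0.9, "B": 0.6}, {"A": 0.4, "B": 0.7}]
toy_weights = method_weights(toy_training)
```

In the toy data, method "A" is highest once and right once, giving (1 + 1) / 4 = 0.5, while "B" is highest once and right twice, giving (1 + 2) / 4 = 0.75.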
Disease Score: training dataset
53 non-synonymous mutations previously validated as affecting function
• 28 disease-associated mutations, annotated in Mitomap as 'confirmed'
pathogenic by at least two independent laboratories (1)
• 25 cancer-associated mutations previously validated (2)
1 http://www.mitomap.org
2 Pereira et al., 2012
[Figure: disease score distributions of observed non-synonymous mutations
predicted as 'Disease' or as 'Benign' by all six pathogenicity predictors at
the same time]
Disease: min 0.66, median 0.87, max 0.92
Benign: min 0.05, median 0.13, max 0.43
Disease score cutoff: >= 0.4311
Bimodal distribution of disease scores for 1872 non-synonymous mutations
observed in 15385 mtDNA genomes from healthy individuals stored in
HmtDB (1) (last update May 2014).
The cutoff is the disease score value at which the probability of belonging
to the second ('disease') component of the mixture model was ten times
greater than the probability of belonging to the first ('neutral') component.
1 Rubino et al., 2012
[Figure: histogram of disease scores (x-axis: disease score; y-axis:
frequency), with 'Benign' and 'Disease' labels]
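The cutoff rule can be illustrated with a two-component mixture. The slides do not state the form of the mixture components, so the sketch below assumes Gaussian components with already fitted parameters. The function names and the parameter values are hypothetical, and the bisection assumes the posterior ratio increases between the two component means.

```python
import math
from typing import Tuple

# (mixing weight, mean, standard deviation) of one mixture component
Component = Tuple[float, float, float]


def log_weighted_density(x: float, comp: Component) -> float:
    """Log of (mixing weight * Gaussian density) at x."""
    w, mu, sd = comp
    return (math.log(w) - math.log(sd) - 0.5 * math.log(2 * math.pi)
            - 0.5 * ((x - mu) / sd) ** 2)


def cutoff_for_ratio(neutral: Component, disease: Component,
                     ratio: float = 10.0, tol: float = 1e-9) -> float:
    """Score at which P(disease | x) is `ratio` times P(neutral | x)."""
    target = math.log(ratio)

    def excess(x: float) -> float:
        # The posterior ratio equals the ratio of the weighted densities.
        return (log_weighted_density(x, disease)
                - log_weighted_density(x, neutral) - target)

    lo, hi = neutral[1], disease[1]
    if not (excess(lo) < 0 < excess(hi)):
        raise ValueError("the requested ratio is not bracketed by the component means")
    while hi - lo > tol:  # plain bisection on the log posterior ratio
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical fitted parameters (illustration only, not the values behind 0.4311).
toy_cutoff = cutoff_for_ratio(neutral=(0.5, 0.2, 0.1), disease=(0.5, 0.8, 0.1))
```

With equal weights and equal standard deviations, the log posterior ratio is linear in the score, so the toy cutoff can be checked by hand: 60x - 30 = ln 10, which gives x of about 0.538.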
Mitochondrial mutations potentially affecting function:
• Low nucleotide variability value
• High disease score
[Figure: nucleotide variability/disease score correlation of all possible
non-synonymous mutations]
Nucleotide variability cutoff: <= 0.0026 (third quartile)
The nucleotide variability cutoff at or below which a mutation may be
considered potentially deleterious was determined as the third quartile of
the distribution of variability values associated with the 816 non-synonymous
events featuring a disease score above the established DS cutoff.
[Figure: histogram of nucleotide variability values (x-axis: nucleotide
variability; y-axis: frequency)]
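The third-quartile rule can be sketched with the Python standard library. The quartile convention is an assumption: the slides do not say how the quartile was interpolated, and the "inclusive" method is used here as one common choice. The function name and the toy values are hypothetical.

```python
import statistics
from typing import Iterable, Tuple


def variability_cutoff(mutations: Iterable[Tuple[float, float]],
                       ds_cutoff: float = 0.4311) -> float:
    """Third quartile of variability among mutations with disease score >= ds_cutoff.

    Each mutation is a (nucleotide_variability, disease_score) pair.
    """
    selected = [var for var, score in mutations if score >= ds_cutoff]
    if len(selected) < 2:
        raise ValueError("at least two mutations above the DS cutoff are required")
    # quantiles(..., n=4) returns the three quartiles; index 2 is the third one.
    return statistics.quantiles(selected, n=4, method="inclusive")[2]


# Toy (variability, disease score) pairs; the last one falls below the DS cutoff.
toy_mutations = [(0.0, 0.9), (0.001, 0.8), (0.002, 0.7),
                 (0.003, 0.6), (0.004, 0.5), (0.05, 0.1)]
toy_variability_cutoff = variability_cutoff(toy_mutations)
```

In the toy data only the first five pairs pass the disease score filter, so the highly variable sixth site does not influence the cutoff.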
Polymorphic and haplogroup-associated vs rare mutations
MToolBox functional annotation
• Hg_MHCS (Major Haplogroup Consensus Sequences (1))
• rCRS (revised Cambridge Reference Sequence (2))
• RSRS (Reconstructed Sapiens Reference Sequence (3))
[Figure: phylogenetic relationships among virtual Major Haplogroup Consensus
Sequences and two real mitochondrial sequences (Phylotree (4)) for each
haplogroup]
1 Calabrese et al., 2014
2 Anderson et al., 1981
3 Behar et al., 2012
4 van Oven and Kayser, 2009
Prioritization criteria for function-affecting mtDNA non-synonymous
mutations, for future analysis
• Recognized by three reference sequences
• Occurring in non-haplogroup-defining sites
• Featuring nucleotide variability values <= 0.0026
• Featuring disease score >= 0.4311
• Heteroplasmy level (*)
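The criteria above can be read as a simple filter, sketched below. The `Variant` class and its field names are hypothetical and do not reflect MToolBox output columns. The heteroplasmy criterion is left out because the slides give no threshold for it.

```python
from dataclasses import dataclass

DS_CUTOFF = 0.4311           # disease score cutoff reported in the slides
VARIABILITY_CUTOFF = 0.0026  # nucleotide variability cutoff reported in the slides


@dataclass(frozen=True)
class Variant:
    label: str
    called_vs_rcrs: bool       # recognized against rCRS
    called_vs_rsrs: bool       # recognized against RSRS
    called_vs_mhcs: bool       # recognized against the haplogroup consensus (MHCS)
    haplogroup_defining: bool  # site defines a haplogroup
    nt_variability: float
    disease_score: float


def is_prioritized(v: Variant) -> bool:
    """Apply the four quantifiable criteria; heteroplasmy is not modeled here."""
    recognized_by_all = v.called_vs_rcrs and v.called_vs_rsrs and v.called_vs_mhcs
    return (recognized_by_all
            and not v.haplogroup_defining
            and v.nt_variability <= VARIABILITY_CUTOFF
            and v.disease_score >= DS_CUTOFF)


# 3380A uses the variability and score reported in the slides; the boolean flags
# are assumed. The second variant is entirely hypothetical.
variant_3380a = Variant("3380A", True, True, True, False, 0.0003, 0.8764)
variable_site = Variant("hypothetical", True, True, True, False, 0.0100, 0.9000)
prioritized = [v.label for v in (variant_3380a, variable_site) if is_prioritized(v)]
```

The second variant has a high disease score but is rejected because its site variability exceeds the cutoff, which shows that the criteria act jointly.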
Application of MToolBox and prioritization criteria
21 ovarian tumor pre-chemotherapy mtDNA samples
List of the 8/268 non-synonymous mutations prioritized as affecting function
6/21 mutated samples (33%)

Sample  Variant allele  HF    Locus    AA change  Nt var  Disease score  Tumor-specific/Germline
EOC5    3380A           0.75  MT-ND1   R25Q       0.0003  0.8764         tumor-specific
EOC40   14969C          0.50  MT-CYB   Y75H       0.0003  0.8526         tumor-specific
EOC16   9837A           0.45  MT-CO3   G211S      0.0000  0.8379         tumor-specific
EOC20   15255C          0.80  MT-CYB   V170A      0.0000  0.8195         tumor-specific
EOC20   10696T          0.75  MT-ND4L  A76V       0.0000  0.7810         tumor-specific
EOC14   6121C           0.45  MT-CO1   I73T       0.0007  0.7054         tumor-specific
EOC5    8412C           1.00  MT-ATP8  M16T       0.0023  0.6587         germline
EOC32   14249A          1.00  MT-ND6   A142V      0.0020  0.4498         germline

Check of tumor-specific nature by sequencing mtDNA from blood of the same
individuals
• 77.78% of prioritized variants were tumor-specific
• All synonymous mutations occurring at sites with variability values below
the variability cutoff turned out to be germline
Acknowledgements
Department of Medical and Surgical Sciences
University of Bologna
Giuseppe Gasparre
Claudia Calabrese
Rosanna Clima
Giulia Girolimetti
Department of Biosciences, Biotechnologies and Biopharmaceutics,
University of Bari
Prof Marcella Attimonelli
Saverio Vicario
Domenico Simone
Maria Angela Diroma