João André Carriço, PhD
Microbiology Institute/Institute for Molecular Medicine
Faculty of Medicine, University of Lisbon
Portugal
Integrating phylogenetic inference and
metadata visualization for NGS data
http://im.fm.ul.pt
http://imm.fm.ul.pt
http://www.joaocarrico.info
Workshop 20:
Typing of Bacterial Pathogens in 2015:
Expanding the scope of NGS
Conflicts of Interest
NOTHING TO DISCLOSE
Charles Darwin's “tree of life” in Notebook B, 1837-1838
Darwin and the tree of life
Phylogenetic methods aim to infer the relationships between taxa by identifying their common ancestors.
Assumptions: the characters being compared are homologous and independent, i.e. they share a common ancestor and each character has been subject to evolutionary forces independently.
Phylogenetic Inference
ATTGGGG ATGGGGG
AT?GGGG
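To make the two assumptions concrete: once sequences are aligned, each column is treated as one homologous character and scored independently of the others. The sketch below compares two aligned sequences column by column and marks columns where they disagree with `?`, which is one reading of the ATTGGGG / ATGGGGG / AT?GGGG example: with only two taxa, the ancestral base at a variable column cannot be resolved. The function name `compare_aligned` is hypothetical.

```python
def compare_aligned(seq_a, seq_b):
    """Compare two aligned sequences column by column.

    Returns (consensus, differences): columns where the sequences agree keep
    their base; columns where they disagree are marked '?' because the
    ancestral state cannot be resolved from two taxa alone.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    differences = 0
    for a, b in zip(seq_a, seq_b):  # each column is an independent character
        if a == b:
            consensus.append(a)
        else:
            consensus.append("?")
            differences += 1
    return "".join(consensus), differences


if __name__ == "__main__":
    print(compare_aligned("ATTGGGG", "ATGGGGG"))  # ('AT?GGGG', 1)
```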
Software for phylogenetic trees based on sequence alignments:
• MEGA: http://www.megasoftware.net/
• SplitsTree: http://www.splitstree.org/
• Geneious: http://www.geneious.com/
• FastTree: http://www.microbesonline.org/fasttree
• RAxML: http://sco.h-its.org/exelixis/web/software/raxml/index.html
• PHYLIP: http://evolution.genetics.washington.edu/phylip.html
• BEAST: http://beast.bio.ed.ac.uk/
And many, many others…
Sequence Alignment methods
Kos, V.N. et al., 2012. Comparative genomics of vancomycin-resistant Staphylococcus aureus
strains and their positions within the clade most commonly associated with Methicillin-resistant S.
aureus hospital-acquired infection in the United States. mBio, 3(3).
Maximum Likelihood tree of concatenated SICOs
Sequence Alignment methods
Maximum Likelihood tree of concatenated SICOs
Caveats:
• Computationally intensive: some methods cannot be applied to hundreds or thousands of strains
• Parameter definition requires specialized knowledge of the methods and software
• Some phenomena violate the assumptions (recombination, convergent evolution, etc.)
Sequence-Based Typing Methods
Strain genomic information encoded as a numeric sequence
Sanger sequencing:
MLST: gene allele identifiers
MLVA: number of repeats
NGS approaches:
Gene-by-gene / allele-based:
wgMLST: core + accessory genome genes (the pan genome) are represented
cgMLST: core genome genes only
SNP typing: single-nucleotide polymorphisms
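MLVA reduces each VNTR locus to a number of tandem repeat copies. In practice repeat numbers are usually inferred from PCR fragment sizes; the sketch below instead assumes the locus sequence is already available (e.g. from an assembly) and counts the longest run of perfect, consecutive copies of a known repeat unit. The function name, locus names and sequences are hypothetical.

```python
def tandem_repeat_count(sequence, unit):
    """Longest run of perfect, consecutive copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        count, pos = 0, start
        while sequence.startswith(unit, pos):  # extend the run copy by copy
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


if __name__ == "__main__":
    # Hypothetical VNTR loci: (locus sequence, repeat unit)
    loci = {
        "VNTR-1": ("GGACTACTACTACTTT", "ACT"),
        "VNTR-2": ("TTGCAGCAGCAA", "GCA"),
    }
    profile = "-".join(str(tandem_repeat_count(seq, unit)) for seq, unit in loci.values())
    print(profile)  # 4-3
```

Each locus contributes one integer, so the MLVA result has the same numeric-profile shape as an MLST allelic profile.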
Each unique gene sequence (allele) is assigned an integer ID by comparison with online databases.
Allelic profile:
   12 - 9 - 11 - 7 - 11 - 20 - 3
Each allelic profile, a.k.a. sequence type (ST), is unequivocally identified by an integer.
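The nomenclature rule above (a new integer for each new allele, and a new integer for each new combination of alleles) can be sketched as two lookup tables. This is a toy, in-memory stand-in for the online databases: the `Nomenclature` class, locus names and sequences are hypothetical, and real schemes also involve curation, quality checks and handling of partial sequences.

```python
class Nomenclature:
    """Toy allele / ST nomenclature (hypothetical stand-in for a curated database)."""

    def __init__(self, loci):
        self.loci = list(loci)
        self.alleles = {locus: {} for locus in self.loci}  # locus -> {sequence: allele ID}
        self.sts = {}  # allelic profile (tuple of allele IDs) -> ST ID

    def allele_id(self, locus, sequence):
        """Return the allele ID of a sequence, registering it if it is new."""
        known = self.alleles[locus]
        if sequence not in known:
            known[sequence] = len(known) + 1  # next unused integer at this locus
        return known[sequence]

    def st_id(self, sequences):
        """Map one sequence per locus to (allelic profile, ST ID)."""
        if len(sequences) != len(self.loci):
            raise ValueError("expected one sequence per locus")
        profile = tuple(self.allele_id(locus, seq) for locus, seq in zip(self.loci, sequences))
        if profile not in self.sts:
            self.sts[profile] = len(self.sts) + 1  # next unused ST number
        return profile, self.sts[profile]


if __name__ == "__main__":
    nom = Nomenclature(["geneA", "geneB", "geneC"])
    print(nom.st_id(["ATG", "CCG", "TTA"]))  # ((1, 1, 1), 1)
    print(nom.st_id(["ATG", "CCA", "TTA"]))  # ((1, 2, 1), 2)
    print(nom.st_id(["ATG", "CCG", "TTA"]))  # ((1, 1, 1), 1)
```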
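Single locus variant (SLV): differs from a given allelic profile at exactly one locus (e.g. 12-10-11-7-11-20-3 vs 12-9-11-7-11-20-3)
Double locus variant (DLV): differs at exactly two loci (e.g. 12-10-11-11-11-20-3)
Triple locus variant (TLV): differs at exactly three loci
Figure: MLST loci on the bacterial chromosome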
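Counting the loci at which two profiles differ is enough to label the relationship. The sketch assumes profiles are written as dash-separated integers; the SLV and DLV profiles in the usage lines are taken from the slide's example, while the TLV profile is a hypothetical one obtained by changing one more locus. Function names are illustrative.

```python
VARIANT_LABELS = {0: "identical", 1: "SLV", 2: "DLV", 3: "TLV"}


def parse_profile(text):
    """'12-9-11-7-11-20-3' -> (12, 9, 11, 7, 11, 20, 3); surrounding spaces are tolerated."""
    return tuple(int(allele) for allele in text.split("-"))


def variant_type(reference, other):
    """Label `other` by the number of loci at which it differs from `reference`."""
    if len(reference) != len(other):
        raise ValueError("profiles must cover the same loci")
    differences = sum(a != b for a, b in zip(reference, other))
    return VARIANT_LABELS.get(differences, f"{differences}-locus variant")


if __name__ == "__main__":
    ref = parse_profile("12 - 9 - 11 - 7 - 11 - 20 - 3")
    for text in ("12-10-11-7-11-20-3", "12-10-11-11-11-20-3", "10-10-11-11-11-20-3"):
        print(text, variant_type(ref, parse_profile(text)))  # SLV, DLV, TLV
```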
SNP NGS Approach
A good approach for monomorphic species.
For non-monomorphic species, SNPs in genome regions where recombination was detected need to be removed to avoid confounding the phylogenetic signal.
Workflow: sample → NGS (WGS) → reads (FASTQ files) → mapping to reference (BAM files) → SNP calls (VCF files) → FASTA file with SNPs
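The last steps of this workflow (SNP calls to a FASTA file of SNPs, without SNPs in recombinant regions) can be sketched as follows. This assumes the VCF files have already been parsed into a `{sample: {position: alternative base}}` dictionary, that positions without a call carry the reference base (no coverage or quality filtering), and that recombinant regions were detected beforehand by a separate tool. All names and data are hypothetical.

```python
def snp_alignment(reference, calls, recombinant_regions):
    """Concatenated SNP alignment from per-sample SNP calls.

    reference: reference genome sequence (positions are 1-based).
    calls: {sample: {position: alternative base}} for variant positions only.
    recombinant_regions: inclusive (start, end) intervals; SNPs inside are dropped.
    """
    def in_recombinant(pos):
        return any(start <= pos <= end for start, end in recombinant_regions)

    variable = sorted({pos for snps in calls.values() for pos in snps})
    kept = [pos for pos in variable if not in_recombinant(pos)]
    alignment = {"reference": "".join(reference[pos - 1] for pos in kept)}
    for sample, snps in calls.items():
        # uncalled positions are assumed to carry the reference base
        alignment[sample] = "".join(snps.get(pos, reference[pos - 1]) for pos in kept)
    return kept, alignment


def to_fasta(alignment):
    return "\n".join(f">{name}\n{seq}" for name, seq in alignment.items())


if __name__ == "__main__":
    kept, aln = snp_alignment(
        "ACGTACGTAC",
        {"s1": {2: "T", 7: "A"}, "s2": {2: "T", 9: "G"}},
        recombinant_regions=[(6, 8)],  # position 7 falls here and is removed
    )
    print(kept)  # [2, 9]
    print(to_fasta(aln))
```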
Gene by Gene NGS Approach
Software currently available:
BIGSdb (Jolley, K.A. & Maiden, M.C.J., 2010. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics)
Ridom SeqSphere+ (http://www.ridom.com/seqsphere/)
Central nomenclature server: schemas, allele definitions and identifiers
Workflow: sample → NGS (WGS) → reads → assembly → contigs → output: allelic profile
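A heavily simplified picture of the final step (contigs to allelic profile): look up each known allele of the scheme in the contigs and in their reverse complements by exact string matching. Real allele callers use alignment (e.g. BLAST-based searches), detect novel alleles and report partial or truncated hits; this sketch only reports a locus as missing (`None`) when no known allele, or more than one, is found. The scheme and contigs are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def call_alleles(contigs, scheme):
    """Naive exact-match allele calling.

    contigs: assembled contig sequences.
    scheme: {locus: {allele sequence: allele ID}} (hypothetical toy scheme).
    Returns {locus: allele ID, or None if absent or ambiguous}.
    """
    strands = list(contigs) + [reverse_complement(c) for c in contigs]
    profile = {}
    for locus, alleles in scheme.items():
        hits = {allele_id for allele_seq, allele_id in alleles.items()
                if any(allele_seq in strand for strand in strands)}
        # exactly one known allele found -> call it; none or several -> missing
        profile[locus] = hits.pop() if len(hits) == 1 else None
    return profile


if __name__ == "__main__":
    scheme = {
        "geneA": {"ATGAAA": 1, "ATGAAC": 2},
        "geneB": {"GGGCCC": 1, "GGGCCT": 2},
        "geneC": {"TTTAAA": 1},
    }
    contigs = ["CCATGAACTT", "TTAGGCCCTT"]  # geneB allele 2 is on the reverse strand
    print(call_alleles(contigs, scheme))  # {'geneA': 2, 'geneB': 2, 'geneC': None}
```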
Algorithms for Phylogenetic Inference
Based on a distance matrix (input: sequence alignments or allelic profiles):
• Hierarchical clustering methods: UPGMA, single linkage and complete linkage
• Neighbor-joining
• Minimum spanning trees
Maximum parsimony methods (input: sequence alignments)
Based on rules (graphic matroids) (input: allelic profiles):
• goeBURST
Maximum likelihood methods (input: sequence alignments)
Bayesian inference methods (input: sequence alignments)
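As an example of the distance-matrix family applied to allelic profiles, the sketch below takes the number of differing loci as the distance and clusters with UPGMA (average linkage), returning a Newick string. Using locus differences as the distance is an assumption for illustration; UPGMA additionally assumes a constant rate of change (an ultrametric tree). The four profiles are hypothetical.

```python
def hamming(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def upgma(labels, dist):
    """UPGMA (average linkage) on a symmetric distance matrix; returns Newick."""
    n = len(labels)
    # cluster id -> (Newick subtree, number of leaves, height of its root)
    clusters = {i: (labels[i], 1, 0.0) for i in range(n)}
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # closest pair of clusters; ties broken by the smaller cluster ids
        (i, j), dij = min(d.items(), key=lambda item: (item[1], item[0]))
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        height = dij / 2
        merged = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        del d[(i, j)]
        for k in clusters:  # size-weighted average distance to the merged cluster
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"


if __name__ == "__main__":
    profiles = {
        "A": (1, 1, 1, 1, 1, 1, 1),
        "B": (1, 1, 1, 1, 1, 1, 2),
        "C": (1, 1, 2, 2, 2, 1, 1),
        "D": (1, 1, 2, 2, 2, 2, 1),
    }
    labels = list(profiles)
    matrix = [[hamming(profiles[a], profiles[b]) for b in labels] for a in labels]
    print(upgma(labels, matrix))  # ((A:0.5,B:0.5):1.5,(C:0.5,D:0.5):1.5);
```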
Inferring phylogeny from allelic profiles
Assume that you have only 3 genes and that each number corresponds to a different allele of each gene. The minimal assumption is that an SLV may correspond to a possible phylogenetic descent.
Figure: STs 1-1-1, 1-1-2, 1-2-1, 1-2-2 and 1-2-3 connected by six SLV links
11 possible trees…
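The count of 11 possible trees can be checked by counting the spanning trees of the SLV graph with Kirchhoff's matrix-tree theorem: any cofactor of the graph's Laplacian matrix equals its number of spanning trees. The sketch uses exact rational arithmetic from the Python standard library to avoid rounding; function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def is_slv(p, q):
    """True if the two profiles differ at exactly one locus."""
    return sum(a != b for a, b in zip(p, q)) == 1


def determinant(matrix):
    """Exact determinant via Gaussian elimination over rational numbers."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:  # a row swap flips the sign
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def count_slv_trees(profiles):
    """Number of spanning trees of the SLV graph (Kirchhoff's matrix-tree theorem)."""
    n = len(profiles)
    laplacian = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if is_slv(profiles[i], profiles[j]):
            laplacian[i][j] = laplacian[j][i] = -1
            laplacian[i][i] += 1
            laplacian[j][j] += 1
    # Any cofactor works; delete the first row and first column.
    minor = [row[1:] for row in laplacian[1:]]
    return int(determinant(minor))


if __name__ == "__main__":
    sts = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 2, 3)]
    print(count_slv_trees(sts))  # 11
```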
eBURST model
More similar STs should denote closely related strains
from an evolutionary point of view.
STs with more SLVs can be regarded as a common ancestor.
Links between STs depict descent relations.
With these assumptions, connected STs should share an
evolutionary path.
Maynard Smith J., et al. 2000. Bioessays 22:1115-
eBURST
Feil E. et al, J Bac 2004
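The per-ST quantities behind these rules (how many SLVs, DLVs and TLVs each ST has) can be computed directly from the profiles. The sketch below does this for the five three-locus STs of the example, numbered 1 to 5 in the order 1-1-1, 1-1-2, 1-2-1, 1-2-2, 1-2-3, and picks the ST with the most SLVs as putative founder, breaking ties by DLVs, TLVs, frequency and finally the lower ST ID. That tie-break order is a simplified assumption, not a full reimplementation of eBURST. The resulting counts match the #SLVs/#DLVs/#TLVs columns of the goeBURST table.

```python
from itertools import combinations


def level_counts(profiles, max_level=3):
    """For each ST, count the other STs differing at exactly 1..max_level loci."""
    counts = {st: [0] * max_level for st in profiles}
    for a, b in combinations(profiles, 2):
        d = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if 1 <= d <= max_level:
            counts[a][d - 1] += 1
            counts[b][d - 1] += 1
    return counts


def putative_founder(profiles, freq):
    """ST with most SLVs; ties: DLVs, TLVs, frequency, lower ST ID (assumed order)."""
    counts = level_counts(profiles)
    return min(profiles, key=lambda st: (-counts[st][0], -counts[st][1],
                                         -counts[st][2], -freq[st], st))


if __name__ == "__main__":
    profiles = {1: (1, 1, 1), 2: (1, 1, 2), 3: (1, 2, 1), 4: (1, 2, 2), 5: (1, 2, 3)}
    freq = {st: 1 for st in profiles}
    print(level_counts(profiles))
    print(putative_founder(profiles, freq))  # 3
```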
goeBURST
STid  Profile  #SLVs  #DLVs  #TLVs  Freq
1     1-1-1    2      2      0      1
2     1-1-2    2      2      0      1
3     1-2-1    3      1      0      1
4     1-2-2    3      1      0      1
5     1-2-3    2      2      0      1
Implementing the eBURST rules as a graphic matroid problem allows a globally optimal placement of the ST links.
Francisco et al, BMC Bioinf, 2009
Tie-break: more SLVs, then lower ST ID
Connects to ST4 because of its higher number of SLVs
Final goeBURST tree: unique solution guaranteed
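Because the problem is a graphic matroid, a greedy algorithm is enough: sort the SLV links by the tie-break rules and add each link unless it would close a cycle (Kruskal's algorithm with a union-find structure). The sketch below uses a simplified reading of the goeBURST link ordering (compare the better and then the worse endpoint on SLV, DLV and TLV counts, then frequency, then lower ST IDs); this ordering is an assumption and may differ in detail from the published algorithm and from PHYLOViZ. Applied to the five example STs (ST1 to ST5 = 1-1-1, 1-1-2, 1-2-1, 1-2-2, 1-2-3, all with frequency 1), it links ST2 to ST4 rather than to ST1, and ST5 to ST3 rather than to ST4 because of the lower ID, consistent with the slide's annotations.

```python
from itertools import combinations


def distance(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def level_counts(profiles, levels=3):
    """Per ST: number of SLVs, DLVs, TLVs (levels 1..levels)."""
    counts = {st: [0] * levels for st in profiles}
    for a, b in combinations(profiles, 2):
        d = distance(profiles[a], profiles[b])
        if 1 <= d <= levels:
            counts[a][d - 1] += 1
            counts[b][d - 1] += 1
    return counts


def goeburst_slv(profiles, freq):
    """Kruskal over SLV links only, ordered by a simplified goeBURST tie-break."""
    counts = level_counts(profiles)

    def rank(link):
        u, v = link
        key = []
        for lvl in range(3):  # SLVs, then DLVs, then TLVs
            best, worst = sorted((counts[u][lvl], counts[v][lvl]), reverse=True)
            key += [-best, -worst]
        f_best, f_worst = sorted((freq[u], freq[v]), reverse=True)
        return key + [-f_best, -f_worst, min(u, v), max(u, v)]

    links = [(u, v) for u, v in combinations(sorted(profiles), 2)
             if distance(profiles[u], profiles[v]) == 1]
    parent = {st: st for st in profiles}

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v in sorted(links, key=rank):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # accept only links that join two groups
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


if __name__ == "__main__":
    profiles = {1: (1, 1, 1), 2: (1, 1, 2), 3: (1, 2, 1), 4: (1, 2, 2), 5: (1, 2, 3)}
    freq = {st: 1 for st in profiles}
    print(goeburst_slv(profiles, freq))  # [(3, 4), (1, 3), (2, 4), (3, 5)]
```

STs without any SLV stay unconnected, so on larger datasets the result is a forest whose trees correspond to SLV-linked groups.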
Applying goeBURST
Figure: STs 1-1-1, 1-1-2, 1-2-1, 1-2-2 and 1-2-3 connected by six SLV links (11 possible trees)
All of these are valid goeBURST solutions. If all STs had the same frequency in the dataset, the ST ID would be needed as the final tie-break.
goeBURST output examples
Largest S. aureus MLST clonal complex (CC): 1067 of 2650 STs in total
Second-largest S. aureus CC: 252 STs
goeBURST FULL MST
• The goeBURST rules can be extended to any number of loci while keeping the assumptions of the underlying evolutionary model
• Adds an evolutionary model to the basic minimum spanning tree approach
• Advantage: very fast to calculate compared with phylogenetic analysis algorithms
• Advantage: if the strains are closely related, internal nodes correspond to sampled strains, unlike in traditional phylogenetic methods
• Disadvantage: does not create internal nodes representing putative recent common ancestors
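Extending the same greedy procedure to all pairs of STs, with tie-break counts at every level from 1 up to the number of loci, gives a goeBURST-style full MST. This is again a simplified sketch under the same assumed link ordering. In the usage example, ST1 to ST3 reuse the seven-locus SLV/DLV example profiles, while ST4, ST5 and all ST IDs are hypothetical; ST5 differs at every locus yet is still attached, and sampled STs (ST2, ST3) become internal nodes, illustrating the advantage and disadvantage listed above.

```python
from itertools import combinations


def distance(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def goeburst_full_mst(profiles, freq):
    """Kruskal MST over all ST pairs; equal-distance links are ranked by variant
    counts at every level 1..n_loci (assumed goeBURST-style ordering), then
    frequency, then lower ST IDs. Returns (u, v, distance) links."""
    n_loci = len(next(iter(profiles.values())))
    counts = {st: [0] * n_loci for st in profiles}
    for a, b in combinations(profiles, 2):
        d = distance(profiles[a], profiles[b])
        if d:
            counts[a][d - 1] += 1
            counts[b][d - 1] += 1

    def rank(link):
        u, v = link
        key = [distance(profiles[u], profiles[v])]  # shorter links first
        for lvl in range(n_loci):
            best, worst = sorted((counts[u][lvl], counts[v][lvl]), reverse=True)
            key += [-best, -worst]
        f_best, f_worst = sorted((freq[u], freq[v]), reverse=True)
        return key + [-f_best, -f_worst, min(u, v), max(u, v)]

    parent = {st: st for st in profiles}

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v in sorted(combinations(sorted(profiles), 2), key=rank):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, distance(profiles[u], profiles[v])))
    return tree


if __name__ == "__main__":
    profiles = {
        1: (12, 9, 11, 7, 11, 20, 3),
        2: (12, 10, 11, 7, 11, 20, 3),
        3: (12, 10, 11, 11, 11, 20, 3),
        4: (10, 10, 11, 11, 11, 20, 3),
        5: (1, 2, 3, 4, 5, 6, 7),
    }
    freq = {st: 1 for st in profiles}
    print(goeburst_full_mst(profiles, freq))
    # [(2, 3, 1), (1, 2, 1), (3, 4, 1), (2, 5, 7)]
```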
Allelic profiles
Accessory data (“metadata”): antibiogram, serotype, origin info (patient), other typing methods, …
Analysis (goeBURST)
Present the data in a meaningful way
Integrating Data Analysis and Visualization
Using PHYLOViZ (http://www.phyloviz.net)
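One simple way to combine analysis results with metadata is to tally each metadata field per ST, so that tree nodes can then be coloured or split by category. The sketch below only does the tallying; the isolate records and the field name are hypothetical, and missing values are counted as "NA".

```python
from collections import Counter, defaultdict


def tally_by_st(isolates, field):
    """Count the values of one metadata field per ST (missing values -> "NA")."""
    table = defaultdict(Counter)
    for isolate in isolates:
        table[isolate["ST"]][isolate.get(field, "NA")] += 1
    return {st: dict(counter) for st, counter in table.items()}


if __name__ == "__main__":
    # Hypothetical isolate records: typing result (ST) plus metadata
    isolates = [
        {"id": "iso1", "ST": 3, "serotype": "19A"},
        {"id": "iso2", "ST": 3, "serotype": "19A"},
        {"id": "iso3", "ST": 3, "serotype": "19F"},
        {"id": "iso4", "ST": 4},
    ]
    print(tally_by_st(isolates, "serotype"))  # {3: {'19A': 2, '19F': 1}, 4: {'NA': 1}}
```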
PHYLOViZ
Can be easily applied to:
-MLST
-MLVA
-SNP data*
-Gene Presence/absence
*Conversion of VCF to PHYLOViZ:
https://github.com/nickloman/misc-genomics-tools/blob/master/scripts/vcf2phyloviz.py
(Thanks Nick!)
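To illustrate why SNP data fit the same framework, each SNP column can be treated as a locus and each base as an allele, with identical SNP profiles grouped into STs. This sketch is not a description of the vcf2phyloviz.py script linked above: it starts from an already-built SNP alignment, and the output layout (a tab-separated table with an ST column followed by one column per locus) is an assumption modelled on MLST profile tables. Sample names and sequences are hypothetical.

```python
def snps_to_profiles(snp_alignment):
    """Treat each SNP column as a locus and each base as an integer allele.

    snp_alignment: {sample: SNP string}, all strings the same length.
    Returns (tab-separated profile table, {sample: ST}).
    """
    length = len(next(iter(snp_alignment.values()), ""))
    codes = [{} for _ in range(length)]  # per column: base -> allele number
    sample_profile = {}
    for sample, seq in snp_alignment.items():
        if len(seq) != length:
            raise ValueError(f"{sample}: SNP string has the wrong length")
        sample_profile[sample] = tuple(
            codes[i].setdefault(base, len(codes[i]) + 1) for i, base in enumerate(seq)
        )
    st_of_profile = {}  # unique profile -> ST, numbered by first appearance
    for profile in sample_profile.values():
        st_of_profile.setdefault(profile, len(st_of_profile) + 1)
    header = "\t".join(["ST"] + [f"SNP{i + 1}" for i in range(length)])
    rows = ["\t".join(map(str, (st, *profile))) for profile, st in st_of_profile.items()]
    table = "\n".join([header] + rows)
    return table, {sample: st_of_profile[p] for sample, p in sample_profile.items()}


if __name__ == "__main__":
    table, sts = snps_to_profiles({"ref": "CA", "s1": "TA", "s2": "TG", "s3": "TA"})
    print(table)
    print(sts)  # {'ref': 1, 's1': 2, 's2': 3, 's3': 2}
```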
PHYLOViZ
Example of visualization with MLST+ (core genome) data of
VRSA and MRSA strains
Core genome comparison - Workflow
Core genome defined from all fully sequenced S. aureus strains available in NCBI
Strain COL genes used as reference
1866 target loci found for a cgMLST schema (Ridom SeqSphere+)
Alleles called for the strains under study
Loci with missing data in the strains under analysis removed
1542 target genes kept for whole-genome comparison
goeBURST minimum spanning tree of the resulting allelic profiles (PHYLOViZ software)
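The filtering step in this workflow (1866 target loci down to 1542 kept) amounts to dropping every locus that is missing in at least one strain. A minimal sketch, assuming missing calls are stored as `None`; the loci and profiles are hypothetical.

```python
def drop_incomplete_loci(loci, profiles):
    """Keep only loci called in every strain (missing calls are None)."""
    keep = [i for i in range(len(loci))
            if all(profile[i] is not None for profile in profiles.values())]
    kept_loci = [loci[i] for i in keep]
    reduced = {strain: tuple(profile[i] for i in keep) for strain, profile in profiles.items()}
    return kept_loci, reduced


if __name__ == "__main__":
    loci = ["l1", "l2", "l3", "l4"]
    profiles = {"A": (1, 2, None, 4), "B": (1, 3, 5, None), "C": (2, 2, 5, 4)}
    print(drop_incomplete_loci(loci, profiles))
    # (['l1', 'l2'], {'A': (1, 2), 'B': (1, 3), 'C': (2, 2)})
```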
Core genome comparison
Figure: goeBURST MST of VRSA and MRSA strains based on MLST+ (1542 core genome genes found in all strains); groups shown: NCBI strains, US VRSA strains (Kos et al.), HSM strains, MRSA srp, VRS5
“Live”
Demonstration
PHYLOViZ
PROs:
Handles thousands of profiles
Fast calculation
Easy to annotate and explore metadata
Allows for basic statistics on profiles and metadata
Allows for advanced statistics on MSTs
(PLoS One. 2015 Mar 23;10(3):e0119315)
Exports high-quality graphics formats
Allows plugin development
CONs:
goeBURST and goeBURST MST only
(neighbour-joining and UPGMA coming soon)
Java knowledge required to code new plugins
Final Remarks
Phylogenetic inference always has an underlying model. The choice of method depends on the data being analyzed and on the underlying question.
With the increasing availability of bacterial genomes, the methods used to compare them need to be efficient and scalable.
Metadata should always be used to evaluate the algorithm results.
PHYLOViZ provides a visualization framework to analyze patterns of descent inferred by goeBURST, including detailed statistics, and allows easy integration of metadata with the algorithm results.
Any sequence-based typing method that generates allelic profiles can be analyzed in this framework, including any NGS-derived schema (e.g. cgMLST, SNPs).
Ongoing PHYLOViZ work
Modular plugin architecture
Allows expansion and addition of new capabilities
Other analysis algorithms / custom rules
New visualization modules
Allow the analysis of other data types
Complementary statistics modules
Trying to address users' needs…
We need your feedback!
PHYLOViZ is open-source, free software
Acknowledgements
Alexandre Francisco
Cátia Vaz
Pedro Monteiro
Mário Ramirez
José Melo-Cristino
Initial funding from Fundação para a Ciência e Tecnologia
Draft Scientific Programme:
Plenaries:
1) Small Scale Microbial Epidemiology
2) Large Scale Microbial Epidemiology
3) Bioinformatics for Genome-based Microbial Epidemiology
4) Population Genetics: Pathogen Emergence
5) Population Dynamics: Transmission Networks and Surveillance
6) Molecular Epidemiology for Global Health and One Health
Parallel Sessions:
1) Food and Environmental Pathogens
2) Microbial Forensics
3) Viruses
4) Fungi and Yeasts
5) Novel Diagnostic Methodologies
6) Novel Typing Approaches
7) Phylogenetic Inference
8) Interactive Illustration Platforms
Save the date!
PHYLOViZ Visualization Examples
Figure: Burkholderia pseudomallei
Legend labels: Clinical, animal, NA, community, Hospital, Surv/Outb
Figure: Enterococcus faecium
Figure: Streptococcus pneumoniae CC90, coloured by country of origin
Figure: Streptococcus pneumoniae, 10 largest clonal complexes coloured by serotype
