SNP Mining in Cereal Crops
Presented By:
SAURABH PANDEY
PALB-3252
Sr. M.Sc.(Ag.)
Seminar flow
• Introduction
• What are SNPs
- Detection methods
- Techniques used for detection
• Different software used
• Crop lists where SNPs were detected
• Improvement works done
• Case Study
09-05-2015 Department of Plant Biotechnology 2
Introduction
Scenario of Molecular Markers
Molecular markers
First Generation
• RFLP (hybridization based)
• RAPD (PCR based)
Second Generation
• AFLPs
• SSRs
Third Generation
• ESTs (used in functional genomics)
• SNPs
Single Nucleotide Polymorphisms (SNPs)
DNA sequence variations that occur when a single nucleotide (A, T, C or G) in the genome sequence is altered.
SNP: a single-base DNA variation found in more than 1% of the population
Mutation: a single-base DNA variation found in less than 1% of the population
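The 1% rule above can be sketched in Python. The function name, the default cutoff argument and the example counts are illustrative assumptions; the slide does not say how a frequency of exactly 1% is treated, so this sketch leaves that case on the mutation side.

```python
def classify_variant(minor_allele_count: int, total_alleles: int,
                     threshold: float = 0.01) -> str:
    """Label a single-base variant as 'SNP' or 'mutation' by its frequency."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    frequency = minor_allele_count / total_alleles
    # Strictly above the threshold counts as a SNP (boundary handling is an assumption).
    return "SNP" if frequency > threshold else "mutation"


print(classify_variant(12, 400))  # frequency 0.03
print(classify_variant(1, 400))   # frequency 0.0025
```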
SNP functional classes (Mooney et al., 2005)
Class | Abbreviation | Definition
Coding SNPs | cSNP | Positions that fall within the coding regions of genes
Regulatory SNPs | rSNP | Positions that fall in regulatory regions of genes
Synonymous SNPs | sSNP | Positions in exons where the change does not alter the encoded amino acid
Non-synonymous SNPs | nsSNP | Positions that cause an amino acid substitution
Intronic SNPs | iSNP | Positions that fall within introns
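The difference between synonymous and non-synonymous SNPs can be illustrated by translating a codon before and after the base change. This is a minimal sketch using the standard genetic code; the function name and example codons are assumptions made for illustration, and a change to or from a stop codon is simply reported as nsSNP because the classes above have no separate label for it.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(ref_codon: str, position: int, alt_base: str) -> str:
    """Return 'sSNP' if the amino acid is unchanged, otherwise 'nsSNP'.

    position is the 0-based index of the changed base within the codon.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    same_amino_acid = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "sSNP" if same_amino_acid else "nsSNP"


print(classify_coding_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_coding_snp("GAA", 1, "T"))  # GAA (Glu) -> GTA (Val)
```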
SNPs Importance
• Abundant in the genome, with a high frequency, e.g.:
  - Maize: 1 SNP/31 bp in non-coding regions, 1 SNP/124 bp in coding regions (Ching et al., 2002)
  - Rice: 3-27 SNPs/kb between the japonica and indica whole-genome shotgun sequences
  - Soybean: 280 SNPs in 76.3 kb of genomic DNA
• Easy to automate, because these markers are biallelic and are scored directly from the DNA sequence.
• Millions of sequences are available in public databases for several crops, e.g. wheat has 132,000 ESTs.
• Large numbers of SNPs have already been discovered, e.g. 29,447 SNPs in barley and 64,837 SNPs in rice.
(Yu et al., 2005) (Zhu et al., 2003)
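The frequencies quoted above use different units (bp per SNP, SNPs per kb, SNPs per sequenced length). A small Python sketch, with function names chosen here purely for illustration, puts the maize and soybean figures on the same scale as the rice range of 3-27 SNPs/kb:

```python
def snps_per_kb(snp_count: float, length_bp: float) -> float:
    """SNP frequency expressed as SNPs per 1,000 bp."""
    return snp_count / length_bp * 1000


def bp_per_snp(snp_count: float, length_bp: float) -> float:
    """Average spacing between SNPs in bp."""
    return length_bp / snp_count


# Figures taken from the slide above.
print(round(snps_per_kb(1, 31), 2))        # maize, non-coding
print(round(snps_per_kb(1, 124), 2))       # maize, coding
print(round(snps_per_kb(280, 76_300), 2))  # soybean
print(round(bp_per_snp(280, 76_300), 1))   # soybean, bp per SNP
```

On these numbers, maize non-coding regions carry roughly 32 SNPs/kb, maize coding regions roughly 8 SNPs/kb, and the soybean sample roughly 3.7 SNPs/kb (one SNP every 272.5 bp).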
SNPs Application
• Comparative mapping, performed using databases.
• Linkage disequilibrium based association studies, e.g. in Arabidopsis thaliana.
• Marker-assisted selection (MAS): used in soybean, where SNP markers were identified for the GmNARK gene, whose mutation causes hypernodulation (Kim et al., 2005).
• Genetic diversity studies: in maize, using SNPs at 21 loci on chromosome 1 (Tenaillon et al., 2005).
• High-resolution genetic mapping: 5 SNPs were genetically mapped in melon (Morales et al., 2004).
• Mapping of EST sequences can be done with SNPs (Davis G et al., 2001).
• Identification of cloned genes with SNPs, a secondary application of SNPs (Jehan and Lakhanpaul, 2006).
Crop | Gene | Function | Trait associated with SNP | Utility of SNP
Barley (Hordeum vulgare) | β-Amylase gene | Degradation of starch | Enzyme thermostability | To select barley seedlings carrying the superior β-amylase allele
Wild barley (H. spontaneum) | Dhn1 & Dhn5 (dehydrin) | Adaptive response of the plant to environmental stress | Resistance to water stress | Water stress adaptation
Rice (Oryza sativa) | Wx (waxy) gene | Controls amylose synthesis by encoding the starch synthase enzyme | Amylose content | Development of new cultivars
Rice (Oryza sativa) | Sd-1 (semi-dwarfing) gene | Dwarfism | Dwarfism | Selection of sd-1 in breeding programmes
Wheat (Triticum aestivum) | Pinb (puroindoline b) | Thickens the coat | Grain hardness | Breeding programme
Wheat (Triticum aestivum) | Rht1 & Rht2 genes | Dwarfism | Dwarfism | Breeding programme
Soybean (Glycine max) | Rhg1 & Rhg4 | Soybean cyst nematode (SCN) resistance allele | SCN resistance | Breeding programme
Onion (Allium cepa) | SNP allele in plastosome | Responsible for CMS | Cytoplasmic male sterility and fertility | Development of CMS lines
Mustard (Brassica juncea) | FAE1 gene | Fatty acid elongase | Erucic acid content | Breeding programme
SNP identification in polyploids
Crop | Citation | No. of SNPs detected
Alfalfa | Li et al., 2012 | 872,384
Cotton | Byers et al., 2012 | 151,712
Cotton | Zhu et al., 2014 | 40,503
Peanut | Khera et al., 2013 | 8,486
Peanut | Zhou et al., 2014 | 1,765
Potato | Uitdewilligen et al., 2013 | 42,625
Rapeseed | Trick et al., 2009 | 41,593
Rapeseed | Hu et al., 2012 | 655
Rapeseed | Huang et al., 2013 | 892,803
Wheat | Allen, 2013 | 10,251
Wheat | Cavanagh et al., 2013 | 25,454
White clover | Nagy et al., 2013 | 208,854
• Approximately 18.9 million single nucleotide polymorphisms (SNPs) in rice were discovered by alignment to the reference genome of the temperate japonica variety Nipponbare.
• Phylogenetic analyses based on SNP data confirmed differentiation of the O. sativa gene pool into 5 varietal groups: indica, aus/boro, basmati/sadri, tropical japonica and temperate japonica.
Figure: The trends of crop sequencing toward breeding practice (Yang et al., 2015)
SNPs Discovery
• Two approaches have been adopted for the discovery of novel SNPs:
  - In vitro discovery (new sequence data are generated)
  - In silico methods (analysis of available sequence data)
In vitro approaches
• Non-sequencing based methods
• Sequencing based methods
• Resequencing based methods
Non-sequencing methods
Restriction based techniques
• RFLP (Restriction Fragment Length Polymorphism)
• CAPS (Cleaved Amplified Polymorphic Sequence)
• dCAPS (Derived Cleaved Amplified Polymorphic Sequence)
DNA conformation techniques
• SSCP (Single Strand Conformational Polymorphism)
• DGGE (Denaturing Gradient Gel Electrophoresis)
• TGGE (Temperature Gradient Gel Electrophoresis)
• Heteroduplex analysis
Chip based methods
• Use of probes for hybridization of the whole genome
TILLING
• Targeting Induced Local Lesions IN Genomes, using Cel I endonuclease
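Restriction based techniques such as CAPS detect a SNP when the base change creates or abolishes a restriction site, so the two alleles give different fragment lengths after digestion. The sketch below is only an illustration of that principle: the two 31-bp alleles are example sequences differing at one base, the enzyme is BamHI (recognition site GGATCC, cut after the first G), and the function name is hypothetical.

```python
def digest(sequence: str, site: str = "GGATCC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    cut_offset is the number of bases of the site left on the upstream fragment.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


allele_g = "GTTACGCCAATACAGGATCCAGGAGATTACC"  # carries the GGATCC site
allele_c = "GTTACGCCAATACAGCATCCAGGAGATTACC"  # G->C change abolishes the site

print(digest(allele_g))  # two fragments
print(digest(allele_c))  # uncut amplicon
```

Here the G allele is cut into fragments of 15 and 16 bp, while the C allele stays as a single 31-bp fragment, which is the length difference a gel would reveal.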
Sequencing based methods
• Locus-specific PCR amplification
• Alignment of available genomic sequences
• Whole genome shotgun method
• Overlapping regions of BACs and PACs
• Reduced Representation Shotgun (RRS)
Finding SNPs: Sequence-based SNP Mining
Random sequence overlap: SNP discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
(G/C SNP)
[Figure: sources of overlapping sequence for SNP discovery. Labels: genomic DNA (RRS library, shotgun overlap; BAC library, BAC overlap), mRNA (cDNA library, EST overlap), random shotgun (align to reference), DNA sequencing. > 11 million SNPs discovered; 5.6 million SNPs validated.]
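The overlap idea can be sketched by comparing the two aligned sequences shown above base by base. This is a minimal illustration, not a production SNP caller: it assumes the reads are already aligned and of equal length, and the function name is hypothetical.

```python
def find_mismatches(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for each difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]


read_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"

print(find_mismatches(read_1, read_2))  # [(16, 'G', 'C')]
```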
Resequencing methods
Pyrosequencing
• Sequencing by synthesis
• Primer + dNTP → Primer+1N + PPi
• APS + PPi → ATP (APS: adenosine phosphosulphate)
• Luciferin + ATP → oxyluciferin + light
Mass Array
• Based on MALDI-TOF MS (Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry)
• Based on the difference in molecular weight of the 4 nucleotides
In silico methods
• Software is used to discover SNPs from available sequence databases.
• Finding single-nucleotide differences manually across a large number of sequences is not feasible.
• Hence various software tools are used, e.g. Phred, PolyBayes, autoSNPdb, SNPserver, etc.
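What such software automates can be illustrated with a toy redundancy filter: in a set of aligned reads, a column is reported as a candidate SNP only when the second most common base is seen in at least two reads, so that a difference seen in a single read is treated as a likely sequencing error. This is a simplified sketch under that assumption and not the actual algorithm of Phred, PolyBayes, autoSNPdb or SNPserver; the function name, threshold and reads are hypothetical.

```python
from collections import Counter


def call_candidate_snps(aligned_reads: list[str],
                        min_redundancy: int = 2) -> list[tuple[int, str, str]]:
    """Return (1-based column, major base, minor base) for candidate SNP columns."""
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must be aligned to the same length")

    candidates = []
    for column in range(length):
        # Count only real bases; gaps and ambiguity codes are ignored.
        counts = Counter(read[column] for read in aligned_reads
                         if read[column] in "ACGT")
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_redundancy:
            candidates.append((column + 1, ranked[0][0], ranked[1][0]))
    return candidates


reads = [
    "ACGTTGCA",
    "ACGTTGCA",
    "ACGATGCA",
    "ACGATGCA",
    "ACGTTGCT",  # last base differs in one read only
]

print(call_candidate_snps(reads))  # [(4, 'T', 'A')]
```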
Seal et al., 2014
Methods for SNP identification (Ganal et al., 2009)
Method | Prerequisite | Current false discovery rate (%) | Specifics, limitations
EST sequence data | Large number of available EST sequences | 15-50 | Dependent on the expression level or need for normalized libraries; difficulties in the discrimination of orthologous from paralogous sequences; low sequence quality
Array analysis | Unigene sets based on EST sequences, array technology | >20 | Not all SNPs identified; large genomes require complexity reduction
Amplicon resequencing | Unigene sets based on EST sequences, amplification primers for many individual genes | <5 | High reliability but costly; detailed haplotype analysis possible; many lines can be compared; allele frequency data with pools of DNA
No genomic sequence, next-generation sequencing technologies | Novel sequencing technologies, complexity reduction methods, bioinformatic tools | 15-25 | Generates large amounts of data; costly bioinformatics; false discovery rate for genomes without full sequence is relatively high
Genomic sequence available, through either conventional or next-generation sequencing | Reference genome, bioinformatic tools | <5-10 | Small genomes can be fully sequenced and compared for SNPs; for large genomes targeted approaches will be necessary (e.g. exon capture and multiplex amplification)
SNP Genotyping
More than 30 techniques are available for SNP detection, e.g. molecular beacons, padlock probes, the Invader assay, etc.
All are based on a few basic principles:
• Hybridization
• Direct sequencing
• Allele-specific primer extension
• Single base extension
• Endonuclease cleavage / ligation
SNPs Genotyping platforms
• Illumina GoldenGate Assay
• Sequenom
• Affymetrix/GeneChip
• SNaPshot
• SNPlex
• TaqMan
• Dye terminator sequencing
• Pyrosequencing (454)
• Reversible-terminator sequencing (Solexa)
• Sequencing by ligation (ABI)
[Flowchart: from SNP discovery to application]
• Crop-specific SNP marker system development: non-sequencing methods, resequencing methods, sequencing methods, in silico methods
• Crop-specific SNP genotyping platform development: oligonucleotide ligation assay, Invader assay, molecular beacons, padlock probes
• Application of SNP genotyping: marker-assisted selection, gene tagging, fine mapping, association studies
Case study
Introduction
• Genome analysis in bread wheat poses substantial challenges.
• Single nucleotide polymorphisms (SNPs) represent the most frequent
type of genetic polymorphism and can therefore allow the development
of the highest density of molecular markers.
• Whole-genome Illumina paired read sequence data were generated from
16 Australian bread wheat varieties.
• After filtering to remove poor quality and clonal reads, a total of 13 642
million read pairs remained.
• Alignment of these read pairs to the wheat group 7 and 4AL chromosome
assemblies using strict parameters resulted in 3.05%, 3.76% and 3.43% of
read pairs mapping uniquely to chromosomes 7A, 7B and 7D, respectively.
• SNP calling using the SGSautoSNP pipeline predicted a total of 4 018 311
intervarietal SNPs.
Material and Methods
• SNP prediction
SNP prediction was performed using SGSautoSNP (Lorenc et al., 2012), with output in snp format for subsequent analysis and gff format for presentation on a GBrowse genome viewer at www.wheatgenome.info.
• SNP matrix production and transition/transversion ratio analysis
The snp files generated by SGSautoSNP were parsed using a custom Python script to generate the SNP matrix file. The transition/transversion ratio for each chromosome was calculated based on bins of 500 SNPs using VCFtools.
• SNP density and gene analysis
The SNP density plots for each chromosome were generated using a custom Python script that calculates relative density based on a window size of 50 000 bp.
Genes identified as being in low-SNP-density regions were compared with the Swissprot database using BLASTX (BLASTALL 2.2.6) with an E-value cut-off of 1e-5. For genes in low/high SNP density regions, the hit with the minimum E-value was reported with its UniProtKB entry ID and protein name.
• Validation
A total of 22 SNPs were selected from the three group 7 reference genomes for validation.
- PCR amplification of the 22 loci was performed using primers designed to bind to conserved sequence surrounding the SNPs.
- The purified PCR products were Sanger-sequenced using Big-Dye 3.1 (PerkinElmer, Waltham, MA), using forward and reverse PCR primers, and analysed using an ABI3730xl.
Results
• Whole-genome Illumina paired read sequence data: after filtering, 13 642 million read pairs remained.
• Alignment to the wheat group 7 and 4AL chromosome assemblies: 3.05%, 3.76% and 3.43% of read pairs mapped uniquely to chromosomes 7A, 7B and 7D, respectively.
• SNP calling using the SGSautoSNP pipeline: a total of 4 018 311 intervarietal SNPs.
• The majority of SNPs were identified on contigs which do not
form part of the syntenic builds and are predominantly within
intergenic regions.
Figure 2: Ts/Tv ratio across the 7A, 7B and 7D syntenic builds
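The Ts/Tv ratio plotted here is the number of transitions (A<->G, C<->T) divided by the number of transversions (all other base changes), computed per bin of consecutive SNPs. The study used VCFtools for this; the sketch below only illustrates the calculation, with hypothetical function names and a tiny example in place of the 500-SNP bins.

```python
# Unordered base pairs that count as transitions (purine<->purine, pyrimidine<->pyrimidine).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ts_tv_by_bin(snps: list[tuple[str, str]], bin_size: int = 500) -> list[float]:
    """Ts/Tv ratio for consecutive bins of `bin_size` SNPs.

    snps is a list of (reference base, alternative base) ordered by position.
    A bin without transversions yields float('inf').
    """
    ratios = []
    for start in range(0, len(snps), bin_size):
        chunk = snps[start:start + bin_size]
        ts = sum(is_transition(ref, alt) for ref, alt in chunk)
        tv = len(chunk) - ts
        ratios.append(ts / tv if tv else float("inf"))
    return ratios


example = [("A", "G"), ("C", "T"), ("A", "C"),
           ("G", "A"), ("G", "T"), ("T", "C")]
print(ts_tv_by_bin(example, bin_size=3))  # [2.0, 2.0]
```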
Figure: Phylogenetic relationships of 16 Australian wheat varieties based on SNP data obtained in this study.
Figure annotations: SNP variation is 146171; SNP variation is 968088; average no. of SNPs between varieties is 465278.
Figure 3: SNP density across the 7A, 7B and 7D syntenic builds.
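A density profile of this kind can be sketched by counting SNPs in fixed 50 000 bp windows. The authors' custom script is not shown in the slides, so the code below is an assumption-laden illustration: the function name is hypothetical, positions are taken as 1-based, and "relative density" is interpreted here as each window's count divided by the count in the densest window.

```python
from collections import Counter


def snp_density(positions: list[int], chromosome_length: int,
                window: int = 50_000) -> list[float]:
    """Relative SNP density per window (1.0 = densest window)."""
    n_windows = -(-chromosome_length // window)  # ceiling division
    counts = Counter((pos - 1) // window for pos in positions)
    peak = max(counts.values(), default=0)
    return [counts.get(i, 0) / peak if peak else 0.0
            for i in range(n_windows)]


# Made-up SNP positions on a 150 kb stretch.
positions = [10, 25_000, 49_999, 50_001, 120_000, 149_000, 149_500, 149_900]
print(snp_density(positions, chromosome_length=150_000))  # [0.75, 0.25, 1.0]
```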
• A total of 146 genes were predicted to be in low-SNP-density regions, representing 40, 27 and 79 genes on the A, B and D genomes, respectively.
- These genes include MADS box and Myb transcription factors, signal transduction pathway genes, a sodium transporter, an iron-responsive transcription factor, a potassium transporter, callose synthase, sucrose synthase and sugar transporters.
• A total of 14 genes were predicted to be in high-SNP-density regions, representing 10, 3 and 1 gene(s) on the A, B and D genomes, respectively.
- These genes include cellulose synthase, argonaute and ethylene response factors.
The SNPs from the recently published wheat Infinium array (Wang et al., 2014) were compared to those predicted by SGSautoSNP. A total of 850 SNPs were identified as having a match on the group 7 chromosomes at the same position as predicted in our study. Of these, 482 (57%) were classified as polymorphic single locus, 316 (37%) as polymorphic multilocus, while only 52 (6%) were monomorphic.
Conclusion
• This study has revealed a vast number of polymorphisms
occurring within the chromosome 7 homoeologues of
hexaploid wheat among elite Australian varieties.
• This resource is publicly available to assist additional
genetic analysis and breeding.
• Furthermore, observed patterns of SNPs across the
homoeologous group 7 chromosomes have provided insight
into the molecular consequences of the evolution and
selection that resulted in modern hexaploid wheat.
Summary
• SNPs are the markers of the future, occurring at high density in the genome.
• Although thousands of SNP markers are widely used in animal and human genome analysis, their use in plants is still in its infancy.
• SNP mining can provide a better understanding of crops at the gene level, support the detailed analysis of germplasm and ultimately enable the efficient management of genetic diversity within plant breeding at the whole-genome level.
Thank you

More Related Content

PPTX
Use of SNP-HapMaps in plant breeding
PPTX
Allele mining
PPTX
Association mapping
PPTX
Association mapping
PPT
Marker Assisted Gene Pyramiding for Disease Resistance in Rice
PDF
Omics for crop improvement (new)
PPTX
Genotyping by Sequencing
PPTX
Tilling and eco tilling
Use of SNP-HapMaps in plant breeding
Allele mining
Association mapping
Association mapping
Marker Assisted Gene Pyramiding for Disease Resistance in Rice
Omics for crop improvement (new)
Genotyping by Sequencing
Tilling and eco tilling

What's hot (20)

PPTX
TILLING & ECO-TILLING
PPTX
Single Nucleotide Polymorphism Genotyping Using Kompetitive Allele Specific ...
PPTX
Genomic selection
PPTX
Genotyping by sequencing
PDF
Genome wide association mapping
PPTX
Marker free transgenics: concept and approaches
PPT
Omics in plant breeding
PDF
Mapping and QTL
 
PPTX
Molecular markers: Outlook
PPTX
Credit seminar on rice genomics crrected
PPT
Diversity Array technology
PPTX
Bioinformatics intervention in crop improvement
PPTX
Balanced tertiary trismoics - Hybrid seed production
PPTX
PDF
SNP Genotyping Technologies
PPTX
MAGIC POPULATION
PPTX
Molecular Markers
PDF
MAGIC populations and its role in crop improvement
PPTX
Magic population
PPTX
Association mapping
TILLING & ECO-TILLING
Single Nucleotide Polymorphism Genotyping Using Kompetitive Allele Specific ...
Genomic selection
Genotyping by sequencing
Genome wide association mapping
Marker free transgenics: concept and approaches
Omics in plant breeding
Mapping and QTL
 
Molecular markers: Outlook
Credit seminar on rice genomics crrected
Diversity Array technology
Bioinformatics intervention in crop improvement
Balanced tertiary trismoics - Hybrid seed production
SNP Genotyping Technologies
MAGIC POPULATION
Molecular Markers
MAGIC populations and its role in crop improvement
Magic population
Association mapping
Ad

Viewers also liked (20)

PDF
Single Nucleotide Polymorphism Analysis (SNPs)
PPTX
The Wheat Genome
PPTX
Genotyping in Breeding programs
PPT
PPTX
Association mapping
PPTX
Comparative genomics presentation
PPTX
What is comparative genomics
PPSX
Functional genomics
PPTX
Comparative genomics
PPTX
Types of genomics ppt
PPTX
Association mapping in plants
PPTX
Homologous Recombination (HR)
PPTX
Single nucleotide polymorphism
PPTX
Different pcr techniques and their application
PPTX
MENDEL; 150 years on
PDF
MixSIH: a mixture model for single individual haplotyping
PPT
Cisgenics- Clean marker assisted Technology
PPTX
cisgenesis and intragenesis
PDF
Comparative Genomics for Marker Development in Cassava
Single Nucleotide Polymorphism Analysis (SNPs)
The Wheat Genome
Genotyping in Breeding programs
Association mapping
Comparative genomics presentation
What is comparative genomics
Functional genomics
Comparative genomics
Types of genomics ppt
Association mapping in plants
Homologous Recombination (HR)
Single nucleotide polymorphism
Different pcr techniques and their application
MENDEL; 150 years on
MixSIH: a mixture model for single individual haplotyping
Cisgenics- Clean marker assisted Technology
cisgenesis and intragenesis
Comparative Genomics for Marker Development in Cassava
Ad

Similar to SNp mining in crops (20)

PPTX
Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...
PDF
Molecular markers types and applications
 
PPT
DNA Markers Techniques for Plant Varietal Identification
PPTX
Genome in a bottle april 30 2015 hvp Leiden
PPT
15 molecular markers techniques
PPTX
Molecular markers by tahura mariyam ansari
PDF
DNA markers in plant breeding hhhhhhhhhhh
PDF
Development of SSR markers in mungbean
PPTX
Present status and recent developments on available molecular marker.pptx
PPTX
Marker and marker assisted breeding in flower crops
PDF
Application of Genetic Analyzer in AFLP Technique
 
PPTX
2013 Cornell's Plant Breeding and Genetic Seminar Series
PPTX
Catalyzing Plant Science Research with RNA-seq
PPTX
PDF
MGG2003-cDNA-AFLP
PPTX
Next Generation Sequencing
PPTX
Molecular detection of food borne pathogens-presentation
PPTX
Allele mining in crop improvement
PPTX
Snp genotyping
PPTX
Nextgenerationsequencing ngs 131218163555-phpapp02
Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...
Molecular markers types and applications
 
DNA Markers Techniques for Plant Varietal Identification
Genome in a bottle april 30 2015 hvp Leiden
15 molecular markers techniques
Molecular markers by tahura mariyam ansari
DNA markers in plant breeding hhhhhhhhhhh
Development of SSR markers in mungbean
Present status and recent developments on available molecular marker.pptx
Marker and marker assisted breeding in flower crops
Application of Genetic Analyzer in AFLP Technique
 
2013 Cornell's Plant Breeding and Genetic Seminar Series
Catalyzing Plant Science Research with RNA-seq
MGG2003-cDNA-AFLP
Next Generation Sequencing
Molecular detection of food borne pathogens-presentation
Allele mining in crop improvement
Snp genotyping
Nextgenerationsequencing ngs 131218163555-phpapp02

Recently uploaded (20)

PDF
Metabolic Acidosis. pa,oakw,llwla,wwwwqw
PDF
Sumer, Akkad and the mythology of the Toradja Sa'dan.pdf
PDF
Worlds Next Door: A Candidate Giant Planet Imaged in the Habitable Zone of ↵ ...
PDF
Communicating Health Policies to Diverse Populations (www.kiu.ac.ug)
PDF
The Future of Telehealth: Engineering New Platforms for Care (www.kiu.ac.ug)
PPT
THE CELL THEORY AND ITS FUNDAMENTALS AND USE
PPTX
LIPID & AMINO ACID METABOLISM UNIT-III, B PHARM II SEMESTER
PPTX
AP CHEM 1.2 Mass spectroscopy of elements
PPTX
bone as a tissue presentation micky.pptx
PPTX
Understanding the Circulatory System……..
PPTX
02_OpenStax_Chemistry_Slides_20180406 copy.pptx
PDF
CuO Nps photocatalysts 15156456551564161
PPTX
Toxicity Studies in Drug Development Ensuring Safety, Efficacy, and Global Co...
PDF
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
PPTX
congenital heart diseases of burao university.pptx
PDF
Sustainable Biology- Scopes, Principles of sustainiability, Sustainable Resou...
PPTX
Substance Disorders- part different drugs change body
PPT
Enhancing Laboratory Quality Through ISO 15189 Compliance
PDF
Science Form five needed shit SCIENEce so
PPTX
2currentelectricity1-201006102815 (1).pptx
Metabolic Acidosis. pa,oakw,llwla,wwwwqw
Sumer, Akkad and the mythology of the Toradja Sa'dan.pdf
Worlds Next Door: A Candidate Giant Planet Imaged in the Habitable Zone of ↵ ...
Communicating Health Policies to Diverse Populations (www.kiu.ac.ug)
The Future of Telehealth: Engineering New Platforms for Care (www.kiu.ac.ug)
THE CELL THEORY AND ITS FUNDAMENTALS AND USE
LIPID & AMINO ACID METABOLISM UNIT-III, B PHARM II SEMESTER
AP CHEM 1.2 Mass spectroscopy of elements
bone as a tissue presentation micky.pptx
Understanding the Circulatory System……..
02_OpenStax_Chemistry_Slides_20180406 copy.pptx
CuO Nps photocatalysts 15156456551564161
Toxicity Studies in Drug Development Ensuring Safety, Efficacy, and Global Co...
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
congenital heart diseases of burao university.pptx
Sustainable Biology- Scopes, Principles of sustainiability, Sustainable Resou...
Substance Disorders- part different drugs change body
Enhancing Laboratory Quality Through ISO 15189 Compliance
Science Form five needed shit SCIENEce so
2currentelectricity1-201006102815 (1).pptx

SNp mining in crops

  • 1. SNP Mining in Cereal Crops Presented By: SAURABH PANDEY PALB-3252 Sr. M.Sc.(Ag.)
  • 2. Seminar flow • Introduction • What is SNPs -Detection methods, -Techniques used for detection • Different software used • Crop lists where SNPs were detected • Improvement works done • Case Study 09-05-2015 Department of Plant Biotechnology 2
  • 4. Scenario of Molecular Markers 09-05-2015 Department of Plant Biotechnology 4
  • 5. Molecular markers First Generation Second Generation Third Generation •RFLP- Hybridization based •RAPD-PCR based •AFLPs •SSRs •ESTs-used in functional genomics •SNPs 09-05-2015 Department of Plant Biotechnology 5
  • 6. Single Nucleotide Polymorphisms DNA sequence variations that occur when a single nucleotide (A, T, C or G) in the genome sequence is altered. SNP: Single DNA base variation found >1% Mutation: Single DNA base variation found <1% SNPs 09-05-2015 Department of Plant Biotechnology 6
  • 7. Coding SNPs cSNP Positions that fall within the coding regions of genes Regulatory SNPs rSNP Positions that fall in regulatory regions of genes Synonymous SNPs sSNP Positions in exons that do not change the codon to substitute an amino acid Non-synonymous SNPs nsSNP Positions that incur an amino acid substitution Intronic SNPs iSNP Positions that fall within introns SNP functional classes Mooney et al, 2005 09-05-2015 Department of Plant Biotechnology 7
  • 8. SNPs Importance Abundant in the genome and frequency of SNPs is higher i.e.  Maize has 1SNP/31bp in non-coding, 1SNP/124bp in coding region (Ching et al, 2002)  In rice whole genome shotgun sequences for japonica and indica 3-27 SNP/kb In Soybean 280SNPs/76.3kb of genomic DNA Automation is easy. Due to biallelic nature of these molecular markers and these are directly based on DNA sequence. • Millions of sequences are available in public database for several crops i.e. wheat has 132,000 ESTs, • Till now no. of SNPs has been discovered i.e. Barley have 29447 SNPs , Rice 64837 SNPs (Yu et.al, 2005) (Zhu et al,2003) 09-05-2015 Department of Plant Biotechnology 8
  • 9. SNPs Application • Comparative mapping is performed using datbases. • Linkage Disequilibrium based Association Studies- eg. Arabidopsis thaliana • Marker Assisted Selection(MAS)- used in soybean by identification of SNP markers with GmNARK gene. Gene indicates hypernodulating mutation (Kim et al, 2005) • Genetic diversity studies- In maize using SNPs at 21 loci in chromosome1 (Tenalillon et al, 2005) • High resolution genetic mapping -5 SNPs are genetically mapped in melon (Morales et al, 2004) • Mapping of EST sequences can be done with SNPs(Davis G et al, 2001) • Identification of cloned genes with SNPs- Secondary application of SNPs (Jehan and Lakhanpaul, 2006) 09-05-2015 Department of Plant Biotechnology 9
  • 10. Crop Gene function Trait associated with SNP Utility of SNP Barley (Hordeum vulgare) B-Amylase gene Degradation of starch Enzyme thermostability To select barley seedling carying superior allele of B-Amylase Wild Barley (H. spontaneum) Dhn1 & Dhn5 (Dehydrin) Adaptive response of plant to enviromental stress Resistance to water stress For water stress adaptation Rice (Oryza sativa) 1) Wx (waxy) gene 2) Sd-1(semi- dwarfing )gene 1) Control amylose synthesis by coding starch synthase enyme 2) Dwarfism 1) Amylose content 2) Dwarfism 1) For development of new cultivar 2) Selection of sd-1 in breeding programme Wheat (Triticum aestivum ) 1) Pin b (Puroindolin b) 2) Rht 1 & Rht 2 gene 1) Thicken the coat 2) Dwarfism 1) Grain hardiness 2) Dwarfism Breeding program Soybean (Glycine max) Rhg 1& Rhg 4 Soyabean Cyst nematode resistance allele SCN resistance Breeding programme Onion (Allium cepa) SNP allele in Plastosome Responsible for CMS Cytoplasmic male sterility and fertility For development of CMS lines Musturd (Brassica juncea ) FAE 1 gene Fatty acid elongase Erucic acid content Breeding programme 09-05-2015 Department of Plant Biotechnology 10
  • 11. CROP Citation No.of SNPs detected Alfalfa Li et al 2012 872.384 Cotton Bayers et al 2012 Zhu et al 2014 151,712 40,503 Peanut Khera et al 2013 Zhou et al 2014 8,486 1,765 Potato Uitdewilligen et al., 2013 42,625 Rapeseed Trick et al 2009 Hu et al 2012 Huang et al 2013 41,593 655 892,803 Wheat Allen, 2013 Cavanagh et al., 2013 10,251 25,454 White Clover Nagy et al.,2013 208,854 SNP identification in polyploids 09-05-2015 Department of Plant Biotechnology 11
  • 12.  Approximately 18.9 million single nucleotide polymorphisms (SNPs) in rice were discovered when aligned to the reference genome of the temperate japonica variety, Nipponbare.  Phylogenetic analyses based on SNP data confirmed differentiation of the O. sativa gene pool into 5 varietal groups – indica, aus/boro, basmati/sadri, tropical japonica and temperate japonica. 09-05-2015 Department of Plant Biotechnology 12
  • 13. The trends of crop sequencing toward breeding practice Yang et al., 2015 09-05-2015 Department of Plant Biotechnology 13
  • 14. SNPs Discovery • Two approaches have been adopted for the discovery of Novel SNPs  In- vitro Discovery (new sequence data is generated)  In-silico methods (analysis of available sequence data) 09-05-2015 Department of Plant Biotechnology 14
  • 15. In vitro approaches Non sequencing based methods Sequencing based methods Resequencing based methods 09-05-2015 Department of Plant Biotechnology 15
  • 16. Non sequencing methods Restriction based techniques DNA conformation technique RFLP (Restriction Fragment Length Polymorphism) CAPS(Cleaved Amplified Polymorphic Sequence) dCAPS(Derived Cleaved Amplified Polymorphic Sequence Chip Based methods Use of probes for hybridization of whole genome SSCP (Single Strand Conformational Polymorphism) DGGE (Denaturing Gradient Gel Electrophoresis) TGGE (Temperature Gradient Gel Electrophoresis) Heteroduplex Analysis TILLING Target Induced Local Lesion IN Genome- Cel I Endonuclease 09-05-2015 Department of Plant Biotechnology 16
  • 17. Based on Sequencing methods Locus specific PCR amplification Alignment of available genomic sequences Whole genome shotgun method Overlapping region BACs and PACs Reduced Representation Shotgun(RRS) 09-05-2015 Department of Plant Biotechnology 17
  • 18. Finding SNPs: Sequence-based SNP Mining RANDOM Sequence Overlap - SNP Discovery GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC Genomic RRS Library Shotgun Overlap BAC Library BAC Overlap DNA SEQUENCING mRNA cDNA Library EST Overlap Random Shotgun Align to Reference > 11 Million SNPs G C Validated - 5..6 MILLON SNPS
  • 19. Resequencing methods Pyrosequencing Mass Array •Sequence by synthesis •Primer +dNTP  Primer+1N+ PPi •APS + PPi  ATP Adenosine Phospho Sulphate •Luciferin + ATP  Oxyluciferien + light • Based on MALDI-TOF MS Matrix Assisted Laser Desorption/Ionization –Time of Flight Mass Spectrophotometery •Based on variation in molecular wt. of 4 nucleotides 09-05-2015 Department of Plant Biotechnology 19
  • 20. In silico methods  Use of software for discovery of SNP from available sequence database Manually is not possible, to find out SNP or single nucleotide difference large no. of sequences Hence various software i.e. Phred, PolyBayes, autoSNPdb, SNPserver, etc 09-05-2015 Department of Plant Biotechnology 20
  • 21. 09-05-2015 Department of Plant Biotechnology 21
  • 22. Seal et al., 2014 09-05-2015 Department of Plant Biotechnology 22
  • 23. 09-05-2015 Department of Plant Biotechnology 23
  • 24. 09-05-2015 Department of Plant Biotechnology 24
  • 25. 09-05-2015 Department of Plant Biotechnology 25
  • 26. Method for SNP Identification (Ganal et al.,2009) Prerequisite Current false discovery rate(%) Specifics, Limitations EST Sequence data Large number of available EST- sequences 15-50 Dependent on the expression level or need for normalized libraries, difficulties in the discrimination of orthologous from paralogous sequences, low sequence quality Array analysis Unigene sets based on EST-sequences, array technology >20 Not all SNPs identified, large genomes require complexity reduction Amplicon Resequencing Unigene sets based on EST-sequences, amplification primers for many individual genes <5 High reliability but costly, detailed haplotype analysis possible, many lines can be compared, allele frequency data with pools of DNA No Genomic Sequence and Next Generation Sequencing Technologies Novel sequencing technologies, complexity reduction methods, bioinformatic tools 15-25 Generates large amounts of data, costly bioinformatics, false discovery rate for genomes without full sequence is relatively high Genomic Sequence is available either through conventional sequencing or next generation sequencing Reference genome, bioinformatic tools <5-10 Small genomes can be fully sequenced and compared for SNPs, for large genomes targeted approaches will be necessary (e.g. exon capture and multiplex amplification) 09-05-2015 Department of Plant Biotechnology 26
  • 27. SNP Genotyping For SNPs detection there are more than 30 techniques available. i.e. Molecular beacons, Padlock probe, Invader assay etc All are based on basic principles i.e. Hybridization Direct Sequencing Allele specific primer extension Single base extension Endonuclease Cleavage / Ligation 09-05-2015 Department of Plant Biotechnology 27
  • 28. SNP Genotyping Platforms
        • Illumina GoldenGate Assay
        • Sequenom
        • Affymetrix/GeneChip
        • SNaPshot
        • SNPlex
        • TaqMan
        • Dye terminator sequencing
        • Pyrosequencing (454)
        • Reversible terminator sequencing (Solexa)
        • Sequencing by ligation (ABI)
  • 29. Crop-specific SNP marker system development
        • Non-sequencing methods
        • Resequencing methods
        • Sequencing methods
        • In silico methods
        Crop-specific SNP genotyping platform development
        • Oligonucleotide ligation assay
        • Invader assay
        • Molecular beacons
        • Padlock probes
        Application of SNP genotyping
        • Marker-assisted selection
        • Gene tagging
        • Fine mapping
        • Association studies
  • 30. Case study
  • 31. Introduction
        • Genome analysis in bread wheat poses substantial challenges.
        • Single nucleotide polymorphisms (SNPs) represent the most frequent type of genetic polymorphism and can therefore allow the development of the highest density of molecular markers.
        • Whole-genome Illumina paired read sequence data were generated from 16 Australian bread wheat varieties.
        • After filtering to remove poor quality and clonal reads, a total of 13 642 million read pairs remained.
        • Alignment of these read pairs to the wheat group 7 and 4AL chromosome assemblies using strict parameters resulted in 3.05%, 3.76% and 3.43% of read pairs mapping uniquely to chromosomes 7A, 7B and 7D, respectively.
        • SNP calling using the SGSautoSNP pipeline predicted a total of 4 018 311 intervarietal SNPs.
  • 32. Material and Methods
        • SNP prediction: SNP prediction was performed using SGSautoSNP (Lorenc et al., 2012), with output in snp format for subsequent analysis and gff format for presentation on a GBrowse genome viewer at www.wheatgenome.info.
        • SNP matrix production and transition/transversion ratio analysis: The snp files generated by SGSautoSNP were parsed using a custom Python script to generate the SNP matrix file. The transition/transversion ratio for each chromosome was calculated based on bins of 500 SNPs using VCFtools.
        • SNP density and gene analysis: The SNP density plots for each chromosome were generated using a custom Python script that calculates relative density based on a window size of 50 000 bp.
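The two binning steps on this slide can be illustrated with a minimal sketch. It is not the authors' script and does not call VCFtools: it is a plain-Python stand-in that assumes SNPs are already loaded as position-sorted `(position, allele1, allele2)` tuples (a hypothetical in-memory form, not the SGSautoSNP snp format), and it assumes "relative density" means each window's SNP count divided by the count of the densest window. The function names `ts_tv_per_bin` and `snp_density` are illustrative.

```python
from collections import Counter

# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T);
# every other single-base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_per_bin(snps, bin_size=500):
    """Return one Ts/Tv ratio per consecutive bin of `bin_size` SNPs.

    `snps` is a position-sorted list of (position, allele1, allele2) tuples.
    The last bin may hold fewer than `bin_size` SNPs. A bin without any
    transversion yields None rather than dividing by zero.
    """
    ratios = []
    for start in range(0, len(snps), bin_size):
        chunk = snps[start:start + bin_size]
        ts = sum((a.upper(), b.upper()) in TRANSITIONS for _, a, b in chunk)
        tv = len(chunk) - ts
        ratios.append(ts / tv if tv else None)
    return ratios


def snp_density(positions, window=50_000):
    """Count SNPs per fixed-size window and scale by the densest window.

    `positions` are 1-based coordinates on one chromosome. Returns a dict
    {window_start: relative_density}; windows without SNPs are omitted.
    Scaling by the maximum count is an assumption made for this sketch.
    """
    counts = Counter((pos - 1) // window * window + 1 for pos in positions)
    peak = max(counts.values(), default=0)
    return {start: n / peak for start, n in sorted(counts.items())}
```

Binning by SNP count (500 SNPs per bin) keeps the sample size behind each Ts/Tv ratio constant, whereas the fixed 50 000 bp windows used for density keep the genomic span constant; that is why the two statistics use different bin definitions.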
  • 33. Material and Methods (continued)
        • Gene analysis: Genes identified as being in low-SNP-density regions were compared with the Swissprot database using BLASTX (BLASTALL 2.2.6) with an E value cut-off of 1e-5. Genes in low- and high-SNP-density regions were annotated with the UniProtKB entry ID and protein name of the hit with the minimum E value.
        • Validation: A total of 22 SNPs were selected from the three group 7 reference genomes for validation.
          • PCR amplification of the 22 loci was performed using primers designed to bind to conserved sequence surrounding the SNPs.
          • The purified PCR products were Sanger-sequenced using Big-Dye 3.1 (PerkinElmer, Waltham, MA), using forward and reverse PCR primers, and analysed using an ABI3730xl.
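The minimum-E-value selection described on this slide could look roughly like the sketch below. It assumes the BLASTX results were saved in the standard 12-column tab-separated layout (query, subject, then E value in column 11); the slides do not state which output format was used, so that is an assumption, as are the function name `best_hits` and the choice to treat the 1e-5 cut-off as inclusive.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Keep, for every query gene, the hit with the smallest E value.

    `tabular_lines` is an iterable of tab-separated BLAST lines in the
    12-column layout: query, subject, % identity, alignment length,
    mismatches, gap opens, q.start, q.end, s.start, s.end, E value, bit score.
    Hits with an E value above `max_evalue` are discarded.
    Returns {query: (subject, evalue)}.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject, raw = fields[0], fields[1], fields[10]
        # Defensive: tolerate values written without a mantissa, such as "e-105".
        evalue = float("1" + raw if raw.startswith("e") else raw)
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

The subject identifiers kept here would then be looked up to obtain the UniProtKB entry ID and protein name reported for each gene.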
  • 34. Results
        • Whole-genome Illumina paired read sequence data: after filtering, 13 642 million read pairs remained.
        • Alignment to the wheat group 7 and 4AL chromosome assemblies: 3.05%, 3.76% and 3.43% of read pairs mapped uniquely to chromosomes 7A, 7B and 7D, respectively.
        • SNP calling using the SGSautoSNP pipeline: a total of 4 018 311 intervarietal SNPs.
  • 35. • The majority of SNPs were identified on contigs which do not form part of the syntenic builds and are predominantly within intergenic regions.
  • 36. Figure 2: Ts/Tv ratio across the 7A, 7B and 7D syntenic builds
  • 37. Phylogenetic relationships of 16 Australian wheat varieties based on SNP data obtained in this study. SNP variation values annotated on the figure: 146 171 and 968 088. Average number of SNPs between varieties: 465 278.
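Between-variety SNP counts of the kind quoted on this slide can be derived from a SNP matrix by counting, for each pair of varieties, the positions where their alleles differ. The sketch below is illustrative only: the dictionary layout, the variety labels `V1` to `V3` in the usage note, the use of `None` for a missing call, and the rule of skipping positions where either variety has no call are all assumptions, not details taken from the study.

```python
from itertools import combinations


def pairwise_snp_counts(matrix):
    """Count differing SNP positions for every pair of varieties.

    `matrix` maps a variety name to its list of allele calls, one per SNP
    position, with None marking a missing call. All lists must have the
    same length. Positions where either variety lacks a call are skipped.
    Returns {(variety_a, variety_b): number_of_differences}.
    """
    counts = {}
    for a, b in combinations(sorted(matrix), 2):
        counts[(a, b)] = sum(
            x != y
            for x, y in zip(matrix[a], matrix[b])
            if x is not None and y is not None
        )
    return counts


def average_pairwise(counts):
    """Mean number of differences over all variety pairs (0.0 if no pairs)."""
    return sum(counts.values()) / len(counts) if counts else 0.0
```

For example, `pairwise_snp_counts({"V1": ["A", "C"], "V2": ["A", "T"], "V3": ["G", "T"]})` gives one count per pair of the three hypothetical varieties, and the same pairwise counts can serve as a simple distance measure for clustering varieties.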
  • 38. Figure 3: SNP density across the 7A, 7B and 7D syntenic builds.
  • 39. • A total of 146 genes were predicted to be in low-SNP-density regions, representing 40, 27 and 79 genes on the A, B and D genomes, respectively.
          • These genes include MADS box and Myb transcription factors, signal transduction pathway genes, a sodium transporter, an iron-responsive transcription factor, a potassium transporter, callose synthase, sucrose synthase and sugar transporters.
        • A total of 14 genes were predicted to be in high-SNP-density regions, representing 10, 3 and 1 gene(s) on the A, B and D genomes, respectively.
          • These genes include cellulose synthase, argonaute and ethylene response factors.
  • 40. The SNPs from the recently published wheat Infinium array (Wang et al., 2014) were compared to those predicted by SGSautoSNP. A total of 850 array SNPs matched the group 7 chromosomes at the same position as predicted in this study. Of these, 482 (57%) were classified as polymorphic single locus, 316 (37%) as polymorphic multilocus, and only 52 (6%) as monomorphic.
  • 41. Conclusion
        • This study has revealed a vast number of polymorphisms occurring within the chromosome 7 homoeologues of hexaploid wheat among elite Australian varieties.
        • This resource is publicly available to assist additional genetic analysis and breeding.
        • Furthermore, observed patterns of SNPs across the homoeologous group 7 chromosomes have provided insight into the molecular consequences of the evolution and selection that resulted in modern hexaploid wheat.
  • 42. Summary
        • SNPs are the markers of the future, occurring at high density in the genome.
        • Although thousands of SNP markers are widely used in animal and human genome analysis, their use in plants is still in its infancy.
        • SNP mining can provide a better understanding of crops at the gene level, support the detailed analysis of germplasm and ultimately enable the efficient management of genetic diversity in plant breeding at the whole-genome level.
  • 43. Thank you