The Smith-Waterman algorithm

                    Dr Avril Coghlan
                   alc@sanger.ac.uk

Note: this talk contains animations which can only be seen by
downloading and using ‘View Slide show’ in Powerpoint
Global versus Local Alignment
• A global alignment covers the entire lengths of the
  sequences involved
  The Needleman-Wunsch algorithm finds the best global alignment
  between 2 sequences
• A local alignment only covers parts of the sequences
  The Smith-Waterman algorithm finds the best local alignment
  between 2 sequences


  Global alignment       Q K E S G P S S S Y C
                         |   | | |           |
                       V Q Q E S G L V R T T C
  Local alignment              E S G
                               | | |
                               E S G
Local alignment
• The concept of ‘local alignment’ was introduced by
  Smith & Waterman in 1981
• A local alignment of 2 sequences is an alignment
  between parts of the 2 sequences
  Two proteins may share one stretch of high sequence
  similarity, but be very dissimilar outside that region
  A global (N-W) alignment of such sequences would have:
   (i) lots of matches in the region of high sequence similarity
  (ii) lots of mismatches & gaps (insertions/deletions) outside the region
          of similarity
  It makes sense to find the best local alignment instead
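A rough way to see the difference is to score the two sequences from the "Global versus Local Alignment" slide both ways. The sketch below is only illustrative: the function names are made up here, and the +2 / -1 / -2 scoring is an assumption borrowed from the nucleotide example later in this talk (real protein alignments normally use a substitution matrix instead).

```python
def nw_score(s1, s2, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch: score of the best global alignment."""
    prev = [j * gap for j in range(len(s2) + 1)]
    for i in range(1, len(s1) + 1):
        curr = [i * gap]
        for j in range(1, len(s2) + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sw_score(s1, s2, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: score of the best local alignment."""
    best = 0
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        curr = [0]
        for j in range(1, len(s2) + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            curr.append(max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr
    return best


a, b = "QKESGPSSSYC", "VQQESGLVRTTC"
print("global score:", nw_score(a, b))
print("local score: ", sw_score(a, b))
```

With the same scoring scheme the local score can never be lower than the global one, because the full-length alignment is itself one of the candidate local alignments.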
Real data: fruitfly & human Eyeless
                    • This is a global
                      alignment of human
                      & fruitfly Eyeless

                     Do you think it’s
                     sensible to make a
                     global alignment of
                     these two sequences?
Real data: fruitfly & human Eyeless
                     There are 2 short
                     regions of high
                     similarity

                     Outside those regions,
                     there are many
                     mismatches and gaps

                     It might be more
                     sensible to make local
                     alignments of one or
                     both of the regions of
                     high similarity
Real data: fruitfly & human Eyeless
                     • This is a local
                       alignment of human
                       & fruitfly Eyeless

                       What parts of the
                       sequences were
                       used in the local
                       alignment?
The Smith-Waterman algorithm
• S-W is mathematically proven to find the best
  (highest-scoring) local alignment of 2 sequences
  The best local alignment is the best alignment of all possible
  subsequences (parts) of sequences S1 and S2
  The 0th row and 0th column of T are first filled with zeroes
  The recurrence relation used to fill table T is:
  $$
  T(i, j) = \max \begin{cases}
    T(i-1, j-1) + \sigma(S_1(i), S_2(j)) \\
    T(i-1, j) + \text{gap penalty} \\
    T(i, j-1) + \text{gap penalty} \\
    0
  \end{cases}
  $$
  The 0 is a 4th possibility (unlike N-W)
  The traceback starts at the highest scoring cell in the matrix T, and travels
  up/left while the score is still positive
  (While in N-W, traceback starts at the bottom right, & ends at the top
        left, which ensures it’s a global alignment)
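The table-filling step can be sketched in Python as follows. This is a minimal sketch, assuming σ is a simple match/mismatch score and the gap penalty is a single negative number per gap position; the names `sw_fill` and `best_cell` are illustrative and do not come from the slides.

```python
def sw_fill(s1, s2, match, mismatch, gap):
    """Fill the Smith-Waterman table T (s1 along the rows, s2 along the columns)."""
    n, m = len(s1), len(s2)
    # Row 0 and column 0 stay at zero.
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            T[i][j] = max(
                T[i - 1][j - 1] + sigma,  # S1(i) aligned with S2(j)
                T[i - 1][j] + gap,        # S1(i) opposite a gap
                T[i][j - 1] + gap,        # S2(j) opposite a gap
                0,                        # the 4th possibility: start afresh
            )
    return T


def best_cell(T):
    """Return (score, i, j) of the highest-scoring cell, where traceback starts."""
    return max((T[i][j], i, j) for i in range(len(T)) for j in range(len(T[0])))


T = sw_fill("TAG", "AG", match=2, mismatch=-1, gap=-2)
print(best_cell(T))  # (4, 3, 2)
```

If several cells tie for the highest score, `best_cell` as written picks the one with the largest row and column index; the slides do not say how ties should be broken.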
• e.g., to find the best local alignment of sequences
  “ACCTAAGG” and “GGCTCAATCA”, using +2 for a
  match, -1 for a mismatch, and -2 for a gap:
  We first make matrix T (as in N-W):
  The 0th row and 0th column of T are filled with zeroes
  The recurrence relation is then used to fill the matrix T
                     G   G   C   T   C   A   A   T   C   A
                0    0   0   0   0   0   0   0   0   0   0
            A   0
            C   0
            C   0
            T   0
            A   0
            A   0
            G   0
            G   0
We first calculate T(1,1) using the recurrence relation:
  $$
  T(1, 1) = \max \begin{cases}
    T(0, 0) + \sigma(S_1(1), S_2(1)) = 0 - 1 = -1 \\
    T(0, 1) + \text{gap penalty} = 0 - 2 = -2 \\
    T(1, 0) + \text{gap penalty} = 0 - 2 = -2 \\
    0
  \end{cases}
  $$
    The maximum value is 0, so we set T(1,1) to 0
                    G   G   C   T   C   A   A   T   C   A
                0   0   0   0   0   0   0   0   0   0   0
            A   0   0
            C   0
            C   0
            T   0
            A   0
            A   0
            G   0
            G   0
    We next calculate T(2,1)…
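The arithmetic for this first cell can be checked with a few lines of Python. This is just a sketch of the single step above; the variable names are illustrative.

```python
s1, s2 = "ACCTAAGG", "GGCTCAATCA"
match, mismatch, gap = 2, -1, -2

# Row 0 and column 0 are zero, so all three neighbours of T(1,1) are 0.
diag_cell = up_cell = left_cell = 0

# S1(1) is 'A' and S2(1) is 'G', so sigma is the mismatch score.
sigma = match if s1[0] == s2[0] else mismatch

candidates = {
    "diagonal": diag_cell + sigma,  # 0 - 1 = -1
    "up": up_cell + gap,            # 0 - 2 = -2
    "left": left_cell + gap,        # 0 - 2 = -2
    "zero": 0,
}
t11 = max(candidates.values())
print(candidates, "-> T(1,1) =", t11)
```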
You fill in the whole of T, recording the previous cell (if any) used
to calculate the value of each T(i, j):
                    G   G   C   T   C   A   A   T   C   A
                0   0   0   0   0   0   0   0   0   0   0
            A   0   0   0   0   0   0   2   2   0   0   2
            C   0   0   0   2   0   2   0   1   1   2   0
            C   0   0   0   2   1   2   1   0   0   3   1
            T   0   0   0   0   4   2   1   0   2   1   2
            A   0   0   0   0   2   3   4   3   1   1   3
            A   0   0   0   0   0   1   5   6   4   2   3
            G   0   2   2   0   0   0   3   4   5   3   1
            G   0   2   4   2   0   0   1   2   3   4   2

You work out the best local alignment from the traceback (just like in N-W):
                             C T C A A
                             | |   | |
                             C T - A A
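The fill and traceback together can be sketched as one Python function. This is a minimal sketch under the same assumptions as the worked example (match/mismatch scoring, one fixed penalty per gap position). Instead of storing the previous cell for each T(i, j), it re-derives the move during traceback by checking which neighbour reproduces the cell's score; the function name is illustrative.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned part of s2, aligned part of s1)."""
    n, m = len(s1), len(s2)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            T[i][j] = max(0, T[i - 1][j - 1] + sigma,
                          T[i - 1][j] + gap, T[i][j - 1] + gap)

    # Traceback starts at the highest-scoring cell.
    score, i, j = max((T[r][c], r, c)
                      for r in range(n + 1) for c in range(m + 1))
    top, bottom = [], []
    # Travel up/left while the score is still positive.
    while T[i][j] > 0:
        sigma = match if s1[i - 1] == s2[j - 1] else mismatch
        if T[i][j] == T[i - 1][j - 1] + sigma:   # diagonal move
            top.append(s2[j - 1])
            bottom.append(s1[i - 1])
            i, j = i - 1, j - 1
        elif T[i][j] == T[i][j - 1] + gap:       # left move: gap in s1
            top.append(s2[j - 1])
            bottom.append("-")
            j -= 1
        else:                                    # up move: gap in s2
            top.append("-")
            bottom.append(s1[i - 1])
            i -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = smith_waterman("ACCTAAGG", "GGCTCAATCA")
print(score)   # 6
print(top)     # CTCAA
print(bottom)  # CT-AA
```

When more than one move explains a cell's score, this sketch prefers the diagonal, then left, then up; other tie-breaking rules would give equally high-scoring alignments.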
Software for making alignments
• For Smith-Waterman pairwise alignment
  pairwiseAlignment() in the “Biostrings” R library
  the EMBOSS (emboss.sourceforge.net/) water program
Problem
• Find the best local alignment between
  “TCAGTTGCC” & “AGGTTG”, with +1 for a match, -2
  for a mismatch, and -2 for a gap.
Answer
• Find the best local alignment between
  “TCAGTTGCC” & “AGGTTG”, with +1 for a match, -2
  for a mismatch, and -2 for a gap
  Matrix T looks like this, with the pink traceback:
           T   C   A   G   T   T   G   C   C
       0   0   0   0   0   0   0   0   0   0
   A   0   0   0   1   0   0   0   0   0   0
   G   0   0   0   0   2   0   0   1   0   0
   G   0   0   0   0   1   0   0   1   0   0
   T   0   1   0   0   0   2   1   0   0   0
   T   0   1   0   0   0   1   3   1   0   0
   G   0   0   0   0   1   0   1   4   2   0

   Alignment (pink traceback):
       G T T G
       | | | |
       G T T G
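One way to check a hand-filled answer like this is to recompute the table and print it in the same layout. The sketch below assumes the same simple scoring as the problem statement; `sw_table` and `format_table` are illustrative names, and the column spacing only lines up while every score is a single digit.

```python
def sw_table(rows, cols, match, mismatch, gap):
    """Fill the Smith-Waterman table with `rows` down the side and `cols` along the top."""
    T = [[0] * (len(cols) + 1) for _ in range(len(rows) + 1)]
    for i, a in enumerate(rows, start=1):
        for j, b in enumerate(cols, start=1):
            sigma = match if a == b else mismatch
            T[i][j] = max(0, T[i - 1][j - 1] + sigma,
                          T[i - 1][j] + gap, T[i][j - 1] + gap)
    return T


def format_table(T, rows, cols):
    """Lay the table out like the slide: sequence letters as row and column labels."""
    lines = ["        " + "   ".join(cols)]
    for label, row in zip(" " + rows, T):
        lines.append(label + "   " + "   ".join(str(v) for v in row))
    return "\n".join(lines)


T = sw_table("AGGTTG", "TCAGTTGCC", match=1, mismatch=-2, gap=-2)
print(format_table(T, "AGGTTG", "TCAGTTGCC"))
print("best score:", max(max(row) for row in T))
```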
Further Reading
•   Chapter 3 in Cristianini & Hahn, Introduction to Computational Genomics
•   Chapter 6 in Deonier et al., Computational Genome Analysis
•   Practical on pairwise alignment in R in the Little Book of R for
    Bioinformatics:
    https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter4.html

Editor's Notes

  • #4: Image credit (Temple Smith): http://www.modulargenetics.com/Temple%20Smith.jpg
    Image credit (Michael Waterman): http://www.iscb.org/cms_addon/conferences/ismb2003/images/watterman.jpg
  • #5: Made alignment of human.fa and fly.fa using Needleman-Wunsch with default parameters at: http://emboss.bioinformatics.nl/cgi-bin/emboss/needle (EMBOSS needle)
    Human Eyeless (PAX6) from: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1
    D. melanogaster Eyeless from: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5
    Viewed in jalview, and saved as humanfly_needlemanwunsch.png
  • #7: Made alignment of human.fa and fly.fa using Smith-Waterman with default parameters at: http://emboss.bioinformatics.nl/cgi-bin/emboss/water (EMBOSS water)
    Human Eyeless (PAX6) from: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1
    D. melanogaster Eyeless from: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5
    Viewed in jalview, and saved as humanfly_smithwaterman.png
  • #11: In R:
    > library("Biostrings")
    > seq1 <- "GGCTCAATCA"
    > seq2 <- "ACCTAAGG"
    > sigma <- nucleotideSubstitutionMatrix(match = 2, mismatch = -1, baseOnly = TRUE)
    > pairwiseAlignment(seq1, seq2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -2, scoreOnly = FALSE, type = "local")
    dFixedSubject (1 of 1)
    pattern: [3] CTCAA
    subject: [3] CT-AA
    score: 6
    Also:
    > source("C:/Documents and Settings/Avril Coughlan/My Documents/Rfunctions.R")
    > dnasmithwaterman(seq1, seq2, gapopen = 0, gapextend = -2, mymatch = 2, mymismatch = -1)
    [1] "maxT= 6"
       NA G     G     C     T     C     A     A     T     C     A
    NA NA NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
    A  NA "0 +" "0 +" "0 +" "0 +" "0 +" "2 >" "2 >" "0 -" "0 +" "2 >"
    C  NA "0 +" "0 +" "2 >" "0 -" "2 >" "0 L" "1 >" "1 >" "2 >" "0 L"
    C  NA "0 +" "0 +" "2 >" "1 >" "2 >" "1 >" "0 +" "0 >" "3 >" "1 Z"
    T  NA "0 +" "0 +" "0 |" "4 >" "2 -" "1 >" "0 >" "2 >" "1 |" "2 >"
    A  NA "0 +" "0 +" "0 +" "2 |" "3 >" "4 >" "3 >" "1 -" "1 >" "3 >"
    A  NA "0 +" "0 +" "0 +" "0 |" "1 V" "5 >" "6 >" "4 -" "2 -" "3 >"
    G  NA "2 >" "2 >" "0 -" "0 +" "0 +" "3 |" "4 V" "5 >" "3 Z" "1 *"
    G  NA "2 >" "4 >" "2 -" "0 -" "0 +" "1 |" "2 V" "3 V" "4 >" "2 Z"
    NOTE: there seems to be a mistake in the Deonier book for this example (page 157): it has "... 2 3 4 3 2 1 3" on one row, but should have "... 2 3 4 3 1 1 3" on that row (row i = 5).
  • #15: In R:
    > library("Biostrings")
    > seq1 <- "TCAGTTGCC"
    > seq2 <- "AGGTTG"
    > sigma <- nucleotideSubstitutionMatrix(match = 1, mismatch = -2, baseOnly = TRUE)
    > pairwiseAlignment(seq1, seq2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -2, scoreOnly = FALSE, type = "local")
    Local PairwiseAlignedFixedSubject (1 of 1)
    pattern: [4] GTTG
    subject: [3] GTTG
    score: 4
    Also:
    > source("C:/Documents and Settings/Avril Coughlan/My Documents/Rfunctions.R")
    > dnasmithwaterman(seq1, seq2, gapopen = 0, gapextend = -2, mymatch = 1, mymismatch = -2)
    [1] "maxT= 4"
       NA T     C     A     G     T     T     G     C     C
    NA NA NA    NA    NA    NA    NA    NA    NA    NA    NA
    A  NA "0 +" "0 +" "1 >" "0 +" "0 +" "0 +" "0 +" "0 +" "0 +"
    G  NA "0 +" "0 +" "0 +" "2 >" "0 -" "0 +" "1 >" "0 +" "0 +"
    G  NA "0 +" "0 +" "0 +" "1 >" "0 >" "0 +" "1 >" "0 +" "0 +"
    T  NA "1 >" "0 +" "0 +" "0 +" "2 >" "1 >" "0 +" "0 +" "0 +"
    T  NA "1 >" "0 +" "0 +" "0 +" "1 >" "3 >" "1 -" "0 +" "0 +"
    G  NA "0 +" "0 +" "0 +" "1 >" "0 +" "1 |" "4 >" "2 -" "0 -"