Knuth-Morris-Pratt Substring
Search Algorithm
Prepared By:
Sabiya Fatima
Email ID: sabiya1990fatima@gmail.com
Outline
➢ Definition
➢ History
➢ Components of KMP
➢ Algorithm
➢ Example
➢ Run-Time Analysis
➢ Complexity comparison of String Matching Algorithms
➢ Advantages and Disadvantages
➢ Real Time Applications
➢ References
What is Pattern Searching?
➢ Suppose you are reading a text document.
➢ You want to search for a word.
➢ You press Ctrl + F and search for that word.
➢ The word processor scans the document and shows the positions where the word occurs.
What actually happens is that the word, i.e. the pattern, is searched for inside the text document.
Definition
➢ Best known as a linear-time algorithm for exact string matching. Compares the pattern with the text from left to right.
➢ Can shift the pattern by more than one position after a mismatch.
➢ Preprocesses the pattern to avoid redundant comparisons.
➢ Avoids re-comparing text characters that have already been matched.
History
➢ This algorithm was conceived by Donald Knuth and Vaughan Pratt, and independently by James H. Morris; the three published it jointly in 1977.
➢ Knuth, Morris and Pratt discovered the first linear-time string-matching algorithm by analysing the naive algorithm.
➢ It retains the information gathered during the scan of the text, which the naive approach throws away. By not wasting this information, it achieves a running time of O(m + n).
➢ The Knuth-Morris-Pratt algorithm is efficient because it never moves backwards in the text, which keeps the total number of comparisons of the pattern against the input string linear.
Naïve Approach
The naïve approach is to check whether the pattern matches the text at every possible position in the text.
P = pattern (word) of length m
T = text (document) of length n
The naive string-matching algorithm takes O((n - m + 1)m) time, which is O(mn) in the worst case.
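The naive approach described above can be sketched in a few lines of Python. This is only an illustration: the function name naive_search and the use of 0-based shifts are assumptions made here, not part of the slides.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 candidate alignments in turn.
    for s in range(n - m + 1):
        # The slice comparison inspects up to m characters at this alignment.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_search("bacbabababacaab", "ababaca"))  # [6]
```

Each alignment may cost up to m character comparisons, which is where the O((n - m + 1)m) bound comes from.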
The KMP Algorithm - Motivation
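Figure: a mismatch at pattern position j against text character x, and the pattern after the shift.
text:     . . a b a a b x . . . . .
pattern:      a b a a b a                (mismatch at j)
shifted:            a b a a b a          (no need to repeat the comparisons of the leading "a b"; resume comparing at x)
➢ The Knuth-Morris-Pratt algorithm compares the pattern to the text from left to right, but shifts the pattern more intelligently than the brute-force algorithm.
➢ When a mismatch occurs, what is the most we can shift the pattern so as to avoid redundant comparisons?
➢ Answer: the largest prefix of P[0..j] that is a suffix of P[1..j] (0-based indexing on this slide). Its length tells us how many already-matched characters remain matched after the shift.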
Components of KMP algorithm
➢ The prefix function, Π
The prefix function Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern ‘p’. In other words, it avoids backtracking on the text ‘T’.
➢ The KMP Matcher
With text ‘T’, pattern ‘p’ and prefix function ‘Π’ as inputs, it finds the occurrences of ‘p’ in ‘T’ and returns the number of shifts of ‘p’ after which each occurrence is found.
The prefix function, Π
The following pseudocode computes the prefix function, Π:
Compute-Prefix-Function(p)
  m ← length[p]                  // ‘p’ is the pattern to be matched
  Π[1] ← 0
  k ← 0
  for q ← 2 to m
      do while k > 0 and p[k+1] != p[q]
             do k ← Π[k]
         if p[k+1] = p[q]
             then k ← k + 1
         Π[q] ← k
  return Π
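The pseudocode above translates fairly directly into Python. The sketch below assumes 0-based indexing (so pi[q] here corresponds to Π[q+1] in the slides), and the name compute_prefix_function is chosen here for illustration.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """pi[q] is the length of the longest proper prefix of pattern[:q + 1]
    that is also a suffix of it (0-based q)."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the prefix matched so far
    for q in range(1, m):
        # Fall back to shorter candidates until the next character fits.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```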
Example: compute Π for the pattern ‘p’ below:
a b a b a c a
Initially: m = length[p] = 7, Π[1] = 0, k = 0
Step 1: q = 2, k = 0. p[1] = a does not match p[2] = b, so Π[2] = 0.
Step 2: q = 3, k = 0. p[1] = a matches p[3] = a, so k = 1 and Π[3] = 1.
Step 3: q = 4, k = 1. p[2] = b matches p[4] = b, so k = 2 and Π[4] = 2.
Step 4: q = 5, k = 2. p[3] = a matches p[5] = a, so k = 3 and Π[5] = 3.
Step 5: q = 6, k = 3. p[4] = b does not match p[6] = c, so k falls back to Π[3] = 1; p[2] = b does not match c, so k falls back to Π[1] = 0; p[1] = a does not match c, so Π[6] = 0.
Step 6: q = 7, k = 0. p[1] = a matches p[7] = a, so k = 1 and Π[7] = 1.
After iterating 6 times, the prefix function computation is complete:
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3 0 1
The running time of the prefix function is O(m).
The KMP Matcher
The KMP Matcher, with pattern ‘p’, text ‘T’ and prefix function ‘Π’ as input, finds the matches of p in T.
The following pseudocode computes the matching component of the KMP algorithm:
KMP-Matcher(T, p)
1  n ← length[T]
2  m ← length[p]
3  Π ← Compute-Prefix-Function(p)
4  q ← 0                              // number of characters matched
5  for i ← 1 to n                     // scan T from left to right
6      do while q > 0 and p[q+1] != T[i]
7             do q ← Π[q]             // next character does not match
8         if p[q+1] = T[i]
9             then q ← q + 1          // next character matches
10        if q = m                    // is all of p matched?
11            then print “Pattern occurs with shift” i – m
12                 q ← Π[q]           // look for the next match
Note: KMP finds every occurrence of ‘p’ in ‘T’. That is why KMP does not terminate at step 12; rather, it searches the remainder of ‘T’ for any more occurrences of ‘p’.
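A minimal Python sketch of the matcher follows. It assumes 0-based indexing, collects the shifts in a list instead of printing them, and treats an empty pattern as having no matches; the name kmp_search and these conventions are assumptions made for this example.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []  # assumption: an empty pattern reports no matches

    # Prefix function of the pattern (0-based).
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k

    shifts = []
    q = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]  # next character does not match
        if pattern[q] == ch:
            q += 1  # next character matches
        if q == m:  # all of the pattern matched
            shifts.append(i - m + 1)
            q = pi[q - 1]  # look for the next match
    return shifts


print(kmp_search("bacbabababacaab", "ababaca"))  # [6]
print(kmp_search("aaaa", "aa"))                  # [0, 1, 2]
```

The second call shows why q is reset to pi[q - 1] rather than 0 after a match: overlapping occurrences are still found.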
Illustration: given a text ‘T’ and pattern ‘p’ as follows:
T: b a c b a b a b a b a c a a b
p: a b a b a c a
Let us execute the KMP algorithm to find whether ‘p’ occurs in ‘T’.
For ‘p’, the prefix function Π was computed previously and is as follows:
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3 0 1
T: b a c b a b a b a b a c a a b
p: a b a b a c a
Initially: n = size of T = 15; m = size of p = 7
Step 1: i = 1, q = 0. Comparing p[1] with T[1]: p[1] does not match T[1]. ‘p’ will be shifted one position to the right.
Step 2: i = 2, q = 0. Comparing p[1] with T[2]: p[1] matches T[2]. Since there is a match, p is not shifted.
Step 3: i = 3, q = 1. Comparing p[2] with T[3]: p[2] does not match T[3]. Backtracking on p (q = Π[1] = 0), comparing p[1] with T[3]: p[1] does not match T[3].
Step 4: i = 4, q = 0. Comparing p[1] with T[4]: p[1] does not match T[4].
Step 5: i = 5, q = 0. Comparing p[1] with T[5]: p[1] matches T[5].
Step 6: i = 6, q = 1. Comparing p[2] with T[6]: p[2] matches T[6].
Step 7: i = 7, q = 2. Comparing p[3] with T[7]: p[3] matches T[7].
Step 8: i = 8, q = 3. Comparing p[4] with T[8]: p[4] matches T[8].
Step 9: i = 9, q = 4. Comparing p[5] with T[9]: p[5] matches T[9].
Step 10: i = 10, q = 5. Comparing p[6] with T[10]: p[6] does not match T[10]. Backtracking on p, comparing p[4] with T[10] because after the mismatch q = Π[5] = 3: p[4] matches T[10].
Step 11: i = 11, q = 4. Comparing p[5] with T[11]: p[5] matches T[11].
Step 12: i = 12, q = 5. Comparing p[6] with T[12]: p[6] matches T[12].
Step 13: i = 13, q = 6. Comparing p[7] with T[13]: p[7] matches T[13].
T: b a c b a b a b a b a c a a b
p:             a b a b a c a
Pattern ‘p’ has been found to occur completely in text ‘T’. The total number of shifts that took place for the match to be found is i – m = 13 – 7 = 6.
The running time of the KMP-Matcher function is O(n).
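The walkthrough above can be reproduced programmatically. The sketch below records, for each text position i (1-based, as in the slides), the value of q before and after the step, and stops at the first occurrence. The name kmp_trace is hypothetical, and a non-empty pattern is assumed.

```python
def kmp_trace(text: str, pattern: str) -> list[tuple[int, int, int]]:
    """Return (i, q_before, q_after) for each step up to the first match.
    i is the 1-based text position; assumes a non-empty pattern."""
    m = len(pattern)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k

    steps = []
    q = 0
    for i, ch in enumerate(text, start=1):
        q_before = q
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]  # backtrack on the pattern only
        if pattern[q] == ch:
            q += 1
        steps.append((i, q_before, q))
        if q == m:
            break  # stop at the first occurrence, as the walkthrough does
    return steps


for i, q_before, q_after in kmp_trace("bacbabababacaab", "ababaca"):
    print(f"i={i:2d}  q: {q_before} -> {q_after}")
```

The line for i = 10 shows q going from 5 to 4: the mismatch drops q to 3, and the subsequent match of p[4] with T[10] raises it to 4.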
Complexity
➢ O(m): time to compute the prefix function values.
➢ O(n): time to compare the pattern to the text.
➢ Total run time: O(n + m).
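To make these bounds concrete, the following sketch counts character comparisons for the naive matcher and for KMP on an input that is bad for the naive approach (a text of all a's and a pattern of a's ending in b). The helper names and the test input are assumptions for illustration; the loops are restructured so that each comparison is counted exactly once.

```python
def _step(pattern: str, pi: list[int], q: int, ch: str) -> tuple[int, int]:
    """Feed one character to matcher state q.
    Returns (new_q, number_of_character_comparisons)."""
    comparisons = 0
    while True:
        comparisons += 1
        if pattern[q] == ch:
            return q + 1, comparisons
        if q == 0:
            return 0, comparisons
        q = pi[q - 1]  # fall back to the next shorter candidate


def kmp_comparisons(text: str, pattern: str) -> int:
    """Count comparisons made by KMP, preprocessing included.
    Assumes a non-empty pattern."""
    m = len(pattern)
    total = 0
    pi = [0] * m
    k = 0
    for q in range(1, m):
        k, c = _step(pattern, pi, k, pattern[q])
        pi[q] = k
        total += c
    q = 0
    for ch in text:
        q, c = _step(pattern, pi, q, ch)
        total += c
        if q == m:
            q = pi[q - 1]
    return total


def naive_comparisons(text: str, pattern: str) -> int:
    """Count comparisons made by the naive matcher."""
    n, m = len(text), len(pattern)
    total = 0
    for s in range(n - m + 1):
        for j in range(m):
            total += 1
            if text[s + j] != pattern[j]:
                break
    return total


n, m = 1000, 100
text, pattern = "a" * n, "a" * (m - 1) + "b"
print(naive_comparisons(text, pattern))               # 90100, i.e. (n - m + 1) * m
print(kmp_comparisons(text, pattern) <= 2 * (n + m))  # True
```

The 2(n + m) bound holds because every comparison either ends a step (at most one per character) or triggers a fallback, and the total number of fallbacks cannot exceed the total number of increments.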
Complexity comparison of String Matching Algorithms
Advantages and Disadvantages
Advantages:
1. The running time of the KMP algorithm is O(m + n) in the worst case, i.e. linear in the combined length of the pattern and the text.
2. The algorithm never needs to move backwards in the input text T. This makes it well suited to processing very large files.
Disadvantages:
Its advantage shrinks as the size of the alphabet increases: mismatches become more likely and tend to occur early in the pattern, so there is less matched prefix to reuse.
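Because the text is never re-read, KMP can consume its input as a stream. The sketch below is one way to illustrate this: kmp_stream and read_chars are hypothetical names, the chunks stand in for blocks read from a large file, and an empty pattern is assumed to yield nothing.

```python
from typing import Iterable, Iterator


def kmp_stream(chars: Iterable[str], pattern: str) -> Iterator[int]:
    """Yield the 0-based start of each occurrence of pattern in a
    character stream, reading every character exactly once."""
    m = len(pattern)
    if m == 0:
        return  # assumption: an empty pattern yields nothing
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k

    q = 0
    for i, ch in enumerate(chars):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]
        if pattern[q] == ch:
            q += 1
        if q == m:
            yield i - m + 1
            q = pi[q - 1]


def read_chars(chunks: Iterable[str]) -> Iterator[str]:
    """Flatten an iterable of string chunks into single characters."""
    for chunk in chunks:
        yield from chunk


chunks = ["bacbab", "ababa", "caab"]  # the match straddles two chunks
print(list(kmp_stream(read_chars(chunks), "ababaca")))  # [6]
```

Only the pattern, its prefix table and the current value of q are kept in memory, so the text itself never has to be buffered.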
Real Time Applications
➢ plagiarism analysis
➢ search engines
➢ language syntax checkers
➢ database queries
➢ music content retrieval
➢ DNA sequence analysis:
• DNA is composed mainly of four types of nucleotides. The four bases in DNA are Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). A DNA sequence is a representation of the string of nucleotides contained in a strand of DNA.
• DNA sequences associated with various diseases are stored in a database for retrieval and comparison. Such a system compares similarity values against a threshold value and records whether a given result indicates disease or not.
• For example: ATTCGTAACTAGTAAGTTA. DNA sequencing techniques have allowed vast amounts of data to be analyzed in a short span of time. Pattern-matching techniques therefore play a vital role in computational biology for the analysis of biological data such as DNA sequences.
References
➢ Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, Introduction to Algorithms, second edition, “The Knuth-Morris-Pratt Algorithm”, 2001.
➢ https://pdfs.semanticscholar.org/fe41/52465f96d09c94b46b86a3b6408dae5dbe13.pdf
➢ http://research.ijcaonline.org/volume115/number23/pxc3902734.pdf
Thank You
