STRING MATCHING
Partha P. Chakrabarti & Aritra Hazra
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
String Matching: The Problem
• Goal: Find pattern P[ ] of length M in a text T[ ] of length N.
– Typically, N >> M and N is very large (M can also be large)!
• Example: Finding a keyword from a whole PDF document
Naïve (Brute-Force) Approach
• Check for pattern starting at each text position
– Recursive Formulation (naiveMatch_rec)
– Iterative Approach (naiveMatch_itr)
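Algorithm naiveMatch_rec (T[ ], N, P[ ], M) // does P[0..M-1] occur in T[0..N-1]?
if (N < M) then return 0;
else if (windowMatch (T, N-M, P, M)) then return 1; // P against the last M characters of T[0..N-1]
else return (naiveMatch_rec (T, N-1, P, M));
Algorithm windowMatch (T[ ], s, P[ ], M) // is T[s..s+M-1] equal to P[0..M-1]?
if (M == 0) then return 1;
else if (T[s+M-1] != P[M-1]) then return 0;
else return (windowMatch (T, s, P, M-1));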
Algorithm naiveMatch_itr (T[ ], N, P[ ], M)
for i = 0 to N-M do {
j = 0;
while ( (j < M) and (T[i+j] == P[j]) ) do j++;
if (j == M) then {
match found starting at T[i]; break;
}
}
Overall Time Complexity: Θ(MN) in the worst case
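A minimal Python sketch of the iterative brute-force check described above. The function name `naive_match` and the convention of returning -1 when there is no match are assumptions made for illustration, not part of the slides.

```python
def naive_match(text: str, pattern: str) -> int:
    """Return the first index where pattern occurs in text, or -1 if absent."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):            # every candidate start position
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                         # extend the match by one character
        if j == m:                         # all M characters agreed
            return i
    return -1


if __name__ == "__main__":
    print(naive_match("babcabababacaab", "ababaca"))
```

Each of the N-M+1 start positions may cost up to M comparisons, which is where the Θ(MN) worst case comes from.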
Can Naïve String Search be made Better?
• Illustrating Example:
– Suppose we are searching in text for pattern BAAAAAAAAA
– Suppose we match 5 characters in pattern, with mismatch on 6th character
– We know previous 6 characters in text are BAAAAB (assuming, alphabet Σ = {A, B})
• How can we make string search algorithm more efficient?
– DO NOT check every overlapping occurrence of pattern string in text string
– DO make greater jumps and DO reduce number of comparisons
– DO NOT need to back up the pointer in text string
Reducing Overlapped Checking: by Memorization
• Additional storage remembering what has been SEEN in Text String previously
• State Machine as the data structure: DFA (Deterministic Finite Automaton)
– Finite number of states (including start state and halt state)
– Exactly one state transition for each char in alphabet
– Accept if sequence of state transitions leads to halt state
[Figure: DFA with labels "Text String" and "Pattern String"]
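The state-machine idea can be sketched in Python as follows. This uses the standard textbook DFA construction for pattern matching, which may differ from the construction drawn on the slide; `build_dfa` and `dfa_search` are hypothetical names, and the sketch assumes a non-empty pattern and that every text character belongs to the given alphabet.

```python
def build_dfa(pattern, alphabet):
    """dfa[state][char] -> next state; state k means 'k pattern chars matched'."""
    m = len(pattern)
    dfa = [{c: 0 for c in alphabet} for _ in range(m)]
    dfa[0][pattern[0]] = 1
    restart = 0                            # state reached by pattern[1..j-1]
    for j in range(1, m):
        for c in alphabet:
            dfa[j][c] = dfa[restart][c]    # mismatch: behave like the restart state
        dfa[j][pattern[j]] = j + 1         # match: advance to the next state
        restart = dfa[restart][pattern[j]]
    return dfa


def dfa_search(text, pattern, alphabet):
    """Return the first match index, reading each text character exactly once."""
    dfa = build_dfa(pattern, alphabet)
    state = 0
    for i, c in enumerate(text):
        state = dfa[state][c]
        if state == len(pattern):          # halt state reached
            return i - len(pattern) + 1
    return -1
```

The search loop never moves backwards in the text, which is the property the slide asks for; the price is a table of M × |Σ| transitions.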
Knuth-Morris-Pratt (KMP) Algorithm: Definitions
• Some Necessary Definitions
– String of length N is given as S[0..N-1] = s_0 s_1 … s_{N-1} (where each s_i is from Σ)
– Substring of S[0..N-1] of length (j-i+1) is S[i..j] = s_i s_{i+1} ... s_{j-1} s_j (0 ≤ i ≤ j ≤ N-1)
– Prefix of S[0..N-1] of length k is S[0..k-1] = s_0 s_1 … s_{k-1} (1 ≤ k ≤ N-1)
– Suffix of S[0..N-1] of length l is S[N-l..N-1] = s_{N-l} s_{N-l+1} ... s_{N-1} (1 ≤ l ≤ N-1)
– Border: a substring that is both a prefix and a suffix
• S[0..N-1] has a border of length k if S[0..k-1] = S[N-k..N-1]
• Proper Border: a border that is not the whole string itself
• Intuition: find the longest proper border!
[Figure: a string of length N, s_0 … s_{k-1} | s_k ... s_{N-k-1} | s_{N-k} ... s_{N-1}, where the prefix s_0 … s_{k-1} equals the suffix s_{N-k} ... s_{N-1}]
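To make the border definition concrete, here is a small brute-force Python sketch that checks S[0..k-1] = S[N-k..N-1] for every proper length k. The helper names `border_lengths` and `longest_proper_border` are illustrative assumptions.

```python
def border_lengths(s: str) -> list:
    """All k with 1 <= k < len(s) such that the length-k prefix equals the length-k suffix."""
    n = len(s)
    return [k for k in range(1, n) if s[:k] == s[n - k:]]


def longest_proper_border(s: str) -> int:
    """Length of the longest proper border of s (0 if there is none)."""
    lengths = border_lengths(s)
    return lengths[-1] if lengths else 0
```

For example, "ababa" has proper borders "a" and "aba", so the longest proper border has length 3.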
KMP Algorithm: Notions and Intuition
• Longest Proper Border → Failure Function
– Given pattern string P[0..M-1], we define the failure function for each i (0 ≤ i ≤ M-1) as the length of the longest proper border of P[0..i]:
F(i) = MAXIMUM { k | 0 ≤ k ≤ i and P[0..k-1] = P[i-k+1..i] }
– Example:
i                                  0  1  2  3  4   5  6   7
P[i]                               a  b  c  a  b   a  b   c
Longest Proper Border of P[0..i]   ϕ  ϕ  ϕ  a  ab  a  ab  abc
F[i]                               0  0  0  1  2   1  2   3
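[Figure: pattern P shifted along text T after a mismatch]
§ Intuition: When P[0..k] has matched and the next character mismatches, use the failure function to jump/shift P[ ] by (k-F[k]+1) positions ahead
§ Proof: If shifting P by a smaller amount produced a match, then P[0..k] would have a proper border longer than F[k] → Contradiction!!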
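The failure-function definition and the shift rule can be checked with a deliberately slow Python sketch that applies the definition directly (it is not the linear-time construction). The names `failure_bruteforce` and `shift_after_mismatch` are assumptions for illustration.

```python
def failure_bruteforce(pattern: str) -> list:
    """F[i] = length of the longest proper border of pattern[0..i], by direct search."""
    f = []
    for i in range(len(pattern)):
        prefix = pattern[: i + 1]
        k = i                               # a proper border of P[0..i] has length <= i
        while k > 0 and prefix[:k] != prefix[-k:]:
            k -= 1                          # try the next shorter candidate
        f.append(k)
    return f


def shift_after_mismatch(f: list, k: int) -> int:
    """Shift distance when P[0..k] matched and the next character mismatched."""
    return k - f[k] + 1
```

On the pattern "abcababc" this yields 0 0 0 1 2 1 2 3, the F[i] row of the table above.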
KMP Algorithm: An Example
Text String: b a b c a b a b a b a c a a b
Pattern String: a b a b a c a
Longest Proper Border Length: 0 0 1 2 3 0 1
[Figure: seven frames sliding the Pattern String under the Text String, ending in MATCH]
KMP Algorithm and Time Complexity
Time Complexity:
• Outer loop runs ≤ (N-M+1) times
• Each iteration of outer loop increments (i-j)
– (i-j) initializes to 0 and inner loop does not impact (i-j), as it increases both i and j
– when j is 0, i increases by 1 => (i-j) increases by 1
– when j > 0, i is unchanged and j gets F[j-1]
• F[j-1] ≤ j-1 => i - F[j-1] ≥ (i-j)+1
• so j getting F[j-1] increases (i-j) by at least 1
• Inner loop only ever increments i, and i ≤ N, so it runs ≤ N times in total
• O(N+M) time in total (O(N), since M ≤ N)
+ KMP_Match algorithm = O(N) time
+ Computing failure function = O(M) time
Algorithm KMP_Match (T[ ], N, P[ ], M)
F[ ] ← ComputeFailureFunct (P[ ], M);
i = 0; j = 0; // i indexes T[ ], j indexes P[ ]
while (i-j ≤ N-M) do { // M-j ≤ N-i
while ( (j < M) and (T[i] == P[j]) ) do { // find longest matching prefix
i++; j++;
}
if (j == M) then // report for match
match found starting at T[i-M]
if (j == 0) then i++; // jump/shift using failure function
else j = F[j-1];
}
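A runnable Python sketch of the matching loop, keeping the slide's two pointers (i into the text, j into the pattern) and returning at the first match. The names `failure_function` and `kmp_match`, the -1 return value, and the compact for-loop form of the failure-function helper are assumptions made for this sketch.

```python
def failure_function(p: str) -> list:
    """f[i] = length of the longest proper border of p[0..i]."""
    f = [0] * len(p)
    j = 0
    for i in range(1, len(p)):
        while j > 0 and p[i] != p[j]:
            j = f[j - 1]                   # fall back to a shorter border
        if p[i] == p[j]:
            j += 1
        f[i] = j
    return f


def kmp_match(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    f = failure_function(pattern)
    i = j = 0                              # i indexes text, j indexes pattern
    while i - j <= n - m:                  # the current alignment still fits
        while j < m and text[i] == pattern[j]:
            i += 1                         # find the longest matching prefix
            j += 1
        if j == m:
            return i - m                   # report the match
        if j == 0:
            i += 1                         # nothing matched: move on in the text
        else:
            j = f[j - 1]                   # shift the pattern, keep i where it is
    return -1
```

The text pointer i never decreases, so the text is scanned once from left to right.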
KMP Algorithm: Computing Failure Function
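Algorithm ComputeFailureFunct (P[ ], M)
F[0] = 0; i = 1; j = 0; // j = length of the border currently being extended
while (i < M) do {
while ( (i < M) and (P[i] == P[j]) ) do { // border grows by one character
j++; F[i] = j; i++;
}
if (j == 0) then { // no border ends at P[i]
F[i] = 0; i++;
}
else j = F[j-1]; // fall back to the next shorter border
}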
[Figure: Example of the Pattern String P slid over itself]
Failure Function computed by sliding the Pattern String over itself!
Time Complexity: O(M)
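The ComputeFailureFunct pseudocode translates almost line for line into Python. The sketch below keeps its loop structure; the function name `compute_failure_function` is an assumption.

```python
def compute_failure_function(p: str) -> list:
    """Failure function of p, built by matching the pattern against itself."""
    m = len(p)
    f = [0] * m
    i, j = 1, 0                            # j = length of the border being extended
    while i < m:
        while i < m and p[i] == p[j]:
            j += 1                         # the border grows by one character
            f[i] = j
            i += 1
        if j == 0:
            f[i] = 0                       # no border ends at p[i]
            i += 1
        else:
            j = f[j - 1]                   # fall back to the next shorter border
    return f
```

The `j == 0` branch is only reached when the inner loop did not run, so `i < m` still holds there and `f[i]` stays in bounds.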
Food-for-Thought: Exercise?
• String matching using KMP Algorithm searches only for first match
• Modify KMP Algorithm to perform the following:
① What changes will you make in the algorithm so that it can search for all
matches of pattern present in the text string?
• Example: Text = ABACAABAACAABABABAACAABBCA & Pattern = ACAAB
② When the matches may be overlapped, then how can you find all overlapping
matches as well?
• Example: Text = BABABABACABABABABACBABABAC & Pattern = ABABA
Hint: Try to bring modifications to the DFA and re-position your jumps/shifts!
Rabin-Karp Algorithm: Mathematical Overview
• Use mathematical computations
– Assume that the string is formed from Σ = {0, 1, 2, …, R-1} (radix-R notation, R = |Σ|)
– P ← numeric (radix-R) value of pattern string P[0..M-1] = p_0 p_1 … p_{M-1} (each p_i is from Σ)
• P = p_{M-1} + R (p_{M-2} + R (p_{M-3} + … + R (p_1 + R p_0) ... )) ← Horner’s Rule [ Θ(M)-time ]
– T_i ← numeric value of the M-length text window starting at T[i], i.e. t_i t_{i+1} … t_{i+M-1}
• T_0 ← Compute similarly for t_0 t_1 … t_{M-1} using Horner’s Rule in Θ(M)-time
– Example (…32145… in decimal): T_i = 5 + 10 x (4 + 10 x (1 + 10 x (2 + 10 x 3)))
• T_{i+1} = R (T_i – R^(M-1) t_i) + t_{i+M} ← Compute from T_i (shift M-length window) in Θ(1)-time
– Example (…[32145]6… → …3[21456]…): T_{i+1} = 10 x (T_i – 10^(5-1) x 3) + 6
• Computation of T_1, T_2, …, T_{N-M} in Θ(N-M)-time
• When P = T_i, MATCH FOUND from index i of T[ ], i.e. p_0 p_1 … p_{M-1} = t_i t_{i+1} … t_{i+M-1}
Overall Time Complexity: Θ(N)
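Horner's rule and the window shift can be sketched in Python with exact (unbounded) integers, using the decimal example 32145 → 21456 from the slide. The function names `horner_value` and `roll` are illustrative assumptions.

```python
def horner_value(digits, radix):
    """Numeric value of a digit sequence in the given radix, via Horner's rule."""
    value = 0
    for d in digits:
        value = value * radix + d          # shift left by one digit, append d
    return value


def roll(value, leading, trailing, radix, m):
    """Value of the next M-length window, computed from the current one."""
    return radix * (value - radix ** (m - 1) * leading) + trailing


if __name__ == "__main__":
    t_i = horner_value([3, 2, 1, 4, 5], 10)
    print(t_i, roll(t_i, leading=3, trailing=6, radix=10, m=5))
```

In practice R^(M-1) would be computed once and reused, so each shift costs a constant number of arithmetic operations.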
Rabin-Karp Algorithm: Efficient Computation
• Challenge: efficiently compute T_{i+1} given that we know T_i
– T_i = t_i R^(M-1) + t_{i+1} R^(M-2) + ... + t_{i+M-1} R^0 and T_{i+1} = t_{i+1} R^(M-1) + t_{i+2} R^(M-2) + ... + t_{i+M} R^0
• Key property: Can update function in constant time!
– T_{i+1} = (T_i – t_i R^(M-1)) R + t_{i+M}
– Read as: current value (T_i), subtract leading digit (t_i R^(M-1)), multiply by radix (R), add new trailing digit (t_{i+M})
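The constant-time update follows directly from the two sums. A short derivation, written in LaTeX and using only the symbols defined on this slide:

```latex
\begin{align*}
T_i &= \sum_{k=0}^{M-1} t_{i+k}\, R^{M-1-k} \\
T_i - t_i R^{M-1} &= \sum_{k=1}^{M-1} t_{i+k}\, R^{M-1-k} \\
\left(T_i - t_i R^{M-1}\right) R &= \sum_{k=1}^{M-1} t_{i+k}\, R^{M-k} \\
\left(T_i - t_i R^{M-1}\right) R + t_{i+M} &= \sum_{k=1}^{M} t_{i+k}\, R^{M-k}
  = \sum_{k=0}^{M-1} t_{i+1+k}\, R^{M-1-k} = T_{i+1}
\end{align*}
```

The last step only re-indexes the sum (k replaced by k+1), which is exactly the definition of the window starting at position i+1.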
Rabin-Karp Algorithm: An Example
T_0 = ((((3) * 10 + 1) * 10 + 4) * 10 + 1) * 10 + 5 = 31415
T_1 = 10 * (31415 – 10^4 * 3) + 9 = 14159
T_2 = 10 * (14159 – 10^4 * 1) + 2 = 41592
T_3 = 10 * (41592 – 10^4 * 4) + 6 = 15926
T_4 = 10 * (15926 – 10^4 * 1) + 5 = 59265
T_5 = 10 * (59265 – 10^4 * 5) + 3 = 92653
T_6 = 10 * (92653 – 10^4 * 9) + 5 = 26535
So, MATCH !! as P = T_6
• P computed in Θ(M); T_0 computed in Θ(M)
• T_1, T_2, … each in Θ(1), so Θ(N-M) in worst-case
Overall Time-Complexity: Θ(N)
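The window values in this example can be reproduced with a short Python sketch. The digit string "31415926535" is read off the lines above, the pattern value 26535 is taken from T_6, and the function name `window_values` is an assumption; the sketch also assumes the text has at least M digits.

```python
def window_values(text_digits, m, radix=10):
    """Exact values T_0, T_1, ..., T_{N-M} of every M-length window."""
    current = 0
    for d in text_digits[:m]:              # T_0 by Horner's rule, Theta(M)
        current = current * radix + d
    values = [current]
    high = radix ** (m - 1)                # weight of the leading digit
    for i in range(len(text_digits) - m):  # each further window in Theta(1)
        current = radix * (current - high * text_digits[i]) + text_digits[i + m]
        values.append(current)
    return values


if __name__ == "__main__":
    digits = [int(c) for c in "31415926535"]
    values = window_values(digits, 5)
    print(values, values.index(26535))
```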
Rabin-Karp Algorithm: Hash-map based Approach
• Demerit of computing P and T_i values:
– may be very large if M is long! (non-constant arithmetic operations)
• Solution: use Modular Hashing
– Compute a hash of P[0..M-1], say HP
– For each i, compute a hash of T[i..i+M-1], say HT
– If pattern hash (HP) ≠ text substring hash (HT), definitely NOT a match
– If pattern hash (HP) = text substring hash (HT), check character by character for a VALID match (different strings can share a hash)
[Figure: Modular Hash with R=10 and H(k) = k (mod 997)]
Rabin-Karp Algorithm: Modular Hash-map Arithmetic
Modular hash function. Compute:
• T_i = t_i R^(M-1) + t_{i+1} R^(M-2) + ... + t_{i+M–1} R^0 (mod Q)
– Horner's method: Linear-time method to evaluate degree-(M-1) polynomial
• T_{i+1} = [ ( T_i (mod Q) – t_i * R^(M-1) (mod Q) ) R + t_{i+M} ] (mod Q)
– Efficient modular maths: to keep numbers small, take intermediate results modulo Q
Example: 26535 = 2*10000 + 6*1000 + 5*100 + 3*10 + 5
= ((((2) *10 + 6) * 10 + 5) * 10 + 3) * 10 + 5
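A minimal Python sketch of the modular versions of Horner's rule and the rolling update, using R = 10 and Q = 997 as in the deck's example. The function names `hash_window` and `roll_hash` and the parameter `high` (standing for R^(M-1) mod Q) are assumptions.

```python
def hash_window(digits, radix, q):
    """Horner's rule with every intermediate result reduced modulo q."""
    h = 0
    for d in digits:
        h = (h * radix + d) % q
    return h


def roll_hash(h, leading, trailing, radix, q, high):
    """Hash of the next window; high must equal radix**(m-1) % q."""
    # Python's % returns a non-negative result for positive q, so the
    # subtraction cannot leave a negative hash behind.
    return ((h - leading * high) * radix + trailing) % q


if __name__ == "__main__":
    high = pow(10, 4, 997)                 # R^(M-1) mod Q for M = 5
    h0 = hash_window([3, 1, 4, 1, 5], 10, 997)
    print(h0, roll_hash(h0, 3, 9, 10, 997, high))
```

In languages where % can return a negative remainder, adding Q before the final reduction is the usual guard.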
Rabin-Karp Algorithm: Rolling Modular Hash-map
• First M entries (the initial window): Use Horner's rule
• Remaining entries: Use rolling hash (and % or modulus to avoid overflow)
Rabin-Karp Algorithm (Pseudo-code)
Algorithm Rabin-Karp_StrMatch (TXT[ ], N, PAT[ ], M, R, Q)
C = R^(M-1) mod Q; P = 0; T_0 = 0;
for j = 0 to M-1 do { // Preprocessing
P = (R*P + PAT[j]) mod Q; T_0 = (R*T_0 + TXT[j]) mod Q;
}
for i = 0 to N-M do { // Matching
if (P == T_i) then
if (PAT[0..M-1] == TXT[i..i+M-1]) then
match found starting at TXT[i];
if (i < N-M) then
T_{i+1} = (R (T_i – TXT[i] C) + TXT[i+M]) mod Q // mod result kept in 0..Q-1
}
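The pseudocode can be sketched as a runnable Python function. The name `rabin_karp`, the default radix 256 and modulus 1_000_000_007, the use of `ord` to turn characters into numbers, and the -1 return value are all assumptions chosen for illustration; the slides leave R and Q as parameters.

```python
def rabin_karp(text: str, pattern: str, radix: int = 256, q: int = 1_000_000_007) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(radix, m - 1, q)            # C = R^(M-1) mod Q
    p_hash = t_hash = 0
    for j in range(m):                     # preprocessing by Horner's rule
        p_hash = (radix * p_hash + ord(pattern[j])) % q
        t_hash = (radix * t_hash + ord(text[j])) % q
    for i in range(n - m + 1):             # matching
        # Equal hashes only suggest a match; confirm by direct comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:                      # roll the window one position right
            t_hash = (radix * (t_hash - ord(text[i]) * high) + ord(text[i + m])) % q
    return -1
```

With a small modulus such as 997 collisions become more frequent, but the direct comparison keeps the result correct; only the running time suffers.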
Comparative Study
[Table: comparison of the algorithms above]
• Θ(n+m) in practical cases
• n = text string length, m = pattern string length
Thank you
