Advance algorithms in master of technology

Module 2
String-Matching Algorithms:
Naïve string Matching; Rabin - Karp algorithm; String matching with
finite automata; Knuth-Morris-Pratt algorithm; Boyer – Moore
algorithms.

String matching
• Text-editing programs frequently need to find all occurrences of a
pattern in the text
• the text is a document being edited
• The pattern searched for is a particular word supplied by the user.
• string matching” can increase the responsiveness of the text-editing
programs.
• Examples like DNA sequence patterns and internet search engines
make use of String-matching algorithms

• We assume that the text is an array T [1… n] of length n and the
pattern is an array P[1…m] m of length m <=n
• We assume that the elements of “P” and “T” are characters drawn
from a finite alphabet ∑.
• Eg: ∑={0,1} or ∑={a,b,….z} The character arrays P and T are often
called strings of characters

if pattern P occurs with shift s in text T, then we call s as valid shift.
Otherwise, it is an invalid shift
Here The pattern occurs only once in the text, at shift s = 3, which we call a
valid shift

Notation and terminology
• ∑* the set of all finite-length strings formed using characters from the
alphabet ∑.
• The zero-length empty string, denoted ε, also belongs to ∑*.
• The length of a string x is denoted |x|.
• The concatenation of two strings x and y, denoted xy, has length
|x|+|y|; x followed by y
• a string w is a prefix of a string x, if x = wy for some string
y∈ ∑* also |w|<=|x|
a string w is a suffix of a string x, if x = yw for some string y∈ ∑*
also |w|<=|x|

• NAIVE-STRING-MATCHER takes time O((n-m+1)m) and this bound is
tight in the worst case.
• Because it requires no preprocessing, NAIVESTRING-MATCHER’s

Advance algorithms in master of technology

The Rabin-Karp algorithm
• Uses Hashing to find whether the pattern exists in the text or not
• Firstly we will generate the hash of the given pattern
• Then we will take all substrings of same length present in text as a
pattern and compare their Hash with the pattern Hash, If both Hash
values are same, then complete with the pattern.
• Assume ∑={0,1,2,..9} so that each character is a decimal digit d=10
• Given a pattern p[1…m] and p denote its hash value
• Given a text T[1…n] and 𝑡𝑠 denote hash value of substring of length m
• If p[1…m]=T[s+1,….s+m] and 𝑡𝑠 =p then S is a valid shift

String matching with finite automata
A finite automaton is a simple machine for processing information that
scans the text string T for all occurrences of the pattern P.
A “finite automaton” (FA) is five tuple (Q, 𝑞0, A, ∑,δ)
where

• For any given input string x over the alphabet ∑, a finite automata(FA)
* starts from starting state 𝑞0∈ Q
*Reads the string x, character by character by changing state
after each character read.
• The Finite Automata (FA)
*accepts the string x, if it ends up in an accepting state
* Rejects the string x, if it does not end up in an accepting state

• The string-matching automata are very efficient, they examine each
text character exactly once
• The preprocessing time required to compute the transition
function(δ) for ∑ is given by O(M| ∑|)
• The matching time on a text string of length n is because it examines
each character exactly once.

Boyer- Moore String Matching Algorithm
• It is an efficient string-searching algorithm that is the standard
benchmark for practical string-search algorithms.
• The algorithm preprocesses the string being searched for the pattern
but not the string being searched in the text.
• The Boyer-Moore algorithm uses information gathered during
preprocessing to skip text sections, resulting in a lower constant
factor than many other string search algorithms.

• Key features
* matches on the tail of the pattern rather than the head.
*skips the text in jumps of multiple characters rather than searching
every single character in the text
*A shift is calculated by applying two rules
*bad character rule
*good suffix rule

• Bad character rule: The bad character rule considers the character in
T at which the comparison process failed by using
shift=length-index-1 * length=pattern length
* index=character
• Good suffix rule:
shift(D)=max(shift(char)-k,1) *char=bad move character
*K=NO of char match

The Knuth-Morris-Pratt algorithm
• linear-time string-matching algorithm
• Works on the principle of suffix and prefix of string(pattern)
• The prefix function ∏ for a pattern encapsulates knowledge about
how the pattern matches against shifts of itself

Advance algorithms in master of technology

More Related Content

Similar to Advance algorithms in master of technology (20)

Recently uploaded (20)

Advance algorithms in master of technology