2. Introduction
● String matching algorithms are fundamental in computer science, allowing us to search for a specific pattern
within a larger text efficiently. These algorithms play a crucial role in various real-world applications, from
text processing to security systems.
Why Are String Matching Algorithms Important?
● They enable fast searching for text or patterns within large datasets.
● They improve efficiency in data retrieval and pattern recognition.
● They are used in fields such as bioinformatics, cybersecurity, search engines, and plagiarism detection.
4. Types of String Matching Algorithms
A. Exact String Matching
These algorithms find occurrences where the pattern exactly matches a part of the text.
Examples of Exact Matching Algorithms:
1. Brute Force Algorithm:
○ Compares the pattern with every substring in the text sequentially.
○ Simple but inefficient for large texts: O(nm) character comparisons in the worst case, for a text of length n and a pattern of length m.
○ Slides the pattern one character at a time until a match is found.
2. Knuth-Morris-Pratt (KMP) Algorithm:
○ Uses a preprocessing step (prefix function) to avoid unnecessary comparisons.
○ Efficient for large-scale text searching: O(n + m) time, including preprocessing, for a text of length n and a pattern of length m.
○ Instead of sliding the pattern one step at a time, it jumps based on previous matches.
3. Boyer-Moore Algorithm:
○ Compares the pattern with the text from right to left, so a mismatch often lets it skip several text characters at once.
○ Uses two heuristics: bad-character heuristic (shifts based on mismatched character) and good-suffix heuristic (shifts based on
matched suffixes).
○ Works well for long patterns and large texts.
4. Rabin-Karp Algorithm:
○ Uses a rolling hash to compare the pattern with each text window, then verifies candidate matches character by character.
○ Well suited to searching for multiple patterns at once.
5. Aho-Corasick Algorithm:
○ Uses a Trie data structure for searching multiple patterns simultaneously.
○ Commonly used in network security and bioinformatics.
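As an illustration of the hashing idea behind Rabin-Karp listed above, here is a minimal Python sketch. The function name `rabin_karp` and the `base` and `mod` constants are assumptions chosen for the example, not values taken from the slides; other bases and moduli work as well.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return all 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character of an m-character window.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    matches = []
    for s in range(n - m + 1):
        # Equal hashes are only a hint; compare the strings to rule out collisions.
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            t_hash = ((t_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Each window hash is derived from the previous one in constant time, which is what makes the method attractive when several patterns of the same length are checked against the same text.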
5. B. Approximate String Matching Algorithms
These algorithms find matches even when there are slight differences (e.g., typos, mutations in DNA sequences).
Examples of Approximate Matching Algorithms:
1. Naive Approach:
○ Similar to the exact matching naive approach but allows minor differences.
2. Sellers Algorithm:
○ Uses dynamic programming to compute the edit distance between the pattern and substrings of the text.
3. Shift-Or Algorithm:
○ Uses bitwise operations (bit-parallelism) to speed up searching; the basic algorithm finds exact matches, and extensions of it allow a bounded number of errors.
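The dynamic-programming idea behind the Sellers algorithm can be sketched as follows. This is a minimal illustration under stated assumptions: unit costs for insertion, deletion, and substitution; a hypothetical function name `sellers_search`; and an error threshold `k` supplied by the caller. It keeps a single column of the table, where `col[i]` is the smallest edit distance between the first `i` pattern characters and any substring of the text ending at the current position.

```python
def sellers_search(text: str, pattern: str, k: int) -> list[tuple[int, int]]:
    """Return (end, distance) pairs: text[:end] ends with a substring whose
    edit distance to pattern is at most k."""
    m = len(pattern)
    # Before any text is read, matching pattern[:i] costs i deletions.
    col = list(range(m + 1))
    hits = []
    for end, ch in enumerate(text, start=1):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(old + 1,           # extra character in the text
                         col[i - 1] + 1,    # pattern character left unmatched
                         prev_diag + cost)  # match or substitution
            prev_diag = old
        if col[m] <= k:
            hits.append((end, col[m]))
    return hits


print(sellers_search("abcdefg", "bxd", 1))  # [(4, 1)]
```

In the usage line, the pattern "bxd" matches the substring "bcd" (ending after the fourth text character) with one substitution.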
6. Real-World Applications of String Matching Algorithms
A. Plagiarism Detection
● Compares documents to find similarities.
● Used in academic institutions and research publications.
● Example: Turnitin, Grammarly.
B. Bioinformatics and DNA Sequencing
● Finds patterns in genetic sequences.
● Helps in identifying mutations, gene mapping, and disease research.
● Example: BLAST (Basic Local Alignment Search Tool).
C. Digital Forensics
● Locates specific keywords in large datasets during investigations.
● Used in crime detection and cybersecurity.
● Example: Searching for illegal keywords in emails or chat logs.
D. Spell Checking and Auto-correction
● Uses Trie structures and approximate matching to detect misspellings.
● Example: Microsoft Word spell checker, Google Keyboard auto-correct.
E. Spam Filters
● Detects spam emails by searching for common spam phrases.
● Example: Gmail's spam filtering system.
F. Search Engines and Database Searching
● Indexes and retrieves relevant information based on search keywords.
● Example: Google Search, SQL full-text search.
G. Intrusion Detection Systems (IDS)
● Identifies malicious network packets by matching with known attack signatures.
● Example: Snort, an open-source IDS.
8. String Matching Problem and Terminology
● A string w is a prefix of x if x = wy for some string y.
● Similarly, a string w is a suffix of x if x = yw for some string y.
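The two definitions above can be checked directly in code. The helper names `is_prefix` and `is_suffix` are hypothetical and used only for illustration; in each case the leftover part of x plays the role of the string y.

```python
def is_prefix(w: str, x: str) -> bool:
    """True if x = w + y for some string y (y may be empty)."""
    return x[:len(w)] == w


def is_suffix(w: str, x: str) -> bool:
    """True if x = y + w for some string y (y may be empty)."""
    return len(w) <= len(x) and x[len(x) - len(w):] == w


print(is_prefix("ab", "abcca"))   # True, with y = "cca"
print(is_suffix("cca", "abcca"))  # True, with y = "ab"
print(is_prefix("", "abcca"))     # True: the empty string is a prefix of every string
```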
9. Algorithms
Brute Force Algorithm
Initially, the pattern P is aligned with the text T at the first index position. P is then compared with T from
left to right. If a mismatch occurs, "slide" P to the right by 1 position and start the
comparison again.
10. Brute Force Algorithm
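BF_StringMatcher(T, P) {
// T[1..n] is the text, P[1..m] is the pattern (1-based indexing)
n = length(T); m = length(P);
for (s=0; s<=n-m; s++) { // try every shift s
j=1;
while (j<=m && T[s+j]==P[j]) { // compare left to right
j++;
}
if (j==m+1) print ("Pattern occurs with shift=", s) // all m characters matched
}
}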
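For readers who want to run the procedure, here is a minimal Python sketch of the same brute-force idea. It uses 0-based indexing rather than the 1-based indexing of the pseudocode, and the function name `brute_force_match` is an assumption made for this example.

```python
def brute_force_match(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):  # try every possible shift
        j = 0
        # Compare left to right until a mismatch or the pattern is exhausted.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:  # all m characters matched
            shifts.append(s)
    return shifts


print(brute_force_match("abababa", "aba"))  # [0, 2, 4]
```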
11. The Knuth-Morris-Pratt (KMP) Algorithm
In the brute-force algorithm, if a mismatch occurs at P[j] (j > 1), P slides to the right
by only 1 step. This throws away a piece of information that we already have. What is that
piece of information?
Let s be the current shift value. Since the mismatch is
at P[j], we know that the first j-1 characters matched: P[1..j-1] = T[s+1..s+j-1].
How can we make use of this information to make the next shift? In general, P should
slide to a new shift s' > s such that P[1..k] = T[s'+1..s'+k] for some k ≥ 0. We then resume
the comparison at P[k+1] and T[s'+k+1].
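The shift rule described above can be sketched in Python as follows. This is a minimal illustration, not necessarily the formulation used later in these slides: it uses 0-based indexing, and the names `prefix_function` and `kmp_search` are assumptions. Here `pi[q]` is the length of the longest proper prefix of `pattern[:q + 1]` that is also a suffix of it, which is the k to fall back to after a mismatch.

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of pattern[:q+1]
    that is also a suffix of pattern[:q+1]."""
    m = len(pattern)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]  # fall back to the next shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    pi = prefix_function(pattern)
    shifts = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and pattern[k] != ch:
            k = pi[k - 1]  # reuse the matched prefix instead of restarting
        if pattern[k] == ch:
            k += 1
        if k == m:  # full match ending at text index i
            shifts.append(i - m + 1)
            k = pi[k - 1]
    return shifts


print(prefix_function("ababaca"))      # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("abababa", "aba"))    # [0, 2, 4]
```

The text index `i` never moves backward, which is why the search phase takes time linear in the length of the text.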