CS计算机代考程序代写 Java finance Finite State Automaton ER algorithm Algorithms

Algorithms
ROBERT SEDGEWICK | KEVIN WAYNE
Algorithms FOURTH EDITION
ROBERT SEDGEWICK | KEVIN WAYNE
http://algs4.cs.princeton.edu
5.3 SUBSTRING SEARCH
‣ introduction
‣ brute force
‣ Knuth-Morris-Pratt ‣ Boyer-Moore
‣ Rabin-Karp

Algorithms
ROBERT SEDGEWICK | KEVIN WAYNE
http://algs4.cs.princeton.edu
5.3 SUBSTRING SEARCH
‣ introduction
‣ brute force
‣ Knuth-Morris-Pratt ‣ Boyer-Moore
‣ Rabin-Karp

Substring search
Goal. Find pattern of length M in a text of length N.
typically N >> M
(in some cases : MI
Nz)
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
Substring search
match
3

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
typically N >> M
Substring search
match
4

Substring search applications
Goal. Find pattern of length M in a text of length N.
NEEDLE INAHAYSTACKNEEDLEINA
match
Substring search
Computer forensics. Search memory or disk for signatures,
 e.g., all URLs or RSA keys that the user has entered.
pattern
text
typically N >> M
http://citp.princeton.edu/memory
5

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
typically N >> M
Identify patterns indicative of spam.
Substring search
match
6

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
typically N >> M
I・dentify patterns indicative of spam. PROFITS
Substring search
match
6

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
typically N >> M
I・dentify patterns indicative of spam. ・ PROFITS
L0SE WE1GHT
Substring search
match
6

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
typically N >> M
I・dentify patterns indicative of spam. ・ PROFITS
・L0SE WE1GHT
herbal Viagra
Substring search
match
6

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
typically N >> M
I・dentify patterns indicative of spam. ・ PROFITS
・L0SE WE1GHT
・herbal Viagra
There is no catch.
Substring search
match
6

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
typically N >> M
Identify patterns indicative of spam.
Substring search
match
・ PROFITS
・L0SE WE1GHT
・herbal Viagra
・There is no catch.
・This is a one-time mailing.
6

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
typically N >> M
Identify patterns indicative of spam.
Substring search
match
・ PROFITS
・L0SE WE1GHT
・herbal Viagra
・There is no catch.
・This is a one-time mailing.
・This message is sent in compliance with spam regulations.
6

Substring search applications
Goal. Find pattern of length M in a text of length N.
pattern
text
NEEDLE INAHAYSTACKNEEDLEINA
Identify patterns indicative of spam.
Substring search
match
typically N >> M
・ PROFITS
・L0SE WE1GHT
・herbal Viagra
・There is no catch.
・This is a one-time mailing.
・This message is sent in compliance with spam regulations.
6

Substring search applications
Electronic surveillance.
Need to monitor all internet traffic. (security)
7

Substring search applications
Electronic surveillance.
Need to monitor all internet traffic. (security)
No way! (privacy)
7

Substring search applications
Electronic surveillance.
Need to monitor all internet traffic. (security)
No way! (privacy)
Well, we’re mainly interested in “ATTACK AT DAWN”
7

Substring search applications
Electronic surveillance.
Need to monitor all internet traffic. (security)
No way! (privacy)
Well, we’re mainly interested in “ATTACK AT DAWN”
OK. Build a machine that just looks for that.
7

Substring search applications
Electronic surveillance.
Need to monitor all internet traffic. (security)
No way! (privacy)
Well, we’re mainly interested in “ATTACK AT DAWN”
OK. Build a machine that just looks for that.
“ATTACK AT DAWN” substring search
 machine
found
7

Substring search applications
Screen scraping. Extract relevant data from web page.
Ex. Find string delimited by and after first occurrence of

pattern Last Trade:.


Last Trade: 452.92
Trade Time: …
http://finance.yahoo.com/q?s=goog
8

Screen scraping: Java implementation
Java library. The indexOf() method in Java’s string library returns the index of the first occurrence of a given string, starting at a given offset.
public class StockQuote
 {
public static void main(String[] args)
 {
String name = “http://finance.yahoo.com/q?s=”;
In in = new In(name + args[0]);
String text = in.readAll();
String price = text.substring(from + 3, to);
StdOut.println(price);
}
}
int start
int from
int to
= text.indexOf(“Last Trade:”, 0);
= text.indexOf(““, start);
= text.indexOf(“
“, from);
% java StockQuote goog
582.93
% java StockQuote msft
24.84
9

Algorithms
ROBERT SEDGEWICK | KEVIN WAYNE
http://algs4.cs.princeton.edu
5.3 SUBSTRING SEARCH
‣ introduction
‣ brute force
‣ Knuth-Morris-Pratt ‣ Boyer-Moore
‣ Rabin-Karp

Brute-force substring search
Check for pattern starting at each text position.
N =
of txt
i j i+j 0 1 2 3 4 5 6 7 8 9 10 txt ABACADABRAC
6 410
return i when j is M
ABRA
match
entries in red are mismatches
entries in gray are for reference only
entries in black A B R A match the text A B R A
M
=
length length of
pattern
0 2 2
101
213
303
4 1 5
5 05
A B R A pat ABRA
ABRA ABRA
Brute-force substring search
11

Brute-force substring search: Java implementation
Check for pattern starting at each text position.
public static int search(String pat, String txt)
 {

int M = pat.length();

int N = txt.length();

for (int i = 0; i <= N - M; i++)
 {
 int j;
 for (j = 0; j < M; j++)
 if (txt.charAt(i+j) != pat.charAt(j))
 }
 return N; } not found break;
 if (j == M) return i;
 index in text where
 pattern starts 12 i ji+j 012345678910 ABACADABRAC 437 ADACR 505 ADACR Brute-force substring search: worst case Brute-force algorithm can be slow if text and pattern are repetitive. --T -_ 10 i j i+j 0 1 2 3 4 5 6 7 8 9 10 j i+j 0 1 2 3 4 5 6 7 8 9 i 044AAAAB txt ABACADABRAC μ N txt AAAAAAAAAB 0 2 2 A B R A pat pat entries in red are mismatches entries in gray are for reference only 101 ABRA 145 AAAAB 213 ABRA 246 AAAAB 303 ABRA 347 AAAAB 4 1 5 entries in black A B R A 448 AAAAB 505 match the text 5 5 10 A B R A A A A A B 6 4 10 return i when j is M A B R A Brute-force substring search (worst case) Brute-force substring search match 13 Brute-force substring search: worst case Brute-force algorithm can be slow if text and pattern are repetitive. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 i j i+j 0 1 2 3 4 5 6 7 8 9 10 j i+j 0 1 2 3 4 5 6 7 8 9 txt ABACADABRAC txt AAAAAAAAAB i 044AAAAB 0 2 2 A B R A pat pat 1 0 1 1 4 5 A B R A entries in red are AAAAB mismatches 213 ABRA A A A A B entries in gray are 246 30 347 4 1 5 5 10 6 4 10 Brute-force substring search (worst case) Brute-force substring search 448 50 AAAAB A B R A A A A A B A B R A Worst case. ~ M N char compares. 3 5 5 A B R A for reference only AAAAB entries in black A B R A match the text return i when j is M = imagine or N TN M match 109 MINT - and 13 Backup I・n many applications, we want to avoid backup in text stream. ・Treat input as stream of data. Abstract model: standard input. “ATTACK AT DAWN” substring search machine 14 Backup I・n many applications, we want to avoid backup in text stream. ・Treat input as stream of data. Abstract model: standard input. 
 
 “ATTACK AT DAWN” substring search machine Brute-force algorithm needs backup for every mismatch. text→ pattern matched chars ex mismatch backup 14 AAAAAAAAAAAAAAAAAAAAAB AAAAAB AAAAAAAAAAAAAAAAAAAAAB AAAAAB shift pattern right one position Backup I・n many applications, we want to avoid backup in text stream. ・Treat input as stream of data. Abstract model: standard input. 
 
 “ATTACK AT DAWN” substring search machine Brute-force algorithm needs backup for every mismatch. 
 
 
 
 
 
 
 
 matched chars mismatch backup Approach 1. Maintain buffer of last M characters. 14 AAAAAAAAAAAAAAAAAAAAAB AAAAAB AAAAAAAAAAAAAAAAAAAAAB AAAAAB shift pattern right one position Backup I・n many applications, we want to avoid backup in text stream. ・Treat input as stream of data. Abstract model: standard input. 
 
 “ATTACK AT DAWN” substring search machine Brute-force algorithm needs backup for every mismatch. 
 
 
 
 
 
 
 
 matched chars mismatch backup Approach 1. Maintain buffer of last M characters. Approach 2. Stay tuned. 14 AAAAAAAAAAAAAAAAAAAAAB AAAAAB AAAAAAAAAAAAAAAAAAAAAB AAAAAB shift pattern right one position Brute-force substring search: alternate implementation Same sequence of char compares as previous implementation. ・ i points to end of sequence of already-matched chars in text. ・ j stores # of already-matched chars (end of sequence in pattern). public static int search(String pat, String txt) { int i, N = txt.length(); int j, M = pat.length(); for (i = 0, j = 0; i < N && j < M; i++) { if (txt.charAt(i) == pat.charAt(j)) j++; } if (j == M) return i - M; else return N; } else { i -= j; j = 0; } explicit backup 15 i j 012345678910 ABACADABRAC 73 ADACR 50 ADACR Algorithmic challenges in substring search Brute-force is not always good enough. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for each good person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their attack at dawn party. Now is the time for each person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. 16 Algorithmic challenges in substring search Brute-force is not always good enough. 
 Theoretical challenge. Linear-time guarantee. fundamental algorithmic problem Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for each good person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their attack at dawn party. Now is the time for each person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. 16 Algorithmic challenges in substring search Brute-force is not always good enough. 
 Theoretical challenge. Linear-time guarantee. Practical challenge. Avoid backup in text stream. fundamental algorithmic problem often no room or time to save text Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for each good person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their attack at dawn party. Now is the time for each person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. 16 Algorithms ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu ‣ introduction ‣ brute force ‣ Knuth-Morris-Pratt ‣ Boyer-Moore ‣ Rabin-Karp 5.3 SUBSTRING SEARCH Michael Rabin Dick Karp Rabin-Karp fingerprint search Basic idea = modular hashing. ( division method ) 73 Rabin-Karp fingerprint search B・asic idea = modular hashing. Compute a hash of pat[0..M-1]. pat.charAt(i) i 01234 2 6 5 3 5 % 997 = 613 txt.charAt(i) i 01234 5 6 7 8 9 10 11 12 13 14 15 3141592653589793 large prime number ( division method ) k mod of hlkl % = tQ ( = " mod " ) 0 31 4 1 5 % 997 = 508 1 1 4 1 5 9 % 997 = 201 2 4 1 5 9 2 % 997 = 715 3 4 5 6 returni = 6 1 5 9 2 6 % 997 = 971 5 9 2 6 5 % 997 = 442 9 2 6 5 3 % 997 = 929 2 6 5 3 5 % 997 = 613 Basis for Rabin-Karp substring search modular hashing with R = 10 and hash(s) = s (mod 997) match 73 Rabin-Karp fingerprint search B・asic idea = modular hashing. ・Compute a hash of pat[0..M-1]. For each i, compute a hash of txt[i..M+i-1]. method ) ( division large prime numbe hlkl = k mod of (% = "mod") pat.charAt(i) tQ i 0 1 2 3 4 2 i 0 3 0 3 1 2 3 4 5 6 1 1 1 1 5 3 5 2 3 4 4 1 5 4 1 5 4 1 5 4 1 5 % 997 = 613 txt.charAt(i) 6 returni = 6 5 6 7 8 9101112131415 9 2 6 5 3 5 8 9 7 9 3 % 997 = 508 9 % 997 = 201 9 2 % 997 = 715 9 2 6 % 997 = 971 9 2 6 5 3 % 997 = 929 2 6 5 3 5 % 997 = 613 1 5 5 9 2 6 5 % 997 = 442 match Basis for Rabin-Karp substring search modular hashing with R = 10 and hash(s) = s (mod 997) 73 Rabin-Karp fingerprint search B・asic idea = modular hashing. ・Compute a hash of pat[0..M-1]. ・For each i, compute a hash of txt[i..M+i-1]. If pattern hash = text substring hash, check for a match. pat.charAt(i) i 0 2 i 0 3 0 3 1 2 3 4 5 1 6 1 1 1 1 2 3 4 5 3 5 2 3 4 4 1 5 4 1 5 4 1 5 4 1 5 % 997 = 613 txt.charAt(i) 5 6 7 8 9101112131415 9 2 6 5 3 5 8 9 7 9 3 6 returni = 6 % 997 = 508 9 % 997 = 201 9 2 % 997 = 715 9 2 6 % 997 = 971 9 2 6 5 3 % 997 = 929 2 6 5 3 5 % 997 = 613 1 5 5 9 2 6 5 % 997 = 442 match Basis for Rabin-Karp substring search modular hashing with R = 10 and hash(s) = s (mod 997) 73 - (division method) of strings with Modular hashing alphabet (# of distinct characters that can appear in text) general μ = length of pattern 1st char of text ( represented in range toil, - - i ' R = size alphabet of t. " "" (to Rn tt-Rt , )Q text R-H = hash value of the initial M characters of keychatleng.es -challenge 1 : If M is large , might have ( of length : cost Yo . - tta, h÷mod ... chatter Hashing - substring Mt) numerical overflow m) : costs m NM one Hashing N order Chatkngel: If M is large , then the number will overflow. modular arithmetic identies : Cll (atb) modof = ((a mudQ) +(bmodal)modQ (21(d-b)modQ=((amodQ). (bmodal)mudQ Two ÷method Horner's ÷÷÷. " "" Xo ii. ... " . 4to mod Q)Rtt, )mod4)Atta), mod Q = (to. rn tt- R t , - t ta N)mod Q , i. of:÷÷÷:: . . . totalcostofMN x; is hash value for tititi,. .. , titu- I Chalking Avoiding " " tin = - t,- t..-t-i- tin Xit, Xi (tiRM" tiRM" tinN)mode ... Titu Xiti = ( tin- RM" t . . - - t time, - R' ttm.m.ro/modQ = ((x tiI" ) team ) mod xi+, ..-RRt Q = Rm- ' mod Q Observation ! precomnule I : Modular arithmetic Math trick. To keep numbers small, take intermediate results modulo Q. Ex. 10000 mod 997 = 30 (mod 997) (mod 997) (mod 997) (mod 997) (10000 + 535) * 1000 = (30 + 535) * 3 = 1695 = 698 1000 mod 997 = 3 74 Modular arithmetic Math trick. To keep numbers small, take intermediate results modulo Q. Ex. 10000 mod 997 = 30 (mod 997) (mod 997) (mod 997) (mod 997) (10000 + 535) * 1000 = (30 + 535) * 3 = 1695 = 698 1000 mod 997 = 3 (a + b) mod Q = ((a mod Q) + (b mod Q)) mod Q (a * b) mod Q = ((a mod Q) * (b mod Q)) mod Q two useful modular arithmetic identities 74 Efficiently computing the hash function Modular hash function. Using the notation ti for txt.charAt(i),
 we wish to compute 
 
 
 xi = ti RM–1 +ti+1 RM–2 +...+ti+M–1 R0 (modQ) Intuition. M-digit, base-R integer, modulo Q. 26535 = 2*10000 + 6*1000 + 5*100 + 3*10 + 5 75 Efficiently computing the hash function Modular hash function. Using the notation ti for txt.charAt(i),
 we wish to compute 
 
 
 xi = ti RM–1 +ti+1 RM–2 +...+ti+M–1 R0 (modQ) Intuition. M-digit, base-R integer, modulo Q. 
 Horner's method. Linear-time method to evaluate degree-M polynomial. pat.charAt() i01234 26535 0 2 % 997 = 2 RQ 1 2 6 2 2 6 3 2 6 4 2 6 % 997 = (2*10 + 6) % 997 = 26 5 % 997 = (26*10 + 5) % 997 = 265 5 3 % 997 = (265*10 + 3) % 997 = 659 5 3 5 % 997 = (659*10 + 5) % 997 = 613 Computing the hash value for the pattern with Horner’s method 26535 = 2*10000 + 6*1000 + 5*100 + 3*10 + 5 = ((((2) *10 + 6) * 10 + 5) * 10 + 3) * 10 + 5 75 Efficiently computing the hash function Modular hash function. Using the notation ti for txt.charAt(i),
 we wish to compute 
 
 
 xi = ti RM–1 +ti+1 RM–2 +...+ti+M–1 R0 (modQ) Intuition. M-digit, base-R integer, modulo Q. 
 Horner's method. Linear-time method to evaluate degree-M polynomial. pat.charAt() i01234 26535 0 2 % 997 = 2 RQ 1 2 6 2 2 6 3 2 6 4 2 6 % 997 = (2*10 + 6) % 997 = 26 5 % 997 = (26*10 + 5) % 997 = 265 5 3 % 997 = (265*10 + 3) % 997 = 659 5 3 5 % 997 = (659*10 + 5) % 997 = 613 // Compute hash for M-digit key private long hash(String key, int M) { long h = 0; for (int j = 0; j < M; j++) h = (h * R + key.charAt(j)) % Q; return h; } Computing the hash value for the pattern with Horner’s method 26535 = 2*10000 + 6*1000 + 5*100 + 3*10 + 5 = ((((2) *10 + 6) * 10 + 5) * 10 + 3) * 10 + 5 75 Efficiently computing the hash function Challenge. How to efficiently compute xi+1 given that we know xi. xi = ti RM–1 +ti+1 RM–2 +...+ti+M–1 R0 xi+1 = ti+1 RM–1 +ti+2 RM–2 +...+ti+M R0 i ... 2 3 4 5 6 7 ... 6 5 6 5 current value 1 4 1 5 9 2 text new value 4 1 5 9 2 41592 -40000 1592 *10 15920 +6 15926 current value subtract leading digit multiply by radix add new trailing digit new value Key computation in Rabin-Karp substring search 76 (move right one position in the text) Efficiently computing the hash function Challenge. How to efficiently compute xi+1 given that we know xi. xi = ti RM–1 +ti+1 RM–2 +...+ti+M–1 R0 xi+1 = ti+1 RM–1 +ti+2 RM–2 +...+ti+M R0 i ... 2 current value 1 4 newvalue 4 4 - 4 1 1 3 4 1 5 1 5 1 5 0 0 1 5 5 6 9 2 9 2 9 2 0 0 9 2 7 ... 6 5 6 5 text 5 9 5 9 2 0 + 6 2 6 * 1 0 current value subtract leading digit multiply by radix add new trailing digit new value Key computation in Rabin-Karp substring search 76 (move right one position in the text) Efficiently computing the hash function Challenge. How to efficiently compute xi+1 given that we know xi. 
 
 
 Key property. xi = ti RM–1 +ti+1 RM–2 +...+ti+M–1 R0 xi+1 = ti+1 RM–1 +ti+2 RM–2 +...+ti+M R0 Can update "rolling" hash function in constant time! xi+1 = (xi – tiRM–1) R + ti+M current subtract
 multiply add new
 value leading digit by radix trailing digit (can precompute RM-1) i ... 2 current value 1 4 newvalue 4 4 - 4 1 1 3 4 1 5 1 5 1 5 0 0 1 5 5 6 9 2 9 2 9 2 0 0 9 2 7 ... 6 5 6 5 text 5 9 5 9 2 0 + 6 2 6 * 1 0 current value subtract leading digit multiply by radix add new trailing digit new value Key computation in Rabin-Karp substring search 76 (move right one position in the text) Rabin-Karp substring search example i 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 3141592653589793 0 3 % 997 = 3 Q 1 3 1 2 3 1 3 3 1 4 3 1 % 997 = (3*10 + 1) % 997 = 31 4 % 997 = (31*10 + 4) % 997 = 314 4 1 % 997 = (314*10 + 1) % 997 = 150 5 6 7 8 9 10 1 5 % 997 = (150*10 + 5) % 997 = 508 9 % 997 = ((508 + 3*(997 - 30))*10 + 9) % 997 = 201 9 2 % 997 = ((201 + 1*(997 - 30))*10 + 2) % 997 = 715 9 2 6 % 997 = ((715 + 4*(997 - 30))*10 + 6) % 997 = 971 4 1 5 1 4 4 1 5 RM R 2 6 5 % 997 = ((971 + 1*(997 - 30))*10 + 5) % 997 = 442 returni-M+1 = 6 2 6 5 3 5 % 997 = ((929 + 9*(997 - 30))*10 + 5) % 997 = 613 Rabin-Karp substring search example 1 5 5 9 match 9 2 6 5 3 % 997 = ((442 + 5*(997 - 30))*10 + 3) % 997 = 929 77 Rabin-Karp substring search example First R entries: Use Horner's rule. i 0 3 1 2 3 4 5 6 7 8 9101112131415 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 0 3 1 3 1 2 3 1 3 3 1 4 3 1 4 1 4 1 5 Horner's rule RM R 5 6 7 8 9 10 1 4 4 1 5 1 5 5 9 % 997 = ((508 + 3*(997 - 30))*10 + 9) % 997 = 201 2 % 997 = ((201 + 1*(997 - 30))*10 + 2) % 997 = 715 2 6 % 997 = ((715 + 4*(997 - 30))*10 + 6) % 997 = 971 2 6 5 % 997 = ((971 + 1*(997 - 30))*10 + 5) % 997 = 442 2 6 5 3 % 997 = ((442 + 5*(997 - 30))*10 + 3) % 997 = 929 2 6 5 3 5 % 997 = ((929 + 9*(997 - 30))*10 + 5) % 997 = 613 Rabin-Karp substring search example Q % 997 = (314*10 + 1) % 997 = 150 % 997 = (150*10 + 5) % 997 = 508 % 997 = 3 % 997 = (3*10 + 1) % 997 = 31 4 % 997 = (31*10 + 4) % 997 = 314 return i-M+1 = 6 1 5 9 9 9 9 match 77 Rabin-Karp substring search example First R entries: Use Horner's rule. Remaining entries: Use rolling hash (and % to avoid overflow). i 0 3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 Q % 997 = (314*10 + 1) % 997 = 150 % 997 = (150*10 + 5) % 997 = 508 0 3 1 3 1 2 3 1 3 3 1 4 3 1 4 1 4 1 5 Horner's rule RM R % 997 = 3 % 997 = (3*10 + 1) % 997 = 31 4 % 997 = (31*10 + 4) % 997 = 314 5 6 7 8 9 10 1 4 4 1 5 1 5 5 9 % 997 = ((508 + 3*(997 - 30))*10 + 9) % 997 = 201 2 % 997 = ((201 + 1*(997 - 30))*10 + 2) % 997 = 715 2 6 % 997 = ((715 + 4*(997 - 30))*10 + 6) % 997 = 971 2 6 5 % 997 = ((971 + 1*(997 - 30))*10 + 5) % 997 = 442 2 6 5 3 % 997 = ((442 + 5*(997 - 30))*10 + 3) % 997 = 929 2 6 5 3 5 % 997 = ((929 + 9*(997 - 30))*10 + 5) % 997 = 613 Rabin-Karp substring search example return i-M+1 = 6 1 5 9 9 9 9 rolling hash match -30 (mod 997) = 997 – 30 10000 (mod 997) = 30 77 Rabin-Karp: Java implementation public class RabinKarp { private long patHash; private int M; private long Q; private int R; private long RM1; // pattern hash value // pattern length // modulus // radix // R^(M-1) % Q public RabinKarp(String pat) { M = pat.length(); R = 256; } private long hash(String key, int M) { /* as before */ } public int search(String txt) { /* see next slide */ } } Q = longRandomPrime(); RM1 = 1; for (int i = 1; i <= M-1; i++) RM1 = (R * RM1) % Q; patHash = hash(pat, M); a large prime (but avoid overflow) precompute RM – 1 (mod Q) 78 Rabin-Karp: Java implementation (continued) Monte Carlo version. Return match if hash match. public int search(String txt) { int N = txt.length(); int txtHash = hash(txt, M); if (patHash == txtHash) return 0; for (int i = M; i < N; i++) { check for hash collision
 using rolling hash function txtHash = (txtHash + Q - RM*txt.charAt(i-M) % Q) % Q; txtHash = (txtHash*R + txt.charAt(i)) % Q; if (patHash == txtHash) return i - M + 1; } return N; } Las Vegas version. Check for substring match if hash match;
 continue search if false collision. 79 Rabin-Karp analysis Theory. If Q is a sufficiently large random prime (about M N 2),
 then the probability of a false collision is about 1 / N. 80 Rabin-Karp analysis Theory. If Q is a sufficiently large random prime (about M N 2),
 then the probability of a false collision is about 1 / N. 
 Practice. Choose Q to be a large prime (but not so large to-cause overflow).
 Und- Las- Vests Alg er reasonable assumptions, probability of a collision is about 1 / Q. . use Rabin and u p o n - Karn to find hash matches, expected cost of alg ' . ..- O ( N t N Yo M ↳ each hash match, check ifsabstrinsoftext}"⇐n actually matches pattern ) Nt suppose KIM of Ntn- ml=out 80 Rabin-Karp analysis Theory. If Q is a sufficiently large random prime (about M N 2),
 then the probability of a false collision is about 1 / N. 
 Practice. Choose Q to be a large prime (but not so large to cause overflow).
 Under reasonable assumptions, probability of a collision is about 1 / Q. 
 Monte Carlo version. ・Always runs in linear time. ・Extremely likely to return correct answer (but not always!). 
 L・as Vegas version. ・Always returns correct answer. Extremely likely to run in linear time (but worst case is M N). 80 Rabin-Karp fingerprint search Advantages. ・Extends to 2d patterns. ・Extends to finding multiple patterns. 81 Rabin-Karp fingerprint search Advantages. ・Extends to 2d patterns. ・Extends to finding multiple patterns. 
 Disadvantages. ・Arithmetic ops slower than char compares. ・Las Vegas version requires backup. ・Poor worst-case guarantee. 81 Rabin-Karp fingerprint search Advantages. ・Extends to 2d patterns. ・Extends to finding multiple patterns. 
 Disadvantages. 
 
 Q. How would you extend Rabin-Karp to efficiently search for any
 one of P possible patterns in a text of length N ? ・Arithmetic ops slower than char compares. ・Las Vegas version requires backup. ・Poor worst-case guarantee. 81 Algorithms ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu 5.3 SUBSTRING SEARCH ‣ introduction ‣ brute force ‣ Knuth-Morris-Pratt ‣ Boyer-Moore ‣ Rabin-Karp Knuth-Morris-Pratt substring search Intuition. Suppose we are searching in text for pattern BAAAAAAAAA. i text after mismatch A B A A A A B A A A A A A A A A on sixth char brute-force backs up to try this and this and this and this and this B A A A A A A A A A pattern BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA Text pointer backup in substring searching but no backup is needed 18 Knuth-Morris-Pratt substring search I・ntuition. Suppose we are searching in text for pattern BAAAAAAAAA. Suppose we match 5 chars in pattern, with mismatch on 6th char. i text after mismatch A B A A A A B A A A A A A A A A on sixth char brute-force backs up to try this and this and this and this and this B A A A A A A A A A pattern BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA Text pointer backup in substring searching but no backup is needed 18 Knuth-Morris-Pratt substring search I・ntuition. Suppose we are searching in text for pattern BAAAAAAAAA. ・Suppose we match 5 chars in pattern, with mismatch on 6th char. We know previous 6 chars in text are BAAAAB. i assuming { A, B } alphabet text after mismatch A B A A A A B A A A A A A A A A on sixth char brute-force backs up to try this and this and this and this and this B A A A A A A A A A pattern BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA BAAAAAAAAA Text pointer backup in substring searching but no backup is needed 18 Knuth-Morris-Pratt substring search I・ntuition. Suppose we are searching in text for pattern BAAAAAAAAA. ・Suppose we match 5 chars in pattern, with mismatch on 6th char. ・We know previous 6 chars in text are BAAAAB. Don't need to back up text pointer! i text aftermismatch A B A A A A B onsixthchar B A A A A A assuming { A, B } alphabet brute-force backs up to try this and this and this A A A A A A A A A A B A A A A A B A A A A A A A A A A A A A A A A A A B A A A A B A A A B A A B A A A A A A A A A A A A A A pattern A A A A A A A A A A A and this and this but no backup is needed Text pointer backup in substring searching 18 Knuth-Morris-Pratt substring search I・ntuition. Suppose we are searching in text for pattern BAAAAAAAAA. ・Suppose we match 5 chars in pattern, with mismatch on 6th char. 
 
 
 
 
 
 
 
 
 
 
 text aftermismatch A B A A A onsixthchar B A A A brute-force backs up to try this i A B A A A A A A A A B A A A A A A A A A A A A A A pattern A A A A A A A A A A A ・We know previous 6 chars in text are BAAAAB. Don't need to back up text pointer! assuming { A, B } alphabet B A A B A B and this and this A A A A A AAAAA A A A A A A A A A A A A A and this and this but no backup is needed B A A A A A B A A A A A Text pointer backup in substring searching Knuth-Morris-Pratt algorithm. Clever method to always avoid backup. (!) 18 Deterministic finite state automaton (DFA) DFA is abstract string-searching machine. ・Finite number of states (including start and halt). ・Exactly one transition for each char in alphabet. ・Accept if sequence of transitions leads to halt state. j pat.charAt(j) dfa[][j] C graphical representation =, XCt FA B, A B A B A C A113151 B020404 000006 alphabet 012345 B,C A A B,C C A B 0A1B2A3B4A5C6 C B,C Constructing the DFA for KMP substring search for A B A B A C 19 Deterministic finite state automaton (DFA) DFA is abstract string-searching machine. ・Finite number of states (including start and halt). ・Exactly one transition for each char in alphabet. ・Accept if sequence of transitions leads to halt state. X internal representation X t j012345 j pat.ch0arAt1(j)2A3B4A5 B pat.charAt(j) A BAA1B1A3C 1 A 1 1 3 1 5 1 A variable c If in state j reading char c:
 1 5 dfa[][j]B 0 2 0 4 0 4 C 020404 if j is 6 halt and accept else move to state dfa[c][j] dfa[][j] B C00C0000006006 graphical representation B,CAA AA B,C A AB B 0A1B2A3B4A5C6 C0A1B2A3B4A5C6 B,C C C B,C B,C C B,C Constructing the DFA for KMP substring search for A B A B A C Constructing the DFA for KMP substring search for A B A B A C 19 Knuth-Morris-Pratt demo: DFA simulation A A B A C A -A B A B A C A A 012345 pat.charAt(j) dfa[][j] A ABABAC A113151 B020404 C000006 B, C A A B 0A1B2A3B4A5C6 C B, C C B, C 20 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 21 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 21 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 22 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 22 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 23 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 23 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 24 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 24 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA B, C A A B 012345 pat.charAt(j) dfa[][j] A ABABAC A113151 B020404 C000006 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 25 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA B, C A A B 012345 pat.charAt(j) dfa[][j] A ABABAC A113151 B020404 C000006 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 25 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 26 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 26 Knuth-Morris-Pratt demo: DFA simulation A A B A C A -A B A B A C A A pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 27 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 27 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 28 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 28 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 29 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 29 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA B, C A A B 012345 pat.charAt(j) dfa[][j] A ABABAC A113151 B020404 C000006 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 30 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA B, C A A B 012345 pat.charAt(j) dfa[][j] A ABABAC A113151 B020404 C000006 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 30 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 31 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 31 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 32 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 32 Knuth-Morris-Pratt demo: DFA simulation AABACAABABACAA pat.charAt(j) dfa[][j] A 012345 ABABAC A113151 B020404 C000006 B, C A A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C substring found B, C 33 Interpretation of Knuth-Morris-Pratt DFA Q. What is interpretation of DFA state after reading in txt[i]? A B, C A A B 0A1B2A3B4A5C6 C B, C C B, C 34 Interpretation of Knuth-Morris-Pratt DFA Q. What is interpretation of DFA state after reading in txt[i]? A. State = number of characters in pattern that have been matched. A B, C A A B 0A1B2A3B4A5C6 C B, C C B, C 34 Interpretation of Knuth-Morris-Pratt DFA Q. What is interpretation of DFA state after reading in txt[i]? A. State = number of characters in pattern that have been matched. 
 
 Ex. DFA is in state 3 after reading in txt[0..6]. i 012345678 012345 txt BCBAABACA pat ABABAC A B, C A A B 0A1B2A3B4A5C6 C B, C C B, C 34 Interpretation of Knuth-Morris-Pratt DFA Q. What is interpretation of DFA state after reading in txt[i]? A. State = number of characters in pattern that have been matched. 
 
 Ex. DFA is in state 3 after reading in txt[0..6]. i length of longest prefix of pat[] that is a suffix of txt[0..i] 012345678 012345 txt BCBAABACA pat ABABAC B, C A A B suffix of txt[0..6] prefix of pat[] A 0A1B2A3B4A5C6 C B, C C B, C 34 Knuth-Morris-Pratt substring search: Java implementation Key differences from brute-force implementation. ・Need to precompute dfa[][] from pattern. ・Text pointer i never decrements. public int search(String txt) { int i, j, N = txt.length(); for (i = 0, j = 0; i < N && j < M; i++) if (j == M) return i - M; else return N; } j = dfa[txt.charAt(i)][j]; no backup 35 Knuth-Morris-Pratt substring search: Java implementation K・ey differences from brute-force implementation. ・Need to precompute dfa[][] from pattern. Text pointer i never decrements. 
 
 
 
 
 
 
 
 
 
 Running time. no backup public int search(String txt) { int i, j, N = txt.length(); for (i = 0, j = 0; i < N && j < M; i++) if (j == M) return i - M; else return N; } j = dfa[txt.charAt(i)][j]; 35 Knuth-Morris-Pratt substring search: Java implementation Key differences from brute-force implementation. ・Need to precompute dfa[][] from pattern. ・Text pointer i never decrements. 
 
 
 
 
 
 
 
 
 
 R・unning time. Simulate DFA on text: at most N character accesses. no backup public int search(String txt) { int i, j, N = txt.length(); for (i = 0, j = 0; i < N && j < M; i++) if (j == M) return i - M; else return N; } j = dfa[txt.charAt(i)][j]; 35 Knuth-Morris-Pratt substring search: Java implementation Key differences from brute-force implementation. 
 
 
 
 
 
 
 
 
 no backup ・Need to precompute dfa[][] from pattern. ・Text pointer i never decrements. 
 public int search(String txt) { int i, j, N = txt.length(); for (i = 0, j = 0; i < N && j < M; i++) if (j == M) return i - M; else return N; } j = dfa[txt.charAt(i)][j]; R・unning time. ・Simulate DFA on text: at most N character accesses. Build DFA: how to do efficiently? [warning: tricky algorithm ahead] 35 Knuth-Morris-Pratt substring search: Java implementation K・ey differences from brute-force implementation. ・Need to precompute dfa[][] from pattern. ・Text pointer i never decrements. Could use input stream. public int search(In in) { int i, j; for (i = 0, j = 0; !in.isEmpty() && j < M; i++) if (j == M) return i - M; } else return NOT_FOUND; X j = dfa[in.readChar()][j]; no backup j012345 pat.charAt(j) A B A B A C A113151 dfa[][j]B 0 2 0 4 0 4 C000006 B,C AA AB 0A1B2A3B4A5C6 C B,C C B,C Constructing the DFA for KMP substring search for A B A B A C 36 Knuth-Morris-Pratt demo: DFA construction Include one state for each character in pattern (plus accept state). 012345 ABABAC A B C pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C 37 Knuth-Morris-Pratt demo: DFA construction Include one state for each character in pattern (plus accept state). 012345 ABABAC A B C dfa[][j] Constructing the DFA for KMP substring search for A B A B A C pat.charAt(j) 0123456 37 Knuth-Morris-Pratt demo: DFA construction Match transition. If in state j and next char c == pat.charAt(j), go to j+1. first j characters of pattern next char matches now first j+1 characters of
 have already been matched pat.charAt(j) A B C pattern have been matched 012345 ABABAC dfa[][j] Constructing the DFA for KMP substring search for A B A B A C 0123456 38 Knuth-Morris-Pratt demo: DFA construction Match transition. If in state j and next char c == pat.charAt(j), go to j+1. first j characters of pattern next char matches now first j+1 characters of
 have already been matched pat.charAt(j) pattern have been matched 012345 ABABAC dfa[][j] Constructing the DFA for KMP substring search for A B A B A C A135 B24 C6 0A1B2A3B4A5C6 38 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C j 012345 ABABAC A135 B24 C6 0A1B2A3B4A5C6 39 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C j 012345 ABABAC A135 B02 4 C06 B, C 0A1B2A3B4A5C6 39 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). mismatch t A- A in text by l shift i A 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C j A B A B A C A135 B02 4 C06 B, C 0A1B2A3B4A5C6 40 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). f. mismatch AC c shift by I ⇒ pat.charAt(j) dfa[][j] 012345 ABABAC A113 5 B02 4 C00 6 Constructing the DFA for KMP substring search for A B A B A C j B, C A 0A1B2A3B4A5C6 C 40 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). ABB match at all £ of 012345 BB £ is match at all pat.charAt(j) A B A B A C A113 5 dfa[][j]B 0 2 4 C00 6 ¢ Constructing the DFA for KMP substring search for A B A B A C B, C A j 0A1B2A3B4A5C6 C 41 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C ABABAC A113 5 B0204 C000 6 B, C A j 0A1B2A3B4A5C6 C B, C 41 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C ABABAC A113 5 B0204 C000 6 B, C A j 0A1B2A3B4A5C6 C B, C 42 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A j 012345 ABABAC A11315 B0204 C0000 6 0A1B2A3B4A5C6 C B, C C 42 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). ¢ mismatch B t BA BB A- BAD ¥. so → is → pat.charAt(j) Constructing the DFA for KMP substring search for A B A B A C B, C A A j 012345 A B A B A C A11315 dfa[][j] B 0 2 0 4 C0000 6 0A1B2A3B4A5C6 C B, C C 43 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). 012345 dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A ABABAC A11315 B02040 C000006 pat.charAt(j) j 0A1B2A3B4A5C6 C B, C C B, C 43 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). f- mismatch A- BABA t. BABAI ↳AS B - match A B A B A C A11315 B02040 C000006 of first 4 characters of pattern pat.charAt(j) dfa[][j] I 0 1 2 3 4 5 Constructing the DFA for KMP substring search for A B A B A C B, C A A j 0A1B2A3B4A5C6 C B, C C B, C 44 Knuth-Morris-Pratt demo: DFA construction Mismatch transition: back up if c != pat.charAt(j). pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A A -- . 012345 ABABAC A113151 B020404 C000006 exercise - j B 0A1B2A3B4A5C6 C B, C C B, C 44 Knuth-Morris-Pratt demo: DFA construction pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C A B, C A A 0A1B2A3B4A5C6 C B, C C 012345 ABABAC A113151 B020404 C000006 B B, C 45 How to build DFA from pattern? Include one state for each character in pattern (plus accept state). 012345 ABABAC A B C pat.charAt(j) dfa[][j] 0123456 46 How to build DFA from pattern? Match transition. If in state j and next char c == pat.charAt(j), go to j+1. first j characters of pattern next char matches now first j +1 characters of
 have already been matched pattern have been matched 012345 ABABAC A135 B24 C6 pat.charAt(j) dfa[][j] 0A1B2A3B4A5C6 47 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. j pat.charAt(j) j 012345 ABABAC B, C A A 0A1B2A3B4A5C6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) j pat.charAt(j) j 012345 ABABAC B, C A A 0A1B2A3B4A5C6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Ex. dfa['A'][5] = 1; Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) j pat.charAt(j) j 012345 ABABAC B, C A A 0A1B2A3B4A5C6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Ex. dfa['A'][5] = 1; simulate BABA; j pat.charAt(j) 012345 ABABAC j Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) B, C A A simulation of BABA 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Ex. dfa['A'][5] = 1; simulate BABA; take transition 'A' j pat.charAt(j) 012345 ABABAC j Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) B, C A A simulation of BABA 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Ex. dfa['A'][5] = 1; simulate BABA; take transition 'A' = dfa['A'][3] B, C A A j pat.charAt(j) 012345 ABABAC j Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) A simulation of BABA 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Ex. dfa['A'][5] = 1; simulate BABA; take transition 'A' = dfa['A'][3] B, C A A Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) dfa['B'][5] = 4 j pat.charAt(j) 012345 ABABAC j A simulation of BABA 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Ex. dfa['A'][5] = 1; simulate BABA; take transition 'A' = dfa['A'][3] dfa['B'][5] = 4 Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) simulate BABA; j pat.charAt(j) 012345 ABABAC j B, C A A A simulation of BABA 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Ex. dfa['A'][5] = 1; simulate BABA; take transition 'A' = dfa['A'][3] dfa['B'][5] = 4 Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) simulate BABA; take transition 'B' j pat.charAt(j) 012345 ABABAC j B, C A A A simulation of BABA 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Ex. dfa['A'][5] = 1; simulate BABA; take transition 'A' = dfa['A'][3] dfa['B'][5] = 4 Simulate pat[1..j-1] on DFA and take transition c. still under construction (!) simulate BABA; take transition 'B' = dfa['B'][3] j pat.charAt(j) 012345 ABABAC B, C A A A simulation of BABA j B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. 
 To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Seems to require j steps. still under construction (!) Ex. dfa['A'][5] = 1; simulate BABA; take transition 'A' = dfa['A'][3] dfa['B'][5] = 4 simulate BABA; take transition 'B' = dfa['B'][3] j pat.charAt(j) 012345 ABABAC B, C A A A simulation of BABA j B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 48 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. B, C A A X 012345 ABABAC j 0A1B2A3B4A5C6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; B, C A A X 012345 ABABAC j 0A1B2A3B4A5C6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, 012345 ABABAC j B, C A A X 0A1B2A3B4A5C6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' 012345 ABABAC j B, C A A X 0A1B2A3B4A5C6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] B, C A A A X 012345 ABABAC j 0A1B2A3B4A5C6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] B, C A A dfa['B'][5] = 4 A X 012345 ABABAC j 0A1B2A3B4A5C6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] dfa['B'][5] = 4 from state X, 012345 ABABAC j B, C A A A X 0A1B2A3B4A5C6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] dfa['B'][5] = 4 from state X, take transition 'B' 012345 ABABAC j B, C A A A X 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] dfa['B'][5] = 4 from state X, take transition 'B' = dfa['B'][X] 012345 ABABAC B, C A A A X j B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] dfa['B'][5] = 4 X' = 0 from state X, take transition 'B' = dfa['B'][X] 012345 ABABAC B, C A A A X j B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] dfa['B'][5] = 4 X' = 0 from state X, from state X, take transition 'B' = dfa['B'][X] 012345 ABABAC B, C A A A X j B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] dfa['B'][5] = 4 X' = 0 from state X, take transition 'C' from state X, take transition 'B' = dfa['B'][X] 012345 ABABAC B, C A A A X j B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 49 How to build DFA from pattern? Mismatch transition. If in state j and next char c != pat.charAt(j),
 then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DFA and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa['A'][5] = 1; from state X, take transition 'A' = dfa['A'][X] dfa['B'][5] = 4 X' = 0 from state X, take transition 'C' = dfa['C'][X] from state X, take transition 'B' = dfa['B'][X] 012345 ABABAC B, C A A A X j B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 49 Knuth-Morris-Pratt demo: DFA construction in linear time Include one state for each character in pattern (plus accept state). 012345 ABABAC A B C pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C 50 Knuth-Morris-Pratt demo: DFA construction in linear time Include one state for each character in pattern (plus accept state). 012345 ABABAC A B C dfa[][j] Constructing the DFA for KMP substring search for A B A B A C pat.charAt(j) 0123456 50 Knuth-Morris-Pratt demo: DFA construction in linear time Match transition. For each state j, dfa[pat.charAt(j)][j] = j+1. dfa[][j] Constructing the DFA for KMP substring search for A B A B A C first j characters of pattern have already been matched pat.charAt(j) now first j+1 characters of
 pattern have been matched 012345 ABABAC A B C 0123456 51 Knuth-Morris-Pratt demo: DFA construction in linear time Match transition. For each state j, dfa[pat.charAt(j)][j] = j+1. first j characters of pattern now first j+1 characters of
 have already been matched pattern have been matched pat.charAt(j) 012345 ABABAC dfa[][j] Constructing the DFA for KMP substring search for A B A B A C A135 B24 C6 0A1B2A3B4A5C6 51 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. set dfa[c][0] = 0. For state 0 and char c != pat.charAt(j),
 pat.charAt(j) dfa[][j] 012345 ABABAC A135 B24 C6 Constructing the DFA for KMP substring search for A B A B A C j 0A1B2A3B4A5C6 52 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. set dfa[c][0] = 0. For state 0 and char c != pat.charAt(j),
 012345 pat.charAt(j) dfa[][j] ABABAC A135 B02 4 C06 Constructing the DFA for KMP substring search for A B A B A C j B, C 0A1B2A3B4A5C6 52 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C j X = simulation of empty string 012345 ABABAC A1 3 5 B02 4 C0 6 B, C 0A1B2A3B4A5C6 X 53 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. X = simulation of empty string 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C j ABABAC A113 5 B02 4 C00 6 B, C A 0A1B2A3B4A5C6 C X 53 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. X = simulation of empty string 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C j ABABAC A113 5 B02 4 C00 6 B, C A 0A1B2A3B4A5C6 C X 53 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. X = simulation of B 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C ABABAC A113 5 B02 4 C 0 0 6 B, C A j 0A1B2A3B4A5C6 C X 54 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. X = simulation of B 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C ABABAC A113 5 B0204 C000 6 B, C A j 0A1B2A3B4A5C6 C X B, C 54 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. X = simulation of B 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C ABABAC A113 5 B0204 C000 6 B, C A j 0A1B2A3B4A5C6 C X B, C 54 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. X = simulation of B A 012345 pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C ABABAC A113 5 B 0 2 0 4 C 0 0 0 6 B, C A j 0A1B2A3B4A5C6 C B, C X 55 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A j X = simulation of B A 012345 ABABAC A11315 B 0 2 0 4 C0000 6 0A1B2A3B4A5C6 C B, C X C 55 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A j X = simulation of B A 012345 ABABAC A11315 B 0 2 0 4 C0000 6 0A1B2A3B4A5C6 C B, C X C 55 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. X = simulation of B A B 012345 dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A ABABAC A11315 B 0 2 0 4 C 0 0 0 0 6 pat.charAt(j) j 0A1B2A3B4A5C6 C B, C X C 56 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) X = simulation of B A B 012345 dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A ABABAC A11315 B02040 C000006 j 0A1B2A3B4A5C6 C B, C X C B, C 56 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) X = simulation of B A B 012345 dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A ABABAC A11315 B02040 C000006 j 0A1B2A3B4A5C6 C B, C X C B, C 56 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C B, C A A X = simulation of B A B A 012345 ABABAC A 1 1 3 1 5 B 0 2 0 4 0 C000006 j 0A1B2A3B4A5C6 C B, C X C B, C 57 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C A B, C A A 0A1B2A3B4A5C6 X = simulation of B A B A 012345 ABABAC A113151 B020404 C000006 B j C B, C X C B, C 57 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C A B, C A A 0A1B2A3B4A5C6 X = simulation of B A B A 012345 ABABAC A113151 B020404 C000006 B j C B, C X C B, C 57 Knuth-Morris-Pratt demo: DFA construction in linear time Mismatch transition. For each state j and char c != pat.charAt(j), set dfa[c][j] = dfa[c][X]; then update X = dfa[pat.charAt(j)][X]. pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C A B, C A A 0A1B2A3B4A5C6 C B, C C X = simulation of B A B A C 012345 ABABAC A113151 B020404 C000006 B j X B, C 58 Knuth-Morris-Pratt demo: DFA construction in linear time pat.charAt(j) dfa[][j] Constructing the DFA for KMP substring search for A B A B A C A B, C A A 0A1B2A3B4A5C6 C B, C C 012345 ABABAC A113151 B020404 C000006 B B, C 59 Constructing the DFA for KMP substring search: Java implementation For each state j: ・Copy dfa[][X] to dfa[][j] for mismatch case. ・Set dfa[pat.charAt(j)][j] to j+1 for match case. ・Update X. public KMP(String pat) { this.pat = pat; M = pat.length(); dfa = new int[R][M]; dfa[pat.charAt(0)][0] = 1; for (int X = 0, j = 1; j < M; j++) { } } for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][X]; dfa[pat.charAt(j)][j] = j+1; X = dfa[pat.charAt(j)][X]; copy mismatch cases set match case update restart state 60 Constructing the DFA for KMP substring search: Java implementation F・or each state j: ・Copy dfa[][X] to dfa[][j] for mismatch case. ・Set dfa[pat.charAt(j)][j] to j+1 for match case. Update X. 
 
 
 
 
 
 
 
 
 
 
 copy mismatch cases set match case update restart state public KMP(String pat) { this.pat = pat; M = pat.length(); dfa = new int[R][M]; dfa[pat.charAt(0)][0] = 1; for (int X = 0, j = 1; j < M; j++) { } } for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][X]; dfa[pat.charAt(j)][j] = j+1; X = dfa[pat.charAt(j)][X]; Running time. M character accesses (but space/time proportional to R M). 60 KMP substring search analysis Proposition. KMP substring search accesses no more than M + N chars
 to search for a pattern of length M in a text of length N. 61 KMP substring search analysis Proposition. KMP substring search accesses no more than M + N chars
 to search for a pattern of length M in a text of length N. 
 Pf. Each pattern char accessed once when constructing the DFA;
 each text char accessed once (in the worst case) when simulating the DFA. 61 KMP substring search analysis Proposition. KMP substring search accesses no more than M + N chars
 to search for a pattern of length M in a text of length N. 
 Pf. Each pattern char accessed once when constructing the DFA;
 each text char accessed once (in the worst case) when simulating the DFA. 
 
 Proposition. KMP constructs dfa[][] in time and space proportional to R M. 61 KMP substring search analysis Proposition. KMP substring search accesses no more than M + N chars
 to search for a pattern of length M in a text of length N. 
 Pf. Each pattern char accessed once when constructing the DFA;
 each text char accessed once (in the worst case) when simulating the DFA. 
 
 internal representation 012345 next[j] 0 0 0 0 0 3 j pat.charAt(j) A B A B A C Proposition. KMP constructs dfa[][] in time and space proportional to R M. 
 Larger alphabets. Improved version of KMP constructs nfa[] in time and space proportional to M. graphical representation mismatch transition (back up at least one state) 0A1B2A3B4A5C6 KMP NFA for ABABAC NFA corresponding to the string A B A B A C 61 Knuth-Morris-Pratt: brief history ・Independently discovered by two theoreticians and a hacker. 62 Knuth-Morris-Pratt: brief history ・Independently discovered by two theoreticians and a hacker. – Knuth: inspired by esoteric theorem, discovered linear algorithm Don Knuth 62 Knuth-Morris-Pratt: brief history ・Independently discovered by two theoreticians and a hacker. – Knuth: inspired by esoteric theorem, discovered linear algorithm – Pratt: made running time independent of alphabet size Don Knuth Vaughan Pratt 62 Knuth-Morris-Pratt: brief history ・Independently discovered by two theoreticians and a hacker. – Knuth: inspired by esoteric theorem, discovered linear algorithm – Pratt: made running time independent of alphabet size – Morris: built a text editor for the CDC 6400 computer Don Knuth Jim Morris Vaughan Pratt 62 Knuth-Morris-Pratt: brief history ・Independently discovered by two theoreticians and a hacker. – Knuth: inspired by esoteric theorem, discovered linear algorithm – Pratt: made running time independent of alphabet size ・– Morris: built a text editor for the CDC 6400 computer Theory meets practice. SIAM J. COMPUT. Vol. 6, No. 2, June 1977 FAST PATTERN MATCHING IN STRINGS* DONALD E. KNUTHf, JAMES H. MORRIS, JR.:l: AND VAUGHAN R. PRATT Abstract. An algorithm is presented which finds all occurrences of one. given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems. A theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language {can}*, can be recognized in linear time. Other algorithms which run even faster on the average are also considered. Key words, pattern, string, text-editing, pattern-matching, trie memory, searching, period of a string, palindrome, optimum algorithm, Fibonacci string, regular expression Text-editing programs are often required to search through a string of characters looking for instances of a given "pattern" string; we wish to find all positions, or perhaps only the leftmost position, in which the pattern occurs as a contiguoussubstringofthetext.Forexample,ca enarycontainsthepattern en,butwedonotregardcanaryasasubstring. Don Knuth Jim Morris Vaughan Pratt The obvious way to search for a matching pattern is to try searching at every starting position of the text, abandoning the search as soon as an incorrect 62