CS计算机代考程序代写 algorithm COMP20007 Design of Algorithms

COMP20007 Design of Algorithms
Input Enhancement Part 2: String Searching
Daniel Beck
Lecture 18
Semester 1, 2020
1

String Search – Recap
2

String Search – Recap
• Goal: given text of size n, find a string (pattern) of size m.
2

String Search – Recap
• Goal: given text of size n, find a string (pattern) of size m.
• Brute force algorithm: O(m × n).
2

String Search – Brute Force
JIM SAW ME IN A BARBERSHOP BARBER
3

Input Enhancement
• Longer shifts can be made if we know statistics about the pattern.
4

Input Enhancement
• Longer shifts can be made if we know statistics about the pattern.
• Same way we could sort arrays more efficiently by knowing statistics about the array (Counting Sort).
4

Input Enhancement
• Longer shifts can be made if we know statistics about the pattern.
• Same way we could sort arrays more efficiently by knowing statistics about the array (Counting Sort).
• The Horspool’s algorithm use this idea to make string search faster.
4

Input Enhancement
• Longer shifts can be made if we know statistics about the pattern.
• Same way we could sort arrays more efficiently by knowing statistics about the array (Counting Sort).
• The Horspool’s algorithm use this idea to make string search faster.
• Key idea: scan the text from left to right but scan the pattern from right to left.
4

Longer shifts – Case 1
The last character is not in the pattern.
JIM SAW ME IN A BARBERSHOP BARBER
Shift the whole pattern.
5

Longer shifts – Case 2
The last character does not match but it is in the pattern.
JIM SAW ME IN A BARBERSHOP BARBER
Shift the pattern until the last occurence of the character.
6

Longer shifts – Case 3
The last character matches but one of the m − 1 characters does not match and the last character is unique.
JIM SAW ME IN A BARBERSHOP SEESAW
Shift the whole pattern.
7

Longer shifts – Case 4
The last character matches but one of the m − 1 characters does not match and the last character is not uniques
JIM SAW ME IN A BARBERSHOP REORDER
Shift the pattern until the last occurence of the character.
8

Horspool – Preprocessing
• The number of allowed shifts depends on the character type only.
9

Horspool – Preprocessing
• The number of allowed shifts depends on the character type only.
• Horspool builds a dictionary with all characters in the alphabet and their corresponding allowed skips for a pattern.
9

Horspool – Preprocessing
• The number of allowed shifts depends on the character type only.
• Horspool builds a dictionary with all characters in the alphabet and their corresponding allowed skips for a pattern.
characterc A B C D E F … R … Z shiftt(c) 42661663666
9

Horspool – FindShifts
function FindShifts(P[0..m − 1]) for i ← 0 to a do
Shift[i] ← m
for j ← 0 to m − 2 do Shift[P[j]] ← m − (j + 1)
10

Horspool – Algorithm
function Horspool(P [0..m − 1], T [0..n − 1]) FindShifts(P )
i←m−1
while i < n do k←0 while k < m and P[m − 1 − k] = T[i − k] do k←k+1 if k = m then return i −m+1 else i ← i + Shift [T [i ]] return −1 � We have a match � Start of the match � Slide the pattern along 11 Horspool - Properties • Worst-case still O(m × n). • For random strings, it’s linear and faster in practice compared to the brute force version. 12 Other String Search algorithms • Boyer-Moore: extends Horspool to allow shifts based on suffixes. 13 Other String Search algorithms • Boyer-Moore: extends Horspool to allow shifts based on suffixes. • Knuth-Morris-Pratt: also preprocess the patten but builds a finite-state automaton. 13 Other String Search algorithms • Boyer-Moore: extends Horspool to allow shifts based on suffixes. • Knuth-Morris-Pratt: also preprocess the patten but builds a finite-state automaton. • Rabin-Karp: uses hash functions to filter negative matches. • Needs a rolling hash function to be efficient. 13 Summary and Practical Uses • String search algorithms can be sped up by input enhancement. 14 Summary and Practical Uses • String search algorithms can be sped up by input enhancement. • Allocates extra memory to preprocess the pattern. 14 Summary and Practical Uses • String search algorithms can be sped up by input enhancement. • Allocates extra memory to preprocess the pattern. • Horspool uses a dictionary of shifts. 14 Summary and Practical Uses • String search algorithms can be sped up by input enhancement. • Allocates extra memory to preprocess the pattern. • Horspool uses a dictionary of shifts. • The Boyer-Moore extension is one of the most used string searching algorithms, for instance in the “grep” Linx command line tool. 14 Summary and Practical Uses • String search algorithms can be sped up by input enhancement. • Allocates extra memory to preprocess the pattern. • Horspool uses a dictionary of shifts. • The Boyer-Moore extension is one of the most used string searching algorithms, for instance in the “grep” Linx command line tool. Next week: trade memory for speed by storing intermediate solutions. 14