COMP90038 Algorithms and Complexity
Time/Space Tradeoffs—String Search Revisited
Olya Ohrimenko
(Slides from Harald Søndergaard)
Lecture 16
Semester 2, 2021
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 1 / 27
Weekly update
Olya’s consultation session shortly after this lecture Next week: non-teaching week
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 2 / 27
Last Lecture: Revision
Balanced trees (AVL, 2-3)
For AVL trees, balance factor at a node: height of the left subtree minus height of the right subtree. Empty subtree has height -1.
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 3 / 27
Spending Space to Save Time
Often we can find ways of decreasing the time required to solve a problem, by using additional memory in a clever way.
For example, in Lecture 6 we considered the simple recursive way of finding the nth Fibonacci number and discovered that the algorithm uses exponential time.
However, suppose the same algorithm uses a table to tabulate the function Fib as we go: As soon as an intermediate result Fib(i) has been found, it is not simply returned to the caller; the value is first placed in slot i of a table (an array). Each call to Fib first looks in this table to see if the required value is there, and only if it is not, the usual recursive process kicks in.
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 4 / 27
Fibonacci Numbers with Tabulation
We assume that, from the outset, all entries of the table F are 0.
function Fib(n)
if n = 0 or n = 1 then
return 1 result ← F[n]
if result = 0 then
result ← Fib(n − 1) + Fib(n − 2) F [n] ← result
return result
(I show this code just so that you can see the principle; in Lecture 6 we already discovered a different linear-time algorithm, so here we don’t really need tabulation.)
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 5 / 27
Sorting by Counting
Suppose we need to sort large arrays, but we know that they will hold keys taken from a small, fixed set (so lots of duplicate keys).
For example, suppose all keys are single digits: 633810879253531876512153
Then we can, in a single linear scan, count the occurrences of each key in array A and store the result in a small table:
key 0123456789 Occ 1 4 2 5 0 4 2 2 3 1
Now use a second linear scan to make the counts cumulative: key 0 1 2 3 4 5 6 7 8 9
Occ 1 5 712121618202324
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 6 / 27
Sorting by Counting
We can now create a sorted array S[1]..S[n] of the items by simply slotting items into pre-determined slots in S (a third linear scan).
633810879253531876512153 key 0 1 2 3 4 5 6 7 8 9
Occ 1 5 712121618202324
Place the last record (with key 3) in S[12] and decrement Occ[3]
(so that the next ‘3’ will go into slot 11), and so on.
for i ← n to 1 do S[Occ[A[i]]] ← A[i] Occ[A[i]] ← Occ[A[i]] − 1
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 7 / 27
Sorting by Counting (Example)
key 0 1 2 3 4 5 6 7 8 9 Occ 1 5 712121618202324
for i ← n to 1 do S[Occ[A[i]]] ← A[i] Occ[A[i]] ← Occ[A[i]] − 1
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 8 / 27
Sorting by Counting
Note that this gives us a linear-time sorting algorithm (for the cost of some extra space).
However, it only works in situations where we have a small range of keys, known in advance.
The method never performs a key-to-key comparison.
The time complexity of key-comparison based sorting has been
proven to be in Ω(n log n).
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 9 / 27
String Matching Revisited
String Matching Applications: text editors, natural language processors, virus scanning, websearch engines.
In Lecture 5 we saw a brute-force approach to string search. “Strings” are usually built from a small, pre-determined alphabet.
Most of the better algorithms rely on some pre-processing of strings before the actual matching process starts.
The pre-processing involves the construction of a small table (of predictable size).
Levitin refers to this as “input enhancement”.
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 10 / 27
Horspool’s String Search Algorithm
Comparing from right to left in the pattern. Very good for random text strings.
STRINGSEARCHEXAMP EXAM
We can do better than just observing a mismatch here.
Because the pattern has no occurrence of I, we might as well slide it 4 positions along.
This decision is based only on knowing the pattern.
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 11 / 27
Horspool’s String Search Algorithm
STRINGSEARCHEXAMP EXAM
EXAM
Here we can slide the pattern 3 positions, because the last occurrence
of E in the pattern is its first position. STRINGSEARCHEXAMP
EXAM
EXAM EXAM
EXAM EXAM
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 12 / 27
Horspool’s String Search Algorithm
What happens when we have longer partial matches?
SEARCHING BIRCH
Char Shift
A5 B4 C1 . .
H5 I3 . .
R2 S5 . .
Z5
BIRCH
The shift is determined by the last character in the
pattern.
BIRCH
Algorithms and Complexity (Sem 2, 2021)
Time/Space Tradeoffs
⃝c University of Melbourne
13 / 27
Horspool’s String Search Algorithm
What happens when we have longer partial matches?
↓
SEARCHING BIRCH
Char Shift
A5 B4 C1 . .
H5 I 3 . .
R2 S5 . .
Z5
BIRCH
The shift is determined by the last character in the
pattern.
Note that this is the same as the character in the text that we first matched against. Hence the skip is always determined by that character, whether it matched or not.
BIRCH
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 14 / 27
Horspool’s String Search Algorithm
Building (calculating) the shift table is easy. We assume indices start from 0.
Let alphasize be the size of the alphabet.
function FindShifts(P[·],m) ⊲ Pattern P has length m for i ← 0 to alphasize − 1 do
Shift[i] ← m
for j ← 0 to m − 2 do
Shift[P[j]] ← m − (j + 1)
Algorithms and Complexity (Sem 2, 2021) Time/Space Tradeoffs ⃝c University of Melbourne 15 / 27
Horspool’s String Search Algorithm
function Horspool(P[·],m,T[·],n) FindShifts(P , m)
i←m−1
while i