Algorithms and Data, Summer 1, 2016 Problem Set 1 Answers
Part I
1. A is O(B), Ω(B), and Θ(B).
2. A is o(B) and O(B).
3. A is o(B) and O(B).
4. A is o(B) and O(B).
5. A is o(B) and O(B).
Part II
1. This is O(N²) because we do Insertion Sort (O(N²)), then two linear-time passes through the data. The sum of an O(N²) function and two linear functions is still O(N²).
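The original code isn't reproduced in these answers, so here is a sketch with the same shape; what the two linear passes actually compute isn't restated, so the two used here (a sortedness check and a sum) are purely illustrative stand-ins.

```java
public class SortThenScan {
    /** Shape of the analysis above: insertion sort (O(N^2)) followed by two
     *  linear-time passes. The passes here (sortedness check, sum) are
     *  illustrative stand-ins, not the original problem's. */
    static long sortThenScan(int[] a) {
        for (int i = 1; i < a.length; i++) {              // insertion sort: O(N^2)
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
        boolean sorted = true;                            // first pass: O(N)
        for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) sorted = false;
        long sum = 0;                                     // second pass: O(N)
        for (int x : a) sum += x;
        return sorted ? sum : -1;  // O(N^2) + O(N) + O(N) = O(N^2) overall
    }
}
```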
2. This is O(log N). The time to repeatedly halve the region where the battleship could be in any particular dimension is O(log N), making the time to reduce all uncertainty in both dimensions O(log N + log N) = O(log N).
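A sketch of the halving step for one dimension; the actual battleship query from the problem isn't restated here, so the oracle is left abstract as a predicate answering "is the ship's coordinate ≤ x?".

```java
import java.util.function.IntPredicate;

public class BattleshipSearch {
    /** Binary search for the ship's coordinate in [0, n), given an oracle
     *  that answers "is the ship's coordinate <= x?". O(log n) queries,
     *  since each query halves the remaining interval. */
    static int locate(int n, IntPredicate atMost) {
        int lo = 0, hi = n - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (atMost.test(mid)) hi = mid;   // ship is in [lo, mid]
            else lo = mid + 1;                // ship is in [mid+1, hi]
        }
        return lo;
    }

    public static void main(String[] args) {
        int shipX = 37, shipY = 90;
        // Locate each dimension independently: O(log N) + O(log N) = O(log N).
        int x = locate(128, c -> shipX <= c);
        int y = locate(128, c -> shipY <= c);
        System.out.println(x + "," + y); // prints "37,90"
    }
}
```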
3. Just by eyeballing the for loops, we have four nested for loops that could each run N times in the worst case, so we could say this is O(N⁴) and not be wrong. It’s a “reasonable upper bound” and therefore gets full credit.
For a little more rigorous analysis, we could count the comparison operations in each iteration of the outermost loop. In the first iteration of the algorithm, the window length is the full length of each string, there are no possible shifts, and the total comparisons are at most all N characters if all characters match. In the next iteration, we have (N−1) comparisons done for each placement, and two possible places in both string A and B for each substring (index 0 or index 1). In the next iteration, the string being compared has length (N−2), and there are 3×3 possible places to put the two substrings in A and B. The total work done is:
N + 4(N−1) + 9(N−2) + … + N²(1)
= Σ i²(N−i+1) = N·Σ i² − Σ i³ + Σ i²  (sums from i = 1 to N)
= N²(N+1)(2N+1)/6 − (N(N+1)/2)² + N(N+1)(2N+1)/6
Gathering just the highest-order terms, we have N⁴/3 − N⁴/4 plus some lower-order terms. Still O(N⁴). So, while such detailed analysis sometimes gives a better bound, it doesn’t here.
But there’s one more kind of analysis we could try, and that’s thinking more deeply about the worst case, rather than treating all the bad things that could happen as if they could all happen simultaneously. The worst case must be that there are no matches of characters at all, for any window. Suppose not; then the best match must be for some optimal window length L*. This must occur at the last possible window positions within A and B for length L*, because why not wait until the last possible minute for a match; but then we could achieve at least as many operations by making the first character of the window not actually match, losing L*−1 operations in this round but gaining at least that many operations in the next round (we’re deferring the final match comparisons, plus examining more windows). This logic inductively shows that it’s always better to not match in the current round if we’re trying to maximize operations, and therefore, the worst case is to not match at all. And in this case, the innermost loop never does more than 1 comparison, not N. This results in a tighter bound of O(N³) by just ignoring the innermost loop, which a more precise sum doesn’t change:
1 + 4 + 9 + … + N² = Σ i² = N(N+1)(2N+1)/6 = O(N³).
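Since the code under analysis isn’t reproduced in these answers, here is a hypothetical brute-force common-substring method with the same four-loop shape the analysis describes (the outer loop over window lengths, two placement loops, and an innermost character-comparison loop):

```java
public class CommonSubstring {
    /** Brute force: for each window length (longest first), try every
     *  placement in a and b; the innermost loop compares characters.
     *  Four nested loops gives O(N^4) by eyeball; the worst-case argument
     *  above tightens this to O(N^3), since in the worst case (no matches)
     *  the innermost loop exits after a single comparison. */
    static int longestCommon(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int len = n; len >= 1; len--) {                  // window length
            for (int i = 0; i + len <= a.length(); i++) {     // placement in a
                for (int j = 0; j + len <= b.length(); j++) { // placement in b
                    int k = 0;                                // compare characters
                    while (k < len && a.charAt(i + k) == b.charAt(j + k)) k++;
                    if (k == len) return len;                 // found a full match
                }
            }
        }
        return 0;
    }
}
```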
4. There are 2^N possible subsets, and we sum at most N numbers with each iteration, so this is O(N·2^N).
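A sketch of the enumeration being counted, assuming a subset-sum-style question (the exact problem statement isn’t restated in these answers): each bitmask names one of the 2^N subsets, and summing a subset costs up to N additions.

```java
public class SubsetSum {
    /** Enumerate all 2^n subsets by bitmask; summing each costs up to n
     *  additions, so the total work is O(n * 2^n). Returns true if some
     *  subset of a sums to target (the empty subset counts, with sum 0). */
    static boolean anySubsetSums(int[] a, int target) {
        int n = a.length;
        for (int mask = 0; mask < (1 << n); mask++) {  // 2^n subsets
            int sum = 0;
            for (int i = 0; i < n; i++)                // at most n numbers each
                if ((mask & (1 << i)) != 0) sum += a[i];
            if (sum == target) return true;
        }
        return false;
    }
}
```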
5. The innermost loop to check which of two people is preferred by b’ is O(N), since it iterates down the preference list of b’. Doing this for every person whom a particular person a prefers to their current match is O(N²), since a might prefer everybody else in group B to their current match. Doing this check for every person a is O(N³). Finally, doing this work for every possible matching, and thus every permutation of the members of group B, takes O(N³·N!) operations.
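The O(N³) stability check described above could look like this; the preference-list representation and all names here are my own, not the course’s:

```java
public class StableCheck {
    /** O(N^3) stability check: for each a, for each b' that a prefers to a's
     *  current match, scan b''s preference list (O(N)) to see whether b' also
     *  prefers a -- if so, (a, b') is a blocking pair and the matching is
     *  unstable. prefA[a] and prefB[b] list partners from most to least
     *  preferred; matchOfA[a] = b means a is matched with b. */
    static boolean isStable(int[][] prefA, int[][] prefB, int[] matchOfA) {
        int n = prefA.length;
        int[] matchOfB = new int[n];
        for (int a = 0; a < n; a++) matchOfB[matchOfA[a]] = a;
        for (int a = 0; a < n; a++) {
            // Walk a's list down to (but not including) a's current match.
            for (int r = 0; r < n && prefA[a][r] != matchOfA[a]; r++) {
                int bPrime = prefA[a][r];           // a prefers bPrime to a's match
                for (int s = 0; s < n; s++) {       // innermost O(N) scan
                    if (prefB[bPrime][s] == a) return false;         // blocking pair
                    if (prefB[bPrime][s] == matchOfB[bPrime]) break;  // bPrime content
                }
            }
        }
        return true;
    }
}
```

The outermost factor in the answer, the N! matchings, would come from calling a check like this once per permutation of group B.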
Part III
This is also known as the “Dutch National Flag” problem, so named because the problem was originally to sort the colors of the Dutch national flag instead of numbers. Now that you know that, you can find the algorithm on Wikipedia.
https://en.wikipedia.org/wiki/Dutch_national_flag_problem
The loop invariants are that the numbers before i in the array are all 1’s, the numbers from i to j-1 are all 2’s, and the numbers from n+1 to the end of the array are all 3’s. These invariants start out true because there are no numbers in these ranges to begin with. Then, assuming they hold at the start of an iteration, one of three things happens:
A[j] is a 1: Swap with the earliest 2 and advance i and j. The loop invariants still hold because the new 1 is just before i and we got a 2 just before j.
A[j] is a 2: Advance j to move it past this 2, and there are still all 2’s from i to j-1.
A[j] is a 3: Swap with whatever is at n and decrease n by 1. Everything past n is still a 3. (Note that j does not advance in this case: the element swapped into position j came from the unexamined region and still needs to be inspected.)
So if our loop invariants start true, we’ve just argued that at every step, they remain true for the regions we affected (and they obviously remain true for the regions we didn’t touch).
At each iteration of the algorithm, the distance between j and n decreases by 1, either because we advanced j or decreased n, and the algorithm terminates when they meet. Since they start N apart, this is an O(N) algorithm.
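The three cases above, as code; the pointer names i, j, n follow the invariant statement (the course’s actual solution isn’t shown here, so this is a sketch):

```java
import java.util.Arrays;

public class DutchFlag {
    /** One-pass O(N) sort of an array of 1s, 2s, and 3s, maintaining the
     *  Part III invariants: A[0..i-1] are 1s, A[i..j-1] are 2s,
     *  A[n+1..end] are 3s, and A[j..n] is unexamined. */
    static void sort123(int[] A) {
        int i = 0, j = 0, n = A.length - 1;
        while (j <= n) {
            if (A[j] == 1) swap(A, i++, j++);  // 1: swap with earliest 2, advance both
            else if (A[j] == 2) j++;           // 2: already in place, advance j
            else swap(A, j, n--);              // 3: swap to back; j stays put, since
        }                                      //    the swapped-in value is unexamined
    }
    static void swap(int[] A, int x, int y) { int t = A[x]; A[x] = A[y]; A[y] = t; }

    public static void main(String[] args) {
        int[] A = {3, 1, 2, 3, 1, 2, 1};
        sort123(A);
        System.out.println(Arrays.toString(A)); // prints "[1, 1, 1, 2, 2, 3, 3]"
    }
}
```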
Part IV
Here are some times I got for these problems on the MacBook Air I use for lecture. I implemented Fibonacci with BigIntegers, but longs would have sufficed for the N I gave you. Note that random noise from other things going on in the computer can make measurements fluctuate a little, since our timing uses the system clock instead of just counting time spent on the thread. The main purpose of the exercise was not to arrive at specific times, but to show how asymptotic running times really do affect performance in the real world.
N       LOL           My2DArray
10      78ms          43ms
100     149ms         45ms
1000    3.126s        72ms
5000    1m, 11.138s   169ms
N     LinearFib   ExpFib
40    1ms         3.438s
42    1ms         8.687s
44    2ms         22.357s
46    1ms         58.082s
48    2ms         2m, 39.147s
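The Fibonacci implementations aren’t included in these answers; here is a plausible pair matching the column names above, using long (which, as noted, suffices for these N — my version used BigInteger):

```java
public class Fib {
    /** O(N) iterative Fibonacci (the "LinearFib" column). */
    static long linearFib(int n) {
        long a = 0, b = 1;                 // F(0), F(1)
        for (int i = 0; i < n; i++) {
            long t = a + b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Exponential-time naive recursive Fibonacci (the "ExpFib" column).
     *  Recomputes the same subproblems over and over, which is why the
     *  times in the table roughly double every two steps of N. */
    static long expFib(int n) {
        return n < 2 ? n : expFib(n - 1) + expFib(n - 2);
    }
}
```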
N         InsertionSort   QuickSort
1000      6ms             2ms
10000     44ms            11ms
100000    5.294s          82ms
1000000   36m, 8.128s     697ms
Interestingly, it does look like there is a tiny bit of overhead that increases as we increase the size of our 2D array, despite the array accesses being constant time in theory. But that time doesn’t change nearly as much as the time to access the list of lists a million times.
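A sketch of the two access patterns being compared in the first table (the course’s LOL and My2DArray classes aren’t shown, so these methods are illustrative): int[][] access is plain constant-time indexing, while ArrayList-of-ArrayList access adds indirection, bounds checks, and Integer unboxing, which is where the list-of-lists overhead comes from.

```java
import java.util.ArrayList;

public class GridAccess {
    /** Sum via a primitive 2D array: direct constant-time indexing. */
    static int sumArray(int[][] g) {
        int s = 0;
        for (int[] row : g) for (int v : row) s += v;
        return s;
    }

    /** Sum via a list of lists: same asymptotic cost per access, but each
     *  element read goes through an extra object reference and unboxing. */
    static int sumLists(ArrayList<ArrayList<Integer>> g) {
        int s = 0;
        for (ArrayList<Integer> row : g) for (int v : row) s += v;
        return s;
    }

    /** Helper to build the list-of-lists layout from an array for comparison. */
    static ArrayList<ArrayList<Integer>> toLists(int[][] g) {
        ArrayList<ArrayList<Integer>> out = new ArrayList<>();
        for (int[] row : g) {
            ArrayList<Integer> r = new ArrayList<>();
            for (int v : row) r.add(v);
            out.add(r);
        }
        return out;
    }
}
```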