CS 61B Spring 2021
Disjoint Sets and Asymptotics
Discussion 6: February 22, 2021

1 Disjoint Sets, a.k.a. Union Find
In lecture, we discussed the Disjoint Sets ADT. Some authors call this the Union Find ADT. Today, we will use union find terminology so that you have seen both.
(a) What are the last two improvements (out of four) that we made to our naive implementation of the Union Find ADT during lecture 14 (Monday’s lecture)?
1. Improvement 1:
2. Improvement 2:
The naive implementation was maintaining a record of every single connection. Improvements made were:
– Keeping track of sets rather than connections (QuickFind)
– Tracking set membership by recording each item's parent rather than its set number (QuickUnion)
– Union by Size (WeightedQuickUnion)
– Path Compression (WeightedQuickUnionWithPathCompression)
We will focus our attention on the last two: union by size and path compression.
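To make union by size concrete before the exercises, here is a minimal sketch of a weighted quick union. The class and field names are illustrative choices for this handout rather than the official lecture code; roots store the negative of their tree's size, and ties are broken by making the smaller root the overall root, matching the convention used in part (b) below.

public class WeightedQuickUnion {
    // parent[i] is the parent of item i; a root r instead stores -size(r).
    private int[] parent;

    public WeightedQuickUnion(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i += 1) {
            parent[i] = -1;   // every item starts as the root of a size-1 tree
        }
    }

    // Returns the root (set representative) of x by walking up the tree.
    public int find(int x) {
        while (parent[x] >= 0) {
            x = parent[x];
        }
        return x;
    }

    public boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    // Union by size: the root of the smaller tree becomes a child of the
    // root of the larger tree, which keeps tree heights in O(log N).
    public void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return;
        }
        int sizeP = -parent[rootP];
        int sizeQ = -parent[rootQ];
        if (sizeP < sizeQ || (sizeP == sizeQ && rootQ < rootP)) {
            parent[rootP] = rootQ;              // rootQ's tree is larger, or wins the tie
            parent[rootQ] = -(sizeP + sizeQ);
        } else {
            parent[rootQ] = rootP;
            parent[rootP] = -(sizeP + sizeQ);
        }
    }
}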
(b) Assume we have nine items, represented by integers 0 through 8. All items are initially unconnected to each other. Draw the union find tree and its array representation after the series of connect() and find() operations below, and write down the result of each find() operation, using WeightedQuickUnion without path compression. Break ties by choosing the smaller integer to be the root.
Note: find(x) returns the root of the tree for item x.
connect(2, 3);
connect(1, 2);
connect(5, 7);
connect(8, 4);
connect(7, 2);
find(3);
connect(0, 6);
connect(6, 4);
connect(6, 3);
find(8);
find(6);
find() returns 2, 2, 2 respectively.
The array is [2, 2, -9, 2, 0, 2, 0, 5, 4].
The resulting tree has 2 as the root with children 0, 1, 3, and 5; 0 has children 4 and 6; 4 has child 8; and 5 has child 7.
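As a sanity check, the sequence can be replayed against the hypothetical WeightedQuickUnion sketch from part (a); the class and method names below are those of that sketch, not an official API.

public class TraceDemo {
    public static void main(String[] args) {
        WeightedQuickUnion uf = new WeightedQuickUnion(9);
        uf.connect(2, 3);
        uf.connect(1, 2);
        uf.connect(5, 7);
        uf.connect(8, 4);
        uf.connect(7, 2);
        System.out.println(uf.find(3));   // 2
        uf.connect(0, 6);
        uf.connect(6, 4);
        uf.connect(6, 3);
        System.out.println(uf.find(8));   // 2
        System.out.println(uf.find(6));   // 2
    }
}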
(c) Extra: Repeat the above part, using WeightedQuickUnion with Path Compression. find() returns 2, 2, 2 respectively.
The array is [2, 2, -9, 2, 2, 2, 2, 5, 2]. The resulting tree has 2 as the root with children 0, 1, 3, 4, 5, 6, and 8; 5 has child 7.
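Path compression changes only find: after the root is located, every node on the traversed path is pointed directly at the root. Below is a minimal sketch of that change applied to the hypothetical class from part (a); it is a fully compressing, two-pass version, and the lecture implementation may differ in minor details.

// Replaces find in the WeightedQuickUnion sketch above.
public int find(int x) {
    int root = x;
    while (parent[root] >= 0) {     // first pass: locate the root
        root = parent[root];
    }
    while (parent[x] >= 0) {        // second pass: point every node on the path at the root
        int next = parent[x];
        parent[x] = root;
        x = next;
    }
    return root;
}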
(d) What is the runtime for "connect" and "isConnected" operations using our Quick Find, Quick Union, and Weighted Quick Union ADTs? Can you explain why the Weighted Quick Union has better runtimes for these operations than the regular Quick Union?
Runtime comparisons

Implementation          connect      isConnected
Quick Find              O(N)         O(1)
Quick Union             O(N)         O(N)
Weighted Quick Union    O(log N)     O(log N)
The Weighted Quick Union has better runtimes because, by picking the smaller tree to be the child, we achieve shorter overall heights in the underlying tree (at most logarithmic in the number of items). This means that for any item, traversing up the tree to find its root, or its set representative, is limited to this shortened tree height. For both the standard Quick Union and the Weighted Quick Union, the time it takes to connect two items depends on this height, since connecting requires finding the roots of both items and then making one root the child of the other (if they are not already connected). Similarly, the time it takes to check whether two items are connected comes down to finding the roots of both items.
Not included in this chart is WQU with path compression. While the proof of its runtime is out of scope for this class, it achieves an amortized runtime that is effectively constant for both connect and isConnected.
2 Asymptotics
(a) Order the following big-O runtimes from smallest to largest.
O(log n), O(1), O(n^n), O(n^3), O(n log n), O(n), O(n!), O(2^n), O(n^2 log n)
O(1) ⊂ O(log n) ⊂ O(n) ⊂ O(n log n) ⊂ O(n^2 log n) ⊂ O(n^3) ⊂ O(2^n) ⊂ O(n!) ⊂ O(n^n)

(b) Are the statements in the right column true or false? If false, correct the asymptotic notation (Ω(·), Θ(·), O(·)). Be sure to give the tightest bound. Ω(·) is the opposite of O(·), i.e. f(n) ∈ Ω(g(n)) ⇐⇒ g(n) ∈ O(f(n)). Hint: Make sure to simplify the runtimes first.

i)    f(n) = 20501                  g(n) = 1                    f(n) ∈ O(g(n))
ii)   f(n) = n^2 + n                g(n) = 0.000001 n^3         f(n) ∈ Ω(g(n))
iii)  f(n) = 2^(2n) + 1000          g(n) = 4^n + n^100          f(n) ∈ O(g(n))
iv)   f(n) = log(n^100)             g(n) = n log n              f(n) ∈ Θ(g(n))
v)    f(n) = n log n + 3^n + n      g(n) = n^2 + n + log n      f(n) ∈ Ω(g(n))
vi)   f(n) = n log n + n^2          g(n) = log n + n^2          f(n) ∈ Θ(g(n))
vii)  f(n) = n log n                g(n) = (log n)^2            f(n) ∈ O(g(n))
i) False. Although this bound is technically correct, it is NOT the tightest bound. Θ(·) is a better bound.
ii) False, O(·). Even though n^3 is strictly worse than n^2, n^2 is still in O(n^3) because n^2 is always as good as or better than n^3 and can never be worse.
iii) False. Again, technically correct, but it is not a tight bound: since 2^(2n) = 4^n, both f(n) and g(n) are Θ(4^n), so Θ(·) is a better bound.
iv) False, O(·). Simplifying, log(n^100) = 100 log n, which grows strictly slower than n log n.
v) True.
vi) True.
vii) False, Ω(·). n log n grows much faster than (log n)^2.
(c) Give the worst case and best case runtime in terms of M and N. Assume ping is in Θ(1) and returns an int.

for (int i = N; i > 0; i--) {
    for (int j = 0; j <= M; j++) {
        if (ping(i, j) > 64) break;
    }
}

Worst: Θ(MN), Best: Θ(N). We repeat the outer loop N times no matter what. For the inner loop, the number of iterations depends on the result of ping. In the best case, ping(i, j) > 64 holds on the very first iteration, so each pass of the outer loop runs the inner loop body once and immediately breaks, doing constant work. In the worst case, ping(i, j) is never greater than 64, so we complete the inner loop about M times for every one of the N iterations of the outer loop.
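To convince yourself of these bounds empirically, you can stub out ping and count how many times it is called. The stubs and the pingCount counter below are made up for this sketch; they are not part of the discussion.

public class PingCount {
    static long pingCount = 0;

    // Worst case stub: never exceeds 64, so the inner loop never breaks.
    static int pingWorst(int i, int j) {
        pingCount += 1;
        return 0;
    }

    // Best case stub: always exceeds 64, so the inner loop breaks immediately.
    static int pingBest(int i, int j) {
        pingCount += 1;
        return 100;
    }

    public static void main(String[] args) {
        int N = 1000;
        int M = 1000;

        pingCount = 0;
        for (int i = N; i > 0; i--) {
            for (int j = 0; j <= M; j++) {
                if (pingWorst(i, j) > 64) break;
            }
        }
        System.out.println("worst case calls: " + pingCount);  // N * (M + 1), on the order of MN

        pingCount = 0;
        for (int i = N; i > 0; i--) {
            for (int j = 0; j <= M; j++) {
                if (pingBest(i, j) > 64) break;
            }
        }
        System.out.println("best case calls: " + pingCount);   // exactly N
    }
}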
(d) Below we have a function that returns true if every int has a duplicate in the array, and false if there is any unique int in the array. Assume sort(array) is in Θ(N log N) and returns the sorted array.
public static boolean noUniques(int[] array) {
    array = sort(array);
    int N = array.length;
    for (int i = 0; i < N; i += 1) {
        boolean hasDuplicate = false;
        for (int j = 0; j < N; j += 1) {
            if (i != j && array[i] == array[j]) {
                hasDuplicate = true;
            }
        }
        if (!hasDuplicate) return false;
    }
    return true;
}
1. Give the worst case and best case runtime where N = array.length.
Its runtime is Θ(N log N + N^2) = Θ(N^2) in the worst case, where every item has a duplicate: hasDuplicate is set to true on every iteration of the outer loop, so we never return early and run the Θ(N) inner loop N times. The best case is when hasDuplicate is not set to true on the very first iteration of the outer loop, which lets us return false after going through the array just once, giving Θ(N log N + N) = Θ(N log N).
2. Try to come up with a way to implement noUniques() that runs in Θ(N log N) time. Can we get any faster?
We should rely on the fact that a sorted array means all duplicates will be adjacent. curr represents the current item we are checking, and we check the item after curr (since our array is sorted) to see if a duplicate exists. There is a possible Θ(N) solution, but that involves data structures we haven’t covered yet!
public static boolean noUniques(int[] array) {
    array = sort(array);
    int N = array.length;
    int curr = array[0];
    boolean unique = true;
    for (int i = 1; i < N; i += 1) {
        if (curr == array[i]) {
            unique = false;
        } else if (unique) {
            return false;
        } else {
            unique = true;
            curr = array[i];
        }
    }
    return !unique;
}
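For reference, the Θ(N) solution mentioned above relies on hashing, which we have not covered yet. A rough sketch using java.util.HashMap (the class name NoUniquesLinear and the assumption of amortized Θ(1) map operations are ours, not part of the discussion) might look like this:

import java.util.HashMap;
import java.util.Map;

public class NoUniquesLinear {
    // Returns true only if every int in the array appears at least twice.
    // With amortized Θ(1) HashMap operations, this is Θ(N) overall.
    public static boolean noUniques(int[] array) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : array) {
            counts.put(x, counts.getOrDefault(x, 0) + 1);
        }
        for (int count : counts.values()) {
            if (count == 1) {
                return false;
            }
        }
        return true;
    }
}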