CS计算机代考程序代写 Java data structure algorithm CS61B: 2019

CS61B: 2019
Lecture 14: Disjoint Sets
¡ñ Dynamic Connectivity and the Disjoint Sets Problem
¡ñ Quick Find
¡ñ Quick Union
¡ñ Weighted Quick Union
¡ñ Path Compression (CS170 Preview)
datastructur.es

Meta-goals of the Coming Lectures: Data Structure Refinement
Next couple of weeks: Deriving classic solutions to interesting problems, with an emphasis on how sets, maps, and priority queues are implemented.
Today: Deriving the ¡°Disjoint Sets¡± data structure for solving the ¡°Dynamic Connectivity¡± problem. We will see:
¡ñ How a data structure design can evolve from basic to sophisticated.
¡ñ How our choice of underlying abstraction can affect asymptotic runtime
(using our formal Big-Theta notation) and code complexity.
datastructur.es

The Disjoint Sets Data Structure
The Disjoint Sets data structure has two operations:
¡ñ connect(x, y): Connects x and y.
¡ñ isConnected(x, y): Returns true if x and y are connected. Connections can
be transitive, i.e. they don¡¯t need to be direct. Example:
¡ñ connect(China, Vietnam)
¡ñ connect(China, Mongolia)
¡ñ isConnected(Vietnam, Mongolia)? true
¡ñ connect(USA, Canada)
¡ñ isConnected(USA, Mongolia)? false
¡ñ connect(China, USA)
¡ñ isConnected(USA, Mongolia)? true
datastructur.es
Hyperloop

The Disjoint Sets Data Structure
The Disjoint Sets data structure has two operations:
¡ñ connect(x, y): Connects x and y.
¡ñ isConnected(x, y): Returns true if x and y are connected. Connections can
be transitive, i.e. they don¡¯t need to be direct. Useful for many purposes, e.g.:
¡ñ Percolation theory:
¡ð Computational chemistry.
¡ñ Implementation of other algorithms: ¡ð Kruskal¡¯s algorithm.
datastructur.es
Hyperloop

Disjoint Sets on Integers
To keep things simple, we¡¯re going to:
¡ñ Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
¡ñ Declare the number of items in advance, everything is disconnected at start. 3
456
0
1
2
ds = DisjointSets(7) ds.connect(0, 1) ds.connect(1, 2) ds.connect(0, 4) ds.connect(3, 5) ds.isConnected(2, 4): true ds.isConnected(3, 0): false
datastructur.es

Disjoint Sets on Integers
To keep things simple, we¡¯re going to:
¡ñ Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
¡ñ Declare the number of items in advance, everything is disconnected at start. 3
456
0
1
2
ds = DisjointSets(7) ds.connect(0, 1) ds.connect(1, 2) ds.connect(0, 4) ds.connect(3, 5) ds.isConnected(2, 4): true ds.isConnected(3, 0): false ds.connect(4, 2)
datastructur.es

Disjoint Sets on Integers
To keep things simple, we¡¯re going to:
¡ñ Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
¡ñ Declare the number of items in advance, everything is disconnected at start. 3
456
0
1
2
ds = DisjointSets(7) ds.connect(0, 1) ds.connect(1, 2) ds.connect(0, 4) ds.connect(3, 5) ds.isConnected(2, 4): true ds.isConnected(3, 0): false ds.connect(4, 2) ds.connect(4, 6)
datastructur.es

Disjoint Sets on Integers
To keep things simple, we¡¯re going to:
¡ñ ¡ñ
Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA). Declare the number of items in advance, everything is disconnected at start.
3
456
0
1
2
ds = DisjointSets(7) ds.connect(0, 1) ds.connect(1, 2) ds.connect(0, 4) ds.connect(3, 5) ds.isConnected(2, 4): true ds.isConnected(3, 0): false ds.connect(4, 2) ds.connect(4, 6)
ds.connect(3, 6)
datastructur.es

Disjoint Sets on Integers
To keep things simple, we¡¯re going to:
¡ñ ¡ñ
Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA). Declare the number of items in advance, everything is disconnected at start.
3
456
0
1
2
ds = DisjointSets(7) ds.connect(0, 1) ds.connect(1, 2) ds.connect(0, 4) ds.connect(3, 5) ds.isConnected(2, 4): true ds.isConnected(3, 0): false ds.connect(4, 2) ds.connect(4, 6)
ds.connect(3, 6) ds.isConnected(3, 0): true
datastructur.es

The Disjoint Sets Interface
public interface DisjointSets {
/** Connects two items P and Q. */ void connect(int p, int q);
/** Checks to see if two items are connected. */
boolean isConnected(int p, int q);
}
getBack()
connect(int p, int q)
deleteBack()
isConnected(int p, int q)
get(int i)
Goal: Design an efficient DisjointSets implementation.
¡ñ Number of elements N can be huge.
¡ñ Number of method calls M can be huge.
¡ñ Calls to methods may be interspersed (e.g. can¡¯t assume it¡¯s onlu connect
operations followed by only isConnected operations).
datastructur.es

The Naive Approach
Naive approach:
¡ñ Connecting two things: Record every single connecting line in some data structure.
¡ñ Checking connectedness: Do some sort of (??) iteration over the lines to see if one thing can be reached from the other.
0
1
2
4
36
5
datastructur.es

A Better Approach: Connected Components
Rather than manually writing out every single connecting line, only record the sets that each item belongs to.
{0}, {1}, {2}, {3}, {4}, {5}, {6}
{0, 1}, {2}, {3}, {4}, {5}, {6}
{0, 1, 2}, {3}, {4}, {5}, {6}
{0, 1, 2, 4}, {3}, {5}, {6}
{0, 1, 2, 4}, {3, 5}, {6}
{0, 1, 2, 4}, {3, 5}, {6}
{0, 1, 2, 4, 6}, {3, 5}
{0, 1, 2, 3, 4, 5, 6}
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5) isConnected(2, 4): true isConnected(3, 0): false connect(4, 2)
connect(4, 6) connect(3, 6) isConnected(3, 0): true
datastructur.es

A Better Approach: Connected Components
For each item, its connected component is the set of all items that are connected to that item.
¡ñ Naive approach: Record every single connecting line somehow.
¡ñ Better approach: Model connectedness in terms of sets.
¡ð ¡ð
How things are connected isn¡¯t something we need to know.
Only need to keep track of which connected component each item belongs to.
0
1
2
4
36
5
{ 0, 1, 2, 4 }, {3, 5}, {6}
Up next: We¡¯ll consider how to do track set membership in Java.
datastructur.es

Quick Find
datastructur.es

Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 3) operation: After connect(2, 3) operation:
366
454
{ 0, 1, 2, 4 }, {3, 5}, {6} { 0, 1, 2, 4, 3, 5}, {6}
0
1
2
0
1
2
3
5
Assume elements are numbered from 0 to N-1.
datastructur.es

Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 3) operation: After connect(2, 3) operation:
366
454
{ 0, 1, 2, 4 }, {3, 5}, {6} { 0, 1, 2, 4, 3, 5}, {6}
Map — first number represents set and second represents item
¡ñ Slow because you have to iterate to find which set something belongs to.
0
1
2
0
1
2
3
5
Assume elements are numbered from 0 to N-1.
datastructur.es

Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 3) operation: After connect(2, 3) operation:
366
454
{ 0, 1, 2, 4 }, {3, 5}, {6} { 0, 1, 2, 4, 3, 5}, {6}
Map — first number represents the item, and the second is the set number.
¡ñ More or less what we get to shortly, but less efficient for reasons I will explain.
0
1
2
0
1
2
3
5
Assume elements are numbered from 0 to N-1.
datastructur.es

Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 3) operation: After connect(2, 3) operation:
366
454
{ 0, 1, 2, 4 }, {3, 5}, {6} { 0, 1, 2, 4, 3, 5}, {6}
Idea #1: List of sets of integers, e.g. [{0, 1, 2, 4}, {3, 5}, {6}]
¡ñ In Java: List>.
¡ñ Very intuitive idea.
0
1
2
0
1
2
3
5
datastructur.es

Challenge: Pick Data Structures to Support Tracking of Sets
If nothing is connected:
0123456
Idea #1: List of sets of integers, e.g. [{0}, {1}, {2}, {3}, {4}, {5}, {6}]
¡ñ In Java: List>.
¡ñ Very intuitive idea.
¡ñ Requires iterating through all the sets to find anything. Complicated and slow!
¡ð Worst case: If nothing is connected, then isConnected(5, 6) requires iterating through N-1 sets to find 5, then N sets to find 6. Overall runtime of ¦¨(N).
datastructur.es

Performance Summary
Implementation
constructor
connect
isConnected
ListOfSetsDS
¦¨(N)
O(N)
O(N)
Constructor¡¯s runtime has order of growth N no matter what, so ¦¨(N).
ListOfSetsDS is complicated and slow.
¡ñ Operations are linear when number of connections are small. ¡ð Have to iterate over all sets.
¡ñ Important point: By deciding to use a List of Sets, we have doomed ourselves to complexity and bad performance.
Worst case is ¦¨(N), but other cases may be better. We¡¯ll say O(N) since O means ¡°less than or equal¡±.
datastructur.es

My Approach: Just Use a Array of Integers
Before connect(2, 3) operation: After connect(2, 3) operation:
366
0
1
2
0
1
2
3
454
{ 0, 1, 2, 4 }, {3, 5}, {6} { 0, 1, 2, 4, 3, 5}, {6}
5
4
4
4
5
4
5
6
5
5
5
5
5
5
6
int[] id
int[] id
0123456 0123456
Idea #2: list of integers where ith entry gives set number (a.k.a. ¡°id¡±) of item i. ¡ñ connect(p, q): Change entries that equal id[p] to id[q]
datastructur.es

QuickFindDS
public class QuickFindDS implements DisjointSets {
private int[] id;
Very fast:
public boolean isConnected(int p, int q) { return id[p] == id[q];
}
public void connect(int p, int q) { int pid = id[p];
Relatively slow:
int qid = id[q];
for (int i = 0; i < id.length; i++) { if (id[i] == pid) { id[i] = qid; } }... public QuickFindDS(int N) { id = new int[N]; for (int i = 0; i < N; i++) id[i] = i; } } Two array accesses: ¦¨(1) N+2 to 2N+2 array accesses: ¦¨(N) datastructur.es Performance Summary Implementation constructor connect isConnected ListOfSetsDS ¦¨(N) O(N) O(N) QuickFindDS ¦¨(N) ¦¨(N) ¦¨(1) QuickFindDS is too slow for practical use: Connecting two items takes N time. ¡ñ Instead, let¡¯s try something more radical. datastructur.es Quick Union datastructur.es Improving the Connect Operation Approach zero: Represent everything as boxes and lines. Overly complicated. 36 ??? in Java instance variables. 45 ListOfSets: Represent everything as connected components. Represented connected components as list of sets of integers. { 0, 1, 2, 4 }, {3, 5}, {6} QuickFind: Represent everything as connected components. Represented connected components as a list of integers, where value = id. 0 1 2 [{0, 1, 2, 4}, {3, 5}, {6}] List>
{ 0, 1, 2, 4 }, {3, 5}, {6}
[2, 2, 2, 3, 2, 3, 6] int[]
datastructur.es

Improving the Connect Operation
QuickFind: Represent everything as connected components. Represented connected components as a list of integers where value = id.
¡ñ Bad feature: Connecting two sets is slow! { 0, 1, 2, 4 }, {3, 5}, {6}
[2, 2, 2, 3, 2, 3, 6] int[]
Next approach (QuickUnion): We will still represent everything as connected components, and we will still represent connected components as a list of integers. However, values will be chosen so that connect is fast.
datastructur.es

Improving the Connect Operation
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
0
1
2
36 36
4545
{ 0, 1, 2, 4 }, {3, 5}, {6} { 0, 1, 2, 4, 3, 5}, {6}
id
0123456 0123456
0
1
2
0
0
0
3
0
3
6
3
3
3
3
3
3
6
id
datastructur.es

Improving the Connect Operation (Your Answer)
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
¡ñ Suggestion, use pointers!
36 36
4545
{ 0, 1, 2, 4 }, {3, 5}, {6} { 0, 1, 2, 4, 3, 5}, {6}
0
1
2
0
1
2
id
id
0123456 0123456
ZERO
THREE
SIX
THREE
THREE
SIX
datastructur.es

Improving the Connect Operation
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
¡ñ Idea: Assign each item a parent (instead of an id). Results in a tree-like shape.
¡ð An innocuous sounding, seemingly arbitrary solution.
¡ð Unlocks a pretty amazing universe of math that we won¡¯t discuss.
-1
0
1
-1
0
3
-1
parent
Note: The optional textbook has an item¡¯s parent as itself instead of -1 for root items.
0123456
0
0 is the ¡°root¡± of this set.
3
6
1 425
{0, 1, 2, 4} {3, 5} {6}
datastructur.es

Improving the Connect Operation
connect(5, 2)
¡ñ How should we change the parent list to handle this connect operation?
¡ð If you¡¯re not sure where to start, consider: why can¡¯t we just set id[5] to 2?
-1
0
1
-1
0
3
-1
parent
0123456
0
0 is the ¡°root¡± of this set.
3
6
1 425
{0, 1, 2, 4} {3, 5} {6}
datastructur.es

Improving the Connect Operation (Your Answer)
connect(5, 2)
¡ñ One possibility, set id[3] = 2
¡ñ Set id[3] = 0
-1
0
1
2
0
2
-1
parent
0123456
0
0 is the ¡°root¡± of this set.
3
6
1 425
{0, 1, 2, 4} {3, 5} {6}
datastructur.es

Improving the Connect Operation
connect(5, 2)
¡ñ Find root(5). // returns 3
¡ñ Find root(2). // returns 0
¡ñ Set root(5)¡¯s value equal to root(2).
-1
0
1
-1
0
3
-1
parent
0123456
0
0 is the ¡°root¡± of this set.
3
6
1 425
{0, 1, 2, 4} {3, 5} {6}
datastructur.es

Improving the Connect Operation
connect(5, 2)
¡ñ Find root(5). // returns 3
¡ñ Find root(2). // returns 0
¡ñ Set root(5)¡¯s value equal to root(2).
-1
0
1
0
0
3
-1
parent
0123456 0
0 is the ¡°root¡± of this set.
3
6
1 425
{0, 1, 2, 4} {3, 5} {6}
datastructur.es

Set Union Using Rooted-Tree Representation
connect(5, 2)
¡ñ Make root(5) into a child of root(2).
parent
0123456
What are the potential performance issues with this approach? ¡ñ Compared to QuickFind, we have to climb up a tree.
-1
0
1
0
0
3
-1
0
0 is the ¡°root¡± of this set.
3
6
1 425
{0, 1, 2, 4, 3, 5} {6}
datastructur.es

Set Union Using Rooted-Tree Representation
connect(5, 2)
¡ñ Make root(5) into a child of root(2).
parent
0123456
What are the potential performance issues with this approach? ¡ñ Tree can get too tall! root(x) becomes expensive.
-1
0
1
0
0
3
-1
0
0 is the ¡°root¡± of this set.
3
6
1 425
{0, 1, 2, 4, 3, 5} {6}
datastructur.es

The Worst Case
If we always connect the first item¡¯s tree below the second item¡¯s tree, we can end up with a tree of height M after M operations:
¡ñ connect(4, 3)
¡ñ connect(3, 2)
¡ñ connect(2, 1)
¡ñ connect(1, 0)
For N items, what¡¯s the worst case runtime…
¡ñ For connect(p, q)?
¡ñ For isConnected(p, q)?
0
1
2
3
4
datastructur.es

The Worst Case
If we always connect the first item¡¯s tree below the second item¡¯s tree, we can end up with a tree of height M after M operations:
¡ñ connect(4, 3)
¡ñ connect(3, 2)
¡ñ connect(2, 1)
¡ñ connect(1, 0)
For N items, what¡¯s the worst case runtime…
¡ñ For connect(p, q)? ¦¨(N)
¡ñ For isConnected(p, q)? ¦¨(N)
0
1
2
3
4
datastructur.es

QuickUnionDS
public boolean isConnected(int p, int q) { return find(p) == find(q);
} {
Here the find operation is the same as the ¡°root(x)¡± idea we had in earlier slides.
int i = find(p);
int j = find(q);
parent[i] = j;
}
datastruct
public class QuickUnionDS implements DisjointSets { private int[] parent;
public QuickUnionDS(int N) {
parent = new int[N];
for (int i = 0; i < N; i++) { parent[i] = -1; } } private int find(int p) { int r = p; while (parent[r] >= 0) { r = parent[r]; }
return r; }
For N items, this means a worst case runtime of ¦¨(N).
ur.es

Performance Summary
Implementation
Constructor
connect
isConnected
ListOfSetsDS
¦¨(N)
O(N)
O(N)
QuickFindDS
¦¨(N)
¦¨(N)
¦¨(1)
QuickUnionDS
¦¨(N)
O(N)
O(N)
Using O because runtime can
be between constant and linear.
QuickFindDS defect: QuickFindDS is too slow: Connecting takes ¦¨(N) time. QuickUnion defect: Trees can get tall. Results in potentially even worse
performance than QuickFind if tree is imbalanced.
¡ñ Observation: Things would be fine if we just kept our tree balanced.
datastructur.es

Weighted Quick Union
datastructur.es

A Choice of Two Roots: http://yellkey.com/reveal Suppose we are trying to connect(2, 5). We have two choices:
A. Make 5¡¯s root into a child of 2¡¯s root.
B. Make 2¡¯s root into a child of 5¡¯s root.
Which is the better choice?
0
425
3
05
4
13
0
3
1 425
+
1
2 datastructur.es

A Choice of Two Roots
Suppose we are trying to connect(2, 5). We have two choices:
A. Make 5¡¯s root into a child of 2¡¯s root.
B. Make 2¡¯s root into a child of 5¡¯s root.
Which is the better choice?
0
13
Height: 2
0
425
3
05
4
1 425
+
3
1
Height: 3
2
datastructur.es

A Choice of Two Roots
Suppose we are trying to connect(2, 5). We have two choices:
A. Make 5¡¯s root into a child of 2¡¯s root.
B. Make 2¡¯s root into a child of 5¡¯s root.
Which is the better choice?
0
13
Height: 2
0
3
4
+
425
3
05
4
2
1
5
1
Height: 3
2
datastructur.es

Weighted QuickUnion: http://yellkey.com/society Modify quick-union to avoid tall trees.
¡ñ Track tree size (number of elements).
¡ñ New rule: Always link root of smaller tree to larger tree.
New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?
A. parent[3] B. parent[0] C. parent[8] D. parent[6]
0
12345
6
78
9
-1
0
0
0
0
0
-1
6
6
8
parent
0123456789
datastructur.es

Improvement #1: Weighted QuickUnion
Modify quick-union to avoid tall trees.
¡ñ Track tree size (number of elements).
¡ñ New rule: Always link root of smaller tree to larger tree.
New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?
A. parent[3] B. parent[0] C. parent[8] D. parent[6]
1 2
0
3 4 5
6
78
9
Note: The rule I picked is based on weight, not height. We¡¯ll talk about why soon.
-1
0
0
0
0
0
0
6
6
8
parent
0123456789
datastructur.es

Implementing WeightedQuickUnion
Minimal changes needed:
¡ñ ¡ñ ¡ñ
Use parent[] array as before.
isConnected(int p, int q) requires no changes. connect(int p, int q) needs to somehow keep track of sizes.
¡ð See the Disjoint Sets lab for the full details.
¡ð Two common approaches:
¡ö Use values other than -1 in parent array for root nodes to track size. ¡ö Create a separate size array.
parent
0123456789
size
0123456789
-6
0
0
0
0
0
-4
6
6
8
10
1
1
1
1
1
4
1
2
1
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible.
N
H
1
0
0
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible.
N
H
1
0
2
1
0
1
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible.
N
H
1
0
2
1
02
13
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible.
N
H
1
0
2
1
4
2
0
12
3
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible.
N
H
1
0
2
1
4
2
04
125
3
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible.
N
H
1
0
2
1
4
2
046
1257
3
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible.
N
H
1
0
2
1
4
2
04
1256
37
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible.
N
H
1
0
2
1
4
2
8
3
0
124
356
7
datastructur.es

Weighted Quick Union Performance
Let¡¯s consider the worst case where the tree height grows as fast as possible. ¡ñ Worst case tree height is ¦¨(log N).
N
H
1
0
2
1
4
2
8
3
16
4
0
1248
3 5 6 9 10 12
7 11 13 14
15
datastructur.es

Performance Summary
Implementation
Constructor
connect
isConnected
ListOfSetsDS
¦¨(N)
O(N)
O(N)
QuickFindDS
¦¨(N)
¦¨(N)
¦¨(1)
QuickUnionDS
¦¨(N)
O(N)
O(N)
WeightedQuickUnionDS
¦¨(N)
O(log N)
O(log N)
QuickUnion¡¯s runtimes are O(H), and WeightedQuickUnionDS height is given by H = O(log N). Therefore connect and isConnected are both O(log N).
By tweaking QuickUnionDS we¡¯ve achieved logarithmic time performance. ¡ñ Fast enough for all practical problems.
datastructur.es

Why Weights Instead of Heights?
We used the number of items in a tree to decide upon the root.
¡ñ Why not use the height of the tree?
¡ð Worst case performance for HeightedQuickUnionDS is asymptotically
the same! Both are ¦¨(log(N)).
¡ð Resulting code is more complicated with no performance gain.
0
12345
06
12345+78 7
9
6
8
9
datastructur.es

Path Compression (CS170 Spoiler)
datastructur.es

What We¡¯ve Achieved
Implementation
Constructor
connect
isConnected
ListOfSetsDS
¦¨(N)
O(N)
O(N)
WeightedQuickUnionDS
¦¨(N)
O(log N)
O(log N)
Performing M operations on a DisjointSet object with N elements:
¡ñ For our naive implementation, runtime is O(MN).
¡ñ For our best implementation, runtime is O(N + M log N).
¡ñ For N = 109 and M = 109, difference is 30 years vs. 6 seconds.
¡ð Key point: Good data structure unlocks solutions to problems that could otherwise not be solved!
¡ñ Good enough for all practical uses, but could we theoretically do better?
datastructur.es

Suppose we have a ListOfSetsDS implementation of Disjoint Sets. Suppose that it has 1000 items, i.e. N = 1000.
Suppose we perform a total of 150 connect operations and 212 isConnected operations.
¡ñ M = 150 + 212 = 362 operations
So when we say O(NM), we¡¯re saying it¡¯ll take no more than 1000 * 362 units of
time (in some arbitrary unit of time).
¡ñ This is a bit informal. O is really about asymptotics, i.e. behavior for very large N and M, not specific N and Ms that we pick.
datastructur.es

170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion.
¡ñ Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.
¡ð Additional cost is insignificant (same order of growth).
0
24
5 78910
11 12 13 14
1
3
6
15
datastructur.es

Path Compression: Theoretical Performance (Bonus)
Path compression results in a union/connected operations that are very very close to amortized constant time.
¡ñ M operations on N nodes is O(N + M lg* N).
¡ñ A tighter bound: O(N + M ¦Á(N)), where ¦Á is the inverse Ackermann function.
¡ñ The inverse ackermann function is less than 5 for all practical inputs!
¡ð See ¡°Efficiency of a Good But Not Linear Set Union Algorithm.¡±
¡ð Written by Bob Tarjan while at UC Berkeley in 1975.
0
15 11 5
2 3 10 4
1
12 789
6
N
1




¦Á(N)
0
1
2
3
4
5
13 14
datastructur.es

170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion
¡ñ Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.
¡ð Additional cost is insignificant (same order of growth). 0
1
11 15
234
5 78910
12 13 14
6
datastructur.es

Path Compression: Another Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion
¡ñ Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.
¡ð Additional cost is insignificant (same order of growth). 0
15 11 5
2 3 10 4
1
12 789
13 14
6
datastructur.es

Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
0
15 11 5
12 789
13 14
1
2 3 10 4
6
datastructur.es

Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
0
15 11 5
12 789
13 14
1
2 3 10 4
6
datastructur.es

Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
0
151151 23104
12 6 7 8 9 13 14
datastructur.es

Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
0
15 11 5 1 13
12 7 9
3 10 4
8
2
6
14
datastructur.es

170 Spoiler: Path Compression: A Clever Idea
Path compression results in a union/connected operations that are very very close to amortized constant time (amortized constant means ¡°constant on average¡±).
¡ñ M operations on N nodes is O(N + M lg* N) – you will see this in CS170.
¡ñ lg* is less than 5 for any realistic input.
N
lg* N
1
0
2
1
4
2
16
3
65536
4
265536
5
0
15 11 5
2 3 10 4
1
12 789
6
216
13 14
datastructur.es

Path Compression: Theoretical Performance (Bonus)
Path compression results in a union/connected operations that are very very close to amortized constant time.
¡ñ M operations on N nodes is O(N + M lg* N).
¡ñ A tighter bound: O(N + M ¦Á(N)), where ¦Á is the inverse Ackermann function.
¡ñ The inverse ackermann function is less than 5 for all practical inputs!
¡ð See ¡°Efficiency of a Good But Not Linear Set Union Algorithm.¡±
¡ð Written by Bob Tarjan while at UC Berkeley in 1975.
0
15 11 5
2 3 10 4
1
12 789
6
N
1




¦Á(N)
0
1
2
3
4
5
13 14
datastructur.es

A Summary of Our Iterative Design Process
And we¡¯re done! The end result of our iterative design process is the standard way disjoint sets are implemented today – quick union and path compression.
The ideas that made our implementation efficient:
¡ñ Represent sets as connected components (don¡¯t track individual connections).
¡ð ListOfSetsDS: Store connected components as a List of Sets (slow,
complicated).
¡ð QuickFindDS: Store connected components as set ids.
¡ð QuickUnionDS: Store connected components as parent ids.
¡ö WeightedQuickUnionDS: Also track the size of each set, and use size to decide on new tree root.
¡ñ WeightedQuickUnionWithPathCompressionDS: On calls to connect and isConnected, set parent id to the root for all items seen.
datastructur.es

Performance Summary
Implementation
Runtime
ListOfSetsDS
O(NM)
QuickFindDS
¦¨(NM)
QuickUnionDS
O(NM)
WeightedQuickUnionDS
O(N + M log N)
WeightedQuickUnionDSWithPathCompression
O(N + M ¦Á(N))
Runtimes are given assuming:
¡ñ We have a DisjointSets object of size N.
¡ñ We perform M operations, where an operation is defined as either a call to
connected or isConnected.
datastructur.es

Citations
Nazca Lines:
http://redicecreations.com/ul_img/24592nazca_bird.jpg
The proof of the inverse ackermann runtime for disjoint sets is given here:
http://www.uni-trier.de/fileadmin/fb4/prof/INF/DEA/Uebungen_LVA-Ankuendi gungen/ws07/KAuD/effi.pdf
as originally proved by Tarjan here at UC Berkeley in 1975.
datastructur.es