COMP251: Disjoint sets
Jérôme Waldispühl School of Computer Science
McGill University
Based on slides from M. Langer (McGill)
Problem
Let G=(V,E) be undirected graph, and A, B Î V two nodes of G. Question: Is there a path between A and B?
But we are not interested in knowing the path between A and B.
Is there a faster way to solve this problem (faster than DFS or BFS)?
Connected components
Connected component: Set of nodes connected by a path.
12 15
10 2
8 14
1
9
7
3
5 11 6 13 4
Question: Given 2 nodes A & B, are they in the same component?
Partition
Generalization: Set of object partitioned into disjoint subsets.
1 3
5
8 14
12
S1 S2 S6 S5
9 7 15
10
11 6 4
13
S3
2
& S ≠Ø∀i∈{1,…,n} S=S∪S∪…∪S(‘ i
1 2 n () S i ∩ S j = Ø i f f i ≠ j
S4
Motivations
• Data structure used to manage sets and perform classical operations (union, find, intersection…)
• Used in many algorithms (e.g. Kruskal, Floyd-Marshall)
Map vs. Relation
99
Maporf:1 2 1 2
848 SS
Relation R Í { (a,b) : a,b Î S } b a
function 6 3 7 3 4565
7
{(a,f(a))}
01101 01010 10010 11001 01100
Any Boolean matrix defines a relation
Equivalence relation 1234567
1000011 0110000 0110000 0001000 0000100 1000011 1000011
6
12 73
5 4
1 2 3 4 5 6 7
i is equivalent to j if they belong to the same set. (more constrained that general relation)
• Reflexivity • Symmetry • Transitivity
Example:
∀a ∈ S, (a, a) ∈ R
∀a, b ∈ S, (a, b) ∈ R ⇒ (b, a) ∈ R
∀a,b,c ∈ S, (a,b) ∈ R and (b,c) ∈ R ⇒ (a,c) ∈ R
Equivalence relation
For any undirected graph, the connections define an equivalence relation on vertices.
• ForalluÎV,thereisapathoflength0fromutou.
• Forallu,vÎV,Thereisapathfromutov,iffthereisapath
from v to u.
• Forallu,v,wÎV,ifthereisapathfromutovandapathfromv
to w, then there is a path from u to w.
Disjoint set ADT
8
3 97 15
1
14
6 13
11 2
10 4
12
5
Each set in the partition as a representative member.
• find(i) returns the representative of the set that contains i.
• sameset(i,j) returns the boolean value find(i)==find(j)
• union(i,j) merges the sets containing i and j.
Union of disjoint sets union(i,j) merges the sets containing i and j.
• Does nothing if i and j are already in the same set.
• Otherwise, we merge the set and need a policy to decide who will be the representative of the new merged set.
Rep[]
1 93
Quick find
2 6
1
1
2
3
3
3
4
4
5
7
6
1
7
7
8
7
9
1
7
4
Let Rep[i] Î { 1, 2, … , n } be the representative of the set containing i.
5
8
(Quick find) & union
• find(i) { return rep[i]; }
• union(i,j) merges the sets containing i and j.
Example: union(2,6) i
j
1
1
2
3
3
3
4
4
5
7
6
1
7
7
8
7
9
1
1
1
2
3
3
3
4
4
5
7
6
1
7
7
8
7
9
1
1
1
2
1
3
1
4
4
5
7
6
1
7
7
8
7
9
1
• •
store value of rep[i] because it may change during the execution of the algorithm.
O(n) running time… slow.
(Quick find) & union
union(i,j) {
if rep[i] != rep[j] {
prevrepi = rep[i];
for (k=1; k<=n; k++) {
prevrepi {
if rep[k] == rep[i] {
rep[k] = rep[j];
}
} }
}
Tree representation & forests
• Represent the disjoint sets by a forest of rooted trees.
• Roots are the representative (i.e. find(i) == findroot(i)).
• Each node points to its parent.
1
58 11
7
296 4
3
10
Note: The tree structure does not necessarily represent the relationship between the stored objects.
p[]
Array representation
1
1
2
7
3
3
4
9
5
1
6
7
7
7
8
1
9
7
10
3
11
5
173
5 8 2 9 6 10
11
• Non-rootnodesholdindexoftheirparent. • Rootnodesstoretheirownvalue.
4
Find & Union
find(i) {
if p[i] == i {
return i;
} else {
return find(p[i]);
}
}
union(i,j) {
if find(i) != find(j) {
p[find(i)] = find(j);
}
}
Remark: Arbitrarily merge the set on i into the set of j.
Union example
union(9,11) Root of the tree of 11 becomes the
parent of the root of the tree of 9.
71 29658
4 11
1 58
11
7
296 4
Worst case
union(1,2)
union(1,3)
union(1,4)
... union(1,n)
n
n-1
Then, find(1) is O(n)...
4
3
2 1
...
Union by size
Heuristic to control the height to the trees after merging.
Idea: Merge tree with smaller number of nodes into the tree with the largest number of nodes (In practice, we can also use rank which is an upper bound on the height of nodes).
71
29658 4
7
2961
4
58
Union by size
Claim: The depth of any node is at most log n.
Proof:
• If union causes the depth of a node to increase, then this node must belong to the smallest tree.
7
2961
4
• Thus, the size of the (merged) tree containing this node will at least double.
• But we can double the size of a tree at most log n times.
+1
58
Union by height
Idea: Merge tree with smaller height into tree with larger height.
Claim: The height of trees obtained by union-by-height is at most log n.
Corollary: An union-by-height tree of height h has at least nh 3 2h nodes.
Proof (Corollary):
• Base case: a tree of height 0 has one node.
• Induction: (hypothesis) nh 3 2h. Show nh+1 3 2h+1.
Running time
find(i)
Quick find Union by size Union by height
union(i,j)
O(1)
O(n)
O(log n)
O(log n)
O(log n)
O(log n)
Quick Union
Note: These are worst case complexities.
Quick union makes 2 calls to find.
Path compression
• Find path = nodes visited during the execution of find() on the trip to the root.
• Make all nodes on the find path direct children of root. a
b c
d
a dcb
Path compression
find(i) {
if p[i] == i {
return i;
} else {
}
p[i] = find(p[i]);
return find(p[i]);
return p[i];
}
} }
Running time
• Use union by size and path compression.
• Worst case running time is O(log n).
• However, we can show that m union or find operations take O( m α(n) ).
What is α(n) ?
n α(n) 0-2 0 31 4-7 2 8 - 2047 3
2048 – A4(1) 4 Where A4(1) >> 1080 !!