CS代考 FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 1

COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
This material has been reproduced and communicated to you by or on behalf of Monash University pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act. Do not remove this notice
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 1 / 35

Copyright By PowCoder代写 加微信 powcoder

Prepared by: [ ] Extended by:
FIT3155: Advanced Algorithms and Data Structures
Week 4: The disjoint-set data structure
Faculty of Information Technology, Monash University
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 2 / 35

What is covered in this lecture?
The disjoint-set data structure and its analysis
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 3 / 35

Source material and recommended reading
, Data Structures and Algorithm Analysis (Chapter 8).
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 4 / 35

Equivalence relationship
A relation ⊗ is defined over members/elements of some set S. For any pair of elements (a,b) from this set S:
􏰥 a ⊗ b results in a true or false answer.
What is an equivalence relation
An equivalence relation is a relation ⊗ that satisfies:
reflexive property: a ⊗ a for all a in set S
symmetric property: a ⊗ b implies b ⊗ a for all a, b in S
transitive property: a⊗b and b⊗c implies a⊗c for all a,b,c in S
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 5 / 35

Equivalence class
An equivalence class of an element a ∈ S defines a subset of elements from S, where all elements in that subset are related to a.
Every element of S belongs to exactly one equivalence class (subset). To check if two elements a and b are related, we only have to check if
they are in the same equivalence class.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 6 / 35

Basic disjoint-set data structure
Disjoint set data structure supports two basic operations:
find(a): This returns the name/label of the subset (i.e. equivalence class) containing the element a in the set S.
Note: the name/label of the subset itself is arbitrary. All that really matters is this: For two elements a and b to be related, we should check if find(a)==find(b).
union(a,b): Merge the two (disjoint) subsets containing a and b in S. In practice, this is implemented as
union(find(a), find(b)).
The input to this data structure is initially a collection of N elements, that are treated to be disjoint (no relation) with each other. Using find(·) and union(·, ·) operations, the relations are dynamically checked and (new relations) established.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 7 / 35

Some applications of Disjoint set data structure
Kruskal’s algorithm
Keeping track of connected components of a graph Computing Lowest Common Ancestor in trees Checking equivalence of finite state automata Hindley-Milner polymorphic type inference
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 8 / 35

RECALL FROM FIT2004?
Kruskal’s algorithm introduced a basic implementation of disjoint-set data structure. – Revise, if forgotten!
This involved maintaining the disjoint data-structure using:
1 an array of linked-lists to support union(a, b).
2 a membership array to support find(a).
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 9 / 35

RECALL FROM FIT2004? – continued
Using the implementation on previous slide:
find(a) operation can be achieved via array access in O(1)-time.
union(a,b) operation can be achieved in O(N)-time, because it requires:
􏰥 appending two linked lists (each denoting a subset being merged)
􏰥 change membership array for elements in the smaller of the two
subsets, so that they are now merged.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 10 / 35

New disjoint set data structure using just an array
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parentarray −1 −1 −1 −1 −1 −1 −1 −1
The above example shows N = 8 disjoint elements initially. These are numbered {0,1,2,··· ,7}.
A simple parent array is used to capture this information. In the parent array, the imaginary parent is denoted as −1. In general, each element in the array points to its parent.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 11 / 35

Example: Data structure after a series of union(.) operations
union(4, 5) union(6, 7) union(5, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure

Example: Data structure after a series of union(.) operations
union(4, 5) union(6, 7) union(5, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure

Example: Data structure after a series of union(.) operations
union(4, 5) union(6, 7) union(5, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure

Smart Union algorithms – Motivation
union(·) in the earlier examples performed a union by making the second tree/subset a subtree of the first.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 13 / 35

Smart Union algorithms – Motivation
union(·) in the earlier examples performed a union by making the second tree/subset a subtree of the first.
This was really an arbitrary choice.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 13 / 35

Smart Union algorithms – Motivation
union(·) in the earlier examples performed a union by making the second tree/subset a subtree of the first.
This was really an arbitrary choice.
In fact, in the examples on slides (#11-12), the subsets being merged
(“union”-ed) were of the same size. So it did not matter.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 13 / 35

Smart Union algorithms – Motivation
union(·) in the earlier examples performed a union by making the second tree/subset a subtree of the first.
This was really an arbitrary choice.
In fact, in the examples on slides (#11-12), the subsets being merged
(“union”-ed) were of the same size. So it did not matter.
But in general, a smarter way would be to make the smaller tree (in terms of the number of elements in it) the subtree of the larger tree. This is called union-by-size.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 13 / 35

Smart Union algorithms – Motivation
union(·) in the earlier examples performed a union by making the second tree/subset a subtree of the first.
This was really an arbitrary choice.
In fact, in the examples on slides (#11-12), the subsets being merged
(“union”-ed) were of the same size. So it did not matter.
But in general, a smarter way would be to make the smaller tree (in terms of the number of elements in it) the subtree of the larger tree. This is called union-by-size.
An even smarter approach would be to make the shorter/shallower tree (in tree height) the subtree of the taller/deeper tree. This is called union-by-height.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 13 / 35

Union-by-size
When union(a, b) is carried out using union-by-size: Ensure a and b are in two disjoint trees.
Then point the root node of the tree with smaller number of elements to the root for the larger one.
􏰥 If both sizes are equal, break the tie arbitrarily.
The cell corresponding to the merged root in the parent array is
updated to store the merged tree/subset size (as a negative number)
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 14 / 35

union-by-size example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 15 / 35

union-by-size example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure

union-by-size example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 15 / 35

union-by-size example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 15 / 35

union-by-size example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 15 / 35

Implementation of Union-by-size: Initialization
At the start, initialize N disjoint subsets of elements.
This involves initializing the parent array to all −1 values.
Recall: In union-by-size, the cell in the parent array corresponding to any (disjoint) tree’s root node, stores the size of the tree as a negative number.
Initialization
Make_disjoint_set(a)
7 Make_disjoint_set(x) {
parent[x] = -1
InitSet(N) {
for (a = 0 to N-1) {
(FIT3155 S1 2022, Monash University)
L04: Disjoint-set data structure

Implementation of Union-by-size: – find(a)
Search until the root of the tree (containing ‘a’) is reached; return the root’s label/index. find(a)
// find root of the tree containing ‘a’
while (parent[a] >= 0) { // note: while parent[a] not negative
6 return a // ‘a’ now indexes the root node of the subtree/subset 7}
a = parent[a]
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 17 / 35

Implementation of Union-by-size: – union(a, b)
Recall: When executing union(a, b), ensure a and b are in two disjoint trees. Then link the root
node of the smaller (in size) tree to the root for the larger one. If both sizes are equal, break the tie arbitrarily. Update the merged tree size (stored as a negative number)
union(a, b) – by size
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17
union(a,b) {
root_a = find(a) // find root of tree containing ‘a’
root_b = find(b) // find root of tree containing ‘b’
if (root_a == root_b) return // ‘a’ and ‘b’ in the same tree
size_a = -parent[root_a] size_b = -parent[root_b]
if (size_a > size_b) {
parent[root_b] = root_a
parent[root_a] = -(size_a+size_b) // update size
else // if (size_b >= size_a) parent[root_a] = root_b parent[root_b] = -(size_a+size_b)
// link smaller tree’s root to larger
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 18 / 35

Union-by-height (and union-by-rank) When union(a, b) is carried out using union-by-height:
Ensure a and b are in two disjoint trees.
Then make the disjoint tree with shorter height a subtree of the taller tree.
That is, make the root of the shorter tree point to the root of the taller one, after union.
􏰥 If both heights are equal, break the tie arbitrarily.
The cell corresponding to the merged root in the parent array is updated to store the merged tree’s (height +1) as a negative number.*
A quick note on Union-by-rank
Union-by-rank is essentially identical to union-by-height, but includes an additional optimization called path compression to be discussed later on, on slide #24.
*Note: We store height + 1 because height can be 0 (for a tree with a singleton node) and hence cannot be coded as a negative number in the parent array.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 19 / 35

union-by-height example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 20 / 35

union-by-height example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure

union-by-height example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 20 / 35

union-by-height example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 20 / 35

union-by-height example
initial state
union(4, 5) union(6, 7) union(5, 7) union(3, 7)
Implicit representation as an array with parent labels
arrayindex 0 1 2 3 4 5 6 7 parent array
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 20 / 35

Implementation of Union-by-height: – initialization
In union by height/rank, the cell in the parent array corresponding to the root element of any
disjoint tree/set, now stores the height of the tree coded as a negative number.􏰤 Initialization
Make_disjoint_set(a)
7 Make_disjoint_set(x) {
InitSet(N) {
for (a = 0 to N-1) {
parent[x] = -1
􏰤Note again, we are storing height + 1 as a negative number, because height = 0 cannot be coded as a negative number.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 21 / 35

Implementation of Union-by-height: – find(a)
Exactly same implementation as for union-by-size (see slide 17)
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 22 / 35

Implementation of Union-by-height: – union(a, b)
When executing union(a, b), ensure a and b are in two disjoint trees. Then link the root node of the shorter (in height) tree to the root of the taller tree. If both heights are equal, break the tie arbitrarily. Notice below how the logic of update to the height of the merged tree (stored at the root as a negative number) now is differs (compared to union-by-size).
union(a, b)
1 union(a,b) {
2 root_a = find(a) // find root of tree containing ‘a’
3 root_b = find(b) // find root of tree containing ‘b’
4 if (root_a == root_b) return // ‘a’ and ‘b’ in the same tree
6 height_a = -parent[root_a] // height of tree containing ‘a’
7 height_b = -parent[root_b] // height of tree containing ‘b’
8 if (height_a > height_b) {
9 parent[root_b] = root_a // link shorter tree’s root to taller
11 else if (height_b > height_a) {
12 parent[root_a] = root_b
14 else { // if (height_a == height_b)
15 parent[root_a] = root_b
16 parent[root_b] = -(height_b+1) // update to height
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 23 / 35

Union-by-rank is same as Union-by-height but with an additional optimization implemented during find(·)
Union-by-height with a rather simple optimization during find(·)operation yields union-by-rank.
This optimization, although simple and straight-forward, has a drastic impact on the amortized complexity of a disjoint-set data structure:
Amortized Complexity
Amortized analysis attempts to track the complexity of performing a sequence of operations on a particular data structure, rather than analysing just the worst-case complexity of a single operation.
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 24 / 35

Path compression during find(·)operation in union-by-rank
Path compression
When executing find(a), every node along the path from the ‘root’ node to ‘a’ has its parent index changed to point directly to the root.
Path compression is performed during find(·)operation.
It is independent of the strategy used for union(·, ·) operation.
But for subsequent discussion, assume we are using union(·, ·) using union by height (on slide #23).
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 25 / 35

Path compression illustration
Consider this example (completely different one from the one used before). Before path compression
Say we are running find(14) on the above state. The resultant path compressed tree will be:
(FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 26 / 35

Path compression based find(a) implementation
After finding the root of the tree containing ‘a’, change the parent pointer of all nodes along the
path to point directly to the root.
find(a) – implementing path compression
// find root of the tree containing ‘a’ if (parent[a] < 0) { // root is reached return parent[a] = find(parent[a]) (FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 27 / 35 Complexity analysis of Disjoint set data structure using union-by-size, union-by-height and union-by-rank. (FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 28 / 35 Analysis of union-by-size any disjoint-set tree constructed using union-by-size with a root node r , with its size being size( r ) and its height being height( r ), we have the following statement: size( r ) ≥ 2height( r ) Proof by Induction Base case: When r is the root of a tree containing just one node (i.e., itself): size( r )=1 height( r ) = 0 Substituting these values into the statement of the Lemma, the base case is true. (FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 29 / 35 Analysis of union-by-size any disjoint-set tree constructed using union-by-size with a root node r , with its size being size( r ) and its height being height( r ), we have the following statement: size( r ) ≥ 2height( r ) Proof by Induction Inductive case (cont’d): Assume the statement is true for all disjoint-trees in the data structure after a sequence of k union(·, ·) operations. In the (k + 1)-th union(·), let us say we are merging two disjointed trees, one rooted at s and other rooted at r Without loss of generality, let size( r ) ≥ size( s ) ) (FIT3155 S1 2022, Monash University) L04: Disjoint-set data structure 29 / 35 Analysis of union-by-size any disjoint-set tree constructed using union-by-size with a root node r , with its size being size( r ) and its height being height( r ), we have the following statement: size( r ) ≥ 2height( r ) Proof by Induction Inductive case (cont’d): Case 1: height( r ) > height( s )
After (k + 1)-th merge: new size( r )
But ‘new height( r )’ = ‘old height( r )

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com