PowerPoint Presentation
Faculty of Information Technology,
Monash University
FIT2004: Algorithms and Data Structures
Week 10: Minimum Spanning Trees
These slides are prepared by M. A. Cheema and are based on the material developed by Arun Konagurthu and Lloyd Allison.
Announcements
Start preparing for the final exam
Listen to the lectures (or read slides)
Read Lecture Notes
Solve tutorial questions
Attempt past exam papers and MSTs (available on Moodle)
Most importantly, do not hesitate to seek help
FIT2004: Lec-10: Minimum Spanning Trees
Recommended reading
FIT2004: Lec-10: Minimum Spanning Trees
Lecture Notes: Chapters 14 and 15
Cormen et al. Introduction to Algorithms.
Chapter 23, Pages 624-638
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Graph/Undirected/
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Graph/DAG/
Outline
FIT2004: Lec-10: Minimum Spanning Trees
Introduction
Prim’s Algorithm
Kruskal’s Algorithm
What is a Spanning Tree
FIT2004: Lec-10: Minimum Spanning Trees
Tree:
A tree is a connected undirected graph with no cycles in it.
Spanning Tree:
A spanning tree of a general undirected weighted graph G is a tree that spans G (i.e., a tree that includes every vertex of G) and is a subgraph of G (i.e., every edge in the spanning tree belongs to G).
C
B
E
D
A
10
5
3
1
9
2
An undirected graph
Tree
C
B
E
D
10
3
1
3
A
Spanning
Spanning Tree Examples
FIT2004: Lec-10: Minimum Spanning Trees
C
B
E
D
A
10
5
3
1
9
2
C
B
E
D
A
5
3
1
2
C
B
E
D
A
5
3
9
2
3
Graph
Spanning Tree 1
Spanning Tree 2
C
B
E
D
10
3
1
3
A
Spanning Tree 3
What is a Spanning Tree
FIT2004: Lec-10: Minimum Spanning Trees
Tree:
A tree is a connected undirected graph with no cycles in it.
Spanning Tree:
A spanning tree of a general undirected weighted graph G is a tree that spans G (i.e., a tree that includes every vertex of G) and is a subgraph of G (i.e., every edge in the spanning tree belongs to G).
Is it true that a spanning tree of a connected graph G is a maximal set of edges of G that contains no cycles?
Yes
Is it true that a spanning tree of a connected graph G is a minimal set of edges that connect all vertices?
Yes
Minimum Spanning Tree (MST)
FIT2004: Lec-10: Minimum Spanning Trees
Weight of a spanning tree is the sum of the weights of the edges in the tree.
A Minimum spanning tree of a weighted general graph G is a tree that spans G, whose weight is minimum over all possible spanning trees for this graph.
There may be more than one minimum spanning tree for a graph G (e.g., two or more spanning trees with the same minimum weight).
What is the weight of the MST in this graph?
C
B
E
D
A
10
5
3
1
9
2
3
How many MSTs are in this graph?
Spanning Trees and MSTs
FIT2004: Lec-10: Minimum Spanning Trees
C
B
E
D
A
5
3
1
2
C
B
E
D
A
5
3
9
2
Spanning Tree 2: Weight 17
Spanning Tree 3: Weight 19
C
B
E
D
10
3
1
3
A
Spanning Tree 4: Weight 11
C
B
E
D
A
5
1
2
3
Spanning Tree 1: Weight 11
Minimum Spanning Tree
Minimum Spanning
Tree
MST Algorithms
FIT2004: Lec-10: Minimum Spanning Trees
Let M denote the MST we are constructing, initialized to be empty
An edge e is said to be safe if {M U e} is a subset of some MST
General Strategy:
M = null
while M can be grown safely:
find an edge e=
M = {M} union {
return M
We will study two greedy algorithms that follow this strategy
MST Algorithms
FIT2004: Lec-10: Minimum Spanning Trees
Prim’s Algorithm (very similar to Dijkstra’s Algorithm)
In fact, Dijkstra published his algorithm for both MST and shortest path in the same paper (1959)
The algorithm is also often called Prim-Dijkstra Algorithm
M is always a tree and, in each iteration, we choose the shortest edge connected to M avoiding cycles
Time complexity: O(E log V)
Kruskal’s Algorithm
We process edges in ascending order of edge weights and M is a forest (i.e., a set of trees)
Time complexity: O(E log V)
Outline
FIT2004: Lec-10: Minimum Spanning Trees
Introduction
Prim’s Algorithm
Kruskal’s Algorithm
Prim’s Algorithm: Overview
FIT2004: Lec-10: Minimum Spanning Trees
Start by picking any vertex v to be the root of the tree M.
While the tree M does not contain all vertices in the graph
Find shortest edge e connected to the growing subtree M that does not create a cycle
add e to the tree M
C
B
E
D
A
6
5
3
8
9
2
9
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
C
B
E
D
A
10
5
3
1
9
2
6
Q:
A B C D E
0 Inf Inf Inf inf
A B C D E
– – – – –
A B C D E
0 Inf Inf Inf Inf
Q is a priority queue, where priority is based on distance
Pred and Dist are the usual ID-indexed arrays
u:
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 Inf Inf Inf inf
B C D E
Inf Inf Inf Inf
Q is a priority queue, where priority is based on distance
Pred and Dist are the usual ID-indexed arrays
u:
A
A B C D E
– – – – –
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 Inf Inf Inf inf
B C D E
Inf Inf Inf Inf
For each neighbour v of u, try to update distance
5 < inf
Dist[C] = 5
Pred[C] = A
u:
A
A B C D E
- - - - -
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 Inf Inf Inf inf
B C D E
Inf Inf Inf Inf
For each neighbour v of u, try to update distance
This time we just care about the edge, not the distance to u + the edge
u:
A
A B C D E
- - - - -
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 Inf Inf Inf inf
B C D E
Inf Inf Inf Inf
For each neighbour v of u, try to update distance
W(A,C) = 5
Dist[C] = inf
5 < inf, so update
Dist[C] = 5
Pred[C] = A
u:
A
A B C D E
- - - - -
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 Inf 5 Inf inf
A B C D E
- - A - -
C B D E
5 Inf Inf Inf
Note that this changes the order of Q
u:
A
1
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 10 5 Inf inf
C B D E
5 10 Inf Inf
u:
A
Doing the same for B
0 + 10 < inf
Dist[B] = 10
Pred[B] = A
A B C D E
- - A - -
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 10 5 Inf inf
C B D E
5 10 Inf Inf
u:
A
Doing the same for B
0 + 10 < inf
Dist[B] = 10
Pred[B] = A
A B C D E
- A A - -
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
1
Q:
A B C D E
0 10 5 Inf inf
C B D E
5 10 Inf Inf
u:
A
Finished with A, so pop from Q
Notice that this will always be the vertex with the smallest dist
A B C D E
- A A - -
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
1
Q:
A B C D E
0 10 5 Inf inf
B D E
10 Inf Inf
u:
C
Finished with A, so pop from Q
Notice that this will always be the vertex with the smallest dist
This vertex is now in the MST
A B C D E
- A A - -
C
B
E
D
A
10
5
3
1
9
2
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 10 5 Inf inf
B D E
10 Inf Inf
u:
C
The edge we add is the edge between u and pred[u]
So in this case, A->C
A B C D E
– A A – –
C
B
E
D
A
10
5
3
9
2
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 10 5 Inf inf
B D E
10 Inf Inf
u:
C
The edge we add is the edge between u and pred[u]
So in this case, A->C
A B C D E
– A A – –
C
B
E
D
A
10
5
3
9
2
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
2
Q:
A B C D E
0 10 5 Inf inf
B D E
10 Inf Inf
u:
C
Update B from C
w(C, B) = 3
Dist[B] = 10
So update dist[B] and pred[B]
A B C D E
– A A – –
C
B
E
D
A
10
5
3
9
2
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 3 5 Inf inf
B D E
3 Inf Inf
u:
C
A B C D E
– C A – –
C
B
E
D
A
10
5
3
9
6
2
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 3 5 Inf inf
B D E
3 Inf Inf
u:
C
update E from C
A B C D E
– C A – –
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
Pred:
Dist:
Q:
A B C D E
0 3 5 Inf 2
u:
C
A B C D E
– C A – C
E B D
2 3 inf
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
C
Update D from C
Pred:
Dist:
Q:
A B C D E
0 3 5 Inf 2
A B C D E
– C A – C
E B D
2 3 inf
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
1
u:
C
Done with C
Pop another vertex from Q and add it to the MST
Pred:
Dist:
Q:
A B C D E
0 3 5 9 2
A B C D E
– C A C C
E B D
2 3 9
1
2
C
B
E
D
A
10
5
3
9
6
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
E
Pred:
Dist:
Q:
A B C D E
0 3 5 9 2
A B C D E
– C A C C
B D
3 9
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
E
Pred:
Dist:
Q:
A B C D E
0 3 5 6 2
A B C D E
– C A E C
B D
3 6
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
B
Pred:
Dist:
Q:
A B C D E
0 3 5 6 2
A B C D E
– C A E C
D
6
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
B
Pred:
Dist:
Q:
A B C D E
0 3 5 6 2
A B C D E
– C A E C
D
6
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
B
Pred:
Dist:
Q:
A B C D E
0 3 5 1 2
A B C D E
– C A B C
D
1
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
B
Pred:
Dist:
Q:
A B C D E
0 3 5 1 2
A B C D E
– C A B C
D
1
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
D
Pred:
Dist:
Q:
A B C D E
0 3 5 1 2
A B C D E
– C A B C
2
C
B
E
D
A
10
5
3
9
6
1
Prim’s Algorithm
FIT2004, Lec-8: Graphs and Shortest Path Algorithms
u:
D
Pred:
Dist:
Q:
A B C D E
0 3 5 1 2
A B C D E
– C A B C
2
C
B
E
D
A
5
3
1
Prim’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
Prim’s Algorithm: Complexity
FIT2004: Lec-10: Minimum Spanning Trees
It is very similar to Dijkstra’s Algorithm and its complexity is the same as Dijkstra’s Algorithm
O(V log V + E log V) if min-heap is used
Since the input graph G is connected, E ≥ V-1. Hence, the complexity can be simplified to O(E log V).
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Prim’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
T
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Prim’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
T
Green edges = M
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Prim’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Let e = (u,v) be the lightest edge that connects some v in T to some u not in T (i.e. this is the edge Prim’s algorithm will choose in this iteration)
If e is in M, then T ∪ {e} is a subset of M, which is an MST, so the invariant holds
T
e
u
v
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Prim’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Let e = (u,v) be the lightest edge that connects some v in T to some u not in T (i.e. this is the edge Prim’s algorithm will choose in this iteration)
If e is in M, then T ∪ {e} is a subset of M, which is an MST, so the invariant holds
T
Green edges = M
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Prim’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Let e = (u,v) be the lightest edge that connects some v in T to some u not in T (i.e. this is the edge Prim’s algorithm will choose in this iteration)
If e is in M, then T ∪ {e} is a subset of M, which is an MST, so the invariant holds
The interesting case is where e is not in M. In this case we have to show that there is some other MST which contains T ∪ {e}
T
e
u
v
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Prim’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Let e = (u,v) be the lightest edge that connects some v in T to some u not in T (i.e. this is the edge Prim’s algorithm will choose in this iteration)
If e is in M, then T ∪ {e} is a subset of M, which is an MST, so the invariant holds
The interesting case is where e is not in M. In this case we have to show that there is some other MST which contains T ∪ {e}
T
Green edges = M
e
u
v
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Since M is a tree, there is exactly one path from u to v in M (shown in blue)
u and v are not connected in T (since v is not in T). Consider the first edge on the blue path which is not contained in T (call this edge x).
T
Green edges = M
u
v
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Since M is a tree, there is exactly one path from u to v in M (shown in blue)
u and v are not connected in T (since v is not in T). Consider the first edge on the blue path which is not contained in T (call this edge x).
One vertex of this edge is in T, the other is not.
Removing this edge would disconnect M
T
Green edges = M
u
v
x
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Since M is a tree, there is exactly one path from u to v in M (shown in blue)
u and v are not connected in T (since v is not in T). Consider the first edge on the blue path which is not contained in T (call this edge x).
One vertex of this edge is in T, the other is not.
Removing this edge would disconnect M
T
Green edges = M
u
v
x
Prim’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Since M is a tree, there is exactly one path from u to v in M (shown in blue)
u and v are not connected in T (since v is not in T). Consider the first edge on the blue path which is not contained in T (call this edge x).
One vertex of this edge is in T, the other is not.
Removing this edge would disconnect M
T
Green edges = M
u
v
Adding the edge (u,v) would form a new spanning tree, M’
Since the algorithm always selects the lightest edge incident to T, we know that w(e) ≤ w(x)
So the weight of M’ is no greater than the weight of M, therefore choosing e is correct
e
x
Outline
FIT2004: Lec-10: Minimum Spanning Trees
Introduction
Prim’s Algorithm
Kruskal’s Algorithm
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
C
B
E
D
A
Sorted Edges:
Finalized (in MST):
Kruskals(G(V, E))
Sort the edges in ascending order of weights
Let T be a graph with V as its vertices, and no edges
For each edge (v, u) in ascending order
If adding (v,u) does not create a cycle in T
Add (v,u) to T
Return T
How to determine if the edge will create
a cycle???
BD,1
10
5
3
1
9
2
4
CD,9
CB,3
CE,2
ED,4
AC,5
AB,10
Finalized:
BD
CE
CB
AC
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
At some point in the algorithm, we want to add an edge
Can we add the red edge?
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
At some point in the algorithm, we want to add an edge
Can we add the red edge?
Can we define what makes an edge “okay?
Quiz time!
https://flux.qa – RFIBMB
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
At some point in the algorithm, we want to add an edge
Can we add the red edge?
Can we define what makes an edge “okay?
An edge can be added if it does not create a cycle
OR
An edge can be added if it is not between two vertices which belong to the same component
Importantly, adding an edge between two components “merges” those components into one new component.
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
3
4
0
8
6
5
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
3
4
0
8
6
5
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
0
4
0
8
6
5
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
0
4
0
8
6
5
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
0
4
0
6
6
5
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
0
4
0
6
6
5
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
0
4
0
6
6
4
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
0
4
0
6
6
4
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
0
0
0
6
6
0
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
1
0
0
0
6
6
0
7
2
9
Kruskal’s Algorithm
FIT2004: Lec-10: Minimum Spanning Trees
Kruskals(G(V, E))
Sort the edges in ascending order of weights
Let each vertex in V be given a unique ID
Let T be a graph with V as its vertices, and no edges
For each edge (v, u) in ascending order
#If adding (v,u) does not create a cycle in T
If set_lookup(v) != set_lookup(u)
Add (v,u) to T
union(set of u, set of v)
Return T
How can we quickly look up the set of a vertex, and also union two sets?
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Define two operations: find(u) and union(u,v)
Find(u): Given a vertex u, return its set ID
Union(u,v): Given two vertices u and v, if they have different set IDs, union the two sets they belong to (and update all the set IDs of the vertices in one of the sets)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
We need both find(u) and union(u,v) to be fast
If we just store the set ID of each vertex in an array, then find is O(1)
Union(u, v) requires us to loop through the whole array, looking for elements of find(u) and changing them to find(v), which is O(V)
Vertex ID 0 1 2 3 4 5 6 7 8 9
Set ID 0 0 0 7 6 5 6 7 5 9
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Vertex ID 0 1 2 3 4 5 6 7 8 9
Set ID 0 0 0 7 6 5 6 7 5 9
1
3
4
0
8
6
5
7
2
9
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Vertex ID 0 1 2 3 4 5 6 7 8 9
Set ID 0 0 0 7 6 5 6 7 5 9
1
3
4
0
8
6
5
7
2
9
Union(2, 6)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Vertex ID 0 1 2 3 4 5 6 7 8 9
Set ID 0 0 0 7 6 5 6 7 5 9
1
3
4
0
8
6
5
7
2
9
Union(2, 6)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Vertex ID 0 1 2 3 4 5 6 7 8 9
Set ID 0 0 0 7 6 5 6 7 5 9
1
3
4
0
8
6
5
7
2
9
Union(2, 6)
Need to find all 6 and change to 0… O(V)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
We want to union faster
Linked lists are fast to union (i.e. append one linked list to another)
0
3
1
4
2
ID: 0
ID: 4
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
We want to union faster
Linked lists are fast to union (i.e. append one linked list to another)
0
3
1
4
2
ID: 0
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
We want to union faster
Linked lists are fast to union (i.e. append one linked list to another)
How do we find, with linked lists?
0
3
1
4
2
ID: 0
Quiz time!
https://flux.qa – RFIBMB
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Linked lists are fast to union (i.e. append one linked list to another)
How do we find, with linked lists?
Heads know their set ID
Traverse to the head to find the ID of an element in the list
This is O(size of the linked list)
0
3
1
4
2
ID: 0
Find(4)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Linked lists are fast to union (i.e. append one linked list to another)
How do we find, with linked lists?
Heads know their set ID
Traverse to the head to find the ID of an element in the list
This is O(size of the linked list)
0
3
1
4
2
ID: 0
Find(4)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Linked lists are fast to union (i.e. append one linked list to another)
How do we find, with linked lists?
Heads know their set ID
Traverse to the head to find the ID of an element in the list
This is O(size of the linked list)
0
3
1
4
2
ID: 0
Find(4)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Linked lists are fast to union (i.e. append one linked list to another)
How do we find, with linked lists?
Heads know their set ID
Traverse to the head to find the ID of an element in the list
This is O(size of the linked list)
0
3
1
4
2
ID: 0
Find(4)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Linked lists are fast to union (i.e. append one linked list to another)
How do we find, with linked lists?
Heads know their set ID
Traverse to the head to find the ID of an element in the list
This is O(size of the linked list)
0
3
1
4
2
ID: 0
Find(4)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Alternatively, every node could know its ID
Now find is O(1)
But Union is now slower, we have to update all the IDs
0
3
1
4
2
ID: 0
ID: 4
ID: 0
ID: 0
ID: 4
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Alternatively, every node could know its ID
Now find is O(1)
But Union is now slower, we have to update all the IDs
ID: 0
ID: 0
ID: 4
0
3
1
4
2
ID: 0
ID: 4
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Alternatively, every node could know its ID
Now find is O(1)
But Union is now slower, we have to update all the IDs
ID: 0
ID: 0
ID: 0
0
3
1
4
2
ID: 0
ID: 4
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Alternatively, every node could know its ID
Now find is O(1)
But Union is now slower, we have to update all the IDs
ID: 0
ID: 0
ID: 0
0
3
1
4
2
ID: 0
ID: 0
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
Where are we?
We want find and union to be fast
Linked lists are an improvement, since they stop us looking at items which are not relevant to the union we are currently doing
Linked lists allows O(1) union
We can’t make find O(1) because to do that, we have to store the ID at every node which makes union slow (we have to change all the IDs)
Solution: Change from linked list to linked tree
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 1 2 3 4 5
Operations:
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 1 2 3 4 5
Operations:
Union(0,1)
Find(0) = 0
Find(1) = 1
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 3 4 5
Operations:
Union(0,1)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 3 4 5
Operations:
Union(0,1)
Union(2,3)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 3 4 5
Operations:
Union(0,1)
Union(2,3)
Find(2) = 2
Find(3) = 3
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union(0,1)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union(0,1)
Find(0) = 0
Find(1) = 0
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union(0,1)
Union(1,3)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 2 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union(0,1)
Union(1,3)
Find(1) = 0
Find(3) = 2
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 0 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union(0,1)
Union(1,3)
Find(1) = 0
Find(3) = 2
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 0 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union(0,1)
Union(1,3)
Find(1) = 0
Find(3) = 2
Find: traverse parent pointers until you find a vertex who is its own parent (i.e. a root). That vertex ID is the set ID
Union(u,v): If find(u) != find(v), set parent(u) = v (or vice versa)
So union is O(find). Find could in theory be O(V), but if we can keep the heights of the trees low, then it will be at most O(max height)
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent 0 0 0 2 4 5
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union(0,1)
Union(1,3)
Find(1) = 0
Find(3) = 2
Optimisation: When we union, we have a choice of which for the new root.
What should we choose?
The set with fewer nodes!
Union-Find Data Structure
FIT2004: Lec-10: Minimum Spanning Trees
0
1
2
3
4
5
Vertex ID 0 1 2 3 4 5
Parent -4 0 0 2 -1 -1
Operations:
Union(0,1)
Union(2,3)
Find(3)=2
Union(0,1)
Union(1,3)
Find(1) = 0
Find(3) = 2
Optimisation: When we union, we have a choice of which for the new root.
What should we choose?
The set with fewer nodes!
When doing a union, add the sizes and update the parent value of the root appropriately
Parent array values are now either parents OR –ve sizes
-1
-1
-4
Kruskal’s Algorithm: Complexity
FIT2004: Lec-10: Minimum Spanning Trees
Time Complexity:
Initialization: O(V)
Sorting edges: O(E log E)
E log E <= E log V2 = 2 E log V O(E log V)
For loop executes O(E) times
SET_ID() takes O(x) where x is height of the tree
UNION_SET() takes O(1) + 2 finds, so it takes O(x) where x is the height of the deeper of the two trees to be unioned (which could be at most V)
Total cost: O(EV)
But this is not tight as we assumed the cost of UNION_SETS to be O(V) for each call leading to overall cost of O(EV). A closer look reveals that the total cost of UNION_SETS is O(V log V)
Complexity of UNION_SETS
FIT2004: Lec-10: Minimum Spanning Trees
We can show that, when using the union-by-size rule, the number of elements N in any tree is at least , where h is the height of the tree
In other words,
Unioning takes 2 finds + O(1) effort
Find takes effort equal to the height of the tree, which is
So union is , where N is the height of the taller tree being unioned.
We need to do V-1 unions
Each one has a worst case cost of O(log(V)) (this is a significant over-estimation, but it makes the maths easy)
So the total cost of all unions is bounded by O(VlogV)
Kruskal’s Algorithm: Complexity
FIT2004: Lec-10: Minimum Spanning Trees
Time Complexity:
Sorting edges: O(E log E)
Initialization of union find: O(V)
E log E = E log V2 = 2 E log V O(E log V)
For loop executes O(E) times
The two finds take the same effort as the union, log(v)
UNION takes O(V log V) in total
Total cost: O(E log V + V log V) O(E log V)
Complexity of UNION_SETS
FIT2004: Lec-10: Minimum Spanning Trees
We can improve the disjoint sets data structure significantly with 2 other optimisations
Union by rank
Path compression
These are discussed in the notes, but are not examinable
The complexity can be improved from VlogV to Vα(V), where α denotes the inverse Ackermann function, an extremely slow growing function
Note: α(any number which can be represented using the matter in the universe) < 5, so Vα(V) is effectively O(V)
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Kruskal’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Kruskal’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Green edges = M
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Kruskal’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Let e = (u,v) be the lightest edge that connects two vertices in different components of T (i.e this is the edge Kruskal’s will choose in this iteration)
If e is in M, then T ∪ {e} is a subset of M, which is an MST, so the invariant holds
e
u
v
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Kruskal’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Let e = (u,v) be the lightest edge that connects two vertices in different components of T (i.e this is the edge Kruskal’s will choose in this iteration)
If e is in M, then T ∪ {e} is a subset of M, which is an MST, so the invariant holds
Green edges = M
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Kruskal’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Let e = (u,v) be the lightest edge that connects two vertices in different components of T (i.e this is the edge Kruskal’s will choose in this iteration)
If e is in M, then T ∪ {e} is a subset of M, which is an MST, so the invariant holds
The interesting case is where e is not in M. In this case we have to show that there is some other MST which contains T ∪ {e}
e
u
v
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
#INV: Every iteration of Kruskal’s algorithm, the current set of selected edges in T is a subset of some minimum spanning tree of G
Base Case:
The invariant is true initially when T is empty
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Assume T is a subset of some MST M
Let e = (u,v) be the lightest edge that connects two vertices in different components of T (i.e this is the edge Kruskal’s will choose in this iteration)
If e is in M, then T ∪ {e} is a subset of M, which is an MST, so the invariant holds
The interesting case is where e is not in M. In this case we have to show that there is some other MST which contains T ∪ {e}
Green edges = M
e
u
v
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Since M is a tree, there is exactly one path from u to v in M (shown in blue)
u and v are not connected in T (since we are adding (u,v), and we never create a cycle)
Consider the first edge on the blue path which is not contained in T (call this edge x).
Green edges = M
u
v
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Since M is a tree, there is exactly one path from u to v in M (shown in blue)
u and v are not connected in T (since we are adding (u,v), and we never create a cycle)
At least one edge on the blue path is not currently in T (otherwise u and v would be connected in T)
Removing this edge would disconnect M
Green edges = M
u
v
x
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Since M is a tree, there is exactly one path from u to v in M (shown in blue)
u and v are not connected in T (since we are adding (u,v), and we never create a cycle)
At least one edge on the blue path is not currently in T (otherwise u and v would be connected in T)
Removing this edge would disconnect M
Green edges = M
u
v
x
Kruskal’s Algorithm: Correctness
FIT2004: Lec-10: Minimum Spanning Trees
Inductive step:
We want to show that, if T is a subset of some MST at the start of some iteration, it is still a subset of some MST at the start of the next iteration
Since M is a tree, there is exactly one path from u to v in M (shown in blue)
u and v are not connected in T (since we are adding (u,v), and we never create a cycle)
At least one edge on the blue path is not currently in T (otherwise u and v would be connected in T)
Removing this edge would disconnect M
Removing this edge would disconnect M
Green edges = M
u
v
Adding the edge (u,v) would form a new spanning tree, M’
Since the algorithm always selects the lightest edge incident to T, we know that w(e) ≤ w(x)
So the weight of M’ is no greater than the weight of M, therefore choosing e is correct
e
x
Summary
FIT2004: Lec-10: Minimum Spanning Trees
Take home message
Prim’s Algorithm and Kruskal’s algorithm both are greedy algorithm that correctly determine minimum spanning trees.
Things to do (this list is not exhaustive)
Make sure you understand
the two algorithms especially how to implement Union-Find data structure for Kruskal’s algorithm
the proofs of correctness for each of the two algorithms
Start preparing for the final exam
Coming Up Next
Network Flow
/docProps/thumbnail.jpeg