Disjoint sets
Disjoint set ADT
• Maintainsacollection!={S1,…,Sk}ofdisjointsets
• Eachsetisidentifiedbyarepresentative,whichis an element of the set
• Operations:
– MAKE-SET(x): creates a new set containing only x,
and makes x the representative
– FIND-SET(x): returns the representative of x’s set
– UNION(x, y): merges the sets containing x and y, and chooses a new representative
• Note:Noduplicateelementsareallowed!
Disjoint set application
• Example: Determine whether two nodes are in the same connected component of an undirected graph
• Connected component: a maximal subgraph such that any two vertices are connected to each other by a path
Disjoint sets for connected components
fg cehi
• How do you use disjoint sets to solve this problem?
abd
Disjoint sets for connected components Connected-Components(G):
for each vertex v ∊ V[G] do MAKE-SET(v)
for each edge (u,v) ∊ E[G] do
if FIND-SET(u) $ FIND_SET(v) then UNION(u,v)
Disjoint sets for connected components Connected components:
fg cehi
Process the edges:
(a,b) (f,g) (g,i) (d,e) (c,b) (a,c) (f,h) (h,g)
abd
Disjoint sets for connected components
Same-Component(u,v):
if FIND-SET(u) = FIND-SET(v) then
return True else
return False
Linked list implementation of Disjoint Sets
Implementing a single set nil
• The representative of the set = the first element in the list
• Other elements may appear in any order in the list
c
h
e
head tail
Implementing a single set nil
• A node contains pointers to: – The next element
– Its representative
• + each set has pointer to head and tail of its list
c
h
e
head tail
Implementing the data structure
che igf
head tail
nil head nil tail
adb
nil
head tail
• Collection of several sets, each a linked list
• How do we do FIND-SET(h)?
– Do we have to search through every list?
Implementing the data structure
nil
• In practice, we rename the elements to 1..n, and maintain an array A where A[i] points to the list element that represents i.
• Now, how do we do FIND-SET(3)?
3 (c)
1 (h)
A
1234
4 (e)
2 (i)
head tail
head tail
Implementing the data structure
nil
• Harder question: how about FIND-SET(e)?
– When you rename h->1, i->2, c->3, e->4 you store these
mappings in a dictionary D.
– Later, you can call D.get(e) to retrieve the value 4.
– So, you call FIND-SET(D(e)), which becomes FIND-SET(4).
3 (c)
1 (h)
A
1234
4 (e)
2 (i)
head tail
head tail
Naïve implementation of Union(u,v)
• Append v’s list onto the end of u’s list:
– Change u’s tail pointer to the tail of v’s list = %(1)
– Update representative pointers for all elements in the v’s list = %(|v’s list|)
• Can be a long time if |v’s list| is large! • In fact, n-1 Unions can take %(n2)
c
u’s list
v’s list
h
e
nil
nil
i
g
f
head tail
head tail
Weighted-union heuristic for Union(u,v)
• SimilartothenaïveUnionbutusesthefollowing rule/heuristic for joining lists:
• Appendthesmallerlistontothelongerone(and break ties arbitrarily)
• Does this help us do better than O(n2)?
• Worst-casetimeforasingleUnion(u,v)–NO
• Worst-casetimeforasequenceofnUnion operations – YES
Weighted-union running time analysis
• We will analyze the running times of disjoint- set data structures in terms of two parameters:
– n = the number UNION operations
– m = the number of FIND-SET operations
Weighted-union running time analysis
• Theorem:
– Suppose a disjoint set implemented using linked- lists and the weighted-union heuristic initially contains n singleton sets.
– Performing a sequence of n UNIONs and m FIND- SETs takes O(m + n lg n) time.
• Compare: for the naïve Union implementation, n UNIONs and m FIND-SETs takes O(m + n2) time.
Weighted-union running time analysis • Let’s prove the easy part first
• FIND-SET operations:
– each FIND-SET operations takes O(1) time – so m FIND-SET operations takes O(m) time
Weighted-union running time analysis • Now the harder part – UNION operations:
• What takes time in a UNION operation? – Update head and tail pointers, a single next
pointer, and a bunch of representative pointers. – Representative pointers take time.
– Everything else is O(1).
• How many times can an element’s representative pointer be updated?
Weighted-union running time analysis
• Fixanelementx.
• IfxisinasetSanditsrepresentativepointer changes, then S is being attached to another set with size at least |S|.
• Aftertheunion,x’ssetcontainsatleast2|S| elements.
– Initially, x’s set contains 1 element (itself).
– After x’s set is UNIONed once, it has size at least 2.
– After x’s set is UNIONed twice, it has size at least 4.
– After x’s set is UNIONed thrice, it has size at least 8. –…
– After x’s set is UNIONed k times, it has size at least 2k.
Weighted-union running time analysis
⇨ The total update time for all n elements is • After x’s representative pointer has been
⇨
O(n lg n) k updated k times the new set has at least 2
members
*Updating the head and tail pointers takes %(1) per
• Since the largest set has at most n members, operation, thus total time to update the pointers
over at most n UNION operations is we have 2k ≤ n
%
2k ≤ n
k ≤ ⌈lg n⌉
• ⇒ x’s representative is updated at most k = ⌈lg n⌉ times
%(n) apply log2
Weighted-union running time analysis
• Summary:
– m FIND-SET operations take O(m) – n UNION operations take O(n lg n)
⇒ The total time of n UNIONs and m FIND-SET operations is O(m + n log n)