COMP251: Topological Sort & Strongly Connected Components
Jérôme Waldispühl School of Computer Science
McGill University
Based on (Cormen et al., 2002) Based on slides from D. Plaisted (UNC)
Recap: Breadth-first Search • Input: Graph G = (V, E), either directed or undirected,
and source vertex s Î V. • Output:
– d[v] = distance (smallest # of edges, or shortest path) from s to v, for all v Î V. d[v] = ¥ if v is not reachable from s.
– p[v] = u such that (u, v) is last edge on shortest path s v. • u is v’s predecessor.
– Builds breadth-first tree with root s that contains all reachable vertices.
Recap: BFS Example
rs
tu
1023
21
23
vwxy
Recap: Depth-first Search
• Input:G=(V,E),directedorundirected.Nosourcevertex given.
• Output:
– 2timestampsoneachvertex.Integersbetween1and2|V|.
• d[v] = discovery time (v turns from white to gray) • f [v] = finishing time (v turns from gray to black)
– p[v] : predecessor of v = u, such that v was discovered during the scan of u’s adjacency list.
• UsesthesamecoloringschemeforverticesasBFS.
Recap: DFS Example
uv
w
1/8
F
2/7 9/12
BC
4/5 3/6
10/11
B
Starting time d(x)
xyz
Finishing time f(x)
Recap: Parenthesis Theorem
Theorem 1:
For all u, v, exactly one of the following holds:
1. d[u] < f [u] < d[v] < f [v] or d[v] < f [v] < d[u] < f [u] and neither u nor
v is a descendant of the other.
2. d[u] < d[v] < f [v] < f [u] and v is a descendant of u. 3. d[v] < d[u] < f [u] < f [v] and u is a descendant of v.
w So d[u] < d[v] < f [u] < f [v] cannot happen. w Like parentheses:
OK: ( { } ) [ ] Not OK: ( { ) } 123456 1234
Corollary
v is a proper descendant of u if and only if d[u] < d[v] < f [v] < f [u].
White-path Theorem
Theorem 2
v is a descendant of u if and only if at time d[u], there is a path u v consisting of only white vertices (Except for u, which was just colored gray).
Example (white-path theorem)
uv
1/
w
xyz
v, y, and x are descendants of u.
Forward edge
uv
1/8 2/7
BC
4/5 3/6
w
9/12
B
Cross edge
Tree edge
Edge classification with DFS
F
xyz
Back edge
10/11
Classification of Edges
• Treeedge:inthedepth-firstforest.Foundbyexploring
(u, v).
• Backedge:(u,v),whereuisadescendantofv(inthe
depth-first tree).
• Forwardedge:(u,v),wherevisadescendantofu,but not a tree edge.
• Crossedge:anyotheredge.Cangobetweenverticesin same depth-first tree or in different depth-first trees.
Theorem 3
In DFS of a connected undirected graph, we get only tree and back edges. No forward or cross edges.
Identification of Edges
• Edge type for edge (u, v) can be identified when it is first explored by DFS.
• Identification is based on the color of v. – White – tree edge.
– Gray – back edge.
– Black – forward or cross edge.
Directed Acyclic Graph
• DAG–Directedgraphwithnocycles.
• Goodformodelingprocessesandstructuresthat have a partial order:
– a>bandb>cÞa>c.
– Butmayhaveaandbsuchthatneithera>bnorb>a.
• Canalwaysmakeatotalorder(eithera>borb>a for all a 1 b) from a partial order.
Example
DAG of dependencies for putting on goalie equipment.
socks
shorts
T-shirt chest pad
sweater mask
batting glove
hose
pants skates
leg pads catch glove blocker
Characterizing a DAG
Proof:
• (Þ) Show that back edge Þ cycle.
– Suppose there is a back edge (u, v). Then v is ancestor
of u in depth-first forest.
– Therefore, there is a path v u, so v u v is a
Lemma 1
A directed graph G is acyclic iff a DFS of G yields no back edges.
cycle.
vTTTu B
Characterizing a DAG
Proof (Contd.):
• (Ü ) Show that a cycle implies a back edge.
– c : cycle in G; v : first vertex discovered in c;
(u, v) : preceding edge in c.
– At time d[v], vertices of c form a white path v u.
– By white-path theorem, u is a descendent of v in depth-first forest.
– Therefore, (u, v) is a back edge. vTTTu
Lemma 1
A directed graph G is acyclic iff a DFS of G yields no back edges.
B
Topological Sort
Want to “sort” a directed acyclic graph (DAG).
ABD CE
ABCDE Think of original DAG as a partial order.
Want a total order that extends this partial order.
Topological Sort
• Performed on a DAG.
• Linear ordering of the vertices of G such that if (u, v) Î E, then u appears somewhere before v.
Topological-Sort (G)
1. call DFS(G) to compute finishing times f [v] for all v Î V
2. as each vertex is finished, insert it onto the front of a linked list
3. return the linked list of vertices
Time: Q(V + E).
Example 1
AB
D
1/
C
Linked List:
E
Example 1
AB
C
Linked List:
D
1/ 2/
E
Example 1
AB
D
1/ 2/3
E
C
Linked List:
2/3
E
Example 1
AB
D
1/4 2/3
E
C
Linked List:
1/4 2/3
E
D
Example 1
AB
C
Linked List:
D
5/
2/3
E
1/4 2/3
E
1/4
D
Example 1
AB
C
Linked List:
1/4 2/3
E
D
5/
6/ 2/3
1/4
E
D
6/7
1/4 2/3
Example 1
AB
6/7
C
Linked List:
5/
D
1/4 2/3
E
CD
E
Example 1
AB
6/7
C
Linked List:
5/8
D
1/4 2/3
E
5/8
6/7
1/4 2/3
BCD
E
Example 1
AB
9/ 6/7
5/8
D
1/4 2/3
E
C
Linked List:
5/8
6/7
1/4 2/3
BCD
E
Example 1
AB
9/10 6/7
C
5/8
D
1/4 2/3
E
Linked List:
1/4 2/3
6/7
ABCD
9/10
5/8
E
25/26 15/24 socks shorts
16/23 hose
17/22 18/21
7/14 T-shirt
chest pad 9/12
sweater mask
catch glove 2/5 blocker
1/6 batting glove
26 socks
24 shorts
23 hose
22 pants
21 skates
20 leg pads 14 t-shirt
13 chest pad 12 sweater 11 mask
6 batting glove 5 catch glove 4 blocker
Example 2
8/13
pants skates
10/11
19/20 leg pads
3/4
Correctness (1)
“Linear ordering of the vertices of G such that if (u, v) Î E, then u
appears somewhere before v.”
⇒ We need to show if (u, v) Î E, then f [v] < f [u].
When we explore (u, v), what are the colors of u and v? Assume we just discovered u, which is thus gray. Then, what are the possible colors of v ?
– Can v be gray?
– Can v be white? – Can v be black?
Correctness (2)
When we explore (u, v), what are the colors of u and v? – Assume u is gray.
– Is v gray, too?
No, because then v would be ancestor of u.
Þ (u, v) is a back edge.
Þ contradiction of Lemma 1 (DAG has no back edges).
– Is v white?
• Then becomes descendant of u.
• By parenthesis theorem, d[u] < d[v] < f [v] < f [u].
– Is v black?
• Then v is already finished.
• Since we are exploring (u, v), we have not yet finished u. • Therefore, f [v] < f [u].
Strongly Connected Components • G is strongly connected if every pair (u, v) of
vertices in G is reachable from one another.
• A strongly connected component (SCC) of G is a maximal set of vertices C Í V such that for all u, v Î C, both u v and v u exist.
Component Graph
• GSCC = (VSCC, ESCC).
• VSCC has one vertex for each SCC in G.
• ESCC has an edge if there is an edge between the corresponding SCC’s in G.
Example:
GSCC is a DAG
Lemma 2
LetCandC¢ bedistinctSCC’sinG,letu,vÎC&u¢,v¢ÎC¢,and suppose there is a path u u¢ in G. Then there cannot also be a path v¢ v in G.
Proof:
• Suppose there is a path v¢ v in G.
• Thentherearepathsu u¢ v¢andv¢ v uinG.
• Therefore,uandv¢arereachablefromeachother,so they are not in separate SCC’s.
Transpose of a Directed Graph
• GT = transpose of directed G. –GT =(V,ET),ET ={(u,v):(v,u)ÎE}. – GT is G with all edges reversed.
• Can create GT in Θ(V + E) time if using adjacency lists.
• G and GT have the same SCC’s. (u and v are reachable from each other in G if and only if reachable from each other in GT.)
Algorithm to determine SCCs
SCC(G)
1. call DFS(G) to compute finishing times f [u] for all u
2. compute GT
3. call DFS(GT), but in the main loop, consider vertices in order
of decreasing f [u] (as computed in first DFS)
4. output the vertices in each tree of the depth-first forest formed in second DFS as a separate SCC
Time: Q(V + E).
G
Example
abcd
ef
g
h
G
Example
abcd
13/14
11/16 1/10
8/9
12/15
e
3/4 2/7
5/6
h
f
g
After the first DFS. We computed all finishing times in G.
GT
13/14
11/16 1/10
8/9
Example
abcd
12/15 3/4 2/7 efgh
5/6
Then, we compute the transpose GT of G and sort the vertices with the finishing time calculated in G.
GT
Example
abcd
efg
h
(b (a (e e) a) b) (c (d d) c) (g (f f) g) (h)
• Idea:
How does it work?
– By considering vertices in second DFS in decreasing order of finishing times from first DFS, we are visiting vertices of the component graph in topologically sorted order.
– Because we are running DFS on GT, we will not be visiting any v from a u, where v and u are in different components.
• Notation:
– d[u] and f [u] always refer to first DFS.
– Extend notation for d and f to sets of vertices U Í V: – d(U) = minuÎU{d[u]} (earliest discovery time)
– f (U) = maxuÎU{ f [u]} (latest finishing time)
SCCs and DFS finishing times
Lemma 3
Let C and C¢ be distinct SCC’s in G = (V, E). Suppose there is an edge (u, v) Î E such that u Î C and v ÎC¢. Then f (C) > f (C¢).
Proof:
• Case 1: d(C) < d(C¢)
– Letxbethefirstvertexdiscovered
in C.
C
u x
C¢
v
– Attimed[x],allverticesinCandC¢ are white. Thus, there exist paths of white vertices from x to all vertices in C and C¢.
– Bythewhite-paththeorem,all vertices in C and C¢ are descendants of x in depth-first tree.
– Bytheparenthesistheorem, f [x] = f (C) > f(C¢).
SCCs and DFS finishing times
Lemma 3
Let C and C¢ be distinct SCC’s in G = (V, E). Suppose there is an edge (u, v) Î E such that u Î C and v ÎC¢. Then f (C) > f (C¢).
Proof:
• Case 2: d(C) > d(C¢)
– Let y be the first vertex discovered in C¢.
– At d[y], all vertices in C¢ are white and there is a white path from y to each vertex in C¢ Þ all vertices in C¢ become descendants of y. Again, f [y] = f (C¢).
C
C¢
– At d[y], all vertices in C are also white.
– Bylemma2,sincethereisanedge(u,v),we
u
v
cannot have a path from C¢ to C.
x
– So no vertex in C is reachable from y.
– Therefore, at time f [y], all vertices in C are still white.
– Therefore, for all w Î C, f [w] > f [y], which implies that f (C) > f (C¢).
y
SCCs and DFS finishing times
Proof:
• (u,v)ÎETÞ(v,u)ÎE.
• Since SCC’s of G and GT are the same, f(C¢) > f (C), by Lemma 3.
Corollary 1
Let C and C¢ be distinct SCC’s in G = (V, E). Suppose there is an edge (u, v) Î ET, where u Î C and v Î C¢. Then f(C) < f(C¢).
Correctness of SCC
1) At beginning, DFS visit only vertices in the first SCC
• WhenwedothesecondDFS,onGT,startwithSCCC
such that f(C) is maximum.
• ThesecondDFSstartsfromsomexÎC,anditvisits
all vertices in C.
• Corollary1saysthatsincef(C)>f(C¢)forallC1C¢,
there are no edges from C to C¢ in GT.
• Therefore,DFSwillvisitonlyverticesinC.
• Whichmeansthatthedepth-firsttreerootedatx contains exactly the vertices of C.
Correctness of SCC
2) DFS doesn’t visit more than one new SCC at the time
• ThenextrootinthesecondDFSisinSCCC¢suchthat f (C¢) is maximum over all SCC’s other than C.
– DFS visits all vertices in C¢, but the only edges out of C¢ go to C, which we have already visited.
– Therefore, the only tree edges will be to vertices in C¢.
• Iteratetheprocess.
• Eachtimewechoosearoot,itcanreachonly:
– vertices in its SCC—get tree edges to these,
– vertices in SCC’s already visited in second DFS—get no tree edges to these.