Tutorial
1. Graph representations
(a) Adjacency lists:
A → B, C
B → A, C
C → A, B, D D→C
E→
Adjacency matrix:
ABCDE
A01100
B10100 C1 1 0 1 0
D00100 E00000
Sets of vertices and edges:
G = (V,E), where V = {A,B,C,D,E}, and
E = {{A,B},{A,C},{B,C},{C,D}} Degree is the number of edges connected to
a vertex. Node C has the highest degree (3).
(b)
Adjacency lists:
A → B, C B→A C→
D → A, C E→
Adjacency matrix:
ABCDE A0 1 1 0 0 B1 0 0 0 0
COMP20007 Design of Algorithms
Week 6 Workshop Solutions
2. Graph representations continued Note that in this question we assume that self-loops are not allowed.
Additionally, we could always check whether a graph is constant time if we know (in constant time) the number of edges. If the number of edges is exactly n = n(n − 1)/2 then the graph is complete,
2
otherwise it is not.
(a) Determining whether a graph is complete.
(i) In the adjacency list representation we just want to check that for each node u, every other node is in its adjacency list.
If we can check the size of the list in O(1) time then we just need to go through each vertex and check that the size of the list is n − 1, and thus this operation will be linear in the number of nodes: O(n).
If we have to do a scan through the adjacency list then this would take O(m), which for a complete graph will be asymptotically equivalent to O(n2).
(ii) In the adjacency matrix representation, a complete graph would contain 1’s in all position except for the diagonal (since we can not have self loops). So there will be n2 − n positions in the matrix to lookup, and thus this operation will take O(n2) time.
1
C0 0 0 0 0
D10100 E00000
Sets of vertices and edges:
G = (V,E), where V = {A,B,C,D,E}, and
E = {(A,B),(A,C),(B,A),(D,A),(D,C)}
For a directed graph, degree is separated into in-degree and out-degree: the number of edges going into or out of each vertex, re- spectively. Nodes A and C have the highest in-degree (2).
(iii) For the graph to be complete the set of edges must contain each pair of vertices. There will be n edges if the graph is indeed complete, so we could just check the size of the edge set
2
E. This would be a constant time operation: O(1).
However if we can’t just check the size of the set we’ll have to look through all edges which
will take O(m) time.
(b) Determining whether the graph has an isolated node.
We’ll have a think about how long it takes to determine if a single node is isolated, and then apply that to each vertex (i.e., n times)
(i) In the adjacency list representation, a node is isolated if its adjacency list is empty. This is O(1) to check per node, so O(n) for the whole graph.
(ii) In the adjacency matrix representation to check if a node is isolated we must look at each entry in that node’s row or column and confirm that all entries are 0. This is O(n) per node so O(n2) in total.
(iii) In the sets of vertices and edges representation we can loop through the set of edges and confirm that a vertex does not appear. This will take O(m) for a single node.
However we can also check a whole graph in a single pass by keeping track of all the nodes at once (in an array or a hash table for instance) and ticking them off as we see them, at the end we can iterate through this array/hash table to check if there were any isolated nodes. So this will take O(n + m) time.
3. Sparse and dense graphs (optional) Graphs with a constant number of edges per vertex (i.e., the degree of the vertices doesn’t grow with n) are sparse. Some examples of these are cycles and grid graphs:
Examples of dense graphs are complete graphs:
Real world examples of sparse graphs might arise from a graph representing the internet, and for dense graphs we could consider a network of cities, connected by all the possible aeroplane routes.
2
Storing sparse graphs using the various graph representations give rise to the following space com- plexities:
(i) To store an adjacency list we need to store one piece of data per edge. So the space complexity is O(m). Thus in a sparse graph the space complexity is O(n).
(ii) To store an adjacency matrix we need to store n2 pieces of information, regardless of m. So a sparse graph is still O(n2) space.
(iii) Sets of vertices and edges just require n items for the vertices and m items for the edges, so O(n + m). In a sparse graph this becomes O(n + n) = O(n).
4. Depth First Search and Breadth First Search
(i) Depth First: ABDEGFC The nodes visited are as follows. The node currently being visited is in red, the previously visited nodes are in blue and the nodes currently being considered are in solid black. Others are in gray.
ACF ACF ACF
BDBDBD
1. EG 4. EG 7. EG ACF ACF ACF
BDBDBD
2. EG 5. EG 8. EG
ACF ACF ACF BDBDBD
3. EG 6. EG 9. EG
(ii) Breadth First: ABCDFEG
ACF ACF ACF BDBDBD
1. EG 3. EG 5. EG ACF ACF ACF
BDBDBD
2. EG 4. EG 6. EG 3
7.
ACF BD
EG
5. Tree, Back, Forward and Cross Edges
looks like this:
For the directed version of this graph the DFS tree
AB
C
DE
The solid edges are tree edges. The dashed edge (A, D) is a forward edge as it is from A to one of its non-children descendants. The edge (E, A) is a back edge as it connects E to a non-parent ancestor. The remaining edges (i.e., (E, D), (B, A) and (B, C)) are all cross edges, as they connect vertices which are neither descendants nor ancestors of each other.
In the undirected version of this graph we get:1
A
B
C
D
E
We can see that only tree edges and back edges appear in the DFS forest for the undirected graph. In fact undirected graphs only have tree and back edges:
• Suppose we had a forward edge (x, y), i.e., y is a descendent of x. Since the graph is undirected, x is connected to y and visa versa. However we would have either visited y from x (making (x,y) a tree edge) or seen x while we’re visiting y before y is popped from the stack (making (y, x) a back edge). So (x, y) can not be a forward edge in an undirected graph.
• Suppose we have a cross edge (x, y), i.e., x is visited during some other part of the tree (i.e., y has already been visited and popped). This cannot arise in an undirected graph though, since
1Update: this has been updated to add a previously missing back edge from E to A 4
we would have visited x while we were visiting y since y connects to x and visa versa. Thus we can’t have a cross edge.
6. Finding Cycles First, looking at DFS – it turns out that an undirected graph is cyclic if and only if it contains a back edge. We change the exploration strategy to find back edges:
function Cyclic(⟨V, E⟩)
mark each node in V with 0 foreachvinV do
if v is marked 0 then
if DfsExplore(v) = T rue then
return True return False
function DfsExplore(v) mark v with 1
for each edge (v, w) do
if w is marked with 0 then if DfsExplore(w) then
For breadth first search however we get cycles when there exist cross edges. Similar alterations to the breadth first search algorithm could be made to check for cross edges.
Sometimes depth-first search finds a cycle faster, sometimes not. Below, on the left, is a case where depth-first search finds the cycle faster, before all the nodes have been visited. On the right is an example where breadth-first search finds the cycle faster.
AB AB
DC DC
7. 2-Colourability First, the (only possible, up to swapping colours) 2-colouring of this graph is: SA
EFB
DC
An undirected graph can be checked for two-colourability by performing a DFS traversal.
This begins by first assigning a colour of 0 (that is, no colour) to each vertex. Assume the two possible “colours” are 1 and 2.
return True return True
◃ a back edge was found
◃ w is v’s neighbour
◃ (v, w) already has w marked explored so is a back edge
else return False
5
Then traverse each vertex in the graph, colouring the vertex and then recursively colouring (via DFS) each neighbour with the opposite colour. If we encounter a vertex with the same colour as its sibling then a two-colouring is not possible.
In simpler terms, we’re just doing a DFS and assigning layers alternating colours: i.e., the first layer gets colour 1, then the second layer colour 2 etc. We know that we are not 2-colourable if we ever find a node which is adjacent to a node which has already been coloured the same colour.
function IsTwoColourable(G) let ⟨V,E⟩ = G
foreachvinV do
colour[v] ← 0
foreachvinV do
if colour[v] = 0 then
Dfs(v, 1) output True
function Dfs(v, currentColour)
colour[v] ← currentColour
for each node u in V adjacent to v do
if u is marked with currentColour then output False and exit
if u is marked with 0 then Dfs(u, 3 − currentColour)
As for 3-colourability and onwards, it turns out this is an NP-Complete problem, that is, it’s the hardest class of problem we know. In practice this means that we only have exponential time algorithms to compile such a property of a graph.
To understand why we can’t apply the same strategy for more than 2 colours we just need to think about the “choices” the algorithm needs to make at each step. For 2-colourability there is no choice, as a node has to be different to the node we’re coming from. In 3-colourability and onwards the algorithm would have to start trying mutliple different combinations – this gives rise to the need for exponential time algorithms.
8. Minimum Spanning Tree with Prim’s Algorithm Running Prim’s is almost the same as Dijkstra’s, making a greedy (locally optimal) decision at each time step and taking the lowest cost vertex from the priority queue. The main difference is we take the cost of the single edge which connects each new vertex to the tree, rather than a cumulative cost. Again, keeping track of the vertex we come from gives us an easy way to read off the minimum spanning tree from the table.
Node A0
B ∞ 2A 2A
C ∞ 4A 2D
D ∞ 1A
E ∞∞ 7D
F ∞ ∞ 8D
G ∞ ∞ 4D
2D
7D 7D 6G 6G 8D 5C 1G
4D 4D
Summing up the final entry in each row gives us the total cost of the minimum spanning tree: 16. To find the edges in the minimum spanning tree we reach the vertex from the end of each row (except for A), e.g., (B, A), (C, D), (D, A), (E, G), (F, G), (G, D).
6
A2B 1
C2D E 46
F1G
9. Single Source Shortest Path with Dijkstra’s Algorithm
A2B 4139 C2D7E 5846 F1G
The following table provides the values of the priority queue at each time step (each column corresponds to each time step). The vertices which have already been removed from the queue do not have entries in that column. The subscript for each distance in the table indicates the vertex from which the vertex in question was added to the priority queue from. This is useful for tracing back shortest paths.
First with E as the source:
Now with A as the source
So the shortest path from E to A is cost 8 and goes E → D → A. The shortest path from A to F is cost 6 and goes A → D → G → F.
Node
A ∞ ∞ ∞ 8D 8D
B ∞9E9E 9E
C ∞ ∞ ∞ 9D
D ∞ 7E 7E
E0
F ∞ ∞ 7G 7G G ∞ 6E
Node A0
B ∞ 2A 2A
C ∞ 4A
D ∞ 1A
E ∞∞
F ∞ ∞
G ∞ ∞
9E 9E
9D 9D 9D
3D 3D
8D 8D
9D 9D
5D 5D 5D
7
8D 8D 8D 8C 6G