1. For a connected graph G with distinct weights, initialize an edgeless forest, F , as V
connected components; one per vertex of G. Here are two ideas for transforming F
into a MST of G. Do they work? Why?
(a) arbitrarily choose two components C1, C2 of F that have at least one edge connect-
ing them in G, and add the lightest edge between C1 and C2 to F . Merge C1, C2, and
repeat until F is reduced to one component.
(b) simultaneously (in parallel) let every component select the lightest edge that con-
nects it to the set of vertices not in that component. If an edge is selected twice, just
count it once. After adding all such selected edges to F and merging components in
the standard way, repeat until F is reduced to one component.
See CLRS 23-4 for other similar proposed MST ideas.
Answer:
(a) doesn’t work. We might arbitrarily merge two components (vertices) that share
the heaviest edge, e, of G. If there is any other path between those two vertices, then
e is a bad choice. Be careful not to apply the Cut Lemma on arbitrary components.
It only works if you partition all vertices into two groups.
(b) works, and is called “Boruvka’s algorithm”, which you can look up online. It
preceded Kruskal’s algorithm but really works on the same principle ideas. We start
with just the vertices, and for any component we know (by the Cut Lemma) that we
can add the lightest edge connecting to the rest of the graph. So, essentially all I
was asking here is an explanation of the intuition behind Kruskal’s algorithm, with a
minor twist. In fact the intuition behind Prim’s algorithm is the same. All that really
changes is the order in which we add edges and grow components.
Note that if you allow non-distinct weights, then components could form a cycle by this
greedy strategy. For instance take three vertices forming a triangle, with all weights
equal. Without some way to tie-break, all three edges could be chosen in the first
iteration.
2. (a) Given a connected graph G with distinct weights, explain how to construct a
spanning tree T with maximum sum of weights. Mention the time complexity and
briefly explain why the algorithm is correct. If you rely on concepts covered in class,
you don’t need to prove their properties from scratch, instead just refer to them.
(b) Suppose that you have a road network, modeled as a graph, G. Every road (edge)
has a weight, that corresponds to the maximum weight that a truck may carry. You
have a fleet of trucks, and each one must make deliveries between a specific pair of
locations (vertices). You want to maximize the amount of weight that each truck
carries. Show that each truck should move along the maximum weight spanning tree,
T , of G.
In other words, show that for every two vertices x, y in G the following holds:
Every path from x to y in G contains an edge with a weight that is no greater than
the smallest weight in the path from x to y in T . That is, T maximizes the “weakest
connecting link”, for every pair of vertices.
Answer:
(a) You can run Kruskal but process weights in reverse order, or instead change the
sign of every weight and run Kruskal. The proof is identical to the regular one, based
on cuts. The time complexity is of course just that of Kruskal’s algorithm.
(b) Consider your favorite x and y in G. There is a unique path p between x and y
in T . Let e be the edge with minimum weight, w, among the edges in p. So, e is the
weakest link on p. Make a cut in G to partition its vertices according to which side of
e they are on in T . In other words this cut goes through e, but no other edge of T . It
may help to visualize e as horizontal, with a vertical cut going through it, a subtree of
T containing x on the left, and a subtree of T containing y on the right. Given that T
is a maximum spanning tree, any edge of G that crosses our cut must have weight less
than e, otherwise we would swap and get a tree with more weight.
Now suppose that there is some other path p′ from x to y in G, that has a minimum
edge weight w′ > w. In other words, suppose that there is an edge e′ that is the weakest
link on p′, and that e′ is not as weak as e. Notice that p′ and p need not be disjoint
(they may share some edges).
Clearly, e can’t be on p′, because that would contradict our hypothesis that e′ has
the minimum weight (w′) on p′. Thus to get from x to y, path p′ will have to cross
the cut that we made, without using e. So, suppose that p′ crosses the cut at edge
e′′, with weight w′′. We have already established that any edge crossing the cut must
have a weight less than w, thus w′′ < w. But this contradicts the assumption that the
minimum weight on p′ is greater than w.
3. Let G be a graph where all edge weights are integers in the range (1, . . . , k), for some
k = O(1). Show how fast can you compute the MST of G, by modifying Kruskal’s
algorithm and then by modifying Prim’s algorithm, without using any advanced data
structures.
Answer:
We can use Kruskal’s algorithm with more efficient sorting, given this context. Use
counting sort on the edge weights, in time O(E + k) = O(E). After that, the regular
analysis of the algorithm shows that the remaining work is O(V log V ). So all together
we get O(E + V log V ).
But it is better to use Prim’s algorithm and realize that all vertices will have scores
in the same range as the edges. So extracting a vertex of minimum score costs
O(k) = O(1) time, if our secondary data structure is a sorted array of linked lists,
holding the vertices. The array has size k. The max total size of the linked lists is
V . Decreasing the score of a vertex just means moving it from one list to another, so
this also takes constant time. Recall that we access the vertex directly using a pointer
from the primary structure of G, so we don’t have to search a linked list of vertices
that have equal scores. The overall result is that Prim’s algorithm will take O(V +E)
time.
4. Let V be a set of cities. For every pair of cities in V , there is a direct flight. You
start at your home city and want to visit every other city in V precisely once before
returning home. Among all routes that do this, there is some route C that uses the
shortest total flight time, T . Assume that the earth is flat (only for this problem) and
that flight time is proportional to distance covered.
(a) Design a route that may visit cities multiple times, can be computed in polynomial
time, and results in a total flight time of at most 2T . Prove that your route has this
property.
(b) Do the same as for part (a), but now you must visit each city precisely once.
Finally, if you have an algorithm that can find C in polynomial time for any set V , you
get an A in this course, and a million dollars if you sell me your idea (honest offer).
Answer:
Let G be a complete graph on V . Every pair of cities is linked with an edge, weighted by
distance. Finding route C is the Euclidean traveling salesman problem. It is NP-hard,
which among other things implies that there is no known polynomial-time algorithm
capable of solving all possible instances of this problem. There is a million dollar
reward for finding such a solution, plus other riches that would dwarf that amount.
(a) C is a cycle that spans V . By removing any of its edges we get a spanning path of
G, which also happens to qualify as a spanning tree, and is thus at least as costly as a
MST on G. Thus the MST will have a total weight that is at most T , and of course it
can be constructed in polynomial time. The only problem is, the MST is not a route.
So, traverse the MST (in-order) from your home city, tracing every edge twice (once
forwards and once by backtracking). This allows you to visit every city, and return
home, with total cost at most 2T . However, you will visit some cities more than once.
(b) We can still start with the MST, and then modify it. Traverse as in (a), but when
you reach a leaf in the MST, instead of backtracking, jump straight to the next city
that would have been visited for the first time. This direct jump costs less than any
backtracking path, because you are traveling using the shortest distance (by triangle
inequality). Continue the traversal. Every time you find a leaf, hop to the next un-
visited vertex instead of backtracking. This way we never visit a vertex more than
once, and the cost will be less than that of the pure in-order traversal of the MST,
i.e., less than 2T . The time complexity is linear for the traversal. Kruskal’s algorithm,
or Prim’s via priority queue gives O(E log V ) = O(V 2 log V ) time to construct the
MST. However, answering O(V 2) via Prim’s algorithm for dense graphs is better here,
because we are computing the MST on the complete graph.
FYI, for a complete graph of Euclidean distances, it is possible to compute the MST
in Θ(V log V ) time. See
http://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree
5. Let G be a graph for which all edge weights are greater than 1. For two vertices x, y
in G, we want to find a path from x to y that minimizes the product of edge-weights.
Show how to do this.
Answer: Just repeat the Dijkstra proof (using a cut), for a slightly modified algorithm
where we use a weight of 1 for the source, and multiply instead of adding when we relax.
The proof relies on weights being greater than 1, to guarantee that each greedy choice
that crosses a cut is optimal. Involving weights that are less than 1 in a multiplication
has the effect of reducing scores, just like involving negative weights when adding.
Another neat trick is to construct a copy G′ of the given graph G, where each edge
in G′ has a weight that is the logarithm of the corresponding weight in G. Because
each weight in G is greater than 1, its corresponding weight in G′ will be positive.
Now consider two paths, p, q in G, such that the product of weights,
∏
pi on p is
larger than the product of weights
∏
qj on q (i.e.,
∏
pi >
∏
qj). Then we also have
log(
∏
pi) > log(
∏
qj). By a standard log rule we get
∑
log pi >
∑
log qj. Thus the
sum of the weights on the corresponding path p′ in G′ is larger than that on q′. So
we just need to find the (standard) single-source shortest (weighted) path in G′, to
identify the answer in G. Because G′ has only positive weights, we can use Dijkstra.
6. An infectious disease starts at a node x in a weighted graph. The time it takes to move
along an edge is equal to the weight of the edge. Whenever it gets to an unprotected
node, that node gets infected and the disease keeps moving in parallel along all incident
edges. In the beginning, all nodes are unprotected. However, at some node y in the
graph, an antidote is released, at the same time as the disease. It moves along the
graph just as the disease does, and if it reaches an uninfected node, it protects that
node forever.
At any node that is reached simultaneously, the disease and the antidote neutralize
(forever), and neither gets to keep traveling along incident edges. However, the node
is left partially damaged.
Show how to determine whether more nodes will be (fully) infected or protected.
Answer: By context, the weights are positive. Recall that Dijkstra’s algorithm ini-
tializes scores of nodes to be infinity, except for the source which is zero. Then it
repeatedly extracts the vertex v with the minimum score, adds it to the SSSP tree,
and updates scores of neighbors of v by relaxing edges.
Consider running Dijkstra in parallel from x and y. Each version will have a current
minimum score. We always continue processing the version with the minimum score. If
one score is strictly smaller than the other, we get a clearly infected or protected node
accordingly, as long as it hasn’t already been marked. In this case, we mark the vertex
and update the structure that just got to that vertex, as in regular Dijkstra. Otherwise
(for instance when the disease finds a protected vertex) we extract but skip the relaxing
step. In other words, every vertex will get extracted once from each data structure,
but only the first of those two extractions will be followed by relaxing. Finally, both
structures might have the same min score. That might be because the same node is
reached simultaneously, or it could just be a coincidence. If it’s the same node, we
extract but do not add to either SSSP tree, and skip relaxing incident edges. If the
tying nodes are different, and assuming we are using a min-heap, we need to consider
that the root of each heap might have the same score as some other nodes below it. In
other words we can’t just infect the node at the top of one heap and protect the node at
the top of the other heap. Their corresponding clones might still have the same score
but be located further down. So we extract all nodes with the min score and sort them
by label, then remove duplicates and classify all remaining as infected or protected.
We can use linear-time sorting because the labels are easily indexed. Thus the time
complexity is still logarithmic per node and all together we get the time complexity of
Dijkstra, O(E log V ). Note that I arbitrarily chose to describe this using an adjacency
list representation.
A simpler variant is to use one heap and run Dijkstra once. We initialize x and y as
zero, and give each node a pair of values, representing its distance from x and y respec-
tively. We use the heap based on the minimum of the two values at each node. When
a node reaches the root, if its values are distinct, we label it as infected or protected
accordingly, and then decrease only one of the two values for each relaxed neighbor. If
both values of the root are equal, we delete it without updating neighbors.
7. Prove or disprove that Dijkstra’s algorithm can be modified to work on a graph with
negative edge weights, if we first scan all edges, find the minimum weight w, and add
|w| to the weight of every edge.
Answer: The problem here is that any path is “penalized” by w times the path length.
But Dijkstra and SSSP have nothing to do with path lengths. Even on graphs with
positive weights, we simply can’t add a value to every edge and expect the solution to
remain unchanged. Suppose the source has two paths to a particular target; one using
two edges of weight w, and the other with just one edge of weight 2w+1. The shortest
path is 2w, but if we add 2 to every edge then the shortest path is the single edge of
weight 2w+3 (instead of 2w+4).
8. Let G = {V,E} be a weighted directed graph with no negative cycles. However, G
contains precisely one negative edge e, from vertex x to vertex y.
Given a source, s, show how to solve SSSP in O(E log V ) time.
Answer: The least-cost path p from s to any particular vertex v will either use e or
not. We don’t know which case holds in advance, so we compare both scenarios. In
the latter case, we would get p by running Dijkstra on the graph G−e, which is just
G with e removed. In the former case we would get p by somehow forcing the use of
e. This can be done by computing the least-cost path from s to x in G−e, adding the
weight of e, and then using y as a source to find the least-cost path to v in G−e.
Using Dijkstra, we get all of this information not only for an arbitrary v, but for all
vertices. Then for every vertex, we compare the outcome of each of the two trials, and
select the minimum.
Note that it’s also valid to run the second scenario on G instead of G−e, because the
least-cost path from s to x, as well as the least-cost path from y to v, cannot use e.
That is because there are no negative cycles.