CSCI 570 – Summer 2020 – HW 2
Deadline: July 17th
1 Graded Problems
1. Design a data structure that has the following properties (assume n elements in the data structure,
and that the data structure properties need to be preserved at the end of each operation):
• Find median takes O(1) time
• Extract-Median takes O(log n) time
• Insert takes O(log n) time
• Delete takes O(log n) time
Do the following:
(a) Describe how your data structure will work
(b) Give algorithms that implement the Extract-Median() and Insert() functions.
Hint : Read this only if you really need to. Your Data Structure should use a min-heap and a max-heap
simultaneously where half of the elements are in the max-heap and the other half are in the min-heap.
Solution: We use the dn/2e smallest elements to build a max-heap and use the remaining bn/2c
elements to build a min-heap. The median will be at the root of the max-heap and hence accessible
in time O(1) (we assume the case of even n, median is (n/2)th element when elements are sorted in
increasing order).
Insert() algorithm: For a new element x,
(a) Compare x to the current median (root of max-heap).
(b) If x < median, we insert x into the max-heap. Otherwise, we insert x into the min-heap. This
takes O(log n) time in the worst case.
(c) If size(maxHeap) > size(minHeap)+1, then we call Extract-Max() on max-heap and insert the
extracted value into the min-heap. This takes O(log n) time in the worst case.
(d) Also, if size(minHeap) >size(maxHeap), we call Extract-Min() on min-heap and insert the ex-
tracted value in to the max-heap. This takes O(log n) time in the worst case.
Extract-Median() algorithm: Run ExtractMax() on max-heap. If after extraction, size(maxHeap) <
size(minHeap) then execute Extract-Min() on the min-heap and insert the extracted value into the
max-heap. Again worst case time is O(log n).
2. There is a stream of integers that comes continuously to a small server. The job of the server is to
keep track of k largest numbers that it has seen so far. The server has the following restrictions:
1
(a) It can process only one number from the stream at a time, which means it takes a number from
the stream, processes it, finishes with that number and takes the next number from the stream.
It cannot take more than one number from the stream at a time due to memory restriction.
(b) It has enough memory to store up to k integers in a simple data structure (e.g. an array), and
some extra memory for computation (like comparison, etc.).
(c) The time complexity for processing one number must be better than Θ(k). Anything that is Θ(k)
or worse is not acceptable.
Design an algorithm on the server to perform its job with the requirements listed above.
Solution: Use a binary min-heap on the server.
(a) Do not wait until k numbers have arrived at the server to build the heap, otherwise you would
incur a time complexity of O(k). Instead, build the heap on-the-fly, i.e. as soon as a number
arrives, if the heap is not full, insert the number into the heap and execute Heapify(). The first
k numbers are obviously the k largest numbers that the server has seen.
(b) When a new number x arrives and the heap is full, compare x to the minimum number r in the
heap located at the root, which can be done in O(1) time. If x ≤ r, ignore x. Otherwise, run
Extract-min() and insert the new number x into the heap and call Heapify() to maintain the
structure of the heap.
(c) Both Extract-min() and Heapify() can be done in O(log k) time. Hence, the overall complexity is
O(log k).
3. When we have two sorted lists of numbers in non-descending order, and we need to merge them into
one sorted list, we can simply compare the first two elements of the lists, extract the smaller one and
attach it to the end of the new list, and repeat until one of the two original lists become empty, then
we attach the remaining numbers to the end of the new list and it’s done. This takes linear time. Now,
try to give an algorithm using O(n log k) time to merge k sorted lists (you can also assume that they
contain numbers in non-descending order) into one sorted list, where n is the total number of elements
in all the input lists. (Hint: Use a min-heap for k-way merging.)
Solt. Construct a min-heap of the k sublists, that is each sublist is a node in the constructed min-heap.
When comparing two sublists, compare their first elements (that is their minimum elements). The
creation of this min-heap will cost O(k) time.
Then we run the extract min algorithm and extract the minimum element from the root list. Then
update the root sublist and heapify the min-heap according to the new minimum element in the root
sublist. (If the root sublist becomes empty during this step, take any leaf sublist and make it the root
and then heapify). Each extraction takes O(log k) time.
Since we extract n elements in total, the running time is O(n log k + k) = O(n log k + k) (since
k < n).
4. Suppose you are given two sets A and B, each containing n positive integers. You can choose to reorder
each set however you like. After reordering, let ai be the i-th element of set A, and let bi be the i-th
element of set B. You then receive a payoff on
∏n
i=1 ai
bi . Give an algorithm that will maximize your
payoff. Prove that your algorithm maximizes the payoff,and state its running time.
Solt. Sort both A and B in the same order (either both ascending or both descending). This takes
O(nlogn). What’s important is that the largest element of A is matched with the largest of B, the
second-largest of each are matched, and so on (greedy solution).
For purposes of the proof, fix the order of A to be in ascending order, i.e., a1 ≤ a2 ≤ · · · ≤ an;
we will consider all solutions in terms of how their B arranged relative to this A. My solution has B
Page 2
sorted ascendingly.
For any arbitrary solution (could be an optimal solution): {bi}, where bi is the element of B matched
with ai in this solution. Suppose there exist an inversion bi > bj , for i > j. Now we prove that undoing
this inversion will make the result no less than the original one, and has the effect of multiplying the
original solution’s quantity by this value:
ai
bj × ajbi
aibi × ajbj
=
aj
bi−bj
ai
bi−bj
Because
aj
ai
≥ 1 and bi−bj > 0, the result above should be no less than 1, i.e., every inversion relative to
this solution can be removed without negatively affecting the quantity, which indicates the optimality
of the greedy solution.
5. Suppose you were to drive from USC to Santa Monica along I-10. Your gas tank, when full, holds
enough gas to go p miles, and you have a map that contains the information on the distances between
gas stations along the route. Let d1 < d2 < ... < dn be the locations of all the gas stations along the
route where di is the distance from USC to the gas station. We assume that the distance between
neighboring gas stations is at most p miles. Your goal is to make as few gas stops as possible along
the way. Give the most efficient algorithm to determine at which gas stations you should stop and
prove that your strategy yields an optimal solution. Give the time complexity of your algorithm as a
function of n.
Solution: The greedy strategy we adopt is to go as far as possible before stopping for gas. That is
when you are at the ith gas station, if you have enough gas to go the i+ 1th gas station, then skip the
ith gas station. Otherwise stop at the ith station and fill up the tank.
Let {g1, g2, . . . , gm} be the set of gas stations at which our algorithm made us refuel. We next prove
that our choice is optimal.
Let {h1, h2, . . . , hk} be an optimal solution.
Since it is not possible to get to the g1 + 1
th gas station without stopping, any solution should stop at
either g1 or a gas station before g1, thus h1 ≤ g1. If h1 < g1, then we can swap its first stop with g1
without changing the number of stops. The new solution we obtain {g1, h2, . . . , hk} is legal since when
leaving g1 we have at least as much fuel now as we had before swapping. Hence {g1, h2, . . . , hk} is an
optimal solution.
Assume that {g1, g2, . . . , gc−1, hc, . . . , hk} is an optimal solution (induction hypothesis). From the
greedy strategy taken by our algorithm, hc ≤ gc. If hc < gc, then by swapping gc and hc, we get
{g1, g2, . . . , gc−1, gc, hc+1, . . . , hk} which is indeed a legal solution. The legality follows from the same
reasoning as above. That is, when leaving gc we now have at least as much fuel as we did before swap-
ping and can hence reach the destination. Thus {g1, g2, . . . , gc, hc+1, . . . , hk} is an optimal solution.
By induction, it thus follows that {g1, g2, . . . , gm} is an optimal solution.
The running time is O(n) since we at most make one computation/decision at each gas station.
6. Consider the following modification to Dijkstra’s algorithm for single source shortest paths to make it
applicable to directed graphs with negative edge lengths. If the minimum edge length in the graph is
−w < 0, then add w + 1 to each edge length thereby making all the edge lengths positive. Now apply
Dijkstra’s algorithm starting from the source s and output the shortest paths to every other vertex.
Does this modification work? Either prove that it correctly finds the shortest path starting from s to
every vertex or give a counter example where it fails.
Page 3
Solution 1: No, the modification does not work. Consider the following directed graph with edge set
{(a, b), (b, c), (a, c)} all of whose edge weights are −1. The shortest path from a to c is {(a, b), (b, c)}
with path cost −2. In this case, w = 1 and if w + 1 = 2 is added to every edge length before running
Dijkstra’s algorithm starting from a, then the algorithm would output (a, c) as the shortest path from
a to c which is incorrect.
Solution 2: Another way to reason why the modification does not work is that if there were a directed
cycle in the graph (reachable from the chosen source) whose total weight is negative, then the shortest
path is not well defined since you could traverse the negative cycle multiple times and make the length
arbitrarily small. Under the modification, all edge lengths are positive and hence clearly the shortest
paths in the original graph are not preserved.
7. Solve Kleinberg and Tardos, Chapter 4, Exercise 3.
Solution: Assume the greedy algorithm currently in use fits boxes b1, b2, . . . , bj into the first k trucks.
We prove that no other algorithm can fit more boxes into k trucks, i.e., if an algorithm fits boxes
b1, b2, . . . , bi into the first k trucks, then i ≤ j. We prove this claim by induction on k:
• For k = 1: the greedy fits as many boxes into one truck as possible, it is clear that no other
algorithm can fix more boxes into the truck, thus, i ≤ j.
• Assume it holds for k − 1, i.e., if the greedy algorithm fits boxes b1, b2, . . . , bj into the first k − 1
trucks, and the other algorithm fits boxes b1, b2, . . . , bi into the first k − 1 trucks, then i ≤ j.
• For k: the greedy algorithm fits j′ boxes into the first k−1 trucks, the other algorithm fits i′ boxes
into the first k− 1 trucks, and i′ ≤ j′; now, for the k-th truck, the other algorithm packs in boxes
bi′+1, . . . , bi; since i
′ ≤ j′, the greedy algorithm is able at least to fit all the boxes bj′+1, . . . , bi
into the k-th truck, and it may be able to fit more
8. Solve Kleinberg and Tardos, Chapter 4, Exercise 5.
Solution:
(a) Sort the list of houses from west to east.
(b) Place a base station exactly 4 miles east of the first house in the list. Remove all the houses in
the list that are covered by the base station and then repeat step 2.
The algorithm terminates and every house is covered since we remove houses only after they are cov-
ered and at each iteration of step 2 at least one house is covered. The running time of the algorithm
is dominated by the sorting in step 1 and is hence O(n log n).
Let {s1, s2, . . . , sk} denote the list of base station locations as determined by our algorithm and let
{t1, t2, . . . , tm} denote the choice of locations made by an optimal solution. Assume that both the lists
are sorted in order from west to east.
We claim that for all i ∈ {1, 2, . . . ,m}, either ti = si or ti is to the west of si.
This is true when i = 1 for otherwise the western most house is not covered. Assume that the
claim is true for i = c− 1 (induction hypothesis).
All the houses to the west of sc−1 are covered by {s1, s2, . . . , sc−1} by the construction of our al-
gorithm. By the induction hypothesis, tc−1 is either sc−1 or tc−1 is further west of sc−1. Thus the first
house eastward of sc−1 is not covered by {t1, t2, . . . , tc−1}. This implies that either tc = sc or tc is to
the west of sc for otherwise the first house eastward of sc−1 is not covered by the optimal solution.
Our claim thus follows by induction.
Page 4
The claim implies in particular that the number of base station in our solution to the west of any
point is at most as many as the number of base stations in the optimal solution to the west of that
point. Thus k ≤ m and our solution is optimal.
2 Practice Problems
1. The police department in a city has made all streets one-way. The mayor contends that there is still
a way to drive legally from any intersection in the city to any other intersection, but the opposition is
not convinced. A computer program is needed to determine whether the mayor is right. However, the
city elections are coming up soon, and there is just enough a time to run a linear-time algorithm.
a Formulate this as a graph problem and design a linear-time algorithm. Explain why it can be
solved in linear time.
b Suppose it now turns out that the mayors original claim is false. She next makes the following
claim to supporters gathered in the Town Hall: ”If you start driving from the Town Hall (located
at an intersection), navigating one-way streets, then no matter where you reach, there is always a
way to drive legally back to the Town Hall.” Formulate this claim as a graph problem, and show
how it can also be varied in linear time
Solt.
(a) The mayor is merely contending that the graph of the city is a strongly connected graph, denoted
as G. Form a graph of the city (intersections become nodes, one-way streets become directed
edges). Suppose there are n nodes and m edges. The algorithm is:
i. Choose an arbitrary node s in G, and run BFS from s. If all the nodes are reached the BFS
tree rooted at s, then go to step ii, otherwise, the mayor’s statement must be wrong. This
operation takes time O(m+ n).
ii. Reverse the direction of all the edges to get the graph Ginv, this step takes time O(m+ n).
iii. Do BFS in Ginv starting from s. If all the nodes are reached, then the mayor’s statement is
correct; otherwise, the mayor’s statement is wrong. This step takes time O(m+ n).
Explanation: BFS on G tests if s can reach all other nodes; BFS on Ginv tests if all other nodes
reach s. Suppose you can reach node v by BFS on Ginv. It means there is a path from s to v in
the Ginv. Now by reversing all edges of sv path, you will get a path from v to s in the G. There
If these two conditions are not satisfied, the mayor’s statement is wrong obviously; if these two
conditions are satisfied, any node v can reach any node u by going through s.
(b) Now the mayor is contending that the intersections which are reachable from the city form a
strongly-connected component. Run the first step of the previous algorithm while setting the
Town Hall as s (test to see which nodes are reachable from Town Hall). Remove any nodes
which are unreachable from the town hall, and this is the component which the mayor is claiming
is strongly connected. Run the previous algorithm on this component to verify it is strongly
connected.
2. You are given a weighted directed graph G = (V,E,w) and the shortest path distances δ(s, u) from a
source vertex s to every other vertex in G. However, you are not given π(u) (the predecessor pointers).
With this information, give an algorithm to find a shortest path from s to a given vertex t in O(|V |+|E|)
Solution1:
(a) Starting from node t, go through the incoming edges to t one at a time to check if the condition
δ(s, t) = w(u, t) + δ(s, u) is satisfied for some (u, t) ∈ E.
(b) There must be at least one node u satisfying the above condition, and this node u must be a
predecessor of node t.
Page 5
(c) Repeat this check recursively, i.e. repeat the first step assuming node u was the destination
instead of node t to get the predecessor to node u, until node s is reached.
Complexity Analysis: Since each directed edge is checked at most once, the complexity is O(|V |+ |E|).
Note that constructing the adjacent list of Grev in order to facilitate access to the incoming edges is
needed, which is still bounded by O(|V |+ |E|) complexity.
Solution 2:Run a modified version of Dijkstra’s algorithm without using the priority queue. Specifi-
cally,
(a) Set S = {s}.
(b) While t /∈ S
i. For each node u satisfying u /∈ S, (v, u) ∈ E and v ∈ S, check if the condition δ(s, u) =
w(v, u) + δ(s, v) is true.
ii. If true, add u to S, and v is the predecessor of u.
iii. EndWhile
Complexity Analysis: Since each directed edge is checked at most once, the complexity is O(|V |+
|E|).
3. Solve Kleinberg and Tardos, Chapter 4, Exercise 4.
Solution: Let the two input sequences be S = (s1, s2, . . . , sn) and S
′ = (r1, r2, . . . , rm).
We propose the following greedy algorithm. Find the first event in S that is the same as r1. If
you cannot find such an event, output ”no” and terminate. Say si1 is the first event in S that is
the same as r1. Remove the first i1 events from S, that is S = (si1+1, ss1+2, . . . , sn). In the second
iteration, find the first event in S that is the same as r2. If you cannot find such an event, output ”no”
and terminate. Say si2 is the first event in S that is the same as s2. Set S = (si2+1, si2+2, . . . , sn) and
so on. If the algorithm runs successfully for m, iterations then output ”yes”.
Clearly, if the algorithm runs successfully for m iterations, then S′ is indeed a substring of S (since
(si1 , si2 , . . . , sim) = S
′) and thus and our algorithm is correct.
The harder direction is to prove that if S′ is a substring of S, then our algorithm indeed outputs
”yes”. We prove the following slightly stronger claim.
Claim: If (sk1 , sk2 , . . . , skm) = S
′, then sij = skj and ij ≤ kj for all j ∈ {1, 2, . . . ,m}
We prove the claim by induction on j. The case j = 1 follows from the first step of our algorithm.
Assume (induction hypothesis) that the claim holds for j = c − 1. The induction hypothesis implies
that the algorithm ran successfully for at least c− 1 steps. Since skc = rkc and kc > kc−1 ≥ ic−1, the
cth step of the algorithm ran successfully and found sic such that sic = skc . Further since ic is the
smallest index greater than ic−1 such that sic = ric , it is true that kc ≥ ic and the claim follows.
The running time of the algorithm is O(n + m) as we examine each element in the sequences at
most once.
4. Solve Kleinberg and Tardos, Chapter 4, Exercise 8.
Solution: Suppose by way of contradiction that T and T̂ are two distinct minimum spanning trees
for the graph. Since T and T̂ have the same number of edges but are not equal, there exists an edge
e ∈ T such that e /∈ T̂ . Adding e to T̂ results in a cycle C. As edge weights are distinct, take e′ to be
the uniquely most expensive edge in C. By the cycle property, no minimum spanning tree can contain
e′, contradicting the fact that the edge e′ is in atleast one of the trees T or T̂ .
Page 6
5. Solve Kleinberg and Tardos, Chapter 4, Exercise 22.
Solution: Consider the following counterexample. Let G = (V,E) be a weighted graph with vertex
set V = {v1, v2, v3, v4} and edges (v1, v2), (v2,
v3), (v3, v4), (v4, v1) of cost 2 each and an edge (v1, v3) of cost 1. Then every edge belongs to some
minimum spanning tree, but a spanning tree consisting of three edges of cost 2 would not be minimum.
Page 7