Algorithms and Data, Summer 1 2016 Problem Set 2 answers
Part I
1. The optimal strategy here is the following:
weightInCurrentBox = 0
For i = 1 to N // one-indexing
    If weightInCurrentBox + wi <= W
        Add item i to current box
        weightInCurrentBox += wi
    Else
        Ship the current box
        Add item i to a new box
        weightInCurrentBox = wi
If there are any items in the final box, ship it
We can argue that this works with a “greedy stays ahead” argument. For shipping one item, obviously we put it in the empty box and ship that, so greedy is optimal for N=1. Then, if greedy has been optimal so far, we argue that it is always correct to put an item that fits in the current box instead of starting a new one.
Suppose not; suppose an optimal solution and the greedy solution disagree about the current item. This has to be because we want to place the item in the current box, and the optimal wants to start a new box. (They can’t disagree the other way — if optimal wants to place the item in the current box, that means there’s room, so we want that, too.) Whatever an optimal algorithm does with all the remaining boxes, the greedy algorithm could do that as well, because it will have strictly more room — it could use exactly that many boxes, and in addition has wi extra weight to play with because the current item isn’t in those boxes. That means our greedy algorithm remains at least as good as the optimal at step i+1, which means it remains at least as good as the optimal for the rest of the algorithm.
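As a sanity check, the strategy above can be sketched in Python (the function and variable names are my own):

```python
def ship_greedily(weights, W):
    """Greedy packing: items in the given order, boxes of capacity W.
    Returns the shipped boxes, each a list of item weights."""
    boxes = []
    current, current_weight = [], 0
    for w in weights:
        if current_weight + w <= W:
            # The item fits: add it to the open box.
            current.append(w)
            current_weight += w
        else:
            # It doesn't fit: ship the open box and start a new one.
            boxes.append(current)
            current, current_weight = [w], w
    if current:
        # Ship the final, partially filled box.
        boxes.append(current)
    return boxes
```

For example, with W = 7, ship_greedily([3, 4, 5, 2], 7) ships two boxes, [3, 4] and [5, 2].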
2. The algorithm here is the following. Assume without loss of generality that the perimeter is a line going from left to right. Let Wl = {wl1, wl2, ... wlL} be the x-coordinates of the left-hand walls of each of the L lanes, and let Wr = {wr1, wr2, ... wrL} be the x-coordinates of the right-hand walls of the lanes.
maxCovered = wl1 // Up to the leftmost wall doesn’t need to be covered
c = 1 // currentLane, shortened because we’ll use it in subscripts
While maxCovered < wrL
    Place a tower at min(maxCovered + r, wrc)
    maxCovered = min(maxCovered + 2r, wrc)
    If maxCovered >= wrc && c < L
        // New lane
        c++
        maxCovered = wlc
In other words, we greedily assign watchtowers so that we place them r from the leftmost uncovered place we need to watch, or closer if there’s a wall closer than that.
The argument for this being optimal is another greedy-stays-ahead argument, although it also looks a little like an exchange argument. Suppose there is some optimal solution. It must cover the leftmost point of the leftmost lane somehow. We could push that watchtower to the right until its radius barely touches the leftmost lane’s left wall; now that watchtower matches the placement of one of our watchtowers, and it is at least as good as the original placement, since it covers all of the original points and more. So for one-tower solutions, our algorithm is as good as the optimal.
Now suppose the first k towers have all been shown to be as good as the optimal placement, through the argument above. Of the remaining watchtowers in the optimal solution, pick the leftmost one in this lane. The first point in this lane not covered by the k towers must be covered somehow. If the leftmost of the remaining towers leaves this point uncovered with the new placements of the other towers, the optimal solution never covered this point either, because we only moved the earlier towers right, never left, and covered everything in that region. That is impossible, so the leftmost remaining tower in the optimal solution must cover this first uncovered point; and we again lose nothing by shifting this tower to the right until its radius barely touches the radii of the other moved towers, again matching our solution. This argument holds for the whole lane, so lane by lane, our approach is optimal, and it is therefore optimal for the whole problem.
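Here is a short Python sketch of the placement loop above, restructured to iterate lane by lane (the lane-list representation and names are my own):

```python
def place_towers(lanes, r):
    """lanes: (wl, wr) wall x-coordinates for each lane, left to right.
    r: watchtower radius.  Returns the x-coordinates of the towers."""
    towers = []
    for wl, wr in lanes:
        covered = wl  # nothing left of the lane's left wall needs watching
        while covered < wr:
            # Place a tower r past the first uncovered point,
            # or at the right wall if that is closer.
            towers.append(min(covered + r, wr))
            covered = min(covered + 2 * r, wr)
    return towers
```

With lanes [(0, 4), (10, 12)] and r = 1, this places towers at 1, 3, and 11.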
3. The strategy is just to “unroll” the 24-hour cycle for each request, so that each request’s start time is treated as both the start and the end of the day.
Sort the requests by finishing time (O(N log N))
bestSoFar = 0
For firstRequest = 1 to N
    bestWithThisRequest = result of normal greedy interval scheduling with a start time of the beginning of this request, stopping when the finish time of the next request overlaps with this one’s start time
    bestSoFar = max(bestSoFar, bestWithThisRequest)
Return bestSoFar
This is just N iterations of the O(N) part of interval scheduling, moving through the requests in sorted order and picking out the earliest-finishing request that doesn’t overlap with anything picked so far, for O(N²) overall. The final cyclical schedule must include at least one request, and requests from the previous day that extend past that request’s start time therefore won’t be in the final solution. So we can try starting with each different request, only scheduling everything else up to that request’s start time, and this will match the optimal for some choice of first request.
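A compact Python sketch of the unrolling idea, assuming requests are (start, finish) pairs on a 24-hour clock and a request that wraps past midnight has finish < start (names are my own; for brevity this re-sorts inside the loop, so it runs in O(N² log N) rather than the O(N²) described above):

```python
def max_cyclic_requests(requests, day=24):
    """Maximum number of compatible requests on a circular clock."""
    best = 0
    for s0, f0 in requests:
        # Unroll the cycle so the chosen first request starts at time 0.
        finish = (f0 - s0) % day
        chosen = 1
        shifted = []
        for s, f in requests:
            if (s, f) == (s0, f0):
                continue
            start = (s - s0) % day
            shifted.append((start, start + (f - s) % day))
        # Standard greedy: earliest finish time first, take compatible
        # requests, never passing time `day`, where the first one recurs.
        shifted.sort(key=lambda sf: sf[1])
        for s, f in shifted:
            if s >= finish and f <= day:
                chosen += 1
                finish = f
        best = max(best, chosen)
    return best
```

For example, the requests (23, 1), (1, 3), (4, 6) are all compatible on the cycle, and the sketch finds all three.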
4.
Reasonable approach #1:
Just make a Minimal Spanning Tree for the whole graph, including all cities. Then, for each leaf of that tree, while the leaf is not a destination, prune the edge to that leaf, creating a new leaf.
Using Kruskal’s algorithm for the minimal spanning tree, this algorithm is O(|E| log |E|). We sort the edges of the graph in O(|E| log |E|) time, then can do a DFS from an arbitrary vertex to find the extra edges that lead to unnecessary leaves, and remove them. That running time of O(|V| + |E|) is dominated by the minimal spanning tree time.
This is not optimal in cases where the interior of the MST contains more edges than it needs to, containing cities that are not destinations. A simple example:
[Figure: two destination vertices joined directly by an edge of weight 6; each is also joined by an edge of weight 5 to a third vertex that is not a destination]
The MST here will choose the two five-weight edges because a spanning tree must connect every vertex, while the optimal choice for connecting just the destinations is to pick the 6-weight edge.
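Approach #1 can be sketched as Kruskal’s algorithm plus leaf pruning. On the triangle example above it keeps the two 5-weight edges (total 10) rather than the optimal 6-weight edge, as noted. The vertex numbering and (weight, u, v) tuple layout are my own:

```python
from collections import defaultdict

def mst_then_prune(n, edges, destinations):
    """n vertices 0..n-1, edges as (weight, u, v) tuples, destinations a set.
    Builds an MST with Kruskal's algorithm, then repeatedly prunes
    non-destination leaves.  Returns the surviving tree edges."""
    # Kruskal's algorithm with union-find (path halving).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    # While some leaf is not a destination, cut the edge to it.
    adj = defaultdict(set)
    for w, u, v in tree:
        adj[u].add((v, w))
        adj[v].add((u, w))
    leaves = [x for x in adj if len(adj[x]) == 1 and x not in destinations]
    while leaves:
        leaf = leaves.pop()
        if not adj[leaf]:
            continue  # already disconnected by an earlier prune
        nbr, w = adj[leaf].pop()
        adj[nbr].discard((leaf, w))
        if len(adj[nbr]) == 1 and nbr not in destinations:
            leaves.append(nbr)
    return [(w, u, v) for w, u, v in tree if (v, w) in adj[u]]
```

Running it on the triangle (destinations 0 and 1, non-destination 2) returns a tree of total weight 10, demonstrating the suboptimality.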
Reasonable approach #2: Dijkstra’s algorithm
Another reasonable approach is to perform Dijkstra’s algorithm, either from just a single node or repeatedly from every node. From a single node, we can get the tree in O(|E| log |V|) time; if we run Dijkstra’s with every node as the start, this takes O(|V||E| log |V|) time.
Just using Dijkstra’s algorithm from an arbitrary destination node has the faster running time, but it clearly isn’t optimal, because paths between any two nodes must run through our chosen start node, which denies us opportunities to connect things more directly.
[Figure: counterexample triangle with edge weights 5, 1, and 5]
We could also try running Dijkstra’s algorithm from all possible starts, then greedily choosing short paths connecting destinations until everything is connected (essentially Dijkstra’s, then Kruskal’s or Prim’s using those paths as the edges). But this misses opportunities where a particular vertex is on no shortest path between destinations, yet allows us to tie many destinations together.
[Figure: destinations D1, D2, and D3, with pairwise shortest paths of length 10 between them; each is also connected to a central non-destination vertex V by edges of weight 5, 6, and 6. The shortest paths are all 10, so a tree built from them weighs 20, vs. the optimal tree through V = 17.]
5. We can perform a modified Dijkstra’s algorithm here, choosing the largest-bandwidth edge to add to our tree at any given step instead of the edge producing the shortest path. We can do this because a modified version of the cut property holds here: for any set of vertices S and the rest of the vertices, V−S, it is never wrong to choose the largest-bandwidth edge connecting those two sets, since any other edge connecting them would be no better. A similar argument shows why we only need to run the algorithm once, from a single source: if there were a path between two vertices with better bandwidth than the path provided between them in the tree, all of that path’s edges would have been added before the edge causing the bottleneck in the tree path, a contradiction.
The running time is the running time of Dijkstra’s, since it’s doing the same amount of work (we can modify the heap to return the largest element instead of the smallest). So that’s O(|E| log |V|).
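A Python sketch of the modified algorithm, using a max-heap via negated keys; the adjacency-list shape is my own. It returns each vertex’s bottleneck bandwidth from the source, the quantity the tree maximizes:

```python
import heapq

def max_bandwidth_tree(adj, source):
    """adj: {u: [(bandwidth, v), ...]}.  Grows a tree from `source`,
    always taking the largest-bandwidth edge leaving the tree.
    Returns each reachable vertex's bottleneck bandwidth from source."""
    bottleneck = {source: float('inf')}
    # Negate bandwidths so Python's min-heap acts as a max-heap.
    heap = [(-b, source, v) for b, v in adj[source]]
    heapq.heapify(heap)
    while heap:
        neg_b, u, v = heapq.heappop(heap)
        if v in bottleneck:
            continue  # already in the tree
        # The path to v goes through u, then across this edge.
        bottleneck[v] = min(bottleneck[u], -neg_b)
        for b, w in adj[v]:
            if w not in bottleneck:
                heapq.heappush(heap, (-b, v, w))
    return bottleneck
```

On a triangle with bandwidths 0–1: 3, 1–2: 2, 0–2: 1, the best route from 0 to 2 goes through 1 with bottleneck 2, and the sketch finds it.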
Part II
The expected optimization for the online algorithm is to check in a lookup table whether a dropped edge is actually used in the tree returned by Dijkstra’s. If it’s not, this doesn’t affect the shortest paths, and we don’t bother recomputing anything.
One possible optimization for extra credit would be to memoize in a hash table what the shortest distances were for a previous state of the graph, so when a link goes back up, we instantly have the shortest paths. Another optimization would be to check which vertex of an edge is closer to our source when it goes down or up, and if we need to redo Dijkstra’s algorithm, only redo the work starting from that closer vertex. The new tree must have a shortest path to one of the two endpoints that doesn’t involve the edge that went down or up, so that shortest path must already exist in the previous tree, and it must be the path to the closer of the two vertices.
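The tree-membership check is cheap if we keep the parent pointers Dijkstra’s algorithm produces; a minimal sketch (the parent-map representation is my own):

```python
def needs_recompute(parent, u, v):
    """parent: {vertex: its predecessor in the shortest-path tree}.
    A dropped link (u, v) only invalidates the tree if it is a tree
    edge, i.e. one endpoint is the other's parent."""
    return parent.get(u) == v or parent.get(v) == u
```

If the dropped edge is not in the tree, every shortest path is untouched and no recomputation is needed.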