COMP3027/COMP3927 Tutorial questions The University of Sydney 2020 Semester 1 Tutorial School of Computer Science
1 Greedy algorithms
Greedy algorithms can be some of the simplest algorithms to implement, but they’re often among the hardest algorithms to design and analyze. You can often stumble on the right algorithm but not recognize that you’ve found it, or might find an algorithm you’re sure is correct and be unable to prove its correctness.
The standard way of proving the correctness of a greedy algorithm is by using an exchange argument. They work by showing that you can iteratively transform any optimal solution into the solution produced by the greedy algorithm without changing the cost of the optimal solution, thereby proving that the greedy solution is optimal. Typically, exchange arguments are set up as follows:
1. Define your solutions. You will be comparing your greedy solution X to an optimal solution Xopt, so it’s best to define these variables explicitly.
2. Compare solutions. Next, show that if X ̸= Xopt, then they must differ in some specific way. This could mean that there’s a piece of X that’s not in Xopt, or that two elements of X that are in a different order in Xopt, etc. You might want to give those pieces names.
3. Exchange Pieces. Show how to transform Xopt by exchanging some piece of Xopt for some piece of X. You’ll typically use the piece you described in the previous step. Then, prove that by doing so, you did not increase the cost of Xopt and you therefore have a different optimal solution.
4. Iterate. Argue that you have decreased the number of differences between X and Xopt by performing the exchange, and that by iterating this process you can turn Xopt into X without impacting the quality of the solution. Therefore, X must be optimal. This last step might require a formal argument using an induction proof. However, in most cases this is not needed.
Problem 1
A computer network can be modeled as an undirected graph G = (V, E) where vertices represent computers and edges represent physical links between computers. The maximum transmission rate or bandwidth varies from link to link; let b(e) be the bandwidth of edge e ∈ E. The bandwidth of a path P is defined as the minimum bandwidth of edges in the path, i.e., b(P ) = mine∈P b(e).
Suppose we number the vertices in V = {v1, v2, . . . , vn}. Let B ∈ Rn×n a matrix where Bij is the maximum bandwidth between vi and vj. Give an algorithm for computing B. Your algorithm should run in O(n3) time.
Problem 2
Your friend is working as a camp counselor, and he is in charge of organizing activities for a set of junior- high-school-age campers. One of his plans is the following mini-triathlon exercise: each contestant must swim 20 laps of a pool, then cycle 10 km, then run 3 km. The plan is to send the contestants out in a staggered fashion, via the following rule: the contestants must use the pool one at a time. In other words, first contestant swims the 20 laps, gets out, and starts biking. As soon as the first contestant is out of the pool, the second contestant begins swimming the 20 laps; as soon as he/she’s out and starts cycling, a third contestant begins swimming . . . and so on.)
Each contestant has a projected swimming time (the expected time it will take him or her to complete the 20 laps), a projected cycling time (the expected time it will take him or her to complete the 10 km of
1
cycling), and a projected running time (the time it will take him or her to complete the 3 km of running. Your friend wants to decide on a schedule for the triathlon: an order in which to sequence the starts of the contestants. Let’s say that the completion time of a schedule is the earliest time at which all contestants will be finished with all three legs of the triathlon, assuming they each spend exactly their projected swimming, biking, and running times on the three parts.
What’s the best order for sending people out, if one wants the whole competition to be over as early as possible? More precisely, give an efficient algorithm that produces a schedule whose completion time is as small as possible. Prove the correctness of your algorithm.
Problem 3
Assume that you are given n white and n black dots, lying on a line. The dots appear in any order of black and white. Design a greedy algorithm which connects each black dot with a (different) white dot, so that the total length of wires used to form such connected pairs is minimal. The length of wire used to connect two dots is equal to their distance along the line.
2 Divide-and-Conquer
The divide-and-conquer strategy solves a problem by:
1. Breaking it into subproblems that are themselves smaller instances of the same type of the original problem.
2. Recursively solving these subproblems.
3. Appropriately combining (merging) their answers.
The real work is done in three different places: in the partitioning of problems into subproblems; when the subproblems are so small that they are solved outright; and in the gluing together of partial answers.
The standard way of proving correctness for a divide-and-conquer algorithm is by using induction as follows.
• Base case: Solve trivial instances directly, without recursing.
• Inductive step: Reduce the solution of a given instance to the solution of smaller instances, by recursing. For divide-and-conquer algorithms it usually requires a bit of work to prove that the step of merging two (or more) solutions to smaller problems into the solution for the larger problem.
Problem 4
(Due to Jeff Erickson.) For this problem, a subtree of a binary tree means any connected subgraph. A binary tree is complete if every internal node has two children, and every leaf has exactly the same depth. Describe and analyze a recursive algorithm to compute the largest complete subtree of a given binary tree. Your algorithm should return both the root and the depth of this subtree. See Figure 1 for an example.
Problem 5
Suppose we are given numbers a, n, where n > 0 is an integer. We wish to calculate the number an. What is the quickest way to do this? How many multiplication operations do we have to perform? Of course, we may compute 198 by calculating 19 × 19 = 361, then calculating 193 = 361 × 19 = 6859, then 194 = 6859 × 19 = 130321, and so on, until we get 198. This takes seven multiplications in total. Is this the quickest possible? Note that 8 = 2 × 4, so we can also write 198 = 194 × 194. If we compute 194 first, and then square it, we need only one more multiplication. The straightforward method would require four more multiplications: 198 = 194 × 19 × 19 × 19 × 19. Similarly, 194 = 192 × 192. So if we calculate 192 = 361 with one multiplication, 194 = 3612 = 130321 with one more, we get 198 = 1303212 = 16983563041 with the third
2
multiplication. This cleverer method requires only three multiplications. The method above seems to work when the exponent n is even. What do we do when it is odd? Say, we would like to calculate 197. We may write 7 = 6+1, so 197 = 196 ×19, then 196 = 193 ×193, and finally 193 = 192 ×19. So 193 = 361×19 = 6859, 196 = 68592 = 47045881, and 197 = 47045881 × 19 = 893871739. The straightforward method of calculation requires 6 multiplications, and we needed only 4 here. We can combine the ideas from the two examples above to get a procedure to calculate the power an for any pair a, n.
Algorithm 1 Power
1: 2: 3: 4: 5: 6: 7: 8: 9:
10: 11: 12:
functionPower(a,n) if n=1then
return a end if
if n is even then
b = Power(a, n/2) return b2
else
b = Power(a, (n − 1)/2)
return a × b2 end if
end function
Prove that the algorithm correctly computes an. Can you bound the number of multiplications for each n?
3
Dynamic programming
Like greedy algorithms, dynamic programming algorithms can be deceptively simple. The algorithms, once written out, are often so straightforward that it’s hard to believe they work correctly. Consequently, one of the challenges in writing dynamic programming algorithms is rigorously establishing their correctness. Fortunately, dynamic programming proofs are often relatively straightforward and follow a standard pattern.
Typically, dynamic programming algorithms are based on a recurrence relation involving the optimal solution, so the correctness proof will primarily focus on justifying why that recurrence relation is correct. The general outline of a correctness proof for a dynamic programming algorithm is as following:
Figure 1: The largest complete subtree of this binary tree has depth 3.
3
1. Define subproblems. Dynamic programming algorithms usually involve a recurrence involving some quantity OP T (…) over one or more variables (usually, these variables represent the size of the problem along some dimension). Define what this quantity represents and what the parameters mean. This might take the form “OPT(k) is the maximum number of people that can be covered by the first k cell towers” or “OP T (u, v, i) is the length of the shortest path from u to v of length at most i.”
2. Write a recurrence. Now that you’ve defined your subproblems, you will need to write out a recurrence relation that defines OPT(…) in terms of some number of subproblems. Make sure that when you do this you include your base cases.
3. Prove that the recurrence is correct. Having written out your recurrence, you will need to prove it is correct. Typically, you would do so by going case-by-case and proving that each case is correct.
4. Prove the algorithm evaluates the recurrence. Next, show that your algorithm actually evaluates the recurrence by showing that the table values match the value of OPT and that as you fill in the table, you never refer to a value that hasn’t been computed yet. To be fully rigorous, you would probably need to prove this by induction. However, in most cases a few sentences should suffice here.
5. Prove the algorithm is correct. Having shown that you’ve just evaluated the recurrence correctly, your algorithm will probably conclude with something like “return A[m,n]”. Prove that this table value is the one that you actually want to read.
Problem 6
In the game of Nim, there are three heaps of toothpicks on a table and two players take turns to remove toothpicks. In her turn, a player must remove at least 1 toothpick from one of the heaps. The player that removes the last toothpick from the table wins.
A game configuration is captured by a triplet (a,b,c) denoting how many toothpicks are there in each heap. Some configurations are winning for the first player in the sense that it does not matter what the second player does, there is always a way for the first player to win. Similarly, other configurations are losing in the sense that it does not matter what the first player does, there is always a way for the second player to win.
Design an O(a2b2c2) time algorithm that given a triplet (a,b,c) tests whether it is a winning or a losing configuration.
Problem 7
Let G = (V,E) be an undirected graph with n nodes. Recall that a subset of the nodes is called an independent set if no two of them are joined by an edge. Finding large independent sets is difficult in general; but here we’ll see that it can be done efficiently if the graph is “simple” enough.
Call a graph G = (V,E) a path if its nodes can be written as v1,v2,…,vn, with an edge between vi and vj if and only if the numbers i and j differ by exactly 1. With each node vi, we associate a positive integer weight wi.
Give an algorithm that takes an n-node path G with weights and returns an independent set of maximum total weight. independent of the values of the weights. Prove the correctness and complexity of your algorithm.
4 Network Flow
The general idea we’ve used to solve a problem X with network flows is to “reduce” X to the problem of computing a max flow (or something similar). That is, we modify X into an equivalent problem that can be solved using network flows (for example, bipartite matching). Since we are reducing a problem X to another problem Y the correctness proof requires us to prove that a solution for Y is a solution for X and vice versa. For example, for bipartite matching we proved that if there is a matching of size k in the bipartite graph G
4
then there’s a flow of value at most k in the corresponding flow network G′, and if there’s a flow of value k in G′ then there’s a matching of size at most k in G.
Problem 8
We have data about n countries that are engaged in trade with one another. For each country i, we have the value si of its budget surplus; this number may be positive or negative, with a negative number indicating a deficit. For each pair of countries (i,j) we have the total value e(i,j) of exports from i to j; this number is always non-negative. We say that a subset S of countries is free-standing if the sum of the budget surpluses of the countries in S, minus the total value of all exports from countries in S to countries not in S, is non-negative.
Give a polynomial-time algorithm that takes this data for a set of n countries and decides whether it contains a non-empty free-standing subset that is not equal to the full set of countries.
Problem 9
Back in the euphoric early days of the Web, people liked to claim that much of the enormous potential in a company like Yahoo! was in the “eyeballs” — the simple fact that it gets millions of people looking at its pages every day. And further, by convincing people to register personal data with the site, it can show each user an extremely targeted advertisement whenever he or she visits the site, in a way that TV networks or magazines couldn’t hope to match. So if the user has told Yahoo! that they’re a 20-year old computer science student from USyd, the site can throw up a banner ad for apartments in Glebe; on the other hand, if they’re a 50-year-old investment banker from Vaucluse the site can display a banner ad pitching Luxury Cars instead.
But deciding on which ads to show to which people involves some serious computation behind the scenes. Suppose that the managers of a popular Web site have identified k distinct demographic groups G1, G2, . . . , Gk. (These groups can overlap; for example G1 can be equal to all residents of Sydney, and G2 can be equal to all people with a degree in computer science.) The site has contracts with m different advertisers, to show a certain number of copies of their ads to users of the site. Here’s what the contract with the ith advertiser looks like:
• For a subset Xi ⊆ {G1,…,Gk} of the demographic groups, advertiser i wants its ads shown only to users who belong to at least one of the demographic groups in the set Xi.
• For a number ri, advertiser i wants its ads shown to at least ri users each minute.
Now, consider the problem of designing a good advertising policy — a way to show a single ad to each user of the site. Suppose at a given minute, there are n users visiting the site. Because we have registration information on each of these users, we know that user j (for j = 1, 2, . . . , n) belongs to a subset Uj ⊆ {G1, . . . , Gk} of the demographic groups. The problem is: is there a way to show a single ad to each user so that the site’s contracts with each of the m advertisers is satisfied for this minute? (That is, for each i = 1, 2, . . . , m, at least ri of the n users, each belonging to at least one demographic group in Xi, are shown an ad provided by advertiser i.)
Give an efficient algorithm to decide if this is possible, and if so, to actually choose an ad to show each user.
5 NP-completeness and Reductions
Problem 10
In the Degree ∆ Spanning Tree problem we are given a graph G = (V,E) and we have to decide whether there is a spanning tree of G whose maximum degree is at most ∆. (Note that ∆ is not part of the input, but rather part of the problem definition.) It is known that the Degree 2 Spanning Tree problem (D2ST)
5
is NP-complete. Using this fact, prove that Degree ∆ Spanning Tree is NP-complete for all fixed ∆ > 2 (D∆ST).
Problem 11
The decision version of the 3-coloring problem is “Given an undirected graph G, does it have a 3-coloring?”. The search version of the 3-coloring problem is similar but we are asked to actually find the 3-coloring of the graph if it exists.
Assuming you have a polynomial time algorithm for the decision version of 3-coloring, design an algorithm for the search version of 3-coloring.
6