Algorithms and Data, Summer 1 2016 Problem Set 5 (last one!)
Due June 22 at 9 PM
Part I – All Together Now
The following problems can be solved with either greedy, divide and conquer, dynamic programming, or network flow techniques. Choose one of those strategies to solve the problem, describe the steps of your algorithm (writing pseudocode if it is necessary to make your algorithm clear), explain why your algorithm produces a correct solution, and give a big-O bound on the running time. Do not use a strategy more than once.
1. Given a list of jobs that take time ti and a processor that can only do one job at a time, find an ordering of the jobs that minimizes the average time of completion across jobs, where the time of completion of a job is the sum of its time and the times of all jobs that preceded it. For example, if t1 = 2 and t2 = 4, then doing job 1 then job 2 has an average completion time of (2 + (2 + 4))/2 = 4, while doing job 2 then 1 has an average completion time of (4 + (4 + 2))/2 = 5.
2. A group of N people must be divided into two teams, A and B. For every pair of people i and j, we have a nonzero integer unhappiness score uij that represents how unhappy those people will be if they are on different teams from each other. We also have a (nonzero integer) unhappiness score for each person uiA and uiB that suggests how unhappy that person will be to not be on team A or B, respectively. Find the division into teams that makes everyone least unhappy. Do not give an algorithm with a running time that scales with the unhappiness score values; the running time should be polynomial in N.
3. Given a list of words {w1, w2, … wn} of varying lengths {l1, l2, … ln} that will go into a paragraph, we want to minimize the “messiness” of the paragraph as it is printed on a page with a maximum width of W characters. Words are always separated by single spaces on a particular line, and the “messiness” is the cube of the number of spaces left at the end of a line between the final word of the line and W. We want to minimize the sum of the messiness scores of all the lines besides the last one. (Note that the number of characters on a line that includes words i through j is j – i + Σ li, counting spaces.)
Part II – Understanding Data Structures
Draw the following data structures after the keys 7, 6, 5, 4, 3, 2, 1 have been inserted, in that order, into an empty tree or heap.
1. Normal binary search tree.
2. B-Tree with t = 2. (When inserting in a B-tree, don’t add the key you’re trying to insert until you’re at a leaf. The root will act as a leaf until it is full.)
3. Binomial heap.
4. Fibonacci heap. (Should be pretty easy.)
Part III – Understanding NP-completeness
Answer the following questions.
1. The website of the cryptography company RSA used to brag that their cryptography was hard to break because breaking it seems to require factoring, which is a problem known to be in NP. Is the fact that factoring is in NP a reasonable argument that their cryptography is secure? Why or why not? (Factoring is not known to be NP-complete.)
2. Suppose I argue that 2-SAT is NP-complete because (a) an answer is checkable in polynomial time and (b) I can use a 3-SAT solver to solve 2-SAT problems, by just adding an extra literal xn+1 to each 2-SAT clause along with a final clause (¬xn+1 v ¬xn+1 v ¬xn+1) that forces each other clause to not use the extra variable. There is a valid solution to this iff there is a valid solution to the 2-SAT problem. What is wrong with this proof? (Hint: there is nothing wrong with the final clause.)
3. Prove that determining whether a directed graph has a simple cycle of at least k vertices is NP-complete. (Hint: Reduce from a special case of this problem.)
Part IV – Approximation in Practice
In this exercise, you’ll try solving an NP-complete problem both exactly and approximately, and see the time difference and the difference in optimality.
The problem we’ll be solving is machine scheduling with two machines. Write a program called MachineScheduling.java that takes as a command-line argument a number of random jobs to schedule — for example,
java MachineScheduling 100
will generate 100 random jobs. Each job should have a random duration in the range [1, 100). Your program should find the solution in one of two ways:
1) Optimally, via brute force. Try all possible assignments of jobs to either machine 1 or machine 2, and find the best assignment that minimizes the deadline by which both machines finish. (The finish time of a machine is simply the sum of the durations of its jobs.)
2) Using the sorted greedy heuristic described in class. Sort the jobs from biggest to smallest, and greedily schedule them to the machine that currently has less work scheduled to it.
Your program should print out (1) the real time required to compute the brute force solution, (2) the real time required to compute the greedy approximation, (3) the optimality ratio achieved by the greedy approximation, which is the ratio of the greedy heuristic’s later machine time to the optimal solution’s later machine time. For example, if the brute force version achieved a balance of 90 on machine 1 and 100 on machine 2, and the greedy version achieved a balance of 50 on machine 1 and 140 on machine 2, the optimality ratio is 140/100 = 1.4.
Please include functions with the following signatures for our ease of testing:
static boolean[] optimallySchedule(double[] durations) static boolean[] greedilySchedule(double[] durations)
Both should take an array of job durations, and return an array that is true at position i if job i should be scheduled on machine 1, and false if job i should be scheduled on machine 2.