CPSC 320 2020W1: Assignment 5
This assignment is due Wednesday December 2nd, 2020 at 22:00 Pacific Time. Assignments submitted within 24 hours after the deadline will be accepted, but a penalty of 15% will be applied. Please follow these guidelines:
Prepare your solution using LATEX, and submit a pdf file. Easiest will be to use the .tex file provided. For questions where you need to select a circle, you can simply change \fillinMCmath to \fillinMCmathsoln .
Enclose each paragraph of your solution with \begin{soln}Your solution here…\end{soln}.Your solution will then appear in dark blue, making it a lot easier for TAs to find what you wrote.
Start each problem on a new page, using the same numbering and ordering as this assignment handout.
Submit the assignment via GradeScope at https://gradescope.ca. Your group must make a single submission via one group member’s account, marking all other group members in that submission using GradeScope’s interface.
After uploading to Gradescope, link each question with the page of your pdf containing your solution. There are instructions for doing this on the CPSC 121 website, see https://www.students.cs.ubc.ca/ ~cs-121/2020W1/index.php?page=assignments&menu=1&submenu=3. Ignore the statement about group size that is on that page.
Before we begin, a few notes on pseudocode throughout CPSC 320: Your pseudocode should commu- nicate your algorithm clearly, concisely, correctly, and without irrelevant detail. Reasonable use of plain English is fine in such pseudocode. You should envision your audience as a capable CPSC 320 student unfamiliar with the problem you are solving. If you choose to use actual code, note that you may neither include what we consider to be irrelevant detail nor assume that we understand the particular language you chose. (So, for example, do not write #include
Remember also to justify/explain your answers. We understand that gauging how much justifi- cation to provide can be tricky. Inevitably, judgment is applied by both student and grader as to how much is enough, or too much, and it’s frustrating for all concerned when judgments don’t align. Justifica- tions/explanations need not be long or formal, but should be clear and specific (referring back to lines of pseudocode, for example). Proofs should be a bit more formal.
On the plus side, if you choose an incorrect answer when selecting an option but your reasoning shows partial understanding, you might get more marks than if no justification is provided. And the effort you expend in writing down the justification will hopefully help you gain deeper understanding and may well help you converge to the right selection :).
Ok, time to get started…
1
1 Statement on Collaboration and Use of Resources
To develop good practices in doing homeworks, citing resources and acknowledging input from others, please complete the following. This question is worth 2 marks.
1. All group members have read and followed the guidelines for groupwork on assignments in CPSC 320 (see https://www.students.cs.ubc.ca/~cs-320/2020W1/index.php?page=assignments&menu=1& submenu=3).
Yes No
2. We used the following resources (list books, online sources, etc. that you consulted):
3. One or more of us consulted with course staff during office hours. Yes No
4. One or more of us collaborated with other CPSC 320 students; none of us took written notes during our consultations and we took at least a half-hour break afterwards.
Yes No
If yes, please list their name(s) here:
5. One or more of us collaborated with or consulted others outside of CPSC 320; none of us took written notes during our consultations and we took at least a half-hour break afterwards.
Yes No
If yes, please list their name(s) here:
2
2 Mommy, I want ice cream!
The owners of an ice cream truck store their ice cream not only in the small freezer in their truck, but also in a large freezer at their home. Altogether, the two freezers can contain up to L litres of ice cream at one time. Getting new ice cream is annoying, because they have to rent a larger freezer truck and drive to the ice cream factory out of town. So they want to pick up ice cream relatively rarely. Each time they pick up ice cream, it costs them a fixed amount P for the larger truck rental in addition to the cost of the ice cream they buy. However, it costs c in electricity to store a litre of ice cream for an extra day, so ordering too much ahead also increases their cost.
They are planning to close for the Winter at the end of September, and they want their freezers to be empty by the time they close. Luckily, based on years of experience, they have accurate projections for how much ice cream they will sell each day until this point in time. Assume that there are n days left until they close for the Winter, and that they need li litres of ice cream for each of the days i = 1,2,…,n, for some value li ≤ L. Assume that the freezers are empty at the end of day 0. Give an algorithm to decide on which days they should pick up ice cream, and how much to order so as to minimize their total cost.
Hint: you should make the following assumptions:
When the ice cream truck owners need to buy ice cream, they do it early in the morning (so the ice cream they buy does not count towards the previous day’s electricity cost).
It is not possible to buy fractional litres of ice cream (so they can buy 1 litre, or 2 litres, but not 1.5 or 1.345789345 litres).
They can buy any amount x of ice cream from 1 litre to L litres, although the amount they bought plus the amount they had left in their freezers can not exceed L litres.
Buying x litres of ice cream costs P + px where p is the price of a litre of ice cream.
If the owners have l litres of ice cream in their freezers at the end of day i, how many litres will they
have at the end of day i+1?
1. [4 marks] Define a recurrence for C[d,l]: the minimum cost for the first d days assuming you want
to be left with l litres of ice-cream in the freezers at the end of day d.
Solution: Let us assume that a litre of ice cream costs p. When we define the recurrence for C[d,l], we need to consider all possible orders of ice cream for day d and their costs. If the freezers had y litres of ice cream left at the end of day d − 1, and the ice cream truck owners buy x litres of ice cream at midnight, then the freezers will contain y + x − ld litres of ice cream at the end of day d. Hence
l = y + x − ld which means that y = l + ld − x
The smallest amount of ice cream that the owners of the ice cream truck might buy on day d in order to have l litres left by the end of the day is 0 (this corresponds to y = l + ld). The largest amount of ice cream that the owners of the ice cream truck might need (or be able) to buy would be l + ld litres (that’s if y = 0). The number must also be ≤ L (the owners can’t buy overfill their freezers).
Let P(x) be the transportation cost for x litres of ice cream:
0 ifx=0
P(x) =
P ifx>0
3
We end up with the following recurrence:
+∞ if l + ld > L min {P(x)+p∗x+C[d−1,l+ld −x]+c(l+ld −x)} otherwise
C[d,l] =
0≤x≤l+ld
where P(x) is the delivery cost, p is the price of one litre of ice cream, C[d − 1,l + ld − x] is the price for the previous d − 1 days, and c(l + ld − x) accounts for the ice cream that was left in the freezers at the end of day d−1.
2. [6 marks] Write a dynamic programming algorithm that returns a schedule indicating how much ice cream to buy on each of the n days.
Solution:
Algorithm FindMinIceCreamCost(l1, . . . , ln) C[0, 0] ← 0
for l ← 1 to L do
C[0, l] ← +∞
for d ← 1 to n do for l ← 0 to L do
X[d, l] ← -1
C[d, l] ← +∞ ifl+ld ≤Lthen
forx←0tol+ld do cost←P(x)+p*x+C[d-1,l+ld -x]+c(l+ld -x) if cost < C[d, l] then
C[d, l] ← cost
X[d, l] ← x endif
endfor endif
endfor
endfor
l←0
Schedule ← []
for d ← n downto 1 do
if X[d, l] > 0
add (d, X[d, l]) to Schedule
endif
l ← l + ld – x endfor
return Schedule
3. [2 marks] Analyze the worst-case running time and the space requirements of your algorithm from part 2.
Solution: We use Θ(Ln) memory for the table, and the algorithm runs in Θ(L2n) time.
4
3 The ISGM problem is NP-complete
In this question, we will once again revisit the ISGM problem you encountered in assignment 2, on the midterm, and on assignment 4. We will however modify the problem slightly so it’s now become a decision problem. The new definition is:
Given an undirected graph G = (V,E) with vertices V and edges E, an integer (positive, zero, or negative) weight hv for each v ∈ V , an integer (positive, zero, or negative) weight Ju,v for each edge {u, v} ∈ E, and an integer K, is there an assignment of +1 or −1 to the variables xv for each v ∈ V for which the objective function
v∈V (u,v)∈E
is less than or equal to K?
hvxv + Ju,vxuxv
1.
[4 marks] Prove that this decision problem is NP-Complete. Hint: there are at least two slightly different ways to complete the “harder” part of this proof that rely on nothing more than problems mentioned in the first 10 pages of Chapter 8 of your textbook, and course material you can simply refer us to. So your answer should be short.
Solution: First, we need to prove that ISGM belongs to NP. Given an instance of the ISGM decision problem, a certificate will be an assignment of ±1 x-values to each vertex P of the graph. Given a valid certificate, we can verify that the answer to the ISGM decision problem is True by computing the value of the objective function using the x-values provided, and checking whether or not it is ≤ K.
Now it remains to reduce a known NP-complete problem to ISGM. We can either use SAT or Vertex Cover. A reduction from SAT to ISGM and its proof is correctness was given to you in assignment 2, question 2 (we use K = −1nn − 3na − 3no − nc − 1), while a reduction from Vertex Cover to ISGM including a proof of correctness is given in the midterm, question 4 (we use K = −15|E| − |V | + 2k). There was no need to rewrite either reduction or their proof of correctness here.
Identifying Superspreader Events
4
(You can skip to the definition of SUPERSPREAD without missing any vital information for this question.) Imagine a hypothetical, moderately contagious disease that spreads via respiratory droplets and aerosols. Research has shown that this disease spreads mainly via “superspreader” events, where many people are
gathered together in the same space. Public health officials want you to help them create an algorithm to identify these superspreader events.
Specifically, out of a population of people, some subset has been identified as having gotten the disease. The public health officials also have a list of all events where people congregated, and they want to find a small, “highly likely” (for some definition of “likely”) set of events that include all the infected people.
Formally, we define our problem SUPERSPREAD as a decision problem as follows: An instance of SUPERSPREAD consists of a set P, with |P| = n; a set E = {E1,…,Em}, where each Ei ⊆ P; a set I ⊆ P; and a numerical bound k. SUPERSPREAD asks whether there exists a subset E′ ⊆ E, such that
I⊆ Ei′ E i′ ∈ E ′
and
weight(Ei′) ≤ k, E i′ ∈ E ′
5
where weight(Ei′) is defined to be equal to weight(E′) = |(P − I) ∩ Ei′| + 1.
i | I ∩ E i′ | + 1
Intuitively, P is the total population of n people. E is the set of events E1, . . . , Em, where each event is a
set of people (who attended that event). I is the set of infected people. We are trying to find a subset E′
of all the events, such that each infected person attended at least one event in E′, and we want E′ to be
a “small” set of events (the sum of the weights of events in E′ is ≤ k). Without this last constraint, we
could always return E′ = E as a solution. The weight function might seem a bit odd, but it’s designed to
discourage including events where a lot of people did not get infected (which is the number |(P − I) ∩ Ei′|)
versus the number of people who did get infected (which is the number |I ∩ Ei′|). For example, for a choir
practice where 52 out of 61 were infected, the weight would be 9+1 ≈ 0.19, whereas a dental conference 52+1
where 44 out of 15,000 people were infected would get a weight of 14956+1 ≈ 332.38. It wouldn’t be very 44+1
useful to try to contact-trace through 15,000 attendees, so we would prefer a solution that selected smaller sub-events, even if we needed several smaller sub-events (e.g., a specific dinner, a specific seminar, a specific party, etc.) to explain all 44 attendees who got sick.
1. [3 marks] Explain why SUPERSPREAD is in NP.
Solution: As usual, we’ll be more detailed in these solutions than we’d expect of you. For example, for
this part of the question, an answer roughly of the form:
SUPERSPREAD is in NP because given a subset E′ that purports to be a solution, it is obviously polynomial time to just traverse all the sets in E′ to make sure that all of I is included in their union, and similarly, we can traverse through the sets in E′, compute each of their weights, add the weights up, and compare to k.
would be fine.
However, let’s be a bit more precise here, just to be sure. We’ll assume we represent the set P by the integers 1,…,n. This means the input instance doesn’t have to list P, but just provide the value n (and to be really formal, we might specify that n is given in binary). The set E is then given by m lists of numbers between 1 and n inclusive. Then, the set I is also a list of numbers. And k is a number, also given in binary. This means, if we wanted to be super formal, that the length of the input is bounded by the number of elements in all the sets Ei plus the number of elements in I, all times lgn (plus 2lgn for n and k and maybe some formatting bits, both those are all lower-order terms). Let’s call this input length N.
Now, we can read the input and store the data in any convenient data structure. Reading all the input will take O(N) time, as we don’t have to re-read anything. We can store I as a linked list of numbers, and E as an array of linked lists of numbers.
Now, given this (rather inefficient, but good enough) representation, what is our runtime to check that I ⊆ Ei′∈E′ Ei′? We can copy the list I, and then make a single pass through all the E′, deleting elements from the copy of I for each element in each Ei′. Since the length of I and the length of all of E′ are both bounded by N, we can obviously bound the runtime by O(N2).
Similarly, to check the sum of the weights, we can traverse all of E′ again. For each element in each Ei′, we can check in O(N) time whether the element is in I or not, which lets us compute the weight for each subset Ei′. So, again, we get a bound of O(N2).
Combined, this bounds our overall checking runtime by O(N2), which is polynomial.
(Again, we emphasize that we do NOT expect you to write out such a detailed solution. We provided this just so you can see that it’s possible to account for things carefully, and so you can convince yourself it really is polynomial if it wasn’t obvious to you.)
6
2. [12 marks] Complete the proof that SUPERSPREAD is NP-complete by providing a reduction from the SET COVER problem to SUPERSPREAD, and proving your reduction correct. The SET COVER problem is defined in your textbook in Section 8.1 as follows:
Given a set U of n elements, a collection S1, . . . , Sm of subsets of U, and a number k, does there exist a collection of at most k of these sets whose union is equal to all of U?
Solution: A typical reduction/proof like this will consist of four parts: (1) explaining the reduction, (2) proving that if the original instance is a “yes” instance, then the instance produced by the reduction is also a “yes” instance (for the different problem), (3) proving that if the instance proved by the reduction is a “yes” instance, then the original instance was a “yes” instance, and (4) explaining that the reduction is polynomial time.
(1) (We’ll first give some intuition, but this wouldn’t be formally part of the answer.) SUPERSPREAD looks a lot like SET COVER, so the reduction shouldn’t be too complicated. In SUPERSPREAD, we have to cover the set I, which is a subset of the overall population P; in SET COVER, we have to cover the entire set U. In SUPERSPREAD, we do the covering with events Ei, which are subsets of P that can contain members who are in or not in I; in SET COVER, we do the covering with the subsets Si, which are subsets of U, all of which we have to cover. In SUPERSPREAD, we add up the weights of the subsets Ei′ and make sure the sum is less than or equal to k; in SET COVER, we count the subsets we use in the cover, and make sure that we use at most k subsets. Note that this is equivalent to giving each subset weight 1 and adding up the “weights”. So, to reduce a SET COVER instance into a SUPERSPREAD instance, it seems like we’d want to make I = U (since we have to cover all of U), and then somehow fiddle with the subsets Ei′ to make their weights equal to 1. And that’s exactly what we’ll do. Here’s the formal reduction:
Given an instance of SET COVER with set U with n elements, subsets S1, . . . , Sm of U, and a number k, we construct an instance of SUPERSPREAD as follows: Let P = U ∪ V , where the set V contains n new elements, in 1-1 correspondance with the elements in U, i.e., if U = {u1,…,un}, then let V = {v1,…,vn}, with each vi considered to be the “match” to ui. Next, let I = U. Now, create m subsets E1,…,Em of P as follows: Let each Ei = Si ∪ Ti, where Ti = {vj | uj ∈ Si}, i.e., Ei is just Si unioned with a set containing the corresponding elements to Si taken from V . Notice that this makes the weight(Ei) = 1 for all i, because each Ei has equal numbers of elements from U (which is equal to I) and from V (which are not in I). The bound for our SUPERSPREAD instance will be the same bound k as in the SET COVER instance.
(2) If the SET COVER instance is a “yes” instance, then there are k of the subsets from S1, . . . , Sm, whose union is equal to all of U. Without loss of generality, let those k subsets be S1,…,Sk. (If they weren’t, we could renumber the subsets.) Now, let E′ = {E1, . . . , Ek}. Since each subset Ei includes all of Si, and the union i∈1,…,k Si = U = I then we have:
I⊆ Ei′. E i′ ∈ E ′
And since the weight of each Ei′ is 1, then the sum: weight(Ei′)= 1=k≤k.
Ei′ ∈E ′ i∈1,…,k
Therefore, E′ is a witness that the SUPERSPREAD instance is a “yes” instance.
(3) If the SUPERSPREAD instance is a “yes” instance, then there is a subset E′ ⊆ E, such that
I⊆ Ei′ E i′ ∈ E ′
7
and
weight(Ei′) ≤ k.
E i′ ∈ E ′
Recall that the reduction guarantees that the weight of any Ei is equal to 1, so the fact that the sum of the weights ≤ k tells us that there are at most k subsets in E′. Now, for each Ei′ ∈ E′, select the correponding subset in the SET COVER instance (e.g., you can compute it by intersecting the set with U). As noted there are at most k selected from the original collection S1,…,Sm of subsets of U. And since the union of the Ei′ ∈ E′ covers all of I, and I = U, the union of the selected subsets of U also covers all of U. Thus, we have a solution to the SET COVER instance.
(4) The reduction is obviously polynomial time (linear time, really), as we are essentially making two copies of the SET COVER instance.
8