[This is a cleaned-up composite of what was covered in both lecture sections. Each week, I plan to streamline the lectures into cleaned-up notes.]
Greedy Algorithms
Copyright By PowCoder代写 加微信 powcoder
Here’s a problem where a greedy algorithm works:
making change in Canada. What is the minimum number of coins for a given amount of money?
e.g. Someone wants the minimum number of coins for 55 cents.
We keep giving them the biggest possible coin.
This is greedy.
What’s interesting: you can have coin denominations where this doesn’t work.
General principle: if a greedy algorithm works, and you make some small change to a problem, then greedy might not work anymore.
Pros/cons of greedy:
Pro: very fast; think of runtimes like O(n log n)
Con: they are often not correct
Con: even if it works, if you change the problem slightly then greedy might not work anymore
Con: if it does work, it can be hard to prove
Greedy choice for the coin-changing problem is: largest coin first.
If we do coin-changing using the greedy choice “smallest coin first”, would that work?
Smallest coin first: 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
———-
Input: activities a1, a2, …, an
Each activity has start time si and finish time fi
We want to schedule as many activities as possible with no overlap.
Start by listing some possible greedy choices.
A. Schedule activities in order from least number of conflicts to most number of conflicts.
Intuitively: If you have an activity that doesn’t overlap too many other activities, maybe it’s good to schedule it.
Let’s say that we have four activities:
[2, 5), [5, 8), [12, 15), [0, 100]
The thought is that [0, 100) is not a good one to schedule, because it overlaps many other activities.
B. Schedule activities by earliest start time.
Why is this a bad idea?
Let’s have the simplest counterexample for why this is wrong.
[1, 2), [2, 3), [0, 3)
It will take [0, 3) first. Algorithm only gets 1 activity.
But the best algorithm would have given us 2.
C. Schedule by earliest finish time.
D. Schedule by shortest duration first.
This is wrong. Give me activities that proves that this algorithm is wrong.
[2, 4), [0, 3), [3, 6)
Scheduling [2, 4) screws up the other two activities. The greedy schedules 1 activity, but the best algorithm schedules 2.
E. Schedule by latest start time first.
We have three contenders remaining:
1. Schedule by earliest finish time first. Correct
(We will prove this one)
2. Schedule by latest start time first. Correct
(We won’t be proving this one in class, but you can do it if you like)
3. Schedule from least number of conflicts to most number of conflicts.
This algorithm is incorrect.
Counterexamples are harder to find, but they do exist. I came up with one with 9 activities that can lead to incorrect execution. Can you find a counterexample?
———-
We’re left with two correct algorithms: schedule by earliest finish time and schedule by latest start time. We’ll prove earliest finish time.
The pseudocode of our greedy algorithm:
sort activities by finish time: f_1 <= f_2 <= ... <= f_n
f = 0 # finish time of last activity added to S
for i in [1,2,...,n]:
if s_i >= f:
S = S + {a_i}
I’m now going to structure the proof for showing that this greedy algorithm is correct.
At a high level, the proof is by induction.
There is some optimal solution out there. Let’s call it OPT
OPT has the largest number of possible activities.
Let S be the result of the greedy algorithm.
Maybe S is optimal too, maybe it isn’t.
All we know right now is that S can’t have more activities than OPT. Why? Because OPT is by definition the best.
If our solution is as good as OPT, we’re done.
Greedy choice: earliest finish time first. We’ll prove that this is correct
It might seem to make some sense to try to agree with OPT on each step. But no: we can’t do this. It is too restrictive…
Let’s say that we have these activities:
[0, 4), [1, 3)
What are the optimal solutions? There are two: [0, 4) by itself, and [1, 3) by itself
We have to be able to match OPT exactly, or match another optimal exactly. There can be multiple different optimal solutions, and we have to match one of them.
We’re going to show that at each step, S agrees with _some_ optimal, not necessarily OPT.
You have to define what it means for your greedy solution to agree with OPT. This determines what it means for a greedy algorithm to be “promising”.
In this problem, what it means for a solution to be promising is that it agrees with OPT on whether or not to schedule each of the first i activities.
e.g. if greedy and OPT both include activity 4, then they agree on activity 4.
e.g. if greedy and OPT both exclude activity 4, then they agree on activity 4.
e.g. if one of them includes activity 4 but the other excludes it… they don’t agree on activity 4!
If we show that greedy is promising up to and including iteration n, then our solution is optimal.
It’s a proof by induction, so we have a base case and inductive case.
Base case: i = 0.
What do we have to prove here?
We have to show that our first 0 choices of activities agree with what is in OPT.
What is S at this point? {}. S is the empty set
We have to show that everything in S is also in OPT. True
OPT could be anything, but that’s fine. It doesn’t matter what it is.
We can’t be wrong on any of our greedy choices, because we haven’t made any choices yet.
Inductive case:
We assume that greedy is correct up to and including iteration i, and we have to prove that it is still correct after iteration i+1.
Let S_0 be what greedy produces before any iterations.
S_1 is what greedy has produced after iteration 1.
S_2 is what greedy has produced after iteration 2.
S_n is what greedy has produced after iteration n.
IH (inductive hypothesis): S_i and OPT agree on whether to include or exclude each of the first i activities.
We have to prove that S_{i+1} and some optimal agree on whether to include or exclude activity i+1.
On iteration i+1, the greedy algorithm has a choice to make. What’s the choice?
What are the options for the greedy algorithm? It has two: include the activity or exclude the activity.
Remember that the greedy algorithm we’re proving is “earliest finish time first”.
This means that
f1 <= f2 <= ... <= fn
We're going to expand on this next week, but here's a preview:
There are four different cases for us to consider; i.e. four subproofs.
1. Greedy includes activity i+1, and so does OPT.
2. Greedy excludes activity i+1, and so does OPT.
3. Greedy includes activity i+1, but OPT excludes it.
4. Greedy excludes activity i+1, but OPT includes it.
No matter which case we are in, we have to show that there is an optimal that makes the same choice.
... stay tuned!