Greedy choice: what choice am I making at each step?
For the activity scheduling:
Greedy choice: earliest finish time
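As a point of reference, here is a minimal Python sketch of that greedy (the function name and the (start, finish) tuple representation are my own; activities are half-open intervals [start, finish), as in the quick example later):

def select_activities(activities):
    # activities: list of (start, finish) pairs; intervals are half-open [start, finish)
    # Greedy choice: consider activities in order of earliest finish time.
    ordered = sorted(activities, key=lambda a: a[1])
    chosen = []
    last_finish = float("-inf")
    for start, finish in ordered:
        # Keep an activity only if it doesn't overlap the last one we kept.
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen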
Let S be the result of the greedy algorithm.
Let OPT be an optimal solution.
Promising: S and OPT agree on whether to include each of the first i activities.
Invariant: S is promising.
S_0 is what we have before the greedy algorithm does anything.
S_1 is what we have after iteration 1.
S_2 is what we have after iteration 2.
…
S_n is what we have after iteration n (the final result of our greedy algorithm).
Base case: show that S is promising before the first iteration,
aka show that S_0 is promising.
Promising here means that we agree with OPT on whether or not to include each of the first 0 activities.
We aren’t considering any activities. So this is trivially true… no way to disagree on zero activities!
Inductive step:
Suppose i >= 0 and our invariant is true:
S_i is promising, meaning that we agree with OPT on whether or not to include each of the first i activities.
Now we have to prove that S_{i+1} is promising.
aka we have to agree with OPT also on activity a_{i+1}.
There are a few cases here…
1. Greedy doesn’t include activity a_{i+1}. aka S doesn’t use it.
Then, OPT is also not allowed to use a_{i+1}. Why?
We can argue that OPT can’t possibly include a_{i+1}.
What has to happen for the greedy algorithm NOT to use a_{i+1} in its solution?
a_{i+1} has to overlap something that we previously added.
What does this mean about OPT?
Remember that we agree with OPT on inclusion/exclusion of the first i activities.
If a_{i+1} conflicts with something in S_i, then it must conflict with something in OPT too. So OPT can’t include it: OPT is a correct solution, so it definitely has no overlaps.
Note: we don’t need to prove the case that S doesn’t include it and OPT does include it, because as we just argued this cannot happen.
2. Greedy _does_ use a_{i+1}.
We know that a_{i+1} therefore doesn’t overlap anything that we’ve already added to S.
We have to show that we agree with an optimal solution that includes this activity.
Two subcases:
2A. OPT already includes a_{i+1}. Done. S already agrees with OPT.
2B. Of course, it’s possible that OPT doesn’t use a_{i+1}.
We have to argue that OPT _could have_ used it.
High-level view:
- we’re going to show that there’s another optimal solution that does use a_{i+1}
- then we’re going to conclude that greedy agrees with this other optimal solution
Claim: we cannot simply add a_{i+1} to OPT.
Why can’t we do this?
If we add a_{i+1} to OPT, what must happen? It must introduce some conflict into OPT. Otherwise, if we could add an activity to OPT without conflict, then OPT wasn’t optimal after all, because we just improved it!
Next question is this:
what does a_{i+1} conflict with in OPT?
We know that a_{i+1} can’t conflict with any of the a_1, a_2, a_3, …, a_i that are in OPT.
Why? Our greedy algorithm adds a_{i+1}, so it cannot conflict with any activity previously added to S_i. And by the inductive hypothesis, the activities among a_1, …, a_i that OPT includes are exactly the ones in S_i!
So, a_{i+1} must conflict with something from a_{i+2}, a_{i+3}, …, in OPT
Remember that we sorted by finish time:
f_1 <= f_2 <= f_3 <= ... <= f_n
What do we know about the activities in OPT that conflict with a_{i+1}?
We know that the conflicting ones finish no earlier than a_{i+1} does (they come after it in the sorted order).
Claim: there is exactly one activity in OPT that conflicts with a_{i+1}.
Why?
Each activity in OPT that conflicts with a_{i+1} must start before a_{i+1} finishes and finish no earlier than a_{i+1} finishes.
Can we have two activities that do this?
No: both would contain the moment just before a_{i+1}’s finish time, so they would overlap each other... but again, OPT has no overlaps.
Call this conflicting activity a_j.
Now let's do this:
take OPT, add a_{i+1}, and remove a_j. Call the result OPT2.
OPT2 has the same number of activities as OPT.
Further, OPT2 has no overlaps (a_j was the only activity in OPT conflicting with a_{i+1}), so OPT2 is also optimal.
Now, the greedy algorithm uses a_{i+1}, _AND_ OPT2 uses a_{i+1}.
This means greedy agrees with OPT2 on all choices up to and including a_{i+1}; from here we carry the induction forward with OPT2 playing the role of OPT.
Quick example... Imagine that we have these activities:
[1, 4) [3, 5) [8, 10)
Greedy produces [1, 4), [8, 10)
OPT could be [3, 5), [8, 10)
If we swap [3, 5) for [1, 4) in OPT, we get another optimal solution that agrees with greedy.
There are multiple possible optimals.
All we have to do is agree with one of them.
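As a quick check, assuming the select_activities sketch from earlier:

activities = [(1, 4), (3, 5), (8, 10)]
print(select_activities(activities))   # [(1, 4), (8, 10)], the greedy answer above

# One optimal solution is [(3, 5), (8, 10)]; swapping (3, 5) for (1, 4) gives
# another optimal solution of the same size that agrees with greedy.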
By induction, S_n is promising: the final greedy result S agrees with some optimal solution on every activity, so S is itself optimal. This completes the proof that our greedy algorithm is correct.
-=-=-=-=-=-=-=-=-=-
What is the MST of this graph?
1--2 [weight 4]
2--3 [weight 5]
1--3 [weight 6]
1--4 [weight 7]
2--4 [weight 2]
3--5 [weight 10]
{{2, 4}, {1, 2}, {2, 3}, {3, 5}} (total weight 2 + 4 + 5 + 10 = 21)
Prove that Kruskal is correct.
Greedy choice: consider edges in increasing order of weight, from minimum weight to maximum weight, and add each edge unless it would create a cycle.
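A minimal Python sketch of Kruskal along those lines (the kruskal name, the (weight, u, v) edge representation, and the union-find helper are my own choices, not anything fixed by the proof):

def kruskal(n, edges):
    # edges: list of (weight, u, v) with vertices 0 .. n-1; returns the MST edges.
    parent = list(range(n))                # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    # Greedy choice: consider edges from minimum weight to maximum weight.
    for weight, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # different components, so no cycle is formed
            parent[ru] = rv                # union the two components
            tree.append((u, v, weight))
    return tree

# The example graph from above, with vertices 1..5 renumbered to 0..4:
edges = [(4, 0, 1), (5, 1, 2), (6, 0, 2), (7, 0, 3), (2, 1, 3), (10, 2, 4)]
print(kruskal(5, edges))   # [(1, 3, 2), (0, 1, 4), (1, 2, 5), (2, 4, 10)]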
Let T be the result of Kruskal.
Let OPT be an optimal solution: a spanning tree that minimizes the sum of the weights of its edges.
Promising: T and OPT agree on whether or not to include each of the first i edges.
Base case: you have to prove that T and OPT agree on whether to use each of the first 0 edges.
Nothing can be wrong yet.
Inductive step:
Suppose i >= 0 and T_i is promising.
Cases:
1. Kruskal doesn’t include e_{i+1}
If Kruskal doesn’t include edge i+1, then edge i+1 must cause a cycle with the edges already added.
But OPT agrees with all of those previously-added edges, and OPT can’t have any cycles (OPT is a valid MST), so OPT also can’t use edge i+1.
So greedy agrees with OPT on edge i+1.
2. Kruskal does use edge i+1.
2A. OPT already uses edge i+1. Done.
2B. OPT doesn’t use edge i+1.
We’re finally going to use the sorting.
We can’t add edge i+1 to OPT.
Why?
OPT is already a spanning tree. If we add an edge, it can’t be a spanning tree anymore.
If we add it, we form a cycle in OPT. An MST doesn’t have cycles.
If we add edge i+1 to OPT, then we have to remove some edge from OPT to eliminate the cycle.
A first (wrong) idea: remove any edge on that cycle…
What could go wrong if we do that?
Maybe this removes an edge that’s also in T_i. This breaks the invariant.
What we have to do is find an edge in OPT on that cycle that isn’t in T_i.
Call that edge j.
If we can do this next thing:
Take OPT, add edge i+1, remove edge j. Call the result OPT2.
then we’re done because we have a new MST whose total weight is no worse than OPT (remember the sorting of edges). But can we do that?
If every edge on the cycle is already in T_i, then adding edge i+1 to T_i would form a cycle. In this case, Kruskal would _not_ add edge i+1!
OPT agrees with Kruskal on what to do with the first i edges so, similarly, the cycle in OPT cannot consist solely of edges among the first i+1 edges.
So there must be some edge j > i + 1 in OPT that is on this cycle but not in T_i.
Edge j comes after edge i+1 in the sorted order, so its weight is at least the weight of edge i+1. That means OPT2’s total weight is at most OPT’s, and OPT2 is still a spanning tree (we removed an edge from the one cycle we created). So OPT2 is optimal, and therefore we agree with OPT2 on whether to include each of the first i+1 edges.
Can we really have multiple optimal MSTs?
Yes.
Imagine we had this graph:
a--b [weight 1]
a--c [weight 1]
b--c [weight 1]
Possible MSTs:
{a--b, a--c}
{a--b, b--c}
{a--c, b--c}
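As a quick check with the kruskal sketch from earlier (a, b, c mapped to 0, 1, 2), which of the three MSTs comes out depends only on how the sort breaks ties:

triangle = [(1, 0, 1), (1, 0, 2), (1, 1, 2)]   # a--b, a--c, b--c, all weight 1
print(kruskal(3, triangle))   # [(0, 1, 1), (0, 2, 1)], i.e. {a--b, a--c}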