CSE 101 Exam 2 Review
CSE 101 Exam 3 Review
Greedy Algorithms (Ch 5)
• Basics
• Change making
• Interval scheduling
• Exchange arguments
• Optimal caching
• Huffman codes
• Minimum spanning trees
Greedy Algorithms
General Algorithmic Technique:
1. Find decision criterion
2. Make best choice according to criterion
3. Repeat until done
Surprisingly, this sometimes works.
Things to Keep in Mind about Greedy Algorithms
• Algorithms are very natural and easy to write
down.
• However, not all greedy algorithms work.
• Proving correctness is important.
Interval Scheduling
Problem: Given a collection C of intervals, find a
subset S ⊆ C so that:
1. No two intervals in S overlap.
2. Subject to (1), |S| is as large as possible.
Algorithm
IntervalScheduling(C)
S ← {}
While(some interval in C
doesn’t overlap any in S)
Let J be the non-overlapping
interval with smallest max
Add J to S
Return S
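The greedy rule above can be sketched in Python. Sorting by right endpoint turns "non-overlapping interval with smallest max" into a single left-to-right scan; intervals are assumed to be closed [start, end] pairs, so touching endpoints count as overlapping.

```python
def interval_scheduling(intervals):
    """Greedy: repeatedly take the compatible interval with the
    smallest right endpoint.  Intervals are closed (start, end) pairs."""
    chosen = []
    last_end = float("-inf")
    # After sorting by right endpoint, the first compatible interval we
    # meet is exactly the non-overlapping interval with smallest max.
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start > last_end:  # strict: touching endpoints overlap
            chosen.append((start, end))
            last_end = end
    return chosen
```

For example, `interval_scheduling([(0,3),(2,5),(4,7),(6,9),(8,10)])` picks (0,3), (4,7), (8,10).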
Proof of Correctness
• Algorithm produces J1, J2,…,Js with Ji = [xi,yi].
• Consider some other solution
K1, K2,…,Kt with Ki = [wi,zi].
Claim: For each m ≤ t, the algorithm produces an m-th interval, and ym ≤ zm.
In particular, s ≥ t.
Proof of Claim
Use Induction on m.
Base Case: m = 1.
J1 is the interval with smallest max, so y1 ≤ z1.
Inductive Step: Assume ym ≤ zm.
• Jm+1 has smallest y for any [x,y] with x > ym.
• Km+1 = [wm+1, zm+1] has
wm+1 > zm ≥ ym
• Therefore, ym+1 ≤ zm+1.
Exchange Argument
• Greedy algorithm makes a sequence of decisions D1,
D2, D3,…,Dn eventually reaching solution G.
• Need to show that for an arbitrary solution A, G ≥ A.
• Find sequence of solutions
A=A0, A1, A2,…,An = G
so that:
– Ai ≤ Ai+1
– Ai agrees with D1,D2,…,Di
Exchange Argument
In particular, we need to show that given any Ai
consistent with D1,…,Di we can find an Ai+1 so
that:
• Ai+1 is consistent with D1,…,Di+1
• Ai+1 ≥ Ai
Then we inductively construct sequence
A=A0 ≤ A1 ≤ A2 ≤ … ≤ An = G
Thus, G ≥ A for any A. So G is optimal.
Model
• k words in cache at a time.
• CPU asks for memory access.
• If in cache already, easy.
• Otherwise, need to load into cache replacing
something else, slow.
Optimal Caching
Problem: Given sequence of memory accesses
and cache size, find a cache schedule that
involves fewest possible number of swaps with
disk.
CPU:     A B A C A D E C B C A C
Cache 1: A A A A A A A C C C C C
Cache 2: – B B C C D E E B B A A

8 cache misses.
Furthest In The Future (FITF)
• For each cell consider the next time that
memory location will be called for.
• Replace cell whose next call is the furthest in
the future.
Example: the cache holds {A, B, C} and the upcoming accesses are X A Y A B B X C. On the miss for X, the next use of A comes first, then B, then C; C's next call is furthest in the future, so FITF evicts C, leaving {A, B, X}.
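A direct simulation of FITF in Python (a sketch: the cache is a set of memory locations, and an item never requested again counts as infinitely far in the future):

```python
def fitf_misses(accesses, k):
    """Simulate Furthest-In-The-Future caching with cache size k;
    return the number of cache misses."""
    def next_use(y, i):
        # Next time y is requested after step i (infinity if never).
        for j in range(i + 1, len(accesses)):
            if accesses[j] == y:
                return j
        return float("inf")

    cache, misses = set(), 0
    for i, x in enumerate(accesses):
        if x in cache:
            continue
        misses += 1
        if len(cache) == k:
            # Evict the item whose next call is furthest in the future.
            cache.remove(max(cache, key=lambda y: next_use(y, i)))
        cache.add(x)
    return misses
```

On the access string A B A C A D E C B C A C with k = 2 it incurs 7 misses, one fewer than the 8-miss schedule above.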
Proof of Optimality
• Exchange argument
• nth decision: What to do at nth time step.
• Given schedule S that agrees with FITF for first
n time steps, create schedule S’ that agrees for
n+1 and has no more cache misses.
Case 1: S agrees with FITF on step n+1
Nothing to do. S’ = S.
Case 2: S Makes Unnecessary Replacement
If S replaces some element of memory that was not immediately called for, put it off.
Example: with cache {A, B, C} and next access A (a hit), S swaps in X anyway; S' keeps {A, B, C} through the access and performs the swap to X afterwards, with no additional misses.
Can assume that S only replaces elements if there's a cache miss.
Case 3
The remaining case is that there is a cache miss
at step n+1 and that S replaces the wrong
thing.
Example: the cache holds {A, B, C} and X is requested (a miss). S evicts A, leaving {X, B, C}; FITF evicts B, leaving {A, X, C}.
Case 3a: S throws out B before using it
Suppose S (cache {X, B, C}) evicts B, to load some element Z, before B is next used. Let S' track S exactly, except that S' holds A in the cell where S holds B. When S evicts B to bring in Z, S' evicts A to bring in Z instead. From that step on the two caches agree, so S' agrees with FITF at step n+1 and has no more misses than S.
Case 3b: S keeps B until it is used
• B is FITF, so A is used sometime before B.
• Since S evicted A, S must load A into some other cell (evicting some Z) before that use.
• Let S' keep A in place instead; at the step where S reloads A, S' evicts B, and S' reloads B when B is finally called for.
Instead of replacing A and then bringing it back, we can replace B and then bring it back. So S' has no more misses than S.
Huffman Codes
Definition: An encoding is prefix-free if the
encoding of no letter is a prefix of the
encoding of any other.
Lemma: Any prefix-free encoding can be
uniquely decoded.
Optimal Encoding
Problem: Given a string, S, find a prefix-free
encoding that encodes S using the fewest
number of bits.
How Long is the Encoding?
If for each letter x in our string, x appears f(x)
times and if we encode x as a string of length
ℓ(x), the total encoding length is:
Σ f(x)·ℓ(x).
Tree Representation
Can represent prefix-free encoding as a tree.
(Figure: a binary tree with leaves A, B, C, D and edges labeled 0 and 1.)
Letters are leaves.
Length of encoding =
Depth of leaf.
Siblings
• No matter what the tree structure, two of the
deepest leaves are siblings.
• By swapping leaves, can assume they are filled by the two least frequent elements.
• So: can assume that the two least frequent elements are siblings!
Algorithm
HuffmanTree(L)
While(at least two left)
x, y ← two least frequent
z ← new node, f(z) ← f(x)+f(y)
Make x and y children of z
Replace x and y with z in L
Return remaining elt of L
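A runnable version of the loop above, using heapq for the two-least-frequent extractions (a sketch: trees are nested (left, right) tuples, and a counter breaks frequency ties so the heap never compares a letter with an internal node):

```python
import heapq

def huffman_tree(freqs):
    """Build a Huffman tree from {letter: frequency}.  Leaves are
    letters; internal nodes are (left, right) tuples."""
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) >= 2:
        fx, _, x = heapq.heappop(heap)          # two least frequent
        fy, _, y = heapq.heappop(heap)
        count += 1
        heapq.heappush(heap, (fx + fy, count, (x, y)))  # f(z) = f(x)+f(y)
    return heap[0][2]                           # remaining element of L

def code_lengths(tree, depth=0):
    """Length of each letter's encoding = depth of its leaf."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    left, right = tree
    return {**code_lengths(left, depth + 1), **code_lengths(right, depth + 1)}
```

For frequencies a:5, b:2, c:1, d:1 this gives code lengths 1, 2, 3, 3, so the encoding uses Σ f(x)·ℓ(x) = 15 bits.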
Minimum Spanning Trees
Note: In this problem, you will never want to build more roads than necessary. This means you will never want to have a cycle.
Definition: A tree is a connected graph, with no cycles.
A spanning tree in a graph G, is a subset of the edges of
G that connect all vertices and have no cycles.
If G has weights, a minimum spanning tree is a
spanning tree whose total weight is as small as
possible.
Basic Facts about Trees
Lemma: For an undirected graph G, any two of
the below imply the third:
1. |E| = |V|-1
2. G is connected
3. G has no cycles
Corollary: If G is a tree, then |E| = |V|-1.
Greedy Idea
How do you make an MST?
• Try using the cheapest edges.
Proposition: In a graph G, let e be an edge of
lightest weight. Then there exists an MST of G
containing e. Furthermore, if e is the unique
lightest edge, then all MSTs contain e.
One Step
• After picking lightest edge, what then?
• Contract edge and repeat.
• Equivalent to picking lightest edge that
doesn’t form a cycle.
Algorithm
Kruskal(G)
T ← {}
While(|T| < |V|-1)
Find lightest edge e that
doesn’t create cycle with T
Add e to T
Return T
Optimized Kruskal
Kruskal(G)
Sort edges by weight
T ← {}
Create Union Find
For v ∈ V, New(v)
For (v,w) ∈ E in increasing order
If Rep(v) ≠ Rep(w)
Add (v,w) to T
Join(v,w)
Return T

Runtime: O(|E|log|E|)
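The optimized version can be sketched in Python, with a small path-compressed Union-Find playing the role of New/Rep/Join (an assumed edge format: (weight, u, v) triples over vertices 0…n-1):

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) with vertices 0..n-1.
    Returns the list of MST edges as (u, v, weight)."""
    parent = list(range(n))

    def rep(v):                      # representative, with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):    # increasing order of weight
        ru, rv = rep(u), rep(v)
        if ru != rv:                 # different components: no cycle
            tree.append((u, v, w))
            parent[ru] = rv          # Join
    return tree
```

Sorting dominates, giving the O(|E|log|E|) runtime above.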
Other Algorithms
There are many other ways to create MST
algorithms. Kruskal searches the whole graph
for light edges, but you can also grow from a
point.
Proposition: In a graph G, with vertex v, let e be
an edge of lightest weight adjacent to v. Then
there exists an MST of G containing e.
Furthermore, if e is the unique lightest edge,
then all MSTs contain e.
Prim’s Algorithm
Prim’s Algorithm: Add lightest edge that
connects v to a new vertex.
Implementation very similar to Dijkstra.
Prim’s Algorithm
Prim(G,w)
Pick vertex s \\ doesn’t matter which
For v ∈ V, b(v) ← ∞ \\ lightest edge into v
T ← {}, b(s) ← 0
Priority Queue Q, add all v with key=b(v)
While(Q not empty)
u ← DeleteMin(Q)
If u ≠ s, add (u,Prev(u)) to T
For (u,v) ∈ E
If w(u,v) < b(v)
b(v) ← w(u,v)
Prev(v) ← u
DecreaseKey(v)
Return T
Runtime: O(|V|log|V| + |E|), slightly better than Kruskal.
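A Python sketch of Prim's algorithm. Since heapq has no DecreaseKey, this uses lazy deletion (push duplicate entries, skip stale ones when popped), which gives O(|E|log|E|) rather than the bound above:

```python
import heapq

def prim(adj):
    """adj: {v: [(weight, neighbor), ...]}, undirected.
    Returns the MST as a list of (prev, v, weight) edges."""
    start = next(iter(adj))        # doesn't matter which vertex we start from
    in_tree = {start}
    tree = []
    heap = [(w, u, start) for w, u in adj[start]]
    heapq.heapify(heap)
    while heap:
        w, u, prev = heapq.heappop(heap)
        if u in in_tree:
            continue               # stale entry: u was already connected
        in_tree.add(u)
        tree.append((prev, u, w))  # the edge (u, Prev(u))
        for w2, v in adj[u]:
            if v not in in_tree:
                heapq.heappush(heap, (w2, v, u))
    return tree
```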
Analysis
At any stage, we have some set S of vertices connected to s. Find the cheapest edge connecting S to its complement Sᶜ.
Proposition: In a graph G, with a cut C, let e be
an edge of lightest weight crossing C. Then
there exists an MST of G containing e.
Furthermore, if e is the unique lightest edge,
then all MSTs contain e.
Dynamic Programming (Ch 6)
• Background and past examples
• Longest Common Subsequence
• Knapsack
• Chain Matrix Multiplication
• All-Pairs Shortest Paths
• Maximum Independent Sets of Trees
• Travelling Salesman
Dynamic Programming
Our final general algorithmic technique:
1. Break problem into smaller subproblems.
2. Find recursive formula solving one
subproblem in terms of simpler ones.
3. Tabulate answers and solve all subproblems.
Longest Common Subsequence
We say that a sequence is a common
subsequence of two others, if it is a
subsequence of both.
Problem: Given two sequences, compute the longest common subsequence. That is, the common subsequence with as many letters as possible.
Case Analysis
How do we compute LCSS(A1A2…An, B1B2…Bm)?
Consider cases for the common subsequence:
1. It does not use An.
2. It does not use Bm.
3. It uses both An and Bm, and these characters are the same.
Recursion
On the other hand, the longest common subsequence must come from one of these cases. In particular, it will always be the one that gives the biggest result.
LCSS(A1A2…An, B1B2…Bm) =
Max(LCSS(A1A2…An-1, B1B2…Bm),
    LCSS(A1A2…An, B1B2…Bm-1),
    [LCSS(A1A2…An-1, B1B2…Bm-1)+1])
[where the last option is only allowed if An = Bm]
Recursion
LCSS(A1…n, B1…m) recurses on LCSS(A1…n-1, B1…m), LCSS(A1…n, B1…m-1), and LCSS(A1…n-1, B1…m-1); these in turn recurse on LCSS(A1…n-2, B1…m), LCSS(A1…n-2, B1…m-1), and so on.
Key Point: Subproblem reuse. We only ever see subproblems of the form LCSS(A1A2…Ak, B1B2…Bℓ).
Base Case
Our recursion also needs a base case.
In this case we have:
LCSS(∅, B1B2…Bm) = LCSS(A1A2…An, ∅) = 0.
Algorithm
LCSS(A1A2…An, B1B2…Bm)
Initialize Array T[0…n, 0…m]
\\ T[i,j] will store LCSS(A1A2…Ai, B1B2…Bj)
For i = 0 to n
  For j = 0 to m
    If (i = 0) OR (j = 0)
      T[i,j] ← 0
    Else If Ai = Bj
      T[i,j] ← max(T[i-1,j], T[i,j-1], T[i-1,j-1]+1)
    Else
      T[i,j] ← max(T[i-1,j], T[i,j-1])
Return T[n,m]
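The table-filling algorithm in Python (0-based strings, so A[i-1] plays the role of Ai; row 0 and column 0 are the base case):

```python
def lcss(A, B):
    """Length of the longest common subsequence of A and B."""
    n, m = len(A), len(B)
    # T[i][j] = LCSS of the first i letters of A and first j of B.
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if A[i - 1] == B[j - 1]:
                T[i][j] = max(T[i - 1][j], T[i][j - 1], T[i - 1][j - 1] + 1)
            else:
                T[i][j] = max(T[i - 1][j], T[i][j - 1])
    return T[n][m]
```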
Example

        ∅  A  AB  ABA  ABAC  ABACA
∅       0  0  0   0    0     0
A       0  1  1   1    1     1
AB      0  1  2   2    2     2
ABC     0  1  2   2    3     3
ABCB    0  1  2   2    3     3
ABCBA   0  1  2   3    3     4

Longest common subsequence of ABCBA and ABACA: the string ABCA (length 4).
Notes about DP
• General Correct Proof Outline:
– Prove by induction that each table entry is filled
out correctly
– Use base-case and recursion
• Runtime of DP:
– Usually [Number of subproblems] × [Time per subproblem]
More Notes about DP
• Finding Recursion
– Often look at first or last choice and see what
things look like without that choice
• Key point: Picking right subproblem
– Enough information stored to allow recursion
– Not too many
Knapsack
You have an available list of items. Each has a (non-negative integer) weight and a value. Your sack also has a capacity.
The goal is to find the collection of items so that:
1. The total weight of all the items is at most the capacity
2. Subject to 1, the total value is as large as
possible.
Variations
There are two slight variations of this problem:
1. Each item can be taken as many times as you
want.
2. Each item can be taken at most once.
Recursion
What is BestValue(C)?
Possibilities:
• No items in bag
– Value = 0
• Item i in bag
– Value = BestValue(C-weight(i)) + value(i)
Recursion: BestValue(C) =
Max(0, maxwt(i)≤C[val(i) + BestValue(C−wt(i))])
Algorithm
Knapsack(Wt,Val,Cap)
Create Array T[0…Cap]
For C = 0 to Cap
T[C] ← 0
For items i with Wt(i) ≤ C
If T[C] < Val(i)+T[C-Wt(i)]
T[C] ← Val(i)+T[C-Wt(i)]
Return T[Cap]
Runtime: O([Cap][#Items])
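The repeating-items recursion as a Python sketch (Wt and Val as parallel lists, T[C] filled in increasing order of capacity):

```python
def knapsack_rep(wt, val, cap):
    """Best total value with capacity cap; each item may repeat."""
    T = [0] * (cap + 1)            # T[C] = BestValue(C)
    for C in range(1, cap + 1):
        for i in range(len(wt)):
            if wt[i] <= C:         # item i fits: take it, recurse on the rest
                T[C] = max(T[C], val[i] + T[C - wt[i]])
    return T[cap]
```

For weights [2, 3], values [3, 4], capacity 7, the best is two copies of the first item plus one of the second: value 10.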
Non Repeating Items
BestValue≤k(Cap) = Highest total value of items
with total weight at most Cap using only items
from the first k.
Base Case: BestValue≤0(C) = 0
Recursion: BestValue≤k(C) is the maximum of
1. BestValue≤k-1(C)
2. BestValue≤k-1(C- Wt(k))+Val(k)
[where this is only used if Wt(k) ≤ C]
Runtime
• Number of Subproblems: O([Cap] [#items])
• Time per subproblem O(1)
– Only need to compare two options.
• Final runtime O([Cap][#items]).
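The non-repeating version keeps the extra index k, exactly as in the recursion above (a sketch; wt and val are 0-based lists, so item k is wt[k-1]):

```python
def knapsack_01(wt, val, cap):
    """T[k][C] = BestValue using only the first k items, capacity C."""
    n = len(wt)
    T = [[0] * (cap + 1) for _ in range(n + 1)]   # row 0: no items, value 0
    for k in range(1, n + 1):
        for C in range(cap + 1):
            T[k][C] = T[k - 1][C]                 # option 1: skip item k
            if wt[k - 1] <= C:                    # option 2: take it once
                T[k][C] = max(T[k][C], T[k - 1][C - wt[k - 1]] + val[k - 1])
    return T[n][cap]
```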
Chain Matrix Multiplication
Problem: Find the order to multiply matrices A1,
A2, A3,…,Am that requires the fewest total
operations.
Multiplying an (n×m) matrix by an (m×k) matrix takes nmk time.
Assume A1 is an n0×n1 matrix, A2 is n1×n2, and generally Ak is an nk-1×nk matrix.
Recursion
• We need to find a recursive formulation.
• Often we do this by considering the last step.
• For some value of k, last step:
(A1A2…Ak)·(Ak+1Ak+2…Am)
• Number of steps:
– CMM(A1,A2,…,Ak) to compute first product
– CMM(Ak+1,…,Am) to compute second product
– n0nknm to do final multiply
• Recursion CMM(A1,…,Am) =
mink[CMM(A1,…,Ak)+CMM(Ak+1,…,Am)+n0nknm]
Subproblems
• Only need subproblems C(i,j) = CMM(Ai,Ai+1,…,Aj) for 1 ≤ i ≤ j ≤ m.
– Fewer than m² total subproblems.
– Critical: Subproblem reuse.
Full Recursion
Base Case: C(i,i) = 0.
(With a single matrix, we don’t have to do
anything)
Recursive Step:
C(i,j) = mini≤k<j[C(i,k) + C(k+1,j) + ni-1nknj]
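A sketch of the resulting table-filling algorithm: dims = [n0, n1, …, nm], so Ai is dims[i-1] × dims[i], and subchains are filled in order of increasing length so every smaller subproblem is ready when needed:

```python
def cmm(dims):
    """Fewest operations to multiply A_1 ... A_m, where A_i is
    dims[i-1] x dims[i].  C[i][j] = cheapest cost for A_i ... A_j."""
    m = len(dims) - 1
    C = [[0] * (m + 1) for _ in range(m + 1)]   # C[i][i] = 0 (base case)
    for length in range(2, m + 1):              # subchain length j - i + 1
        for i in range(1, m - length + 2):
            j = i + length - 1
            # Try every last split (A_i..A_k)(A_k+1..A_j).
            C[i][j] = min(C[i][k] + C[k + 1][j]
                          + dims[i - 1] * dims[k] * dims[j]
                          for k in range(i, j))
    return C[1][m]
```

For dims [10, 20, 30, 40], multiplying (A1A2) first costs 6000 + 12000 = 18000, beating the 32000 of the other order.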
Runtime: O(|V|³log|V|)
Floyd-Warshall Algorithm
• Label vertices v1, v2, …, vn.
• Let dk(u,w) be the length of the shortest u-w
path using only v1, v2,…,vk as intermediate
vertices.
• Base Case: d0(u,w) = ℓ(u,w) if (u,w) ∈ E, 0 if u = w, and ∞ otherwise.
Recursion
Break into cases based on whether shortest path
uses vk.
• The shortest path not using vk has length
dk-1(u,w).
• The shortest path using vk has length
dk-1(u,vk)+dk-1(vk,w).
Algorithm
Base Case: d0(u,w) = ℓ(u,w) if (u,w) ∈ E, 0 if u = w, and ∞ otherwise.
Recursion: For each u, w compute:
dk(u,w) = min(dk-1(u,w), dk-1(u,vk) + dk-1(vk,w))
End Condition: d(u,w) = dn(u,w) where n = |V|.
Runtime: O(|V|³)
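A Python sketch of Floyd-Warshall (an assumed input format: edges as a dict from directed pairs (u, w) to lengths; the dk tables are updated in place, which is still correct since using vk as an endpoint and as an intermediate agree):

```python
def floyd_warshall(n, edges):
    """All-pairs shortest path lengths on vertices 0..n-1.
    edges: {(u, w): length} for directed edges."""
    INF = float("inf")
    # Base case d_0: edge lengths, 0 on the diagonal, infinity otherwise.
    d = [[0 if u == w else edges.get((u, w), INF) for w in range(n)]
         for u in range(n)]
    for k in range(n):              # now allow vertex k as an intermediate
        for u in range(n):
            for w in range(n):
                d[u][w] = min(d[u][w], d[u][k] + d[k][w])
    return d
```

Three nested loops over the vertices give the O(|V|³) runtime above.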
Maximum Independent Set
Definition: In an undirected graph G, an
independent set is a subset of the vertices of
G, no two of which are connected by an edge.
Problem: Given a graph
G compute the largest
possible size of an
independent set of G.
Call answer I(G).
Simple Recursion
Is vertex v in the independent set?
If not: Maximum independent
set is an independent set of G-v.
I(G) = I(G-v).
If so: Maximum independent set is
v plus an independent set of G-N(v).
I(G) = 1+I(G-N(v)).
Recursion: I(G) = max(I(G-v), 1+I(G-N(v)))
Independent Sets and Components
Lemma: If G has connected components
C1,C2,…,Ck then
I(G) = I(C1)+I(C2)+…+I(Ck).
Proof: An independent set for G is exactly the
union of an independent set for each of the Ci.
Can pick the biggest set for each Ci.
Recursion For Trees
Root not used:
I(G) = Σ I(children’s subtrees)
Root is used:
I(G) = 1+Σ I(grandchildren’s subtrees)
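An equivalent two-state formulation of the tree recursion is easier to code than tracking grandchildren: for each vertex keep the best value with that vertex excluded and included (a sketch, assuming the rooted tree is given as a child-list dict):

```python
def tree_mis(children, root):
    """Maximum independent set size of a rooted tree.
    children: {v: [child, ...]} (missing keys are leaves)."""
    def solve(v):
        excluded, included = 0, 1
        for c in children.get(v, []):
            c_ex, c_in = solve(c)
            excluded += max(c_ex, c_in)  # v not used: best of each child subtree
            included += c_ex             # v used: children must be excluded
        return excluded, included
    return max(solve(root))
```

Summing c_ex over children inside `included` is exactly "1 + Σ I(grandchildren's subtrees)" unrolled one level.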
Travelling Salesman
Problem: Given a weighted (undirected) graph G
with n vertices find a cycle that visits each
vertex exactly once whose total weight is as
small as possible.
Naïve Algorithm
• Try all possible paths and see which is
cheapest.
• How many paths?
– n possible options for first city.
– (n-1) possible options for second city.
– (n-2) for third city
– …
– Total n!
• Runtime ≈ n!
Dynamic Program
Bests,t,L(G) = cost of the best s-t path that uses exactly the vertices in L.
• The last edge is (v,t) ∈ E for some v ∈ L.
• Its cost is Bests,v,L−{t}(G) + ℓ(v,t).
Full Recursion:
Bests,t,L(G) = minv[Bests,v,L−{t}(G) + ℓ(v,t)].
Runtime Analysis
Number of Subproblems:
L can be any subset of vertices (2ⁿ possibilities)
s and t can be any vertices (n² possibilities)
n²2ⁿ total.
Time per Subproblem:
Need to check every v (O(n) time).
Final Runtime:
O(n³2ⁿ)
[can improve to O(n²2ⁿ) with a bit of thought]
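The recursion above (this is the Held-Karp algorithm) as a Python sketch: L is a bitmask, s is fixed to vertex 0 since a tour can start anywhere, and dist is assumed to be a complete distance matrix:

```python
from itertools import combinations

def held_karp(dist):
    """Minimum weight of a tour visiting every vertex once.
    best[(L, t)] = cheapest 0-to-t path using exactly the set L."""
    n = len(dist)
    best = {(1, 0): 0}                   # path on {0} ending at 0 costs 0
    for size in range(2, n + 1):
        for subset in combinations(range(1, n), size - 1):
            L = 1                        # bitmask for {0} ∪ subset
            for v in subset:
                L |= 1 << v
            for t in subset:
                prev = L ^ (1 << t)      # L - {t}
                # Last edge is (v, t) for some v already on the path.
                best[(L, t)] = min(best[(prev, v)] + dist[v][t]
                                   for v in range(n) if (prev, v) in best)
    full = (1 << n) - 1
    # Close the cycle back to vertex 0.
    return min(best[(full, t)] + dist[t][0] for t in range(1, n))
```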