CSC373
Week 3: Dynamic Programming
Nisarg Shah
Recap
• Greedy Algorithms
➢ Interval scheduling
➢ Interval partitioning
➢ Minimizing lateness
➢ Huffman encoding
➢ …
Jeff Erickson on greedy algorithms…
The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was secretary of Defense, and he actually had a pathological fear and hatred of the word ‘research’. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term ‘research’ in his presence. You can imagine how he felt, then, about the term ‘mathematical’. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose?
— Richard Bellman, on the origin of his term ‘dynamic programming’ (1984)
Richard Bellman’s quote from Jeff Erickson’s book
Dynamic Programming
• Outline
➢ Break the problem down into simpler subproblems, solve each subproblem just once, and store the solutions.
➢ The next time the same subproblem occurs, instead of recomputing its solution, simply look up its previously computed solution.
➢ Hopefully, we save a lot of computation at the expense of a modest increase in storage space.
➢ This caching of solutions is also called "memoization" (a minimal sketch follows below)
• How is this different from divide & conquer?
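For concreteness, here is a minimal Python sketch of memoization on a toy recurrence (the Fibonacci numbers, which reappear later in this lecture); the function names are just for illustration:

```python
from functools import lru_cache

def fib_naive(n):
    # Naive recursion: recomputes the same subproblems exponentially often.
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Memoized recursion: each subproblem is solved once and looked up afterwards.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

# fib_naive(35) makes millions of recursive calls; fib_memo(35) makes only O(n) calls.
```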
Weighted Interval Scheduling
• Problem
➢ Job 𝑗 starts at time 𝑠_𝑗 and finishes at time 𝑓_𝑗
➢ Each job 𝑗 has a weight 𝑤_𝑗
➢ Two jobs are compatible if they don't overlap
➢ Goal: find a set 𝑆 of mutually compatible jobs with highest total weight Σ_{𝑗∈𝑆} 𝑤_𝑗
• Recall: If all 𝑤_𝑗 = 1, then this is simply the interval scheduling problem from last week
➢ Greedy algorithm based on earliest finish time ordering was optimal for this case
Recall: Interval Scheduling
• What if we simply try to use the earliest-finish-time greedy algorithm again?
➢ Fails spectacularly!
Weighted Interval Scheduling
• What if we use other orderings?
➢ By weight: choose jobs with highest 𝑤_𝑗 first
➢ Maximum weight per unit time: choose jobs with highest 𝑤_𝑗/(𝑓_𝑗 − 𝑠_𝑗) first
➢ …
• None of them work!
➢ They can be arbitrarily worse than the optimal solution
➢ In fact, under a certain formalization, "no greedy algorithm" can produce any "decent approximation" in the worst case (beyond this course!)
Weighted Interval Scheduling
• Convention
➢ Jobs are sorted by finish time: 𝑓_1 ≤ 𝑓_2 ≤ ⋯ ≤ 𝑓_𝑛
➢ 𝑝(𝑗) = largest index 𝑖 < 𝑗 such that job 𝑖 is compatible with job 𝑗 (i.e. 𝑓_𝑖 < 𝑠_𝑗)
o Among jobs before job 𝑗, the ones compatible with it are precisely 1, … , 𝑝(𝑗)
➢ E.g. 𝑝[8] = 1, 𝑝[7] = 3, 𝑝[2] = 0
Weighted Interval Scheduling
• The DP approach
➢ Let OPT be an optimal solution
➢ Two options regarding job 𝑛:
o Option 1: Job 𝑛 is in OPT
• Can't use the incompatible jobs 𝑝(𝑛) + 1, … , 𝑛 − 1
• Must select an optimal subset of jobs from {1, … , 𝑝(𝑛)}
o Option 2: Job 𝑛 is not in OPT
• Must select an optimal subset of jobs from {1, … , 𝑛 − 1}
➢ OPT is the better of the two options
➢ Notice that in both options, we need to solve the problem on a prefix of our ordering
Weighted Interval Scheduling
• The DP approach
➢ 𝑂𝑃𝑇(𝑗) = max total weight of compatible jobs from {1, … , 𝑗}
➢ Base case: 𝑂𝑃𝑇(0) = 0
➢ Two cases regarding job 𝑗:
o Job 𝑗 is selected: optimal weight is 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗))
o Job 𝑗 is not selected: optimal weight is 𝑂𝑃𝑇(𝑗 − 1)
➢ Bellman equation:

𝑂𝑃𝑇(𝑗) = 0                                            if 𝑗 = 0
𝑂𝑃𝑇(𝑗) = max{ 𝑂𝑃𝑇(𝑗 − 1), 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗)) }           if 𝑗 > 0
Brute Force Solution
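A minimal Python sketch of the brute-force recursion COMPUTE-OPT, following the Bellman equation above; it assumes 1-indexed arrays w[1..n] and p[1..n] have been precomputed:

```python
def compute_opt(j, w, p):
    """Brute-force recursion for weighted interval scheduling.
    w[1..n] are the job weights (w[0] unused); p[j] is the largest index i < j
    whose job is compatible with job j (0 if none)."""
    if j == 0:
        return 0
    return max(compute_opt(j - 1, w, p),          # job j not selected
               w[j] + compute_opt(p[j], w, p))    # job j selected
```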
Brute Force Solution
• Q: Worst-case running time of COMPUTE-OPT(𝑛)?
a) Θ(𝑛)
b) Θ(𝑛 log 𝑛)
c) Θ(1.618^𝑛)
d) Θ(2^𝑛)
Brute Force Solution
• Brute force running time
➢ It is possible that 𝑝(𝑗) = 𝑗 − 1 for each 𝑗
➢ Calling COMPUTE-OPT(𝑗 − 1) and COMPUTE-OPT(𝑝[𝑗]) separately would then take 2^𝑛 steps
➢ We can slightly optimize:
o If 𝑝(𝑗) = 𝑗 − 1, call it just once; else call them separately
o Now, the worst case is when 𝑝(𝑗) = 𝑗 − 2 for each 𝑗
o Running time: 𝑇(𝑛) = 𝑇(𝑛 − 1) + 𝑇(𝑛 − 2)
• Fibonacci, golden ratio, … ☺
• 𝑇(𝑛) = 𝑂(𝜑^𝑛), where 𝜑 ≈ 1.618
Dynamic Programming
• Why is the runtime high?
➢ Some solutions are being computed many, many times
o E.g. if 𝑝[5] = 3, then COMPUTE-OPT(5) calls COMPUTE-OPT(4) and COMPUTE-OPT(3)
o But COMPUTE-OPT(4) in turn calls COMPUTE-OPT(3) again
• Memoization trick
➢ Simply remember what you've already computed, and re-use the answer if needed in future
Dynamic Program: Top-Down
• Let's store COMPUTE-OPT(𝑗) in 𝑀[𝑗]
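A minimal Python sketch of this top-down (memoized) version; the dictionary M plays the role of the table above, and w, p are assumed precomputed as before:

```python
def m_compute_opt(j, w, p, M):
    """Top-down DP: M caches the answer for each j, so every subproblem
    is solved at most once and then looked up."""
    if j not in M:
        M[j] = max(m_compute_opt(j - 1, w, p, M),
                   w[j] + m_compute_opt(p[j], w, p, M))
    return M[j]

# Usage: M = {0: 0}; optimal_value = m_compute_opt(n, w, p, M)
```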
Dynamic Program: Top-Down
• Claim: This memoized version takes 𝑂(𝑛 log 𝑛) time
➢ Sorting jobs by finish time takes 𝑂(𝑛 log 𝑛)
➢ It also takes 𝑂(𝑛 log 𝑛) to do 𝑛 binary searches to compute 𝑝(𝑗) for each 𝑗 (see the sketch below)
➢ M-Compute-OPT(𝑗) is called at most once for each 𝑗
➢ Each such call takes 𝑂(1) time, not considering the time taken by any subroutine calls
➢ So M-Compute-OPT(𝑛) takes only 𝑂(𝑛) time
➢ Overall time is 𝑂(𝑛 log 𝑛)
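A hedged sketch of how the 𝑛 binary searches for 𝑝(𝑗) might be done, using the compatibility convention 𝑓_𝑖 < 𝑠_𝑗 from earlier; s and f are 1-indexed with dummy entries at index 0:

```python
import bisect

def compute_p(s, f):
    """p[j] = largest index i < j with f[i] < s[j], or 0 if there is none.
    Assumes jobs are already sorted so that f[1] <= f[2] <= ... <= f[n]."""
    n = len(f) - 1
    finish = f[1:]                       # the sorted finish times f[1..n]
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        # Number of finish times strictly less than s[j]; since finish times
        # are sorted, this count is exactly the largest compatible index.
        p[j] = bisect.bisect_left(finish, s[j])
    return p
```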
Dynamic Program: Bottom-Up
• Find an order in which to call the functions so that the sub-solutions are ready when needed
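A minimal bottom-up sketch (same conventions as above): compute 𝑀[𝑗] for 𝑗 = 1, … , 𝑛 in increasing order, so 𝑀[𝑗 − 1] and 𝑀[𝑝[𝑗]] are always ready when needed.

```python
def bottom_up_opt(n, w, p):
    """Iterative DP over j = 1..n; M[n] is the optimal total weight."""
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(M[j - 1], w[j] + M[p[j]])
    return M
```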
Top-Down vs Bottom-Up
• Top-Down may be preferred…
➢ …when not all sub-solutions need to be computed on some inputs
➢ …because one does not need to think of the "right order" in which to compute sub-solutions
• Bottom-Up may be preferred…
➢ …when all sub-solutions will need to be computed anyway
➢ …because it is faster as it avoids recursive call overheads and unnecessary random memory accesses
➢ …because sometimes we can free up memory early
Optimal Solution
• This approach gave us the optimal value
• What about the actual solution (subset of jobs)?
➢ Idea: Maintain the optimal value and an optimal solution
➢ So, we compute two quantities:

𝑂𝑃𝑇(𝑗) = 0                                            if 𝑗 = 0
𝑂𝑃𝑇(𝑗) = max{ 𝑂𝑃𝑇(𝑗 − 1), 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗)) }           if 𝑗 > 0

𝑆(𝑗) = ∅                if 𝑗 = 0
𝑆(𝑗) = 𝑆(𝑗 − 1)         if 𝑗 > 0 ∧ 𝑂𝑃𝑇(𝑗 − 1) ≥ 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗))
𝑆(𝑗) = {𝑗} ∪ 𝑆(𝑝(𝑗))    if 𝑗 > 0 ∧ 𝑂𝑃𝑇(𝑗 − 1) < 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗))
Optimal Solution
𝑂𝑃𝑇(𝑗) = 0                                            if 𝑗 = 0
𝑂𝑃𝑇(𝑗) = max{ 𝑂𝑃𝑇(𝑗 − 1), 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗)) }           if 𝑗 > 0

𝑆(𝑗) = ∅                if 𝑗 = 0
𝑆(𝑗) = 𝑆(𝑗 − 1)         if 𝑗 > 0 ∧ 𝑂𝑃𝑇(𝑗 − 1) ≥ 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗))
𝑆(𝑗) = {𝑗} ∪ 𝑆(𝑝(𝑗))    if 𝑗 > 0 ∧ 𝑂𝑃𝑇(𝑗 − 1) < 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗))
This works with both top-down (memoization) and bottom-up approaches.
In this problem, we can do something simpler: just compute 𝑂𝑃𝑇 first, and later compute 𝑆 using only 𝑂𝑃𝑇.
Optimal Solution
𝑂𝑃𝑇(𝑗) = 0                                            if 𝑗 = 0
𝑂𝑃𝑇(𝑗) = max{ 𝑂𝑃𝑇(𝑗 − 1), 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗)) }           if 𝑗 > 0

𝑆(𝑗) = ⊥    if 𝑗 = 0
𝑆(𝑗) = 𝐿    if 𝑗 > 0 ∧ 𝑂𝑃𝑇(𝑗 − 1) ≥ 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗))
𝑆(𝑗) = 𝑅    if 𝑗 > 0 ∧ 𝑂𝑃𝑇(𝑗 − 1) < 𝑤_𝑗 + 𝑂𝑃𝑇(𝑝(𝑗))

• Save space by storing only one bit of information for each 𝑗: which option yielded the max weight
• To reconstruct the optimal solution, start with 𝑗 = 𝑛
➢ If 𝑆(𝑗) = 𝐿, update 𝑗 ← 𝑗 − 1
➢ If 𝑆(𝑗) = 𝑅, add 𝑗 to the solution and update 𝑗 ← 𝑝(𝑗)
➢ If 𝑆(𝑗) = ⊥, stop
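A hedged sketch of this idea: record the one-bit choice ('L' or 'R') for each 𝑗 while filling the table bottom-up, then walk back from 𝑗 = 𝑛.

```python
def solve_with_choices(n, w, p):
    """Bottom-up DP that also records one bit per j: 'L' if job j is skipped,
    'R' if job j is taken (matching S(j) above)."""
    M = [0] * (n + 1)
    S = [None] * (n + 1)                 # S[0] plays the role of the stop marker
    for j in range(1, n + 1):
        take = w[j] + M[p[j]]
        if M[j - 1] >= take:
            M[j], S[j] = M[j - 1], 'L'
        else:
            M[j], S[j] = take, 'R'
    # Reconstruct: start at j = n and follow the recorded choices.
    solution, j = [], n
    while j > 0:
        if S[j] == 'R':
            solution.append(j)
            j = p[j]
        else:
            j -= 1
    return M[n], solution
```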
Optimal Substructure Property
• Dynamic programming applies well to problems that have the optimal substructure property
➢ An optimal solution to a problem can be computed easily given optimal solutions to its subproblems
• Recall: divide-and-conquer also uses this property
➢ Divide-and-conquer is a special case in which the subproblems don't "overlap"
➢ So there's no need for memoization
➢ In dynamic programming, two of the subproblems may in turn require access to the solution to the same subproblem
Knapsack Problem
• Problem
➢ 𝑛 items: item 𝑖 provides value 𝑣_𝑖 > 0 and has weight 𝑤_𝑖 > 0
➢ Knapsack has weight capacity 𝑊
➢ Assumption: 𝑊 and all the 𝑣_𝑖's and 𝑤_𝑖's are integers
➢ Goal: pack the knapsack with a collection of items of highest total value, given that their total weight is at most 𝑊
A First Attempt
• Let 𝑂𝑃𝑇(𝑤) = maximum value we can pack with a knapsack of capacity 𝑤
➢ Goal: Compute 𝑂𝑃𝑇(𝑊)
➢ Claim? 𝑂𝑃𝑇(𝑤) must use at least one item 𝑗 with weight 𝑤_𝑗 ≤ 𝑤, and then optimally pack the remaining capacity 𝑤 − 𝑤_𝑗
➢ Let 𝑤∗ = min_𝑗 𝑤_𝑗
➢ 𝑂𝑃𝑇(𝑤) = 0 if 𝑤 < 𝑤∗, and 𝑂𝑃𝑇(𝑤) = max_{𝑗 : 𝑤_𝑗 ≤ 𝑤} { 𝑣_𝑗 + 𝑂𝑃𝑇(𝑤 − 𝑤_𝑗) } if 𝑤 ≥ 𝑤∗
• This is wrong!
➢ It might use an item more than once!
A Refined Attempt
• 𝑂𝑃𝑇(𝑖, 𝑤) = maximum value we can pack using only items 1, ... , 𝑖 given capacity 𝑤
➢ Goal: Compute 𝑂𝑃𝑇(𝑛, 𝑊)
• Consider item 𝑖
➢ If 𝑤_𝑖 > 𝑤, then we can't choose 𝑖; just use 𝑂𝑃𝑇(𝑖 − 1, 𝑤)
➢ If 𝑤_𝑖 ≤ 𝑤, there are two cases:
o If we choose 𝑖, the best is 𝑣_𝑖 + 𝑂𝑃𝑇(𝑖 − 1, 𝑤 − 𝑤_𝑖)
o If we don't choose 𝑖, the best is 𝑂𝑃𝑇(𝑖 − 1, 𝑤)
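A minimal bottom-up sketch of this recurrence (values and weights are 0-indexed Python lists, so item 𝑖 of the slide corresponds to index 𝑖 − 1):

```python
def knapsack(values, weights, W):
    """Bottom-up table for OPT(i, w) over items 1..n and capacities 0..W."""
    n = len(values)
    OPT = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            if weights[i - 1] > w:                    # item i does not fit
                OPT[i][w] = OPT[i - 1][w]
            else:                                     # max of skip vs. take
                OPT[i][w] = max(OPT[i - 1][w],
                                values[i - 1] + OPT[i - 1][w - weights[i - 1]])
    return OPT[n][W]
```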
Running Time
• Consider the possible evaluations 𝑂𝑃𝑇(𝑖, 𝑤)
➢ 𝑖 ∈ {1, … , 𝑛}
➢ 𝑤 ∈ {1, … , 𝑊} (recall weights and capacity are integers)
➢ There are 𝑂(𝑛 ⋅ 𝑊) possible evaluations of 𝑂𝑃𝑇
➢ Each is evaluated at most once (memoization)
➢ Each takes 𝑂(1) time to evaluate
➢ So the total running time is 𝑂(𝑛 ⋅ 𝑊)
• Q: Is this polynomial in the input size?
➢ A: No! The capacity 𝑊 is encoded using only 𝑂(log 𝑊) bits, so 𝑂(𝑛 ⋅ 𝑊) can be exponential in the input size. But it's pseudo-polynomial.
What if…?
• Note that this algorithm runs in polynomial time when the value of 𝑊 is polynomially bounded in the length of the input
• Q: What if instead of the weights being small integers, we were told that the values are small integers?
➢ Then we can use a different dynamic programming approach!
A Different DP
• 𝑂𝑃𝑇(𝑖, 𝑣) = minimum capacity needed to pack a total value of at least 𝑣 using items 1, … , 𝑖
➢ Goal: Compute max { 𝑣 : 𝑂𝑃𝑇(𝑛, 𝑣) ≤ 𝑊 }
• Consider item 𝑖
➢ If we choose 𝑖, we need capacity 𝑤_𝑖 + 𝑂𝑃𝑇(𝑖 − 1, 𝑣 − 𝑣_𝑖)
➢ If we don't choose 𝑖, we need capacity 𝑂𝑃𝑇(𝑖 − 1, 𝑣)

𝑂𝑃𝑇(𝑖, 𝑣) = 0                                                   if 𝑣 ≤ 0
𝑂𝑃𝑇(𝑖, 𝑣) = ∞                                                   if 𝑣 > 0, 𝑖 = 0
𝑂𝑃𝑇(𝑖, 𝑣) = min{ 𝑤_𝑖 + 𝑂𝑃𝑇(𝑖 − 1, 𝑣 − 𝑣_𝑖), 𝑂𝑃𝑇(𝑖 − 1, 𝑣) }     if 𝑣 > 0, 𝑖 > 0
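A minimal bottom-up sketch of this value-indexed DP (again with 0-indexed lists); the answer is the largest 𝑣 whose minimum required capacity fits in 𝑊:

```python
def knapsack_by_value(values, weights, W):
    """OPT(i, v) = min capacity needed to pack value >= v from items 1..i."""
    n, V = len(values), sum(values)
    INF = float("inf")
    OPT = [[0] * (V + 1) for _ in range(n + 1)]
    for v in range(1, V + 1):
        OPT[0][v] = INF                               # impossible with no items
    for i in range(1, n + 1):
        for v in range(1, V + 1):
            skip = OPT[i - 1][v]
            take = weights[i - 1] + OPT[i - 1][max(0, v - values[i - 1])]
            OPT[i][v] = min(skip, take)
    return max(v for v in range(V + 1) if OPT[n][v] <= W)
```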
A Different DP
• 𝑂𝑃𝑇(𝑖, 𝑣) = minimum capacity needed to pack a total value of at least 𝑣 using items 1, … , 𝑖
➢ Goal: Compute max { 𝑣 : 𝑂𝑃𝑇(𝑛, 𝑣) ≤ 𝑊 }
• This approach has running time 𝑂(𝑛 ⋅ 𝑉), where 𝑉 = 𝑣_1 + ⋯ + 𝑣_𝑛
• So we can get 𝑂(𝑛 ⋅ 𝑊) or 𝑂(𝑛 ⋅ 𝑉)
• Can we remove the dependence on both 𝑉 and 𝑊?
➢ Not likely.
➢ The knapsack problem is NP-complete (we'll see later).
Looking Ahead: FPTAS
• While we cannot hope to solve the problem exactly in time 𝑂(poly(𝑛, log 𝑊, log 𝑉))…
➢ For any 𝜖 > 0, we can get a value that is within a (1 + 𝜖) multiplicative factor of the optimal value in time 𝑂(poly(𝑛, log 𝑊, log 𝑉, 1/𝜖))
➢ Such an algorithm is known as a fully polynomial-time approximation scheme (FPTAS)
➢ Core idea behind the FPTAS for knapsack (see the sketch below):
o Approximate all weights and values up to the desired precision
o Solve knapsack on the approximate input using DP
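A hypothetical sketch of that idea, rounding only the values (a common variant); the helper name and the exact rounding rule are illustrative, not from the slides, and it assumes every individual item fits in the knapsack on its own:

```python
def knapsack_fptas(values, weights, W, eps):
    """Value-scaling FPTAS sketch: round values down to multiples of K,
    then run an exact value-based DP on the rounded instance."""
    n = len(values)
    K = eps * max(values) / n                 # total rounding loss <= eps * OPT
    sv = [int(v // K) for v in values]        # scaled (approximate) values
    V = sum(sv)
    INF = float("inf")
    # min_w[v] = minimum weight needed to reach scaled value exactly v;
    # choice[v] = a set of items achieving it (kept only for reconstruction).
    min_w = [0] + [INF] * V
    choice = [set() for _ in range(V + 1)]
    for i in range(n):
        for v in range(V, sv[i] - 1, -1):     # iterate downward: 0/1 semantics
            cand = min_w[v - sv[i]] + weights[i]
            if cand < min_w[v]:
                min_w[v] = cand
                choice[v] = choice[v - sv[i]] | {i}
    best_v = max(v for v in range(V + 1) if min_w[v] <= W)
    items = choice[best_v]
    return items, sum(values[i] for i in items)   # value is >= (1 - eps) * OPT
```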
Single-Source Shortest Paths
• Problem
➢ Input: A directed graph 𝐺 = (𝑉, 𝐸) with edge length ℓ_𝑣𝑤 on each edge (𝑣, 𝑤), and a source vertex 𝑠
➢ Goal: Compute the length of the shortest path from 𝑠 to every vertex 𝑡
• When ℓ_𝑣𝑤 ≥ 0 for each (𝑣, 𝑤)…
➢ Dijkstra's algorithm can be used for this purpose
➢ But it fails when some edge lengths can be negative
➢ What do we do in this case?
Single-Source Shortest Paths
• Cycle length = sum of lengths of edges in the cycle
• If there is a negative length cycle, shortest paths are not even well defined…
➢ You can traverse the cycle arbitrarily many times to get arbitrarily “short” paths
Single-Source Shortest Paths
• But if there are no negative cycles…
➢ Shortest paths are well-defined even when some of the edge lengths may be negative
• Claim: With no negative cycles, there is always a shortest path from any vertex to any other vertex that is simple
➢ Consider the shortest 𝑠 ⇝ 𝑡 path with the fewest edges among all shortest 𝑠 ⇝ 𝑡 paths
➢ If it has a cycle, removing the cycle creates a path with fewer edges that is no longer than the original path
Optimal Substructure Property
• Consider a simple shortest 𝑠 ⇝ 𝑡 path 𝑃
➢ It could be just a single edge
➢ But if 𝑃 has more than one edge, consider the vertex 𝑢 which immediately precedes 𝑡 in the path
➢ If 𝑠 ⇝ 𝑡 is shortest, its 𝑠 ⇝ 𝑢 portion must be a shortest 𝑠 ⇝ 𝑢 path as well, and it uses one fewer edge than the 𝑠 ⇝ 𝑡 path
Optimal Substructure Property
• 𝑂𝑃𝑇(𝑡, 𝑖) = length of a shortest path from 𝑠 to 𝑡 using at most 𝑖 edges
• Then:
➢ Either this path uses at most 𝑖 − 1 edges ⇒ 𝑂𝑃𝑇(𝑡, 𝑖 − 1)
➢ Or it uses exactly 𝑖 edges ⇒ min_𝑢 { 𝑂𝑃𝑇(𝑢, 𝑖 − 1) + ℓ_𝑢𝑡 }
Optimal Substructure Property
• 𝑂𝑃𝑇(𝑡, 𝑖) = length of a shortest path from 𝑠 to 𝑡 using at most 𝑖 edges
• Then:
➢ Either this path uses at most 𝑖 − 1 edges ⇒ 𝑂𝑃𝑇(𝑡, 𝑖 − 1)
➢ Or it uses exactly 𝑖 edges ⇒ min_𝑢 { 𝑂𝑃𝑇(𝑢, 𝑖 − 1) + ℓ_𝑢𝑡 }

𝑂𝑃𝑇(𝑡, 𝑖) = 0                                                       if 𝑖 = 0 ∨ 𝑡 = 𝑠
𝑂𝑃𝑇(𝑡, 𝑖) = ∞                                                       if 𝑖 = 0 ∧ 𝑡 ≠ 𝑠
𝑂𝑃𝑇(𝑡, 𝑖) = min{ 𝑂𝑃𝑇(𝑡, 𝑖 − 1), min_𝑢 { 𝑂𝑃𝑇(𝑢, 𝑖 − 1) + ℓ_𝑢𝑡 } }    otherwise

➢ Running time: 𝑂(𝑛²) calls, each takes 𝑂(𝑛) time ⇒ 𝑂(𝑛³)
➢ Q: What do you need to store to also get the actual paths?
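A minimal bottom-up sketch of this recurrence; iterating over an edge list (𝑢, 𝑣, ℓ) rather than over all pairs computes the same minimum and already gives the 𝑂(𝑚𝑛) running time mentioned on the next slide:

```python
def shortest_paths(n, edges, s):
    """DP for OPT(t, i): length of a shortest s->t path using at most i edges.
    Vertices are 0..n-1; edges is a list of (u, v, length); no negative cycles."""
    INF = float("inf")
    OPT = [INF] * n
    OPT[s] = 0                                  # i = 0 base case
    for i in range(1, n):                       # at most n-1 edges suffice
        new = OPT[:]                            # keep OPT(., i-1) while filling OPT(., i)
        for (u, v, length) in edges:
            if OPT[u] + length < new[v]:
                new[v] = OPT[u] + length
        OPT = new
    return OPT                                  # OPT[t] = shortest distance s -> t
```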
Side Notes
• Bellman-Ford-Moore algorithm
➢ Improvement over this DP
➢ Running time 𝑂(𝑚𝑛) for 𝑛 vertices and 𝑚 edges
➢ Space complexity reduces to 𝑂(𝑚 + 𝑛)
Maximum Length Paths?
• Can we use a similar DP to compute maximum length paths from 𝑠 to all other vertices?
• This is well-defined when there are no positive cycles; in that case, yes.
• What if there are positive cycles, but we want maximum length simple paths?
Maximum Length Paths?
• What goes wrong?
➢ Our DP doesn’t work because its path from 𝑠 to 𝑡 might
use a path from 𝑠 to 𝑢 and edge from 𝑢 to 𝑡
➢ But path from 𝑠 to 𝑢 might in turn go through 𝑡 ➢ The path may no longer remain simple
• In fact, maximum length simple path is NP-hard
➢ Hamiltonian path problem (i.e. is there a path of length 𝑛 − 1 in a given undirected graph?) is a special case
All-Pairs Shortest Paths
• Problem
➢ Input: A directed graph 𝐺 = (𝑉, 𝐸) with edge length ℓ_𝑣𝑤 on each edge (𝑣, 𝑤) and no negative cycles
➢ Goal: Compute the length of the shortest path from all vertices 𝑠 to all other vertices 𝑡
• Simple idea:
➢ Run single-source shortest paths from each source 𝑠
➢ Running time is 𝑂(𝑛⁴)
➢ Actually, we can do this in 𝑂(𝑛³) as well
All-Pairs Shortest Paths
• Problem
➢ Input: A directed graph 𝐺 = (𝑉, 𝐸) with edge length ℓ_𝑣𝑤 on each edge (𝑣, 𝑤) and no negative cycles
➢ Goal: Compute the length of the shortest path from all vertices 𝑠 to all other vertices 𝑡
• 𝑂𝑃𝑇(𝑢, 𝑣, 𝑘) = length of a shortest simple path from 𝑢 to 𝑣 in which all intermediate nodes come from {1, … , 𝑘}
• Exercise: Write down the recursion formula for 𝑂𝑃𝑇 such that, given the subsolutions, it requires 𝑂(1) time (one standard formulation is sketched below)
• Running time: 𝑂(𝑛³) calls, 𝑂(1) per call ⇒ 𝑂(𝑛³)
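A bottom-up sketch of one standard formulation of this recurrence (the Floyd-Warshall algorithm), assuming vertices 0, … , 𝑛 − 1 and an 𝑛 × 𝑛 matrix `length` with ∞ for absent edges:

```python
def all_pairs_shortest_paths(n, length):
    """OPT(u, v, k): shortest u->v path whose intermediate vertices all lie in
    {0, ..., k-1}. The k-th layer overwrites the (k-1)-th in place."""
    OPT = [row[:] for row in length]            # k = 0: no intermediate vertices
    for u in range(n):
        OPT[u][u] = min(OPT[u][u], 0)           # empty path from u to u
    for k in range(n):                          # now allow vertex k as intermediate
        for u in range(n):
            for v in range(n):
                # either avoid k, or go u -> k -> v through the allowed set
                OPT[u][v] = min(OPT[u][v], OPT[u][k] + OPT[k][v])
    return OPT
```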
Chain Matrix Product
• Problem
➢ Input: Matrices 𝑀_1, … , 𝑀_𝑛, where the dimension of 𝑀_𝑖 is 𝑑_{𝑖−1} × 𝑑_𝑖
➢ Goal: Compute 𝑀_1 ⋅ 𝑀_2 ⋅ … ⋅ 𝑀_𝑛
• But matrix multiplication is associative
➢ (𝐴 ⋅ 𝐵) ⋅ 𝐶 = 𝐴 ⋅ (𝐵 ⋅ 𝐶)
➢ So isn't the optimal solution going to call the algorithm for multiplying two matrices exactly 𝑛 − 1 times?
➢ Insight: the time it takes to multiply two matrices depends on their dimensions
Chain Matrix Product
• Assume
➢ We use the brute force approach for matrix multiplication
➢ So multiplying 𝑝 × 𝑞 and 𝑞 × 𝑟 matrices requires 𝑝 ⋅ 𝑞 ⋅ 𝑟 operations
• Example: compute 𝑀_1 ⋅ 𝑀_2 ⋅ 𝑀_3
➢ 𝑀_1 is 5 × 10
➢ 𝑀_2 is 10 × 100
➢ 𝑀_3 is 100 × 50
➢ (𝑀_1 ⋅ 𝑀_2) ⋅ 𝑀_3 → 5⋅10⋅100 + 5⋅100⋅50 = 30,000 ops
➢ 𝑀_1 ⋅ (𝑀_2 ⋅ 𝑀_3) → 10⋅100⋅50 + 5⋅10⋅50 = 52,500 ops
Chain Matrix Product
• Note
➢ Our input is simply the dimensions 𝑑_0, 𝑑_1, … , 𝑑_𝑛 (such that each 𝑀_𝑖 is 𝑑_{𝑖−1} × 𝑑_𝑖) and not the actual matrices
• Why is DP right for this problem?
➢ Optimal substructure property
➢ Think of the final product computed, say 𝐴 ⋅ 𝐵
➢ 𝐴 is the product of some prefix, 𝐵 is the product of the remaining suffix
➢ For the overall optimal computation, each of 𝐴 and 𝐵 should be computed optimally
Chain Matrix Product
• 𝑂𝑃𝑇(𝑖, 𝑗) = min ops required to compute 𝑀_𝑖 ⋅ … ⋅ 𝑀_𝑗
➢ Here, 1 ≤ 𝑖 ≤ 𝑗 ≤ 𝑛
➢ Q: Why do we not just care about prefixes and suffixes?
o 𝑀_1 ⋅ 𝑀_2 ⋅ 𝑀_3 ⋅ 𝑀_4 ⋅ 𝑀_5 ⇒ we may need to know the optimal solution for 𝑀_2 ⋅ 𝑀_3 ⋅ 𝑀_4

𝑂𝑃𝑇(𝑖, 𝑗) = 0                                                                      if 𝑖 = 𝑗
𝑂𝑃𝑇(𝑖, 𝑗) = min_{𝑘 : 𝑖 ≤ 𝑘 < 𝑗} { 𝑂𝑃𝑇(𝑖, 𝑘) + 𝑂𝑃𝑇(𝑘 + 1, 𝑗) + 𝑑_{𝑖−1} 𝑑_𝑘 𝑑_𝑗 }      if 𝑖 < 𝑗

➢ Running time: 𝑂(𝑛²) calls, 𝑂(𝑛) time per call ⇒ 𝑂(𝑛³)
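A minimal bottom-up sketch of this DP, taking the dimension list d = [𝑑_0, 𝑑_1, … , 𝑑_𝑛] as input:

```python
def chain_matrix_order(d):
    """OPT(i, j) = min scalar multiplications to compute M_i * ... * M_j,
    where matrix M_i has dimensions d[i-1] x d[i]."""
    n = len(d) - 1
    OPT = [[0] * (n + 1) for _ in range(n + 1)]
    for span in range(2, n + 1):                 # fill in increasing chain length
        for i in range(1, n - span + 2):
            j = i + span - 1
            OPT[i][j] = min(OPT[i][k] + OPT[k + 1][j] + d[i - 1] * d[k] * d[j]
                            for k in range(i, j))
    return OPT[1][n]
```

For the example above, chain_matrix_order([5, 10, 100, 50]) evaluates to 30000, matching the (𝑀_1 ⋅ 𝑀_2) ⋅ 𝑀_3 ordering.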
Chain Matrix Product
• Can we do better?
This slide is not in the scope of the course
➢ Surprisingly, yes. But not by a DP algorithm (that I know of)
➢ Hu & Shing (1981) developed 𝑂(𝑛 log 𝑛) time algorithm by reducing chain matrix product to the problem of “optimally” triangulating a regular polygon
Source: Wikipedia
Example
• 𝐴 is 10 × 30, 𝐵 is 30 × 5, 𝐶 is 5 × 60
• The cost of each triangle is the product of the labels of its vertices
• Want to minimize the total cost of all triangles
Edit Distance
• Edit distance (aka sequence alignment) problem
➢ How similar are strings 𝑋 = 𝑥_1, … , 𝑥_𝑚 and 𝑌 = 𝑦_1, … , 𝑦_𝑛?
• Suppose we can delete or replace symbols
➢ We can do these operations on any symbol in either string
➢ How many deletions & replacements does it take to match the two strings?
Edit Distance
• Example: ocurrance vs occurrence
➢ One alignment requires 6 replacements and 1 deletion
➢ A better alignment requires only 1 replacement and 1 deletion
Edit Distance
• Edit distance problem
➢ Input
o Strings 𝑋 = 𝑥_1, … , 𝑥_𝑚 and 𝑌 = 𝑦_1, … , 𝑦_𝑛
o Cost 𝑑(𝑎) of deleting symbol 𝑎
o Cost 𝑟(𝑎, 𝑏) of replacing symbol 𝑎 with 𝑏
• Assume 𝑟 is symmetric, so 𝑟(𝑎, 𝑏) = 𝑟(𝑏, 𝑎)
➢ Goal
o Compute the minimum total cost for matching the two strings
• Optimal substructure?
➢ Want to delete/replace at one end and recurse
Edit Distance
• Optimal substructure
➢ Goal: match 𝑥_1, … , 𝑥_𝑚 and 𝑦_1, … , 𝑦_𝑛
➢ Consider the last symbols 𝑥_𝑚 and 𝑦_𝑛
➢ Three options:
o Delete 𝑥_𝑚, and optimally match 𝑥_1, … , 𝑥_{𝑚−1} and 𝑦_1, … , 𝑦_𝑛
o Delete 𝑦_𝑛, and optimally match 𝑥_1, … , 𝑥_𝑚 and 𝑦_1, … , 𝑦_{𝑛−1}
o Match 𝑥_𝑚 and 𝑦_𝑛, and optimally match 𝑥_1, … , 𝑥_{𝑚−1} and 𝑦_1, … , 𝑦_{𝑛−1}
• We incur cost 𝑟(𝑥_𝑚, 𝑦_𝑛)
• Extend the definition of 𝑟 so that 𝑟(𝑎, 𝑎) = 0 for any symbol 𝑎
➢ Hence in the DP, we need to compute the optimal solutions for matching 𝑥_1, … , 𝑥_𝑖 with 𝑦_1, … , 𝑦_𝑗 for all (𝑖, 𝑗)
Edit Distance
• 𝐸[𝑖, 𝑗] = edit distance between 𝑥_1, … , 𝑥_𝑖 and 𝑦_1, … , 𝑦_𝑗
• Bellman equation:

𝐸[𝑖, 𝑗] = 0                           if 𝑖 = 𝑗 = 0
𝐸[𝑖, 𝑗] = 𝑑(𝑦_𝑗) + 𝐸[𝑖, 𝑗 − 1]        if 𝑖 = 0 ∧ 𝑗 > 0
𝐸[𝑖, 𝑗] = 𝑑(𝑥_𝑖) + 𝐸[𝑖 − 1, 𝑗]        if 𝑖 > 0 ∧ 𝑗 = 0
𝐸[𝑖, 𝑗] = min{𝐴, 𝐵, 𝐶}                otherwise

where
𝐴 = 𝑑(𝑥_𝑖) + 𝐸[𝑖 − 1, 𝑗],  𝐵 = 𝑑(𝑦_𝑗) + 𝐸[𝑖, 𝑗 − 1],  𝐶 = 𝑟(𝑥_𝑖, 𝑦_𝑗) + 𝐸[𝑖 − 1, 𝑗 − 1]
• 𝑂(𝑛 ⋅ 𝑚) time, 𝑂(𝑛 ⋅ 𝑚) space
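A minimal bottom-up sketch of this table; d and r are passed in as functions (e.g. d = lambda a: 1 and r = lambda a, b: 0 if a == b else 1 recovers a unit-cost edit distance):

```python
def edit_distance(x, y, d, r):
    """Bottom-up table for E[i][j], following the Bellman equation above."""
    m, n = len(x), len(y)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        E[i][0] = E[i - 1][0] + d(x[i - 1])       # delete the first i symbols of x
    for j in range(1, n + 1):
        E[0][j] = E[0][j - 1] + d(y[j - 1])       # delete the first j symbols of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = min(d(x[i - 1]) + E[i - 1][j],               # delete x_i
                          d(y[j - 1]) + E[i][j - 1],               # delete y_j
                          r(x[i - 1], y[j - 1]) + E[i - 1][j - 1]) # match/replace
    return E[m][n]
```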
Edit Distance
𝐸[𝑖, 𝑗] = 0                           if 𝑖 = 𝑗 = 0
𝐸[𝑖, 𝑗] = 𝑑(𝑦_𝑗) + 𝐸[𝑖, 𝑗 − 1]        if 𝑖 = 0 ∧ 𝑗 > 0
𝐸[𝑖, 𝑗] = 𝑑(𝑥_𝑖) + 𝐸[𝑖 − 1, 𝑗]        if 𝑖 > 0 ∧ 𝑗 = 0
𝐸[𝑖, 𝑗] = min{𝐴, 𝐵, 𝐶}                otherwise

where
𝐴 = 𝑑(𝑥_𝑖) + 𝐸[𝑖 − 1, 𝑗],  𝐵 = 𝑑(𝑦_𝑗) + 𝐸[𝑖, 𝑗 − 1],  𝐶 = 𝑟(𝑥_𝑖, 𝑦_𝑗) + 𝐸[𝑖 − 1, 𝑗 − 1]
• Space complexity can be reduced in bottom-up approach
➢ While computing 𝐸[⋅, 𝑗], we only need to store 𝐸[⋅, 𝑗] and 𝐸[⋅, 𝑗 − 1]
➢ So the additional space required is 𝑂(𝑚)
➢ By storing two rows at a time instead, we can make it 𝑂(𝑛)
➢ Usually people include the storage of the inputs, so it's 𝑂(𝑛 + 𝑚)
➢ But this is not enough if we want to compute the actual solution
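A hedged sketch of the two-row version, which returns the optimal value only:

```python
def edit_distance_two_rows(x, y, d, r):
    """Same recurrence, but keeping only the previous row and the current row,
    so the extra space is O(n) (plus the inputs)."""
    m, n = len(x), len(y)
    prev = [0] * (n + 1)                          # row i = 0
    for j in range(1, n + 1):
        prev[j] = prev[j - 1] + d(y[j - 1])
    for i in range(1, m + 1):
        cur = [prev[0] + d(x[i - 1])] + [0] * n   # E[i][0]
        for j in range(1, n + 1):
            cur[j] = min(d(x[i - 1]) + prev[j],
                         d(y[j - 1]) + cur[j - 1],
                         r(x[i - 1], y[j - 1]) + prev[j - 1])
        prev = cur
    return prev[n]
```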
Hirschberg’s Algorithm
This slide is not in the scope of the course
• The optimal solution can be computed in 𝑂(𝑛 ⋅ 𝑚) time and 𝑂(𝑛 + 𝑚) space too!
Hirschberg’s Algorithm
This slide is not in the scope of the course
• Key idea nicely combines divide & conquer with DP
• Edit distance graph (figure omitted): a grid graph with edge costs 𝑑(𝑥_𝑖), 𝑑(𝑦_𝑗), and 𝑟(𝑥_𝑖, 𝑦_𝑗)
Hirschberg’s Algorithm
This slide is not in the scope of the course
• Observation (can be proved by induction)
➢ 𝐸[𝑖, 𝑗] = length of shortest path from (0,0) to (𝑖, 𝑗)
Hirschberg’s Algorithm
This slide is not in the scope of the course
• Lemma
➢ The shortest path from (0,0) to (𝑚, 𝑛) passes through (𝑞, 𝑛/2), where 𝑞 minimizes [length of shortest path from (0,0) to (𝑞, 𝑛/2)] + [length of shortest path from (𝑞, 𝑛/2) to (𝑚, 𝑛)]
Hirschberg’s Algorithm
This slide is not in the scope of the course
• Idea
➢ Find 𝑞 using divide-and-conquer
➢ Find the shortest paths from (0,0) to (𝑞, 𝑛/2) and from (𝑞, 𝑛/2) to (𝑚, 𝑛) using DP
Application: Protein Matching