CS计算机代考程序代写 Bioinformatics DNA algorithm Last week recap

Last week recap
•Greedy Algorithms:
Ø Dijkstra’s algorithm (single-source shortest paths)
•Dynamic Programming:
Ø A DAG shortest path algorithm
Ø Weighted interval scheduling o Bottom-up algorithm
o Top-down algorithm
o Memoization
2021-02-01 CSC373 Winter 2021 – Sam Toueg 1

Dynamic Programing Edit Distance
2021-02-01 CSC373 Winter 2021 – Sam Toueg 2

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Some applications:
• Spelling correction: suggest dictionary words that have a
low edit distance from a misspelled word
• Bioinformatics: Quantify the similarity of DNA sequences (viewed as strings of the letters A, C, G and T)
2021-02-01 CSC373 Winter 2021 – Sam Toueg 3

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Example: “boarder’’ vs “barbers’’
!
b
o
a
r
d
e
r



b
a
r
b
e
r
s
2021-02-01 CSC373 Winter 2021 – Sam Toueg 4

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Example: “boarder’’ vs “barbers’’
!
b
o
a
r
d
e
r



b
a
r
b
e
r
s
2021-02-01 CSC373 Winter 2021 – Sam Toueg 6

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Example: “boarder’’ vs “barbers’’
!
b
o
a
r
d
e
r



b
a
r
b
e
r
s
2021-02-01 CSC373 Winter 2021 – Sam Toueg 7

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Example: “boarder’’ vs “barbers’’
!
b
o
a
r
d
e
r



b
a
r
b
e
r
s
2021-02-01 CSC373 Winter 2021 – Sam Toueg 8

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Example: “boarder’’ vs “barbers’’
2 replacements, 2 deletions
!
b
o
a
r
d
e
r



b
a
r
b
e
r
s
!
b
o
a
r
d
e
r


b

a
r
b
e
r
s
2021-02-01 CSC373 Winter 2021 – Sam Toueg 9

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Example: “boarder’’ vs “barbers’’
2 replacements, 2 deletions
!
b
o
a
r
d
e
r



b
a
r
b
e
r
s
!
b
o
a
r
d
e
r


b

a
r
b
e
r
s
2021-02-01 CSC373 Winter 2021 – Sam Toueg 10

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Example: “boarder’’ vs “barbers’’
2 replacements, 2 deletions
!
b
o
a
r
d
e
r



b
a
r
b
e
r
s
!
b
o
a
r
d
e
r


b

a
r
b
e
r
s
2021-02-01 CSC373 Winter 2021 – Sam Toueg 11

Edit Distance
• How similar are strings ! = #!,…,#” and & = ‘!,…,’#?
• Suppose we can delete or replace symbols
ØWe can do these operations on any symbol in either string
ØHow many deletions & replacements does it take to match the two strings?
Example: “boarder’’ vs “barbers’’
2 replacements, 2 deletions
1 replacement, 2 deletions 2021-02-01 CSC373 Winter 2021 – Sam Toueg 12
!
b
o
a
r
d
e
r



b
a
r
b
e
r
s
!
b
o
a
r
d
e
r


b

a
r
b
e
r
s

Edit Distance
Problem
ØInput: Strings ! = #!,…,#” and & = ‘!,…,’# oCost $(&) of deleting symbol &
o Cost ((&, *) of replacing symbol & with *
•Assume!issymmetric,so!”,$ =!($,”)
Ø Output: the minimum total cost for matching ! and &
Example: “boarder’’ vs “barbers’’
2 replacements, 2 deletions totalcost=$ * +( *,, +
($,* +$(-)
1 replacement, 2 deletions
totalcost=$ , +( $,* + $(-)
#
b
o
a
r
d
e
r

$

b
a
r
b
e
r
s
#
b
o
a
r
d
e
r

$
b

a
r
b
e
r
s
2021-02-01
CSC373 Winter 2021 – Sam Toueg
13

Edit Distance
Problem
ØInput: Strings ! = #!,…,#” and & = ‘!,…,’# oCost $(&) of deleting symbol &
o Cost ((&, *) of replacing symbol & with *
•Assume!issymmetric,so!”,$ =!($,”)
Ø Output: the minimum total cost for matching ! and &
Optimal substructure
Consider the last symbols #” and ‘# of ! and &. There are three cases:
Ø Case A: Delete .!, and optimally match .”, … , .!#” and 0″, … , 0$
Ø Case B: Delete 0$, and optimally match .”, … , .! and 0″, … , 0$#”
Ø Case C: Match .! and 0$, and optimally match .”, … , .!#” and 0″, … , 0$#”
Let ((*, +) = edit distance between #!, … , #$ and ‘!, … , ‘%
2021-02-01 CSC373 Winter 2021 – Sam Toueg 14

Edit Distance of ! = #!, … , #1 and & = ‘!, … , ‘2
5(6, 7) = edit distance between .”, … , .% and 0″, … , 0&
Consider the last symbols #$ and ‘% of D and E. There are three cases: Case A: Delete .%, and optimally match .”, … , .%#” and 0″, … , 0&
56,7=$.% +56−1,7 Case B: Delete 0&, and optimally match .”, … , .% and 0″, … , 0&#”
(A)
.”,…,.%#”
.%
0″,…,0&

.”,….,.%

0″,…,0&#”
0&
56,7=$0& +56,7−1 (B) Case C: Match .% and 0&, and optimally match .”, … , .%#” and 0″, … , 0&#”
.”,…,.%#”
.%
0″,…,0&#”
0&
(C) (.%,0& =0if.%=0&
56,7=(.%,0& +56−1,7−1
if6>0,7>0,5 6,7 =min{@,A,B}
2021-02-01 CSC373 Winter 2021 – Sam Toueg 15
where@=$ .% +5 6−1,7 A=$0& +56,7−1
B=(.%,0& +56−1,7−1

Edit Distance
• ((*,+) = edit distance between #!,…,#$ and ‘!,…,’%
• Bellman equation
0 if 6 = 0 ∧ 7 = 0
5 6,7 =
$ .% +5(6−1,0) if6>0∧7=0 $0& +5(0,7−1)if6=0∧7>0 min{@, A, B} if 6 > 0 ∧ 7 > 0
Column
7−2 7−1 7
where@=$ .%
A=$0& +56,7−1
+5 6−1,7 , B=(.%,0& +56−1,7−1
B@
6 − 1 6
2021-02-01
6 − 1, 7 − 2 6, 7 − 2
6 − 1, 7 6, 7
Row
6 − 1, 7 − 1
6, 7 − 1
A
first compute @, A and B
CSC373 Winter 2021 – Sam Toueg 16
to compute this, need to

Edit Distance
• ((*,+) = edit distance between #!,…,#$ and ‘!,…,’%
• Bellman equation
0 if 6 = 0 ∧ 7 = 0
5 6,7 =
$ .% +5(6−1,0) if6>0∧7=0 $0& +5(0,7−1)if6=0∧7>0 min{@, A, B} if 6 > 0 ∧ 7 > 0
where @ = $ .%
A=$ 0& +5 6,7−1
+ 5 6 − 1, 7 ,
B=( .%,0& EDITDIST(D = .”,…,.!, E = 0″,…,0$, $(−),((−,−)):
+5 6−1,7−1
• Algorithm:
5(0,0) = 0
for 6 = 1 to I do
5 6,0 =$ .% +5(6−1,0) for 7 = 1 to J do
5 0,7 =$ 0& +5(0,7−1) for 6 = 1 to I do
for 7 = 1 to J do
56,7 =min{@,A,B}
return5 I,J
K(J ⋅ I) time K(J⋅I)space
2021-02-01
CSC373 Winter 2021 – Sam Toueg
19

Edit Distance
• ((*,+) = edit distance between #!,…,#$ and ‘!,…,’%
• Bellman equation
0 if 6 = 0 ∧ 7 = 0
if 6 > 0 ∧ 7 > 0
• How to recover the editing operations? After computing all the 5 6, 7 ’s execute:
5 6,7 =
min{@, A, B}
where @ = $ .%
A=$ 0 +5 6,7−1
$ .% +5(6−1,0) if6>0∧7=0 $0& +5(0,7−1)if6=0∧7>0
+ 5 6 − 1, 7 , B=( .%,0& +5 6−1,7−1
&
for 6 = I to 1 do
for 7 = J to 1 do
if5 6,7 =$ .% +5 6−1,7 then delete .%
K(J ⋅ I) time elseif5 6,7 =( .%,0& +5 6−1,7−1 then
elseif5 6,7 =$ 0& +5 6,7−1 then delete 0&
2021-02-01
replace .% with 0&
CSC373 Winter 2021 – Sam Toueg
20

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
Assume:
* ” = 1 (deleting a character cost 1) ! “,$ =&1 ” ≠$
0″=$ Minimum edit cost for matching ! and “?

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
E(3,4) : cost of matching “boa’’ and “barb’’

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
0
= E(0,0) = cost of matching two empty strings

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
012

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
0123

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1
2

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1
2
3
4 5 6 7

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1
2
3
4 5 6 7

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1 0=min(1+1, 0+r(b,b), 1+1)
2 3 4 5 6 7
0

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1 0=min(1+1, 0+r(b,b), 1+1)
2 3 4 5 6 7
0

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 10
2
3
4 5 6 7

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1 0 1=
2
3
4 5 6 7

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1 0 1=min(0+1, 1+r(b,a), 2+1) 21
3
4 5 6 7

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 1 0 1=min(0+1, 1+r(b,a), 2+1) 21
3
4 5 6 7

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 101
2
3
4 5 6 7

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 10123456 21123456 32123456 43212345 54322345 65433234 76544323

barbers
b o a r d e r

Edit Distance — DP algorithm matrix
!
01234567 10123456 21123456 32123456 43212345 54322345 65433234
7 6 5 4 4 3 2 3=E(7,7)

barbers
b o a r d e r

Edit Distance — DP algorithm matrix
!
01234567 10123456 21123456 32123456 43212345 54322345 65433234
7 6 5 4 4 3 2 3=E(7,7)

Edit Distance — DP algorithm matrix
!
barbers
b o a r d e r
01234567 10123456 2d(o)1 1 2 3 4 5 6 32123456 43212345 5 4 3 2r(d,b)2 3 4 5 65433234 7 6 5 4 4 3 2d(s)3

Tracing back to recover the edit operations

Edit Distance — DAG of subproblems
barbers b
o a r
d e
r

Edit Distance — DAG of subproblems barbers
0,0 0,1
b
1,0 1,1
o a r
d e
r
3,4
7,7

Edit Distance — DAG of subproblems
barbers
0,0 0,1
b
1,0 1,1
o a r
d e
r
3,4
Blue: cost 0 Black: cost 1
7,7

Edit Distance — DAG of subproblems
barbers
3,4
0,0 1,0
0,1 1,1
0 o
b
1
0
a
r d
e
0
1
0
minimum edit cost = length of shortest path from (0,0) to (7,7)
r0
1
7,7