Global alignment and the
Needleman-Wunsch algorithm CompSci 369, 2022
School of Computer Science, University of Auckland
Copyright By PowCoder代写 加微信 powcoder
Last lecture
scoring alignments score matrices
gap penalties
This lecture
Principle of optimality
recurrence relation Needleman-Wunsch algorithm elements of an alignment algorithm
Principle of Optimality
If we know the of the optimal alignment of length then the score of the first parts of the alignment must be optimal.
Recurrence relation for global alignment
Let Let Let
denote the first residues of .
be the score of the optimal alignment between and
be the score for matching residue to residue so
jy ix ]i : 1[x
)jy ,ix(s j,iF
Assume a linear gap penalty: penalty for or is so
both score .
d− )jy ,−(
The optimal alignment up to and aligned
in which case
or aligned to a gap
in which case
either has
d − )1 − j ,i( F = )j ,i( F
− − ix V jy A G I
)jy ,ix(s + )1 − j ,1 − i(F = )j ,i(F
ix V G L jy A G I
or aligned to a gap.
in which case
d − )j ,1 − i( F = )j ,i( F
ix V G L − jy A G
This means we can write as a recurrence relation
Can solve this recurrence if we have boundary conditions: Set:
So can calculate alignment
(start with score of 0)
(aligning first i residues of to gaps scores i gap penalties)
(same for first residues of )
(score of best alignment) and, in process, get best
j dj− = )j,0(F di− = )0,i(F 0 = )0,0(F
.d−)1−j,i(F⎩
,d − )j ,1 − i( F ⎨ xam = )j ,i( F ,)jy,ix(s+)1−j,1−i(F⎧
Don’t work directly with recurrence relation (would repeat many calculations), instead think of as a matrix
th is found by taking maximum score based on closest 3 entries to left and above:
Draw arrow to show where (i,j)th entry was derived from
di− = )i,0(F = )0,i(F 0 = )0,0(F
Global alignment: example
Align x = ATA and y = AGTTA using scores:
if and are both purines (A,G) or pyrimidines (C,T)
if is a purine and is a pyrimidine or vice versa. The gap penalty is
So score matrix is
2 2− 1 2− T 2− 2 2− 1 G 1 2− 2 2− C 2− 1 2− 2 A
a 2−=)b,a(s
b a 1 = )b ,a(s b=a 2=)b,a(s
Gap penalty is . So fill out :
2 2− 1 2− T 2− 2 2− 1 G 1 2− 2 2− C 2− 1 2− 2 A
gap penalty is -2
2 2− 1 2− T 2− 2 2− 1 G 1 2− 2 2− C 2− 1 2− 2 A
So score of best alignment is 2. Get the best alignment by starting in bottom right corner and tracing arrows back
Two alignments with best score depending on which arrow followed at :
A−T−A AT−−A A T T G A dna A T T G A
Elements of alignment algorithm
a recurrence relation for the quantity we are trying to optimise boundary conditions
tabular computing to efficiently calculate the recurrence traceback, includes rules for where to start and stop traceback.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com