程序代做 More pairwise alignment and

More pairwise alignment and
multiple sequence alignment CompSci 369, 2022

School of Computer Science, University of Auckland

Copyright By PowCoder代写 加微信 powcoder

Last lecture
Local alignment algorithm
Overlap alignment
Alignment using more complex gap penalties

This lecture
Alignment using ane gap penalty Approaches to multiple sequence alignment

Complexity of pairwise alignment algorithms
Naively, quadratic in time and space: .
If only need score but not alignment itself, easy to do with linear space: (still time)
With a little work, can get alignment in linear space too. Details in notes.

More complex gap penalities
.1 − j , … ,0 = k ,)k − j(γ + )k ,i( F ⎩
,1 − i , … ,0 = k ,)k − i(γ + )j ,k( F ⎨ xam = j,iF
,)jy,ix(s+)1−j,1−i(F⎧

For need to consider other cells: diagonal + everything same row and column. Result is , not in speed
)2n(O )3n(O
1+j+i )j,i(F

Affine gap alignment:
A approach is available here but need to use more space.
is score of best alignment up to
is score of best alignment up to
is score of best alignment up to
given that
given that
given that
is aligned to .
is aligned to a gap.
is aligned to a gap.
jy ix )j,i(
)j,i(M )2n(O
e)1 − k( − d− = )k(γ
jy GCA – ixCA
– – jy G ix C C A
jy G C A ix C C A

Assume a gap never directly follows an insertion (so can’t go direct )
Can again apply tabular computing this time using 3 arrays (for and traceback as before.
Algorithm is still in time and space but with higher coecents (as need to store and calculate 3 arrays not 1).
.e − )1 − j ,i(yI { xam = )j ,i(yI ,d−)1−j,i(M
,dna ;e−)j,1−i(xI{xam=)j,i(xI ,d−)j,1−i(M
;)jy,ix(s+)1−j,1−i(yI⎪⎩
,)jy ,ix(s + )1 − j ,1 − i(xI ⎨ xam = )j ,i(M
,)jy,ix(s+)1−j,1−i(M⎪⎧ yI ↔ xI

Affine alignment as a finite state automaton
States , and shown as cirlces.
Transitions betwen states via arrows and incur cost shown next to arrow.
Each state increments position on sequences and by amount shown

An alignment is a path through the states
Finite state automata are known as Moore machines or Mealy Machines in Computer Science.
MyI M MxIxI M M
KSEA–LH K-DAPSLV

Finite state automata are known as Moore machines or Mealy Machines in Computer Science.
Markov chains and Hidden Markov Models (coming soon…) are stochastic FSAs

Pairwise alignment in practice
If and not too large, we use these algorithms.
Commonly, have sequence and want to nd matches in whole database. So small but very large. Need to use approximations to the exact method.
BLAST (Basic Local Alignment Search Tool) is the most common. See
https://blast.ncbi.nlm.nih.gov/Blast.cgi

Multiple sequence alignment

MSA via dynamic programming
Conceptually easy (recusion etc)
But resources required for naive implementation quickly get large:
Needleman-Wunsch is in space and time at worst
Extending to sequences of length requires storing and calculating matrix of size .
eg 5 seqeunces of of length 100 produces array with cells, 4 bytes per cell means need approx 37GB just to store
Real examples may involve 1000s of sequences each with 1000s of sites.
Some clever approaches (calculating upper and lower bounds on score) can limit size of space searched but only work for small problems (say seqs of bases)
Give up on nding optimal alignment and try heuristics to nd good enough alignment.
0101 = 501

MSA via progressive alignment
Most widely used technique for MSA Involves a series of pairwise alignments.
Basic idea: choose a pair of sequences and align them, choose a third and align it to either of rst two, and so on until all sequences aligned.
May want to align two MSA together: eg when aligning 4 sequences, might align two pairs rst then align the resulting alignments together.

MSA via progressive alignment
Issues to address:
Order in which to align the sequences How to align two sequences together How to align a sequence to a MSA How to align two MSAs together
The order of alignment can be determined using a guide tree. We’ll look at a method of construction a guide tree, known as UPGMA.

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com