CS 369: Viterbi and forward algorithms
David Welch 9 April 2019
Viterbi algorithm (log units)
The key part of the algorithm is calculating the recursion V_l(i+1) = E_l(x_{i+1}) + \max_k (V_k(i) + A_{kl})
Think of V_k(i) = V(k,i) as a matrix. Row indices are states, column indices are positions along the sequence.
Viterbi example: CpG islands
Methylation changes the nucleotide pair CG (written CpG) to TG, so CpG is rarer than expected in genome.
But in functional regions, methylation is less common so we see more CpGs there. These regions are known as CpG islands.
Model each part of a sequence as having either high or low CpG content. Label the regions H (for high CpG content) and L (for low).
In high CpG regions, C and G each occur with probability 0.35 and T and A with probability 0.15.
In low CpG regions, C and G are probability 0.2 and T, A are 0.3.
The initial state is H with probability 0.5 and L with probability 0.5. Transitions are given by a_{HH} = a_{HL} = 0.5, \, a_{LL} = 0.6, a_{LH} = 0.4.
Find the most likely state sequence given the sequence of symbols x = GGCACTGAA.
Work in log units (here we use log base 2; any base works, base e is usually easiest): set \mathbf A = \log_2(a) and \mathbf E = \log_2(e).

\mathbf A = \begin{array}{ccc} & H & L \\ H & \log_2(a_{HH}) = -1 & -1 \\ L & -1.322 & -0.737 \end{array}

\mathbf E = \begin{array}{ccccc} & A & C & G & T \\ H & \log_2(e_H(A)) = -2.737 & -1.515 & -1.515 & -2.737 \\ L & -1.737 & -2.322 & -2.322 & -1.737 \end{array}
Recursion is: V_l(i) = E_l(x_i) + \max_k(V_k(i-1) + A_{kl})
We want to fill out the matrix V(k,i) for states k \in \{H, L\} and positions i = 1,\ldots,9 of x, using V_l(i) = E_l(x_i) + \max_k(V_k(i-1) + A_{kl})
Traceback begins in the final column by finding the state that maximises the joint probability, L in this case.
Following the pointers back from this position and recording the state at each step gives the state path with the highest probability: \pi^\ast = HHHLLLLLL.
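The slides rely on course code (viterbi(seq, a, e) appears on a later slide). As a minimal self-contained sketch in R of the whole example, with all names my own rather than the course's, the calculation might look like this:

viterbiLog2 <- function(x, a, e, init) {
  ## a: transition matrix, e: emission matrix, init: initial state distribution
  K <- nrow(a); L <- length(x)
  A <- log2(a); E <- log2(e)
  V <- matrix(-Inf, K, L)    # V[k, i]: best log2 prob of a length-i path ending in state k
  ptr <- matrix(0L, K, L)    # ptr[l, i]: best predecessor of state l at position i
  V[, 1] <- log2(init) + E[, x[1]]
  for (i in 2:L) {
    for (l in 1:K) {
      s <- V[, i - 1] + A[, l]          # V_k(i-1) + A_kl for each k
      ptr[l, i] <- which.max(s)
      V[l, i] <- E[l, x[i]] + max(s)    # V_l(i) = E_l(x_i) + max_k(...)
    }
  }
  path <- integer(L)
  path[L] <- which.max(V[, L])          # traceback starts at the best final state
  for (i in L:2) path[i - 1] <- ptr[path[i], i]
  list(V = V, path = rownames(a)[path])
}

a <- matrix(c(0.5, 0.5,
              0.4, 0.6), 2, 2, byrow = TRUE,
            dimnames = list(c("H", "L"), c("H", "L")))
e <- matrix(c(0.15, 0.35, 0.35, 0.15,
              0.30, 0.20, 0.20, 0.30), 2, 4, byrow = TRUE,
            dimnames = list(c("H", "L"), c("A", "C", "G", "T")))
x <- strsplit("GGCACTGAA", "")[[1]]
paste(viterbiLog2(x, a, e, c(0.5, 0.5))$path, collapse = "")  ## "HHHLLLLLL"

Running this recovers the path HHHLLLLLL found by the traceback above.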
Viterbi algorithm examples
Recall the cheating casino which had a fair die (F) and a loaded die (L). The loaded die rolls a six 50% of the time.
Rolls (symbol sequence):   2516624245663624223463653452351666624662666661516
Viterbi reconstruction:    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLFFFF
True state sequence:       FFFFLFFFLLLLLLLLFFFFFFFFFFFFFFLLLLFFFFLLLLLLFFFFF
Automatic Chord Recognition – Chordec
Videos: the Chordec app in use, and the same app with a different song.
Paper: R. Chen, W. Shen, A. Srinivasamurthy, P. Chordia: “Chord Recognition Using Duration-Explicit Hidden Markov Models”, in International Society for Music Information Retrieval Conference (ISMIR), Oct 2012
Viterbi review
The Viterbi algorithm finds the most likely state sequence, \pi^* for a given symbol sequence, x.
It defines the quantity v_k(i), which is the probability of the most likely state sequence of length i ending in state k.
Think of this as a matrix, V(k,i) = v_k(i). We calculate this using tabular computation.
The calculation is based on the recursion v_l(i+1) = e_l(x_{i+1}) \max_k (v_k(i)a_{kl}) with boundary conditions v_0(0) = 1 and v_k(0) = 0 for k \neq 0.
An example (where there are 3 possible states) of how to get from the i-th column of the matrix v to the (i+1)-th column:
Or, in log form, the recursion is V_l(i+1) = E_l(x_{i+1}) + \max_k (V_k(i) + A_{kl})
Calculating the probability of x
If x and \pi known, easy to calculate P(x,\pi).
But only observe x, not \pi so what do we do?
P(x,\pi^*) is not much use as there may be loads of possible state paths \pi
Want to marginalise over the state path, that is, calculate P(x) = \sum_{\pi} P(x,\pi)
But this sum contains a huge number of terms (K^L if x has length L and there are K states; even the small CpG example above has 2^9 = 512 paths)
Dynamic programming to the rescue…
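Recall how that joint probability is built (this is the standard HMM factorisation, written in this course's convention of a silent start state 0):

P(x,\pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}

where the final factor a_{\pi_L \pi_{L+1}} is read as a_{\pi_L 0}, or taken to be 1 if the end state is not modelled. Summing these products over all K^L paths directly is infeasible; the forward algorithm reuses shared prefixes instead.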
Forward algorithm
Let f_k(i) = P(x_{1:i},\pi_i = k)
So f_k(i) is the joint probability of the first i observations and the path being in state k at the i-th step.
Recursion for f: f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)a_{kl}
Always start in state 0 at step 0, so f_0(0) = 1 and f_k(0) = 0 \mbox{ for } k \neq 0.
Initialisation i = 0: f_0(0) = 1, f_k(0) = 0 for k>0.
Recursion i = 1,\ldots,L: f_l(i) = e_l(x_i) \sum_k f_k(i-1)a_{kl}.
Termination: P(x) = \sum_k f_k(L)a_{k0}.
If the end state 0 is not modelled, set a_{k0} = 1 to get P(x) = \sum_k f_k(L).
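A minimal sketch of these three steps in R, assuming no modelled end state and reusing the a, e and x objects from the Viterbi sketch above (the function name is mine, not the course's forwardNoLog):

forwardNoLogSketch <- function(x, a, e, init) {
  K <- nrow(a); L <- length(x)
  f <- matrix(0, K, L)           # f[k, i] = P(x_1..x_i, pi_i = k)
  f[, 1] <- init * e[, x[1]]     # initialisation: start, then emit x_1
  for (i in 2:L)
    for (l in 1:K)               # f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
      f[l, i] <- e[l, x[i]] * sum(f[, i - 1] * a[, l])
  f
}
sum(forwardNoLogSketch(x, a, e, c(0.5, 0.5))[, 9])  ## P(x), about 4.07e-06

The result agrees with the course output shown on the later slides.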
Think of f_k(i) as the matrix f(k,i) and calculate column by column:
Forward algorithm in log units
As always, we prefer to work in log units.
Using the notation A_{kl} = \log a_{kl} etc., the main recursion in log units is
F_l(i) = \log \left[ e_l(x_i) \sum_k f_k(i-1)a_{kl}\right] = E_l(x_i) + \log \left[\sum_k f_k(i-1)a_{kl}\right]
But we cannot take the log through the sum, so if we only store F and A we could calculate the right-hand side as E_l(x_i) + \log \left[\sum_k \exp(F_k(i-1) + A_{kl})\right], but this involves forming the exponential of a large negative number, which is exactly what we want to avoid.
Log sum
Want \log(e^a + e^b) without forming e^a or e^b.
Handy trick: \log(e^a + e^b) = \log(e^a(1 + e^{b-a})) = \log(e^a) + \log(1 + e^{b-a}) = a + \log(1 + e^{b-a}).
If |b-a| is small enough, we never need to store an extremely large or small number, so the calculation is stable.
Call this the logsum of a and b.
Extends to finding the log of a sum of multiple logged numbers:
function logsum(x)
    z = 0
    for i in x
        z += exp(i - x[0])
    return x[0] + log(z)
or, using \log_2,
function logsum(x)
    z = 0
    for i in x
        z += 2^(i - x[0])
    return x[0] + log2(z)
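A possible R version of the base-2 variant (with one deviation from the pseudocode above: it subtracts the maximum element rather than x[0], so the exponents are never positive; this is my assumption about good practice, not the course's code):

logsum <- function(x) {
  m <- max(x)                # factor out the largest logged value
  m + log2(sum(2^(x - m)))   # log2 of the sum of the 2^x[i]
}

With every exponent at most 0, no intermediate value can overflow.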
Forward algorithm in log units
Initialisation i = 0: F_0(0) = 0, F_k(0) = -\infty for k>0.
Recursion i = 1,\ldots,L: F_l(i) = E_l(x_i) + logsum_k\left(F_k(i-1) + A_{kl}\right).
Termination: \log P(x) = logsum_k \left(F_k(L) + A_{k0} \right).
If the end state 0 is not modelled, set a_{k0} = 1 (so A_{k0} = 0) to get \log P(x) = logsum_k (F_k(L)).
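Putting the pieces together, a minimal sketch of the log-units forward algorithm in R, again assuming no modelled end state and reusing the logsum, a, e and x defined above (names are mine, not the course's forward):

forwardLog2Sketch <- function(x, a, e, init) {
  K <- nrow(a); L <- length(x)
  A <- log2(a); E <- log2(e)
  Fmat <- matrix(-Inf, K, L)                # Fmat corresponds to F in the slides
  Fmat[, 1] <- log2(init) + E[, x[1]]       # initialisation in log2 units
  for (i in 2:L)
    for (l in 1:K)                          # F_l(i) = E_l(x_i) + logsum_k(F_k(i-1) + A_kl)
      Fmat[l, i] <- E[l, x[i]] + logsum(Fmat[, i - 1] + A[, l])
  Fmat
}
logsum(forwardLog2Sketch(x, a, e, c(0.5, 0.5))[, 9])  ## log2 P(x), about -17.91

This matches log2 of the no-log answer, as the next slides verify.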
Forward algorithm example: CpG islands cont.
x = GGCACTGAA

a ## transition matrix
##     H   L
## H 0.5 0.5
## L 0.4 0.6

e ## emission matrix
##      A    C    G    T
## H 0.15 0.35 0.35 0.15
## L 0.30 0.20 0.20 0.30
## do calculation without using log units
forwardMatrixNoLog = forwardNoLog(seq, a, e)
forwardMatrixNoLog

##       [,1]     [,2]       [,3]        [,4]         [,5]         [,6]         [,7]         [,8]         [,9]
## [1,] 0.175 0.044625 0.01193937 0.001375603 0.0006931204 8.350342e-05 4.240677e-05 5.110917e-06 1.112453e-06
## [2,] 0.100 0.029500 0.00800250 0.003231356 0.0005253231 1.985262e-04 3.217349e-05 1.215224e-05 2.954041e-06

## probability is sum of last column
sum(forwardMatrixNoLog[,9])
## [1] 4.066495e-06
## use log base 2
forwardMatrix = forward(seq, a, e)
forwardMatrix

##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] -2.514573 -4.486004 -6.388129 -9.505720 -10.49461 -13.54781 -14.52535 -17.57799 -19.77782
## [2,] -3.321928 -5.083141 -6.965334 -8.273644 -10.89451 -12.29838 -14.92377 -16.32842 -18.36888

## log probability is logsum of last column
logsum(forwardMatrix[,9])
## [1] -17.90778

log2(sum(forwardMatrixNoLog[,9])) ## compare calculations
## [1] -17.90778
Compare to the joint probability \log_2 P(x, \pi^*) of the observations and best path \pi^* from the Viterbi algorithm:

viterbi(seq,a,e)[[1]]

##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] -2.514573 -5.029146 -7.543720 -11.28069 -13.11719 -16.85415 -18.65001 -22.38698 -25.40523
## [2,] -3.321928 -5.836501 -8.351074 -10.28069 -13.33958 -15.81351 -18.87240 -21.34633 -23.82027