---
title: 'CS 369: HMM applications'
author: "David Welch"
date: "5 April 2019"
output:
  ioslides_presentation:
    widescreen: true
    transition: 0.01
---
## HMM Idea

- States of a Markov chain are not seen; we only see symbols that the states emit. All states emit the same set of symbols but with different probabilities.
- We observe the sequence of symbols and want to infer the sequence of states.
## HMM Weather example

A sequence of states (wind directions) might look like: NNNSSNS

producing the sequence of symbols (weather): UUUURUR
## HMM Notation
- \(\pi\) is the state sequence, with \(i\)th state \(\pi_i\).
- The transition probability from state \(k\) to state \(l\) is \(a_{kl} = P(\pi_i = l | \pi_{i-1} = k)\).
- \(x\) is the symbol sequence, with \(i\)th symbol \(x_i\).
- We often use \(b\) for a generic symbol.
- The emission probability, the probability of emitting \(b\) from state \(k\), is \(e_{k}(b) = P(x_i = b | \pi_i = k)\).
## HMM Weather example notation
## HMM Example: Cheating casino
- A casino occasionally switches from a fair die to a loaded die.
- Rolling the loaded die gives a 6 50% of the time and each of 1-5 10% of the time.
- States: \(F\) and \(L\), corresponding to the Fair and Loaded die.
- Emissions: 1, 2, 3, 4, 5, 6, corresponding to the value rolled.
- Transition probabilities: \(a_{FL} = 0.05\) and \(a_{LF} = 0.1\), so \(a_{FF} = 1 - a_{FL} = 0.95\) and \(a_{LL} = 1 - a_{LF} = 0.9\).
- Emission probabilities: \(e_F(b) = 1/6\) for \(b = 1,\ldots,6\), while \(e_L(b) = 1/10\) for \(b = 1,\ldots,5\) and \(e_L(6) = 0.5\).
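As a concrete reference, here is a minimal sketch of these parameters in Python. The names `states`, `a`, `e` and `a0` are ours, not part of the model, and the slides do not specify the initial probabilities \(a_{0k}\), so a 50/50 start is assumed for illustration.

```python
# Cheating-casino HMM parameters.
states = ["F", "L"]  # Fair and Loaded die

# Transition probabilities a[k][l] = a_{kl}
a = {
    "F": {"F": 0.95, "L": 0.05},
    "L": {"F": 0.10, "L": 0.90},
}

# Emission probabilities e[k][b] = e_k(b) for faces b = 1..6
e = {
    "F": {b: 1 / 6 for b in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

# Initial probabilities a_{0k}: not given in the slides, assumed 50/50.
a0 = {"F": 0.5, "L": 0.5}
```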
## HMM Example: Cheating casino
From this process, we might observe the symbol sequence (i.e., a sequence of rolls of the die):

```
25166242456636242234636534523516666246626666615
```

Aligning the rolls (symbol sequence) with the hidden states (state sequence):

```
Rolls:  25166242456636242234636534523516666246626666615
States: FFFFLFFFLLLLLLLLFFFFFFFFFFFFFFLLLLFFFFLLLLLLFFF
```
## Joint probability of state and symbol sequence

Think of the HMM as a generative process:

- The first state \(\pi_1\) is chosen according to the initial probabilities \(a_{0i}\).
- From the first state, a symbol is emitted according to the emission probabilities \(e_{\pi_1}\). The next state is then chosen according to the transition probabilities \(a_{\pi_1 i}\).
- The next state emits a symbol, and so on.
The joint probability of states and symbols is thus: \[ P(x,\pi) = P(x,\pi|\mathbf{e,a}) = a_{0 \pi_1} \prod_{i = 1}^L e_{\pi_i}(x_i)a_{\pi_i \pi_{i+1}}, \] where, by convention, \(\pi_{L+1} = 0\) is the end state (the last transition factor is dropped if the end is not modelled).
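As a sketch, this formula translates directly into Python using the casino parameters above (reusing `a`, `e` and `a0` from the earlier snippet; the final transition into an end state is omitted, i.e. taken as 1, matching a model with no explicit end):

```python
def joint_prob(x, pi, a0, a, e):
    """P(x, pi) = a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}.

    The trailing transition into an end state is omitted (taken as 1).
    """
    p = a0[pi[0]]
    for i in range(len(x)):
        p *= e[pi[i]][x[i]]
        if i + 1 < len(x):
            p *= a[pi[i]][pi[i + 1]]
    return p

rolls = [int(c) for c in "25166242456636242234636534523516666246626666615"]
path = list("FFFFLFFFLLLLLLLLFFFFFFFFFFFFFFLLLLFFFFLLLLLLFFF")
print(joint_prob(rolls, path, a0, a, e))  # a very small positive number
```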
## HMM Problems
- Decoding: given the model and output, find the most likely state path (use the Viterbi algorithm).
- Evaluation: given the model and output, find the probability that the model generated that output (use the forward or backward algorithm).
- Posterior decoding: given the model and output, find the probability of being in each state at each time (use the forward-backward algorithm).
- Learning: given the model structure and output, find the parameters of the model (use the Baum-Welch algorithm).
## Viterbi algorithm
Given a symbol sequence \(x\), find the state path \(\pi\) that maximises the joint probability: \[ \pi^* = \arg\max_{\pi} P(x,\pi). \] We can find \(\pi^*\) using a dynamic programming approach.
## Viterbi algorithm
- Let \(v_k(i)\) be the probability of the most probable state path ending in state \(k\) at observation \(i\).
- That is, \(v_k(i)\) is the score of the best state path for the first \(i\) observations that ends in state \(k\).
- Suppose \(v_k(i)\) is known for all states \(k\). Then \(v_l(i+1)\) can be calculated for observation \(x_{i+1}\) as \[v_l(i+1) = e_l(x_{i+1}) \max_k (v_k(i)a_{kl}).\] Use this recursion to find the best state path.
- Boundary condition: we always start in a begin state called 0, so \(v_0(0) = 1\).
## Viterbi algorithm
- Initialisation (\(i = 0\)): \(v_0(0) = 1\), \(v_k(0) = 0\) for \(k>0\).
- Recursion (\(i = 1,\ldots,L\)): \(v_l(i) = e_l(x_i) \max_k(v_k(i-1)a_{kl})\); \(Pointer_i(l) = \arg\max_k(v_k(i-1)a_{kl})\).
- Termination: \(P(x,\pi^*) = \max_k (v_k(L)a_{k0})\).
- Traceback (\(i = L,\ldots,1\)): \(\pi^*_L = \arg\max_k(v_k(L)a_{k0})\); \(\pi_{i-1}^* = Pointer_i(\pi_i^*)\).
The termination step assumes the end state is modelled via the transition probabilities \(a_{k0}\); if the end is not modelled, this term disappears.
## Viterbi algorithm
Even the most probable path has very low probability, since it is a product of many factors less than 1, so it is important to work in log units to avoid numerical underflow.
The recursion in log units is \[ V_l(i+1) = \log (e_l(x_{i+1})) + \max_k (V_k(i) + \log(a_{kl})) = E_l(x_{i+1}) + \max_k (V_k(i) + A_{kl})\] where \[V_k(i) = \log(v_k(i)), A_{kl} = \log(a_{kl}),E_l(x_{i+1}) = \log(e_l(x_{i+1}))\]
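A quick illustration: in double precision, a product of a thousand typical probability factors underflows to zero, while the corresponding sum of logs is unproblematic.

```python
import math

p = 1 / 6                  # a typical per-step factor
print(p ** 1000)           # 0.0 -- underflows in double precision
print(1000 * math.log(p))  # about -1791.8, no problem at all
```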
## Viterbi algorithm (log units)
- Initialisation (\(i = 0\)): \(V_0(0) = 0\), \(V_k(0) = -\infty\) for \(k>0\).
- Recursion (\(i = 1,\ldots,L\)): \(V_l(i) = E_l(x_i) + \max_k(V_k(i-1) + A_{kl})\); \(Pointer_i(l) = \arg\max_k(V_k(i-1) + A_{kl})\).
- Termination: \(\log P(x,\pi^*) = \max_k (V_k(L) + A_{k0})\).
- Traceback (\(i = L,\ldots,1\)): \(\pi^*_L = \arg\max_k(V_k(L) + A_{k0})\); \(\pi_{i-1}^* = Pointer_i(\pi_i^*)\).
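Putting the slides together, here is a minimal log-space Viterbi sketch for the casino model (reusing `rolls`, `states`, `a0`, `a` and `e` from the earlier snippets; since that model has no explicit end state, the \(A_{k0}\) term in the termination step is dropped):

```python
import math

def viterbi(x, states, a0, a, e):
    """Most probable state path for symbol sequence x, in log units."""
    n = len(x)
    # Initialisation: V_l(1) = E_l(x_1) + A_{0l}.
    V = [{l: math.log(a0[l]) + math.log(e[l][x[0]]) for l in states}]
    pointer = [{}]

    # Recursion: V_l(i) = E_l(x_i) + max_k (V_k(i-1) + A_{kl}).
    for i in range(1, n):
        V.append({})
        pointer.append({})
        for l in states:
            k = max(states, key=lambda k: V[i - 1][k] + math.log(a[k][l]))
            pointer[i][l] = k
            V[i][l] = math.log(e[l][x[i]]) + V[i - 1][k] + math.log(a[k][l])

    # Termination (no modelled end state, so no A_{k0} term).
    last = max(states, key=lambda k: V[n - 1][k])
    log_p = V[n - 1][last]

    # Traceback: pi*_{i-1} = Pointer_i(pi*_i).
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(pointer[i][path[-1]])
    path.reverse()
    return path, log_p

path, log_p = viterbi(rolls, states, a0, a, e)
print("".join(path))  # compare with the state sequence shown earlier
print(log_p)
```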