
Learning HMMs: Baum-Welch
J&M Appendix A

CMPSC 442
Week 10, Meeting 28, Three Segments

Outline

● Backward algorithm
● Formulae for estimating the parameters
● Baum-Welch


Learning HMMs: Baum-Welch
J&M Appendix A

CMPSC 442
Week 10, Meeting 28, Segment 1 of 3: Backward Algorithm

Developing the Parameters for an HMM

● In previous meetings, we assumed that the parameters of the HMM
model are known
○ Often these parameters are estimated on annotated training data
○ Annotation is often difficult and/or expensive
○ Training data might be different from a new dataset of interest

● Parameters can be learned from unlabeled data
○ In principle, the goal is to choose the parameters that maximize the
likelihood of the current data
■ Develop a model λ̂ such that λ̂ = argmaxλ P(O | λ)


Learning the HMM Parameters

● Unfortunately, there is no known way to analytically find a global
maximum, i.e., a model λ̂ such that λ̂ = argmaxλ P(O | λ)

● It is possible to find a local maximum

● Theorem: Given an initial model λ, we can always find a model λ',
such that P(O | λ') ≥ P(O | λ)


Iterative Re-Estimation of Parameters

● Key Idea: parameter re-estimation by hill-climbing
● Iteratively re-estimate the HMM parameters until convergence

○ Initialize with some parameter values
○ With these initial parameters, compute the expected state counts and
expected state transition counts, given the observation sequences
○ Using these expectations, which approximate the MLE counts over the data
(were the state labels available), recompute the transition probabilities and
emission probabilities


Baum-Welch

● The Baum-Welch algorithm uses Expectation Maximization to iteratively
re-estimate the parameters, yielding a new model λ' at each iteration
○ Initializes λ to a random set of values, then for each iteration:
○ Calculates α and β, from left to right (forward) and from right to left
(backward), to get the probability of the states i and j at times t and t+1
■ Every state transition from t to t+1 occurs as part of a sequence
■ For all transitions, we compute the forward probability from the sequence start to t,
and the backward probability from the sequence end back to t+1
○ Re-estimates the transition probabilities aij, the emission probabilities bj(ot),
and the initial distribution πi

● Requires an algorithm for backward computation through the trellis


Backward Algorithm

● Given that we are in state i at time t, the probability β of seeing the
observations from time t + 1 to T, given λ, is:

βt(i) = P(ot+1, ot+2, …, oT | qt = i, λ)

● The Forward and Backward algorithms give the same total probability
over the observations: P(O) = P(o1, o2, …, oT)


Backward Algorithm

The backward probability βt(i) is symmetrical to αt(i) in the Forward
Algorithm:
● Initialization: βT(i) = 1, for 1 ≤ i ≤ N

● Induction: βt(i) = Σj aij bj(ot+1) βt+1(j), for 1 ≤ i ≤ N and 1 ≤ t < T

● Termination: P(O | λ) = Σj πj bj(o1) β1(j)
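
To make the recursion concrete, here is a minimal NumPy sketch of the backward
pass (an illustration, not from the slides); the names and shapes of A, B, pi,
and obs are assumptions for the example. The returned likelihood should match
the termination value of the forward algorithm from the previous meetings.

    import numpy as np

    def backward(A, B, pi, obs):
        """Backward pass over the trellis.

        A:   (N, N) transition matrix, A[i, j] = a_ij
        B:   (N, V) emission matrix,  B[j, k] = b_j(v_k)
        pi:  (N,)   initial state distribution
        obs: length-T sequence of integer-encoded observations
        """
        N = A.shape[0]
        T = len(obs)
        beta = np.zeros((T, N))
        beta[T - 1, :] = 1.0                   # initialization: beta_T(i) = 1
        for t in range(T - 2, -1, -1):
            # induction: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
            beta[t, :] = A @ (B[:, obs[t + 1]] * beta[t + 1, :])
        # termination: P(O | lambda) = sum_j pi_j * b_j(o_1) * beta_1(j)
        likelihood = float(np.sum(pi * B[:, obs[0]] * beta[0, :]))
        return beta, likelihood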

Backward Induction Step Illustrated

Figure from Martin & Jurafsky, 3rd Ed., Appendix A

Parameter Re-estimation

● Three sets of HMM parameters need to be iteratively estimated
○ Initial state distribution: πi
○ Transition probabilities: ai,j
○ Emission probabilities: bi(ot)


Learning HMMs: Baum-Welch
J&M Appendix A

CMPSC 442
Week 10, Meeting 28, Segment 2 of 3: Formulae for
Estimating the Parameters

Likelihood of the Observations

● To compute the likelihood for o1,o2,…,oT as P(O|λ) we would want all the
paths through the trellis and the associated probabilities

● Given that we do not have λ, we use an approximation λ’ to compute
an expectation for each sequence of observations

● Given the Markov assumptions, we need only consider the probability
of every state transition i, j for every observation at every t

● Given all our sequences o1,o2,…,oT, the probability of every state
transition with an emission at t+1 is derivable from the forward
probability from 1 to t, and the backward probability from T to t+1


Using the Forward – Backward Algorithm


Figure from Martin & Jurafsky, 3rd Ed., Appendix A


Expectations for the Transition Probabilities

● Assume we have an estimate of the transition probability aij at a
particular point in time t in the observation sequence

● Then the estimate of the total count for âij sums over all t
○ Formulate an expression for the expected joint probability ξt(i,j) of state i
at time t and state j at time t+1, based on the observation sequences
○ For the numerator of âij, sum over ξt(i,j) for all t
○ For the denominator, sum over ξt(i,k) for all t and for all transitions i,k


Deriving the Expression for the Expectation

● The expression ξt(i,j) can be formulated as the joint probability of the
states i and j conditioned on our observation sequence and the
parameters:

ξt(i,j) = P(qt = i, qt+1 = j | O, λ)

● Using the forward-backward computation, we can compute the
following:

εt(i,j) = P(qt = i, qt+1 = j, O | λ) = αt(i) aij bj(ot+1) βt+1(j)


Deriving the Expression for the Expectation

● We want: ξt(i,j) = P(qt = i, qt+1 = j | O, λ)

● We have: εt(i,j) = P(qt = i, qt+1 = j, O | λ)

● We divide εt(i,j) by P(O|λ) to get ξt(i,j)

○ Because P(X|Y,Z) = P(X,Y|Z)/P(Y|Z)


Deriving ξt(i,j)

1. By the laws of probability: εt(i,j) = αt(i) aij bj(ot+1) βt+1(j)
2. P(O|λ) = Σj αt(j) βt(j): the probability of the whole observation sequence
3. Divide εt(i,j) by P(O|λ) to get ξt(i,j) = εt(i,j) / Σj αt(j) βt(j)

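
A possible NumPy sketch of steps 1-3 (an illustration, not taken from the
slides), assuming alpha and beta have already been computed by the forward and
backward passes and have shape (T, N):

    import numpy as np

    def xi_probs(A, B, alpha, beta, obs):
        """xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda)."""
        T, N = alpha.shape
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            # step 1: epsilon_t(i, j) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
            eps = alpha[t, :, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1, None, :]
            # steps 2-3: divide by P(O | lambda) = sum_j alpha_t(j) * beta_t(j),
            # which equals the sum of epsilon_t(i, j) over all i, j
            xi[t] = eps / eps.sum()
        return xi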

Formulation for Estimating the Transition Probabilities

● Given the expression ξt(i,j), we can now sum over all t:

âij = Σt ξt(i,j) / Σt Σk ξt(i,k)

● The numerator sums over all probabilities of state i at t and state j at t+1
● The denominator sums over all probabilities of state i at t and state k at
t+1, for all k in the full set of states Q

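
Continuing the sketch, the re-estimation of âij reduces to two sums over the ξ
array (illustrative, assuming ξ is stored with shape (T-1, N, N) as above):

    import numpy as np

    def reestimate_transitions(xi):
        """a_hat[i, j] = sum_t xi_t(i, j) / sum_t sum_k xi_t(i, k)."""
        numer = xi.sum(axis=0)                     # expected number of i -> j transitions
        denom = numer.sum(axis=1, keepdims=True)   # expected number of transitions out of i
        return numer / denom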

Expectations of the Emission Probabilities

● Given each ot, estimate the probability γt(j) of being in state j at time t:

γt(j) = P(qt = j | O, λ) = αt(j) βt(j) / P(O|λ)


Estimating the Emission Probabilities (Sensor Model)

● The re-estimated emission probability is the expected fraction of the time
spent in state j in which symbol vk is observed:

b̂j(vk) = Σt s.t. ot = vk γt(j) / Σt γt(j)
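
Putting the two preceding slides together, here is an illustrative NumPy sketch
(not from the slides): it normalizes αt(j)·βt(j) to get γt(j), then pools γ over
the time steps at which each vocabulary symbol vk was observed. The array
shapes and the integer encoding of obs are assumptions.

    import numpy as np

    def reestimate_emissions(alpha, beta, obs, V):
        """b_hat[j, k] = (sum over t with o_t = v_k of gamma_t(j)) / (sum over all t of gamma_t(j))."""
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)   # gamma_t(j) = alpha_t(j) beta_t(j) / P(O | lambda)
        T, N = gamma.shape
        B_hat = np.zeros((N, V))
        for t in range(T):
            B_hat[:, obs[t]] += gamma[t]            # numerator: mass in state j when v_k was observed
        return B_hat / gamma.sum(axis=0)[:, None]   # denominator: total expected time in state j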

Estimating the Priors for Each State

● Initial state distribution: π̂i is the probability that qi is the start state (t = 1):

π̂i = γ1(i)


Learning HMMs: Baum-Welch
J&M Appendix A

CMPSC 442
Week 10, Meeting 28, Segment 3 of 3: Baum-Welch

Baum-Welch

● An example of Expectation Maximization
● E-Step: Compute expected values of the states j at times t using γt(j),
and of the transitions i,j from t to t+1 using ξt(i,j)
● M-Step: From these expected values, compute new parameter
estimates âij, b̂j(ot), and π̂i
● Iterate until convergence


Baum-Welch Expectation Step


Baum-Welch Maximization Step

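
To tie the two steps together, here is a minimal single-sequence Baum-Welch
loop (an illustrative sketch, not a reference implementation). It assumes the
backward() and xi_probs() helpers sketched in the earlier segments and defines
a small forward() pass for completeness; a practical implementation would work
with scaled or log-space probabilities to avoid underflow and would accumulate
expected counts over many observation sequences.

    import numpy as np

    def forward(A, B, pi, obs):
        """Forward pass (from the previous meetings): alpha[t, i] = P(o_1..o_t, q_t = i | lambda)."""
        T, N = len(obs), A.shape[0]
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha

    def baum_welch(obs, N, V, n_iter=50, seed=0):
        """Iterate E- and M-steps for a single integer-encoded observation sequence."""
        rng = np.random.default_rng(seed)
        A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)   # random row-stochastic init
        B = rng.random((N, V)); B /= B.sum(axis=1, keepdims=True)
        pi = rng.random(N);     pi /= pi.sum()
        for _ in range(n_iter):
            # E-step: expected state occupancies gamma and expected transitions xi
            alpha = forward(A, B, pi, obs)
            beta, _ = backward(A, B, pi, obs)       # sketched in Segment 1
            xi = xi_probs(A, B, alpha, beta, obs)   # sketched in Segment 2
            gamma = alpha * beta
            gamma /= gamma.sum(axis=1, keepdims=True)
            # M-step: re-estimate pi, A, B from the expected counts
            pi = gamma[0]                                            # pi_hat_i = gamma_1(i)
            A = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]        # a_hat_ij
            B = np.zeros((N, V))
            for t, o in enumerate(obs):
                B[:, o] += gamma[t]
            B /= gamma.sum(axis=0)[:, None]                          # b_hat_j(v_k)
        return A, B, pi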

Theory versus Practice

● In theory, Baum-Welch can do completely unsupervised learning of the
parameters

● In practice, the initialization is very important


Summary

● Backward algorithm moves through the trellis backward to compute
the backward probabilities of paths

● Formulae for estimating the parameters of an HMM require a
combination of forward and backward computation through the trellis

● Baum-Welch is an example of the Expectation Maximization algorithm,
which uses hill-climbing to find a locally optimal assignment of parameter
values that accounts for the observed sequences
