
Statistical Machine Learning


Outlines

Overview
Introduction
Linear Algebra

Probability

Linear Regression 1

Linear Regression 2

Linear Classification 1

Linear Classification 2

Kernel Methods
Sparse Kernel Methods

Mixture Models and EM 1
Mixture Models and EM 2
Neural Networks 1
Neural Networks 2
Principal Component Analysis

Autoencoders
Graphical Models 1

Graphical Models 2

Graphical Models 3

Sampling

Sequential Data 1

Sequential Data 2


Statistical Machine Learning

Christian Walder

Machine Learning Research Group
CSIRO Data61

and

College of Engineering and Computer Science
The Australian National University

Canberra
Semester One, 2020.

(Many figures from C. M. Bishop, “Pattern Recognition and Machine Learning”)


Part XXII

Sequential Data 2


Example of a Hidden Markov Model

Assume Peter and Mary are students in Canberra and Sydney,
respectively. Peter is a computer science student who is only
interested in riding his bicycle, shopping for new computer
gadgets, and studying. (He also does other things, but because
these other activities don't depend on the weather we neglect
them here.)
Mary does not know the current weather in Canberra, but she
knows the general trends of the weather there. She also knows
Peter well enough to know what he does on average on rainy
days and also when the skies are blue.
She believes that the weather (rainy or not rainy) follows a
given discrete Markov chain. She tries to guess the sequence
of weather patterns for a number of days after Peter tells her on
the phone what he did over the last few days.


Example of a Hidden Markov Model

Mary uses the following HMM (rows of the transition table give the
current state, columns the next state; columns of the emission table
give the hidden state):

                                  rainy   sunny
  initial probability              0.2     0.8

  transition probability   rainy   0.3     0.7
                           sunny   0.4     0.6

  emission probability     cycle   0.1     0.6
                           shop    0.4     0.3
                           study   0.5     0.1

Assume Peter tells Mary that his activities over the last
days were ['cycle', 'shop', 'study'].

(a) Calculate the probability of this observation sequence.
(b) Calculate the most probable sequence of hidden states for
    these observations.
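
A minimal NumPy sketch of part (a), the forward (alpha) recursion, follows; it is not part of the original slides, and the array names pi, A, B, the state ordering (rainy, sunny) and the reading of the transition table as rows = current day, columns = next day are assumptions made here.

    import numpy as np

    # States: 0 = rainy, 1 = sunny; observations: 0 = cycle, 1 = shop, 2 = study
    pi = np.array([0.2, 0.8])                    # initial probabilities
    A = np.array([[0.3, 0.7],                    # transition probabilities (assumed: row = from, column = to)
                  [0.4, 0.6]])
    B = np.array([[0.1, 0.4, 0.5],               # emission probabilities (row = state, column = activity)
                  [0.6, 0.3, 0.1]])
    obs = [0, 1, 2]                              # 'cycle', 'shop', 'study'

    alpha = pi * B[:, obs[0]]                    # alpha(z1) = pi_k p(x1 | z1 = k)
    for x in obs[1:]:
        alpha = B[:, x] * (alpha @ A)            # alpha(zn) = p(xn | zn) sum_{zn-1} alpha(zn-1) p(zn | zn-1)
    print(alpha.sum())                           # p('cycle', 'shop', 'study') under these assumptions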


Hidden Markov Model

The trellis for the hidden Markov model
Find the probability of ’cycle’, ’shop’, ’study’

[Trellis diagram over three time steps: the Rainy/Sunny states with the initial, transition, and emission probabilities from the table above, one column of states per observed activity (cycle, shop, study).]


Hidden Markov Model

Find the most probable hidden states for ’cycle’, ’shop’, ’study’

[Same trellis diagram as above, now used for finding the most probable state sequence.]


Homogeneous Hidden Markov Model

Joint probability distribution over both latent and observed
variables

p(X, Z | θ) = p(z1 | π) [ ∏_{n=2}^{N} p(zn | zn−1, A) ] ∏_{m=1}^{N} p(xm | zm, φ)

where X = (x1, . . . , xN), Z = (z1, . . . , zN), and
θ = {π, A, φ}.
Most of the discussion will be independent of the particular
choice of emission probabilities (e.g. discrete tables,
Gaussians, mixture of Gaussians).
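
As a concrete reading of the factorisation above (a sketch, not slide material), the joint probability of one particular pair (X, Z) with discrete emissions is a product of one initial-state factor, the transition factors and the emission factors; pi, A, B are assumed to be arrays as in the earlier toy example.

    import numpy as np

    def joint_prob(obs, states, pi, A, B):
        """p(X, Z | theta) = p(z1 | pi) * prod_n p(zn | zn-1, A) * prod_n p(xn | zn, phi)."""
        p = pi[states[0]]
        for prev, cur in zip(states[:-1], states[1:]):
            p *= A[prev, cur]                    # transition factors p(zn | zn-1, A)
        for z, x in zip(states, obs):
            p *= B[z, x]                         # emission factors p(xn | zn, phi)
        return p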


Maximum Likelihood for HMM

We have observed a data set X = (x1, . . . , xN).
Assume it came from a HMM with a given structure
(number of nodes, form of emission probabilities).
The likelihood of the data is

p(X | θ) = ∑_Z p(X, Z | θ)

This joint distribution does not factorise over n (unlike the
mixture distribution, where it did).
We have N variables, each with K states: K^N terms (see the
brute-force sketch below).
The number of terms grows exponentially with the length of the
chain.
But we can use the conditional independence of the latent
variables to reorder their calculation later.
A further obstacle to finding a closed-form maximum likelihood
solution: calculating the emission probabilities for the different
states zn.
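
To make the K^N blow-up concrete, here is a brute-force marginalisation that sums the hypothetical joint_prob helper above over every hidden path (a sketch, not slide material); it gives the same value as the forward recursion but touches K^N paths, which is hopeless for long chains.

    from itertools import product

    def naive_likelihood(obs, pi, A, B):
        K = len(pi)
        # enumerate all K^N hidden state sequences and sum their joint probabilities
        return sum(joint_prob(obs, z, pi, A, B)
                   for z in product(range(K), repeat=len(obs)))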


Maximum Likelihood for HMM – EM

Employ the EM algorithm to find the maximum likelihood
solution for the HMM.

Start with some initial parameter settings θold.
E-step: Find the posterior distribution of the latent
variables p(Z | X, θold).
M-step: Maximise

Q(θ, θold) = ∑_Z p(Z | X, θold) ln p(Z, X | θ)

with respect to the parameters θ = {π, A, φ}.


Maximum Likelihood for HMM – EM

Denote the marginal posterior distribution of zn by γ(zn),
and
the joint posterior distribution of two successive latent
variables by ξ(zn−1, zn)

γ(zn) = p(zn |X,θold)
ξ(zn−1, zn) = p(zn−1, zn |X,θold).

For each step n, γ(zn) has K nonnegative values which
sum to 1.
For each step n, ξ(zn−1, zn) has K × K nonnegative values
which sum to 1.
Elements of these vectors are denoted by γ(znk) and
ξ(zn−1,j, znk) respectively.


Maximum Likelihood for HMM – EM

Because the expectation of a binary random variable is the
probability that it takes the value 1, we get in this notation

γ(znk) = E[znk] = ∑_{zn} γ(zn) znk

ξ(zn−1,j, znk) = E[zn−1,j znk] = ∑_{zn−1, zn} ξ(zn−1, zn) zn−1,j znk.

Putting it all together we get

Q(θ, θold) = ∑_{k=1}^{K} γ(z1k) ln πk + ∑_{n=2}^{N} ∑_{j=1}^{K} ∑_{k=1}^{K} ξ(zn−1,j, znk) ln Ajk + ∑_{n=1}^{N} ∑_{k=1}^{K} γ(znk) ln p(xn | φk).


Maximum Likelihood for HMM – EM

M-step: Maximising

Q(θ, θold) = ∑_{k=1}^{K} γ(z1k) ln πk + ∑_{n=2}^{N} ∑_{j=1}^{K} ∑_{k=1}^{K} ξ(zn−1,j, znk) ln Ajk + ∑_{n=1}^{N} ∑_{k=1}^{K} γ(znk) ln p(xn | φk)

results in

πk = γ(z1k) / ∑_{j=1}^{K} γ(z1j)

Ajk = ∑_{n=2}^{N} ξ(zn−1,j, znk) / ∑_{l=1}^{K} ∑_{n=2}^{N} ξ(zn−1,j, znl)
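
A sketch of these two updates in NumPy (not from the slides); it assumes gamma is an (N, K) array of the γ(znk) and xi an (N−1, K, K) array of the ξ(zn−1,j, znk) produced by the E-step.

    import numpy as np

    def m_step_pi_A(gamma, xi):
        pi = gamma[0] / gamma[0].sum()            # pi_k = gamma(z_1k) / sum_j gamma(z_1j)
        A = xi.sum(axis=0)                        # sum_{n=2}^{N} xi(z_{n-1,j}, z_nk)
        A /= A.sum(axis=1, keepdims=True)         # normalise each row j over the destination states k
        return pi, A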


Maximum Likelihood for HMM – EM

Still left: Maximising with respect to φ.

Q(θ, θold) = ∑_{k=1}^{K} γ(z1k) ln πk + ∑_{n=2}^{N} ∑_{j=1}^{K} ∑_{k=1}^{K} ξ(zn−1,j, znk) ln Ajk + ∑_{n=1}^{N} ∑_{k=1}^{K} γ(znk) ln p(xn | φk).

But φ only appears in the last term, and under the
assumption that all the φk are independent of each other,
this term decouples into a sum.
Then maximise each contribution

∑_{n=1}^{N} γ(znk) ln p(xn | φk)

individually.


Maximum Likelihood for HMM – EM

In the case of Gaussian emission densities

p(x | φk) = N(x | µk, Σk),

we get for the maximising parameters of the emission
densities

µk = ∑_{n=1}^{N} γ(znk) xn / ∑_{n=1}^{N} γ(znk)

Σk = ∑_{n=1}^{N} γ(znk) (xn − µk)(xn − µk)^T / ∑_{n=1}^{N} γ(znk).
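
A sketch of this Gaussian M-step in NumPy (not from the slides), assuming X is an (N, D) array of observations and gamma the (N, K) array of responsibilities γ(znk):

    import numpy as np

    def m_step_gaussian(X, gamma):
        Nk = gamma.sum(axis=0)                                   # effective number of points per state
        mu = (gamma.T @ X) / Nk[:, None]                         # mu_k = sum_n gamma(znk) xn / sum_n gamma(znk)
        Sigma = []
        for k in range(gamma.shape[1]):
            d = X - mu[k]                                        # deviations from mu_k, shape (N, D)
            Sigma.append((gamma[:, k, None] * d).T @ d / Nk[k])  # responsibility-weighted outer products
        return mu, np.stack(Sigma)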


Forward-Backward HMM

Need to efficiently evaluate the γ(znk) and ξ(zn−1,j, znk).
The graphical model for the HMM is a tree!
We know we can use a two-stage message passing
algorithm to calculate the posterior distribution of the latent
variables.
For the HMM this is called the forward-backward algorithm
(Rabiner, 1989), or the Baum-Welch algorithm (Baum, 1972).
Other variants exist, differing only in the form of the
messages propagated.
We look at the most widely known variant, the alpha-beta
algorithm.


Conditional Independence for HMM

Given the data X = {x1, . . . , xN}, the following independence
relations hold:

p(X | zn) = p(x1, . . . , xn | zn) p(xn+1, . . . , xN | zn)
p(x1, . . . , xn−1 | xn, zn) = p(x1, . . . , xn−1 | zn)
p(x1, . . . , xn−1 | zn−1, zn) = p(x1, . . . , xn−1 | zn−1)
p(xn+1, . . . , xN | zn, zn+1) = p(xn+1, . . . , xN | zn+1)
p(xn+2, . . . , xN | xn+1, zn+1) = p(xn+2, . . . , xN | zn+1)
p(X | zn−1, zn) = p(x1, . . . , xn−1 | zn−1) p(xn | zn) p(xn+1, . . . , xN | zn)
p(xN+1 | X, zN+1) = p(xN+1 | zN+1)
p(zN+1 | X, zN) = p(zN+1 | zN)

[Graphical model: the first-order HMM chain z1 → z2 → · · · → zN with one observation xn attached to each latent state zn.]


Conditional Independence for HMM – Example

Let’s look at the following independence relation:

p(X | zn) = p(x1, . . . , xn | zn) p(xn+1, . . . , xN | zn)

Any path from the set {x1, . . . , xn} to the set {xn+1, . . . , xN}
has to go through zn.
In p(X | zn) the node zn is conditioned on (= observed).
All paths from x1, . . . , xn−1 through zn to xn+1, . . . , xN are
head-to-tail at zn and therefore blocked.
The path from xn through zn to zn+1 is tail-to-tail at zn, so it
is also blocked.
Therefore x1, . . . , xn ⊥⊥ xn+1, . . . , xN | zn.



Alpha-Beta HMM

Define the joint probability of observing all data up to step
n and having zn as latent variable to be

α(zn) = p(x1, . . . , xn, zn).

Define the probability of all future data given zn to be

β(zn) = p(xn+1, . . . , xN | zn).

Then it can be shown that the following recursions hold

α(zn) = p(xn | zn) ∑_{zn−1} α(zn−1) p(zn | zn−1)

α(z1) = ∏_{k=1}^{K} {πk p(x1 | φk)}^{z1k}


Alpha-Beta HMM

At step n we can efficiently calculate α(zn) given α(zn−1)

α(zn) = p(xn | zn) ∑_{zn−1} α(zn−1) p(zn | zn−1)

[Lattice fragment: α(zn,1) is computed from α(zn−1,1), α(zn−1,2), α(zn−1,3) via the transition probabilities A11, A21, A31 and the emission probability p(xn | zn,1).]
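
A general forward pass along these lines (a sketch under the same assumptions as the earlier toy example: discrete emissions, arrays pi, A, B):

    import numpy as np

    def forward(obs, pi, A, B):
        N, K = len(obs), len(pi)
        alpha = np.zeros((N, K))
        alpha[0] = pi * B[:, obs[0]]                      # alpha(z1) = pi_k p(x1 | phi_k)
        for n in range(1, N):
            alpha[n] = B[:, obs[n]] * (alpha[n - 1] @ A)  # alpha(zn) = p(xn | zn) sum_{zn-1} alpha(zn-1) p(zn | zn-1)
        return alpha

In practice the α values are rescaled at each step to avoid numerical underflow on long sequences; the sketch omits this.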


Alpha-Beta HMM

And for β(zn) we get the recursion

β(zn) = ∑_{zn+1} β(zn+1) p(xn+1 | zn+1) p(zn+1 | zn)

[Lattice fragment: β(zn,1) is computed from β(zn+1,1), β(zn+1,2), β(zn+1,3) via the transition probabilities A11, A12, A13 and the corresponding emission probabilities.]
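
A matching backward pass (a sketch with the same assumptions as the forward sketch; note the initialisation β(zN) = 1 discussed on the next slide):

    import numpy as np

    def backward(obs, A, B):
        N, K = len(obs), A.shape[0]
        beta = np.ones((N, K))                                 # beta(zN) = 1
        for n in range(N - 2, -1, -1):
            beta[n] = A @ (B[:, obs[n + 1]] * beta[n + 1])     # sum_{zn+1} beta(zn+1) p(xn+1 | zn+1) p(zn+1 | zn)
        return beta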


Alpha-Beta HMM

How do we start the β recursion? What is β(zN)?

β(zN) = p(xN+1, . . . , xN | zN).

It can be shown that the following setting is consistent with
the approach

β(zN) = 1.



Alpha-Beta HMM

Now we know how to calculate α(zn) and β(zn) for each
step.
What is the probability of the data p(X)?
Use the definition of γ(zn) and Bayes' theorem

γ(zn) = p(zn | X) = p(X | zn) p(zn) / p(X) = p(X, zn) / p(X)

and the following conditional independence statement
from the graphical model of the HMM

p(X | zn) = p(x1, . . . , xn | zn) p(xn+1, . . . , xN | zn) = [α(zn) / p(zn)] β(zn)

and therefore

γ(zn) = α(zn) β(zn) / p(X)
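
Combining the two hypothetical passes sketched earlier gives γ(zn) and p(X) directly (a sketch, not slide material):

    def posteriors(obs, pi, A, B):
        alpha = forward(obs, pi, A, B)
        beta = backward(obs, A, B)
        pX = alpha[-1].sum()              # p(X) = sum_{zN} alpha(zN), since beta(zN) = 1
        gamma = alpha * beta / pX         # gamma(zn) = alpha(zn) beta(zn) / p(X)
        return gamma, pX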


Alpha-Beta HMM

Marginalising over zn results in

1 = ∑_{zn} γ(zn) = ∑_{zn} α(zn) β(zn) / p(X)

and therefore at each step n

p(X) = ∑_{zn} α(zn) β(zn)

Most conveniently evaluated at step N, where β(zN) = 1, as

p(X) = ∑_{zN} α(zN).


Alpha-Beta HMM

Finally, we need to calculate the joint posterior distribution
of two successive latent variables, ξ(zn−1, zn), defined by

ξ(zn−1, zn) = p(zn−1, zn | X).

This can be done directly from the α and β values in the
form

ξ(zn−1, zn) = α(zn−1) p(xn | zn) p(zn | zn−1) β(zn) / p(X)
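
A sketch of this computation with the same array conventions (alpha and beta of shape (N, K), xi of shape (N−1, K, K)); again not part of the slides:

    import numpy as np

    def pairwise_posteriors(obs, alpha, beta, A, B, pX):
        N, K = alpha.shape
        xi = np.zeros((N - 1, K, K))
        for n in range(1, N):
            # xi(z_{n-1}, z_n) = alpha(z_{n-1}) p(xn | zn) p(zn | z_{n-1}) beta(zn) / p(X)
            xi[n - 1] = alpha[n - 1][:, None] * A * (B[:, obs[n]] * beta[n])[None, :] / pX
        return xi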


How to train a HMM using EM?

1. Make an initial selection for the parameters θold, where
   θ = {π, A, φ}. (Often, A and π are initialised uniformly or
   randomly. The φk initialisation depends on the emission
   distribution; for Gaussians, run K-means first and get µk
   and Σk from there.)
2. (Start of E-step) Run the forward recursion to calculate α(zn).
3. Run the backward recursion to calculate β(zn).
4. Calculate γ(zn) and ξ(zn−1, zn) from α(zn) and β(zn).
5. Evaluate the likelihood p(X).
6. (Start of M-step) Find a θnew maximising Q(θ, θold). This
   results in new settings for the parameters πk, Ajk and φk as
   described before.
7. Iterate until convergence is detected (an end-to-end sketch
   combining these steps follows below).
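
An end-to-end Baum-Welch sketch for a single observation sequence with discrete emissions, reusing the hypothetical helpers defined above (forward, backward, pairwise_posteriors, m_step_pi_A); the random initialisation, the fixed iteration count and the discrete-emission M-step are choices made here, not prescribed by the slides.

    import numpy as np

    def baum_welch(obs, K, V, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        pi = np.full(K, 1.0 / K)                              # uniform initial-state distribution
        A = rng.random((K, K)); A /= A.sum(1, keepdims=True)  # random row-stochastic transitions
        B = rng.random((K, V)); B /= B.sum(1, keepdims=True)  # random emission table over V symbols
        obs = np.asarray(obs)
        for _ in range(n_iter):
            alpha = forward(obs, pi, A, B)                    # E-step
            beta = backward(obs, A, B)
            pX = alpha[-1].sum()                              # likelihood p(X); monitor for convergence
            gamma = alpha * beta / pX
            xi = pairwise_posteriors(obs, alpha, beta, A, B, pX)
            pi, A = m_step_pi_A(gamma, xi)                    # M-step for pi and A
            for v in range(V):                                # M-step for the discrete emission table
                B[:, v] = gamma[obs == v].sum(axis=0)
            B /= B.sum(axis=1, keepdims=True)
        return pi, A, B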


Alpha-Beta HMM – Notes

In order to calculate the likelihood, we need to use the joint
probability p(X, Z) and sum over all possible values of Z.
That means every particular choice of Z corresponds to
one path through the lattice diagram, and there are
exponentially many of them!
Using the alpha-beta algorithm, the exponential cost has
been reduced to a cost that is linear in the length of the chain.
How did we do that?
By swapping the order of multiplication and summation.


HMM – Viterbi Algorithm

Motivation: The latent states can have some meaningful
interpretation, e.g. phonemes in a speech recognition
system where the observed variables are the acoustic
signals.
Goal: After the system has been trained, find the most
probable states of the latent variables for a given
sequence of observations.
Warning: Finding the set of states which are each
individually the most probable does NOT solve this
problem.


HMM – Viterbi Algorithm

Define ω(zn) = max_{z1,...,zn−1} ln p(x1, . . . , xn, z1, . . . , zn).
From the joint distribution of the HMM given by

p(x1, . . . , xN, z1, . . . , zN) = p(z1) [ ∏_{n=2}^{N} p(zn | zn−1) ] ∏_{n=1}^{N} p(xn | zn)

the following recursion can be derived

ω(zn) = ln p(xn | zn) + max_{zn−1} { ln p(zn | zn−1) + ω(zn−1) }

ω(z1) = ln p(z1) + ln p(x1 | z1) = ln p(x1, z1)

[Lattice diagram: for each state k at steps n − 2, . . . , n + 1, the Viterbi recursion keeps only the best path entering that state.]
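
A sketch of the recursion together with the backtracking described on the next slide (not from the slides; same pi, A, B conventions as the earlier sketches):

    import numpy as np

    def viterbi(obs, pi, A, B):
        N, K = len(obs), len(pi)
        omega = np.zeros((N, K))                      # omega[n, k] = max over z_1..z_{n-1} of the log joint
        back = np.zeros((N, K), dtype=int)            # best predecessor state for each (n, k)
        omega[0] = np.log(pi) + np.log(B[:, obs[0]])  # omega(z1) = ln p(z1) + ln p(x1 | z1)
        for n in range(1, N):
            scores = omega[n - 1][:, None] + np.log(A)        # [j, k] = omega(z_{n-1,j}) + ln A_jk
            back[n] = scores.argmax(axis=0)                   # remember the best transition into each state k
            omega[n] = np.log(B[:, obs[n]]) + scores.max(axis=0)
        path = [int(omega[-1].argmax())]                      # most probable final state
        for n in range(N - 1, 0, -1):
            path.append(int(back[n, path[-1]]))               # backtrace
        return path[::-1], omega[-1].max()                    # state sequence and its log probability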


HMM – Viterbi Algorithm

Calculate

ω(zn) = max_{z1,...,zn−1} ln p(x1, . . . , xn, z1, . . . , zn)

for n = 1, . . . , N.
For each step n, remember which is the best transition into
each state at the next step.
At step N: find the state with the highest probability.
Then, for the remaining N − 1 steps, backtrace which transition
led to the most probable state and identify from which state it
came.

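
Applied to Mary's HMM from the worked example (the same hypothetical pi, A, B arrays as in the forward sketch), this answers part (b):

    states = ['rainy', 'sunny']
    path, logp = viterbi([0, 1, 2], pi, A, B)         # observations: cycle, shop, study
    print([states[k] for k in path], np.exp(logp))    # most probable weather sequence and its joint probability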
