Data Mining and Machine Learning
Statistical Modelling of Sequences (2)
Peter Jančovič
Slide 1
Objectives
So far, we have introduced Markov models
Hidden Markov models (HMMs)
Calculating the probability of an observation sequence
The Forward Probability calculation
HMM training
Slide 2
Hidden Markov Models (HMMs)
Slide 3
HMMs (continued)
In an HMM we assume that the item purchased at the current time depends only on the current state (shop), not on items previously purchased or shops previously visited
Suppose xt is the state and yt is the item purchased at time t
The diagram indicates the dependencies:
[Diagram: the state sequence {xt} is hidden; each xt emits the purchased item yt, which is visible]
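To make these dependencies concrete, here is a minimal sketch of sampling from such a model; the shops, items and probabilities are invented for illustration, not taken from the lecture. Note that each choice conditions only on the current shop, never on earlier items or shops.

```python
import random

# Hypothetical transition and output probabilities, for illustration only.
transitions = {
    "Grocer": {"Grocer": 0.5, "Baker": 0.5},
    "Baker":  {"Grocer": 0.3, "Baker": 0.7},
}
outputs = {
    "Grocer": {"apples": 0.8, "bread": 0.2},
    "Baker":  {"apples": 0.1, "bread": 0.9},
}

def sample(shop, T):
    """Generate T (shop, item) pairs from the model."""
    history = []
    for _ in range(T):
        items, probs = zip(*outputs[shop].items())
        item = random.choices(items, weights=probs)[0]    # depends on current shop only
        history.append((shop, item))
        nxt, nxt_probs = zip(*transitions[shop].items())
        shop = random.choices(nxt, weights=nxt_probs)[0]  # depends on current shop only
    return history

print(sample("Grocer", 5))
```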
Slide 4
Markov Model
[Diagram: 4-state Markov model with transition probabilities a11, a12, a22, a23, a24, a33, a34, a44; X = {xt} is the state sequence and Y = {yt} the output sequence]
Slide 5
Hidden Markov Model
In a hidden Markov model, the relationship between the observation sequence and the state sequence is ambiguous.
[Diagram: the same 4-state model, with transition probabilities a11, a12, a22, a23, a24, a33, a34, a44; the state sequence X = {xt} is hidden and only the observation sequence Y = {yt} is seen]
Slide 6
HMMs Continued
Slide 7
Formal definition of an HMM
An N-state HMM with observations {I1, …, IM} comprises:
An underlying N-state Markov model, defined by an initial state probability vector P0 and an N×N state transition probability matrix A, where:
– P0(n) = P(x1 = Sn)
– Anm = P(xt = Sm | xt-1 = Sn)
An N×M state output probability matrix B, where
– Bnm = Bn(m) = P(yt = Im | xt = Sn)
Slide 8
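As a concrete illustration (mine, not from the slides), the parameters P0, A and B might be held in a small container like the following, with checks that each defines a valid probability distribution:

```python
import numpy as np

class DiscreteHMM:
    """N-state HMM over M discrete observation symbols I_1, ..., I_M."""

    def __init__(self, P0, A, B):
        self.P0 = np.asarray(P0, dtype=float)  # P0[n]   = P(x_1 = S_n)
        self.A = np.asarray(A, dtype=float)    # A[n, m] = P(x_t = S_m | x_{t-1} = S_n)
        self.B = np.asarray(B, dtype=float)    # B[n, m] = P(y_t = I_m | x_t = S_n)
        N, M = self.B.shape
        assert self.P0.shape == (N,) and self.A.shape == (N, N)
        # P0, and every row of A and B, must sum to one.
        assert np.isclose(self.P0.sum(), 1.0)
        assert np.allclose(self.A.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)
```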
Example HMM Probability Calculation
$$P_0 = \begin{pmatrix} 0.5 \\ 0.2 \\ 0.3 \end{pmatrix}, \qquad A = \begin{pmatrix} 0.5 & 0.2 & 0.3 \\ 0.1 & 0.1 & 0.8 \\ 0.4 & 0 & 0.6 \end{pmatrix}$$
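This example can be written down directly in the DiscreteHMM sketch above. The slide gives only P0 and A; the 3×2 output matrix B below assumes, purely for illustration, an alphabet of M = 2 observation symbols, and both its shape and its values are invented:

```python
import numpy as np

P0 = np.array([0.5, 0.2, 0.3])
A = np.array([[0.5, 0.2, 0.3],
              [0.1, 0.1, 0.8],
              [0.4, 0.0, 0.6]])
# B is NOT given on this slide; these values are made up for illustration.
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])
model = DiscreteHMM(P0, A, B)
```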
Slide 9
Example (Continued)
Slide 10
Example (Continued)
Slide 11
Example (Continued)
Slide 12
Calculating the probability of an observed sequence
Even in our simple example, with 3 states and 2 observations, there are 9 terms in the summation
In general, if the Markov model is fully connected and has N states, and we have T observations, then the number of state sequences (and therefore the number of terms in the summation) is N^T. This makes direct calculation of P(I) computationally impractical.
However, there is an efficient solution…
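As an illustration of the two approaches, here is a sketch (assuming the DiscreteHMM container and example model above, not code from the course) of the brute-force sum over all N^T state sequences and of the forward recursion. Both compute the same P(y), but the latter costs only O(N²T):

```python
import itertools
import numpy as np

def brute_force_prob(model, y):
    """P(y) by explicit summation over all N**T state sequences."""
    N, T = len(model.P0), len(y)
    total = 0.0
    for states in itertools.product(range(N), repeat=T):   # N**T sequences
        p = model.P0[states[0]] * model.B[states[0], y[0]]
        for t in range(1, T):
            p *= model.A[states[t - 1], states[t]] * model.B[states[t], y[t]]
        total += p
    return total

def forward_prob(model, y):
    """P(y) via the forward recursion, alpha[t, n] = P(y_1..y_t, x_t = S_n)."""
    T = len(y)
    alpha = np.zeros((T, len(model.P0)))
    alpha[0] = model.P0 * model.B[:, y[0]]                       # initialisation
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ model.A) * model.B[:, y[t]]   # recursion
    return alpha[-1].sum()                                       # termination

y = [0, 1]  # an observation sequence of length T = 2 (symbol indices)
print(brute_force_prob(model, y), forward_prob(model, y))  # the two values agree
```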
Slide 13
The Forward Probability calculation
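For reference (this is the standard formulation, not transcribed from the slides), define the forward probability $\alpha_t(n) = P(y_1, \dots, y_t, x_t = S_n)$. In the notation of the formal definition above, it satisfies:

$$\alpha_1(n) = P_0(n)\, B_n(y_1), \qquad \alpha_t(m) = \Bigl[\sum_{n=1}^{N} \alpha_{t-1}(n)\, A_{nm}\Bigr] B_m(y_t), \qquad P(Y) = \sum_{n=1}^{N} \alpha_T(n)$$

The forward_prob function sketched earlier implements exactly this recursion.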
Slide 14
Slide 15
Slide 16
Example
Slide 17
Example
Slide 18
Example
Slide 19
Example
Slide 20
Example
Slide 21
HMM Parameter Estimation
Given a HMM and a sequence y we can calculate P(y)
But where does the HMM come from? In other words, how do we estimate the HMM’s parameters?
This is done from data, using an algorithm similar to the E-M algorithm for estimating the parameters of a GMM
The HMM training algorithm is called the Baum-Welch algorithm
Like the E-M algorithm, it involves making an initial estimate and then iteratively improving the estimate until convergence. Hence it is only locally optimal
Slide 22
HMM training
1. Make an initial estimate of the HMM – M0
2. Obtain a large set of training data Y
3. Set i=1
4. Apply the Baum-Welch algorithm to Y and Mi-1 to get a new model Mi such that P(Y|Mi) ≥P(Y|Mi-1)
5. If |P(Y|Mi) – P(Y|Mi-1)| ≤ ε then stop, else:
   1. i = i+1
   2. Go back to step 4 (a code sketch of this loop follows below)
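Here is a compact sketch of this loop, assuming the DiscreteHMM container and forward_prob function from earlier. The baum_welch_step below is my own minimal implementation of the standard forward-backward re-estimation for a discrete-output HMM trained on a single sequence; it is not code from the course.

```python
import numpy as np

def baum_welch_step(m, y):
    """One Baum-Welch re-estimation step on observation sequence y,
    guaranteeing P(y | new model) >= P(y | old model)."""
    T, N = len(y), len(m.P0)
    # Forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = m.P0 * m.B[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ m.A) * m.B[:, y[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = m.A @ (m.B[:, y[t + 1]] * beta[t + 1])
    py = alpha[-1].sum()
    gamma = alpha * beta / py          # gamma[t, n] = P(x_t = S_n | y)
    # xi[t, n, k] = P(x_t = S_n, x_{t+1} = S_k | y)
    xi = (alpha[:-1, :, None] * m.A[None, :, :]
          * (m.B[:, y[1:]].T * beta[1:])[:, None, :]) / py
    # Re-estimate the parameters from the expected counts.
    P0 = gamma[0]
    A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B = np.zeros_like(m.B)
    for k in range(m.B.shape[1]):
        B[:, k] = gamma[np.array(y) == k].sum(axis=0)
    B /= gamma.sum(axis=0)[:, None]
    return DiscreteHMM(P0, A, B)

def train(model, y, eps=1e-6):
    """Steps 3-5 of the slide: iterate until the likelihood gain is <= eps."""
    prev = forward_prob(model, y)
    while True:
        model = baum_welch_step(model, y)   # step 4
        cur = forward_prob(model, y)
        if abs(cur - prev) <= eps:          # step 5
            return model
        prev = cur
```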
Slide 23
Local optimality
[Plot: P(Y|M) as a function of the model M, showing the sequence of estimates M0, M1, …, Mn climbing to a local maximum that may lie below the global maximum]
Slide 24
Summary
Hidden Markov Models
Calculating the probability of an observation sequence
The forward probability calculation
HMM training
Slide 25