
Data Mining and Machine Learning
HMM Training
Peter Jančovič
Slide 1

Objectives
• Reminder: Maximum Likelihood (ML) parameter estimation
– ML for Gaussian PDFs and for GMMs
• ML for HMMs
– Viterbi-style training
– Baum-Welch algorithm
Slide 2

Fitting a Gaussian PDF to Data
• Suppose y = y1,…,yt,…,yT is a sequence of T data values
• Given a Gaussian PDF p with mean μ and variance σ, define:

p(y \mid \mu, \sigma) = \prod_{t=1}^{T} p(y_t \mid \mu, \sigma)

• How do we choose μ and σ?
• Define the best-fitting Gaussian to be the one for which p(y | μ, σ) is maximised – Maximum Likelihood (ML) estimation of μ and σ
Slide 3

ML estimation of μ and σ
• Intuitively:
– The maximum likelihood estimate of μ should be the average value of y1,…,yT (the sample mean)
– The maximum likelihood estimate of σ should be the variance of y1,…,yT (the sample variance)
• This turns out to be true: p(y | μ, σ) is maximised by setting:

\mu = \frac{1}{T} \sum_{t=1}^{T} y_t, \qquad \sigma = \frac{1}{T} \sum_{t=1}^{T} (y_t - \mu)^2
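As a concrete illustration, here is a minimal Python sketch of these ML estimates (the data values and the function name fit_gaussian_ml are illustrative, not from the lecture):

import numpy as np

def fit_gaussian_ml(y):
    # Maximum-likelihood estimates of a Gaussian's mean and variance
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu = y.sum() / T                    # sample mean
    sigma = ((y - mu) ** 2).sum() / T   # sample variance (divide by T, not T-1)
    return mu, sigma

# Illustrative data: the estimates agree with np.mean(y) and np.var(y)
y = np.array([1.2, 0.7, 1.9, 1.1, 0.4])
mu, sigma = fit_gaussian_ml(y)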
Slide 4

ML training for GMMs
• Now consider:
– A Gaussian Mixture Model with M components, which has
– M means μ1,…,μM
– M variances σ1,…,σM
– M mixture weights w1,…,wM
– A training sequence y1,…,yT
• How do we find the maximum likelihood estimate of μ1,…,μM, σ1,…,σM, w1,…,wM?
Slide 5

GMM Parameter Estimation
• If we knew which component each sample yt came from, then parameter estimation would be easy:
– Set μm to be the average value of the samples which belong to the mth component
– Set σm to be the variance of the samples which belong to the mth component
– Set wm to be the proportion of samples which belong to the mth component
• But we don’t know which component each sample belongs to
Slide 6

Solution – the E-M Algorithm (1)
• Guess initial values μ1(0),…,μM(0), σ1(0),…,σM(0), w1(0),…,wM(0)
1. For each m, calculate the probabilities

p_m(y_t) = p(y_t \mid \mu_m^{(0)}, \sigma_m^{(0)})

2. Use these probabilities to estimate how much each sample yt ‘belongs to’ the mth component:

\lambda_{m,t} = P(m \mid y_t)

Slide 7

Solution – the E-M Algorithm (2)
3. Calculate the new GMM parameters (λm,t is a measure of how much yt ‘belongs to’ the mth component):

w_m^{(1)} = \frac{1}{T} \sum_{t=1}^{T} \lambda_{m,t}

\mu_m^{(1)} = \frac{\sum_{t=1}^{T} \lambda_{m,t}\, y_t}{\sum_{t=1}^{T} \lambda_{m,t}}

\sigma_m^{(1)} = \frac{\sum_{t=1}^{T} \lambda_{m,t}\, (y_t - \mu_m^{(1)})^2}{\sum_{t=1}^{T} \lambda_{m,t}}
REPEAT steps 1-3
Slide 8

Calculation of λm,t
• In other words, λm,t is the probability of the mth component given the data point yt
• From Bayes’ theorem:

\lambda_{m,t} = P(m \mid y_t) = \frac{p(y_t \mid m)\, P(m)}{p(y_t)} = \frac{p_m(y_t)\, w_m}{\sum_{k=1}^{M} p_k(y_t)\, w_k}

– The numerator pm(yt) wm is calculated from the mth Gaussian component and the mth mixture weight
– The denominator is the sum over all M components
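To make the E-M recipe on the last three slides concrete, here is a minimal Python sketch of one E-M iteration for a 1-dimensional GMM (a sketch only; names such as em_step are illustrative, not from the lecture):

import numpy as np
from scipy.stats import norm

def em_step(y, mu, sigma, w):
    # One E-M iteration for a 1-D Gaussian mixture model.
    # y: (T,) data; mu, sigma, w: (M,) current means, variances and weights.
    T = len(y)
    # E-step: lam[m, t] = P(component m | y_t), via Bayes' theorem
    lam = np.array([wm * norm.pdf(y, loc=mum, scale=np.sqrt(sm))
                    for mum, sm, wm in zip(mu, sigma, w)])
    lam /= lam.sum(axis=0, keepdims=True)      # normalise over the M components
    # M-step: the update formulas from the previous slide
    occ = lam.sum(axis=1)                      # total 'occupancy' of each component
    new_w = occ / T
    new_mu = (lam @ y) / occ
    new_sigma = (lam * (y[None, :] - new_mu[:, None]) ** 2).sum(axis=1) / occ
    return new_mu, new_sigma, new_w

# Illustrative use: repeat steps 1-3 for a fixed number of iterations
y = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(4, 0.5, 100)])
mu, sigma, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    mu, sigma, w = em_step(y, mu, sigma, w)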
Slide 9

ML training for HMMs
• Now consider:
– An N-state HMM M, each of whose states is associated with a Gaussian PDF
– A training sequence y1,…,yT
• For simplicity, assume that each yt is 1-dimensional
Slide 10

ML training for HMMs
• If we knew that:
– y1,…,ye(1) correspond to state 1
– ye(1)+1,…,ye(2) correspond to state 2
– …
– ye(n-1)+1,…,ye(n) correspond to state n
– …
• Then we could set the mean of state n to the average value of ye(n-1)+1,…,ye(n)
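Written out, this is just a per-state sample average:

\mu_n = \frac{1}{e(n) - e(n-1)} \sum_{t = e(n-1)+1}^{e(n)} y_t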
Slide 11

ML Training for HMMs
y1,…,ye(1), ye(1)+1,…,ye(2), ye(2)+1,…,yT
Unfortunately we don’t know that ye(n-1)+1,…,ye(n) correspond to state n…
Slide 12

Solution
1. Define an initial HMM – M0
2. Use the Viterbi algorithm to compute the optimal state sequence aligning M0 with y1,…,yT (sketched below)
[Figure: trellis showing the optimal state sequence aligned against y1 y2 y3 … yt … yT]
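Here is a minimal sketch of this Viterbi alignment step for a left-to-right HMM with 1-dimensional Gaussian state PDFs, worked in the log domain (function and variable names such as viterbi_align and log_A are illustrative, not from the lecture):

import numpy as np
from scipy.stats import norm

def viterbi_align(y, log_A, mu, sigma):
    # Most likely state sequence for observations y under an HMM with
    # log transition matrix log_A (N x N) and Gaussian state PDFs (mu, sigma).
    T, N = len(y), len(mu)
    # log output probabilities b_i(y_t)
    log_b = np.array([norm.logpdf(y, loc=mu[i], scale=np.sqrt(sigma[i]))
                      for i in range(N)])                 # shape (N, T)
    delta = np.full((N, T), -np.inf)
    psi = np.zeros((N, T), dtype=int)
    delta[0, 0] = log_b[0, 0]                             # assume we start in state 0
    for t in range(1, T):
        for j in range(N):
            scores = delta[:, t - 1] + log_A[:, j]
            psi[j, t] = np.argmax(scores)
            delta[j, t] = scores[psi[j, t]] + log_b[j, t]
    # backtrace from the final state
    states = np.zeros(T, dtype=int)
    states[-1] = N - 1                                    # assume we must end in the last state
    for t in range(T - 2, -1, -1):
        states[t] = psi[states[t + 1], t + 1]
    return states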
Slide 13

Solution (continued)
• Use the optimal state sequence to segment y
[Figure: the observation sequence y1 … ye(1), ye(1)+1 … ye(2), …, yT segmented by state]
• Reestimate the parameters from this segmentation to get a new model M1 (sketched below)
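Continuing the illustrative sketch above, Viterbi-style reestimation then pools the frames that the alignment assigned to each state (again an illustrative sketch, not the lecture's code):

import numpy as np

def reestimate_from_alignment(y, states, N):
    # Viterbi-style reestimation: per-state sample means and variances
    # from the frames that the Viterbi alignment assigned to each state.
    y = np.asarray(y, dtype=float)
    mu = np.zeros(N)
    sigma = np.zeros(N)
    for i in range(N):
        y_i = y[states == i]          # frames aligned to state i
        mu[i] = y_i.mean()
        sigma[i] = y_i.var()          # ML variance (divide by the count)
    return mu, sigma

# One round of Viterbi training: align with the current model, then reestimate
# states = viterbi_align(y, log_A, mu, sigma)
# mu, sigma = reestimate_from_alignment(y, states, N)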
Slide 14

Solution (continued)
• Now repeat the whole process using M1 instead of M0, to get a new model M2
• Then repeat again using M2 to get a new model M3, and so on. The likelihood never decreases:
p(y|M0) ≤ p(y|M1) ≤ p(y|M2) ≤ … ≤ p(y|Mn) ≤ …
Slide 15

Local optimization
[Figure: likelihood p(y|M) plotted against the model M; the sequence M0, M1, …, Mn climbs towards a local optimum, which may not be the global optimum]
Slide 16

Baum-Welch optimization
• The algorithm just described is often called Viterbi training or Viterbi reestimation
• It is often used to train large sets of HMMs
• An alternative method is called Baum-Welch reestimation – it is a soft version of Viterbi reestimation
• Reestimation of the mean value associated with state i:
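The standard Baum-Welch update for this mean, written in terms of the state-occupation probabilities γt(i) defined on the next slide, is:

\mu_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\, y_t}{\sum_{t=1}^{T} \gamma_t(i)}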
Slide 17

Baum-Welch Reestimation
\gamma_t(i) = P(x_t = i \mid y_1, \dots, y_T)

[Figure: trellis over y1 y2 y3 y4 y5 … yt yt+1 … yT illustrating the state-occupation probability γt(i)]
Slide 18

‘Forward’ Probabilities
[Figure: trellis over y1 y2 y3 y4 y5 … yt yt+1 … yT illustrating the forward probabilities αt(i)]
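For reference, the forward probabilities are conventionally defined and computed by the recursion below (standard definitions, using the same aij and bj notation as the backward recursion on the next slide):

\alpha_t(i) = \mathrm{Prob}(y_1, \dots, y_t,\; x_t = i \mid M), \qquad \alpha_t(i) = \Big[ \sum_{j} \alpha_{t-1}(j)\, a_{ji} \Big]\, b_i(y_t)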
Slide 19

‘Backward’ Probabilities
t(i)Prob(yt1,…,yT |xt i,M)aijt1(j)bj(yt1) j
[Figure: trellis over y1 y2 y3 y4 y5 … yt yt+1 … yT illustrating the backward recursion]
Slide 20

‘Forward-Backward’ Algorithm
[Figure: forward and backward probabilities combined over the trellis y1 y2 y3 y4 y5 … yt yt+1 … yT]
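A minimal Python sketch of the forward-backward computation of γt(i) for a Gaussian-output HMM, combining the recursions above (not the lecture's own code; names such as forward_backward are illustrative):

import numpy as np
from scipy.stats import norm

def forward_backward(y, A, pi, mu, sigma):
    # Forward-backward algorithm for an HMM with Gaussian state PDFs.
    # Returns gamma[t, i] = P(x_t = i | y_1..y_T), the state-occupation
    # probabilities used in Baum-Welch reestimation.
    T, N = len(y), len(mu)
    b = np.array([norm.pdf(y, loc=mu[i], scale=np.sqrt(sigma[i]))
                  for i in range(N)]).T                 # output probs, shape (T, N)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    scale = np.zeros(T)                                 # scaling factors to avoid underflow
    # forward pass: alpha_t(i) = [sum_j alpha_{t-1}(j) a_{ji}] b_i(y_t)
    alpha[0] = pi * b[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # backward pass: beta_t(i) = sum_j a_{ij} beta_{t+1}(j) b_j(y_{t+1})
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (beta[t + 1] * b[t + 1])) / scale[t + 1]
    # combine: gamma_t(i) is proportional to alpha_t(i) * beta_t(i)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

# Baum-Welch mean update for state i (cf. slide 17):
# gamma = forward_backward(y, A, pi, mu, sigma)
# mu[i] = (gamma[:, i] * y).sum() / gamma[:, i].sum()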
Slide 21

Notes on HMM parameter estimation
• The Baum-Welch/Viterbi algorithm is only guaranteed to find a locally optimal HMM set – hence the choice of M0 can be important
• Baum-Welch/Viterbi is a supervised training algorithm which requires labelled speech data
• The labelling need not be at the same level as the HMM set – phoneme-level HMMs can be trained using data labelled orthographically at the phrase or sentence level
• For large applications, B-W reestimation can be very computationally expensive
Slide 22

Summary
• Maximum Likelihood (ML) estimation
• Viterbi HMM parameter estimation
• Baum-Welch HMM parameter estimation
– Forward and backward probabilities
Slide 23