Data Mining and Machine Learning
HMM Adaptation Peter Jančovič

Objectives
• So far we have covered Maximum Likelihood training for HMMs (the E-M algorithm):
– Viterbi-style training
– Baum-Welch algorithm
• In this session we cover HMM adaptation:
– Maximum A-Posteriori (MAP) estimation
– Maximum Likelihood Linear Regression (MLLR)

Adaptation
• A modern large-vocabulary continuous speech recognition system has many thousands of parameters
• Many hours of speech data are used to train the system (e.g. 200+ hours)
• The speech data comes from many speakers
• Hence the recogniser is ‘speaker independent’
• But performance for an individual speaker would be better if the system were speaker dependent

Adaptation
• For a single speaker, only a small amount of training data is available
• Viterbi or Baum-Welch reestimation will not work reliably: there is too little data to estimate every parameter
• Adaptation:
– the problem of robustly adapting a large number of model parameters using a small amount of training data

‘Parameters vs training data’
[Figure: performance against number of parameters, shown for a larger and a smaller training set]

Adaptation
• Two common approaches to adaptation with small amounts of training data:
– Bayesian adaptation, also known as MAP (Maximum A-Posteriori) adaptation
– Transform-based adaptation, also known as MLLR (Maximum Likelihood Linear Regression)
Slide 6
Data Mining and Machine Learning

Bayesian (MAP) adaptation
• MAP estimation chooses the model M that maximises the posterior probability of M given the data y, i.e., P(M | y)
• From Bayes’ Theorem: P(M | y) = p(y | M) P(M) / p(y)
• P(M) is the prior probability of M
• p(y | M) is the likelihood of the adaptation data y given M
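
The definitions above can be illustrated with a minimal sketch. It assumes a toy setting that is not from the slides: the "model" M is a one-dimensional Gaussian whose only unknown parameter is its mean, the prior P(M) is itself a Gaussian over that mean, and the maximisation is done by brute force over a grid. All names and numbers are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood(mean, y, obs_var):
    """log p(y | M): sum of per-sample log-likelihoods for a candidate mean."""
    return sum(log_gauss(y_t, mean, obs_var) for y_t in y)


def log_posterior(mean, y, prior_mean, prior_var, obs_var):
    """log P(M | y) up to a constant: log p(y | M) + log P(M).

    The term log p(y) is omitted because it does not depend on M and so
    does not change which candidate maximises the posterior.
    """
    return log_likelihood(mean, y, obs_var) + log_gauss(mean, prior_mean, prior_var)


# Candidate means from -5.00 to 5.00 in steps of 0.01 (assumed grid)
grid = [i / 100.0 for i in range(-500, 501)]
y = [1.8, 2.2, 2.0]  # hypothetical adaptation data

ml_estimate = max(grid, key=lambda m: log_likelihood(m, y, obs_var=1.0))
map_estimate = max(
    grid,
    key=lambda m: log_posterior(m, y, prior_mean=0.0, prior_var=1.0, obs_var=1.0),
)
# ml_estimate is the sample mean; map_estimate is pulled towards the prior mean
```

With only three samples the MAP estimate sits between the prior mean and the sample mean, whereas the Maximum Likelihood estimate ignores the prior entirely.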

Bayesian (MAP) adaptation
• Uses a well-trained, ‘speaker-independent’ HMM as the prior P(M) for estimating the parameters of the speaker-dependent HMM
• E.g.:
[Figure: speaker-independent state PDF (prior model), with the sample mean (speaker-dependent model) and the MAP estimate of the mean marked]

Bayesian (MAP) adaptation
MAP model Prior model ‘Speaker- dependent’
model
 Intuitively, if the adaptation data set y is big, then the MAP adapted model will be biased towards y, so  will be small
 Conversely, if there is very little adaptation data, the MAP model will be biased towards the prior, so  will be big
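
The trade-off described above can be sketched for a single Gaussian mean. This is one common form of the MAP mean update, not necessarily the exact formula on the slide; it assumes every adaptation vector has already been assigned to this Gaussian, and the prior weight `tau` is a hypothetical tuning constant.

```python
def map_mean(prior_mean, samples, tau=10.0):
    """MAP-style update of one Gaussian mean vector.

    prior_mean: speaker-independent mean (list of floats).
    samples:    adaptation vectors assigned to this Gaussian.
    tau:        assumed prior weight; larger values trust the prior more.

    Returns (adapted_mean, lam), where lam = tau / (tau + n) is the
    weight given to the prior and 1 - lam the weight given to the data.
    """
    n = len(samples)
    lam = tau / (tau + n)
    if n == 0:
        # No adaptation data: the estimate is the prior mean itself
        return list(prior_mean), lam
    dim = len(prior_mean)
    sample_mean = [sum(y[i] for y in samples) / n for i in range(dim)]
    adapted = [lam * p + (1.0 - lam) * s for p, s in zip(prior_mean, sample_mean)]
    return adapted, lam


# Little data: the prior keeps half of the weight
few_mean, few_lam = map_mean([0.0], [[2.0]] * 10)
# Much more data: the estimate is dominated by the sample mean
many_mean, many_lam = map_mean([0.0], [[2.0]] * 990)
```

In this sketch ten samples leave the prior with a weight of 0.5, while 990 samples reduce it to 0.01, which mirrors the two bullet points above.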

Transform-based adaptation (MLLR)
• Maximum Likelihood Linear Regression (MLLR) is another method for adapting the mean vectors of a set of HMMs
• It estimates a linear transform that maps speaker-independent parameters to speaker-dependent parameters
• Suppose that M_SI is a speaker-independent HMM with Gaussian mixture state output PDFs
• Suppose A is a linear transformation on the D-dimensional space of acoustic vectors and that b is an acoustic vector
• Let M_SD = T(M_SI) be the HMM derived from M_SI by replacing each Gaussian mean vector μ with Aμ + b
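
The construction of the adapted model can be sketched as follows: every Gaussian mean in the speaker-independent model is passed through the same affine map. The dictionary layout, the Gaussian names and the numeric values are hypothetical and serve only as an illustration in a two-dimensional space.

```python
def adapt_mean(A, b, mu):
    """Return A mu + b for one Gaussian mean vector mu (lists of floats)."""
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
        for row, b_i in zip(A, b)
    ]


def adapt_model(A, b, means):
    """Apply the same transform to every Gaussian mean in the model."""
    return {name: adapt_mean(A, b, mu) for name, mu in means.items()}


# Hypothetical speaker-independent means, keyed by an assumed naming scheme
si_means = {"state1_mix1": [1.0, 2.0], "state2_mix1": [-1.0, 0.5]}
A = [[1.0, 0.1], [0.0, 0.9]]
b = [0.5, -0.2]

sd_means = adapt_model(A, b, si_means)
# state1_mix1 moves to approximately [1.7, 1.6]
# state2_mix1 moves to approximately [-0.45, 0.25]
```

Only the means change in this sketch; variances, mixture weights and transition probabilities are left as they were in the speaker-independent model.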

MLLR adaptation
• Given data y from a new speaker, the aim of MLLR is to find A and b such that p(y | T(M_SI)) is maximised
• … hence Maximum Likelihood LR
• Need to estimate the D × D parameters of A (plus the D parameters of b)
• Each acoustic vector is typically 40-dimensional, so a linear transform of the acoustic data needs 40 × 40 = 1600 parameters for A
• This is far fewer than the tens or hundreds of thousands of parameters needed to train the whole system
• The same transformation A can be used for all models and states
• Alternatively, if there is enough data from the new speaker, a separate transformation can be estimated for each model, state, or set of states
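
A simplified sketch of estimating one global transform follows. It is not the full MLLR procedure: it assumes each adaptation frame has already been assigned to exactly one Gaussian and that all Gaussians share an identity covariance, in which case maximising the likelihood reduces to ordinary least squares. The function names are hypothetical.

```python
def solve(G, z):
    """Solve G x = z by Gauss-Jordan elimination with partial pivoting."""
    n = len(G)
    # Augmented matrix [G | z], copied so the inputs are not modified
    aug = [list(G[i]) + [z[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct means")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_transform(frames, aligned_means):
    """Least-squares estimate of (A, b) such that y_t is close to A mu_t + b.

    frames[t] is an adaptation vector y_t; aligned_means[t] is the
    speaker-independent mean of the Gaussian assigned to frame t.
    """
    d = len(frames[0])
    # Extended means xi_t = [mu_t, 1] fold b into a single matrix W = [A b]
    xis = [list(mu) + [1.0] for mu in aligned_means]
    # G = sum_t xi_t xi_t^T  and  Z = sum_t y_t xi_t^T
    G = [[sum(x[i] * x[j] for x in xis) for j in range(d + 1)]
         for i in range(d + 1)]
    Z = [[sum(y[i] * x[j] for y, x in zip(frames, xis)) for j in range(d + 1)]
         for i in range(d)]
    # Row i of W solves G w_i = Z_i (G is symmetric)
    W = [solve(G, Z[i]) for i in range(d)]
    return [row[:d] for row in W], [row[d] for row in W]


# Synthetic check: generate frames from a known transform, then recover it
true_A, true_b = [[1.1, 0.2], [-0.1, 0.9]], [0.5, -0.3]
means = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
frames = [[sum(a * m for a, m in zip(row, mu)) + bi
           for row, bi in zip(true_A, true_b)] for mu in means]
A_hat, b_hat = estimate_transform(frames, means)
```

A practical system would weight each frame by its state occupation probability and by the Gaussian covariances. The parameter count is unchanged by that: D × D values for A plus D for b, shared across however many Gaussians use the transform.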

Transform-based adaptation
[Figure: speaker-independent parameters, speaker-dependent data points, and the adapted parameters obtained with the ‘best fit’ transform]

Summary
• Bayesian (MAP) adaptation
– J-L Gauvain and C-H Lee, “Bayesian learning for Hidden Markov Models with Gaussian mixture state observation densities”, Speech Communication, 11, pp 205-213, 1992
• Transform-based (MLLR) adaptation
– C J Leggetter and P C Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density HMMs”, Computer Speech and Language, 9, pp 171-186, 1995