Data Mining and Machine Learning
HMMs for Automatic Speech Recognition: Types of HMMs
Peter Jančovič Slide 1
Objectives
To understand:
– The differences between the types of HMMs
HMM taxonomy
General HMMs:
– Conventional HMMs
– HMM / NN ‘hybrids’ (best of both worlds?)
– Hidden semi-Markov models (improved duration modelling)
– Segmental HMMs (improved segment modelling)
Types of Conventional HMM
Conventional HMMs:
– Discrete HMMs
– Continuous HMMs
  – Non-Gaussian continuous HMMs
  – Gaussian HMMs
    – Gaussian mixture HMMs
Discrete HMMs: Front-End Processing Re-Visited
Front-end processing (e.g. filterbank analysis) → vectors in d-dimensional (continuous) space
Linear transform (e.g. cosine transform) → vectors in d-dimensional (continuous) space
Vector quantisation → symbols from a finite set
Speech recognition
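The VQ step maps each continuous feature vector to the index of its nearest codebook entry. A minimal sketch in Python/NumPy, assuming a Euclidean metric (the codebook values are invented for illustration):

```python
import numpy as np

def quantise(y, codebook):
    """Map a continuous feature vector y to the index of the
    nearest codebook entry (Euclidean metric assumed)."""
    dists = np.linalg.norm(codebook - y, axis=1)
    return int(np.argmin(dists))

# Toy 2-D codebook with 3 entries (illustrative values only)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
print(quantise(np.array([0.9, 1.2]), codebook))  # 1: closest to [1, 1]
```

After this step the observation sequence is a sequence of symbol indices rather than continuous vectors.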
Discrete HMMs
If VQ is used, then a state output PDF b_i is defined by a list of probabilities:
b_i(m) = P(y_t = z_m | x_t = s_i)
The resulting HMM is a discrete HMM
Common in the mid-1980s / early 1990s
Advantages:
– Computational savings
Disadvantages:
– VQ may introduce non-recoverable errors
– Choice of metric d for VQ?
Now outperformed by continuous HMMs
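The look-up-table nature of a discrete state output PDF can be sketched as follows (the probability values are illustrative, not from the source):

```python
import numpy as np

# b[i, m] = P(y_t = z_m | x_t = s_i): one categorical
# distribution per state over the M codebook symbols.
b = np.array([[0.7, 0.2, 0.1],   # state s_0
              [0.1, 0.3, 0.6]])  # state s_1

# Each row must be a valid probability distribution
assert np.allclose(b.sum(axis=1), 1.0)

# Observation likelihood of symbol m = 2 in state i = 1
# is a single table look-up, not a PDF evaluation:
print(b[1, 2])  # 0.6
```

This table look-up is the source of the computational advantage over continuous output PDFs.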
Continuous HMMs
Without VQ, b_i(y) must be defined for any y in the (continuous) observation space S
Hence discrete state output PDFs are no longer viable
Use parametric continuous state output PDFs – Continuous HMMs
Choice of PDF restricted by mathematical tractability and computational usefulness (see “HMM training & recognition” later)
Most people begin with Gaussian PDFs
Resulting HMMs called Gaussian HMMs
Gaussian HMMs
State output PDFs are multivariate Gaussian
b_i(y) = (2π)^(−d/2) |C_i|^(−1/2) exp( −½ (y − m_i)^T C_i^(−1) (y − m_i) )
m_i and C_i are the mean vector and covariance matrix which define b_i
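This Gaussian state output PDF is usually evaluated in the log domain for numerical stability. A sketch (the function name is my own):

```python
import numpy as np

def log_gaussian(y, m, C):
    """Log of the multivariate Gaussian state output PDF b_i(y)
    with mean vector m and covariance matrix C."""
    d = len(m)
    diff = y - m
    _, logdet = np.linalg.slogdet(C)          # log |C|
    maha = diff @ np.linalg.solve(C, diff)    # (y-m)^T C^-1 (y-m)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

m = np.zeros(2)
C = np.eye(2)
print(log_gaussian(np.zeros(2), m, C))  # -log(2*pi), about -1.8379
```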
Gaussian HMMs – Issues
Significant computational savings if covariance matrix can be assumed to be diagonal
In general, Gaussian PDFs are not flexible enough to model speech pattern variability accurately
– In many applications (e.g. modelling speech from multiple speakers) a unimodal PDF is inadequate
– Even if unimodal PDF is basically OK there may be more subtle inadequacies
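As a rough illustration of the diagonal-covariance saving: with a variance vector instead of a full matrix, the log-PDF needs only O(d) work per evaluation, with no matrix inverse or determinant (a sketch, assuming a diagonal covariance):

```python
import numpy as np

def log_gaussian_diag(y, m, var):
    """Log Gaussian state output PDF with diagonal covariance,
    represented by the variance vector var: O(d) work per call."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - m) ** 2 / var)

m = np.zeros(3)
var = np.ones(3)
print(log_gaussian_diag(np.zeros(3), m, var))  # -1.5 * log(2*pi)
```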
Gaussian Mixture HMMs
Any PDF can be approximated arbitrarily closely by a Gaussian mixture PDF with sufficient components
But…
– More mixture components require more data for robust model parameter estimation
– Parameter smoothing and sharing needed (e.g. ‘tied mixtures’, ‘grand variance’,…)
Gaussian mixture HMMs widely used in systems in research laboratories
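A Gaussian mixture state output PDF b_i(y) = Σ_k w_k N(y; m_k, C_k) can be evaluated in the log domain as follows (diagonal covariances assumed; the names are my own):

```python
import numpy as np

def log_gmm(y, weights, means, variances):
    """Log of a Gaussian-mixture state output PDF with diagonal
    covariances: log sum_k w_k N(y; m_k, var_k)."""
    comp = [np.log(w)
            - 0.5 * np.sum(np.log(2 * np.pi * v) + (y - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)]
    # log-sum-exp over components, done stably by NumPy
    return float(np.logaddexp.reduce(comp))

# A single-component mixture reduces to a plain Gaussian
print(log_gmm(np.zeros(2), [1.0], [np.zeros(2)], [np.ones(2)]))
```

With a single component and unit weight this reduces exactly to the Gaussian case above, which is a handy sanity check.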
Relationship with Neural Networks
‘Classical’ HMM training methods focus on fitting state output PDFs to data (modelling), rather than minimizing overlap between PDFs (discrimination)
NNs are good at discrimination
But ‘classical’ NNs are poor at coping with time-varying data
Research interest in ‘hybrid’ systems which use NNs to relate the observations to the states of the underlying Markov model
More recently, recurrent NNs also replacing HMMs
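One common hybrid recipe (assumed here; the slides do not spell it out) converts the NN's state posteriors P(s | y) into "scaled likelihoods" by dividing by the state priors P(s), and the HMM then uses these in place of b_i(y). All numbers below are illustrative:

```python
import numpy as np

# NN output for one frame: state posteriors P(s | y_t)
posteriors = np.array([0.7, 0.2, 0.1])
# State priors P(s), e.g. estimated from state occupancy counts
priors = np.array([0.5, 0.3, 0.2])

# Scaled likelihoods: p(y_t | s) is proportional to P(s | y_t) / P(s)
scaled_likelihoods = posteriors / priors
print(scaled_likelihoods)  # state 0 boosted, state 2 suppressed
```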
Summary
Types of HMM
– Discrete HMMs
– Continuous HMMs
– Gaussian mixture HMMs