Data Mining and Machine Learning
HMMs for Automatic Speech Recognition
Peter Jančovič
Objectives
To understand:
– Application of HMMs for automatic speech recognition
– HMM assumptions
Pattern Recognition
Suppose we have a finite number of classes, w1,…,wC, and the goal is to decide which class has given rise to the measurement x
The probability of the class w given that the measurement x has been observed is called the posterior probability of the class w – denoted by P(w|x)
Bayes’ Theorem
The form of Bayes’ Theorem which we need for pattern recognition is:

P(w|x) = p(x|w) P(w) / p(x)

where P(w|x) is the posterior probability, p(x|w) is the class-conditional density, and P(w) is the prior probability
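A minimal numerical sketch of this rule, with made-up likelihoods and priors for three hypothetical classes: the evidence p(x) is obtained by summing p(x|w)P(w) over the classes, and the posteriors then follow directly.

```python
# Hypothetical numbers for three classes w1, w2, w3:
# class-conditional densities p(x|w) evaluated at an observed x, and priors P(w).
likelihoods = {"w1": 0.30, "w2": 0.05, "w3": 0.10}
priors      = {"w1": 0.50, "w2": 0.30, "w3": 0.20}

# Evidence: p(x) = sum over w of p(x|w) P(w)
evidence = sum(likelihoods[w] * priors[w] for w in priors)

# Bayes' Theorem: P(w|x) = p(x|w) P(w) / p(x)
posteriors = {w: likelihoods[w] * priors[w] / evidence for w in priors}

# Classification: pick the class with the highest posterior
best = max(posteriors, key=posteriors.get)
```

Note that the evidence p(x) only normalizes the posteriors; the winning class is already determined by the products p(x|w)P(w).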
Automatic Speech Recognition
Given a sequence of acoustic feature vectors Y = {y1,…,yT}
we want to find the sequence of words W = {w1,…,wL}
such that the probability P(W |Y)
is maximized.
If M = {M1,…,MK} is the sequence of HMMs which
represents W, then P( W | Y ) = P( M | Y )
Bayes’ Theorem
Computation of the probability P( M | Y ) is made possible using Bayes’ Theorem:

P(W|Y) = p(Y|W) P(W) / p(Y)
P(W) is the “language model probability”
p( Y | W ) is the “acoustic model probability”
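Since p(Y) is the same for every hypothesized word sequence, it can be dropped from the maximization: the recognizer simply picks the W maximizing p(Y|W)P(W), usually in the log domain. A toy sketch with invented log-scores (not from any real recognizer):

```python
# Hypothetical log-probabilities for two competing word-sequence hypotheses.
# log_acoustic stands in for log p(Y|W), log_lm for log P(W).
hypotheses = {
    "recognise speech":   {"log_acoustic": -120.0, "log_lm": -4.0},
    "wreck a nice beach": {"log_acoustic": -118.0, "log_lm": -9.0},
}

def score(h):
    # log p(Y|W) + log P(W); log p(Y) is constant over W, so it is omitted
    s = hypotheses[h]
    return s["log_acoustic"] + s["log_lm"]

best = max(hypotheses, key=score)
```

Here the language model overrules a slightly better acoustic score, illustrating why both terms matter.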
Mathematical Modelling
Mathematical modelling for speech recognition faces two conflicting requirements:
– Faithful model of human speech production/perception
– Mathematically tractable & computationally useful
HMMs are one of the best compromises at the moment
[Figure: HMMs at the overlap of Mathematics & Computing and Speech Science]
‘Divide and Conquer’
One possible approach to ASR is sequential ‘divide and conquer’, e.g.
– classify speech vectors as ‘acoustic features’
– classify sequences of acoustic features as phonemes
– classify sequences of phonemes as words
– classify sequences of words …
DISASTER!!
Delayed Decision Making
Another name for this might be non-recoverable error propagation!
Better to avoid all classification decisions until all sources of information are available. Then perform classification as a single, integrated process – delayed decision making
Delayed Decision Making underlies HMM success
The ‘HMM Compromise’
Assume that:
– A spoken utterance is a time-varying sequence which moves through a sequence of ‘segments’ (yes)
– The underlying structure of the segments is constant w.r.t. time (no!)
– Durations of segments vary (yes)
– All variations between different realizations of the segments are random (no!)
Hidden Markov Model
In a hidden Markov model, the relationship between the observation sequence and the state
sequence is ambiguous.
[Figure: a 4-state left-to-right HMM with self-transition probabilities a11, a22, a33, a44, forward transitions a12, a23, a34 and skip transition a24, relating the state sequence X = {xt} to the observation sequence Y = {yt}]
Hidden Markov Models
An HMM consists of:
– A set of states S = {s1,…,sN}
– A state transition probability matrix A = [aij], i,j = 1,…,N, where aij = Prob(sj at time t | si at time t−1)
– For each state si, a PDF bi defined on the set of possible observations O such that bi(o) = Prob(yt = o | xt = si)
bi is called the state output PDF for state i (or the ith state output PDF)
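A minimal sketch of these components, using a discrete observation alphabet for simplicity (the state output PDFs bi here are probability tables rather than continuous densities, and all numbers are invented for illustration). The forward algorithm sums the joint probability over all state sequences to give p(Y):

```python
N = 2                        # states s1, s2
A = [[0.7, 0.3],             # aij = Prob(sj at time t | si at time t-1)
     [0.0, 1.0]]
pi = [1.0, 0.0]              # initial state distribution (start in s1)
B = [{"a": 0.9, "b": 0.1},   # bi(o) = Prob(yt = o | xt = si)
     {"a": 0.2, "b": 0.8}]

def forward_likelihood(Y):
    # alpha[i] = p(y1..yt, xt = si); recursion marginalizes over
    # all hidden state sequences
    alpha = [pi[i] * B[i][Y[0]] for i in range(N)]
    for o in Y[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)

p = forward_likelihood(["a", "b"])
```

The upper-triangular A gives the left-to-right topology used for speech: states can be held or advanced, but never revisited.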
[Figure: 10-state HMM of the digit ‘zero’]
[Figure: 6-state HMM of the digit ‘zero’]
HMM Assumptions
Temporal Independence – the observation yt depends on the state xt but is otherwise independent of the rest of the observation sequence Y = {yt}!
… so, the position of the vocal tract at time t is independent of its position at time t-1!
Piecewise stationarity – the underlying structure of speech is a sequence of stationary segments
Random variability – variations from this underlying structure are random
HMM State Duration Model
Constant segments correspond to the HMM states
[Figure: state-duration distribution Pi(D) plotted against D, determined by the self-transition probability aii and the exit probability (1 − aii)]
Probability of state duration D is given by

Pi(D) = (1 − aii) aii^(D−1)
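This is a geometric distribution: the state is held D − 1 times (probability aii each time) and then exited once (probability 1 − aii). A small sketch with an illustrative aii, checking that it is a proper distribution with mean duration 1/(1 − aii):

```python
aii = 0.8   # illustrative self-transition probability

def duration_prob(D, aii=aii):
    # Pi(D) = (1 - aii) * aii**(D - 1): stay D-1 times, then leave
    return (1 - aii) * aii ** (D - 1)

# The probabilities sum to 1 over D = 1, 2, 3, ...
total = sum(duration_prob(D) for D in range(1, 1000))

# Mean state duration is 1 / (1 - aii) frames (5 here)
mean = sum(D * duration_prob(D) for D in range(1, 1000))
```

The geometric shape, monotonically decreasing from D = 1, is a known weakness of standard HMMs: real segment durations are rarely most likely to last a single frame.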
Summary
Introduction to the application of HMMs for speech recognition
– HMM assumptions