Data Mining and Machine Learning
Speech Recognition using HMMs – Viterbi Decoding
Peter Jančovič
Viterbi Decoding
Viterbi decoding is the algorithm used to find the sequence of HMM states (or HMMs) that is most likely to have generated a given observation sequence.
It is similar to the forward probability calculation, but with the sum over predecessor states replaced by a maximum.
Viterbi Decoding (1)
[Figure: an HMM M drawn above the observation sequence y1 … yT]
Q: How can M have generated Y?  A: Via a state sequence of length T.
Basic Probability Calculation
[Figure: a 5-state left-to-right HMM aligned with the observations y1 … yT; along the path the state-output probabilities b1(y1), b1(y2), b1(y3), b2(y4), b2(y5), …, b2(yt−1), b3(yt), …, b5(yT) and the transition probabilities a11, a11, a12, a22, a23, a35 are marked]
The probability of the path shown is the product of these state-output and transition probabilities.
Viterbi Decoding (2)
Construction of the 'state-time trellis':
[Figure: state-time trellis with the model states on the vertical axis and the observations y1 … yT on the horizontal axis]
Viterbi Decoding (3)
Let X = {x1, …, xT} be a state sequence of length T
The joint probability of Y and X is given by:
p(Y, X) = b_{x1}(y1) · ∏_{t=2}^{T} a_{x_{t−1} x_t} · b_{x_t}(y_t)
i.e. the product of the state-output and state transition probabilities along the state sequence
The optimal state sequence is the sequence X such that p(Y,X) is maximized
p(Y) is the sum of p(Y,X) over all sequences X
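As a concrete illustration, here is a minimal Python sketch (not from the slides) that evaluates log p(Y, X) for one given state sequence, assuming the output probabilities b_i(y_t) have already been evaluated into an array. An initial-state term is included for generality; the slides' left-to-right models implicitly start in state 1. All names are illustrative.

```python
import numpy as np

def joint_log_prob(X, logB, logA, log_init):
    """log p(Y, X) for a given state sequence X (illustrative sketch).
    X: state sequence x_1..x_T as 0-based state indices
    logB: (T, N) array with logB[t, i] = log b_i(y_t) for this utterance
    logA: (N, N) log transition probabilities; log_init: (N,) log initial probabilities."""
    logp = log_init[X[0]] + logB[0, X[0]]
    for t in range(1, len(X)):
        # Multiply in (add, in the log domain) one transition and one output probability.
        logp += logA[X[t - 1], X[t]] + logB[t, X[t]]
    return logp
```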
Viterbi Decoding (4)
[Figure: state-time trellis; the node (i, t) represents being in state i at time t]
α_t(i) = Prob(y1, …, yt, optimal partial state sequence ending at (i, t))
α_t(i) = max { α_{t−1}(i−1) · a_{i−1,i} , α_{t−1}(i) · a_{i,i} } · b_i(y_t)
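This recursion can be implemented directly. Below is a minimal Python sketch (my illustration, not code from the lecture) working in the log domain. It takes the maximum over all predecessor states, which reduces to the two-term maximum above for a left-to-right model whose other transitions have zero probability; backpointers are stored for the trace-back described later.

```python
import numpy as np

def viterbi(logA, logB, log_init):
    """Viterbi decoding in the log domain (illustrative sketch).
    logA: (N, N) log transition probabilities a_ij
    logB: (T, N) log output probabilities, logB[t, i] = log b_i(y_t)
    log_init: (N,) log initial-state probabilities
    Returns the best log score and the most likely state sequence."""
    T, N = logB.shape
    alpha = np.full((T, N), -np.inf)    # alpha[t, i]: best partial path score to (i, t)
    back = np.zeros((T, N), dtype=int)  # best predecessor of (i, t)
    alpha[0] = log_init + logB[0]
    for t in range(1, T):
        for i in range(N):
            # Max over predecessors replaces the sum used in the forward algorithm.
            scores = alpha[t - 1] + logA[:, i]
            back[t, i] = int(np.argmax(scores))
            alpha[t, i] = scores[back[t, i]] + logB[t, i]
    # Trace-back from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(alpha[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return float(alpha[-1, path[-1]]), path
```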
Viterbi Decoding – example
[Figure sequence: the state-time trellis for an example observation sequence y1 … y8 is filled in column by column; at each node only the best-scoring predecessor is kept]
Viterbi Decoding – Trace-back
[Figure sequence: starting from the best-scoring node at the final frame y8, the stored predecessors are followed back through the trellis to recover the most likely state sequence]
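A toy run of the viterbi() sketch above shows the trace-back producing a path through a 3-state left-to-right model over 8 frames; all numbers here are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 8
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
logA = np.log(A + 1e-300)                            # zero entries become very large negative logs
logB = np.log(rng.uniform(0.1, 1.0, size=(T, N)))    # stand-in for log b_i(y_t)
log_init = np.log(np.array([1.0, 1e-300, 1e-300]))   # the model starts in state 1
score, path = viterbi(logA, logB, log_init)
print(score, path)   # the traced-back path moves monotonically through states 0, 1, 2
```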
Isolated Speech Recognition
[Figure: the state-time trellis of a single word model spanning the complete observation sequence y1 … yT]
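For isolated word recognition, each word model can simply be scored against the whole utterance and the best-scoring model chosen. A hedged sketch, reusing the viterbi() function above; the dictionary layout is my own assumption.

```python
import numpy as np

def recognise_isolated(models):
    """Isolated word recognition sketch: score every word model against the whole
    utterance with viterbi() (defined above) and return the best-scoring word.
    `models` maps word -> (logA, logB, log_init), where logB[t, i] = log b_i(y_t)
    has already been evaluated for this utterance (names are illustrative)."""
    best_word, best_score = None, -np.inf
    for word, (logA, logB, log_init) in models.items():
        score, _ = viterbi(logA, logB, log_init)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```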
Connected Speech Recognition
New transitions connect the end of every model to the start of every model.
[Figure: the word models joined into one network; the trellis now spans the whole observation sequence y1 … yT]
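One way to picture this is to build a single composite transition matrix from the word models. The sketch below is my own illustration, under the assumption that each word model's final row reserves some probability mass for leaving the model; in a real recogniser the word-to-word transition probabilities would come from a language model rather than being uniform.

```python
import numpy as np

def connect_models(word_models):
    """Illustrative sketch: join word HMM transition matrices into one network by
    adding transitions from the end of every model to the start of every model.
    Assumes each model's last row reserves some exit probability (row sum < 1)."""
    sizes = [Aw.shape[0] for Aw in word_models]
    offsets = np.cumsum([0] + sizes[:-1])
    A = np.zeros((sum(sizes), sum(sizes)))
    for Aw, off in zip(word_models, offsets):
        A[off:off + Aw.shape[0], off:off + Aw.shape[0]] = Aw
    entries = np.asarray(offsets)                  # first state of each word model
    exits = entries + np.array(sizes) - 1          # last state of each word model
    for e in exits:
        leave = 1.0 - A[e].sum()                   # probability mass for leaving the model
        A[e, entries] += leave / len(entries)      # end of every model -> start of every model
    return A, entries, exits
```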
[Figure: the connected-recognition trellis over y1 … yT; at the end of the utterance, trace-back starts from the final state with the minimum (best) score]
Partial Traceback (1)
In continuous speech recognition we cannot trace back from the end of the utterance (there is no end!)
Instead, partial traceback operates as follows (a minimal sketch follows this list):
– For each time t and state i, a word link record describes the sequence of words on the best path to (t, i).
– At regular intervals, all active paths are traced back to see if they converge at some time s in the past.
– If so, the best path up to time s cannot change, and the sequence of words up to s can be output.
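A minimal sketch of the convergence test, with hypothetical data structures of my own; the lecture does not prescribe an implementation. Each active path carries a chain of word link records, and whenever all active paths share the same ancestor record, everything up to that record is fixed and can be output.

```python
class WordLink:
    """Word link record: a word hypothesis ending at time t, linked to its predecessor."""
    def __init__(self, word, t, prev=None):
        self.word, self.t, self.prev = word, t, prev

def ancestors(link):
    """All word link records on the path back from `link` (including `link` itself)."""
    seen = set()
    while link is not None:
        seen.add(link)
        link = link.prev
    return seen

def fixed_words(active_links):
    """Words that can already be output: the path up to the most recent record
    shared by every active path (the convergence point s)."""
    common = ancestors(active_links[0])
    for link in active_links[1:]:
        common &= ancestors(link)
    if not common:
        return []
    s = max(common, key=lambda rec: rec.t)   # latest common ancestor = convergence time s
    words = []
    while s is not None:
        words.append(s.word)
        s = s.prev
    return list(reversed(words))
```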
Partial Traceback (2)
[Figure: the active paths in the trellis converge at a time s in the past (earlier than the current time t); the paths up to that point have already been output]
Beam Pruning
Choose a pruning threshold (beam width) T.
The active states at time t−1 are those whose partial path scores are within T of the best path score at time t−1.
[Figure: trellis columns at times t−1 and t, marking the 'highest probability' state, the active states, and the states to process at time t]
At time t, only process those states which link back to an active state (a minimal sketch of one pruned step follows).
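A sketch of one pruned Viterbi step, again my own illustration: it works in the log domain where higher scores are better (the negative of the 'minimum score' convention used above), and function names and signatures are assumptions.

```python
import numpy as np

def beam_prune(scores, beam):
    """Indices of active states: those within `beam` of the best log score."""
    return np.flatnonzero(scores >= np.max(scores) - beam)

def viterbi_step_with_beam(prev_scores, logA, logB_t, beam):
    """One pruned Viterbi step: only states that link back to an active state
    at time t-1 receive a finite score at time t."""
    active = beam_prune(prev_scores, beam)
    n_states = len(prev_scores)
    new_scores = np.full(n_states, -np.inf)
    back = np.full(n_states, -1, dtype=int)
    for j in range(n_states):
        cand = prev_scores[active] + logA[active, j]   # transitions from surviving states only
        if cand.size:
            k = int(np.argmax(cand))
            new_scores[j] = cand[k] + logB_t[j]
            back[j] = int(active[k])
    return new_scores, back
```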
Partial Traceback, Beam Pruning & Recognition ‘Speed’
Partial traceback introduces a 'lag' into the recognition process; this lag is not due to inadequate processor speed.
The lag is worse when the models are poor, because competing paths take longer to converge.
Beam pruning is less effective for ambiguous input, since more paths stay within the beam.
Severe beam pruning will degrade performance, because the correct path may be pruned away.
Proper management of partial traceback and beam pruning is essential for optimal performance.
Summary
Speech recognition using HMMs
– Viterbi decoding
– Isolated & connected/continuous speech recognition
– Beam pruning
– Partial traceback