Data Mining and Machine Learning
Speech Recognition using HTK
Peter Jančovič
Slide 1
Objectives
Building an ASR system using the Hidden Markov Model Toolkit (HTK)
– Feature representation
– Training
– Recognition (testing)
Introduction to Perl
Slide 2
ASR system using HTK
Hidden Markov Model Toolkit (HTK) – available for free download at http://htk.eng.cam.ac.uk/
– Set of tools – located in c:\HTK\HTK3.2bin
– exe-files
– Manual for the toolkit: HTKManual
– Tools likely to be used: HBuild, HCompV, HERest, HInit, HList, HCopy, HRest, HLEd, HVite, HResults
– Each tool is called separately and passed input parameters, e.g., configuration files, list of files to be processed, etc.
Chapter 3 in the HTK Manual (but phoneme-level)
Connected digit ASR system
Slide 3
Task grammar
Task grammar for voice dialing
Slide 4
Task grammar
Task – connected digit recognition
– Word-list file contains:
one two … nine oh zero sil sp
Create a text file called ‘gram’ containing (p. 160 in the HTK Manual):
( sil < one | two | three | four | five | six | seven | eight | nine | zero > sil )
HParse.exe gram wdnet
– HParse converts the grammar ‘gram’ into the word network ‘wdnet’:
VERSION=1.0
N=9 L=22
I=0 W=sil
I=1 W=one
I=2 W=two
I=3 W=three
I=4 W=sil
I=5 W=!NULL
I=6 W=!NULL
J=0 S=0 E=7
J=1 S=1 E=0
J=2 S=1 E=7
etc
Slide 5
Dictionary
Dictionary for phoneme-level HMMs
– Contains a list of words required in the task + their pronunciation, i.e., phoneme-level transcription
– Example: five /f/ /ay/ /v/
– Create using HDMan tool
Dictionary for word-level HMMs
– The pronunciation of each word is the word itself, i.e., a copy of the entry in the word list
– Example:
one one
two two
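A word-level dictionary of this kind can be generated from the word list. The following is a minimal Perl sketch, not part of the HTK toolkit: the sub name `make_word_dict` and the short word list are hypothetical, and sorting the entries is a precaution assumed here rather than something stated on the slide.

```perl
use strict;
use warnings;

# Build word-level dictionary lines: each word is its own "pronunciation".
sub make_word_dict {
    my @words = @_;
    # Sorting is an assumption of this sketch (a tidy, predictable order).
    return map { "$_ $_" } sort @words;
}

# Hypothetical word list; a real one would be read from the word-list file.
my @words = qw(one two three sil);
print "$_\n" for make_word_dict(@words);
```

Each printed line has the form "word word", matching the example entries above.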
Slide 6
Data preparation
Record data or use database provided
– Training data – estimation of the parameters of the ASR system
– Testing data – evaluation of the performance
Label files – transcription of the spoken utterance – collected into a Master Label File (.mlf)
– Phoneme-level
– Word-level
Example: label_trainClean_noSP.mlf contains:
#!MLF!#
"*/FAC_13A.lab"
sil
one
three
sil
.
etc
Slide 7
Feature extraction
Extraction of speech acoustic features, e.g., MFCC, logFBE, LPC etc
Use HCopy tool (Ch5 in HTK, p. 55-75)
List file (.scp)
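When HCopy is driven by a list file, each line of the list pairs a source waveform with the feature file to be written. The sketch below shows one way such lines could be produced in Perl. The sub name `scp_lines`, the `.wav` and `.mfc` extensions, and the `data/` paths are assumptions made for illustration.

```perl
use strict;
use warnings;

# Map each waveform name to a feature-file name by swapping the extension,
# and return one "source target" line per file.
sub scp_lines {
    my ($feat_ext, @wav_files) = @_;
    my @lines;
    foreach my $wav (@wav_files) {
        (my $feat = $wav) =~ s/\.[^.\/\\]+$/.$feat_ext/;
        push @lines, "$wav $feat";
    }
    return @lines;
}

# Hypothetical file names, used only to show the output format.
print "$_\n" for scp_lines('mfc', 'data/FAC_13A.wav', 'data/FAK_1B.wav');
```

The printed lines could be saved as the .scp file and passed to HCopy with the ‘-S’ option.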
Slide 8
Creating word-level HMMs
Training procedure
– A set of single-Gaussian word-level HMMs
– Start with a set of identical HMMs – means and variances are identical for all word models
– Then perform several training iterations
– Add short-pause (sp)
– Loop: increase number of mixtures & perform several training iterations
– Perform several final training iterations
Slide 9
Prototype HMM
Define a prototype model – defines the model topology
– number of states, covariance matrix type, feature type, feature dimension, number of streams
Example: 8 state left-to-right HMM, no skips, diagonal covariance matrix, 1 stream, 39 dim feature vector
Write a text-file containing:
1.0 1.0 1.0 1.0 …
1.0 1.0 1.0 1.0 …
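Writing the prototype by hand is repetitive, so it may help to generate it. The Perl sketch below produces a left-to-right, no-skip prototype in HTK's model definition format. Several things are assumptions rather than facts from the slide: the sub name `make_proto`, the parameter kind `MFCC_0_D_A`, the initial transition values 0.6 and 0.4, and the reading of "8 state" as the total count including HTK's non-emitting entry and exit states.

```perl
use strict;
use warnings;

# Return the text of a left-to-right, no-skip prototype HMM.
# $num_states counts the non-emitting entry and exit states.
sub make_proto {
    my ($name, $num_states, $dim, $kind) = @_;
    my $zeros = join ' ', ('0.0') x $dim;   # zero means
    my $ones  = join ' ', ('1.0') x $dim;   # unit variances
    my $text  = "~o <VecSize> $dim <$kind>\n~h \"$name\"\n"
              . "<BeginHMM>\n<NumStates> $num_states\n";
    # Emitting states are numbered 2 .. N-1.
    foreach my $s (2 .. $num_states - 1) {
        $text .= "<State> $s\n<Mean> $dim\n$zeros\n<Variance> $dim\n$ones\n";
    }
    # Transition matrix: entry state jumps to state 2, each emitting state
    # either stays (0.6, assumed) or moves on (0.4, assumed), exit row is zero.
    $text .= "<TransP> $num_states\n";
    foreach my $i (1 .. $num_states) {
        my @row = ('0.0') x $num_states;
        if ($i == 1) {
            $row[1] = '1.0';
        }
        elsif ($i < $num_states) {
            $row[$i - 1] = '0.6';
            $row[$i]     = '0.4';
        }
        $text .= join(' ', @row) . "\n";
    }
    return $text . "<EndHMM>\n";
}

print make_proto('proto', 8, 39, 'MFCC_0_D_A');
```

The parameter kind has to match the features actually extracted with HCopy, so `MFCC_0_D_A` should be replaced as needed.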
Slide 10
Training – flat start (HCompV)
Tool HCompV
– compute the global mean and variance over the entire training data
– set parameters of all of the Gaussians in a given HMM to these values
HCompV.exe -C config -o hmmdef -f 0.01 -m -S listTrain.scp -M hmm0 proto
– creates a new version of the ‘proto’ with name ‘hmmdef’ in the directory ‘hmm0’
– the zero means and unit variances are replaced by the global speech means and variances
– options: ‘-f’ – variance floor; ‘-o’ – output file name; ‘-S’ – file list; ‘-m’ – update the means as well as the variances; ‘-M’ – output directory
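To make the flat start concrete, the Perl sketch below computes a per-dimension global mean and variance over a few toy feature vectors and then scales the variance by 0.01, mirroring the ‘-f 0.01’ option. It illustrates the idea only and is not HCompV's code: the sub name `global_stats` and the toy data are hypothetical, and dividing by N (rather than N-1) is an assumption.

```perl
use strict;
use warnings;

# Per-dimension mean and variance over all frames (each frame is an array ref).
sub global_stats {
    my @frames = @_;
    my $n   = scalar @frames;
    my $dim = scalar @{ $frames[0] };
    my (@mean, @var);
    for my $d (0 .. $dim - 1) {
        my $sum = 0;
        $sum += $_->[$d] for @frames;
        $mean[$d] = $sum / $n;
        my $sq = 0;
        $sq += ($_->[$d] - $mean[$d]) ** 2 for @frames;
        $var[$d] = $sq / $n;              # population variance (assumed)
    }
    return (\@mean, \@var);
}

my @frames = ([1.0, 2.0], [3.0, 2.0], [5.0, 8.0]);   # toy 2-dimensional data
my ($mean, $var) = global_stats(@frames);
my @floor = map { 0.01 * $_ } @$var;                 # floor = 0.01 x global variance

printf "mean:  %s\n", join(' ', map { sprintf '%.4f', $_ } @$mean);
printf "var:   %s\n", join(' ', map { sprintf '%.4f', $_ } @$var);
printf "floor: %s\n", join(' ', map { sprintf '%.4f', $_ } @floor);
```

Every Gaussian in every model starts from the same mean and variance vectors, which is why the initial word models are identical.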
Slide 11
Training – creating initial HMMs
Using ‘hmmdef’, construct HMM for all vocabulary units (digits, phonemes)
– manually copying the ‘hmmdef’ and relabeling it for each required digit (including ‘sil’)
– automatically – write a small program in Perl or C (etc)
– provided exe-files: macros.exe, models_1mixsil.exe
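The automatic route could look roughly like the Perl sketch below, which copies one definition per vocabulary unit and changes only the model name on the `~h` line. This is not the provided macros.exe or models_1mixsil.exe. The sub name `clone_models` is hypothetical, and the definition text is a shortened stand-in that is assumed to contain only the `~h` block, without the global options header.

```perl
use strict;
use warnings;

# Copy one HMM definition per word, changing only the name on the ~h line.
sub clone_models {
    my ($proto_text, @words) = @_;
    my $models = '';
    foreach my $w (@words) {
        (my $copy = $proto_text) =~ s/^~h\s+"[^"]*"/~h "$w"/m;
        $models .= $copy;
    }
    return $models;
}

# Shortened stand-in for the contents of 'hmmdef' (states and transitions omitted).
my $hmmdef = "~h \"hmmdef\"\n<BeginHMM>\n<NumStates> 8\n<EndHMM>\n";
print clone_models($hmmdef, qw(one two sil));
```

In practice the word names would be read from the word-list file, and the result written to the models file used in the next training step.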
Slide 12
Training – HMM estimation (HERest)
Tool HERest – estimation of the HMM parameters using Baum-Welch algorithm
HERest -D -C $CONFIG -I $LABELS -t 250.0 150.0 1000.0 -S $LIST_FILE -H $HMM_DIR/hmm1/macros -H $HMM_DIR/hmm1/models -M $HMM_DIR/hmm2 $WORD_LIST
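In the command above, ‘-I’ gives the label file, ‘-t’ the pruning thresholds, each ‘-H’ an input model file and ‘-M’ the output directory. Since every iteration reads from one directory and writes to the next, the command is convenient to build in a loop. The Perl sketch below does so. The sub name `herest_cmd`, the parameter names and the values for the configuration file, model directory and word list are hypothetical, and the output directories are assumed to exist already.

```perl
use strict;
use warnings;

# Build the HERest command that re-estimates hmm<N> into hmm<N+1>.
sub herest_cmd {
    my ($iter, %p) = @_;
    my $src = "$p{hmm_dir}/hmm$iter";
    my $dst = "$p{hmm_dir}/hmm" . ($iter + 1);
    return "HERest -D -C $p{config} -I $p{labels} -t 250.0 150.0 1000.0 "
         . "-S $p{list} -H $src/macros -H $src/models -M $dst $p{words}";
}

# Hypothetical settings for illustration.
my %params = (
    config  => 'config',
    labels  => 'label_trainClean_noSP.mlf',
    list    => 'listTrain.scp',
    hmm_dir => 'hmms',
    words   => 'wordlist',
);

foreach my $iter (1 .. 3) {
    my $cmd = herest_cmd($iter, %params);
    print "$cmd\n";
    # To actually run it:
    # system($cmd) == 0 or die "HERest failed at iteration $iter\n";
}
```

The `system` call is left commented out so that the sketch runs without HTK installed.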
Slide 13
Training – HMM estimation (HERest)
Perform several estimation iterations using HERest
Then generate the ‘short-pause’ (sp) model
– Copy the central state of the ‘sil’ model
– The ‘sp’ model is tied to the middle state of the ‘sil’ model (HHEd tool used here)
Add ‘sp’ as the last line of the WORD_LIST
Slide 14
Training – mixture increase (HHEd)
Tool HHEd – various functions, including increasing the number of mixtures
Uses .hed file as input to define the function to be performed
HHEd -H $HMM_DIR/hmm8/macros -H $HMM_DIR/hmm8/models -M $HMM_DIR/hmm9 $ED_CMDFILE2 $WORD_LISTSP
– the file macros should contain the variance floor macro vFloors generated earlier
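For mixture increase, the .hed file typically holds an `MU` command naming the target number of mixture components and the states it applies to. The Perl sketch below builds such a line. The sub name `mixup_command` is hypothetical, and the state range 2 to 7 assumes that the 8-state models have six emitting states; the range should be adjusted to the actual topology.

```perl
use strict;
use warnings;

# One HHEd edit command that raises the mixture count in a range of states
# of all models (the '*' wildcard).
sub mixup_command {
    my ($num_mix, $first_state, $last_state) = @_;
    return sprintf "MU %d {*.state[%d-%d].mix}\n",
                   $num_mix, $first_state, $last_state;
}

# Hypothetical schedule: double the mixtures step by step.
foreach my $num_mix (2, 4, 8) {
    print mixup_command($num_mix, 2, 7);
}
```

Each printed line would go into its own .hed file, with several HERest iterations run between successive increases.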
Slide 15
Recognition – HVite
Tool HVite – performs recognition of an unknown utterance by using the Viterbi algorithm
Trained HMMs
#!MLF!#
"c:/Experiments/SpeechRecogHTK/dataAurora2/spec_ff3dct2a1/TESTA/CLEAN1/FAK_1B.rec"
0 2100000 sil -1527.106689
2100000 9100000 one -6118.945313
9100000 9200000 sp -74.889305
9200000 10900000 sil -1286.454468
.
"c:/Experiments/SpeechRecogHTK/dataAurora2/spec_ff3dct2a1/TESTA/CLEAN1/FAK_2B.rec"
etc
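Each line of the recognition output holds a start time, an end time, the recognised label and a log-likelihood score, with times in HTK's units of 100 ns. The Perl sketch below parses such lines and converts the times to seconds. The sub name `parse_label_line` is hypothetical, and skipping ‘sil’ and ‘sp’ when printing is a choice made for this example.

```perl
use strict;
use warnings;

# Split one label line into start, end, word and score; times become seconds.
sub parse_label_line {
    my ($line) = @_;
    my ($start, $end, $word, $score) = split ' ', $line;
    return {
        start => $start * 1e-7,   # 100 ns units to seconds
        end   => $end   * 1e-7,
        word  => $word,
        score => $score,
    };
}

my @lines = (
    '0 2100000 sil -1527.106689',
    '2100000 9100000 one -6118.945313',
    '9100000 9200000 sp -74.889305',
    '9200000 10900000 sil -1286.454468',
);

foreach my $line (@lines) {
    my $seg = parse_label_line($line);
    next if $seg->{word} eq 'sil' || $seg->{word} eq 'sp';
    printf "%s from %.2f s to %.2f s\n", $seg->{word}, $seg->{start}, $seg->{end};
}
```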
Slide 16
Recognition – HResults
Compares the recout.mlf with the reference .mlf file – gives the recognition performance
SENT: 197 of the 200 test utterances (98.50%) were correctly recognised
WORD:
– Indicates that of the 855 words (N) in total, 853 (99.77%) were recognised correctly
– There was 1 deletion error (D), 1 substitution error (S) and 1 insertion error (I)
– The accuracy figure (Acc) of 99.65% is lower than the percentage correct (Corr) because it takes account of the insertion errors, which the latter ignores
====================== HTK Results Analysis ==============
Date: Sun Oct 22 16:14:45 1995
Ref : testrefs.mlf
Rec : recout.mlf
------------------------ Overall Results -----------------
SENT: %Correct=98.50 [H=197, S=3, N=200]
WORD: %Corr=99.77, Acc=99.65 [H=853, D=1, S=1, I=1, N=855]
==========================================================
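The two word-level figures follow from the counts in the brackets: H = N - D - S, %Corr = 100 · H / N and Acc = 100 · (H - I) / N. The short Perl sketch below reproduces the numbers in the listing. The sub name `word_scores` is hypothetical.

```perl
use strict;
use warnings;

# Percentage correct and accuracy from the word total N and the numbers of
# deletions D, substitutions S and insertions I.
sub word_scores {
    my ($n, $d, $s, $i) = @_;
    my $h = $n - $d - $s;                  # correctly recognised words (H)
    return (100 * $h / $n, 100 * ($h - $i) / $n);
}

my ($corr, $acc) = word_scores(855, 1, 1, 1);
printf "%%Corr=%.2f, Acc=%.2f\n", $corr, $acc;   # prints %Corr=99.77, Acc=99.65
```

Only the accuracy is reduced by insertions, which is why it is the lower of the two.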
Slide 17
Introduction to Perl language
Perl
– programming language – text processing, e.g., files, strings
– available on any operating system
Creating and running a Perl program
– text file
– the Perl interpreter compiles the whole file and then executes it (no separate build step)
– run in the command prompt window
> perl myprog.pl
Slide 18
Perl program
Similar to C syntax
– statements terminated by ;
– comments begin with #
– logical operators &&, ||, ! as in C
Variables
– no need to pre-declare – variables are global by default
$x = 2;            # variable 'x' will hold value 2
$greet = "hello";  # variable 'greet' will hold string 'hello'
Slide 19
Perl program – Arrays
Arrays
@array = (1, 2, "hello");            # a 3 element array
$len = @array;                       # variable 'len' holds 3 (the length of @array)
$x = 1;
$y = 2;
@nums = ($x+$y, $x-$y);              # variable 'nums' holds (3, -1)
$array[0] = $array[0] + $array[1];   # array[0] now holds 3
Slide 20
Perl program – Conditions
if (expr) {
    stmt;
}
elsif (expr) {
    stmt;
}
else {
    stmt;
}
if ($x > 3) { $x = 3; }
Slide 21
Perl program – Loops 1
while (expr) {
    stmt;
}
for (init_expr; test_expr; incr_expr) {
    stmt;
}
for ($i=0; $i<100; $i++) {
    stmt;
}
Slide 22
Perl program – Loops 2
Iterating over all elements of an array
foreach $var (@array) {
    stmt;   # $var is an alias for the current element in @array
}
Slide 23
Perl program – External programs
Running external programs
– runs HCopy.exe (from the HTK toolkit) with the given input parameters
system("HCopy.exe –");
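When a script chains several HTK tools, it is worth checking whether each call succeeded. `system` returns the wait status of the command (or -1 if it could not be started), and the exit code is that status shifted right by 8 bits. The Perl sketch below wraps this in a helper. The sub name `run_tool` is hypothetical, and the Perl interpreter itself (`$^X`) stands in for an HTK tool so that the example runs without HTK installed.

```perl
use strict;
use warnings;

# Run a command given as a list of arguments and return its exit code
# (-1 if the command could not be started at all).
sub run_tool {
    my @cmd = @_;
    my $status = system(@cmd);
    return -1 if $status == -1;
    return $status >> 8;
}

# Stand-in command: the Perl interpreter itself, exiting with code 0.
my $code = run_tool($^X, '-e', 'exit 0');
print "exit code: $code\n";
die "tool failed\n" if $code != 0;
```

Passing the command as a list avoids quoting problems when file names contain spaces.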
Slide 24
Perl program – File operations, Print
File handles to filenames as in C
open(F1, "filename");     # opens 'filename' for reading
open(F2, ">filename");    # opens 'filename' for writing
open(F3, ">>filename");   # opens 'filename' for appending
close(F1);
Print output
print "Woo Hoo\n";        # prints a string to stdout
Slide 25
Perl program – Print output
Example: print output to a file
$fname = "file.txt";
open(FILE, ">$fname") || die "Could not open $fname\n";   # '>' opens the file for writing
print FILE "So, that's the END of the Perl intro.\n";     # file handle is written without '$'
close(FILE);
Perl Introduction based on http://cslibrary.stanford.edu/108/
Slide 26