Data Mining and Machine Learning
Application of HMMs for Automatic Speech Recognition – Introduction
Peter Jančovič

Objectives
• Introduce automatic speech recognition
• Understand why speech recognition is difficult
– Continuity
– Variability
– Confusability
– Effects of accent
• Speech recognition terminology

Why is speech recognition difficult?
Intuitively…
• Meaning is represented by sentences
• A sentence is a sequence of words
• A word is a sequence of phonemes
• …
• This view of speech is based on text
• But speech is NOT just “acoustic text”

Speech is…
• Continuous
– “We were away a year ago”
• Variable
– “bread and butter” or “brembudder”
• Ambiguous
– “The grey tape can fix that leak”
– “The great ape can fix that leek”
– “The great ape can fix that league”
– “The great tape can. Fix that’ll eek!”
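
The ambiguity above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical pronunciation dictionary written with ARPAbet-style symbols (not taken from the slides) and shows that “grey tape” and “great ape” collapse to the same phoneme string once word boundaries are dropped, as they are in continuous speech.

```python
# Hypothetical toy pronunciation dictionary (ARPAbet-style symbols).
# The entries are illustrative assumptions, not part of the original slides.
PRONUNCIATIONS = {
    "grey": ["G", "R", "EY"],
    "tape": ["T", "EY", "P"],
    "great": ["G", "R", "EY", "T"],
    "ape": ["EY", "P"],
}


def to_phonemes(words):
    """Concatenate the phonemes of each word, discarding word boundaries."""
    phonemes = []
    for word in words:
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


grey_tape = to_phonemes(["grey", "tape"])
great_ape = to_phonemes(["great", "ape"])

print(grey_tape)
print(great_ape)
print("Identical phoneme strings:", grey_tape == great_ape)
```

Because the phoneme strings are identical, the acoustics alone cannot pick between the two word sequences in this idealised view; some other knowledge source has to resolve the choice.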

Speech is Continuous

Variability

Confusability

English Vowels: /h_d/

“League” or “Leek”?
• “league” = / l i g /
• “leek” = / l i k /
• Difference appears to be in the final consonant:
– /g/ is voiced
– /k/ is unvoiced
• But in natural fluent speech, the duration of the vowel /i/ may be a more important cue to recognition!

ABI – Accents of the British Isles
• Corpus of recordings of 15 different accents of British English
– 300+ subjects
– Approx. 20 minutes of speech per subject
– 20+ subjects (10m, 10f) per accent
– Each subject born in town, lived there all of his or her life, parents born in town
– Funded by 20/20 Speech
– Up to £2K (academic) or £20K (industry)

ABI
• ABI II
– Extended corpus
– Systematic study of effect of accent on recogniser performance
– Systematic study of acoustic correlates of accent
[Figure labels: Lowestoft, Elgin, Glasgow, Ulster, Denbigh]

Approaches to Speech Recognition
• Many approaches to speech recognition have been tried in the past
• Researchers in the early days believed there was insufficient information in the acoustic data to recognise speech, and that additional sources of knowledge were necessary
– acoustic-phonetic, lexical (words), syntactic (grammar), semantic and domain-specific knowledge
• The most successful approach to date is based on a combination of hidden Markov models (HMMs) with (deep) neural networks

Speech Recognition Terminology
• The basic problem in speech recognition is variability
• Early attempts tried to solve the problem by removing it
• Speaker variability
– Speaker-dependent speech recognition systems train on, and subsequently recognise, a single speaker
– Multiple-speaker systems work for a particular population of speakers
– Speaker-independent systems work for any speaker, with no implicit or explicit training
– Speaker-adaptive systems automatically adapt to a new speaker, e.g. begin with a speaker-independent system, then adapt it to a particular speaker to obtain a speaker-dependent system

Terminology (Continued)
• Another source of variability is co-articulation between words
• Isolated word recognition systems require the user to leave gaps between words
• Connected speech recognition systems recognise isolated phrases or sentences
• Continuous speech recognition systems recognise continuous speech

Vocabulary Size
• Another important issue is vocabulary size
• Small vocabulary systems work with vocabularies of 10 to 100 words
• Medium vocabularies comprise around 100 to 5,000 words
• Large Vocabulary Continuous Speech Recognition (LVCSR) systems can cope with 60,000 words
• Unlimited vocabulary systems have no vocabulary size limitation

Historical perspective
[Timeline figure, axis from 1970 to 2010]
• 1970s: US ARPA programme
– Whole word pattern matching (DP) (Sakoe & Chiba)
– Bruce Lowerre’s ‘HARPY’ system (CMU)
• 1980s: US DARPA programme, Resource Management (RM) task
– ‘Popularisation’ of HMMs (Rabiner, Levinson)
– IBM Tangora system (Jelinek, Bahl)
– The SPHINX system (Kai-Fu Lee) (CMU)
• 1990s: DARPA programme: WSJ, BN, Switchboard – large vocabulary systems
– Large-scale HMM-based systems (Cambridge University, LIMSI, IBM, Dragon, …)
• Google, Amazon, Apple, …: hybrid DNN-HMM systems

Phoneme-HMM Speech Recogniser
[Figure: block diagram of a phoneme-HMM speech recogniser. Labelled components: speech signal, front-end signal processing, Viterbi decoder, phoneme HMM store, pronunciation dictionary, N-gram language model, application compiler. Output: optimal word sequence or N-best list.]
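
The Viterbi decoder named in the diagram can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the system in the slides: it uses a hypothetical left-to-right HMM whose three states stand for the phonemes of “leek”, discrete observation symbols “a”, “b”, “c” in place of real acoustic feature vectors, and made-up probabilities. It finds the single most likely state sequence using log probabilities.

```python
import math


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete-observation HMM."""
    # best[t][s]: log probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        best.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor state that maximises the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            backptr[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def to_log(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {row: {col: log(p) for col, p in cols.items()} for row, cols in table.items()}


# Hypothetical toy model: all states, symbols and probabilities are assumptions.
STATES = ["l", "i", "k"]
START = {"l": 1.0, "i": 0.0, "k": 0.0}
TRANS = {
    "l": {"l": 0.5, "i": 0.5, "k": 0.0},
    "i": {"l": 0.0, "i": 0.5, "k": 0.5},
    "k": {"l": 0.0, "i": 0.0, "k": 1.0},
}
EMIT = {
    "l": {"a": 0.8, "b": 0.1, "c": 0.1},
    "i": {"a": 0.1, "b": 0.8, "c": 0.1},
    "k": {"a": 0.1, "b": 0.1, "c": 0.8},
}

observations = ["a", "a", "b", "b", "b", "c"]
best_log_prob, path = viterbi(
    observations,
    STATES,
    {s: log(p) for s, p in START.items()},
    to_log(TRANS),
    to_log(EMIT),
)
print(path)
print(best_log_prob)
```

For this toy model the decoded path is l, l, i, i, i, k, so the number of frames assigned to each state gives a crude duration for each phoneme. A real recogniser would decode over continuous acoustic features and over a network built from the HMMs, dictionary and language model; that part is outside this sketch.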

Summary
• Why automatic speech recognition is difficult
– Speech is not “acoustic text”: Continuity, Variability, Confusability
• Speech recognition terminology
• Historical perspective