Coursework 1 briefing

Data collection
• Decision on sampling frequency (8 kHz, 16 kHz, 44.1 kHz) – can use the MATLAB function resample() to change this (but only to reduce the rate)

• Scaling

    • Check signal amplitudes are consistent

    • Apply rescaling or normalisation

• Clipping

    • Check recording gain setting to avoid clipping

• Lead-in or zero amplitude signal

    • If amplitude takes time to settle, or contains zeros, cut this out of the recordings or apply a filter

• Make sure you are happy with the recording conditions before you undertake a long data collection – speak to me first if unsure
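
The checks above could be sketched in MATLAB roughly as follows. This is a minimal illustration, not a prescribed method: the test tone, the 0.9 peak target and the 0.99 clipping threshold are assumptions made for the example, and resample() requires the Signal Processing Toolbox.

```matlab
% Hypothetical stand-in for a recording: 1 s of a 440 Hz tone at 16 kHz
fsIn = 16000;
fsOut = 8000;
t = (0:fsIn - 1)' / fsIn;
x = 0.3 * sin(2 * pi * 440 * t);

% Reduce the sampling frequency from 16 kHz to 8 kHz
y = resample(x, fsOut, fsIn);

% Peak normalisation so every recording has the same maximum amplitude
% (0.9 is an assumed target that leaves some headroom)
y = 0.9 * y / max(abs(y));

% Clipping check on the raw recording: fraction of samples at or near
% full scale (0.99 is an assumed threshold)
clipFraction = mean(abs(x) >= 0.99);
fprintf('Samples near full scale: %.2f%%\n', 100 * clipFraction);
```

A non-zero clipFraction suggests the recording gain was too high; rescaling afterwards cannot repair a clipped waveform, so the recording should be repeated.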

Data collection – clipping

[Figure: two waveform plots (amplitude -1 to 1 against sample index, 500 to 5000), labelled Correct and Clipped]

Adjust recording level to avoid clipping

Data collection – leading zeros

[Figure: two waveform plots (amplitude -1 to 1 against sample index) illustrating leading zeros]

Remove/crop leading or trailing zeros – adjust label file accordingly

Zeros left in the signal may cause problems with subsequent feature extraction
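
One possible way to crop the zeros is sketched below. The signal, the amplitude threshold and the variable names are all hypothetical; the only outside fact relied on is that HTK label files express times in 100 ns units.

```matlab
% Hypothetical recording: 500 leading zeros, 4000 samples of signal,
% 300 trailing zeros
fs = 8000;
x = [zeros(500, 1); 0.5 * cos(2 * pi * (0:3999)' / 50); zeros(300, 1)];

threshold = 1e-4;                     % assumed "effectively zero" level
active = find(abs(x) > threshold);    % indices of non-zero samples
firstSample = active(1);
lastSample = active(end);
trimmed = x(firstSample:lastSample);  % cropped signal

% Number of samples removed from the start; label start and end times
% must be shifted earlier by this amount
offsetSamples = firstSample - 1;
offsetHTK = round(offsetSamples / fs * 1e7);   % same offset in 100 ns units
```

Only zeros at the start change the label times. Zeros that occur in the middle of a recording are not handled by this sketch.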

Mel-scale filterbank

[Figure: mel-scaled filterbank, ~20 channels, frequency axis 0 to 4 kHz]

Processing chain: speech signal → pre-emphasis → Hamming window → spectral analysis → mel-scale filterbank → log → DCT → feature vectors

• Three things to decide:

    • Linear/non-linear frequency mapping

    • Number of channels

    • Shape of filterbank channels

• Frequency mapping – ideally mel-scale (cf. the mel-scale equation), but better to start with a linear mapping where channels are spaced equally in frequency

• Number of channels – if sampling at 8 kHz then 20-25 is typical; if sampling at 16 kHz then 30 is typical

• Shape of filterbank channels – the standard MFCC implementation uses triangular filterbank channels that overlap. A simple implementation could use non-overlapping rectangular channels
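
To make the two frequency mappings concrete, the sketch below computes channel edges for a linear and a mel-spaced filterbank. It assumes the commonly used mel formula mel(f) = 2595 log10(1 + f/700), which may differ from the equation used in the lectures. The names hz2mel, mel2hz, linearEdges and melEdges are made up for this example.

```matlab
fs = 8000;          % sampling frequency in Hz (assumed)
K = 20;             % number of filterbank channels (assumed)
fMax = fs / 2;      % highest frequency represented in the spectrum

% Commonly used mel-scale mapping and its inverse
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

% K non-overlapping channels need K + 1 edges
linearEdges = linspace(0, fMax, K + 1);                  % equal width in Hz
melEdges = mel2hz(linspace(0, hz2mel(fMax), K + 1));     % equal width in mel

fprintf('First channel: linear %.0f Hz wide, mel %.0f Hz wide\n', ...
        linearEdges(2) - linearEdges(1), melEdges(2) - melEdges(1));
```

With the mel mapping the low-frequency channels are narrow and the high-frequency channels are wide, whereas the linear mapping gives every channel the same width.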

Rectangular filterbank

[Figure: two plots (x-axis 0 to 140, y-axis 0 to 0.9) with Channel 1 of the rectangular filterbank marked]

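% Assume a K-channel filterbank over the numBins bins of mag.
% Equal-width channels are an assumption here (linear mapping).
binsPerChannel = floor(numBins / K);
filterbank = zeros(1, K);
for channel = 1:K
    firstBin = (channel - 1) * binsPerChannel + 1;   % first bin for channel
    lastBin  = channel * binsPerChannel;             % last bin for channel
    filterbank(channel) = sum(mag(firstBin:lastBin));
end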
Feature vector (first element): 8.341
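
As a small worked example of the rectangular filterbank loop, the sketch below uses a made-up 12-bin magnitude spectrum and 3 equal-width channels. The values are purely illustrative and the equal-width split is an assumption.

```matlab
mag = (1:12)';                         % hypothetical magnitude spectrum
K = 3;                                 % number of channels
numBins = length(mag);
binsPerChannel = floor(numBins / K);   % 4 bins per channel here

fbank = zeros(K, 1);
for channel = 1:K
    firstBin = (channel - 1) * binsPerChannel + 1;
    lastBin = channel * binsPerChannel;
    fbank(channel) = sum(mag(firstBin:lastBin));
end

disp(fbank')   % 10 26 42
```

Channel 1 sums bins 1 to 4, channel 2 sums bins 5 to 8, and channel 3 sums bins 9 to 12, giving one feature vector element per channel.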

[Figure: rectangular filterbank plots (x-axis 0 to 140, y-axis 0 to 0.9) with Channel 1 and Channel 2 marked]

Feature vector so far: element 1 = 8.34, element 2 = 2.17

Mel-scale filterbank

[Figure: magnitude spectrum (x-axis 0 to 140, y-axis 0 to 0.9) above a mel-scaled filterbank of ~20 channels spanning 0 to 4 kHz. Successive slides mark Channel 1 and then Channel 2, each contributing one element of the feature vector (values shown: 8, 6)]

Discrete cosine transform
• MATLAB has a dct function

• Need to decide the level of truncation following the DCT

• If a 20-channel filterbank is input to the DCT, the output is 20 DCT coefficients

• Level of truncation will affect recognition performance

• As a simple rule, truncate to keep half the DCT coefficients
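
The truncation rule might look like this in practice. The input values are a made-up stand-in for one frame of log filterbank outputs. Keeping the first (low-order) coefficients is the usual convention but is an assumption here, and dct() requires the Signal Processing Toolbox.

```matlab
K = 20;                            % filterbank channels
logFbank = log((1:K)' + 0.5);      % hypothetical log filterbank outputs

c = dct(logFbank);                 % K inputs give K DCT coefficients
numKeep = K / 2;                   % simple rule: keep half
mfcc = c(1:numKeep);               % truncated feature vector

fprintf('%d coefficients in, %d kept\n', numel(c), numel(mfcc));
```

Since the level of truncation affects recognition performance, numKeep is a natural parameter to vary in experiments.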

Writing HTK files
• HTK files are binary files and follow a strict structure – a 12 byte header followed by the coefficients (4 byte floats each) – see HTK manual

% Open file for writing (big-endian byte order):
fid = fopen(filename, 'w', 'ieee-be');

% Write the header information:
fwrite(fid, numVectors, 'int32');     % number of vectors in file (4 byte int)
fwrite(fid, vectorPeriod, 'int32');   % sample period in 100ns units (4 byte int)
fwrite(fid, numDims * 4, 'int16');    % number of bytes per vector (2 byte int)
fwrite(fid, parmKind, 'int16');       % code for the sample kind (2 byte int)

% Write the data, one coefficient at a time:
for i = 1:numVectors
    for j = 1:numDims
        fwrite(fid, data(i, j), 'float32');
    end
end

fclose(fid);

• Further details in HTK book
• Use HList to check that HTK file contains correct data from MATLAB
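
Alongside HList, a quick round-trip check can be done inside MATLAB by writing a file and reading the header and data back. The sketch below uses made-up data and a temporary file. The values 100000 (a 10 ms frame shift in 100 ns units) and 9 (the HTK code for USER parameters) are example choices, not requirements of the coursework.

```matlab
% Hypothetical data: 5 vectors of 10 coefficients
data = reshape(1:50, 5, 10) / 10;
[numVectors, numDims] = size(data);
vectorPeriod = 100000;             % 10 ms in 100 ns units (assumed)
parmKind = 9;                      % 9 = USER in HTK's parameter kind codes
filename = [tempname '.mfc'];      % temporary file for the check

% Write header then data; transposing makes each vector contiguous
fid = fopen(filename, 'w', 'ieee-be');
fwrite(fid, numVectors, 'int32');
fwrite(fid, vectorPeriod, 'int32');
fwrite(fid, numDims * 4, 'int16');
fwrite(fid, parmKind, 'int16');
fwrite(fid, data', 'float32');
fclose(fid);

% Read everything back using the same byte order
fid = fopen(filename, 'r', 'ieee-be');
hdrVectors = fread(fid, 1, 'int32');
hdrPeriod = fread(fid, 1, 'int32');
hdrBytes = fread(fid, 1, 'int16');
hdrKind = fread(fid, 1, 'int16');
readBack = fread(fid, [hdrBytes / 4, hdrVectors], 'float32')';
fclose(fid);
delete(filename);
```

If readBack matches data to single precision and the header fields match what was written, the file layout is at least self-consistent; HList remains the check that HTK itself reads it correctly.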

Coding ideas
• Important to use a structured approach to feature extraction

for frameNumber = 1:numFrames
    frame = x(frameStart:frameEnd);
    % hamming
    % getMagSpec
    % filterbank
    % log
    % dct
    % truncation
end

% write parameterised file

• Each stage can be a MATLAB function with inputs and outputs
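
A minimal end-to-end sketch of this structure is given below, written inline so that it runs as a single script; each commented stage could be moved into its own function. Everything specific is an assumption for illustration: the random input signal, the frame length and shift, the 20 equal-width rectangular channels, and the eps added before the log. dct() requires the Signal Processing Toolbox.

```matlab
fs = 8000;                         % sampling frequency in Hz (assumed)
x = randn(fs, 1);                  % stand-in for one second of speech
frameLength = 256;                 % samples per frame (assumed)
frameShift = 128;                  % samples between frame starts (assumed)
K = 20;                            % filterbank channels
numKeep = K / 2;                   % keep half the DCT coefficients

numFrames = floor((length(x) - frameLength) / frameShift) + 1;
numBins = frameLength / 2;         % keep the first half of the FFT bins
binsPerChannel = floor(numBins / K);
n = (0:frameLength - 1)';
window = 0.54 - 0.46 * cos(2 * pi * n / (frameLength - 1));  % Hamming window
features = zeros(numFrames, numKeep);

for frameNumber = 1:numFrames
    frameStart = (frameNumber - 1) * frameShift + 1;
    frameEnd = frameStart + frameLength - 1;
    frame = x(frameStart:frameEnd) .* window;        % hamming
    spec = abs(fft(frame));                          % getMagSpec
    mag = spec(1:numBins);
    fbank = zeros(K, 1);                             % filterbank
    for channel = 1:K
        firstBin = (channel - 1) * binsPerChannel + 1;
        lastBin = channel * binsPerChannel;
        fbank(channel) = sum(mag(firstBin:lastBin));
    end
    logFbank = log(fbank + eps);                     % log (eps avoids log(0))
    c = dct(logFbank);                               % dct
    features(frameNumber, :) = c(1:numKeep)';        % truncation
end
```

The resulting features matrix has one row per frame and one column per retained coefficient, which matches the data(i, j) layout used when writing the HTK file.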

Analysing results
• Sentence level, word level – correct and accuracy

• Hits, substitutions, deletions and insertions

Deletion error:

.lab = one two three four

.rec = one three four

Substitution error:

.lab = one two three four

.rec = one two five four

Insertion error:

.lab = one two three four

.rec = one seven two three four
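
To show how these error types feed into the two scores, the sketch below pools the three example sentences and applies the definitions used by HTK's HResults: %Corr = H/N and %Acc = (H - I)/N, where N is the number of words in the labels. Pooling the three sentences into one test set is an assumption made for the example.

```matlab
% Counts pooled over the three example sentences (4 label words each)
N = 12;      % total words in the label files
D = 1;       % deletions     ("two" missing)
S = 1;       % substitutions ("three" recognised as "five")
I = 1;       % insertions    ("seven" added)

H = N - D - S;                            % hits
percentCorrect = 100 * H / N;             % ignores insertions
percentAccuracy = 100 * (H - I) / N;      % penalises insertions

fprintf('%%Corr = %.2f, %%Acc = %.2f\n', percentCorrect, percentAccuracy);
% prints: %Corr = 83.33, %Acc = 75.00
```

Accuracy can never exceed correctness, and the gap between them indicates how many insertions the recogniser is producing.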

Research ideas
• Many variations on design and implementation

• Many parameters to adjust

• Interesting to see the effect that changes make on recognition
accuracy

• Keep training data and test data fixed and adjust one parameter
at a time – can then see the result of that change

• Evaluation should investigate this and show graphs, tables, etc.

• Don’t need to include lots of confusion matrices – they take up a lot of space

• Key number is %Acc – a confusion matrix may help to explain a particular result
