Laboratory Assignment
MSc Introductory Module (Part I)
Peter Jančovič
Assignment Instructions
You should work in groups of three (however, you may work in a group of two or individually if you prefer). EACH GROUP should record the outcomes of its work in ONE lab report and store any required program code and files.
The lab-assignment report can be written in LaTeX or MS Word. The front page of the report should contain the module title and the IDs of all persons working in the group, as well as the effort of each person (the standard for a person would be 100%, i.e., full effort). The report should be written in an 11pt font. Matlab code listings should be included in appendices of the report in a 9pt font.
Submission of the assignment (as well as all the required files) is through Canvas – under the ‘MSc Introductory Module for Computer Engineering’, find the assignment ‘Lab-Assignment Submission’. The report should be submitted in .pdf file format. Make sure you attach all the files required and make your submission.
Each group should make a single submission.
The usual penalty of 5% per day will apply to all late submissions.
Plagiarism
Plagiarism will not be tolerated. It is the act of a Student claiming as their own, intentionally or by omission, work which was not done by that Student. Plagiarism also includes a Student deliberately claiming to have done work submitted by the Student for assessment which was never undertaken by that Student, including self-plagiarism and the other breaches. Sanctions for plagiarism include the Student failing the Programme of study.
1 Introduction
This assignment will test your knowledge and understanding of the following topics: programming in Matlab (loops, vectors, matrices, functions, reading/writing to files), the discrete Fourier transform, digital filtering, and modelling of data using a Gaussian probability density function (PDF). This will be done through analysis of a given set of audio data.
2 The Data
The data you will use consists of audio data and accompanying text data (referred to as labels). The data is part of the TIMIT dataset¹, which has been used widely for research in speech processing. You will use 70 files (audio and text), which are arranged in sub-folders.
The audio data contain recordings of speech, sampled at 8 kHz. The files are in Microsoft ‘.wav’ format, which can be read in Matlab using the function ‘audioread’.
Each audio wav-file is accompanied by a text file, referred to as a label file (.lab file). Each label file contains three columns: the first and second columns give the start and end times of each phoneme, respectively, and the third column lists the phonemes (i.e., types of speech sounds) that are spoken in the corresponding wav-file. The times are in units of 100 ns (e.g., 1409375 corresponds to 140.9375 ms). An example of a label file is given in Figure 1.
Figure 1: An example of label file (part for audio ‘MCPM0/SA1.wav’).
You are also given a text file ‘listData.txt’ that contains the list of all the filenames (without the file extension) of the data.
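The label format described above can be parsed with a few lines of Matlab. The following is only a minimal sketch: the two label lines are made-up illustrative values (not taken from the dataset), and the variable names are hypothetical. In the real program the text would be read from a .lab file, for example with ‘fileread’.

```matlab
% Sketch: parse label text and convert times from 100 ns units to ms.
% The two label lines are invented for illustration only.
labText = sprintf('0 1409375 h#\n1409375 2000000 s\n');

% Three columns: start time, end time, phoneme label.
cols = textscan(labText, '%f %f %s');

tStartMs = cols{1} / 1e4;      % 100 ns units -> ms (1 ms = 10^4 x 100 ns)
tEndMs   = cols{2} / 1e4;
phonemes = cols{3};            % cell array of phoneme strings

% Logical index of all occurrences of the phoneme 's'.
isS = strcmp(phonemes, 's');

fprintf('%s: %.4f ms to %.4f ms\n', phonemes{2}, tStartMs(2), tEndMs(2));
```

Dividing by 10^4 follows directly from the stated unit of 100 ns; the same logical indexing can be repeated for the phoneme ‘aa’.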
3 Assignment task
Your Matlab program should be a text-based, menu-driven program that offers the following options, which the user chooses by pressing the specified letter (‘a’, ‘b’, ‘c’, ‘d’, or ‘e’):
(a) Perform FIR filtering
(b) Extract signal segments
(c) Calculate DFT and energy for low/high frequency regions
(d) Modelling of energy values using Gaussian PDFs
(e) Exit the program
The definition of all of these options is given in the following subsections.
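One possible shape for such a menu loop is sketched below. It is an illustration, not a required structure. To keep the example reproducible, the key presses come from a hypothetical cell array ‘keys’; in an interactive program the marked line would instead call ‘input’ with the ‘s’ flag. The ‘actionLog’ variable is likewise only there to show which branch was taken.

```matlab
% Sketch of a text-based menu loop with a switch on the chosen letter.
keys = {'a', 'x', 'e'};      % simulated key presses (assumption, demo only)
actionLog = {};              % records which branch was taken (demo only)
k = 0;
choice = '';
while ~strcmp(choice, 'e')
    fprintf('(a) Perform FIR filtering\n');
    fprintf('(b) Extract signal segments\n');
    fprintf('(c) Calculate DFT and energy\n');
    fprintf('(d) Gaussian modelling\n');
    fprintf('(e) Exit\n');
    k = k + 1;
    % Interactive version: choice = lower(strtrim(input('Option: ', 's')));
    choice = lower(strtrim(keys{k}));
    switch choice
        case 'a'
            actionLog{end+1} = 'filter';    % call the option (a) code here
        case 'b'
            actionLog{end+1} = 'extract';   % call the option (b) code here
        case 'c'
            actionLog{end+1} = 'dft';       % call the option (c) code here
        case 'd'
            actionLog{end+1} = 'gauss';     % call the option (d) code here
        case 'e'
            actionLog{end+1} = 'exit';      % loop condition ends the program
        otherwise
            actionLog{end+1} = 'invalid';
            fprintf('Unknown option "%s", please try again.\n', choice);
    end
end
```

Handling unknown input in the ‘otherwise’ branch keeps the program running until the user explicitly chooses ‘e’.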
¹ J. Garofolo, “TIMIT: acoustic-phonetic continuous speech corpus,” Linguistic Data Consortium, 1993.
3.1 Option (a): Perform FIR filtering
This option should read the text file ‘listData.txt’ and then, in a loop, load the original wav-files one by one from the folder ‘wavOrig’, filter each file, and store the output signal in a wav-file with the same name in a new folder named ‘wavFilt’. The newly created wav-files should have exactly the same length as the original wav-files.
Your program should implement FIR filtering of the audio signal through the relation of the output y(n) and input x(n) sample values. The filter is defined by its impulse response h(n) given in Eq. 1.
$$h(n) = \{h(0), h(1), \ldots, h(6)\} = \{-0.1,\ 0.3,\ 0.5,\ 0.5,\ 0.5,\ 0.3,\ -0.1\} \tag{1}$$
You should assume that values of samples in the input signal x(n) for time n ≤ 0 are zero. Your program should also produce a figure of the magnitude frequency characteristic of the filter.
You are NOT ALLOWED to use any ready-made Matlab functions to perform the filtering or to obtain the filter frequency characteristic (such as ‘filter’, ‘conv’, or ‘freqz’).
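For an FIR filter, the relation between output and input is the sum y(n) = h(0)x(n) + h(1)x(n−1) + ... + h(6)x(n−6), and the frequency characteristic follows from H(ω) = Σ h(k) e^{−jωk}. The sketch below illustrates both using explicit loops only. The unit-impulse test input, the 512-point frequency grid, and all variable names are assumptions made for the example.

```matlab
% Sketch: direct-form FIR filtering and magnitude response without
% 'filter', 'conv' or 'freqz'.
h = [-0.1 0.3 0.5 0.5 0.5 0.3 -0.1];   % impulse response from Eq. 1
x = [1 0 0 0 0 0 0 0 0 0];             % test input: unit impulse (assumption)

N = numel(x);
M = numel(h);
y = zeros(size(x));                    % output has the same length as input
for n = 1:N
    acc = 0;
    for k = 1:M                        % h(k) here holds h(k-1) of Eq. 1
        if n - k + 1 >= 1              % samples before the start are zero
            acc = acc + h(k) * x(n - k + 1);
        end
    end
    y(n) = acc;
end

% Magnitude response evaluated from H(w) = sum_k h(k) * exp(-j*w*k).
fs = 8000;                             % sampling frequency in Hz
nFreq = 512;                           % number of frequency points (assumption)
f = (0:nFreq-1) * (fs/2) / nFreq;      % 0 Hz up to just below fs/2
H = zeros(1, nFreq);
for i = 1:nFreq
    w = 2*pi*f(i)/fs;                  % normalised angular frequency
    H(i) = sum(h .* exp(-1j*w*(0:M-1)));
end
magH = abs(H);
% plot(f, magH); xlabel('Frequency (Hz)'); ylabel('|H(f)|');
```

Feeding a unit impulse is a convenient check, because the output should then reproduce the impulse response itself, and the response at 0 Hz should equal the sum of the coefficients.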
Deliverables:
• Include in the report: Figure of the magnitude frequency characteristic of the filter.
• Attach to your submission: A zip file of the folder ‘wavFilt’, containing all your created wav-files.
3.2 Option (b): Extract signal segments
This option should extract from the audio wav-files a specified part of the signal corresponding to the phoneme ‘s’ and phoneme ‘aa’ and store the extracted signals into matrices.
For each wav-file, you will need to read the corresponding label file and find all the occurrences of the phonemes ‘s’ and ‘aa’. For a given occurrence of the phoneme, you should extract from the wav-file a segment of 20 ms of the signal around the centre of the phoneme, i.e., the start time (timeSegStart) and end time (timeSegEnd) of the segment to be extracted should be calculated (in ms) as:
$$\begin{aligned}
\mathit{timeSegStart} &= \mathit{timePhStart} + (\mathit{timePhEnd} - \mathit{timePhStart})/2 - 10,\\
\mathit{timeSegEnd} &= \mathit{timePhStart} + (\mathit{timePhEnd} - \mathit{timePhStart})/2 + 10,
\end{aligned} \tag{2}$$
where timePhStart and timePhEnd are, respectively, the start and end times of the phoneme as found in the label file (but converted to ms).
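As a worked illustration of extracting a 20 ms segment centred on a phoneme, the sketch below converts label times to ms and then to sample indices. The placeholder signal, the end time, and the rounding and “+1” indexing convention are assumptions made for this example; other consistent conventions are possible.

```matlab
% Sketch: extract a 20 ms segment centred on a phoneme.
fs = 8000;                         % sampling frequency in Hz
x = (1:8000)';                     % placeholder 1 s signal (assumption)
tPhStart100ns = 1409375;           % example start time in 100 ns units
tPhEnd100ns   = 2000000;           % made-up end time for illustration

tPhStart = tPhStart100ns / 1e4;    % convert to ms
tPhEnd   = tPhEnd100ns / 1e4;

tCentre   = tPhStart + (tPhEnd - tPhStart)/2;
tSegStart = tCentre - 10;          % 10 ms before the centre
tSegEnd   = tCentre + 10;          % 10 ms after the centre

% Fixing the segment length keeps every row the same size (160 at 8 kHz).
nSamples = round((tSegEnd - tSegStart) * fs / 1000);
iStart = round(tSegStart * fs / 1000) + 1;   % +1: Matlab indices start at 1
seg = x(iStart : iStart + nSamples - 1).';   % row vector for one matrix row
```

Each such row could then be appended to the relevant matrix, for example with `segMatrix(end+1, :) = seg;` (hypothetical variable name). A phoneme very close to the start or end of a file would need an extra bounds check, which this sketch omits.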
The signal segments of phoneme ‘s’ and phoneme ‘aa’ should be extracted from all the files, separately for the ‘wavOrig’ data and the ‘wavFilt’ data, and stored in arrays named segOrig_phS, segFilt_phS, segOrig_phAA and segFilt_phAA. Each of these arrays should be of size num × nSamples, where num is the number of occurrences of the particular phoneme (‘s’ or ‘aa’) and nSamples is the number of samples in a signal segment, i.e., each row corresponds to a phoneme occurrence and each column to a sample index. After processing all files, store the arrays in a mat-file called ‘segAllData.mat’. Note that the number of occurrences of these phonemes varies from file to file (and is sometimes zero).
Deliverables:
• Include in the report: Figures of the extracted signal segment (waveform) of the first occurrence of the phoneme ‘s’ and phoneme ‘aa’ in the files: ‘wavOrig/MDPK0/SA1.wav’ and ‘wavFilt/MDPK0/SA1.wav’.
• Attach to your submission: The mat-file ‘segAllData.mat’, containing the arrays: segOrig_phS, segFilt_phS, segOrig_phAA and segFilt_phAA.
3.3 Option (c): Calculate DFT and energy for low/high frequency regions
This option should perform a DFT on each extracted signal segment, calculate the average energy in the low-frequency region (0-2 kHz) and in the high-frequency region (2-4 kHz), and convert these values to decibels (dB).
For each phoneme (‘s’ and ‘aa’) and data condition (‘orig’ and ‘filt’), it should process all the extracted signal segments previously stored in the arrays (segOrig_phS, segFilt_phS, segOrig_phAA or segFilt_phAA) and store the two calculated energy values (in dB) in 2D arrays, named accordingly: enLFandHF_orig_phS, enLFandHF_filt_phS, enLFandHF_orig_phAA, and enLFandHF_filt_phAA. Each of these arrays should be of size num × 2, where num is the number of occurrences of the particular phoneme (‘s’ or ‘aa’): each row corresponds to a phoneme occurrence, the first column holds the low-frequency energy and the second column the high-frequency energy.
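The per-segment calculation can be sketched as follows. Several details are assumptions, since the text does not fix them: the average energy is taken as the mean of |X(k)|² over the bins in each region, the 2 kHz bin is assigned to the high-frequency region, ‘eps’ is added to avoid taking the logarithm of zero, and a synthetic 1 kHz tone stands in for a real segment. The DFT is written from its definition; Matlab's built-in ‘fft’ returns the same values.

```matlab
% Sketch: DFT of one segment and band energies in dB.
fs = 8000;                          % sampling frequency in Hz
N = 160;                            % 20 ms at 8 kHz
n = 0:N-1;
seg = cos(2*pi*1000*n/fs);          % synthetic 1 kHz tone (assumption)

% DFT from the definition X(k) = sum_n x(n) * exp(-j*2*pi*n*k/N).
W = exp(-1j*2*pi*(n')*n/N);         % N-by-N DFT matrix
X = seg * W;                        % row vector of N DFT coefficients

f = n * fs / N;                     % frequency of each bin in Hz
P = abs(X).^2;                      % energy per bin

lowIdx  = f < 2000;                 % 0 Hz up to (not including) 2 kHz
highIdx = f >= 2000 & f <= 4000;    % 2 kHz up to fs/2

enLF = 10*log10(mean(P(lowIdx))  + eps);   % eps guards against log10(0)
enHF = 10*log10(mean(P(highIdx)) + eps);
enRow = [enLF enHF];                % one row of a num-by-2 energy array
```

Only bins up to fs/2 are used, because for a real-valued signal the bins above fs/2 mirror those below it.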
Deliverables:
Include in the report:
• Figures of the magnitude spectrum for the first occurrence of the phoneme ‘s’ and ‘aa’ in the files: ‘wavOrig/MDPK0/SA1.wav’ and ‘wavFilt/MDPK0/SA1.wav’.
• Histograms of the low-frequency and high-frequency energy values for the phoneme ‘s’ and ‘aa’ in each condition (‘orig’ and ‘filt’), i.e., histograms of the data in the variables enLFandHF_orig_phS, enLFandHF_filt_phS, enLFandHF_orig_phAA, and enLFandHF_filt_phAA.
3.4 Option (d): Modelling of energy values using Gaussian PDFs
This option should perform modelling of the low-frequency and high-frequency energy values separately for each phoneme (‘s’ and ‘aa’) and for each condition (‘orig’ and ‘filt’) using Gaussian PDFs.
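A one-dimensional Gaussian PDF is fully described by its mean and variance, so modelling one column of energy values amounts to estimating those two parameters. The sketch below uses made-up dB values and hypothetical variable names. It uses the maximum-likelihood variance (normalised by N), which is one reasonable choice; the unbiased estimate (normalised by N−1) is another.

```matlab
% Sketch: fit a 1-D Gaussian to a set of energy values (in dB).
en = [-3.1 -2.4 -1.9 -2.8 -2.2 -3.5 -2.0 -2.6];   % made-up dB values

mu     = mean(en);          % estimated mean
sigma2 = var(en, 1);        % variance normalised by N (maximum likelihood)
sigma  = sqrt(sigma2);      % standard deviation

% Evaluate the fitted PDF on a grid, e.g. to overlay on a histogram.
xg = linspace(mu - 4*sigma, mu + 4*sigma, 401);
pdfVals = exp(-(xg - mu).^2 / (2*sigma2)) / (sqrt(2*pi) * sigma);

% Sanity check: the PDF should integrate to approximately 1.
area = trapz(xg, pdfVals);
```

Overlaying ‘pdfVals’ on a normalised histogram of the same data is one way to judge visually how well a Gaussian describes the values.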
Deliverables:
Include in the report (for each phoneme and each condition):
• A table with the values of the parameters of the Gaussian PDFs modelling each data.
• Discuss the appropriateness of modelling using Gaussian PDFs.
3.5 Option (e): Exit the program
This option should exit the program.
4 Report and Marking criteria
Attach to your submission: the report, the files requested in each of the tasks, and a .zip file containing all your m-files.
Marking will be according to the following criteria:
• Correctness of operation and completeness of part (a) [ 15 points ]
• Correctness of operation and completeness of part (b) [ 20 points ]
• Correctness of operation and completeness of part (c) [ 20 points ]
• Correctness of operation and completeness of part (d) [ 15 points ]
• Matlab programming – demonstration of suitable use of programming concepts and code efficiency [ 15 points ]
• Quality of report – formatting, English, figures with labels, etc. [ 15 points ]