
Adaptive Signal Processing and Machine Intelligence Coursework
Prof. Danilo P. Mandic
TAs: Giuseppe Calvi, Ilia Kisil, Harry Davies, Shengxi Li, Takashi Nakamura
February 5, 2019

Contents
Guidelines
1 Classical and Modern Spectrum Estimation
1.1 Properties of Power Spectral Density (PSD)
1.2 Periodogram-based Methods Applied to Real-World Data
1.3 Correlation Estimation
1.4 Spectrum of Autoregressive Processes
1.5 Real World Signals: Respiratory Sinus Arrhythmia from RR-Intervals
1.6 Robust Regression

Guidelines
The coursework comprises four assignments, whose individual scores yield 80% of the final mark. The remaining 20% accounts for presentation and organisation. Students are allowed to discuss the coursework but must code their own MATLAB scripts, produce their own figures and tables, and provide their own discussion of the coursework assignments.
General directions and notation:
◦ The simulations should be coded in MATLAB, a de facto standard in the implementation and validation of signal processing algorithms.
◦ The report should be clear, well-presented, and should include the answers to the assignments in a chronological order and with appropriate labelling. Students are encouraged to submit through Blackboard (in PDF format only), although a hardcopy submission at the undergraduate office will also be accepted.
◦ The report should document the results and the analysis in the assignments, in the form of figures (plots), tables, and equations, and not by listing MATLAB code as a proof of correct implementation.
◦ The students should use the following notation: boldface lowercase letters (e.g. x) for vectors, lowercase letters with a (time) argument (x(n)) for scalar realisations of random variables and for elements of a vector, and uppercase letters (X) for random variables. Column vectors will be assumed unless otherwise stated, that is, $x \in \mathbb{R}^{N \times 1}$.
◦ In this Coursework, the typewriter font, e.g. mean, is used for MATLAB functions.
Presentation:
◦ The length limit for the report is 42 pages. This corresponds to ten pages per assignment in addition to one page for the front cover and one page for the table of contents; however, there are no page restrictions per assignment, only for the full report (42 pages).
◦ The final mark also considers the presentation of the report. This includes legible and correct figures, tables, and captions, appropriate titles, a table of contents, and a front cover with student information.
◦ The figures and code snippets (only if necessary) included in the report must be carefully chosen, for clarity and to meet the page limit.
◦ Do not insert unnecessary MATLAB code or the statements of the assignment questions in the report.
◦ For figures, (i) decide which type of plot is the most appropriate for each signal (e.g. solid line, non-connected points, stems), (ii) export figures in a correct format: without grey borders and with legible captions and lines, and (iii) avoid the use of screenshots when providing plots and data; use figures and tables instead.
◦ Avoid terms like good estimate, is (very) close, somewhat similar, etc – use formal language and quantify your statements (e.g. in dB, seconds, samples, etc).
◦ Note that you should submit two files to Blackboard: the report in PDF format and all the MATLAB code files compressed in a ZIP/RAR format. Name the MATLAB script files according to the part they correspond to (e.g. SEASP_Part_X-Y-Z.m).
Honour code:
Students are strictly required to adhere to the College policies on student responsibilities. The College has zero tolerance for plagiarism. Any suspected plagiarism or cheating (or prohibited collaboration on the coursework, see above) will lead to a formal academic dishonesty investigation. Being found responsible for an academic dishonesty violation results in a discipline file for the student and penalties, ranging from severe reduction in marks to expulsion from College.

1 Classical and Modern Spectrum Estimation
Aims: Students will learn to:
• Perform practical spectrum estimation using parametric models.
• Understand the benefits and drawbacks of parametric and line spectra.
• Implement and analyse signals with time-varying spectra.
Background. For a discrete time deterministic sequence $\{x(n)\}$ with finite energy, $\sum_{n=-\infty}^{\infty} |x(n)|^2 < \infty$, the Discrete Time Fourier Transform (DTFT) is defined as

$$X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n} \qquad \text{(DTFT)}. \tag{1}$$

We often use the symbol $X(\omega)$ to replace the more cumbersome $X(e^{j\omega})$. The corresponding inverse DTFT is given by

$$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega) e^{j\omega n}\, d\omega \qquad \text{(inverse DTFT)}. \tag{2}$$

This can be verified by substituting (2) into (1). The energy spectral density is then defined as

$$S(\omega) = |X(\omega)|^2 \qquad \text{(Energy Spectral Density)}. \tag{3}$$

A straightforward calculation gives

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, d\omega = \frac{1}{2\pi} \int_{-\pi}^{\pi} \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} x(n) x^*(m) e^{-j\omega(n-m)}\, d\omega = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} x(n) x^*(m) \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-j\omega(n-m)}\, d\omega \right] = \sum_{n=-\infty}^{\infty} |x(n)|^2. \tag{4}$$

In the process, we have used the equality $\frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega(n-m)}\, d\omega = \delta_{n,m}$ (the Kronecker delta). Equation (4) can now be restated as

$$\sum_{n=-\infty}^{\infty} |x(n)|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, d\omega \qquad \text{(Parseval's theorem)}. \tag{5}$$

For random sequences we cannot guarantee finite energy for every realisation (and hence no DTFT). However, a random signal usually has a finite average power, and can therefore be characterised by an average power spectral density (PSD). We assume zero mean data, $E\{x(n)\} = 0$, so that the autocovariance function (ACF) of a random signal $x(n)$ is defined as

$$r(k) = E\{x(n) x^*(n-k)\} \qquad \text{(Autocovariance function, ACF)}. \tag{6}$$

The Power Spectral Density (PSD) is defined as the DTFT of the ACF in the following way

$$P(\omega) = \sum_{k=-\infty}^{\infty} r(k) e^{-j\omega k} \qquad \text{(Definition 1 of Power Spectral Density)}. \tag{7}$$

The inverse DTFT of $P(\omega)$ is given by $r(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) e^{jk\omega}\, d\omega$, and it is readily verified that

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) e^{jk\omega}\, d\omega = \sum_{l=-\infty}^{\infty} r(l) \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j(k-l)\omega}\, d\omega \right] = r(k).$$

Observe that

$$r(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, d\omega. \tag{8}$$

Since from (6) $r(0) = E\{|x(n)|^2\}$ measures the (average) signal power, the name PSD for $P(\omega)$ is fully justified, as from (8) it represents the distribution of the (average) signal power over frequencies.
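Parseval's theorem (5) can be checked numerically for a finite-length sequence. The sketch below is only an illustration: the length N, the number of frequency samples K and the random test sequence are assumptions, not part of the coursework. For a sequence of length N, averaging the energy spectral density over K ≥ N equally spaced DFT frequencies reproduces the integral in (5).

```matlab
% Numerical check of Parseval's theorem (5) for a finite-length sequence.
% N, K and the test signal are illustrative choices (assumptions).
N = 64;                               % sequence length
K = 256;                              % number of DTFT samples, K >= N
x = randn(N,1) + 1j*randn(N,1);       % arbitrary finite-energy sequence
X = fft(x, K);                        % X(omega_k) at omega_k = 2*pi*k/K
S = abs(X).^2;                        % energy spectral density samples, Eq. (3)
energy_time = sum(abs(x).^2);         % left-hand side of (5)
energy_freq = mean(S);                % (1/2pi)*integral of S, computed as (1/K)*sum over bins
fprintf('time domain: %.6f, frequency domain: %.6f\n', energy_time, energy_freq);
```

The two printed values should agree to within rounding error, whatever the realisation of x.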
The second definition of PSD is given by

$$P(\omega) = \lim_{N\to\infty} E\left\{ \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-jn\omega} \right|^2 \right\} \qquad \text{(Definition 2 of Power Spectral Density)}. \tag{9}$$

1.1 Properties of Power Spectral Density (PSD)
Approximation in the definition of PSD. Show analytically and through simulations that the definition of PSD in (7) is equivalent to that in (9) under a mild assumption that the covariance sequence $r(k)$ decays rapidly, that is, [5]

$$\lim_{N\to\infty} \frac{1}{N} \sum_{k=-(N-1)}^{N-1} |k|\,|r(k)| = 0. \tag{10}$$

Provide a simulation for the case when this equivalence does not hold. Explain the reasons.

1.2 Periodogram-based Methods Applied to Real-World Data
Now consider two real-world datasets: a) the sunspot time series¹ and b) an electroencephalogram (EEG) experiment.
a) Apply one periodogram-based spectral estimation technique (possibly after some preprocessing) to the sunspot time series. Explain what aspect of the spectral estimate changes when the mean and trend from the data are removed (use the MATLAB commands mean and detrend). Explain how the perception of the periodicities in the data changes when the data is transformed by first applying the logarithm to each data sample and then subtracting the sample mean from this logarithmic data. [10]
The basis for brain computer interface (BCI).
b) The electroencephalogram (EEG) signal was recorded from an electrode located at the posterior/occipital (POz) region of the head. The subject observed a flashing visual stimulus (flashing at a fixed rate of X Hz, where X is some integer value in the range [11, . . . , 20]). This induced a response in the EEG, known as the steady state visual evoked potential (SSVEP), at the same frequency. Spectral analysis is required to determine the value of 'X'. [10]
The recording is contained in the EEG_Data_Assignment1.mat file² which contains the following elements:
◦ POz: vector containing the EEG samples (expressed in Volts) obtained from the POz location on the scalp,
◦ fs: scalar denoting the sampling frequency (1200 Hz in this case).
Read the readme_Assignment1.txt file for more information.
Apply the standard periodogram approach to the entire recording, as well as the averaged periodogram with different window lengths (10 s, 5 s, 1 s) to the EEG data. Can you identify the peaks in the spectrum corresponding to SSVEP? There should be a peak at the same frequency as the frequency of the flashing stimulus (integer X in the range [11, . . . , 20]), known as the fundamental frequency response peak, and at some integer multiples of this value, known as the harmonics of the response.
It is important to note that the subject was tired during the recording, which induced a strong response within 8-10 Hz (the so-called alpha-rhythm); this is not the SSVEP. Also note that a power-line interference was induced in the recording apparatus at 50 Hz, and this too is not the SSVEP.
To enable a fair comparison across all spectral analysis approaches, you should keep the number of frequency bins the same. Hint: It is recommended to have 5 DFT samples per Hz.
How does the standard periodogram approach compare with the averaged periodogram of window length 10 s? Hint: Observe how straightforward it is to distinguish the estimated SSVEP peaks from other spurious EEG activity in the surrounding spectrum.
In the case of the averaged periodogram, what is the effect of making the window size very small, e.g. 1 s?

1.3 Correlation Estimation
Unbiased correlation estimation and preservation of non-negative spectra. Recall that the correlation-based definition of the PSD leads to the so-called correlogram spectral estimator given by

$$\hat{P}(\omega) = \sum_{k=-(N-1)}^{N-1} \hat{r}(k) e^{-j\omega k} \tag{11}$$

where the estimated autocorrelation function $\hat{r}(k)$ can be computed using the biased or unbiased estimators given by

$$\text{Biased:} \quad \hat{r}(k) = \frac{1}{N} \sum_{n=k+1}^{N} x(n) x^*(n-k) \tag{12}$$

$$\text{Unbiased:} \quad \hat{r}(k) = \frac{1}{N-k} \sum_{n=k+1}^{N} x(n) x^*(n-k) \tag{13}$$

for $0 \le k \le N-1$. Although it may seem that the unbiased estimate is more appropriate, as its expected value matches the true ACF, observe that this estimate (despite being unbiased) can be highly erratic for larger lags k (close to N), where fewer samples are available to estimate the ACF. As a consequence, the estimated ACF may not be positive definite, resulting in negative PSD values.
a) Write a MATLAB script which calculates both biased and unbiased ACF estimates of a signal and then use these ACF estimates to compute the corresponding correlogram in Eq. (11). Validate your code for different signals e.g. WGN, noisy sinusoidal signals and filtered WGN. Explain how the spectral estimates based on (12)-(13) differ from one another. In particular, how does the correlogram corresponding to the unbiased ACF estimates behave for large lags (i.e. k close to N)? Does the unbiased ACF estimate result in negative values for the estimated PSD? [10]
Plotting the PSD in dB. Depending on the estimation approach, the spectral estimate $\hat{P}(\omega)$ can be asymptotically unbiased with variance $\mu P^2(\omega)$, where $\mu > 0$ is a constant. When several realisations of a random signal are available, it is possible to present the estimated PSD as a confidence interval defined by $\hat{P}(\omega) \pm \mu \hat{\sigma}_P(\omega)$, where $\hat{P}(\omega)$ and $\hat{\sigma}_P(\omega)$ are respectively the mean and standard deviation of the estimated PSDs of the available observations.
A drawback of this approach is that, as stated earlier, the standard deviation is proportional to the value of the PSD and therefore the confidence interval widens in zones where the PSD increases, and it is these parts that we are particularly interested in. Fig. 1 shows an overlay plot of 100 realisations of the PSD of two sinusoids immersed in i.i.d. WGN showing the mean (top), and the standard deviation of the set (bottom).
For ease of presentation, by plotting the PSD estimates in decibels we observe a more condensed realisation due to the contraction property of the logarithm.
b) Use your code from the previous section (only the biased ACF estimator) to generate the PSD estimates of several realisations of a random process and plot them as in Fig. 1. Generate different signals composed of sinusoids corrupted by noise and elaborate on how dispersed the different realisations of the spectral estimate are. Hint: use the fft and fftshift commands in MATLAB. [5]
c) Plot your estimates in dB, together with their associated standard deviation (again as in Fig. 1 for comparison). How spread out are the estimates now? Comment on the benefits of this representation. [5]
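The correlogram of Eq. (11) and the ensemble statistics discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not a model answer: the signal (two real sinusoids in unit-variance WGN), the sizes N and M, and all variable names are hypothetical. The ACF is computed with explicit loops from (12) and (13), extended to negative lags by conjugate symmetry, and arranged so that lag 0 comes first before calling fft.

```matlab
% Sketch: biased vs unbiased correlogram and ensemble statistics in dB.
% Signal model, N, M and variable names are illustrative assumptions.
N  = 128;                 % samples per realisation
M  = 100;                 % number of realisations
n  = (0:N-1).';
L  = 2*N - 1;             % number of lags, -(N-1)..(N-1)
Pb = zeros(L, M);         % correlograms from the biased ACF, one column each
Pu = zeros(L, M);         % correlograms from the unbiased ACF
for m = 1:M
    x = sin(2*pi*0.1*n) + 0.5*sin(2*pi*0.27*n) + randn(N,1);
    rb = zeros(N,1); ru = zeros(N,1);
    for k = 0:N-1
        s = sum(x(k+1:N) .* conj(x(1:N-k)));   % sum over n of x(n) x*(n-k)
        rb(k+1) = s / N;                       % biased estimate, Eq. (12)
        ru(k+1) = s / (N - k);                 % unbiased estimate, Eq. (13)
    end
    % Negative lags via r(-k) = conj(r(k)); order: lags 0..N-1, then -(N-1)..-1
    rb_full = [rb; conj(flipud(rb(2:end)))];
    ru_full = [ru; conj(flipud(ru(2:end)))];
    Pb(:,m) = real(fft(rb_full));              % Eq. (11) on L frequency bins
    Pu(:,m) = real(fft(ru_full));
end
f = (0:L-1).'/L;                               % normalised frequency (cycles/sample)
P_mean   = mean(Pb, 2);   P_std   = std(Pb, 0, 2);    % linear-scale statistics
PdB      = 10*log10(max(Pb, eps));                    % guard against log of ~0
PdB_mean = mean(PdB, 2);  PdB_std = std(PdB, 0, 2);   % statistics in dB
fprintf('min PSD value, biased: %.3g, unbiased: %.3g\n', min(Pb(:)), min(Pu(:)));
```

With the biased ACF the correlogram coincides with the periodogram abs(fft(x,L)).^2/N and is therefore non-negative up to rounding, whereas the unbiased version typically dips below zero. Overlaying plot(f, Pb) and plot(f, PdB) gives a comparison in the style of Fig. 1.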
[Figure 1: two panels, each plotted against Frequency [π radians]. Top panel: "PSD estimates (different realisations and mean)". Bottom panel: "Standard deviation of the PSD estimate".]
Figure 1: PSD estimates of two sinusoids immersed in noise. Top: An overlay plot of 100 realisations and their mean. Bottom: Standard deviation of the 100 estimates.
Frequency estimation by MUSIC. In order to accurately estimate the spectrum of closely-spaced sine waves using the periodogram, a large number of samples N is required since the frequency resolution of the periodogram is proportional to 1/N. On the other hand, subspace methods assume a harmonic model consisting of a sum of sine waves, possibly complex, in additive noise. In this setting, the noise is also complex-valued.
For illustration, consider a complex-valued signal of 31 samples in length (n = 0, …, 30), generated using the following code:
n = 0:30;                                                % sample indices
noise = 0.2/sqrt(2)*(randn(size(n))+1j*randn(size(n)));  % complex WGN with standard deviation 0.2
x = exp(1j*2*pi*0.3*n)+exp(1j*2*pi*0.32*n)+noise;        % two complex exponentials plus noise
¹ Included in MATLAB; use load sunspot.dat
² Download the EEG recording from http://www.commsp.ee.ic.ac.uk/~mandic/EEG_Data.zip

The signal consists of two complex exponentials (sine waves) with frequencies of 0.3 Hz and 0.32 Hz and additive complex white Gaussian noise. The noise has zero mean and a standard deviation of 0.2 (variance 0.04).
The spectral estimate using the periodogram (rectangular window, 128 frequency bins and unit sampling rate) is shown in Fig. 2. Observe that the periodogram was not able to identify the two lines in the spectrum; this is due to the resolution of the periodogram being proportional to 1/N, which is greater than the separation between the two frequencies.
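The resolution argument can be illustrated with a short sketch that compares a short and a long record of the same two-exponential signal. The record lengths (31 and 256), the number of DFT bins K and the "dip" measure are assumptions made for this illustration. The dip is the ratio, in dB, between the largest periodogram value in the band 0.25 to 0.40 and the value midway between the two lines at 0.31; a small dip means the two lines have merged into one lobe.

```matlab
% Sketch: periodogram resolution for two closely spaced complex exponentials.
% Record lengths, K and the dip measure are illustrative assumptions.
K  = 2000;                               % DFT bins, f = k/K cycles/sample
f  = (0:K-1)/K;
band = (f >= 0.25) & (f <= 0.40);        % region around the two lines
[~, mid] = min(abs(f - 0.31));           % bin midway between 0.30 and 0.32
Ns = [31 256];                           % short record (n = 0:30) and long record
dip_dB = zeros(size(Ns));
for i = 1:numel(Ns)
    n = 0:Ns(i)-1;
    noise = 0.2/sqrt(2)*(randn(size(n)) + 1j*randn(size(n)));
    x = exp(1j*2*pi*0.3*n) + exp(1j*2*pi*0.32*n) + noise;
    P = abs(fft(x, K)).^2 / Ns(i);       % periodogram, rectangular window
    dip_dB(i) = 10*log10(max(P(band)) / P(mid));
    fprintf('N = %3d: dip between the two lines = %.1f dB\n', Ns(i), dip_dB(i));
end
```

For the short record the dip stays close to 0 dB (a single merged lobe), while for the long record the midpoint falls well below the two peaks, which then appear as separate lines.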
[Figure 2: plot titled "Periodogram Power Spectral Density Estimate"; x-axis: Frequency (mHz); y-axis: Power/frequency (dB/Hz).]
Figure 2: Periodogram of two complex exponentials with closely-spaced frequencies.
d) Familiarise yourself with the generation of complex exponential signals, and generate signals of different frequencies and length. Verify that by considering more data samples the periodogram starts showing the correct line spectra. [5]
e) Use the following code to find the desired line spectra using the MUSIC method. [10]
[X,R] = corrmtx(x,14,'mod');
[S,F] = pmusic(R,2,[],1,'corr');
plot(F,S,'linewidth',2); set(gca,'xlim',[0.25 0.40]);
grid on; xlabel('Hz'); ylabel('Pseudospectrum');
Explain the operation of the first three lines in the code using the MATLAB documentation and the lecture notes. What is the meaning of the input arguments for the functions corrmtx and pmusic? Does the spectrum estimated using the MUSIC algorithm provide more detailed information? State briefly the advantages and disadvantages of the periodogram and the MUSIC algorithms and comment on the bias and variance. How accurate would a general spectrum estimate be when using MUSIC?

1.4 Spectrum of Autoregressive Processes
In many spectrum estimation applications, only short data lengths are available; thus, classical spectrum estimation techniques based on the Fourier transform will not be able to resolve frequency elements spaced close to one another. In order to solve this problem, we can use modern spectrum estimation methods based on the pole-zero modelling of the data.
Consider a general ARMA(p, q) process given by

$$y(n) = a_1 y(n-1) + \cdots + a_p y(n-p) + w(n) + b_1 w(n-1) + \cdots + b_q w(n-q).$$

The power spectrum of y has the form

$$P_y(e^{j\omega}) = \sigma_w^2 \, \frac{\left| 1 + \sum_{k=1}^{q} b_k e^{-jk\omega} \right|^2}{\left| 1 - \sum_{k=1}^{p} a_k e^{-jk\omega} \right|^2}$$

where $\sigma_w^2$ is the variance of the driving white noise w(n). Thus, the power spectrum can be estimated through the parameters $(a_1, \ldots, a_p, b_1, \ldots, b_q)$. The assumption of an underlying model for the data is the key difference between classical and modern spectrum estimation methods.
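As a small illustration of how a model-based spectrum follows directly from the parameters, the sketch below evaluates the power spectrum of an ARMA(2,1) model on K frequency bins. The coefficient values, σ_w² and K are hypothetical. The sketch also assumes that the numerator polynomial includes the unit coefficient of w(n) from the difference equation and that the spectrum is scaled by the driving-noise variance. The polynomials are evaluated on the unit circle with fft, so no toolbox function is needed.

```matlab
% Sketch: ARMA power spectrum evaluated from its parameters.
% Coefficients, sigma2 and K are hypothetical values chosen for illustration.
a = [1.2 -0.7];            % AR part: y(n) = 1.2 y(n-1) - 0.7 y(n-2) + ...
b = 0.5;                   % MA part: ... + w(n) + 0.5 w(n-1)
sigma2 = 1;                % variance of the driving noise w(n)
K = 512;                   % number of frequency bins on [0, 2*pi)
A = fft([1, -a], K);       % 1 - sum_k a_k exp(-j*k*omega)
B = fft([1,  b], K);       % 1 + sum_k b_k exp(-j*k*omega)
P = sigma2 * abs(B).^2 ./ abs(A).^2;
omega = 2*pi*(0:K-1)/K;    % frequency grid in rad/sample
fprintf('P at omega = 0: %.4f\n', P(1));
```

At ω = 0 both polynomials reduce to the sums of their coefficients, which gives a quick hand check of the first bin: (1 + 0.5)² / (1 − 1.2 + 0.7)² = 9.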
For an AR process in particular, the power spectrum is the output of an all-pole filter given by

$$P_y(e^{j\omega}) = \frac{\sigma_w^2}{\left| 1 - \sum_{k=1}^{p} a_k e^{-jk\omega} \right|^2}.$$

The parameters $\sigma_w^2$ and $a = [a_1, \ldots, a_p]^T$ can be estimated by a set of (p + 1) linear equations

$$\begin{bmatrix} r_x(0) & r_x(1) & \cdots & r_x(p) \\ r_x(1) & r_x(0) & \cdots & r_x(p-1) \\ \vdots & \vdots & \ddots & \vdots \\ r_x(p) & r_x(p-1) & \cdots & r_x(0) \end{bmatrix} \begin{bmatrix} 1 \\ -a_1 \\ \vdots \\ -a_p \end{bmatrix} = \sigma_w^2 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

where $r_x(k)$ could be calculated using the biased autocorrelation estimate

$$r_x(k) = \frac{1}{N} \sum_{n=0}^{N-1-k} x(n+k) x(n)$$

or the unbiased autocorrelation estimate

$$r_x(k) = \frac{1}{N-k} \sum_{n=0}^{N-1-k} x(n+k) x(n).$$

a) Based on your answers in Section 1.3, elaborate on the shortcomings of using the unbiased ACF estimate when finding the AR parameters [see Eq. (13)]. [5]
b) Generate 1000 samples of data in MATLAB, according to the following equation [10]

$$x(n) = 2.76x(n-1) - 3.81x(n-2) + 2.65x(n-3) - 0.92x(n-4) + w(n)$$

where $w \sim \mathcal{N}(0, 1)$, and discard the first 500 samples (x=x(501:end)) to remove the transient output of the filter. Estimate the power spectral density of the signal using model orders p = 2, …, 14 and comment on the effects of increasing the order of the (assumed) underlying model by comparing the estimate to the true Power Spectral Density. Only plot the results of the model orders which produced the best results.
c) Repeat the experiment in b) for a data length of 10,000 samples. What happens to the PSD when the chosen model order is lower (under-modelling) or higher (over-modelling) than the correct AR(4) model order? [5]

1.5 Real World Signals: Respiratory Sinus Arrhythmia from RR-Intervals
Note: Use the RRI data from your experiment.
Respiratory sinus arrhythmia (RSA) refers to the modulation of cardiac function by respiratory effort. This can be readily observed by the speeding up of heart rate during inspiration ("breathing in") and the slowing down of heart rate during expiration ("breathing out"). The strength of RSA in an individual can be used to assess cardiovascular health. Breathing at regular rates will highlight the presence of RSA in the cardiac (ECG) data.
a) Apply the standard periodogram as well as the averaged periodogram with different window lengths (e.g. 50 s, 150 s) to obtain the power spectral density of the RRI data. Plot the PSDs of the RRI data obtained from the three trials separately. [10]
b) Explain the differences between the PSD estimates of the RRI data from the three trials. Can you identify the peaks in the spectrum corresponding to frequencies of respiration for the three experiments? [5]
c) Plot the AR spectrum estimate for the RRI signals for the three trials³. To find the optimal AR model order, experiment with your model order until you observe a peak in the spectrum (approximately) corresponding to the theoretical respiration rate. List the differences you observe between your estimated AR spectrum and the periodogram estimate in Part a). [10]
³ Use the MATLAB function aryule to estimate the AR coefficients for your RRI signal.
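The AR spectrum estimate built from the biased autocorrelation estimate and the normal equations of Section 1.4 can be sketched with base MATLAB functions only (filter, toeplitz, fft) instead of aryule. This is a hedged illustration rather than the required solution: the test signal is the AR(4) process of Section 1.4 b), and the record length, the FFT size K and all variable names are assumptions. The coefficients a_hat follow the convention x(n) = a_1 x(n−1) + … + a_p x(n−p) + w(n); aryule returns the coefficients of the polynomial A(z), so its entries after the leading 1 carry the opposite sign.

```matlab
% Sketch: Yule-Walker AR spectrum estimate using the biased ACF.
% Record length, K and variable names are illustrative assumptions.
a_true = [2.76 -3.81 2.65 -0.92];        % AR(4) coefficients from Section 1.4 b)
w = randn(10500, 1);                     % driving WGN, unit variance
x = filter(1, [1, -a_true], w);          % x(n) = sum_k a_k x(n-k) + w(n)
x = x(501:end);                          % discard the first 500 (transient) samples
Nx = numel(x);
p = 4;                                   % assumed model order
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(x(k+1:Nx) .* x(1:Nx-k)) / Nx;   % biased ACF estimate at lag k
end
R = toeplitz(r(1:p));                    % p-by-p autocorrelation matrix
a_hat = R \ r(2:p+1);                    % solves r(m) = sum_k a_k r(m-k), m = 1..p
sigma2_hat = r(1) - a_hat.' * r(2:p+1);  % driving-noise variance estimate
K = 1024;                                % frequency bins on [0, 2*pi)
P_hat  = sigma2_hat ./ abs(fft([1; -a_hat], K)).^2;    % estimated AR spectrum
P_true = 1 ./ abs(fft([1, -a_true].', K)).^2;          % true spectrum, sigma_w^2 = 1
fprintf('estimated a: %s, sigma2: %.3f\n', mat2str(a_hat.', 3), sigma2_hat);
```

Plotting 10*log10(P_hat) against 10*log10(P_true) for several values of p shows the effect of under- and over-modelling. For an RRI recording, x would be replaced by the detrended RRI signal.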
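The standard and averaged periodograms requested in Section 1.5 a) can be sketched on a synthetic stand-in for an RRI recording. Everything specific here is an assumption: a 4 Hz sampling rate, a 300 s record, a 0.25 Hz sinusoid mimicking breathing at 15 breaths per minute, and non-overlapping rectangular segments. The number of frequency bins K is kept the same for every estimate so that the spectra are directly comparable.

```matlab
% Sketch: standard vs averaged periodogram on a synthetic RRI-like signal.
% fs, duration, the 0.25 Hz component and K are illustrative assumptions.
fs = 4;                                   % sampling rate in Hz
t  = (0:1/fs:300-1/fs).';                 % 300 s time axis
N  = numel(t);
x  = 0.05*sin(2*pi*0.25*t) + 0.05*randn(N,1);
x  = x - mean(x);                         % remove the mean before estimation
K  = 2048;                                % common number of DFT bins
f  = (0:K-1).'*fs/K;                      % frequency axis in Hz
P_std = abs(fft(x, K)).^2 / (N*fs);       % standard periodogram
win_s = [50 150];                         % window lengths in seconds
P_avg = zeros(K, numel(win_s));
f_peak = zeros(size(win_s));
for i = 1:numel(win_s)
    Lw   = win_s(i)*fs;                   % samples per segment
    nseg = floor(N/Lw);                   % number of non-overlapping segments
    segs = reshape(x(1:nseg*Lw), Lw, nseg);                  % one segment per column
    P_avg(:,i) = mean(abs(fft(segs, K)).^2, 2) / (Lw*fs);    % average of segment periodograms
    [~, idx] = max(P_avg(1:K/2, i));      % strongest peak below fs/2
    f_peak(i) = f(idx);
    fprintf('window %3d s: %d segments, peak at %.3f Hz\n', win_s(i), nseg, f_peak(i));
end
```

Shorter windows average more segments, which lowers the variance of the estimate but widens the peak; the design choice is the usual trade-off between variance and frequency resolution.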

1.6 Robust Regression
Load the file⁴ PCAPCR.mat, which includes the data matrices $X \in \mathbb{R}^{N \times d_x}$ and $Y \in \mathbb{R}^{N \times d_y}$, described below.

| Training Data | Testing Data | Note |
|---|---|---|
| X | | Input variables, some of which are collinear. Each column represents N measurements of an input variable. |
| Xnoise | Xtest | Noise corrupted input matrix Xnoise = X + NX, where the elements of NX were drawn from a zero-mean Gaussian distribution. |
| Y | Ytest | Output variables obtained from Y = XB + NY, where the coefficient matrix B is unknown. Each column in Y represents N measurements of an output variable. The elements of NY were drawn from a zero-mean Gaussian distribution. |

[Figure 3 (diagram): Xnoise written as a sum of m rank-one terms $t_i p_i^T$; the terms $i = 1, \ldots, r$ form the signal subspace and the terms $i = r+1, \ldots, m$ form the noise subspace.]
Figure 3: Principle of PCA: Illustration of the signal and noise subspaces.
Using the MATLAB command svd, obtain the singular value decomposition for the matrices X and Xnoise.
a) Plot the singular values of X and Xnoise (hint: use the stem command), and identify the rank of the input data X. Plot the squared error between each singular value of X and Xnoise. Explain the effect of noise on the singular values, and state at what point it would become hard to identify the rank of the matrix Xnoise. [5]
b) Using only the r most significant principal components (as determined by the identified rank), create a low-rank approximation of Xnoise, denoted by $\tilde{X}_{\text{noise}}$. Compare the difference (error) between the variables (columns) of the noiseless input matrix, X, and those in the noise corrupted matrix Xnoise and denoised matrix $\tilde{X}_{\text{noise}}$. [5]
The output data are obtained as Y = XB + NY. The ordinary least squares (OLS) estimate for the unknown regression matrix, B, is then given by

$$\hat{B}_{\text{OLS}} = (X^T X)^{-1} X^T Y. \tag{14}$$

Since the matrix $X^T X$, which is calculated from the original data X, is sub-rank, the OLS solution in (14) becomes intractable. On the other hand, for the noisy data, Xnoise, the term $X_{\text{noise}}^T X_{\text{noise}}$ is full-rank, and therefore admits the OLS solution; however, this may introduce spurious correlations in the calculation of regression coefficients.
To circumvent this issue in the estimation of B, the principal component regression (PCR) method first applies principal component analysis (PCA) on the input matrix Xnoise. Specifically, the SVD of Xnoise is given by $X_{\text{noise}} = U \Sigma V^T$. By retaining the r largest principal components (r singular values and the associated singular vectors), the PCR solution is given by

$$\hat{B}_{\text{PCR}} = V_{1:r} (\Sigma_{1:r})^{-1} U_{1:r}^T Y$$

where the subscript (1 : r) denotes the r largest singular values and the corresponding singular vectors. In this way, the PCR solution avoids both the problem of collinearity and noise in the input matrix. Figure 4 illustrates the difference between the OLS and PCR methods.
c) Calculate the OLS and PCR solutions for the parameter matrix B, which relates Xnoise and Y. Next, compare the estimation error between Y and $\hat{Y}_{\text{OLS}} = X_{\text{noise}} \hat{B}_{\text{OLS}}$ and $\hat{Y}_{\text{PCR}} = \tilde{X}_{\text{noise}} \hat{B}_{\text{PCR}}$. Explain what happens when you estimate the data from the test-set using the regression coefficients computed from the training set, and quantify the performance by comparing Ytest and $\hat{Y}_{\text{test-OLS}} = X_{\text{test}} \hat{B}_{\text{OLS}}$ with $\hat{Y}_{\text{test-PCR}} = \tilde{X}_{\text{test}} \hat{B}_{\text{PCR}}$. [5]
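The rank-r truncation and the two regression solutions can be sketched on synthetic data, since PCAPCR.mat is not reproduced here. All sizes, noise levels and variable names below are assumptions made for illustration; with the coursework data, X, Xn and Y would be replaced by the matrices loaded from the file and r by the rank identified in part a).

```matlab
% Sketch on synthetic data: low-rank denoising, OLS and PCR.
% Sizes, noise levels and variable names are illustrative assumptions.
N = 200; dx = 10; dy = 5; r = 3;            % r = rank of the clean input
X  = randn(N, r) * randn(r, dx);            % collinear (rank-r) clean input
Xn = X + 0.1*randn(N, dx);                  % noise-corrupted input
B  = randn(dx, dy);                         % "unknown" regression matrix
Y  = X*B + 0.1*randn(N, dy);                % noisy output

[U, S, V] = svd(Xn, 'econ');                % Xn = U*S*V'
Ur = U(:,1:r); Sr = S(1:r,1:r); Vr = V(:,1:r);
Xd = Ur*Sr*Vr.';                            % rank-r approximation of Xn

B_ols = (Xn.'*Xn) \ (Xn.'*Y);               % Eq. (14) applied to the noisy input
B_pcr = Vr * (Sr \ (Ur.'*Y));               % PCR solution from the r largest components

err_noisy    = norm(X - Xn, 'fro')^2;       % input error before denoising
err_denoised = norm(X - Xd, 'fro')^2;       % input error after rank-r truncation
mse_ols = mean(mean((Y - Xn*B_ols).^2));    % in-sample output error, OLS
mse_pcr = mean(mean((Y - Xd*B_pcr).^2));    % in-sample output error, PCR
fprintf('input error: noisy %.2f, denoised %.2f\n', err_noisy, err_denoised);
fprintf('in-sample MSE: OLS %.4f, PCR %.4f\n', mse_ols, mse_pcr);
```

On the training data OLS can never have a larger residual than PCR, because it minimises exactly that residual; the comparison becomes informative only on out-of-sample data, which is why the test set matters.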
In real world machine intelligence applications, a model is trained with a finite set of data which is referred to as the training set. After the training, the model is not only expected to be a good fit to the training data, but also it needs to model out-of-sample data. Any model which fits the training data well but has poor out-of-sample performance is said to be "over-fitted". Therefore, it is important to validate the regression model computed in this section on a test-set which is another realisation of the signal drawn from the statistical distribution of the training set. For this task, the file PCAPCR.mat contains both the training data and test data, which should be used to validate the effectiveness of the regression model derived from the OLS and PCR solutions.
[Figure 4 (diagram): $\hat{Y}_{\text{OLS}} = X_{\text{noise}} \hat{B}_{\text{OLS}} = X \hat{B}_{\text{OLS}} + N_X \hat{B}_{\text{OLS}}$, and $\hat{Y}_{\text{PCR}} = \tilde{X}_{\text{noise}} \hat{B}_{\text{PCR}}$.]
Figure 4: Comparing OLS and PCR solutions.
⁴ Download the PCR files from: http://www.commsp.ee.ic.ac.uk/~mandic/PCR.zip
d) The best way to assess the effectiveness of the PCR compared to the OLS solution is by testing the estimated regression coefficients, $\hat{B}$, over an ensemble of test data. The file PCR.zip contains the script regval, the output of which is a new realisation of the test data, Y, and its estimate, $\hat{Y}$; the input is the regression coefficients, and the function syntax is: [5]
[$\hat{Y}$, Y] = regval($\hat{B}$).
Using the same PCR and OLS regression coefficients as in (c), compute and compare the mean square error estimates for the PCR and OLS schemes, $\text{MSE} = E\{\|Y - \hat{Y}\|^2\}$, based on the realisations of Y and $\hat{Y}$ provided by the function regval. Comment on the effectiveness of these schemes.