Problem Set 1
MSIN0208: Big Data Analytics
Due date: January 23.
Question 1 (understanding principal components when $T = 2$)
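For $i = 1,\dots,n$ and $t = 1,2$ we observe outcomes $y_{it}$ that are generated from the model
$$y_{it} = \lambda_i f_t + u_{it}.$$
The factors $f = (f_1, f_2)' \in \mathbb{R}^2$ are nonrandom unknown parameters that we want to estimate. The loadings $\lambda_i$ and errors $u_i = (u_{i1}, u_{i2})' \in \mathbb{R}^2$ are random and unobserved, they are independent from each other, and independent and identically distributed across $i$, such that $\mathbb{E}\lambda_i = 0$, $\mathbb{E}\lambda_i^2 = \sigma_\lambda^2$, $\mathbb{E}u_i = (0,0)'$, and $\mathbb{E}u_i u_i' = \sigma_u^2 I_2$. Here, $\sigma_\lambda^2 > 0$ and $\sigma_u^2 \geq 0$ are unknown constants, and $I_2$ is the $2 \times 2$ identity matrix. Define $y_i = (y_{i1}, y_{i2})'$.
Let $\hat f = (\hat f_1, \hat f_2)'$ be the eigenvector corresponding to the largest eigenvalue of the $2 \times 2$ matrix¹
$$\hat\Sigma_y = \frac{1}{n} \sum_{i=1}^{n} y_i y_i',$$
subject to the normalizations $\frac{1}{2} \sum_{t=1}^{2} \hat f_t^2 = 1$ and $\hat f_1 > 0$. We impose the same normalizations $\frac{1}{2} \sum_{t=1}^{2} f_t^2 = 1$ and $f_1 > 0$ for the true parameters $f$ that generate the data.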
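The setup above can be explored numerically before attempting the proofs. The following is a minimal simulation sketch, not part of the problem set: the sample size, the values of the two standard deviations, and the use of normal draws are all illustrative assumptions (the problem only specifies means and variances), and the variable names are hypothetical.

```matlab
% Simulation sketch of the T = 2 factor model (all numbers are assumptions).
rng(0);                          % fix the seed for reproducibility
n      = 100000;                 % hypothetical sample size
f      = [1; 1];                 % satisfies (1/2)*sum(f.^2) = 1 and f(1) > 0
sigLam = 1.5;                    % hypothetical std. dev. of the loadings
sigU   = 0.5;                    % hypothetical std. dev. of the errors

lambda = sigLam * randn(n, 1);   % loadings: mean 0, variance sigLam^2
u      = sigU   * randn(n, 2);   % errors: mean 0, covariance sigU^2 * I_2
y      = lambda * f' + u;        % n-by-2 matrix whose rows are y_i'

SigmaHat = (y' * y) / n;         % (1/n) * sum_i y_i * y_i'
[V, D]   = eig(SigmaHat);        % eigen-decomposition of the 2-by-2 matrix
[~, k]   = max(diag(D));         % position of the largest eigenvalue
fHat     = V(:, k);
fHat     = fHat * sqrt(2) / norm(fHat);   % impose (1/2)*sum(fHat.^2) = 1
if fHat(1) < 0
    fHat = -fHat;                % impose fHat(1) > 0
end
disp(fHat')                      % compare with f' = [1 1]
```

With a large simulated sample the normalized eigenvector should lie close to the true $f$, which is the behaviour parts (a) and (b) ask you to establish analytically.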
(a) Show that, as $n \to \infty$, we have
$$\hat\Sigma_y \to_p \sigma_u^2 I_2 + \sigma_\lambda^2 f f'.$$
Hint: By the weak law of large numbers we know that
$$\frac{1}{n} \sum_{i=1}^{n} y_i y_i' \to_p \mathbb{E}\, y_i y_i'.$$
(b) Explain why the result in (a) implies that, as $n \to \infty$, we have
$$\hat f_t \to_p f_t.$$
For simplicity, you can consider the special case $f = (1, 1)'$.
Hint: Given the result in (a) you can, without proof, assume that the eigenvalues and eigenvectors of $\hat\Sigma_y$ will converge to the eigenvalues and eigenvectors of $\sigma_u^2 I_2 + \sigma_\lambda^2 f f'$. What is left to do then here is to calculate the eigenvalues and eigenvectors of that limiting matrix.
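One possible way to organize the calculations behind the two hints is sketched below. It uses only the moment assumptions stated in the setup; the shorthand $\Sigma_y$ for the limit matrix is introduced here for convenience and is not notation from the problem set.

```latex
\begin{align*}
y_i y_i' &= \lambda_i^2 \, f f' + \lambda_i \left( f u_i' + u_i f' \right) + u_i u_i', \\
\mathbb{E}\, y_i y_i' &= \sigma_\lambda^2 \, f f' + \sigma_u^2 I_2 =: \Sigma_y, \\
\Sigma_y f &= \left( \sigma_u^2 + \sigma_\lambda^2 \, f'f \right) f
            = \left( \sigma_u^2 + 2 \sigma_\lambda^2 \right) f .
\end{align*}
```

The cross terms drop out in the second line because $\lambda_i$ and $u_i$ are independent with mean zero, and the last equality uses $f'f = 2$, which follows from the normalization $\frac{1}{2} \sum_{t=1}^{2} f_t^2 = 1$. A complete answer still needs to treat the second eigenvalue and justify each step.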

(c) If $\sigma_u^2 = 0$, then what fraction of the data $y_{it}$ is explained by the largest principal component $\hat f$?
¹ As explained in the lecture, this is equivalent to defining $\hat f$ as the right singular vector corresponding to the largest singular value of the original $n \times T$ matrix $y = (y_{it})$.

Question 2 (applying principal components to stock return data)
Data on daily stock returns for $n = 13424$ stocks over $T = 245$ time periods (daily returns for roughly one year: 2019) are available in the data file intradayreturns.csv. You can load that data into MATLAB via the command
    y = dlmread('intradayreturns.csv', ',', 1, 1);
This will return a $13424 \times 245$ matrix, whose elements are denoted by $y_{it}$ in the following. Normalize the data such that for each $i = 1,\dots,n$ we have
$$\frac{1}{T} \sum_{t=1}^{T} y_{it} = 0, \qquad \frac{1}{T-1} \sum_{t=1}^{T} y_{it}^2 = 1.$$
Calculate the principal component estimates $\hat\lambda$ and $\hat f$ for the model $y = \lambda f' + u$.
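The normalization step can be sketched as follows, assuming it means a zero time-average and a unit sample variance (divisor $T - 1$) for each stock. The small synthetic matrix is a hypothetical stand-in for the loaded returns, and the code relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Sketch: standardize each row (stock) of a small synthetic return matrix.
rng(1);
n = 6; T = 50;                    % hypothetical sizes, far smaller than the real data
y = 0.02 * randn(n, T) + 0.001;   % synthetic stand-in for the loaded returns

y = y - mean(y, 2);               % each row now has time-average zero
y = y ./ std(y, 0, 2);            % flag 0: std. dev. computed with divisor T-1

rowMeans = mean(y, 2);            % numerically zero for every stock
rowVars  = sum(y.^2, 2) / (T - 1);   % numerically one for every stock
```

A stock whose returns are constant over the whole sample would have zero standard deviation and produce NaN or Inf entries here, so it may be worth checking the real data for such rows first.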

If the leading $R$ principal components are used, then $\hat\lambda_R$ is an $n \times R$ matrix, and $\hat f_R$ is a $T \times R$ matrix. For given $R \in \{1, 2, 3, \dots, \min(n, T)\}$ we define the unexplained residual $\hat u_R = y - \hat\lambda_R \hat f_R'$. Let $\hat u_{R,it}$ be the elements of this residual matrix.
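One way to obtain these objects is through the singular value decomposition, as in the sketch below. The scaling $\hat f_R' \hat f_R / T = I_R$ is an assumed convention (the lecture may use a different one); the residual does not depend on that choice. The data are synthetic and the sizes are hypothetical.

```matlab
% Sketch: principal component estimates and residual from an economy-size SVD.
rng(2);
n = 40; T = 12; R = 3;                               % hypothetical sizes
y = randn(n, 1) * randn(1, T) + 0.3 * randn(n, T);   % synthetic one-factor data plus noise

[U, S, V] = svd(y, 'econ');      % y = U*S*V', singular values in decreasing order

fHatR      = sqrt(T) * V(:, 1:R);                 % T-by-R, fHatR'*fHatR/T = I_R (assumed scaling)
lambdaHatR = U(:, 1:R) * S(1:R, 1:R) / sqrt(T);   % n-by-R
uHatR      = y - lambdaHatR * fHatR';             % n-by-T unexplained residual
```

For the full $13424 \times 245$ matrix the `'econ'` option matters, since it avoids forming a $13424 \times 13424$ matrix of left singular vectors.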
(a) Plot the magnitude of the principal components (i.e. the singular values of $y$), analogous to the plot on slide 25 of the PCA lecture slides. Show your plot. What is the ratio between the first and second largest singular value?
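A minimal sketch of such a plot is given below on synthetic data. The axis choices are assumptions, since the exact layout of the lecture slide is not reproduced here, and the variable names are hypothetical.

```matlab
% Sketch: singular values of a synthetic matrix and the ratio of the two largest.
rng(5);
n = 40; T = 12;                                      % hypothetical sizes
y = randn(n, 1) * randn(1, T) + 0.3 * randn(n, T);   % synthetic stand-in for the data

s = svd(y);                  % min(n,T) singular values, largest first
ratio12 = s(1) / s(2);       % ratio of first to second largest singular value

plot(1:numel(s), s, 'o-');
xlabel('index of principal component');
ylabel('singular value');
fprintf('s(1)/s(2) = %.3f\n', ratio12);
```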
(b) Plot the fraction explained for the whole dataset
$$\text{fraction explained}(R) = 1 - \frac{\sum_{i=1}^{n} \sum_{t=1}^{T} \hat u_{R,it}^2}{\sum_{i=1}^{n} \sum_{t=1}^{T} y_{it}^2}$$
as a function of $R = 1, 2, 3, \dots$, analogous to the plot on slide 26 of the PCA lecture slides. Show your plot. What fraction of the data is explained by only the first principal component ($R = 1$)?
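Because the residual from the leading $R$ components has squared Frobenius norm equal to the sum of the remaining squared singular values, the fraction explained can be computed for all $R$ at once from the singular values. The sketch below illustrates this on synthetic data and cross-checks one value of $R$ against the residual-based definition; sizes and names are hypothetical.

```matlab
% Sketch: fraction explained for every R from the singular values alone.
rng(3);
n = 40; T = 12;                                      % hypothetical sizes
y = randn(n, 1) * randn(1, T) + 0.3 * randn(n, T);   % synthetic stand-in for the data

s = svd(y);                                  % singular values, largest first
fracExplained = cumsum(s.^2) / sum(s.^2);    % entry R is the fraction explained by R components

% Cross-check against the definition for one value of R.
R = 2;
[U, S, V] = svd(y, 'econ');
uHatR  = y - U(:, 1:R) * S(1:R, 1:R) * V(:, 1:R)';   % residual after R components
direct = 1 - sum(uHatR(:).^2) / sum(y(:).^2);

plot(1:numel(s), fracExplained, 'o-');
xlabel('R');
ylabel('fraction explained');
```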
(c) For $R = 1, 3, 5$ plot the fraction explained for each individual stock
$$\text{fraction explained}(i, R) = 1 - \frac{\sum_{t=1}^{T} \hat u_{R,it}^2}{\sum_{t=1}^{T} y_{it}^2}.$$
For each $R = 1, 3, 5$ you should sort the stocks $i$ by $\text{fraction explained}(i, R)$ in increasing order, and then plot $\text{fraction explained}(i, R)$ as a function of $i$. Does the fraction explained differ very much across stocks?
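The per-stock calculation can be sketched as below, again on a small synthetic matrix with hypothetical sizes and names. Each column of `fracByStock` corresponds to one value of $R$, and the sorting is done separately for each $R$.

```matlab
% Sketch: per-stock fraction explained for R = 1, 3, 5, sorted in increasing order.
rng(4);
n = 40; T = 12;                                      % hypothetical sizes
y = randn(n, 1) * randn(1, T) + 0.3 * randn(n, T);   % synthetic stand-in for the data
[U, S, V] = svd(y, 'econ');

Rs = [1 3 5];
fracByStock = zeros(n, numel(Rs));
for j = 1:numel(Rs)
    R = Rs(j);
    uHatR = y - U(:, 1:R) * S(1:R, 1:R) * V(:, 1:R)';          % residual after R components
    fracByStock(:, j) = 1 - sum(uHatR.^2, 2) ./ sum(y.^2, 2);  % one value per stock
end

sortedFrac = sort(fracByStock, 1);   % sort each column (each R) in increasing order

plot(1:n, sortedFrac);
legend('R = 1', 'R = 3', 'R = 5', 'Location', 'southeast');
xlabel('stock rank i');
ylabel('fraction explained(i, R)');
```

Since the stocks are re-sorted for each $R$, a given position on the horizontal axis generally refers to different stocks on the three curves.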