The University of New South Wales
Department of Statistics
MATH5855 – Multivariate Analysis
Assignment 3
Due 12th October 2018, 5pm
1. (Use SAS as a software package.) The file bank.dat contains the measurements
of 100 genuine and 100 forged swiss 1000-frank bills. The columns correspond to the
following variables:
• X1: Length of the bill, X2 : Height of the bill, measured on the left,
• X3: Height of the bill, measured on the right, X4 : Distance of inner frame to lower
border,
• X5: Distance of inner frame to upper border, X6 : Length of image diagonal.
Perform principal component analysis by working with the covariance matrix. Answer
the following questions:
i) Estimate the first and the second principal component using the variables Xi. Give
a meaningful ”interpretation” of these components having in mind the magnitudes and
the signs of the component weights.
ii) Perform the same analysis using the standardized variables Zi (i.e., by using
the correlation matrix). How many principal components do you need to explain at least
90% of the variability in each case (i.e., when the analysis is performed on the covariance
matrix and when it is performed on the correlation matrix). Having in mind the nature of
the variables X1 −X6, why could you state that for this particular data set the analysis
using the covariance matrix is superior. Explain your answer.
iii) Create an indicator variable called forge with values 1 and 2 for the first and
second set of hundred measurements. Then plot the values of the second against the first
principal component’s value for each of the observations. Label the points by the value of
forge. Do the first two principal components deliver a good way of separating the forged
and the genuine banknotes?
iv) Perform a linear discriminant analysis using the given data set and evaluate the
accuracy of the classification by using the crosslisterr option. Report your findings.
2. For the covariance matrix Σ ∈ Mp,p it is known that all its elements σij > 0 for
every i, j = 1, 2, . . . , p. Using first principals and definitions, prove that:
a) Coefficients of the first principal component are all of the same sign,
b) Coefficients of each other principal component cannot be all of the same sign.
3. Data on n = 20 consecutive years has been collected reflecting annual average
prices of beef steers X1 and of hogs X2 and the annual per capita consumption of beef X3
and of pork X4. We are interested in the relation of livestock prices to meat production.
The file price-cons.dat contains the variables Y (year index) and X1, X2, X3, X4. We
1
could proceed by calculating U = (X1+X2)/2, V = X3+X4 and then regressing U on V. A
perhaps better procedure would be to construct a (weighted) price index U = a1X1+a2X2
and consumption index V = b3X3 + b4X4 and to look at the maximal correlation between
U and V. This is the canonical correlation analysis approach.
i) Find and list both canonical correlations and the related canonical variates. Express
the canonical variates using the raw coefficients and also by using the standardized
coefficients. Since the prices are in dollar units but the consumption is in pounds, does it
make sense to standardize here?
ii) Formulate the hypothesis of independence of the price index and of the consump-
tion index (intuition shows that it must be rejected). Using the output, explain precisely
how the Wilks statistic has been calculated using the roots form the output. Also, explain
precisely how the degrees of freedom for the F -approximation have been calculated.
iii) Is one only canonical variable pair enough (i.e., is the second canonical correlation
also significant)?
4. In Lecture 8, we formulated a result stating how to calculate the weights of a
variance-efficient portfolio of p stocks X1, X2, . . . , Xp.
a) Prove the following general linear algebra result: for a non-singular p × p matrix
A and for p-dimensional vectors U and V : (A+ UV ′)−1 = A−1 − 1
1+V ′A−1U
A−1UV ′A−1.
b) Using a) (or otherwise) show that for a portfolio of equally-correlated assets whose
returns have the same variances (that is, when
Σ = σ2
1 ρ ρ . . . ρ
ρ 1 ρ . . . ρ
ρ ρ 1 . . . ρ
. . . . . . . . . . . . . . .
ρ ρ . . . . . . 1
,−
1
p− 1
< ρ < 1)
the components in the variance-efficient portfolio have equal weights of 1/p.
c) Calculate the determinant of Σ. Using the value of the determinant (or otherwise)
explain why the restriction − 1
p−1 < ρ < 1 on the common correlation must hold.
2