The University of New South Wales Department of Statistics
School of Mathematics
MATH5855 – Multivariate Analysis I
Assignment 1
Due Monday, 27th August, 2018, 5pm (absolute deadline). The assignment is to be handed in as a hard copy (no e-mails!).
Maximal number of pages: 8.

1. Consider the joint density
$$f(x_1, x_2) = 2e^{-x_1/x_2}, \quad x_1 > 0,\ 0 < x_2 < 1.$$
a) Compute $f_{X_2}(x_2)$ and $f_{X_1|X_2}(x_1|x_2)$.
b) Give the best approximation $g^*(X_2)$ of $X_1$ in the mean-square sense (i.e. find explicitly the $g^*(X_2)$ that minimizes $E[X_1 - g(X_2)]^2$ over all possible choices of $g(X_2)$ such that $E[g(X_2)^2] < \infty$).
c) For a given realization $x_2$, calculate the mean square error of the best approximation (that is, calculate $E\{[X_1 - g^*(X_2)]^2 \mid X_2 = x_2\}$).
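Before doing the algebra by hand, the reconstructed density can be checked numerically. The sketch below is in Python and assumes NumPy and SciPy are installed (they are not part of the course software); the helper names `joint_density`, `marginal_x2` and `conditional_moment` are hypothetical. It confirms that $f$ integrates to one and evaluates the first two conditional moments of $X_1$ at a fixed $x_2$, giving numbers to compare with closed-form answers to a) to c).

```python
import numpy as np
from scipy import integrate


def joint_density(x1, x2):
    """f(x1, x2) = 2 exp(-x1 / x2) for x1 > 0, 0 < x2 < 1; zero elsewhere."""
    if x1 <= 0 or not (0 < x2 < 1):
        return 0.0
    return 2.0 * np.exp(-x1 / x2)


def marginal_x2(x2):
    """f_{X2}(x2), obtained by integrating x1 out over (0, inf) numerically."""
    value, _ = integrate.quad(joint_density, 0.0, np.inf, args=(x2,))
    return value


def conditional_moment(x2, k):
    """E[X1^k | X2 = x2] = int x1^k f(x1, x2) dx1 / f_{X2}(x2)."""
    numerator, _ = integrate.quad(
        lambda x1: x1 ** k * joint_density(x1, x2), 0.0, np.inf
    )
    return numerator / marginal_x2(x2)


# Total probability mass: integrate the marginal of X2 over (0, 1).
total_mass, _ = integrate.quad(marginal_x2, 0.0, 1.0)

# First and second conditional moments at one illustrative point, x2 = 0.5.
m1 = conditional_moment(0.5, 1)
m2 = conditional_moment(0.5, 2)
print(f"total mass = {total_mass:.6f}")
print(f"E[X1 | X2=0.5] = {m1:.6f}, E[X1^2 | X2=0.5] = {m2:.6f}, m2 - m1^2 = {m2 - m1**2:.6f}")
```

Repeating the last two calls for other values of $x_2$ traces out the conditional moments as functions of $x_2$.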
2. Write down the spectral decomposition of the matrix
$$\Sigma = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}.$$
Correspondingly, find $\Sigma^{1/2}$, $\Sigma^{1/4}$ and $\Sigma^{-1/2}$ (give final answers). Find the eigenvectors of $\Sigma^{-1}$.
In addition, if $X = (X_1, X_2)^\top$ has a multivariate normal distribution with $\Sigma$ as above being its covariance matrix, and if $n = 25$ observations gave the vector of sample means $\bar{X} = (\bar{X}_1, \bar{X}_2)^\top = (1.2, 2.5)^\top$, draw a confidence ellipsoid for the mean vector $\mu = (\mu_1, \mu_2)^\top$ at a level of 95% as accurately as possible.
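The spectral decomposition, the matrix powers and the ellipse boundary can all be checked numerically. A minimal Python sketch, assuming NumPy and SciPy (the helper names `sym_power` and `confidence_ellipse` are hypothetical). It also assumes $\Sigma$ is treated as known, as the wording suggests, so that $n(\bar{X} - \mu)^\top \Sigma^{-1} (\bar{X} - \mu) \sim \chi^2_2$ and the 95% region is bounded by the level set at $\chi^2_2(0.95)$.

```python
import numpy as np
from scipy import stats

SIGMA = np.array([[3.0, 2.0], [2.0, 3.0]])


def sym_power(a, p):
    """A^p for symmetric positive definite A: A = P diag(lam) P^T => A^p = P diag(lam^p) P^T."""
    lam, vecs = np.linalg.eigh(a)  # eigenvalues ascending, orthonormal eigenvectors in columns
    return vecs @ np.diag(lam ** p) @ vecs.T


def confidence_ellipse(xbar, sigma, n, level=0.95, num=200):
    """Boundary of {mu : n (xbar - mu)^T Sigma^{-1} (xbar - mu) <= chi2_p(level)}.

    Sigma is treated as known, so the pivot is chi-square with p degrees of freedom.
    Boundary points are mu = xbar + sqrt(c / n) Sigma^{1/2} u with u on the unit circle.
    """
    c = stats.chi2.ppf(level, df=len(xbar))
    theta = np.linspace(0.0, 2.0 * np.pi, num)
    u = np.vstack([np.cos(theta), np.sin(theta)])  # 2 x num points on the unit circle
    points = xbar[:, None] + np.sqrt(c / n) * sym_power(sigma, 0.5) @ u
    return points.T, c


lam, vecs = np.linalg.eigh(SIGMA)
print("eigenvalues:", lam)
print("eigenvectors (columns):\n", vecs)
for p in (0.5, 0.25, -0.5):
    print(f"Sigma^{p}:\n", sym_power(SIGMA, p))

boundary, c = confidence_ellipse(np.array([1.2, 2.5]), SIGMA, n=25)
```

The `boundary` array holds points on the ellipse and can be passed to any plotting routine; the axes of the ellipse lie along the eigenvectors of $\Sigma$, with half-lengths $\sqrt{c\,\lambda_i / n}$.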
3. The table below gives an extract of the petal lengths and widths of two types of iris, Iris setosa and Iris versicolor.
| Petal length, Iris setosa | Petal width, Iris setosa | Petal length, Iris versicolor | Petal width, Iris versicolor |
|---|---|---|---|
| x1 | x2 | x3 | x4 |
| 1.4 | 0.2 | 4.7 | 1.4 |
| … | … | … | … |
| 1.4 | 0.2 | 4.1 | 1.3 |
You should download the whole data set from the file iris.dat in Moodle.
i) Test the hypothesis of multivariate normality of the vector $X = (X_1, X_2, X_3, X_4)^\top$ (use the IML program discussed in SAS Lab 2 or some other method).
Hint: to create a matrix x from the vector observations x1-x4 you can proceed as follows:

    data iris;
      infile 'iris.dat';
      input x1 x2 x3 x4;
    run;

    proc iml;
      use iris var {x1 x2 x3 x4};
      read all var _num_ into x;
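As one possible "other method", Mardia's multivariate skewness and kurtosis tests can be coded directly. The Python sketch below assumes NumPy and SciPy; `mardia_test` is a hypothetical helper, and this is not necessarily what the SAS Lab 2 IML program does (a chi-square Q-Q plot of squared Mahalanobis distances is another common choice).

```python
import numpy as np
from scipy import stats


def mardia_test(x):
    """Mardia's multivariate skewness and kurtosis tests for an (n, p) data matrix.

    Uses the ML covariance estimate (divisor n), as in Mardia's definitions.
    Returns (b1p, skew_pvalue, b2p, kurt_pvalue).
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    centred = x - x.mean(axis=0)
    s_ml = centred.T @ centred / n
    # d[i, j] = (x_i - xbar)^T S^{-1} (x_j - xbar)
    d = centred @ np.linalg.solve(s_ml, centred.T)
    b1p = (d ** 3).sum() / n ** 2
    b2p = (np.diag(d) ** 2).mean()
    # Skewness: n * b1p / 6 is approximately chi-square with p(p+1)(p+2)/6 df under normality.
    skew_stat = n * b1p / 6.0
    skew_df = p * (p + 1) * (p + 2) / 6.0
    skew_pvalue = stats.chi2.sf(skew_stat, skew_df)
    # Kurtosis: (b2p - p(p+2)) / sqrt(8 p (p+2) / n) is approximately N(0, 1) under normality.
    kurt_z = (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    kurt_pvalue = 2.0 * stats.norm.sf(abs(kurt_z))
    return b1p, skew_pvalue, b2p, kurt_pvalue
```

To apply it, load the data, for example with `np.loadtxt("iris.dat")` if the file holds four whitespace-separated numeric columns (an assumption about the file layout), and pass the resulting $n \times 4$ array to `mardia_test`; small p-values indicate departures from multivariate normality.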
ii) Find $\hat{\mu}$ and the sample covariance matrix. Use the IML program again or use the SAS procedure CORR. (Hint for users of CORR: proc corr cov; var x1-x4;).
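Outside SAS, the same estimates take a few lines of NumPy. A minimal sketch (NumPy assumed; `mean_and_covariance` is a hypothetical name) that writes out the formula with divisor $n - 1$, the divisor PROC CORR uses by default for the COV option.

```python
import numpy as np


def mean_and_covariance(x):
    """Sample mean vector and sample covariance matrix (divisor n - 1) of an (n, p) array."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean(axis=0)
    centred = x - mu_hat
    s = centred.T @ centred / (x.shape[0] - 1)  # same as np.cov(x, rowvar=False)
    return mu_hat, s
```

`np.cov(x, rowvar=False)` returns the same matrix; the explicit version only makes the formula visible.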
iii) Estimate the conditional distribution (i.e. give estimators of the mean vector and covariance matrix) of $(X_3, X_4)^\top$ given that $x_1 = 1.3$, $x_2 = 1.5$. (Use submatrices within IML.)
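The conditional-normal formulas behind iii) can be sketched in NumPy as an alternative to IML submatrices (NumPy assumed; `conditional_normal` is a hypothetical helper). Assuming joint normality, plugging $\hat{\mu}$ and $S$ in for $\mu$ and $\Sigma$ gives the requested estimators; the matrix `S` below is copied from part iv).

```python
import numpy as np


def conditional_normal(mu, sigma, target, given, values):
    """Mean and covariance of X[target] given X[given] = values, for X ~ N(mu, sigma).

    mean = mu_t + S_tg S_gg^{-1} (values - mu_g)
    cov  = S_tt - S_tg S_gg^{-1} S_gt
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s_tt = sigma[np.ix_(target, target)]
    s_tg = sigma[np.ix_(target, given)]
    s_gg = sigma[np.ix_(given, given)]
    # S_tg S_gg^{-1}, computed by solving a linear system instead of inverting S_gg.
    coef = np.linalg.solve(s_gg, s_tg.T).T
    mean = mu[target] + coef @ (np.asarray(values, dtype=float) - mu[given])
    cov = s_tt - coef @ s_tg.T
    return mean, cov


# Sample covariance matrix S from part iv) (variables x1, x2, x3, x4).
S = np.array([
    [0.0301591837, 0.0060693878, -0.0156326531, -0.0053183673],
    [0.0060693878, 0.0111061224, -0.0020000000, -0.0036693878],
    [-0.0156326531, -0.0020000000, 0.2208163265, 0.0731020408],
    [-0.0053183673, -0.0036693878, 0.0731020408, 0.0391061224],
])
```

With `mu_hat` estimated from the data in ii), the call would be `conditional_normal(mu_hat, S, target=[2, 3], given=[0, 1], values=[1.3, 1.5])`; indices are zero-based, so `[2, 3]` selects $(X_3, X_4)$.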
iv) The sample covariance matrix is
$$S = \begin{pmatrix}
0.0301591837 & 0.0060693878 & -0.0156326531 & -0.0053183673 \\
0.0060693878 & 0.0111061224 & -0.0020000000 & -0.0036693878 \\
-0.0156326531 & -0.0020000000 & 0.2208163265 & 0.0731020408 \\
-0.0053183673 & -0.0036693878 & 0.0731020408 & 0.0391061224
\end{pmatrix}$$
(or a reasonable rounding of it, depending on whether you used IML or CORR). Calculate its inverse $S^{-1}$, e.g. by using the IML procedure. Calculate the $T^2$ statistic and test the hypothesis
$$H_0: E\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix} = \begin{pmatrix} 1.2 \\ 0.6 \\ 4.0 \\ 1.6 \end{pmatrix}$$
at the 5% level of significance.
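The $T^2$ computation in iv) is short enough to sketch directly. A minimal Python version, assuming NumPy and SciPy (`hotelling_one_sample` is a hypothetical helper); it uses the standard $F$ transformation of $T^2$ rather than any particular SAS output.

```python
import numpy as np
from scipy import stats


def hotelling_one_sample(x, mu0):
    """One-sample Hotelling T^2 test of H0: E[X] = mu0 for an (n, p) data matrix.

    T^2 = n (xbar - mu0)^T S^{-1} (xbar - mu0), with S using divisor n - 1;
    under H0 and normality, (n - p) / ((n - 1) p) * T^2 ~ F(p, n - p).
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    diff = x.mean(axis=0) - np.asarray(mu0, dtype=float)
    s = np.cov(x, rowvar=False).reshape(p, p)  # reshape keeps p = 1 as a 1 x 1 matrix
    t2 = n * diff @ np.linalg.solve(s, diff)
    f_stat = (n - p) / ((n - 1) * p) * t2
    p_value = stats.f.sf(f_stat, p, n - p)
    return t2, f_stat, p_value
```

With the data loaded as an $n \times 4$ array `x`, `hotelling_one_sample(x, [1.2, 0.6, 4.0, 1.6])` returns $T^2$, the $F$ statistic and its p-value; $H_0$ is rejected at the 5% level when the p-value is below 0.05.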
v) Test the hypothesis that the mean petal length and width of Iris setosa equal those of Iris versicolor, at significance level 0.05. What is your conclusion? (Hint: transform $X = (X_1, X_2, X_3, X_4)^\top$ into
$$Y = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix} X$$
and reformulate the hypothesis in terms of the means of $Y$.)
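Following the hint, the comparison in v) reduces to a one-sample $T^2$ test of $E[Y] = 0$ with $q = 2$ components, treating each row of iris.dat as one paired observation $(X_1, X_2, X_3, X_4)$, which is what the hint's transformation presupposes. A minimal Python sketch, assuming NumPy and SciPy (`paired_mean_test` and `C` are illustrative names); it uses $\operatorname{Cov}(Y) = C S C^\top$, so the data need not be transformed explicitly.

```python
import numpy as np
from scipy import stats

# Contrast matrix from the hint: Y1 = X1 - X3 (length difference), Y2 = X2 - X4 (width difference).
C = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])


def paired_mean_test(x, c=C):
    """Test H0: E[C X] = 0 with the one-sample T^2 statistic applied to Y = C X."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    q = c.shape[0]
    ybar = c @ x.mean(axis=0)
    s_y = c @ np.cov(x, rowvar=False) @ c.T  # sample covariance of Y equals C S C^T
    t2 = n * ybar @ np.linalg.solve(s_y, ybar)
    f_stat = (n - q) / ((n - 1) * q) * t2  # ~ F(q, n - q) under H0 and normality
    return t2, f_stat, stats.f.sf(f_stat, q, n - q)
```

Calling `paired_mean_test(x)` with the $n \times 4$ data array returns $T^2$, the $F$ statistic and the p-value; $H_0$ is rejected at level 0.05 when the p-value is below 0.05.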
4. A shop manager is studying the sales of a certain brand over periods of time. He uses different marketing methods and measures variables such as: number of sales ($X_1$), price ($X_2$), advertisement costs in the local newspaper ($X_3$) and presence of a sales assistant ($X_4$, in hours per period). The data distribution can be approximated by
$$X = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix} \sim N_4(\mu, \Sigma), \quad
\mu = \begin{pmatrix} 172 \\ 104 \\ 105 \\ 94 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} 1000 & -80 & 1100 & 275 \\ -80 & 500 & 90 & -90 \\ 1100 & 90 & 1500 & 200 \\ 275 & -90 & 200 & 720 \end{pmatrix}, \quad
R = \begin{pmatrix} 1 & -0.11317 & 0.89815 & 0.3241 \\ -0.11317 & 1 & 0.1039 & -0.15 \\ 0.89815 & 0.1039 & 1 & 0.19245 \\ 0.3241 & -0.15 & 0.19245 & 1 \end{pmatrix}.$$
Determine:
a) The conditional distribution of $X_1$ given $(X_2, X_3, X_4)^\top = (100, 100, 100)^\top$. Hence find the best linear approximation (w.r.t. minimal mean-squared error) of sales when $X_2 = X_3 = X_4 = 100$.
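Under joint normality the conditional mean is linear in the conditioning variables, so part a) amounts to the regression coefficients $\beta = \Sigma_{22}^{-1}\sigma_{21}$ and intercept $\alpha = \mu_1 - \beta^\top \mu_2$, where $\Sigma_{22}$, $\sigma_{21}$ and $\mu_2$ refer to the block for $(X_2, X_3, X_4)$. A minimal NumPy sketch of that computation (the names `MU`, `SIGMA`, `alpha` and `beta` are illustrative); it solves a linear system rather than inverting $\Sigma_{22}$.

```python
import numpy as np

MU = np.array([172.0, 104.0, 105.0, 94.0])
SIGMA = np.array([
    [1000.0, -80.0, 1100.0, 275.0],
    [-80.0, 500.0, 90.0, -90.0],
    [1100.0, 90.0, 1500.0, 200.0],
    [275.0, -90.0, 200.0, 720.0],
])

# Partition into X1 and the block (X2, X3, X4): beta = Sigma_22^{-1} sigma_21.
sigma_11 = SIGMA[0, 0]
sigma_21 = SIGMA[1:, 0]
sigma_22 = SIGMA[1:, 1:]
beta = np.linalg.solve(sigma_22, sigma_21)
alpha = MU[0] - beta @ MU[1:]

given = np.array([100.0, 100.0, 100.0])
cond_mean = alpha + beta @ given       # E[X1 | X2 = X3 = X4 = 100]
cond_var = sigma_11 - sigma_21 @ beta  # Var(X1 | X2, X3, X4); does not depend on the given values
print("alpha =", alpha, "beta =", beta)
print("conditional mean =", cond_mean, "conditional variance =", cond_var)
```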
b) The squared multiple correlation of the sales with the remaining variables, $\rho^2_{1\cdot 234}$. Compare it to $\rho^2_{12}$ and comment.
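The squared multiple correlation in b) is the share of $\operatorname{Var}(X_1)$ explained by the best linear predictor based on $(X_2, X_3, X_4)$, i.e. $\rho^2_{1\cdot 234} = 1 - \operatorname{Var}(X_1 \mid X_2, X_3, X_4)/\sigma_{11}$. A NumPy sketch computing it from $\Sigma$ (`squared_multiple_correlation` is a hypothetical helper), together with $\rho^2_{12} = \sigma_{12}^2/(\sigma_{11}\sigma_{22})$ for the comparison; working from $\Sigma$ rather than the rounded entries of $R$ avoids rounding error.

```python
import numpy as np

SIGMA = np.array([
    [1000.0, -80.0, 1100.0, 275.0],
    [-80.0, 500.0, 90.0, -90.0],
    [1100.0, 90.0, 1500.0, 200.0],
    [275.0, -90.0, 200.0, 720.0],
])


def squared_multiple_correlation(sigma, k=0):
    """rho^2 of X_k with all other components: sigma_{k,rest} Sigma_rest^{-1} sigma_{rest,k} / sigma_kk."""
    rest = [i for i in range(sigma.shape[0]) if i != k]
    s_kr = sigma[k, rest]
    explained = s_kr @ np.linalg.solve(sigma[np.ix_(rest, rest)], s_kr)
    return explained / sigma[k, k]


rho2_1_234 = squared_multiple_correlation(SIGMA)
rho2_12 = SIGMA[0, 1] ** 2 / (SIGMA[0, 0] * SIGMA[1, 1])  # squared simple correlation of X1 and X2
print("rho^2_{1.234} =", rho2_1_234, " rho^2_{12} =", rho2_12)
```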
c) The conditional distribution of $(X_1, X_2)^\top$ given $(X_3, X_4)^\top = (100, 100)^\top$.
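Part c) can also be read off the precision matrix $\Omega = \Sigma^{-1}$: with $a$ indexing $(X_1, X_2)$ and $b$ indexing $(X_3, X_4)$, $\operatorname{Cov}(X_a \mid X_b) = \Omega_{aa}^{-1}$ and $E(X_a \mid X_b = x_b) = \mu_a - \Omega_{aa}^{-1}\Omega_{ab}(x_b - \mu_b)$. The NumPy sketch below follows that route, which gives an independent check on the usual partitioned-$\Sigma$ formulas; the variable names are illustrative.

```python
import numpy as np

MU = np.array([172.0, 104.0, 105.0, 94.0])
SIGMA = np.array([
    [1000.0, -80.0, 1100.0, 275.0],
    [-80.0, 500.0, 90.0, -90.0],
    [1100.0, 90.0, 1500.0, 200.0],
    [275.0, -90.0, 200.0, 720.0],
])

# Precision-matrix route: partition Omega = Sigma^{-1} conformably with a = (X1, X2), b = (X3, X4).
omega = np.linalg.inv(SIGMA)
a, b = [0, 1], [2, 3]
omega_aa = omega[np.ix_(a, a)]
omega_ab = omega[np.ix_(a, b)]

given = np.array([100.0, 100.0])
cond_cov = np.linalg.inv(omega_aa)                         # Cov(X_a | X_b) = Omega_aa^{-1}
cond_mean = MU[a] - cond_cov @ omega_ab @ (given - MU[b])  # E[X_a | X_b = given]
print("conditional mean:", cond_mean)
print("conditional covariance:\n", cond_cov)
```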