
The University of New South Wales

School of Mathematics

Department of Statistics

MATH5855 – Multivariate Analysis I

Assignment 1

Due Monday, 27th August 2018, 5pm (absolute deadline). The assignment is to be handed in as a hard copy (no e-mails!).

Maximal number of pages: 8.

1. Consider the joint density

$$f(x_1, x_2) = 2e^{-x_1/x_2}, \quad x_1 > 0,\ 0 < x_2 < 1.$$

a) Compute $f_{X_2}(x_2)$ and $f_{X_1 \mid X_2}(x_1 \mid x_2)$.

b) Give the best approximation $g^*(X_2)$ in mean square sense for $X_1$ (i.e. find explicitly $g^*(X_2)$ that minimizes $E[X_1 - g(X_2)]^2$ over all possible choices of $g(X_2)$ such that $E[g(X_2)^2] < \infty$).

c) For a given realization $x_2$, calculate the mean square error of the best approximation (that is, calculate $E\{[X_1 - g^*(X_2)]^2 \mid X_2 = x_2\}$).
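The standard identities behind parts a) to c) can be written out as follows. This is a generic sketch for any joint density $f(x_1, x_2)$ with $x_1 > 0$, not a worked solution; it assumes the integrals and second moments involved are finite, and $g^*$ follows the notation of part b).

```latex
% Generic identities; assumes finite second moments and f_{X_2}(x_2) > 0.
\[
\begin{aligned}
f_{X_2}(x_2) &= \int_0^{\infty} f(x_1, x_2)\, dx_1, \\
f_{X_1 \mid X_2}(x_1 \mid x_2) &= \frac{f(x_1, x_2)}{f_{X_2}(x_2)}, \\
g^*(X_2) &= E[X_1 \mid X_2], \\
E\{[X_1 - g^*(X_2)]^2 \mid X_2 = x_2\} &= \operatorname{Var}(X_1 \mid X_2 = x_2).
\end{aligned}
\]
```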

2. Write down the spectral decomposition of the matrix $\Sigma = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}$. Correspondingly, find $\Sigma^{1/2}$, $\Sigma^{1/4}$ and $\Sigma^{-1/2}$ (give final answers). Find the eigenvectors of $\Sigma^{-1}$.

In addition, if $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ has a multivariate normal distribution with $\Sigma$ as above being its covariance matrix and if $n = 25$ observations gave a vector of sample means $\bar{X} = \begin{pmatrix} \bar{X}_1 \\ \bar{X}_2 \end{pmatrix} = \begin{pmatrix} 1.2 \\ 2.5 \end{pmatrix}$, draw a confidence ellipsoid for the mean vector $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ at a level of 95% as accurately as possible.

3. The table below gives an extract of the petal lengths and widths of two types of iris, Iris setosa and Iris versicolor.

Petal length    Petal width     Petal length       Petal width
Iris setosa     Iris setosa     Iris versicolor    Iris versicolor
x1              x2              x3                 x4
1.4             0.2             4.7                1.4
…               …               …                  …
1.4             0.2             4.1                1.3

You should download the whole data set from the file iris.dat in Moodle.

i) Test the hypothesis of multivariate normality of the vector $X = (X_1, X_2, X_3, X_4)^\top$ (use the IML program discussed in SAS Lab 2 or some other method).
Hint: to create a matrix x from the vector observations x1-x4 you can proceed as follows:

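/* Read the four petal measurements from iris.dat */
data iris; infile 'iris.dat'; input x1 x2 x3 x4; run;
/* Load all numeric variables into the IML matrix x */
proc iml; use iris var {x1 x2 x3 x4};
read all var _num_ into x;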

ii) Find $\hat{\mu}$ and the sample covariance matrix. Use the IML program again or use the SAS procedure CORR. (Hint for users of CORR: proc corr cov; var x1-x4;)
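For reference, the usual estimators can be written as below. The symbols $\mathbf{x}_i$ (the $i$-th observation, a 4-vector) and $n$ (the sample size) are introduced here for illustration only. The divisor $n-1$ is an assumption; a maximum likelihood version of the covariance estimator would divide by $n$ instead.

```latex
% \mathbf{x}_i: i-th row of the data matrix, written as a column vector.
% Divisor n - 1 is assumed (unbiased sample covariance).
\[
\hat{\mu} = \bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i,
\qquad
S = \frac{1}{n-1} \sum_{i=1}^{n}
    (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top .
\]
```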

iii) Estimate the conditional distribution (i.e. give estimators of the mean vector and covariance matrix) of $(X_3, X_4)$ given that $x_1 = 1.3$, $x_2 = 1.5$. (Use submatrices within IML.)
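A sketch of the plug-in estimators, assuming multivariate normality and a partition into $X^{(1)} = (X_1, X_2)^\top$ and $X^{(2)} = (X_3, X_4)^\top$. The block names $S_{11}, S_{12}, S_{21}, S_{22}$ and the superscript notation are introduced here for illustration; they are the $2 \times 2$ submatrices of the sample covariance matrix and the matching halves of the sample mean vector.

```latex
% Plug-in version of the conditional normal formulas: the sample mean
% \bar{x} and sample covariance S replace the unknown \mu and \Sigma.
\[
\begin{aligned}
\hat{\mu}_{2 \mid 1} &= \bar{x}^{(2)} + S_{21} S_{11}^{-1}
    \left( x^{(1)} - \bar{x}^{(1)} \right),
    \qquad x^{(1)} = (1.3,\ 1.5)^\top, \\
\hat{\Sigma}_{2 \mid 1} &= S_{22} - S_{21} S_{11}^{-1} S_{12}.
\end{aligned}
\]
```

The estimated conditional covariance matrix does not depend on the conditioning values; only the conditional mean does.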

iv) The sample covariance matrix is

$$
S = \begin{pmatrix}
0.0301591837 & 0.0060693878 & -0.0156326531 & -0.0053183673 \\
0.0060693878 & 0.0111061224 & -0.0020000000 & -0.0036693878 \\
-0.0156326531 & -0.0020000000 & 0.2208163265 & 0.0731020408 \\
-0.0053183673 & -0.0036693878 & 0.0731020408 & 0.0391061224
\end{pmatrix}
$$
(or a reasonable rounding of it, depending on whether you used IML or CORR). Calculate its inverse $S^{-1}$, e.g. by using the IML procedure. Calculate the $T^2$ statistic and test the hypothesis

$$
H_0 : E\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix} = \begin{pmatrix} 1.2 \\ 0.6 \\ 4.0 \\ 1.6 \end{pmatrix}
$$

at the 5% level of significance.

v) Test the hypothesis that the mean length and width of the Iris setosa equal those of the Iris versicolor at significance level 0.05. What is your conclusion? (Hint: Transform $X = (X_1, X_2, X_3, X_4)^\top$ into

$$
Y = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix} X
$$

and reformulate the hypothesis in terms of the means of $Y$.)

4. A shop manager is studying the sales of a certain brand over periods of time. He uses different marketing methods and measures variables such as: number of sales ($X_1$), price ($X_2$), advertisement costs in local newspaper ($X_3$) and presence of sales assistant ($X_4$) in hours per period. The data distribution can be approximated by $X = (X_1, X_2, X_3, X_4)^\top \sim N_4(\mu, \Sigma)$ where

$$
\mu = \begin{pmatrix} 172 \\ 104 \\ 105 \\ 94 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix}
1000 & -80 & 1100 & 275 \\
-80 & 500 & 90 & -90 \\
1100 & 90 & 1500 & 200 \\
275 & -90 & 200 & 720
\end{pmatrix}, \quad
R = \begin{pmatrix}
1 & -0.11317 & 0.89815 & 0.3241 \\
-0.11317 & 1 & 0.1039 & -0.15 \\
0.89815 & 0.1039 & 1 & 0.19245 \\
0.3241 & -0.15 & 0.19245 & 1
\end{pmatrix}.
$$

Determine:

X2 100
a) The conditional distribution of X1 given  X3  =  100  . Hence find the best

X4 100
linear approximation (w.t. to minimal mean-squared error) of sales when X2 = X3 =

X4 = 100.
b) The squared multiple correlation of the sales with the remaining variables ($\rho^2_{1 \cdot 234}$). Compare it to $\rho^2_{12}$ and comment.

c) The conditional distribution of $(X_1, X_2)$ given $\begin{pmatrix} X_3 \\ X_4 \end{pmatrix} = \begin{pmatrix} 100 \\ 100 \end{pmatrix}$.
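The quantities in parts a) and b) are linked by the following relations, sketched here under the $N_4(\mu, \Sigma)$ model stated in the problem. The symbols $\sigma_{11} = \operatorname{Var}(X_1)$, $\sigma_{(1)}$ (the vector of covariances of $X_1$ with $(X_2, X_3, X_4)$), $\Sigma_{22}$ (the covariance matrix of $(X_2, X_3, X_4)$) and $\mu^{(2)}$ (its mean vector) are notation introduced for illustration.

```latex
% Scalar X_1 against the block X^{(2)} = (X_2, X_3, X_4)^T.
\[
\begin{aligned}
E[X_1 \mid X^{(2)} = x] &= \mu_1 + \sigma_{(1)}^\top \Sigma_{22}^{-1}
    \left( x - \mu^{(2)} \right), \\
\rho^2_{1 \cdot 234} &= \frac{\sigma_{(1)}^\top \Sigma_{22}^{-1} \sigma_{(1)}}{\sigma_{11}}, \\
\operatorname{Var}(X_1 \mid X^{(2)} = x) &= \sigma_{11} - \sigma_{(1)}^\top \Sigma_{22}^{-1} \sigma_{(1)}
    = \sigma_{11} \left( 1 - \rho^2_{1 \cdot 234} \right), \\
\rho^2_{12} &= \frac{\sigma_{12}^2}{\sigma_{11} \sigma_{22}}
    = \frac{(-80)^2}{1000 \times 500} = 0.0128 .
\end{aligned}
\]
```

Because adding predictors can never reduce the multiple correlation, $\rho^2_{1 \cdot 234} \ge \rho^2_{12}$ always holds, which gives a quick sanity check on the computed value in part b).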