The University of New South Wales

Department of Statistics MATH5855 – Multivariate Analysis I

Assignment 2
Due Tuesday, 25th September 2018, 5pm

1. i) You are asked to write a subroutine (module) within SAS/IML that takes as input:

• an arbitrary data matrix with n data points, each of dimension p (p < n);

• a vector a containing 2 integers from the set {1, 2, . . . , p}.

If the integers are i and j, say, the module should calculate an estimate of the partial correlation of the ith and jth components when the remaining ones have been fixed.

Head Length, First Son      191   195   181   …   190
Head Breadth, First Son     155   149   148   …   163
Head Length, Second Son     179   201   185   …   187
Head Breadth, Second Son    145   152   149   …   150

The complete file brothers.dat (available in moodle) contains the head lengths and breadths of brothers (first and second son in a sample of 25 families). Enter the 25 × 4 matrix within IML, call the module and calculate the partial correlation r34.12. Verify your calculation using the CORR procedure (study its help first) or by hand calculation. Also calculate the partial correlation r21.34.

Hint. You may consult the file imlregress1.sas in moodle for a hint on organising subroutines in SAS/IML. The operators and control structures you may need (DO..END, IF..THEN, START..FINISH, comparison operators, subsetting of matrices) are described in the help of the SAS/IML procedure. If you have difficulty writing the module in complete generality (that is, for arbitrary indices i, j), write a simpler version for (i = 1, j = 2), which can then be used after the columns of the original data matrix have been reshuffled.
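The assignment asks for a SAS/IML module; as a cross-check outside SAS, the same calculation can be sketched in Python. This sketch assumes the partial correlation is obtained from the inverse of the sample covariance matrix. The function name partial_corr and the synthetic matrix X_demo are illustrative assumptions, and X_demo is not the brothers.dat data.

```python
import numpy as np

def partial_corr(X, pair):
    """Partial correlation of columns i and j (1-based), all other columns held fixed."""
    X = np.asarray(X, dtype=float)
    i, j = (int(k) - 1 for k in pair)        # convert to 0-based indices
    S = np.cov(X, rowvar=False)              # p x p sample covariance matrix
    P = np.linalg.inv(S)                     # precision matrix (requires p < n)
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

# Synthetic stand-in for a 25 x 4 data matrix (NOT the values in brothers.dat).
rng = np.random.default_rng(0)
Z = rng.normal(size=(25, 4))
X_demo = Z @ np.array([[1.0, 0.5, 0.3, 0.2],
                       [0.0, 1.0, 0.4, 0.1],
                       [0.0, 0.0, 1.0, 0.6],
                       [0.0, 0.0, 0.0, 1.0]])

r34_12 = partial_corr(X_demo, [3, 4])        # analogue of r34.12 on the synthetic data
print(r34_12)
```

The value agrees with the correlation between the residuals of columns 3 and 4 after regressing each on columns 1 and 2, which gives an independent way to verify a module.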

ii) Compare r12.34 to r12 and explain the differences having in mind the meaning of the four variables.

iii) Use Fisher’s z to find a confidence interval (CI) for ρ12.34 with a level of confidence 0.95.
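A minimal Python sketch of a Fisher z interval follows, assuming the usual large-sample approximation in which z = atanh(r) has variance about 1/(n − q − 3) when q variables are held fixed. The function name fisher_ci and the input r = 0.5 are hypothetical placeholders, not results from brothers.dat.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, q=0, level=0.95):
    """Approximate CI for a (partial) correlation; q = number of variables held fixed."""
    z = math.atanh(r)                              # Fisher's z transform
    se = 1.0 / math.sqrt(n - q - 3)                # approximate standard error of z
    zcrit = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal critical value
    return math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)

# Placeholder input: r = 0.5 is hypothetical, n = 25 families, q = 2 fixed variables.
lo, hi = fisher_ci(0.5, n=25, q=2)
print(lo, hi)
```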

iv) Estimate the multiple correlation between x3 and (x1,x2), and test its significance at the 5% level.
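One way to sketch this in Python, assuming the standard F test of a zero population multiple correlation under normality, with F = (R²/k) / ((1 − R²)/(n − k − 1)) on (k, n − k − 1) degrees of freedom. The function name multiple_corr_test and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def multiple_corr_test(X, target, predictors):
    """Multiple correlation of column `target` with columns `predictors` (1-based)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    t = target - 1
    idx = [k - 1 for k in predictors]
    S = np.cov(X, rowvar=False)
    s_tx = S[t, idx]                                   # covariances of target with predictors
    R2 = s_tx @ np.linalg.solve(S[np.ix_(idx, idx)], s_tx) / S[t, t]
    k = len(idx)
    F = (R2 / k) / ((1.0 - R2) / (n - k - 1))          # F statistic for H0: rho = 0
    return np.sqrt(R2), F, stats.f.sf(F, k, n - k - 1)

# Synthetic 25 x 4 data (NOT brothers.dat): column 3 depends on columns 1 and 2.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(25, 4))
X_demo[:, 2] += 0.8 * X_demo[:, 0] - 0.5 * X_demo[:, 1]

R, F, p = multiple_corr_test(X_demo, target=3, predictors=[1, 2])
print(R, F, p)
```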

v) Test the significance of the correlation coefficient ρ34, i.e., test H0 : ρ34 = 0 against a two-sided alternative, using level of significance α = 0.05.
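A short Python sketch of the usual t test for a zero correlation, assuming bivariate normality so that t = r√(n − 2)/√(1 − r²) has n − 2 degrees of freedom under H0. The function name corr_t_test and the value r = 0.4 are hypothetical placeholders.

```python
import math
from scipy import stats

def corr_t_test(r, n):
    """Two-sided t test of H0: rho = 0 from a sample correlation r on n observations."""
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    p = 2.0 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value
    return t, p

# Placeholder input: r = 0.4 is hypothetical, not a value computed from brothers.dat.
t_stat, p_value = corr_t_test(0.4, n=25)
print(t_stat, p_value)
```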


2. Soil samples were taken at n = 45 randomly selected locations in South Queensland. Measurements of nitrogen concentration in the soil were made at depths of 1, 3, 5 and 7 feet from the surface. The four measurements from the i-th location can be arranged in a vector as Xi = (X1i, X2i, X3i, X4i)′. Let

$$S = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})'$$

where X̄ is the sample mean. The data are in the file soil.dat on moodle. Multivariate normality can be assumed.

i) Perform a test of the hypothesis that the mean nitrogen concentration is the same at all 4 depths. Report the relevant statistic. State your conclusions.

Hint: Transform the four-dimensional data vector X into a three-dimensional vector Y = CX with

$$C = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}$$

and reformulate the hypothesis.

ii) Test the null hypothesis that the mean nitrogen concentration decreases in such a way that the mean at one depth is half of the mean at the previous depth, i.e., H0 : (μi/μi−1) = 1/2, i = 2, 3, 4. Report and comment.
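Both hypotheses in this question can be written as Cμ = 0 for a suitable 3 × 4 matrix C, so one Hotelling T² routine covers them. The Python sketch below assumes the standard result that (n − q)T²/((n − 1)q) follows an F distribution on (q, n − q) degrees of freedom under H0, where q is the number of rows of C. The names contrast_T2, C_equal and C_half are illustrative; C_half is one possible encoding of μi−1 − 2μi = 0, and the data are synthetic, not soil.dat.

```python
import numpy as np
from scipy import stats

def contrast_T2(X, C):
    """Hotelling T^2 test of H0: C mu = 0 from an n x p data matrix X."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    n, q = X.shape[0], C.shape[0]
    ybar = C @ X.mean(axis=0)                      # mean of the transformed data Y = CX
    Sy = C @ np.cov(X, rowvar=False) @ C.T         # covariance of Y
    T2 = n * ybar @ np.linalg.solve(Sy, ybar)
    F = (n - q) / ((n - 1) * q) * T2
    return T2, F, stats.f.sf(F, q, n - q)

C_equal = np.array([[1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]])  # equal means
C_half = np.array([[1, -2, 0, 0], [0, 1, -2, 0], [0, 0, 1, -2]])   # each mean half the previous

# Synthetic data with halving means (NOT the soil.dat measurements).
rng = np.random.default_rng(2)
X_demo = rng.normal(loc=[8.0, 4.0, 2.0, 1.0], scale=1.0, size=(45, 4))

T2_eq, F_eq, p_eq = contrast_T2(X_demo, C_equal)
T2_half, F_half, p_half = contrast_T2(X_demo, C_half)
print(p_eq, p_half)
```

T² does not change if C is replaced by any other matrix with the same row space, so the particular choice of contrasts does not affect the conclusion.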

3. Consider identifying the neurotic state of an individual referred for psychiatric examination. Three measurements A, B, C are made on each individual. The mean scores for each of 3 groups are given as:

Group            A      B      C
Anxiety State    2.9    1.2    0.75
Obsession        4.6    1.6    1.2
Normal           0.6    0.15   0.25

The pooled within group covariance matrix is:

$$\hat{\Sigma} = \begin{pmatrix} 2.30 & 0.25 & 0.47 \\ 0.25 & 0.60 & 0.03 \\ 0.47 & 0.03 & 0.59 \end{pmatrix}$$

Assume equal misclassification costs and equal priors for the three groups.

a) Calculate the linear discriminant scores for classifying into one of the three groups.

b) Classify the following newly observed individuals:

            A       B       C
Mary:       3.000   1.200   1.000
Fred:       4.000   1.400   1.320
Giselda:    1.000   0.500   0.330

c) Consider classifying individuals from the “Anxiety state” and “Obsession” groups only. Determine the linear discriminant function and estimate the probabilities of misclassification P(1|2) and P(2|1).
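A hedged Python sketch of the calculations in this question follows. It reads the group means and pooled covariance from the tables in the problem, taking each group's means in the order (A, B, C), and it assumes the usual linear score d_k(x) = μk′Σ⁻¹x − ½ μk′Σ⁻¹μk (the prior term drops out with equal priors). For part c) it assumes the normal-theory plug-in estimate Φ(−Δ/2), where Δ is the estimated Mahalanobis distance between the two groups. The names scores, classify and p_mis are illustrative.

```python
import numpy as np
from statistics import NormalDist

# Group means (A, B, C) and pooled covariance as read from the problem's tables.
means = {
    "Anxiety State": np.array([2.9, 1.2, 0.75]),
    "Obsession": np.array([4.6, 1.6, 1.2]),
    "Normal": np.array([0.6, 0.15, 0.25]),
}
Sigma = np.array([[2.30, 0.25, 0.47],
                  [0.25, 0.60, 0.03],
                  [0.47, 0.03, 0.59]])
Sigma_inv = np.linalg.inv(Sigma)

def scores(x):
    """Linear discriminant score of observation x for each group (equal priors)."""
    x = np.asarray(x, dtype=float)
    return {g: m @ Sigma_inv @ x - 0.5 * m @ Sigma_inv @ m for g, m in means.items()}

def classify(x):
    """Assign x to the group with the largest linear discriminant score."""
    s = scores(x)
    return max(s, key=s.get)

# Two-group case: plug-in misclassification probability under normality.
d = means["Anxiety State"] - means["Obsession"]
delta = float(np.sqrt(d @ Sigma_inv @ d))      # estimated Mahalanobis distance
p_mis = NormalDist().cdf(-delta / 2.0)         # estimate of both P(1|2) and P(2|1)

print(classify([3.000, 1.200, 1.000]), p_mis)
```

With equal priors and costs the two misclassification probabilities coincide, which is why a single value p_mis is computed.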

4. The vectors x1, x2, …, xn are a sample from Np(0, λD), where λ > 0 is an unknown scalar and D is a known symmetric positive definite matrix. Show that the Maximum Likelihood Estimator of λ is

$$\hat{\lambda} = \frac{1}{np} \operatorname{tr}(D^{-1}B), \quad \text{where } B = \sum_{i=1}^{n} x_i x_i'.$$

Show also that

$$\frac{np\hat{\lambda}}{\lambda} \sim \chi^2_{np}.$$

Hence suggest a two-sided confidence interval for λ at level (1 − α).

(Hint: You may find it useful to consider the vectors Yi = D^{-1/2}Xi.)
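The two results can be checked numerically with a small simulation. The Python sketch below computes the estimator from the formula in the question and forms an equal-tailed interval from the χ² pivot, which is one possible choice of two-sided interval. The function name lambda_mle_ci, the matrix D_demo and the value λ = 3 are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

def lambda_mle_ci(X, D, alpha=0.05):
    """MLE of lambda and an equal-tailed (1 - alpha) interval from the chi-square pivot."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    B = X.T @ X                                          # B = sum of x_i x_i'
    lam_hat = np.trace(np.linalg.solve(D, B)) / (n * p)  # tr(D^{-1} B) / (np)
    lower = n * p * lam_hat / stats.chi2.ppf(1 - alpha / 2, n * p)
    upper = n * p * lam_hat / stats.chi2.ppf(alpha / 2, n * p)
    return lam_hat, lower, upper

# Simulated sample from N_p(0, lambda * D) with illustrative D and lambda = 3.
rng = np.random.default_rng(3)
D_demo = np.array([[2.0, 0.5], [0.5, 1.0]])
X_sim = rng.multivariate_normal(np.zeros(2), 3.0 * D_demo, size=2000)

lam_hat, lower, upper = lambda_mle_ci(X_sim, D_demo)
print(lam_hat, lower, upper)
```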

5. For a random vector (X, Y)′ of continuous random variables with marginal distributions F and G, the coefficient of upper dependence is defined as

$$\lambda_{upper} = \lim_{u \to 1} P(Y > G^{-1}(u) \mid X > F^{-1}(u))$$

provided that the limit exists. In the context of copulae, this results in the investigation of

$$\lambda_{upper} = \lim_{u \to 1} \frac{1 - 2u + C(u,u)}{1 - u}.$$

When λupper ∈ (0, 1] we say that there exists an asymptotic dependence in the upper tail; when λupper = 0 the random variables are said to be asymptotically independent in the upper tail.

Show that the Gumbel-Hougaard copula

$$C_\theta(u, v) = \exp\left(-\left[(-\log u)^\theta + (-\log v)^\theta\right]^{1/\theta}\right), \quad u \in [0, 1],\ v \in [0, 1],$$

with a parameter θ ∈ [1, ∞) exhibits upper tail dependence when θ > 1.

(Reminder: as we know, when θ = 1 the above copula coincides with the independence copula.)
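A numerical sanity check can accompany the analytic argument. The Python sketch below evaluates the ratio (1 − 2u + C(u, u))/(1 − u) for u close to 1 and compares it with 2 − 2^(1/θ), the known upper tail dependence coefficient of the Gumbel family. The function names and the choices θ = 2 and u = 1 − 10⁻⁶ are illustrative assumptions; this is not a substitute for the proof.

```python
import math

def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C_theta(u, v) for u, v in (0, 1]."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def upper_tail_ratio(u, theta):
    """(1 - 2u + C(u, u)) / (1 - u), whose limit as u -> 1 is lambda_upper."""
    return (1.0 - 2.0 * u + gumbel_copula(u, u, theta)) / (1.0 - u)

theta = 2.0
approx = upper_tail_ratio(1.0 - 1e-6, theta)     # ratio evaluated close to u = 1
closed_form = 2.0 - 2.0 ** (1.0 / theta)         # known limit for the Gumbel family
print(approx, closed_form)
```

For θ = 1 the same ratio reduces to 1 − u, which tends to 0, in line with the reminder about the independence copula.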
