Semester 1 Assessment, 2020
School of Mathematics and Statistics
MAST30025 Linear Statistical Models
This exam consists of 24 pages (including this page)
Authorised materials: printed one-sided copy of the Exam or the Masked Exam made available earlier (or an offline electronic PDF reader), any amount of handwritten material, a Casio FX82 calculator, and blank A4 paper.
Instructions to Students
During exam writing time you may only interact with the device running the Zoom session with supervisor permission. The screen of any other device must be visible in Zoom from the start of the session.
If you have a printer, print out the exam single-sided and hand write your solutions into the answer spaces.
If you do not have a printer, or if your printer fails on the day of the exam,
(a) download the exam paper to a second device (not running Zoom), disconnect it from the internet as soon as the paper is downloaded and read the paper on the second device;
(b) write your answers on the Masked Exam PDF if you were able to print it single-sided before the exam day.
If you do not have the Masked Exam PDF, write single-sided on blank sheets of paper.
If you are unable to answer the whole question in the answer space provided then you can append additional handwritten solutions to the end of your exam submission. If you do this you MUST make a note in the correct answer space or page for the question, warning the marker that you have appended additional remarks at the end.
Assemble all the exam pages (or template pages) in correct page number order and the correct way up, and add any extra pages with additional working at the end.
Scan your exam submission to a single PDF file with a mobile phone or a scanner. Scan from directly above to avoid any excessive keystone effect. Check that all pages are clearly readable and cropped to the A4 borders of the original page. Poorly scanned submissions may be impossible to mark.
Upload the PDF file via the Canvas Assignments menu and submit the PDF to the GradeScope tool by first selecting your PDF file and then clicking on Upload PDF.
Confirm with your Zoom supervisor that you have GradeScope confirmation of submission before leaving Zoom supervision.
You should attempt all questions.
There are 7 questions with marks as shown. The total number of marks available is 90.
Supplied by download for enrolled students only—©University of Melbourne 2020
Student Number
Question 1 (10 marks)
(a) [2 marks] Give an example of a 3 × 3 idempotent matrix which is not 0 or $I_3$.
A simple example is
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
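As a quick sanity check, the following R sketch confirms that diag(1, 1, 0) is idempotent and is neither the zero matrix nor the identity. The variable names are illustrative only and are not part of the exam solution.

```r
# Candidate idempotent matrix: projection onto the first two coordinates.
P <- diag(c(1, 1, 0))

# Idempotent means P %*% P reproduces P.
is_idempotent <- isTRUE(all.equal(P %*% P, P))

# Rule out the two trivial cases named in the question.
is_trivial <- isTRUE(all.equal(P, diag(3))) || all(P == 0)

print(is_idempotent)
print(is_trivial)
```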
(b) [2 marks] Show that the matrix $A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}$ is positive definite.
The eigenvalues of A satisfy the characteristic equation $\det(A - \lambda I) = (3-\lambda)^2 - 4 = \lambda^2 - 6\lambda + 5 = (\lambda-1)(\lambda-5) = 0$, so the eigenvalues are 1 and 5, which are both greater than 0. Hence A is positive definite.
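(c) [3 marks] Show directly that $\frac{\partial}{\partial y} y^T A y = Ay + A^T y$.
$$\begin{aligned}
\left[\frac{\partial}{\partial y} y^T A y\right]_i &= \frac{\partial}{\partial y_i} \sum_{j,k} y_j A_{jk} y_k \\
&= 2A_{ii}y_i + \sum_{j \neq i} y_j A_{ji} + \sum_{k \neq i} A_{ik} y_k \\
&= \sum_j y_j A_{ji} + \sum_k A_{ik} y_k \\
&= \left[A^T y + A y\right]_i.
\end{aligned}$$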
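The identity can also be checked numerically. The sketch below compares a central finite-difference gradient of the quadratic form with Ay + A^T y; the random matrix, the seed and the step size h are arbitrary choices made for illustration.

```r
set.seed(1)
A <- matrix(rnorm(9), 3, 3)   # arbitrary, deliberately non-symmetric
y <- rnorm(3)

# The quadratic form y' A y as a scalar-valued function.
quad <- function(v) as.numeric(t(v) %*% A %*% v)

# Central finite differences, one coordinate at a time.
h <- 1e-6
num_grad <- sapply(1:3, function(i) {
  e <- replace(numeric(3), i, h)
  (quad(y + e) - quad(y - e)) / (2 * h)
})

# Closed form from the derivation: Ay + A'y.
exact_grad <- as.numeric(A %*% y + t(A) %*% y)
max_abs_diff <- max(abs(num_grad - exact_grad))
print(max_abs_diff)
```

Because A is not symmetric here, the check would fail if the gradient were taken to be 2Ay, which is only valid for symmetric A.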
(d) [3 marks] Show directly that for any n × k matrix A with n ≥ k, the matrix $I - A(A^TA)^cA^T$ has a rank of $n - r(A)$. (You may assume that this matrix is idempotent.)
$$\begin{aligned}
r(I - A(A^TA)^cA^T) &= \operatorname{tr}(I - A(A^TA)^cA^T) \\
&= \operatorname{tr}(I_n) - \operatorname{tr}(A(A^TA)^cA^T) \\
&= n - r(A(A^TA)^cA^T) \\
&= n - r(A).
\end{aligned}$$
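A numerical illustration of the rank result, as a sketch only: A is built to be rank-deficient, and the Moore-Penrose inverse (computed here with a small SVD helper, `pinv`, which is a hypothetical name introduced for this example) is used as one valid conditional inverse.

```r
set.seed(2)
n <- 6
B <- matrix(rnorm(n * 2), n, 2)
A <- cbind(B, B[, 1] + B[, 2])     # n x 3 matrix with rank 2

# Moore-Penrose inverse via SVD; it is one valid conditional inverse.
pinv <- function(M, tol = 1e-10) {
  s <- svd(M)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*%
    diag(1 / s$d[keep], sum(keep)) %*%
    t(s$u[, keep, drop = FALSE])
}

M <- diag(n) - A %*% pinv(t(A) %*% A) %*% t(A)

rank_A  <- qr(A)$rank
rank_M  <- qr(M)$rank
trace_M <- sum(diag(M))            # equals the rank because M is idempotent
print(c(rank_A = rank_A, rank_M = rank_M, trace_M = trace_M))
```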
Question 2 (14 marks)
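(a) [5 marks] Let $X_1 \sim \chi^2_{k_1,\lambda_1}$ and $X_2 \sim \chi^2_{k_2,\lambda_2}$ be independent. Show directly that $X_1 + X_2 \sim \chi^2_{k_1+k_2,\lambda_1+\lambda_2}$.
We can write $X_i = y_i^T y_i$ for some $y_i \sim MVN(\mu_i, I_{k_i})$, with $\lambda_i = \frac{1}{2}\mu_i^T\mu_i$ and $y_1$ independent of $y_2$. Then
$$X_1 + X_2 = \begin{pmatrix} y_1^T & y_2^T \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$
and since
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \sim MVN\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, I_{k_1+k_2} \right),$$
we have the desired result, with noncentrality parameter
$$\frac{1}{2} \begin{pmatrix} \mu_1^T & \mu_2^T \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \lambda_1 + \lambda_2.$$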
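(b) [3 marks] Let
$$y \sim MVN\left( \begin{pmatrix} -3 \\ 8 \end{pmatrix}, \begin{pmatrix} 5 & -3 \\ -3 & 2 \end{pmatrix} \right), \qquad A = \begin{pmatrix} 0 & -6 \\ 6 & 7 \end{pmatrix}.$$
Calculate $E[y^T A y]$.
> A <- matrix(c(0,-6,6,7),2,2,byrow=T)
> V <- matrix(c(5,-3,-3,2),2,2)
> mu <- c(-3,8)
> sum(diag(A%*%V)) + t(mu)%*%A%*%mu
     [,1]
[1,]  462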
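The R expression above evaluates the standard identity for the expectation of a quadratic form. Written out by hand (a worked restatement of the same calculation, with V denoting the covariance matrix of y):

```latex
E[y^T A y] = \operatorname{tr}(AV) + \mu^T A \mu,
\qquad
AV = \begin{pmatrix} 18 & -12 \\ 9 & -4 \end{pmatrix}
\;\Rightarrow\; \operatorname{tr}(AV) = 14,
\qquad
\mu^T A \mu = (-3)(-48) + (8)(38) = 448,
\qquad
E[y^T A y] = 14 + 448 = 462.
```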
(c) [3 marks] Describe the distribution of 3y1 − 2y2.
$3y_1 - 2y_2 \sim N(A\mu, AVA^T)$.
> A <- matrix(c(3,-2),1,2)
> A %*% mu
     [,1]
[1,]  -25
> A %*% V %*% t(A)
     [,1]
[1,]   89
(d) [3 marks] Find all values of a and b for which ay1 + by2 is independent of 3y1 − 2y2.
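We set
$$A = \begin{pmatrix} 3 & -2 \\ a & b \end{pmatrix},$$
so that
$$\begin{aligned}
\operatorname{var} Ay = AVA^T &= \begin{pmatrix} 3 & -2 \\ a & b \end{pmatrix} \begin{pmatrix} 5 & -3 \\ -3 & 2 \end{pmatrix} \begin{pmatrix} 3 & a \\ -2 & b \end{pmatrix} \\
&= \begin{pmatrix} 3 & -2 \\ a & b \end{pmatrix} \begin{pmatrix} ? & 5a-3b \\ ? & -3a+2b \end{pmatrix} \\
&= \begin{pmatrix} ? & 3(5a-3b) - 2(-3a+2b) \\ ? & ? \end{pmatrix},
\end{aligned}$$
where ? marks entries that are not needed. The two linear combinations are jointly normal, so they are independent exactly when the off-diagonal (covariance) entry is zero:
$$21a - 13b = 0, \qquad a = \frac{13}{21} b.$$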
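A short R check of this condition, as a sketch: `cov_with` is a hypothetical helper that returns the covariance between 3y1 - 2y2 and a*y1 + b*y2 under the covariance matrix V from part (b).

```r
V <- matrix(c(5, -3, -3, 2), 2, 2)

# Covariance between 3*y1 - 2*y2 and a*y1 + b*y2.
cov_with <- function(a, b) as.numeric(c(3, -2) %*% V %*% c(a, b))

cov_on_line  <- cov_with(13, 21)   # satisfies 21a = 13b
cov_off_line <- cov_with(1, 1)     # does not satisfy it
print(c(on_line = cov_on_line, off_line = cov_off_line))
```

Any pair (a, b) proportional to (13, 21) gives zero covariance; other pairs do not.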
Question 3 (18 marks)
The international bank UBS produced a report on prices and earnings in major cities throughout the world. One of the variables that they measured was the price of 1kg of rice, measured in minutes of labour required for a “typical” worker to purchase the rice. This was measured in 2003 (rice2003) and again in 2009 (rice2009).
We wish to model the 2009 price in terms of the 2003 price, using the linear model y = Xβ+ε. The following R calculations are performed:
> UBS <- read.csv('UBSprices.csv', header=T)
> plot(UBS$rice2003, UBS$rice2009)
[Figure: scatter plot of UBS$rice2009 against UBS$rice2003]
> plot(log(UBS$rice2003), log(UBS$rice2009))
[Figure: scatter plot of log(UBS$rice2009) against log(UBS$rice2003)]
> (n <- length(UBS$rice2009))
[1] 54
> X <- cbind(1, log(UBS$rice2003))
> y <- log(UBS$rice2009)
> t(X)%*%X
[,1] [,2]
[1,] 54.0000 151.5818
[2,] 151.5818 440.4496
> t(X)%*%y
[,1]
[1,] 158.3701
[2,] 456.1961
> t(y)%*%y
[,1]
[1,] 481.9005
> sum(y)
[1] 158.3701
> qt(0.975,50:55)
[1] 2.008559 2.007584 2.006647 2.005746 2.004879 2.004045
> qf(0.95,1,50:55)
[1] 4.034310 4.030393 4.026631 4.023017 4.019541 4.016195
> qf(0.95,2,50:55)
[1] 3.182610 3.178799 3.175141 3.171626 3.168246 3.164993
(Hint: To alleviate rounding error, keep as many digits in internal calculations as possible.)
(a) [2 marks] A logarithmic transformation has been applied to both variables. Give two reasons to justify this transformation.
The transformation is justified due to right-skewness in both variables, as well as some heteroskedasticity in the original plot.
(b) [3 marks] Calculate the least squares estimates of β.
> (b <- solve(t(X)%*%X,t(X)%*%y))
[,1]
[1,] 0.7470346
[2,] 0.7786571
(c) [3 marks] Calculate the sample variance $s^2$.
> (SSRes <- t(y)%*%y - t(y)%*%X%*%b)
[,1]
[1,] 8.372216
> (s2 <- SSRes/(n-2))
[,1]
[1,] 0.1610041
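The data file is not reproduced here, but the estimates can be recovered from the printed summary matrices alone. The sketch below re-enters X'X, X'y and y'y by hand (to the seven significant figures printed), so the results should agree with the output above only to roughly three or four decimal places.

```r
# Summary statistics copied from the printed output above.
n   <- 54
XtX <- matrix(c(54, 151.5818, 151.5818, 440.4496), 2, 2)
Xty <- c(158.3701, 456.1961)
yty <- 481.9005

b     <- solve(XtX, Xty)        # least squares estimates
SSRes <- yty - sum(Xty * b)     # y'y - y'Xb
s2    <- SSRes / (n - 2)        # sample variance on n - p degrees of freedom

print(round(c(b0 = b[1], b1 = b[2], SSRes = SSRes, s2 = s2), 4))
```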
(d) [4 marks] In 2003, it cost 50 minutes of labour to buy 1kg of rice in the Republic of Linearmodelstan. Calculate (with 95% probability) an interval for the 2009 price of rice (in minutes of labour) in Linearmodelstan.
> tt <- c(1, log(50))
> exp(tt%*%b + c(-1,1)*qt(0.975,n-2)*sqrt(s2)*sqrt(1+t(tt)%*%solve(t(X)%*%X)%*%tt))
[1] 19.07946 103.30706
(e) [3 marks] Test for model relevance at the 5% significance level, using a corrected sum of squares.
> (t(y)%*%y - (sum(y)^2/n) - SSRes)/s2
         [,1]
[1,] 56.29408
Since 56.29 > 4.03, we reject the null hypothesis of model irrelevance.
(f) [3 marks] It is claimed that, on average, the price of rice in 2003 is the same as the price of rice in 2009, in terms of labour. This corresponds to a parameter estimate of $\beta = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Determine if this point lies within the joint 95% confidence region for the parameters.
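This point does not lie within the joint 95% confidence region: the quadratic form $(b-\beta^*)^T X^T X (b-\beta^*) = 1.586$ exceeds the bound $2 s^2 F_{0.95;\,2,\,52} = 1.022$, where $\beta^* = (0, 1)^T$.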
> bst <- c(0,1)
> t(b-bst)%*%t(X)%*%X%*%(b-bst)
[,1]
[1,] 1.585738
> 2*s2*qf(0.95,2,n-2)
[,1]
[1,] 1.022422
Question 4 (12 marks)
Consider the full rank linear model y = Xβ + ε with p parameters. Now suppose that we transform the design variables x in a linear manner:
$$z_i = \sum_{j=1}^{p} a_{ji} x_j, \qquad i = 1, \ldots, p.$$
(Note that the x variables include the intercept term.) Now consider the linear model $y = Z\beta_2 + \varepsilon_2$, which also has p parameters.
(a) [2 marks] Express the design matrix Z in terms of X, and state a condition under which the second linear model is also full rank.
We have $Z = XA$. If A is invertible, then $r(Z) = p$ and the model is full rank.
(b) [3 marks] Calculate the least squares estimators for β2 from the second model, and express them in terms of b, the least squares estimators for β.
$$\begin{aligned}
b_2 &= (Z^T Z)^{-1} Z^T y \\
&= (A^T X^T X A)^{-1} A^T X^T y \\
&= A^{-1} (X^T X)^{-1} (A^T)^{-1} A^T X^T y \\
&= A^{-1} b.
\end{aligned}$$
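The relation between the two sets of estimators can be checked on simulated data. This is a sketch under arbitrary assumptions: the design, the true coefficients, the seed and the transformation matrix A are all invented for illustration.

```r
set.seed(3)
n <- 30
X <- cbind(1, rnorm(n), rnorm(n))          # full-rank design, p = 3
y <- X %*% c(2, -1, 0.5) + rnorm(n)

# Upper-triangular A with non-zero diagonal, hence invertible.
A <- matrix(c(1, 0, 0,
              2, 3, 0,
              -1, 1, 4), 3, 3)
Z <- X %*% A

b  <- solve(t(X) %*% X, t(X) %*% y)        # estimators in the first model
b2 <- solve(t(Z) %*% Z, t(Z) %*% y)        # estimators in the second model

# b2 should equal A^{-1} b up to rounding error.
max_diff <- max(abs(b2 - solve(A, b)))
print(max_diff)
```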
(c) [2 marks] Consider a subject with design variables x∗ (for the first model). Calculate a point estimate for the average response for this subject, using the second model, and express it in terms of b.
The point estimate is
$$(z^*)^T b_2 = (x^*)^T A A^{-1} b = (x^*)^T b.$$
(d) [3 marks] Calculate the sample variance for the second model, and express it in terms of the sample variance for the first model.
$$\begin{aligned}
SS_{Reg}(Z) &= y^T Z b_2 \\
&= y^T X A A^{-1} b \\
&= SS_{Reg}(X).
\end{aligned}$$
Since both $SS_{Reg}$ and $SS_{Total}$ are unchanged between models, so is $SS_{Res}$ and therefore $s^2$.
(e) [2 marks] Briefly discuss the implications of the results you have derived above in the context of fitting a linear model, with particular reference to variable standardisation.
We see that both models have identical predictions for response variables and errors, so they are equivalent. Thus we can linearly transform design variables without changing the model, and in particular can standardise design variables (which is a linear transformation) without changing the model.
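The point about standardisation can be illustrated with lm() on simulated data. This is only a sketch: the data frame, variable names and coefficients are invented, and scale() is used as one common way to standardise predictors.

```r
set.seed(4)
d <- data.frame(x1 = rnorm(40, 10, 3), x2 = runif(40, 0, 100))
d$y <- 1 + 0.5 * d$x1 - 0.02 * d$x2 + rnorm(40)

raw <- lm(y ~ x1 + x2, data = d)                 # original predictors
std <- lm(y ~ scale(x1) + scale(x2), data = d)   # standardised predictors

# Fitted values and residual standard error agree; coefficients do not.
same_fit   <- isTRUE(all.equal(unname(fitted(raw)), unname(fitted(std))))
same_sigma <- isTRUE(all.equal(sigma(raw), sigma(std)))
same_coef  <- isTRUE(all.equal(unname(coef(raw)), unname(coef(std))))
print(c(same_fit = same_fit, same_sigma = same_sigma, same_coef = same_coef))
```

The coefficients change because they are expressed in different units, but the fitted model is the same.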
Question 5 (12 marks)
Consider the general linear model, y = Xβ + ε. This model may be of full or less than full rank.
(a) [2 marks] Define the term BLUE (best linear unbiased estimator), and give an example of when one might choose not to use the BLUE.
The BLUE of a quantity is an estimator of the form $c^T y$ which is unbiased, and has the lowest variance among all unbiased linear estimators. An example of when it might not be used is when you want to use a biased estimator, e.g., ridge regression or LASSO estimators.
(b) [2 marks] Describe how the parameters β of a linear model may be estimated by the method of maximum likelihood, and relate this to least squares estimation.
Maximum likelihood estimation chooses the parameters β which maximise the likelihood of the responses under the linear model assumption $y \sim N(X\beta, \sigma^2 I)$. Under this assumption, the maximum likelihood estimators of β are also the least squares estimators.
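The link to least squares can be seen from the log-likelihood under the normal assumption (a short sketch of the standard argument, with n the number of observations):

```latex
\ell(\beta, \sigma^2)
  = -\frac{n}{2} \log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta).
```

For any fixed $\sigma^2$, the first term does not involve β, so maximising $\ell$ over β is the same as minimising the residual sum of squares $(y - X\beta)^T (y - X\beta)$. The maximiser therefore satisfies the normal equations $X^T X b = X^T y$.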
(c) [2 marks] Define the Cook’s distance and explain its purpose.
The Cook's distance of the ith point is defined as $\frac{1}{k+1} z_i^2 \frac{H_{ii}}{1-H_{ii}}$, where $z_i$ is the standardised residual and $H_{ii}$ is the leverage of the ith point. It is used to determine if a point distorts the fit of the model.
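The definition can be compared with R's built-in cooks.distance(). The sketch below uses the built-in cars dataset purely as a stand-in; the formula is computed from the standardised residuals and leverages, with the number of fitted coefficients playing the role of k + 1.

```r
fit <- lm(dist ~ speed, data = cars)   # built-in dataset, for illustration only

z <- rstandard(fit)                    # standardised residuals z_i
h <- hatvalues(fit)                    # leverages H_ii
p <- length(coef(fit))                 # k + 1 parameters

# Cook's distance from the definition.
D_manual <- z^2 * h / ((1 - h) * p)

agrees <- isTRUE(all.equal(D_manual, cooks.distance(fit)))
print(agrees)
```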
(d) [2 marks] Define estimability, and explain its significance for a linear model.
An estimable quantity is one which can be estimated by a linear function of the responses, i.e., tT β is estimable if there exists c such that E[cT y] = tT β. Only estimable quantities have true fixed values, and can thus be estimated or tested, in a linear model.
(e) [2 marks] Define interaction between a categorical and a continuous predictor, and explain how to model it.
Interaction between a categorical and a continuous predictor occurs when the effect of the continuous predictor on the response (i.e., its slope parameter) varies with the level of the categorical predictor. It can be modelled by giving the continuous predictor a separate slope parameter for each level of the categorical predictor.
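In R, this kind of interaction is usually written with the * operator in the model formula. The sketch below uses the built-in iris dataset as a stand-in example (it is unrelated to the exam data) and counts the extra slope parameters that the interaction introduces.

```r
# iris is a built-in dataset, used here only as a stand-in example.
fit_common   <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)  # one shared slope
fit_interact <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)  # slope varies by species

# The interaction adds one slope adjustment per non-baseline level.
n_extra <- length(coef(fit_interact)) - length(coef(fit_common))

# Coefficients that involve the continuous predictor.
slope_terms <- grep("Petal.Length", names(coef(fit_interact)), value = TRUE)
print(n_extra)
print(slope_terms)
```

With three species, R's default treatment contrasts give a baseline slope plus two adjustments, which is equivalent to one slope per level. Calling anova(fit_common, fit_interact) would then test whether the slopes differ.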
(f) [2 marks] Define single and double blinding, and describe their use in experimental design.
Blinding refers to the practice of ensuring that the sample subjects do not know which treatment they are receiving (single blinding), and those administering also do not know (double blinding). It is used in experiments to ensure that there is no bias due to the placebo effect.
Question 6 (16 marks)
Data on 220 agricultural land sales in Minnesota over the period 2002–2011 were collected. The dataset contains the following variables:
id: ID
acrePrice: Sale price, in thousands of dollars per acre
region: One of six major agricultural regions in Minnesota
improvements: Percentage of property value in buildings
year: Year of sale
acres: Size of property
tillable: Percentage of tillable area of the land
financing: Type of financing (title transfer or seller financed)
crpPct: Percentage of land in the US Conservation Reserve Program
productivity: A score measuring the productivity of the land
We wish to model the selling price (acrePrice) in terms of the other variables (except id). The following R calculations are produced:
> ML <- read.csv('ML2.csv', header=T)
> interaction_model <- lm(acrePrice ~ (. - id)^2, data=ML)
> additive_model <- lm(acrePrice ~ . - id, data=ML)
> anova(additive_model, interaction_model)
Analysis of Variance Table
Model 1: acrePrice ~ (id + region + improvements + year + acres + tillable +
    financing + crpPct + productivity) - id
Model 2: acrePrice ~ ((id + region + improvements + year + acres + tillable +
    financing + crpPct + productivity) - id)^2
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    207 182.99
2    153 125.15 54    57.845 1.3096 0.1034
> selected_model <- step(additive_model)
Start: AIC=-14.52
acrePrice ~ (id + region + improvements + year + acres + tillable +
financing + crpPct + productivity) - id
               Df Sum of Sq    RSS      AIC
- financing     1     1.135 184.13  -15.159
- improvements  1     1.431 184.42  -14.806
- acres         1     1.582 184.58  -14.626
<none>                      182.99  -14.519
- productivity  1     4.189 187.18  -11.540
- crpPct        1     5.001 187.99  -10.588
- tillable      1     6.770 189.76   -8.527
- region        5    64.123 247.12   41.571
- year          1   140.960 323.95  109.134
Step: AIC=-15.16
acrePrice ~ region + improvements + year + acres + tillable +
crpPct + productivity
               Df Sum of Sq    RSS      AIC
- improvements  1     1.509 185.64  -15.363
- acres         1     1.596 185.72  -15.260
<none>                      184.13  -15.159
- productivity  1     4.168 188.30  -12.235
- crpPct        1     5.079 189.21  -11.173
- tillable      1     6.439 190.57   -9.596
- region        5    64.875 249.00   41.245
- year          1   140.494 324.62  107.588
Step: AIC=-15.36
acrePrice ~ region + year + acres + tillable + crpPct + productivity
               Df Sum of Sq    RSS      AIC
<none>                      185.64  -15.363
- acres         1     1.737 187.37  -15.314
- crpPct        1     4.353 189.99  -12.264
- productivity  1     4.666 190.30  -11.902
- tillable      1     5.163 190.80  -11.328
- region        5    63.368 249.01   39.247
- year          1   143.335 328.97  108.516
> summary(selected_model)
Call:
lm(formula = acrePrice ~ region + year + acres + tillable + crpPct +
productivity, data = ML)
Residuals:
Min 1Q Median 3Q Max
-2.1397 -0.5763 -0.1042 0.3114 5.8682
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)         -6.763e+02  5.338e+01 -12.670  < 2e-16 ***
regionNorthwest     -1.915e+00  2.879e-01  -6.654 2.46e-10 ***
regionSouth Central  1.191e-03  2.376e-01   0.005   0.9960
regionSouth East     5.592e-02  2.887e-01   0.194   0.8466
regionSouth West    -5.216e-01  2.236e-01  -2.332   0.0206 *
regionWest Central  -1.064e+00  2.332e-01  -4.565 8.53e-06 ***
year                 3.379e-01  2.660e-02  12.703  < 2e-16 ***
acres               -7.921e-04  5.664e-04  -1.398   0.1635
tillable             1.109e-02  4.599e-03   2.411   0.0168 *
crpPct              -9.941e-03  4.490e-03  -2.214   0.0279 *
productivity         1.319e-02  5.753e-03   2.292   0.0229 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9425 on 209 degrees of freedom
Multiple R-squared: 0.6094, Adjusted R-squared: 0.5907
F-statistic: 32.61 on 10 and 209 DF, p-value: < 2.2e-16
> qt(0.975,209:214)
[1] 1.971379 1.971325 1.971271 1.971217 1.971164 1.971111
> qf(0.95,5,209:214)
[1] 2.257274 2.257066 2.256860 2.256657 2.256455 2.256255
> qf(0.95,6,209:214)
[1] 2.142153 2.141943 2.141736 2.141530 2.141327 2.141125
> par(mfrow=c(2,2))
> plot(selected_model)
[Figure: 2 × 2 panel of diagnostic plots for selected_model: Residuals vs Fitted; Normal Q-Q (standardized residuals against theoretical quantiles); Scale-Location; Residuals vs Leverage (with Cook's distance contours at 0.5 and 1). Labelled observations: 24, 165, 168, 189.]
(a) [2 marks] Interpret the output of the anova function.
According to the anova function, there is no significant pairwise interaction between the variables (F = 1.3096 on 54 and 153 degrees of freedom, p = 0.1034 > 0.05), so the additive model is adequate.
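The F statistic in the anova table can be reproduced from the two residual sums of squares and their degrees of freedom. This sketch re-enters the rounded values printed in the table, so agreement is only to about three decimal places.

```r
rss_add <- 182.99; df_add <- 207   # additive model (from the table)
rss_int <- 125.15; df_int <- 153   # interaction model (from the table)

# Extra sum of squares per extra parameter, relative to the larger model's
# residual mean square.
F_stat <- ((rss_add - rss_int) / (df_add - df_int)) / (rss_int / df_int)
p_val  <- pf(F_stat, df_add - df_int, df_int, lower.tail = FALSE)
print(round(c(F = F_stat, p = p_val), 4))
```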
(b) [2 marks] Identify the variable selection procedure that has been used here.
Stepwise selection with AIC.
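For a linear model, the AIC values reported by step() take the form n log(RSS/n) + 2p, where p is the number of estimated coefficients. The sketch below recomputes the starting and final AIC from the printed RSS values and residual degrees of freedom; `step_aic` is a hypothetical helper name, and the inputs are rounded, so the match is approximate.

```r
n <- 220   # number of land sales

# AIC up to an additive constant, as used for lm fits in step().
step_aic <- function(rss, n_par) n * log(rss / n) + 2 * n_par

aic_start <- step_aic(182.99, n - 207)   # additive model: 13 parameters
aic_final <- step_aic(185.64, n - 209)   # selected model: 11 parameters
print(round(c(start = aic_start, final = aic_final), 2))
```

The final model has a slightly larger RSS but a lower AIC, because the penalty saved by dropping two parameters outweighs the loss of fit.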
(c) [3 marks] From the model selected_model, test for the relevance of the region variable, at the 5% level. Clearly state your F-statistic and critical value, and interpret your results in the context of the study.
> 63.368/5/0.9425^2
[1] 14.26715
Since 14.27 > 2.257, we reject the null hypothesis: region has a clear effect on the price per acre of the land.
(d) [2 marks] Perform one step of backwards elimination on selected_model.
Considering that region is a relevant variable, we would remove the acres variable.
(e) [4 marks] From the diagnostic plots, comment on the suitability of the linear model for this data. If the model is not suitable, suggest how it can be improved.
From the diagnostic plots, we can identify a slight non-linear trend for small fitted values; a slight right-skew in the residual QQ-plot; and a couple of outliers (ids 168 and 189). It might be worth removing the outliers and considering a logarithmic transformation on the response variable.
(f) [3 marks] For a plot of agricultural land in the South West, calculate a 95% confidence interval for the effect of the year on the price per acre in the model selected_model.
The confidence interval does not change depending on the region, because there is no interaction in the model.
> 0.3379 + c(-1,1)*qt(0.975,209)*0.0266
[1] 0.2854613 0.3903387
Question 7 (8 marks)
(a) [5 marks] You wish to perform a study to determine if 3 treatments each produce no effect, using a completely randomised design. To do this, you will test the hypothesis $H_0: \mu + \tau_1 = \tau_1 - \tau_2 = \tau_2 - \tau_3 = 0$. You are given resources to study 50 sample units. Determine the optimal allocation of the number of units to assign to each treatment.
(Hint: In a completely randomised design with treatment effects $\tau_i$, we have $\operatorname{var}(\mu + \tau_i) = \sigma^2 \frac{1}{n_i}$ and $\operatorname{var}(\tau_i - \tau_j) = \sigma^2 \left( \frac{1}{n_i} + \frac{1}{n_j} \right)$. To minimise a function $f(x)$ under the constraint $g(x) = c$, minimise $f(x, \lambda) = f(x) + \lambda (g(x) - c)$.)
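We want to minimise
$$f(n_1, n_2, n_3, \lambda) = \sigma^2 \left( \frac{2}{n_1} + \frac{2}{n_2} + \frac{1}{n_3} \right) + \lambda \left( \sum_{i=1}^{3} n_i - n \right).$$
This gives
$$\begin{aligned}
\frac{\partial f}{\partial n_1} &= -\frac{2\sigma^2}{n_1^2} + \lambda = 0 \\
\frac{\partial f}{\partial n_2} &= -\frac{2\sigma^2}{n_2^2} + \lambda = 0 \\
\frac{\partial f}{\partial n_3} &= -\frac{\sigma^2}{n_3^2} + \lambda = 0 \\
n_1^2 = \frac{2\sigma^2}{\lambda} &= n_2^2 = 2 n_3^2 \\
n_1 = n_2 &= \sqrt{2}\, n_3.
\end{aligned}$$
This gives $n_1 = 19$, $n_2 = 18$, $n_3 = 13$; due to rounding, one of $n_1$ or $n_2$ has to be larger.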
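The rounding step can be checked by brute force. The sketch below computes the continuous optimum from n1 = n2 = sqrt(2) n3 with n1 + n2 + n3 = 50, then searches all positive integer allocations for the one minimising 2/n1 + 2/n2 + 1/n3 (the objective with the common factor of sigma^2 dropped). Variable names are illustrative.

```r
n_total <- 50

# Continuous optimum: n3 (1 + 2 sqrt(2)) = 50.
n3_cont <- n_total / (1 + 2 * sqrt(2))
cont <- c(n1 = sqrt(2) * n3_cont, n2 = sqrt(2) * n3_cont, n3 = n3_cont)

# Brute-force search over integer allocations that sum to 50.
grid <- expand.grid(n1 = 1:48, n2 = 1:48)
grid$n3 <- n_total - grid$n1 - grid$n2
grid <- grid[grid$n3 >= 1, ]
grid$obj <- 2 / grid$n1 + 2 / grid$n2 + 1 / grid$n3
best <- grid[which.min(grid$obj), c("n1", "n2", "n3")]

print(round(cont, 2))
print(best)
```

The objective is symmetric in n1 and n2, so the allocations (19, 18, 13) and (18, 19, 13) are tied; which.min() simply reports whichever appears first in the grid.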
(b) [3 marks] Compare and contrast blocking and randomisation as tools for eliminating confounding, and discuss their best use in experimental design.
Both blocking and randomisation will eliminate confounding, but blocking will only eliminate confounding in the blocking variable, while randomisation will eliminate all confounding. They should be used in conjunction to reduce variability from the blocking variable(s).
End of Exam—Total Available Marks = 90