
Analysis of variance
(Module 8)
Statistics (MAST20005) & Elements of Statistics (MAST90058)
School of Mathematics and Statistics, University of Melbourne


Semester 2, 2022

Aims of this module
• Introduce the analysis of variance technique, which builds upon the variance decomposition ideas in previous modules.
• Revisit linear regression and apply the ideas of hypothesis testing and analysis of variance.
• Discuss ways to derive optimal hypothesis tests.

• Analysis of variance (ANOVA): comparisons of more than two groups
• Regression: hypothesis testing for simple linear regression
• Likelihood ratio tests: a method for deriving the best test for a given problem

Analysis of variance (ANOVA)
  Introduction
  One-way ANOVA
  Two-way ANOVA
  Two-way ANOVA with interaction
Hypothesis testing in regression
  Analysis of variance approach
Likelihood ratio tests

Analysis of variance: introduction
• Initial aim: compare the means of more than two populations
• Broader and more advanced aims:
◦ Explore components of variation
◦ Evaluate the fit of a (general) linear model
• Formulated as hypothesis tests
• Referred to as analysis of variance, or ANOVA for short
• Involves comparing different summaries of variation
• Related to the ‘analysis of variance’ and ‘variance decomposition’ formulae we derived previously

Example: large variation between groups
[Figure: dot plots of all groups combined, and of Groups 1–5 separately, showing large variation between the group means relative to the within-group spread]

Example: smaller variation between groups
[Figure: dot plots of all groups combined, and of Groups 1–5 separately, showing smaller variation between the group means relative to the within-group spread]

ANOVA: setting it up
• We have random samples from k populations, each having a normal distribution
• We sample ni iid observations from the ith population, which has mean μi
• All populations assumed to have the same variance, σ2
• Question of interest: do the populations all have the same mean?
• Hypotheses:
$$H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k = \mu \quad\text{versus}\quad H_1\colon \bar{H}_0 \quad (\bar{H}_0 \text{ means `not } H_0\text{'})$$
• This model is known as a one-way ANOVA, or single-factor ANOVA

Population       Sample                       Statistics
N(μ1, σ²)        X11, X12, . . . , X1n1       X̄1·, S1²
N(μ2, σ²)        X21, X22, . . . , X2n2       X̄2·, S2²
⋮                ⋮                            ⋮
N(μk, σ²)        Xk1, Xk2, . . . , Xknk       X̄k·, Sk²
Overall                                       X̄··

$$n = n_1 + \cdots + n_k \quad \text{(total sample size)}$$

$$\bar{X}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij} \quad \text{(group means)}, \qquad \bar{X}_{\cdot\cdot} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij} \quad \text{(grand mean)}$$

Sum of squares (definitions)
• We now define statistics each called a sum of squares (SS)
• The total SS is:
$$SS(TO) = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{\cdot\cdot})^2$$
• The treatment SS, or between-groups SS, is:
$$SS(T) = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 = \sum_{i=1}^{k} n_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$$
• The error SS, or within-groups SS, is:
$$SS(E) = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2 = \sum_{i=1}^{k} (n_i - 1) S_i^2$$

Analysis of variance decomposition
• It turns out that:
SS(TO) = SS(T) + SS(E)
• This is similar to the analysis of variance formulae we derived
earlier, in simpler scenarios (iid model, regression model)
• We will use this relationship as a basis to derive a hypothesis test
• Let’s first prove the relationship. . .

• Start with the ‘add and subtract’ trick:
$$SS(TO) = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{\cdot\cdot})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot} + \bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 + 2\sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})$$
$$= SS(E) + SS(T) + CP$$
• The cross-product term vanishes:
$$CP = 2\sum_{i=1}^{k} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot}) = 2\sum_{i=1}^{k} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})(n_i\bar{X}_{i\cdot} - n_i\bar{X}_{i\cdot}) = 0$$
• Thus, we have:
SS(TO) = SS(T) + SS(E)
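As a numerical sanity check, the decomposition can be verified in R; the data below are simulated purely for illustration:

# Simulate k = 3 normal groups with a common variance
set.seed(1)
x <- c(rnorm(5, mean = 10), rnorm(7, mean = 12), rnorm(6, mean = 11))
g <- rep(1:3, times = c(5, 7, 6))

grand <- mean(x)                       # grand mean
means <- tapply(x, g, mean)            # group means
ni    <- tapply(x, g, length)          # group sizes

ss_to <- sum((x - grand)^2)            # total SS
ss_t  <- sum(ni * (means - grand)^2)   # treatment (between-groups) SS
ss_e  <- sum((x - means[g])^2)         # error (within-groups) SS

ss_to - (ss_t + ss_e)                  # zero, up to rounding error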

Sampling distribution of SS(E)
• The sample variance from the ith group, Si², is an unbiased estimator of σ² and we know that $(n_i - 1)S_i^2/\sigma^2 \sim \chi^2_{n_i-1}$
• The samples from each group are independent, so we can usefully combine them:
$$\sum_{i=1}^{k} \frac{(n_i - 1)S_i^2}{\sigma^2} = \frac{SS(E)}{\sigma^2} \sim \chi^2_{n-k}$$
• Note that: (n1 − 1) + (n2 − 1) + ··· + (nk − 1) = n − k
• This also gives us an unbiased pooled estimator of σ²:
$$\hat\sigma^2 = \frac{SS(E)}{n-k}$$
• These results are true irrespective of whether H0 is true or not

Null sampling distribution of SS(TO)
• If we assume H0, we can derive simple expressions for the sampling distributions of the other sums of squares
• The combined data would be a sample of size n from N(μ, σ²). Hence SS(TO)/(n − 1) is an unbiased estimator of σ² and
$$\frac{SS(TO)}{\sigma^2} \sim \chi^2_{n-1}$$

Null sampling distribution of SS(T)
• Under H0, we have $\bar{X}_{i\cdot} \sim N(\mu, \sigma^2/n_i)$
• $\bar{X}_{1\cdot}, \bar{X}_{2\cdot}, \ldots, \bar{X}_{k\cdot}$ are independent
• (Can think of this as a sample of sample means, and then think about what its variance estimator is)
• It is possible to show that (proof not shown):
$$\sum_{i=1}^{k} \frac{n_i(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2}{\sigma^2} = \frac{SS(T)}{\sigma^2} \sim \chi^2_{k-1}$$
and that this is independent of SS(E)

Null sampling distributions
In summary, under H0:
$$SS(TO) = SS(E) + SS(T)$$
$$\frac{SS(TO)}{\sigma^2} \sim \chi^2_{n-1}, \qquad \frac{SS(E)}{\sigma^2} \sim \chi^2_{n-k}, \qquad \frac{SS(T)}{\sigma^2} \sim \chi^2_{k-1}$$
and SS(E) and SS(T) are independent
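These null distributions can be checked by simulation. Here is a minimal R sketch (the group sizes, mean and variance are arbitrary choices for illustration):

# Simulate the null distribution of SS(T)/sigma^2 and compare to chi-squared
set.seed(5)
k <- 4; ni <- 6; n <- k * ni; sigma <- 2
ss_t <- replicate(10000, {
  x <- rnorm(n, mean = 5, sd = sigma)   # H0: all group means equal
  g <- rep(1:k, each = ni)
  m <- tapply(x, g, mean)
  sum(ni * (m - mean(x))^2)
})
mean(ss_t / sigma^2)   # close to k - 1 = 3, the chi-squared mean
var(ss_t / sigma^2)    # close to 2(k - 1) = 6, the chi-squared variance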

H0 is true
[Figure: dot plots of all groups combined and of Groups 1–5 separately; the group means are similar, consistent with H0]

H1 is true
[Figure: dot plots of all groups combined and of Groups 1–5 separately; the group means clearly differ, consistent with H1]

SS(T) under H1
• What happens if H1 is true?
• The population means differ, which will make SS(T) larger
• Let’s make this precise. . .

• Let $\bar\mu = n^{-1}\sum_{i=1}^{k} n_i\mu_i$, and then,
$$E[SS(T)] = E\left[\sum_{i=1}^{k} n_i(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2\right] = E\left[\sum_{i=1}^{k} n_i\bar{X}_{i\cdot}^2 - n\bar{X}_{\cdot\cdot}^2\right]$$
$$= \sum_{i=1}^{k} n_i\, E(\bar{X}_{i\cdot}^2) - n\, E(\bar{X}_{\cdot\cdot}^2) = \sum_{i=1}^{k} n_i\left\{\mathrm{var}(\bar{X}_{i\cdot}) + E(\bar{X}_{i\cdot})^2\right\} - n\left\{\mathrm{var}(\bar{X}_{\cdot\cdot}) + E(\bar{X}_{\cdot\cdot})^2\right\}$$
$$= \sum_{i=1}^{k} n_i\left(\frac{\sigma^2}{n_i} + \mu_i^2\right) - n\left(\frac{\sigma^2}{n} + \bar\mu^2\right) = (k-1)\sigma^2 + \sum_{i=1}^{k} n_i(\mu_i - \bar\mu)^2$$

$$E[SS(T)] = (k-1)\sigma^2 + \sum_{i=1}^{k} n_i(\mu_i - \bar\mu)^2$$
• Under H0 the second term is zero and we have,
$$E\left(\frac{SS(T)}{k-1}\right) = \sigma^2$$
• Otherwise (under H1), the second term is positive and gives,
$$E\left(\frac{SS(T)}{k-1}\right) > \sigma^2$$
• In contrast, we always have,
$$E\left(\frac{SS(E)}{n-k}\right) = \sigma^2$$

F-test statistic
• This motivates using the following as our test statistic:
$$F = \frac{SS(T)/(k-1)}{SS(E)/(n-k)}$$
• Under H0, we have F ∼ Fk−1,n−k, since it is the ratio of independent χ² random variables, each divided by its degrees of freedom
• Under H1, the numerator will tend to be larger
• Therefore, reject H0 if F > c
• This is known as an F-test

ANOVA table
The test quantities are often summarised using an ANOVA table:
Source      df      SS       MS                      F
Treatment   k − 1   SS(T)    MS(T) = SS(T)/(k − 1)   MS(T)/MS(E)
Error       n − k   SS(E)    MS(E) = SS(E)/(n − k)
Total       n − 1   SS(TO)

• MS = ‘Mean square’
• σ̂² = MS(E) is an unbiased estimator of σ²
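To make the table concrete, here is a small R sketch that computes each entry by hand on simulated data; in practice anova() produces the same table, as the next example shows:

set.seed(2)
x <- c(rnorm(7, 10), rnorm(7, 10), rnorm(7, 13))   # k = 3 groups, n = 21
g <- factor(rep(1:3, each = 7))

k <- nlevels(g); n <- length(x)
ss_t <- sum(tapply(x, g, length) * (tapply(x, g, mean) - mean(x))^2)
ss_e <- sum((x - ave(x, g))^2)                     # ave() gives group means

ms_t <- ss_t / (k - 1)
ms_e <- ss_e / (n - k)
f    <- ms_t / ms_e
pf(f, k - 1, n - k, lower.tail = FALSE)            # p-value for the F-test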

Example (one-way ANOVA)
Force required to pull out window studs in 5 positions on a car window.
[Figure: boxplots of Force for each of the 5 positions]

> head(data1)
  Position Force
1        1    92
2        1    90
3        1    87
4        1   105
5        1    86
6        1    83
> table(data1$Position)

1 2 3 4 5
7 7 7 7 7

> model1 <- lm(Force ~ factor(Position), data = data1)
> anova(model1)
Analysis of Variance Table

Response: Force
                 Df  Sum Sq Mean Sq F value    Pr(>F)
factor(Position)  4 16672.1  4168.0  44.202 3.664e-12 ***
Residuals        30  2828.9    94.3
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

• Need to use factor() to denote categorical variables
• R doesn’t provide a ‘Total’ row, but we don’t need it
• Residuals is the ‘Error’ row
• Pr(>F) is the p-value for the F-test
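As a quick check, the reported F value and p-value can be reproduced directly from the table entries:

4168.0 / 94.3                                      # MS(T)/MS(E), approx 44.2
pf(44.202, df1 = 4, df2 = 30, lower.tail = FALSE)  # approx 3.7e-12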

We conclude that the mean force required to pull out the window studs varies between the 5 positions on the car window (e.g. p-value < 0.01). This was obvious from the boxplots: positions 1 & 2 are quite different from 3, 4 & 5.

Two factors

• In one-way ANOVA, the observations were partitioned into k groups
• In other words, they were defined by a single categorical variable (‘factor’)
• What if we had two such variables?
• We can extend the procedure to give two-way ANOVA, or two-factor ANOVA
• For example, the fuel consumption of a car may depend on the type of petrol and the brand of tyres

Two-way ANOVA: setting it up

• Factor 1 has a levels, Factor 2 has b levels
• Suppose we have exactly one observation per factor combination
• Observe Xij with factor 1 at level i and factor 2 at level j
• Gives a total of n = ab observations
• Assume Xij ∼ N(μij, σ²), i = 1, . . . , a, j = 1, . . . , b, and that these are independent
• Consider the model:
$$\mu_{ij} = \mu + \alpha_i + \beta_j \quad\text{with}\quad \sum_{i=1}^{a}\alpha_i = 0, \quad \sum_{j=1}^{b}\beta_j = 0$$
• μ is an overall effect, αi is the effect of the ith row and βj the effect of the jth column
• We are usually interested in H0A: α1 = α2 = ··· = αa = 0 or H0B: β1 = β2 = ··· = βb = 0
• For example, with a = 4 and b = 4 the cell means are:

        j = 1          j = 2          j = 3          j = 4
i = 1   μ + α1 + β1    μ + α1 + β2    μ + α1 + β3    μ + α1 + β4
i = 2   μ + α2 + β1    μ + α2 + β2    μ + α2 + β3    μ + α2 + β4
i = 3   μ + α3 + β1    μ + α3 + β2    μ + α3 + β3    μ + α3 + β4
i = 4   μ + α4 + β1    μ + α4 + β2    μ + α4 + β3    μ + α4 + β4

• Define the means:
$$\bar{X}_{\cdot\cdot} = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} X_{ij}, \qquad \bar{X}_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b} X_{ij}, \qquad \bar{X}_{\cdot j} = \frac{1}{a}\sum_{i=1}^{a} X_{ij}$$
• Arguing as before,
$$SS(TO) = \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{\cdot\cdot})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} \left\{(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) + (\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot}) + (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot})\right\}^2$$
$$= b\sum_{i=1}^{a} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 + a\sum_{j=1}^{b} (\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot})^2$$
$$= SS(A) + SS(B) + SS(E)$$
• If both α1 = ··· = αa = 0 and β1 = ··· = βb = 0, then we have SS(A)/σ² ∼ χ²(a−1), SS(B)/σ² ∼ χ²(b−1) and SS(E)/σ² ∼ χ²((a−1)(b−1)), and these variables are independent (proof not shown)
• Reject H0A: α1 = ··· = αa = 0 at significance level α if:
$$F_A = \frac{SS(A)/(a-1)}{SS(E)/((a-1)(b-1))} > c$$
where c is the 1 − α quantile of Fa−1,(a−1)(b−1)
• Reject H0B: β1 = ··· = βb = 0 at significance level α if:
$$F_B = \frac{SS(B)/(b-1)}{SS(E)/((a-1)(b-1))} > c$$
where c is the 1 − α quantile of Fb−1,(a−1)(b−1)
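A minimal R sketch of these two-way formulas, assuming the ab observations are stored in an a × b matrix (simulated here for illustration):

set.seed(3)
a <- 3; b <- 4
x <- matrix(rnorm(a * b, mean = 20), nrow = a)   # one observation per cell

grand <- mean(x)
ss_a  <- b * sum((rowMeans(x) - grand)^2)        # factor A (rows)
ss_b  <- a * sum((colMeans(x) - grand)^2)        # factor B (columns)
ss_e  <- sum((x - outer(rowMeans(x), colMeans(x), "+") + grand)^2)

f_a <- (ss_a / (a - 1)) / (ss_e / ((a - 1) * (b - 1)))
f_b <- (ss_b / (b - 1)) / (ss_e / ((a - 1) * (b - 1)))
pf(f_a, a - 1, (a - 1) * (b - 1), lower.tail = FALSE)  # p-value for H0A
pf(f_b, b - 1, (a - 1) * (b - 1), lower.tail = FALSE)  # p-value for H0B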

ANOVA table
Source     df               SS       MS                               F
Factor A   a − 1            SS(A)    MS(A) = SS(A)/(a − 1)            MS(A)/MS(E)
Factor B   b − 1            SS(B)    MS(B) = SS(B)/(b − 1)            MS(B)/MS(E)
Error      (a − 1)(b − 1)   SS(E)    MS(E) = SS(E)/[(a − 1)(b − 1)]
Total      ab − 1           SS(TO)

Example (two-way ANOVA)
Data on fuel consumption for three types of car (A) and four types of fuel (B).
> head(data2)
  Car Fuel Consumption
1   1    1          16
2   1    2          18
3   1    3          21
4   1    4          21
5   2    1          14
6   2    2          15

[Figure: plot of Consumption (ranging from about 14 to 21) by car and fuel type]

> model2 <- lm(Consumption ~ factor(Car) + factor(Fuel),
+              data = data2)
> anova(model2)
Analysis of Variance Table

Response: Consumption
             Df Sum Sq Mean Sq F value   Pr(>F)
factor(Car)   2     24 12.0000      18 0.002915 **
factor(Fuel)  3     30 10.0000      15 0.003401 **
Residuals     6      4  0.6667
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From this we conclude there is a clear difference in fuel consumption between cars (we reject H0A: α1 = α2 = α3) and also between fuels (we reject H0B: β1 = β2 = β3 = β4).

Interaction terms
• In the previous example we assumed an additive model: μij = μ + αi + βj
• This assumes, for example, that the relative effect of petrol 1 is the same for all cars.
• If it is not true, then there is a statistical interaction (or simply an interaction) between the factors

• A more general model, which includes interactions, is:
$$\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$$
where γij is the interaction term associated with combination (i, j).
• In addition to our previous assumptions, we also impose:
$$\sum_{i=1}^{a}\gamma_{ij} = 0 \text{ for each } j, \qquad \sum_{j=1}^{b}\gamma_{ij} = 0 \text{ for each } i$$
• The terms αi and βj are called main effects
• When written out as a table they are also often referred to as the row effects and column effects respectively

• Writing this out as a table:

        j = 1                  j = 2                  ···
i = 1   μ + α1 + β1 + γ11      μ + α1 + β2 + γ12      ···
i = 2   μ + α2 + β1 + γ21      μ + α2 + β2 + γ22      ···
i = 3   μ + α3 + β1 + γ31      μ + α3 + β2 + γ32      ···
i = 4   μ + α4 + β1 + γ41      μ + α4 + β2 + γ42      ···

• We are now interested in testing whether:
◦ the row effects are zero
◦ the column effects are zero
◦ the interactions are zero (do this first!)
• To make inferences about the interactions we need more than one observation per cell
• Let Xijk, i = 1,…,a, j = 1,…,b, k = 1,…,c be the kth observation for combination (i, j)

• Define the marginal, cell, and grand means:
$$\bar{X}_{i\cdot\cdot} = \frac{1}{bc}\sum_{j=1}^{b}\sum_{k=1}^{c} X_{ijk}, \qquad \bar{X}_{\cdot j\cdot} = \frac{1}{ac}\sum_{i=1}^{a}\sum_{k=1}^{c} X_{ijk}$$
$$\bar{X}_{ij\cdot} = \frac{1}{c}\sum_{k=1}^{c} X_{ijk}, \qquad \bar{X}_{\cdot\cdot\cdot} = \frac{1}{abc}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} X_{ijk}$$

• and as before,
$$SS(TO) = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{\cdot\cdot\cdot})^2$$
$$= bc\sum_{i=1}^{a} (\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot})^2 + ac\sum_{j=1}^{b} (\bar{X}_{\cdot j\cdot} - \bar{X}_{\cdot\cdot\cdot})^2 + c\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij\cdot})^2$$
$$= SS(A) + SS(B) + SS(AB) + SS(E)$$

Test statistics
• Familiar arguments show that to test
$$H_{0AB}\colon \gamma_{ij} = 0, \quad i = 1, \ldots, a, \; j = 1, \ldots, b$$
we may use the statistic
$$F = \frac{SS(AB)/[(a-1)(b-1)]}{SS(E)/[ab(c-1)]}$$
which, under H0AB, has an F distribution with (a − 1)(b − 1) and ab(c − 1) degrees of freedom.

• Similarly, to test
$$H_{0A}\colon \alpha_i = 0, \quad i = 1, \ldots, a$$
we may use the statistic
$$F = \frac{SS(A)/(a-1)}{SS(E)/[ab(c-1)]}$$
which, under H0A, has an F distribution with (a − 1) and ab(c − 1) degrees of freedom.

• and to test
$$H_{0B}\colon \beta_j = 0, \quad j = 1, \ldots, b$$
we may use the statistic
$$F = \frac{SS(B)/(b-1)}{SS(E)/[ab(c-1)]}$$
which, under H0B, has an F distribution with (b − 1) and ab(c − 1) degrees of freedom.
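These by-hand calculations can also be sketched in R; the array dimensions and simulated data below are illustrative assumptions, and in practice anova() does all of this (as in the next example):

set.seed(6)
a <- 2; b <- 3; n_c <- 4                      # n_c replicates per cell
x <- array(rnorm(a * b * n_c, mean = 20), dim = c(a, b, n_c))

grand <- mean(x)
mi  <- apply(x, 1, mean)                      # factor A level means
mj  <- apply(x, 2, mean)                      # factor B level means
mij <- apply(x, c(1, 2), mean)                # cell means

ss_a  <- b * n_c * sum((mi - grand)^2)
ss_b  <- a * n_c * sum((mj - grand)^2)
ss_ab <- n_c * sum((mij - outer(mi, mj, "+") + grand)^2)
ss_e  <- sum((x - array(mij, dim = c(a, b, n_c)))^2)

# F statistic and p-value for the interaction test
f_ab <- (ss_ab / ((a - 1) * (b - 1))) / (ss_e / (a * b * (n_c - 1)))
pf(f_ab, (a - 1) * (b - 1), a * b * (n_c - 1), lower.tail = FALSE)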

ANOVA table
Factor A Factor B Factor AB Error
(a − 1)(b − 1) ab(c−1)
SS(B) SS(AB) SS(E)
MS(A) = SS(A) a−1
MS(B) = SS(B) b−1
MS(AB) = SS(AB) (a−1)(b−1)
MS(E)= SS(E) ab(c−1)
MS(A) MS(E) MS(B) MS(E) MS(AB) MS(E)
Total abc−1 SS(TO)

Example (two-way ANOVA with interaction)
• Six groups of 18 people
• Each person takes an arithmetic test: the task is to add three
numbers together
• The numbers are presented either in a down array or an across
array; this defines 2 levels of factor A
• The numbers have either one, two or three digits; this defines 3
levels of factor B
• The response variable, X, is the average number of problems
completed correctly over two 90-second sessions

• Example of adding one-digit numbers in an across array: 2 + 5 + 1 = ?
• Example of adding two-digit numbers in a down array:
[Figure: three two-digit numbers arranged vertically to be summed]

> head(data3)
     A B    X
1 down 1 19.5
2 down 1 18.5
3 down 1 32.0
4 down 1 21.5
5 down 1 28.5
6 down 1 33.0
> table(data3[, 1:2])
        B
A         1  2  3
  down   18 18 18
  across 18 18 18

> model3 <- lm(X ~ factor(A) * factor(B), data = data3)
> anova(model3)
Analysis of Variance Table

Response: X
                     Df Sum Sq Mean Sq  F value  Pr(>F)
factor(A)             1   48.7    48.7   2.8849 0.09246 .
factor(B)             2 8022.7  4011.4 237.7776 < 2e-16 ***
factor(A):factor(B)   2  185.9    93.0   5.5103 0.00534 **
Residuals           102 1720.8    16.9
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

• Note the use of ‘*’ in the model formula: it expands to both main effects plus the interaction
• The interaction is significant at a 5% level (or even at 1%)

Interaction plot

with(data3, interaction.plot(B, A, X, col = "blue"))

[Figure: interaction plot of the mean of X against B (levels 1, 2, 3), with separate lines for A = down and A = across]

Beyond the F-test

• We have rejected the null. . . now what?
• This is often only the beginning of a statistical analysis of this type
• Will be interested in more detailed inferences, e.g. CIs/tests about individual parameters
• You know enough to be able to work some of this out. . .
• . . . and later subjects will go into this in more detail (e.g. MAST30025)

Analysis of variance (ANOVA)
  Introduction
  One-way ANOVA
  Two-way ANOVA
  Two-way ANOVA with interaction
Hypothesis testing in regression
  Analysis of variance approach
Likelihood ratio tests

Recap of simple linear regression

• Y a response variable, e.g. student’s grade in first-year calculus
• x a predictor variable, e.g. student’s high school mathematics mark
• Data: pairs (x1, y1), . . . , (xn, yn)
• Linear regression model:
$$Y_i = \alpha + \beta(x_i - \bar{x}) + \varepsilon_i$$
where εi ∼ N(0, σ²) is a random error
• Note: α here plays the same role as α0 from Module 5. We have dropped the ‘0’ subscript for convenience, and also to avoid confusion with its use to denote null hypotheses.
• The MLE (and OLS) estimators are:
$$\hat\alpha = \bar{Y}, \qquad \hat\beta = \frac{\sum_{i=1}^{n} Y_i(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left[Y_i - \hat\alpha - \hat\beta(x_i - \bar{x})\right]^2$$
• We also derived:
$$\hat\alpha \sim N\!\left(\alpha, \frac{\sigma^2}{n}\right), \qquad \hat\beta \sim N\!\left(\beta, \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right), \qquad \frac{(n-2)\hat\sigma^2}{\sigma^2} = \sum_{i=1}^{n}\left[\frac{Y_i - \hat\alpha - \hat\beta(x_i - \bar{x})}{\sigma}\right]^2 \sim \chi^2_{n-2}$$
• From these we obtain,
$$T_\alpha = \frac{\hat\alpha - \alpha}{\hat\sigma/\sqrt{n}} \sim t_{n-2}, \qquad T_\beta = \frac{\hat\beta - \beta}{\hat\sigma\big/\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \sim t_{n-2}$$
• We used these previously to construct confidence intervals
• We can also use them to construct hypothesis tests
• For example, to test H0: β = β0 versus H1: β ≠ β0 (or β > β0 or β < β0), we use Tβ as the test statistic

Example: testing the slope parameter (β)

• Data: 10 pairs of scores on a preliminary test and a final exam
• Estimates: α̂ = 81.3, β̂ = 0.742, σ̂² = 27.21
• Test H0: β = 0 versus H1: β ≠ 0 with a 1% significance level
• Reject H0 if: |Tβ| ≥ 3.36 (0.995 quantile of t8)
• For the observed data,
$$t_\beta = \frac{0.742}{\sqrt{27.21/756.1}} = 3.91$$
so we reject H0, concluding there is sufficient evidence that the slope differs from zero.

Note regarding the intercept parameter (α)

• Software packages (such as R) will typically fit the model:
$$Y_i = \alpha + \beta x_i + \varepsilon_i$$
• This is equivalent to
$$Y_i = \alpha^* + \beta(x_i - \bar{x}) + \varepsilon_i, \quad\text{where}\quad \alpha = \alpha^* - \beta\bar{x}$$
• The formulation with the centred predictor is easier to examine theoretically
• We saw that α̂* = Ȳ, and α̂ = Ȳ − β̂x̄
• α̂ and α̂* are rarely of direct interest

Use R to fit the regression model for the slope example:

> m1 <- lm(final_exam ~ prelim_test)
> summary(m1)

Call:
lm(formula = final_exam ~ prelim_test)
Residuals:
Min 1Q Median 3Q Max
-6.883 -3.264 -0.530 3.438 8.470

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.6147 13.0622 2.344 0.04714 *
prelim_test 0.7421 0.1897 3.912 0.00447 **
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 5.217 on 8 degrees of freedom
Multiple R-Squared: 0.6567, Adjusted R-squared: 0.6137
F-statistic: 15.3 on 1 and 8 DF, p-value: 0.004471
The t-value and the p-value are for testing H0: α = 0 and H0: β = 0 respectively.

Interpreting the R output
• Usually most interested in testing H0 : β = 0 versus H1 : β ̸= 0
• If we reject H0 then we conclude there is sufficient evidence of (at
least) a linear relationship between the mean response and x
• In the example,
$$t = \frac{0.7421}{0.1897} = 3.912$$
• This test statistic has a t-distribution with 10 − 2 = 8 degrees of freedom, and the associated p-value is 0.00447 < 0.05, so at the 5% level of significance we reject H0
• It is also possible to represent this test using an ANOVA table

Deriving the variance decomposition formula

• Independent pairs (x1, Y1), . . . , (xn, Yn)
• Parameter estimates,
$$\hat\alpha = \bar{Y}, \qquad \hat\beta = \frac{\sum_{i=1}^{n} Y_i(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
• Fitted value (estimated mean),
$$\hat{Y}_i = \bar{Y} + \hat\beta(x_i - \bar{x})$$
• Do the ‘add and subtract’ trick again:
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + 2\sum_{i=1}^{n} (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})$$
• Deal with the cross-product term,
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) = \sum_{i=1}^{n} \left[Y_i - \bar{Y} - \hat\beta(x_i - \bar{x})\right] \hat\beta(x_i - \bar{x}) = \hat\beta \sum_{i=1}^{n} \left[Y_i - \bar{Y} - \hat\beta(x_i - \bar{x})\right](x_i - \bar{x})$$
$$= \hat\beta \left[\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x}) - \hat\beta\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = \hat\beta \left[\sum_{i=1}^{n} Y_i(x_i - \bar{x}) - \hat\beta\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = 0$$
by the definition of β̂.
• That gives us,
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
• We can write this as follows,
$$SS(TO) = SS(E) + SS(R)$$
where SS(R) is the regression SS or model SS
• The regression SS quantifies the variation due to the straight line
• The error SS quantifies the variation around the fitted line
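As a closing check, the decomposition and its link to the earlier t-test can be verified in R. The vectors below are simulated stand-ins for prelim_test and final_exam (with the real exam data, F = 15.3 = 3.912², matching the summary output above):

# Verify SS(TO) = SS(E) + SS(R) and that F = t^2 for the slope
set.seed(4)
x <- rnorm(10, mean = 70, sd = 9)
y <- 30 + 0.7 * x + rnorm(10, sd = 5)

m <- lm(y ~ x)
ss_to <- sum((y - mean(y))^2)
ss_e  <- sum(residuals(m)^2)
ss_r  <- sum((fitted(m) - mean(y))^2)
ss_to - (ss_e + ss_r)                       # zero, up to rounding error

f <- (ss_r / 1) / (ss_e / (10 - 2))         # F statistic with 1 and 8 df
t_val <- summary(m)$coefficients["x", "t value"]
c(f, t_val^2)                               # equal: F = t^2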