Analysis of variance
(Module 8)
Statistics (MAST20005) & Elements of Statistics (MAST90058) Semester 2, 2022
1 Analysis of variance (ANOVA)
1.1 Introduction
1.2 One-way ANOVA
1.3 Two-way ANOVA
1.4 Two-way ANOVA with interaction
2 Hypothesis testing in regression
2.1 Analysis of variance approach
3 Likelihood ratio tests
Aims of this module
• Introduce the analysis of variance technique, which builds upon the variance decomposition ideas in previous modules.
• Revisit linear regression and apply the ideas of hypothesis testing and analysis of variance.
• Discuss ways to derive optimal hypothesis tests.
• Analysis of variance (ANOVA). Comparisons of more than two groups
• Regression. Hypothesis testing for simple linear regression
• Likelihood ratio tests. A method for deriving the best test for a given problem
1 Analysis of variance (ANOVA)

1.1 Introduction
Analysis of variance: introduction
• Initial aim: compare the means of more than two populations
• Broader and more advanced aims:
– Explore components of variation
– Evaluate the fit of a (general) linear model
• Formulated as hypothesis tests
• Referred to as analysis of variance, or ANOVA for short
• Involves comparing different summaries of variation
• Related to the ‘analysis of variance’ and ‘variance decomposition’ formulae we derived previously
Example: large variation between groups
[Figure: boxplots of the combined data (‘All groups’) and of Groups 1–5 separately, showing large variation between the group means]
Example: smaller variation between groups
[Figure: boxplots of the combined data (‘All groups’) and of Groups 1–5 separately, showing smaller variation between the group means]
1.2 One-way ANOVA
ANOVA: setting it up
• We have random samples from k populations, each having a normal distribution
• We sample ni iid observations from the ith population, which has mean μi
• All populations assumed to have the same variance, σ2
• Question of interest: do the populations all have the same mean?
• Hypotheses:
H0: μ1 = μ2 = ··· = μk = μ  versus  H1: H̄0

(H̄0 means ‘not H0’)
• This model is known as a one-way ANOVA, or single-factor ANOVA

Notation
Population      Sample                      Statistics
N(μ1, σ²)       X11, X12, …, X1n1           X̄1·, S1²
N(μ2, σ²)       X21, X22, …, X2n2           X̄2·, S2²
⋮               ⋮                           ⋮
N(μk, σ²)       Xk1, Xk2, …, Xknk           X̄k·, Sk²
Overall                                     X̄··
Sum of squares (definitions)
• Notation:

n = n1 + ··· + nk  (total sample size)
X̄i· = (1/ni) Σj Xij  (group means)
X̄·· = (1/n) Σi Σj Xij = (1/n) Σi ni X̄i·  (grand mean)

• We now define statistics each called a sum of squares (SS)
• The total SS is:

SS(TO) = Σi Σj (Xij − X̄··)²  (sums over i = 1,…,k and j = 1,…,ni)

• The treatment SS, or between groups SS, is:

SS(T) = Σi Σj (X̄i· − X̄··)² = Σi ni (X̄i· − X̄··)²

• The error SS, or within groups SS, is:

SS(E) = Σi Σj (Xij − X̄i·)² = Σi (ni − 1) Si²
Analysis of variance decomposition
• It turns out that:

SS(TO) = SS(T) + SS(E)

• This is similar to the analysis of variance formulae we derived earlier, in simpler scenarios (iid model, regression)
• We will use this relationship as a basis to derive a hypothesis test
• Let’s first prove the relationship. . .
• Start with the ‘add and subtract’ trick:

SS(TO) = Σi Σj (Xij − X̄··)²
       = Σi Σj (Xij − X̄i· + X̄i· − X̄··)²
       = Σi Σj (Xij − X̄i·)² + Σi Σj (X̄i· − X̄··)² + 2 Σi Σj (Xij − X̄i·)(X̄i· − X̄··)
       = SS(E) + SS(T) + CP

• The cross-product term is:

CP = 2 Σi Σj (Xij − X̄i·)(X̄i· − X̄··)
   = 2 Σi (X̄i· − X̄··) Σj (Xij − X̄i·)
   = 2 Σi (X̄i· − X̄··)(ni X̄i· − ni X̄i·) = 0

• Thus, we have:

SS(TO) = SS(T) + SS(E)

Sampling distribution of SS(E)
• The sample variance from the ith group, Si², is an unbiased estimator of σ², and we know that

(ni − 1)Si²/σ² ∼ χ²(ni − 1)

• The samples from each group are independent, so we can usefully combine them:

Σi (ni − 1)Si²/σ² = SS(E)/σ² ∼ χ²(n − k)

• Note that: (n1 − 1) + (n2 − 1) + ··· + (nk − 1) = n − k
• This also gives us an unbiased pooled estimator of σ²,

σ̂² = SS(E)/(n − k)

• These results are true irrespective of whether H0 is true or not

Null sampling distribution of SS(TO)
• If we assume H0, we can derive simple expressions for the sampling distributions of the other sums of squares
• The combined data would be a sample of size n from N(μ, σ²). Hence SS(TO)/(n − 1) is an unbiased estimator of σ² and

SS(TO)/σ² ∼ χ²(n − 1)

Null sampling distribution of SS(T)
• Under H0, we have X̄i· ∼ N(μ, σ²/ni)
• X̄1·, X̄2·, …, X̄k· are independent
• (Can think of this as a sample of sample means, and then think about what its variance estimator is)
• It is possible to show that (proof not shown):

SS(T)/σ² = Σi ni (X̄i· − X̄··)²/σ² ∼ χ²(k − 1)

and that this is independent of SS(E)
Null sampling distributions
In summary, under H0:

SS(TO)/σ² = SS(E)/σ² + SS(T)/σ²

SS(TO)/σ² ∼ χ²(n − 1),  SS(E)/σ² ∼ χ²(n − k),  SS(T)/σ² ∼ χ²(k − 1)

and SS(E) and SS(T) are independent.
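These results can also be checked empirically. A minimal simulation sketch in R (the values of k, ni, μ and σ are arbitrary choices for illustration): generate many datasets under H0 and compare the average of SS(T)/σ² with k − 1, the mean of the χ²(k − 1) distribution.

# Simulate under H0 and check that SS(T)/sigma^2 behaves like chi^2_{k-1}
set.seed(1)
k <- 5; ni <- 7; n <- k * ni        # five groups of seven observations
mu <- 100; sigma <- 10              # common mean and SD under H0
ss_t <- replicate(10000, {
    x <- matrix(rnorm(n, mu, sigma), ncol = k)   # each column is one group
    sum(ni * (colMeans(x) - mean(x))^2)          # SS(T)
})
mean(ss_t / sigma^2)   # should be close to k - 1 = 4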
H0 is true

[Figure: boxplots of the combined data and of Groups 1–5 when H0 is true; the group means look similar]

H1 is true

[Figure: boxplots of the combined data and of Groups 1–5 when H1 is true; the group means clearly differ]

SS(T) under H1
• What happens if H1 is true?
• The population means differ, which will make SS(T) larger
• Let’s make this precise. . .
• Let μ̄ = (1/n) Σi ni μi, and then,

E[SS(T)] = E[Σi ni (X̄i· − X̄··)²] = E[Σi ni X̄i·² − n X̄··²]
         = Σi ni E(X̄i·²) − n E(X̄··²)
         = Σi ni [var(X̄i·) + E(X̄i·)²] − n [var(X̄··) + E(X̄··)²]
         = Σi ni [σ²/ni + μi²] − n [σ²/n + μ̄²]
         = (k − 1)σ² + Σi ni (μi − μ̄)²

• Under H0 the second term is zero and we have,

E(SS(T)/(k − 1)) = σ²

• Otherwise (under H1), the second term is positive and gives,

E(SS(T)/(k − 1)) > σ²

• In contrast, we always have,

E(SS(E)/(n − k)) = σ²

F-test statistic
• This motivates using the following as our test statistic:
F = [SS(T)/(k − 1)] / [SS(E)/(n − k)]
• Under H0, we have F ∼ F(k − 1, n − k), since it is the ratio of two independent χ² random variables, each divided by its degrees of freedom
• Under H1, the numerator will tend to be larger
• Therefore, reject H0 if F > c, where c is the 1 − α quantile of F(k − 1, n − k)
• This is known as an F-test
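To make this concrete, here is a minimal R sketch that carries out the F-test directly from the sums of squares. The vector and function names (y, g, one_way_anova) are hypothetical, for illustration only; anova(), used in the example below, does the same computation for us.

# One-way ANOVA F-test from first principles.
# Assumes a response vector `y` and a grouping variable `g` (hypothetical names).
one_way_anova <- function(y, g) {
    g <- factor(g)
    n <- length(y); k <- nlevels(g)
    ybar_i <- tapply(y, g, mean)               # group means
    ni <- tapply(y, g, length)                 # group sizes
    ss_t <- sum(ni * (ybar_i - mean(y))^2)     # between-groups SS
    ss_e <- sum((y - ybar_i[g])^2)             # within-groups SS
    f <- (ss_t / (k - 1)) / (ss_e / (n - k))   # F statistic
    c(F = f, p = pf(f, k - 1, n - k, lower.tail = FALSE))
}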
ANOVA table
The test quantities are often summarised using an ANOVA table:

Source      df      SS       MS                       F
Treatment   k − 1   SS(T)    MS(T) = SS(T)/(k − 1)    MS(T)/MS(E)
Error       n − k   SS(E)    MS(E) = SS(E)/(n − k)
Total       n − 1   SS(TO)

• MS = ‘Mean square’
• σ̂² = MS(E) is an unbiased estimator of σ²
Example (one-way ANOVA)
Force required to pull out window studs in 5 positions on a car window.
> head(data1)
  Position Force
1        1    92
2        1    90
3        1    87
4        1   105
5        1    86
6        1    83

> table(data1$Position)
 1 2 3 4 5
 7 7 7 7 7

[Figure: boxplots of Force by Position]
> model1 <- lm(Force ~ factor(Position), data = data1)
> anova(model1)
Analysis of Variance Table
Response: Force
Df Sum Sq Mean Sq F value Pr(>F)
factor(Position) 4 16672.1 4168.0 44.202 3.664e-12 ***
Residuals 30 2828.9 94.3
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
• Need to use factor() to denote categorical variables
• R doesn’t provide a ‘Total’ row, but we don’t need it
• Residuals is the ‘Error’ row
• Pr(>F) is the p-value for the F-test
We conclude that the mean force required to pull out the window studs varies between the 5 positions on the car window (p-value ≈ 3.7 × 10⁻¹², well below 0.01)
This was obvious from the boxplots: positions 1 & 2 are quite different from 3, 4 & 5
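As a quick check, the critical value and p-value implied by this table can be recovered from the F distribution functions:

qf(0.95, df1 = 4, df2 = 30)                        # 5% critical value, about 2.69
pf(44.202, df1 = 4, df2 = 30, lower.tail = FALSE)  # reproduces Pr(>F) = 3.664e-12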
1.3 Two-way ANOVA
Two factors
• In one-way ANOVA, the observations were partitioned into k groups
• In other words, they were defined by a single categorical variable (‘factor’)
• What if we had two such variables?
• We can extend the procedure to give two-way ANOVA, or two-factor ANOVA
• For example, the fuel consumption of a car may depend on type of petrol and the brand of tyres
Two-way ANOVA: setting it up
• Factor 1 has a levels, Factor 2 has b levels
• Suppose we have exactly one observation per factor combination
• Observe Xij with factor 1 at level i and factor 2 at level j
• Gives a total of n = ab observations
• Assume Xij ∼ N(μij, σ²), i = 1,…,a, j = 1,…,b, and that these are independent
• Consider the model:
μij = μ + αi + βj
with Σi αi = 0 and Σj βj = 0
• μ is an overall effect, αi is the effect of the ith row and βj is the effect of the jth column
• For example, with a = 4 and b = 4:

     1             2             3             4
1    μ+α1+β1      μ+α1+β2      μ+α1+β3      μ+α1+β4
2    μ+α2+β1      μ+α2+β2      μ+α2+β3      μ+α2+β4
3    μ+α3+β1      μ+α3+β2      μ+α3+β3      μ+α3+β4
4    μ+α4+β1      μ+α4+β2      μ+α4+β3      μ+α4+β4
• We are usually interested in H0A: α1 = α2 = ··· = αa = 0 or H0B: β1 = β2 = ··· = βb = 0
• Let
X̄·· = (1/(ab)) Σi Σj Xij,  X̄i· = (1/b) Σj Xij,  X̄·j = (1/a) Σi Xij

• Arguing as before,

SS(TO) = Σi Σj (Xij − X̄··)²
       = Σi Σj [(X̄i· − X̄··) + (X̄·j − X̄··) + (Xij − X̄i· − X̄·j + X̄··)]²
       = b Σi (X̄i· − X̄··)² + a Σj (X̄·j − X̄··)² + Σi Σj (Xij − X̄i· − X̄·j + X̄··)²
       = SS(A) + SS(B) + SS(E)

(the cross-product terms all vanish, as in the one-way case)
• If both α1 = ··· = αa = 0 and β1 = ··· = βb = 0, then we have SS(A)/σ² ∼ χ²(a − 1), SS(B)/σ² ∼ χ²(b − 1) and SS(E)/σ² ∼ χ²((a − 1)(b − 1)), and these variables are independent (proof not shown)
• Reject H0A: α1 = ··· = αa = 0 at significance level α if:

FA = [SS(A)/(a − 1)] / [SS(E)/((a − 1)(b − 1))] > c

where c is the 1 − α quantile of F(a − 1, (a − 1)(b − 1))
• Reject H0B: β1 = ··· = βb = 0 at significance level α if:

FB = [SS(B)/(b − 1)] / [SS(E)/((a − 1)(b − 1))] > c

where c is the 1 − α quantile of F(b − 1, (a − 1)(b − 1))
ANOVA table

Source     df               SS       MS                               F
Factor A   a − 1            SS(A)    MS(A) = SS(A)/(a − 1)            MS(A)/MS(E)
Factor B   b − 1            SS(B)    MS(B) = SS(B)/(b − 1)            MS(B)/MS(E)
Error      (a − 1)(b − 1)   SS(E)    MS(E) = SS(E)/((a − 1)(b − 1))
Total      ab − 1           SS(TO)
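The sums of squares translate directly into R. A minimal sketch, assuming the observations are arranged in an a × b numeric matrix x with one observation per cell (the function name two_way_ss is ours, for illustration; anova() does this for us in practice):

# Two-way ANOVA sums of squares for an a x b matrix `x` (one observation per cell)
two_way_ss <- function(x) {
    a <- nrow(x); b <- ncol(x)
    xbar <- mean(x)                          # grand mean
    ri <- rowMeans(x); cj <- colMeans(x)     # row (factor A) and column (factor B) means
    ss_a <- b * sum((ri - xbar)^2)
    ss_b <- a * sum((cj - xbar)^2)
    ss_e <- sum((x - outer(ri, cj, "+") + xbar)^2)   # Xij - Xbar_i. - Xbar_.j + Xbar_..
    c(SS_A = ss_a, SS_B = ss_b, SS_E = ss_e, SS_TO = sum((x - xbar)^2))
}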
Example (two-way ANOVA)
Data on fuel consumption for three types of car (A) and four types of fuel (B).
> head(data2)
  Car Fuel Consumption
1   1    1          16
2   1    2          18
3   1    3          21
4   1    4          21
5   2    1          14
6   2    2          15
> model2 <- lm(Consumption ~ factor(Car) + factor(Fuel),
+ data = data2)
> anova(model2)
Analysis of Variance Table
Response: Consumption
             Df Sum Sq Mean Sq F value   Pr(>F)
factor(Car)   2     24 12.0000      18 0.002915 **
factor(Fuel)  3     30 10.0000      15 0.003401 **
Residuals     6      4  0.6667
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
From this we conclude there is a clear difference in fuel consumption between cars (we reject H0A: α1 = α2 = α3 = 0) and also between fuels (we reject H0B: β1 = β2 = β3 = β4 = 0).
[Figure: plot of Consumption for the different cars and fuels]
1.4 Two-way ANOVA with interaction
Interaction terms
• In the previous example we assumed an additive model:
μij = μ + αi + βj
• This assumes, for example, that the relative effect of petrol 1 is the same for all cars.
• If it is not true, then there is a statistical interaction (or simply an interaction) between the factors
• A more general model, which includes interactions, is:
μij = μ + αi + βj + γij
where γij is the interaction term associated with combination (i,j).
• In addition to our previous assumptions, we also impose:
Σi γij = 0 (for each j), and Σj γij = 0 (for each i)
• The terms αi and βj are called main effects
• When written out as a table they are also often referred to as the row effects and column effects respectively
• Writing this out as a table:

     1                 2                 3                 4
1    μ+α1+β1+γ11      μ+α1+β2+γ12      μ+α1+β3+γ13      μ+α1+β4+γ14
2    μ+α2+β1+γ21      μ+α2+β2+γ22      μ+α2+β3+γ23      μ+α2+β4+γ24
3    μ+α3+β1+γ31      μ+α3+β2+γ32      μ+α3+β3+γ33      μ+α3+β4+γ34
4    μ+α4+β1+γ41      μ+α4+β2+γ42      μ+α4+β3+γ43      μ+α4+β4+γ44
• We are now interested in testing whether:
– the row effects are zero
– the column effects are zero
– the interactions are zero (do this first!)
• To make inferences about the interactions we need more than one observation per cell
• Let Xijk, i = 1,…,a, j = 1,…,b, k = 1,…,c be the kth observation for combination (i,j)
• Define the sample means:

X̄ij· = (1/c) Σk Xijk
X̄i·· = (1/(bc)) Σj Σk Xijk
X̄·j· = (1/(ac)) Σi Σk Xijk
X̄··· = (1/(abc)) Σi Σj Σk Xijk
• and, as before,

SS(TO) = Σi Σj Σk (Xijk − X̄···)²
       = bc Σi (X̄i·· − X̄···)² + ac Σj (X̄·j· − X̄···)²
         + c Σi Σj (X̄ij· − X̄i·· − X̄·j· + X̄···)²
         + Σi Σj Σk (Xijk − X̄ij·)²
       = SS(A) + SS(B) + SS(AB) + SS(E)

Test statistics
• Familiar arguments show that to test

H0AB: γij = 0, i = 1,…,a, j = 1,…,b

we may use the statistic

F = [SS(AB)/((a − 1)(b − 1))] / [SS(E)/(ab(c − 1))]

which, under H0AB, has an F distribution with (a − 1)(b − 1) and ab(c − 1) degrees of freedom.
• Similarly, to test

H0A: αi = 0, i = 1,…,a

we may use the statistic

F = [SS(A)/(a − 1)] / [SS(E)/(ab(c − 1))]

which, under H0A, has an F distribution with (a − 1) and ab(c − 1) degrees of freedom.
• To test

H0B: βj = 0, j = 1,…,b

we may use the statistic

F = [SS(B)/(b − 1)] / [SS(E)/(ab(c − 1))]

which, under H0B, has an F distribution with (b − 1) and ab(c − 1) degrees of freedom.
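In R, the three p-values follow from pf() once the sums of squares are in hand. A minimal sketch, assuming ss_a, ss_b, ss_ab, ss_e and the dimensions a, b and n_rep (the number of replicates per cell, c in the notation above) are already defined:

df_e <- a * b * (n_rep - 1)            # error df, ab(c - 1)
ms_e <- ss_e / df_e
p_AB <- pf((ss_ab / ((a - 1) * (b - 1))) / ms_e,
           (a - 1) * (b - 1), df_e, lower.tail = FALSE)
p_A  <- pf((ss_a / (a - 1)) / ms_e, a - 1, df_e, lower.tail = FALSE)
p_B  <- pf((ss_b / (b - 1)) / ms_e, b - 1, df_e, lower.tail = FALSE)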
ANOVA table

Source      df               SS        MS                                F
Factor A    a − 1            SS(A)     MS(A) = SS(A)/(a − 1)             MS(A)/MS(E)
Factor B    b − 1            SS(B)     MS(B) = SS(B)/(b − 1)             MS(B)/MS(E)
Factor AB   (a − 1)(b − 1)   SS(AB)    MS(AB) = SS(AB)/((a − 1)(b − 1))  MS(AB)/MS(E)
Error       ab(c − 1)        SS(E)     MS(E) = SS(E)/(ab(c − 1))
Total       abc − 1          SS(TO)
Example (two-way ANOVA with interaction)
• Six groups of 18 people
• Each person takes an arithmetic test: the task is to add three numbers together
• The numbers are presented either in a down array or an across array; this defines 2 levels of factor A
• The numbers have either one, two or three digits; this defines 3 levels of factor B
• The response variable, X, is the average number of problems completed correctly over two 90-second sessions
• Example of adding one-digit numbers in an across array: 1 2 3 (presented in a row)
• Example of adding two-digit numbers in a down array (presented in a column; layout not reproduced here)

> head(data3)
     A B    X
1 down 1 19.5
2 down 1 18.5
3 down 1 32.0
4 down 1 21.5
5 down 1 28.5
6 down 1 33.0

> table(data3[, 1:2])
        B
A         1  2  3
  down   18 18 18
  across 18 18 18
> model3 <- lm(X ~ factor(A) * factor(B), data = data3)
> anova(model3)
Analysis of Variance Table
Response: X
Df Sum Sq Mean Sq F value Pr(>F)
factor(A) 1 48.7 48.7 2.8849 0.09246 .
factor(B) 2 8022.7 4011.4 237.7776 < 2e-16 ***
factor(A):factor(B) 2 185.9 93.0 5.5103 0.00534 **
Residuals 102 1720.8 16.9
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Note the use of ‘*’ in the model formula.
The interaction is significant at a 5% level (or even at 1%).
Interaction plot
with(data3, interaction.plot(B, A, X, col = "blue"))
[Figure: interaction plot of mean X against B, with a separate line for each level of A]
Beyond the F-test
• We have rejected the null. . . now what?
• This is often only the beginning of a statistical analysis of this type of data
• Will be interested in more detailed inferences, e.g. CIs/tests about individual parameters
• You know enough to be able to work some of this out. . .
• . . . and later subjects will go into this in more detail (e.g. MAST30025)
2 Hypothesis testing in regression
Recap of simple linear regression
• Y a response variable, e.g. student’s grade in first-year calculus
• x a predictor variable, e.g. student’s high school mathematics mark
• Data: pairs (x1, y1), …, (xn, yn)
• Linear regression model:

Yi = α + β(xi − x̄) + εi

where εi ∼ N(0, σ²) is a random error
• Note: α here plays the same role as α0 from Module 5. We have dropped the ‘0’ subscript for convenience, and also to avoid confusion with its use to denote null hypotheses.
• The MLE (and OLS) estimators are:

α̂ = Ȳ,  β̂ = Σi Yi(xi − x̄) / Σi (xi − x̄)²

• We also derived:

α̂ ∼ N(α, σ²/n)
β̂ ∼ N(β, σ²/Σi (xi − x̄)²)
σ̂² = (1/(n − 2)) Σi [Yi − α̂ − β̂(xi − x̄)]²
(n − 2)σ̂²/σ² ∼ χ²(n − 2)

• From these we obtain,

Tα = (α̂ − α) / (σ̂/√n) ∼ t(n − 2)
Tβ = (β̂ − β) / (σ̂/√(Σi (xi − x̄)²)) ∼ t(n − 2)

• We used these previously to construct confidence intervals
• We can also use them to construct hypothesis tests
• For example, to test H0: β = β0 versus H1: β ≠ β0 (or β > β0 or β < β0), we use Tβ as the test statistic
Example: testing the slope parameter (β)
• Data: 10 pairs of scores on a preliminary test and a final exam
• Estimates: αˆ = 81.3, βˆ = 0.742, σˆ2 = 27.21
• Test H0: β = 0 versus H1: β ≠ 0 with a 1% significance level
• Reject H0 if:

|Tβ| > 3.36  (0.995 quantile of t8)

• For the observed data, with Σ(xi − x̄)² = 756.1,

tβ = 0.742 / √(27.21/756.1) = 3.91
so we reject H0, concluding there is sufficient evidence that the slope differs from zero.
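These numbers are easy to reproduce in R from the quoted summary statistics (Sxx here denotes Σ(xi − x̄)² = 756.1):

beta_hat <- 0.742; sigma2_hat <- 27.21; Sxx <- 756.1
t_beta <- beta_hat / sqrt(sigma2_hat / Sxx)   # observed statistic, about 3.91
qt(0.995, df = 8)                             # critical value, about 3.36
2 * pt(t_beta, df = 8, lower.tail = FALSE)    # two-sided p-value, about 0.0045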
Note regarding the intercept parameter (α)
• Software packages (such as R) will typically fit the model:

Yi = α + βxi + εi

• This is equivalent to

Yi = α∗ + β(xi − x̄) + εi

where α = α∗ − βx̄
• The formulation Yi = α∗ + β(xi − x̄) + εi is easier to examine theoretically
• We saw that

α̂∗ = Ȳ,  and  α̂ = Ȳ − β̂x̄

• α̂ or α̂∗ are rarely of direct interest

Using R
Use R to fit the regression model for the slope example:
> m1 <- lm(final_exam ~ prelim_test)
> summary(m1)
Call:
lm(formula = final_exam ~ prelim_test)
Residuals:
Min 1Q Median 3Q Max
-6.883 -3.264 -0.530 3.438 8.470
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.6147 13.0622 2.344 0.04714 *
prelim_test 0.7421 0.1897 3.912 0.00447 **
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 5.217 on 8 degrees of freedom
Multiple R-Squared: 0.6567, Adjusted R-squared: 0.6137
F-statistic: 15.3 on 1 and 8 DF, p-value: 0.004471
In each row, the t value and Pr(>|t|) are for testing H0: α = 0 (intercept row) and H0: β = 0 (slope row) respectively.
Interpreting the R output
• Usually most interested in testing H0 : β = 0 versus H1 : β ̸= 0
• If we reject H0 then we conclude there is sufficient evidence of (at least) a linear relationship between the mean
response and x
• In the example,
t = 0.7421 / 0.1897 = 3.912
• This test statistic has a t-distribution with 10−2 = 8 degrees of freedom, and the associated p-value is 0.00447 < 0.05 so at the 5% level of significance we reject H0
• It is also possible to represent this test using an ANOVA table
2.1 Analysis of variance approach
Deriving the variance decomposition formula
• Independent pairs (x1, Y1), …, (xn, Yn)
• Parameter estimates,

α̂ = Ȳ,  β̂ = Σi Yi(xi − x̄) / Σi (xi − x̄)²

• Fitted value (estimated mean),

Ŷi = Ȳ + β̂(xi − x̄)

• Do the ‘add and subtract’ trick again:

Σi (Yi − Ȳ)² = Σi (Yi − Ŷi + Ŷi − Ȳ)²
            = Σi (Yi − Ŷi)² + Σi (Ŷi − Ȳ)² + 2 Σi (Yi − Ŷi)(Ŷi − Ȳ)

• Deal with the cross-product term,

Σi (Yi − Ŷi)(Ŷi − Ȳ) = Σi [Yi − Ȳ − β̂(xi − x̄)] β̂(xi − x̄)
                     = β̂ Σi [Yi − Ȳ − β̂(xi − x̄)](xi − x̄)
                     = β̂ [Σi Yi(xi − x̄) − β̂ Σi (xi − x̄)²]
                     = 0

(the last step follows from the definition of β̂)
• That gives us,

Σi (Yi − Ȳ)² = Σi (Yi − Ŷi)² + Σi (Ŷi − Ȳ)²

• We can write this as follows,

SS(TO) = SS(E) + SS(R)

where SS(R) is the regression SS or model SS
• The regression SS quantifies the variation due to the straight line
• The error SS quantifies the variation around the straight line
• To complete the specification,

MS(R) = SS(R)/1 = Σi (Ŷi − Ȳ)²
MS(E) = SS(E)/(n − 2) = (1/(n − 2)) Σi (Yi − Ŷi)² = σ̂²

• Then we have the test statistic,

F = MS(R)/MS(E) ∼ F(1, n − 2)

under H0: β = 0

ANOVA table

Source   df      SS       MS                       F
Model    1       SS(R)    MS(R) = SS(R)/1          MS(R)/MS(E)
Error    n − 2   SS(E)    MS(E) = SS(E)/(n − 2)
Total    n − 1   SS(TO)
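The decomposition is easy to verify numerically for the model m1 fitted earlier:

# Check SS(TO) = SS(E) + SS(R) for the fitted model m1
y <- final_exam
ss_to <- sum((y - mean(y))^2)
ss_e  <- sum(residuals(m1)^2)            # SS(E)
ss_r  <- sum((fitted(m1) - mean(y))^2)   # SS(R)
all.equal(ss_to, ss_e + ss_r)            # TRUE, up to numerical rounding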
> anova(m1)
Analysis of Variance Table
Response: final_exam
            Df Sum Sq Mean Sq F value   Pr(>F)
prelim_test  1 416.39  416.39  15.301 0.004471 **
Residuals    8 217.71   27.21
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
• The F-statistic tests the ‘significance of the regression’
• That is, H0: β = 0 versus H1: β ≠ 0
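Since the numerator has 1 degree of freedom, this F-test is equivalent to the two-sided t-test of H0: β = 0; the F statistic is the square of the t statistic (a general fact for simple linear regression), which we can confirm from the output:

3.912^2                                            # about 15.30, the F value above
pf(15.301, df1 = 1, df2 = 8, lower.tail = FALSE)   # 0.004471, the same p-value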
3 Likelihood ratio tests
Is there a ‘best’ test?
• We have examined a variety of commonly used tests