Analysis of variance
(Module 8)
Statistics (MAST20005) & Elements of Statistics (MAST90058)
School of Mathematics and Statistics, University of Melbourne
Semester 2, 2022
Aims of this module
• Introduce the analysis of variance technique, which builds upon the variance decomposition ideas in previous modules.
• Revisit linear regression and apply the ideas of hypothesis testing and analysis of variance.
• Discuss ways to derive optimal hypothesis tests.
• Analysis of variance (ANOVA): comparisons of more than two groups
• Regression: hypothesis testing for simple linear regression
• Likelihood ratio tests: a method for deriving the best test for a given problem
Analysis of variance (ANOVA) Introduction
One-way ANOVA
Two-way ANOVA
Two-way ANOVA with interaction
Hypothesis testing in regression
Analysis of variance approach
Likelihood ratio tests
Analysis of variance: introduction
• Initial aim: compare the means of more than two populations
• Broader and more advanced aims:
◦ Explore components of variation
◦ Evaluate the fit of a (general) linear model
• Formulated as hypothesis tests
• Referred to as analysis of variance, or ANOVA for short
• Involves comparing different summaries of variation
• Related to the ‘analysis of variance’ and ‘variance decomposition’ formulae we derived previously
Example: large variation between groups

[Figure: dot plots of the combined data ('All groups') and of Groups 1–5 separately; the group means are far apart relative to the spread within each group]

Example: smaller variation between groups

[Figure: dot plots of the combined data ('All groups') and of Groups 1–5 separately; the group means are close together relative to the spread within each group]
ANOVA: setting it up
• We have random samples from k populations, each having a normal distribution
• We sample ni iid observations from the ith population, which has mean μi
• All populations assumed to have the same variance, σ2
• Question of interest: do the populations all have the same mean?
• Hypotheses:
H0: μ1 = μ2 = ··· = μk = μ versus H1: H̄0 (H̄0 means 'not H0')
• This model is known as a one-way ANOVA, or single-factor ANOVA
Population   Sample                  Statistics
N(μ1, σ²)    X11, X12, …, X1n1       X̄1·, S1²
N(μ2, σ²)    X21, X22, …, X2n2       X̄2·, S2²
⋮
N(μk, σ²)    Xk1, Xk2, …, Xknk       X̄k·, Sk²
Overall                              X̄··

n = n1 + ··· + nk (total sample size)

X̄i· = (1/ni) Σ_{j=1}^{ni} Xij (group means)

X̄·· = (1/n) Σ_{i=1}^{k} Σ_{j=1}^{ni} Xij (grand mean)
Sum of squares (definitions)
• We now define statistics each called a sum of squares (SS)
• The total SS is:

SS(TO) = Σ_{i=1}^{k} Σ_{j=1}^{ni} (Xij − X̄··)²

• The treatment SS, or between groups SS, is:

SS(T) = Σ_{i=1}^{k} Σ_{j=1}^{ni} (X̄i· − X̄··)² = Σ_{i=1}^{k} ni (X̄i· − X̄··)²

• The error SS, or within groups SS, is:

SS(E) = Σ_{i=1}^{k} Σ_{j=1}^{ni} (Xij − X̄i·)² = Σ_{i=1}^{k} (ni − 1) Si²
Analysis of variance decomposition
• It turns out that:
SS(TO) = SS(T) + SS(E)
• This is similar to the analysis of variance formulae we derived
earlier, in simpler scenarios (iid model, regression model)
• We will use this relationship as a basis to derive a hypothesis test
• Let’s first prove the relationship. . .
• Start with the 'add and subtract' trick:

SS(TO) = Σ_{i=1}^{k} Σ_{j=1}^{ni} (Xij − X̄··)²
       = Σ_{i=1}^{k} Σ_{j=1}^{ni} (Xij − X̄i· + X̄i· − X̄··)²
       = Σ_{i=1}^{k} Σ_{j=1}^{ni} (Xij − X̄i·)² + Σ_{i=1}^{k} Σ_{j=1}^{ni} (X̄i· − X̄··)²
         + 2 Σ_{i=1}^{k} Σ_{j=1}^{ni} (Xij − X̄i·)(X̄i· − X̄··)
       = SS(E) + SS(T) + CP

• The cross-product term is zero:

CP = 2 Σ_{i=1}^{k} Σ_{j=1}^{ni} (Xij − X̄i·)(X̄i· − X̄··)
   = 2 Σ_{i=1}^{k} (X̄i· − X̄··) Σ_{j=1}^{ni} (Xij − X̄i·)
   = 2 Σ_{i=1}^{k} (X̄i· − X̄··)(ni X̄i· − ni X̄i·) = 0

• Thus, we have:

SS(TO) = SS(T) + SS(E)
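The decomposition can also be checked numerically. The following Python sketch (the group data are made up purely for illustration) computes the three sums of squares directly from their definitions:

```python
# Verify SS(TO) = SS(T) + SS(E) for one-way ANOVA on made-up data.
groups = [
    [92, 90, 87, 105, 86],   # hypothetical group 1
    [78, 81, 80, 79, 84],    # hypothetical group 2
    [88, 91, 90, 95, 89],    # hypothetical group 3
]

all_obs = [x for g in groups for x in g]
n = len(all_obs)
grand_mean = sum(all_obs) / n

# Total SS: squared deviations of every observation from the grand mean
ss_to = sum((x - grand_mean) ** 2 for x in all_obs)

# Treatment SS: n_i times the squared deviation of each group mean
# from the grand mean
ss_t = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Error SS: squared deviations within each group
ss_e = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

assert abs(ss_to - (ss_t + ss_e)) < 1e-9
```

Any choice of group data gives the same identity, since the cross-product term vanishes algebraically.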
Sampling distribution of SS(E)
• The sample variance from the ith group, Si2, is an unbiased
estimator of σ2 and we know that (ni − 1)Si2/σ2 ∼ χ2ni−1
• The samples from each group are independent, so we can usefully
combine them,
Σ_{i=1}^{k} (ni − 1) Si² / σ² = SS(E)/σ² ∼ χ²_{n−k}

• Note that: (n1 − 1) + (n2 − 1) + ··· + (nk − 1) = n − k
• This also gives us an unbiased pooled estimator of σ²,

σ̂² = SS(E) / (n − k)

• These results are true irrespective of whether H0 is true or not
Null sampling distribution of SS(TO)
• If we assume H0, we can derive simple expressions for the sampling
distributions of the other sums of squares
• The combined data would be a sample of size n from N(μ,σ2).
Hence SS(T O)/(n − 1) is an unbiased estimator of σ2 and
SS(TO)/σ² ∼ χ²_{n−1}
Null sampling distribution of SS(T)
• Under H0, we have X̄i· ∼ N(μ, σ²/ni)
• X̄1·, X̄2·, …, X̄k· are independent
• (Can think of this as a sample of sample means, and then think about what its variance estimator is)
• It is possible to show that (proof not shown):

Σ_{i=1}^{k} ni (X̄i· − X̄··)² / σ² = SS(T)/σ² ∼ χ²_{k−1}

and that this is independent of SS(E)
Null sampling distributions
In summary, under H0:
SS(TO) = SS(E) + SS(T)
SS(TO)/σ² ∼ χ²_{n−1},  SS(E)/σ² ∼ χ²_{n−k},  SS(T)/σ² ∼ χ²_{k−1},
and SS(E) and SS(T) are independent
H0 is true

[Figure: dot plots of the combined data and of Groups 1–5 separately; all groups have similar means]

H1 is true

[Figure: dot plots of the combined data and of Groups 1–5 separately; the group means differ visibly]
SS(T) under H1
• What happens if H1 is true?
• The population means differ, which will make SS(T) larger
• Let's make this precise…
• Let μ̄ = n⁻¹ Σ_{i=1}^{k} ni μi, and then,

E[SS(T)] = E[Σ_{i=1}^{k} ni (X̄i· − X̄··)²] = E[Σ_{i=1}^{k} ni X̄i·² − n X̄··²]
         = Σ_{i=1}^{k} ni E(X̄i·²) − n E(X̄··²)
         = Σ_{i=1}^{k} ni [var(X̄i·) + E(X̄i·)²] − n [var(X̄··) + E(X̄··)²]
         = Σ_{i=1}^{k} ni (σ²/ni + μi²) − n (σ²/n + μ̄²)
         = (k − 1)σ² + Σ_{i=1}^{k} ni (μi − μ̄)²

E[SS(T)] = (k − 1)σ² + Σ_{i=1}^{k} ni (μi − μ̄)²

• Under H0 the second term is zero and we have,

E[SS(T)/(k − 1)] = σ²

• Otherwise (under H1), the second term is positive and gives,

E[SS(T)/(k − 1)] > σ²

• In contrast, we always have,

E[SS(E)/(n − k)] = σ²
F-test statistic
• This motivates using the following as our test statistic:

F = [SS(T)/(k − 1)] / [SS(E)/(n − k)]

• Under H0, we have F ∼ F_{k−1, n−k}, since it is the ratio of independent χ² random variables, each divided by its degrees of freedom
• Under H1, the numerator will tend to be larger
• Therefore, reject H0 if F > c
• This is known as an F-test
ANOVA table
The test quantities are often summarised using an ANOVA table:
Source     Df     SS       MS                       F
Treatment  k − 1  SS(T)    MS(T) = SS(T)/(k − 1)    MS(T)/MS(E)
Error      n − k  SS(E)    MS(E) = SS(E)/(n − k)
Total      n − 1  SS(TO)

• MS = 'Mean square'
• σ̂² = MS(E) is an unbiased estimator of σ²
Example (one-way ANOVA)
Force required to pull out window studs in 5 positions on a car window.
> head(data1)
  Position Force
1        1    92
2        1    90
3        1    87
4        1   105
5        1    86
6        1    83
> table(data1$Position)
 1  2  3  4  5
 7  7  7  7  7
> model1 <- lm(Force ~ factor(Position), data = data1)
> anova(model1)
Analysis of Variance Table
Response: Force
Df Sum Sq Mean Sq F value Pr(>F)
factor(Position) 4 16672.1 4168.0 44.202 3.664e-12 ***
Residuals 30 2828.9 94.3
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
• Need to use factor() to denote categorical variables
• R doesn't provide a 'Total' row, but we don't need it
• Residuals is the 'Error' row
• Pr(>F) is the p-value for the F-test
We conclude that the mean force required to pull out the window studs varies between the 5 positions on the car window (e.g. p-value < 0.01)
This was obvious from the boxplots: positions 1 & 2 are quite different from 3, 4 & 5
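The F value in the R output can be reproduced by hand from the sums of squares in the table. A small Python check (using the Sum Sq values printed above):

```python
# Reproduce the F statistic from the one-way ANOVA table for the
# window-stud example.
ss_t, ss_e = 16672.1, 2828.9   # Sum Sq values from the R output
k, n = 5, 35                   # 5 positions, 7 observations each

ms_t = ss_t / (k - 1)          # mean square for treatment (~4168.0)
ms_e = ss_e / (n - k)          # mean square for error (~94.3)
f = ms_t / ms_e                # ~44.2, matching the R output

print(round(ms_t, 1), round(ms_e, 1), round(f, 1))
```

The tiny discrepancy in the last decimal place of F comes from the rounding of the printed Sum Sq values.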
Two factors
• In one-way ANOVA, the observations were partitioned into k groups
• In other words, they were defined by a single categorical variable (‘factor’)
• What if we had two such variables?
• We can extend the procedure to give two-way ANOVA, or
two-factor ANOVA
• For example, the fuel consumption of a car may depend on type of petrol and the brand of tyres
Two-way ANOVA: setting it up
• Factor 1 has a levels, Factor 2 has b levels
• Suppose we have exactly one observation per factor combination
• Observe Xij with factor 1 at level i and factor 2 at level j
• Gives a total of n = ab observations
• Assume Xij ∼ N(μij,σ2), i = 1,...,a, j = 1,...,b, and that these are independent
• Consider the model:
μij = μ + αi + βj
with Σ_{i=1}^{a} αi = 0 and Σ_{j=1}^{b} βj = 0
• μ is an overall effect, αi is the effect of the ith row and βj the effect of the jth column.
• We are usually interested in H0A: α1 = α2 = ··· = αa = 0 or H0B: β1 = β2 = ··· = βb = 0
• For example, with a = 4 and b = 4 the cell means are:

        j = 1       j = 2       j = 3       j = 4
i = 1   μ+α1+β1     μ+α1+β2     μ+α1+β3     μ+α1+β4
i = 2   μ+α2+β1     μ+α2+β2     μ+α2+β3     μ+α2+β4
i = 3   μ+α3+β1     μ+α3+β2     μ+α3+β3     μ+α3+β4
i = 4   μ+α4+β1     μ+α4+β2     μ+α4+β3     μ+α4+β4
X̄·· = (1/(ab)) Σ_{i=1}^{a} Σ_{j=1}^{b} Xij,  X̄i· = (1/b) Σ_{j=1}^{b} Xij,  X̄·j = (1/a) Σ_{i=1}^{a} Xij
• Arguing as before,

SS(TO) = Σ_{i=1}^{a} Σ_{j=1}^{b} (Xij − X̄··)²
       = Σ_{i=1}^{a} Σ_{j=1}^{b} [(X̄i· − X̄··) + (X̄·j − X̄··) + (Xij − X̄i· − X̄·j + X̄··)]²
       = b Σ_{i=1}^{a} (X̄i· − X̄··)² + a Σ_{j=1}^{b} (X̄·j − X̄··)²
         + Σ_{i=1}^{a} Σ_{j=1}^{b} (Xij − X̄i· − X̄·j + X̄··)²
       = SS(A) + SS(B) + SS(E)
• If both α1 = ··· = αa = 0 and β1 = ··· = βb = 0, then we have SS(A)/σ² ∼ χ²_{a−1}, SS(B)/σ² ∼ χ²_{b−1} and SS(E)/σ² ∼ χ²_{(a−1)(b−1)}, and these variables are independent (proof not shown)
• Reject H0A: α1 = ··· = αa = 0 at significance level α if:

FA = [SS(A)/(a − 1)] / [SS(E)/((a − 1)(b − 1))] > c

where c is the 1 − α quantile of F_{a−1, (a−1)(b−1)}
• Reject H0B: β1 = ··· = βb = 0 at significance level α if:

FB = [SS(B)/(b − 1)] / [SS(E)/((a − 1)(b − 1))] > c

where c is the 1 − α quantile of F_{b−1, (a−1)(b−1)}
ANOVA table
Source    Df              SS      MS                               F
Factor A  a − 1           SS(A)   MS(A) = SS(A)/(a − 1)            MS(A)/MS(E)
Factor B  b − 1           SS(B)   MS(B) = SS(B)/(b − 1)            MS(B)/MS(E)
Error     (a − 1)(b − 1)  SS(E)   MS(E) = SS(E)/((a − 1)(b − 1))
Total     ab − 1          SS(TO)
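As with the one-way case, the two-way decomposition can be verified numerically. A short Python sketch (the 3×4 grid of observations below is made up for illustration):

```python
# Verify SS(TO) = SS(A) + SS(B) + SS(E) for two-way ANOVA with one
# observation per cell, on a made-up a x b grid.
x = [
    [16, 18, 21, 21],   # row i = 1 (hypothetical data)
    [14, 15, 18, 17],   # row i = 2
    [15, 15, 18, 16],   # row i = 3
]
a, b = len(x), len(x[0])

grand = sum(sum(row) for row in x) / (a * b)
row_means = [sum(row) / b for row in x]
col_means = [sum(x[i][j] for i in range(a)) / a for j in range(b)]

ss_to = sum((x[i][j] - grand) ** 2 for i in range(a) for j in range(b))
ss_a = b * sum((m - grand) ** 2 for m in row_means)
ss_b = a * sum((m - grand) ** 2 for m in col_means)
ss_e = sum(
    (x[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(a) for j in range(b)
)

assert abs(ss_to - (ss_a + ss_b + ss_e)) < 1e-9
```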
Example (two-way ANOVA)
Data on fuel consumption for three types of car (A) and four types of fuel (B).
> head(data2)
  Car Fuel Consumption
1   1    1          16
2   1    2          18
3   1    3          21
4   1    4          21
5   2    1          14
6   2    2          15
[Figure: boxplots of Consumption (approximately 14–21) by car and by fuel]
> model2 <- lm(Consumption ~ factor(Car) + factor(Fuel),
+ data = data2)
> anova(model2)
Analysis of Variance Table
Response: Consumption
             Df Sum Sq Mean Sq F value   Pr(>F)
factor(Car)   2     24 12.0000      18 0.002915 **
factor(Fuel)  3     30 10.0000      15 0.003401 **
Residuals     6      4  0.6667
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
From this we conclude there is a clear difference in fuel consumption between cars (we reject H0A: α1 = α2 = α3 = 0) and also between fuels (we reject H0B: β1 = β2 = β3 = β4 = 0).
Interaction terms
• In the previous example we assumed an additive model: μij = μ + αi + βj
• This assumes, for example, that the relative effect of petrol 1 is the same for all cars.
• If it is not true, then there is a statistical interaction (or simply an interaction) between the factors
• A more general model, which includes interactions, is: μij = μ + αi + βj + γij
where γij is the interaction term associated with combination (i, j).
• In addition to our previous assumptions, we also impose:

Σ_{i=1}^{a} γij = 0 for each j, and Σ_{j=1}^{b} γij = 0 for each i
• The terms αi and βj are called main effects
• When written out as a table they are also often referred to as the row effects and column effects respectively
• Writing this out as a table:

        j = 1           j = 2           ···
i = 1   μ+α1+β1+γ11     μ+α1+β2+γ12     ···
i = 2   μ+α2+β1+γ21     μ+α2+β2+γ22     ···
i = 3   μ+α3+β1+γ31     μ+α3+β2+γ32     ···
i = 4   μ+α4+β1+γ41     μ+α4+β2+γ42     ···
• We are now interested in testing whether:
◦ the row effects are zero
◦ the column effects are zero
◦ the interactions are zero (do this first!)
• To make inferences about the interactions we need more than one observation per cell
• Let Xijk, i = 1,…,a, j = 1,…,b, k = 1,…,c be the kth observation for combination (i, j)
X̄i·· = (1/(bc)) Σ_{j=1}^{b} Σ_{k=1}^{c} Xijk
X̄·j· = (1/(ac)) Σ_{i=1}^{a} Σ_{k=1}^{c} Xijk
X̄ij· = (1/c) Σ_{k=1}^{c} Xijk
X̄··· = (1/(abc)) Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{c} Xijk
• and, as before,

SS(TO) = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{c} (Xijk − X̄···)²
       = bc Σ_{i=1}^{a} (X̄i·· − X̄···)² + ac Σ_{j=1}^{b} (X̄·j· − X̄···)²
         + c Σ_{i=1}^{a} Σ_{j=1}^{b} (X̄ij· − X̄i·· − X̄·j· + X̄···)²
         + Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{c} (Xijk − X̄ij·)²
       = SS(A) + SS(B) + SS(AB) + SS(E)
Test statistics
• Familiar arguments show that to test

H0AB: γij = 0, i = 1, …, a, j = 1, …, b

we may use the statistic

F = [SS(AB)/((a − 1)(b − 1))] / [SS(E)/(ab(c − 1))]

which, under H0AB, has an F distribution with (a − 1)(b − 1) and ab(c − 1) degrees of freedom.
• To test

H0A: αi = 0, i = 1, …, a

we may use the statistic

F = [SS(A)/(a − 1)] / [SS(E)/(ab(c − 1))]

which has an F distribution with a − 1 and ab(c − 1) degrees of freedom.
• To test

H0B: βj = 0, j = 1, …, b

we may use the statistic

F = [SS(B)/(b − 1)] / [SS(E)/(ab(c − 1))]

which has an F distribution with b − 1 and ab(c − 1) degrees of freedom.
ANOVA table
Source     Df              SS       MS                                F
Factor A   a − 1           SS(A)    MS(A) = SS(A)/(a − 1)             MS(A)/MS(E)
Factor B   b − 1           SS(B)    MS(B) = SS(B)/(b − 1)             MS(B)/MS(E)
Factor AB  (a − 1)(b − 1)  SS(AB)   MS(AB) = SS(AB)/((a − 1)(b − 1))  MS(AB)/MS(E)
Error      ab(c − 1)       SS(E)    MS(E) = SS(E)/(ab(c − 1))
Total      abc − 1         SS(TO)
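The four-term decomposition with replicates can likewise be checked numerically. A Python sketch (cell data made up for illustration, with a = 2, b = 3 and c = 2 replicates per cell):

```python
# Verify SS(TO) = SS(A) + SS(B) + SS(AB) + SS(E) for two-way ANOVA
# with interaction and c = 2 replicates per cell (made-up data).
x = {  # x[(i, j)] = list of c observations for cell (i, j)
    (0, 0): [19.5, 18.5], (0, 1): [11.0, 12.5], (0, 2): [6.0, 5.5],
    (1, 0): [21.0, 22.5], (1, 1): [10.0, 11.5], (1, 2): [5.0, 6.5],
}
a, b, c = 2, 3, 2

cell = {ij: sum(v) / c for ij, v in x.items()}            # cell means
row = [sum(cell[(i, j)] for j in range(b)) / b for i in range(a)]
col = [sum(cell[(i, j)] for i in range(a)) / a for j in range(b)]
grand = sum(cell.values()) / (a * b)

ss_to = sum((v - grand) ** 2 for obs in x.values() for v in obs)
ss_a = b * c * sum((m - grand) ** 2 for m in row)
ss_b = a * c * sum((m - grand) ** 2 for m in col)
ss_ab = c * sum(
    (cell[(i, j)] - row[i] - col[j] + grand) ** 2
    for i in range(a) for j in range(b)
)
ss_e = sum((v - cell[ij]) ** 2 for ij, obs in x.items() for v in obs)

assert abs(ss_to - (ss_a + ss_b + ss_ab + ss_e)) < 1e-9
```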
Example (two-way ANOVA with interaction)
• Six groups of 18 people
• Each person takes an arithmetic test: the task is to add three
numbers together
• The numbers are presented either in a down array or an across
array; this defines 2 levels of factor A
• The numbers have either one, two or three digits; this defines 3
levels of factor B
• The response variable, X, is the average number of problems
completed correctly over two 90-second sessions
• Example of adding one-digit numbers in an across array: 2+5+1=?
• Example of adding two-digit numbers in a down array:
> head(data3)
     A B    X
1 down 1 19.5
2 down 1 18.5
3 down 1 32.0
4 down 1 21.5
5 down 1 28.5
6 down 1 33.0
> table(data3[, 1:2])
        B
A         1  2  3
  down   18 18 18
  across 18 18 18
> model3 <- lm(X ~ factor(A) * factor(B), data = data3)
> anova(model3)
Analysis of Variance Table
Response: X
Df Sum Sq Mean Sq F value Pr(>F)
factor(A) 1 48.7 48.7 2.8849 0.09246 .
factor(B) 2 8022.7 4011.4 237.7776 < 2e-16 ***
factor(A):factor(B) 2 185.9 93.0 5.5103 0.00534 **
Residuals 102 1720.8 16.9
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Note the use of '*' in the model formula.
The interaction is significant at a 5% level (or even at 1%).
Interaction plot
with(data3, interaction.plot(B, A, X, col = "blue"))
[Figure: interaction plot of mean X against B, with separate lines for A = down and A = across]
Beyond the F-test
• We have rejected the null. . . now what?
• This is often only the beginning of a statistical analysis of this type
• Will be interested in more detailed inferences, e.g. CIs/tests about individual parameters
• You know enough to be able to work some of this out. . .
• . . . and later subjects will go into this in more detail
(e.g. MAST30025)
Analysis of variance (ANOVA) Introduction
One-way ANOVA
Two-way ANOVA
Two-way ANOVA with interaction
Hypothesis testing in regression
Analysis of variance approach
Likelihood ratio tests
Recap of simple linear regression
• Y a response variable, e.g. student’s grade in first-year calculus
• x a predictor variable, e.g. student’s high school mathematics mark
• Data: pairs (x1,y1),...,(xn,yn)
• Linear regression model:
Yi = α + β(xi − x̄) + εi

where εi ∼ N(0, σ²) is a random error
• Note: α here plays the same role as α0 from Module 5. We have dropped the ‘0’ subscript for convenience, and also to avoid confusion with its use to denote null hypotheses.
• The MLE (and OLS) estimators are:

α̂ = Ȳ,  β̂ = Σ_{i=1}^{n} Yi(xi − x̄) / Σ_{i=1}^{n} (xi − x̄)²

σ̂² = (1/(n − 2)) Σ_{i=1}^{n} [Yi − α̂ − β̂(xi − x̄)]²
• We also derived:

α̂ ∼ N(α, σ²/n)

β̂ ∼ N(β, σ² / Σ_{i=1}^{n} (xi − x̄)²)

(n − 2)σ̂²/σ² = Σ_{i=1}^{n} [Yi − α̂ − β̂(xi − x̄)]² / σ² ∼ χ²_{n−2}
• From these we obtain,

Tα = (α̂ − α) / (σ̂/√n) ∼ t_{n−2}

Tβ = (β̂ − β) / (σ̂ / √(Σ_{i=1}^{n} (xi − x̄)²)) ∼ t_{n−2}

• We used these previously to construct confidence intervals
• We can also use them to construct hypothesis tests
• For example, to test H0: β = β0 versus H1: β ≠ β0 (or β > β0 or β < β0), we use Tβ as the test statistic
Example: testing the slope parameter (β)
• Data: 10 pairs of scores on a preliminary test and a final exam
• Estimates: αˆ = 81.3, βˆ = 0.742, σˆ2 = 27.21
• Test H0: β = 0 versus H1: β ̸= 0 with a 1% significance level
• Reject H0 if:

|Tβ| > 3.36 (0.995 quantile of t8)

• For the observed data, with Σ_{i=1}^{n} (xi − x̄)² = 756.1,

tβ = 0.742 / √(27.21/756.1) = 3.91
so we reject H0, concluding there is sufficient evidence that the slope differs from zero.
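The observed test statistic can be recomputed directly from the summary quantities quoted above:

```python
import math

# Reproduce the slope test statistic from the quoted estimates:
# beta-hat = 0.742, sigma-hat^2 = 27.21, sum of (x_i - xbar)^2 = 756.1.
beta_hat = 0.742
sigma2_hat = 27.21
sxx = 756.1

se_beta = math.sqrt(sigma2_hat / sxx)   # standard error of beta-hat
t_beta = beta_hat / se_beta             # test statistic for H0: beta = 0

print(round(se_beta, 4), round(t_beta, 2))   # 0.1897 3.91
```

Since |3.91| > 3.36, the 0.995 quantile of t8, H0 is rejected at the 1% level.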
Note regarding the intercept parameter (α)
• Software packages (such as R) will typically fit the model:
Yi = α + βxi + εi
• This is equivalent to

Yi = α∗ + β(xi − x̄) + εi

where α = α∗ − βx̄
• The formulation Yi = α∗ + β(xi − x̄) + εi is easier to examine theoretically
• We saw that

α̂∗ = Ȳ, and α̂ = Ȳ − β̂x̄

• α̂ or α̂∗ are rarely of direct interest
Use R to fit the regression model for the slope example:
> m1 <- lm(final_exam ~ prelim_test)
> summary(m1)
lm(formula = final_exam ~ prelim_test)
Residuals:
Min 1Q Median 3Q Max
-6.883 -3.264 -0.530 3.438 8.470
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.6147 13.0622 2.344 0.04714 *
prelim_test 0.7421 0.1897 3.912 0.00447 **
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 5.217 on 8 degrees of freedom
Multiple R-Squared: 0.6567, Adjusted R-squared: 0.6137
F-statistic: 15.3 on 1 and 8 DF, p-value: 0.004471
The t-value and the p-value are for testing H0: α = 0 and H0: β = 0 respectively.
Interpreting the R output
• Usually most interested in testing H0 : β = 0 versus H1 : β ̸= 0
• If we reject H0 then we conclude there is sufficient evidence of (at
least) a linear relationship between the mean response and x
• In the example,
t = 0.7421 / 0.1897 = 3.912
• This test statistic has a t-distribution with 10 − 2 = 8 degrees of freedom, and the associated p-value is 0.00447 < 0.05 so at the 5% level of significance we reject H0
• It is also possible to represent this test using an ANOVA table
Deriving the variance decomposition formula
• Independent pairs (x1, Y1), …, (xn, Yn)
• Parameter estimates,

α̂ = Ȳ,  β̂ = Σ_{i=1}^{n} Yi(xi − x̄) / Σ_{i=1}^{n} (xi − x̄)²

• Fitted value (estimated mean),

Ŷi = Ȳ + β̂(xi − x̄)
• Do the 'add and subtract' trick again:

Σ_{i=1}^{n} (Yi − Ȳ)² = Σ_{i=1}^{n} (Yi − Ŷi + Ŷi − Ȳ)²
                      = Σ_{i=1}^{n} (Yi − Ŷi)² + Σ_{i=1}^{n} (Ŷi − Ȳ)²
                        + 2 Σ_{i=1}^{n} (Yi − Ŷi)(Ŷi − Ȳ)

• Deal with the cross-product term,

Σ_{i=1}^{n} (Yi − Ŷi)(Ŷi − Ȳ) = Σ_{i=1}^{n} [Yi − Ȳ − β̂(xi − x̄)] β̂(xi − x̄)
                              = β̂ Σ_{i=1}^{n} [Yi − Ȳ − β̂(xi − x̄)](xi − x̄)
                              = β̂ [Σ_{i=1}^{n} (Yi − Ȳ)(xi − x̄) − β̂ Σ_{i=1}^{n} (xi − x̄)²]
                              = β̂ [Σ_{i=1}^{n} Yi(xi − x̄) − β̂ Σ_{i=1}^{n} (xi − x̄)²] = 0

(the last line is zero by the definition of β̂)

• That gives us,

Σ_{i=1}^{n} (Yi − Ȳ)² = Σ_{i=1}^{n} (Yi − Ŷi)² + Σ_{i=1}^{n} (Ŷi − Ȳ)²
• We can write this as follows,
SS(T O) = SS(E) + SS(R)
where SS(R) is the regression SS or model SS
• The regression SS quantifies the variation due to the straight line
• The error SS quantifies the variation around the fitted regression line
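This regression decomposition can also be confirmed numerically. A Python sketch (the data pairs are made up for illustration) fits the centred-predictor model and checks the identity:

```python
# Verify SS(TO) = SS(E) + SS(R) for simple linear regression
# on a small made-up dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Slope estimate: sum of Y_i (x_i - xbar) over sum of (x_i - xbar)^2
beta_hat = (sum(y * (x - xbar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

# Fitted values from the centred formulation
fitted = [ybar + beta_hat * (x - xbar) for x in xs]

ss_to = sum((y - ybar) ** 2 for y in ys)                      # total SS
ss_e = sum((y - f) ** 2 for y, f in zip(ys, fitted))          # error SS
ss_r = sum((f - ybar) ** 2 for f in fitted)                   # regression SS

assert abs(ss_to - (ss_e + ss_r)) < 1e-9
```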