Residual Analysis for two-way ANOVA
The two-way model with K replicates, including interaction, is
Yijk = µij + εijk = µ + αi + βj + γij + εijk,
with i = 1, . . . , I, j = 1, . . . , J, k = 1, . . . , K.
In carrying out the F tests for interaction, and for the
main effects of factors A and B, we have assumed that
the errors εijk are a sample from N(0, σ²).
Among other things, this means that:
• the distribution of the errors (and in particular, the
variance σ²) does not differ depending on the level
of factor A, the level of factor B, or the mean of
the response (µij = µ + αi + βj + γij)
• the errors are a sample from a normal distribution
If these assumptions hold, then the p-values for the
tests of interaction and main effects are valid. If the as-
sumptions do not hold, then the p-values may substan-
tially over- or under-estimate the evidence against the null
hypotheses.
Residuals are usually defined as the difference “data
minus prediction”.
In the two-way ANOVA model with interaction, the
predicted value of Yijk is µ̂ij, and so the residuals are
rijk = Yijk − µ̂ij = Yijk − Ȳij.
(Another way of writing the residual for the two-way model
with interaction is rijk = Yijk − µ̂ − α̂i − β̂j − γ̂ij.)
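The equivalence of the two forms of the residual can be checked by substituting the usual least-squares estimates for a balanced design. The sketch below assumes sum-to-zero constraints on the effects (the notes do not state which constraints are used), and writes a dot for a subscript that has been averaged over:

```latex
\begin{align*}
\hat\mu &= \bar Y_{\cdot\cdot\cdot}, \\
\hat\alpha_i &= \bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot}, \\
\hat\beta_j &= \bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot}, \\
\hat\gamma_{ij} &= \bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot}, \\
\hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_{ij} &= \bar Y_{ij\cdot} = \hat\mu_{ij}.
\end{align*}
```

Adding the four estimates cancels every term except the cell mean, so both expressions give the same residual.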
If the sample size is moderately large, the residuals should
be approximately equal to the errors εijk, and so we use
the residuals (which are known to us) in place of the errors
εijk (which are unknown) to assess the plausibility of the
model assumptions.
The following plots are often useful in this regard:
1. A QQ plot of the residuals is used to assess the
assumption of normality of errors
2. To assess the assumption that the distribution of
the errors (in particular the variance of the distri-
bution) does not depend on the levels of either fac-
tor A or factor B, the residuals should be plotted
against:
(a) the levels of factor A
(b) the levels of factor B
(c) the fitted values Ȳij.
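The horizontal coordinates of a QQ (normal scores) plot can be computed directly. The following is a minimal sketch, assuming Blom-type plotting positions (rank − 3/8)/(n + 1/4); the notes do not say which formula the software uses. The function name `normal_scores` and the five residuals are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def normal_scores(values):
    """Return the normal score of each value, in the original order.

    Assumption: Blom-type plotting positions (rank - 3/8) / (n + 1/4).
    Ties are not averaged in this sketch.
    """
    n = len(values)
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda idx: values[idx])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        p = (rank - 0.375) / (n + 0.25)
        scores[idx] = NormalDist().inv_cdf(p)
    return scores


# Hypothetical residuals, for illustration only.
residuals = [0.12, -0.05, 0.30, -0.22, 0.01]
scores = normal_scores(residuals)

# Plotting residuals against scores gives the QQ plot; a roughly
# straight line is consistent with normally distributed errors.
for r, s in sorted(zip(residuals, scores)):
    print(f"residual = {r:6.2f}   normal score = {s:6.3f}")
```

Plots (a) to (c) need no extra computation: they are scatter plots of the residuals against the factor codes and against the fitted values.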
Example: A two-factor experiment was carried out
in which survival times (in units of 10 hours) were
measured for groups of four animals (replicates) randomly
allocated to each combination of three poisons and four
treatments.
The data were as follows:
Poison  Treatment  Survival times (units of 10 hours)
I       A          0.31  0.45  0.46  0.43
II      A          0.36  0.29  0.40  0.23
III     A          0.22  0.21  0.18  0.23
I       B          0.82  1.10  0.88  0.72
II      B          0.92  0.61  0.49  1.24
III     B          0.30  0.37  0.38  0.29
I       C          0.43  0.45  0.63  0.76
II      C          0.44  0.35  0.31  0.40
III     C          0.23  0.25  0.24  0.22
I       D          0.45  0.71  0.66  0.62
II      D          0.56  1.02  0.71  0.38
III     D          0.30  0.36  0.31  0.33
The data were entered into Minitab (survival times in c1,
poison in c2, treatment in c3), and a two-way ANOVA
was carried out, as follows:
MTB > print c1
0.31 0.45 0.46 0.43
0.36 0.29 0.40 0.23
0.22 0.21 0.18 0.23
0.82 1.10 0.88 0.72
0.92 0.61 0.49 1.24
0.30 0.37 0.38 0.29
0.43 0.45 0.63 0.76
0.44 0.35 0.31 0.40
0.23 0.25 0.24 0.22
0.45 0.71 0.66 0.62
0.56 1.02 0.71 0.38
0.30 0.36 0.31 0.33
MTB > set c2
DATA> 4(1 1 1 1 2 2 2 2 3 3 3 3)
DATA> set c3
DATA> 12(1) 12(2) 12(3) 12(4)
MTB > twoway c1 c2 c3;
SUBC> residuals c4;
SUBC> fits c5.
Two-way ANOVA: C1 versus C2, C3
Source       DF       SS        MS      F      P
C2            2  1.03301  0.516506  23.22  0.000
C3            3  0.92121  0.307069  13.81  0.000
Interaction   6  0.25014  0.041690   1.87  0.112
Error        36  0.80073  0.022242
Total        47  3.00508
S = 0.1491   R-Sq = 73.35%   R-Sq(adj) = 65.21%
MTB > nscores c4 c6
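For readers without Minitab, the sums of squares and the residuals in this session can be recomputed from the data table. The following is a minimal sketch in plain Python, assuming the balanced-design formulas for a two-way layout with interaction; the variable and function names are illustrative and do not correspond to anything in Minitab.

```python
# Survival times (units of 10 hours) keyed by (poison, treatment),
# copied from the data table in these notes.
cells = {
    ("I", "A"): [0.31, 0.45, 0.46, 0.43],
    ("II", "A"): [0.36, 0.29, 0.40, 0.23],
    ("III", "A"): [0.22, 0.21, 0.18, 0.23],
    ("I", "B"): [0.82, 1.10, 0.88, 0.72],
    ("II", "B"): [0.92, 0.61, 0.49, 1.24],
    ("III", "B"): [0.30, 0.37, 0.38, 0.29],
    ("I", "C"): [0.43, 0.45, 0.63, 0.76],
    ("II", "C"): [0.44, 0.35, 0.31, 0.40],
    ("III", "C"): [0.23, 0.25, 0.24, 0.22],
    ("I", "D"): [0.45, 0.71, 0.66, 0.62],
    ("II", "D"): [0.56, 1.02, 0.71, 0.38],
    ("III", "D"): [0.30, 0.36, 0.31, 0.33],
}


def mean(values):
    return sum(values) / len(values)


def level_mean(position, level):
    """Mean of all observations at one level of a factor.

    position 0 selects poison, position 1 selects treatment.
    """
    return mean([y for key, ys in cells.items()
                 if key[position] == level for y in ys])


poisons = sorted({key[0] for key in cells})
treatments = sorted({key[1] for key in cells})
k = 4  # replicates per cell
grand = mean([y for ys in cells.values() for y in ys])

ss_poison = k * len(treatments) * sum(
    (level_mean(0, p) - grand) ** 2 for p in poisons)
ss_treat = k * len(poisons) * sum(
    (level_mean(1, t) - grand) ** 2 for t in treatments)
ss_cells = k * sum((mean(ys) - grand) ** 2 for ys in cells.values())
ss_inter = ss_cells - ss_poison - ss_treat

# Residuals: each observation minus its own cell mean (the fitted value).
residuals = {key: [y - mean(ys) for y in ys] for key, ys in cells.items()}
ss_error = sum(r ** 2 for rs in residuals.values() for r in rs)

df_inter = (len(poisons) - 1) * (len(treatments) - 1)
df_error = len(poisons) * len(treatments) * (k - 1)
f_inter = (ss_inter / df_inter) / (ss_error / df_error)

print(f"SS poison      = {ss_poison:.5f}")
print(f"SS treatment   = {ss_treat:.5f}")
print(f"SS interaction = {ss_inter:.5f}")
print(f"SS error       = {ss_error:.5f}")
print(f"F interaction  = {f_inter:.2f}")
```

The printed values can be compared with the SS and F columns of the Minitab table. The `residuals` dictionary holds the same quantities that the `residuals c4` subcommand stores.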
The normal scores and residual plots are as follows:
Figure 1: Normal scores plot of residuals
Figure 2: Plot of residuals vs type of poison
Figure 3: Plot of residuals vs treatment
Figure 4: Plot of residuals vs fitted values
If the QQ plot shows evidence of non-normality, or if
the distribution of the residuals appears to depend on the
levels of one or both factors, then the inferences (e.g.
p-values) concerning the model parameters may be invalid.
In this case, the QQ plot provides some suggestion of
non-normality. The plots of residuals vs factor level suggest
that the variance of the residuals is not constant across
the levels of either factor. A definite pattern can be seen in
the plot of residuals vs fitted values, in which the variance
of the residuals increases as the fitted value increases.
This suggests that the variance of Y increases with
the mean of Y. Consequently, our conclusions regarding
the significance of main effects and interaction may be in
error because the model assumptions do not hold.
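The pattern in the plot of residuals vs fitted values can also be seen numerically by listing the standard deviation of each cell next to its mean. This is a rough sketch using the data from the table in these notes. With only four observations per cell the standard deviations are noisy, so the listing complements the plot and does not replace it.

```python
from statistics import mean, stdev

# (poison, treatment, survival times), copied from the data table.
rows = [
    ("I", "A", [0.31, 0.45, 0.46, 0.43]),
    ("II", "A", [0.36, 0.29, 0.40, 0.23]),
    ("III", "A", [0.22, 0.21, 0.18, 0.23]),
    ("I", "B", [0.82, 1.10, 0.88, 0.72]),
    ("II", "B", [0.92, 0.61, 0.49, 1.24]),
    ("III", "B", [0.30, 0.37, 0.38, 0.29]),
    ("I", "C", [0.43, 0.45, 0.63, 0.76]),
    ("II", "C", [0.44, 0.35, 0.31, 0.40]),
    ("III", "C", [0.23, 0.25, 0.24, 0.22]),
    ("I", "D", [0.45, 0.71, 0.66, 0.62]),
    ("II", "D", [0.56, 1.02, 0.71, 0.38]),
    ("III", "D", [0.30, 0.36, 0.31, 0.33]),
]

# One tuple per cell, sorted by cell mean (which is the fitted value).
summary = sorted(
    (mean(ys), stdev(ys), poison, treatment)
    for poison, treatment, ys in rows
)

for cell_mean, cell_sd, poison, treatment in summary:
    print(f"{poison:>3} {treatment}   mean = {cell_mean:.4f}   sd = {cell_sd:.4f}")
```

If the constant-variance assumption held, the `sd` column would show no systematic trend as the `mean` column increases.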
In such cases, one approach which is often taken is to
try to find a transformation of the dependent variable
to a form for which the model assumptions are better
satisfied. Transformations which are sometimes tried are
to replace Y by √Y, log(Y), or 1/Y.
There are some results from probability and statistical
theory which provide techniques to search for so-called
variance stabilizing transformations. These ideas are
studied in some higher level statistics courses. After care-
ful examination of the pattern of residuals, we are led
to consider the reciprocal transformation, Zijk = 1/Yijk.
(Since the Y are measurements of time, the values
Z = 1/Y can be described as rates, with units of
1/time.)
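One crude way to compare candidate transformations is to look at how much the within-cell standard deviation varies across the twelve cells on each scale. The sketch below uses the ratio of the largest to the smallest cell standard deviation. This summary is an illustrative choice, not the method used in the notes or in the Box and Cox paper, and the name `sd_ratio` is hypothetical.

```python
import math
from statistics import stdev

# Survival times for the twelve poison-by-treatment cells,
# copied from the data table (four replicates per cell).
cells = [
    [0.31, 0.45, 0.46, 0.43], [0.36, 0.29, 0.40, 0.23],
    [0.22, 0.21, 0.18, 0.23], [0.82, 1.10, 0.88, 0.72],
    [0.92, 0.61, 0.49, 1.24], [0.30, 0.37, 0.38, 0.29],
    [0.43, 0.45, 0.63, 0.76], [0.44, 0.35, 0.31, 0.40],
    [0.23, 0.25, 0.24, 0.22], [0.45, 0.71, 0.66, 0.62],
    [0.56, 1.02, 0.71, 0.38], [0.30, 0.36, 0.31, 0.33],
]

# Candidate transformations named in the text.
transforms = {
    "Y": lambda y: y,
    "sqrt(Y)": math.sqrt,
    "log(Y)": math.log,
    "1/Y": lambda y: 1.0 / y,
}


def sd_ratio(f):
    """Largest cell SD divided by smallest cell SD after applying f."""
    sds = [stdev([f(y) for y in ys]) for ys in cells]
    return max(sds) / min(sds)


ratios = {name: sd_ratio(f) for name, f in transforms.items()}

for name, ratio in ratios.items():
    print(f"{name:8s} max SD / min SD = {ratio:.1f}")
```

A ratio closer to 1 indicates more nearly constant spread. The ratio is much smaller on the reciprocal scale than on the original scale, which is consistent with the choice of Z = 1/Y, although residual plots remain the main diagnostic.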
A two-way model was fit for Zijk, leading to the
following output:
MTB > let c7=1/c1
MTB > twoway c7 c2 c3;
SUBC> residuals c8;
SUBC> fits c9.
Two-way ANOVA: C7 versus C2, C3
Source       DF       SS       MS      F      P
C2            2  34.8771  17.4386  72.63  0.000
C3            3  20.4143   6.8048  28.34  0.000
Interaction   6   1.5708   0.2618   1.09  0.387
Error        36   8.6431   0.2401
Total        47  65.5053
S = 0.4900   R-Sq = 86.81%   R-Sq(adj) = 82.77%
MTB > nscores c8 c10
The normal scores and residual plots for the transformed
data are as follows:
Figure 5: Plot of residuals vs poison – transformed data
Figure 6: Plot of residuals vs treatment – transformed data
Figure 7: Plot of residuals vs fitted values – transformed data
Figure 8: Normal scores plot of residuals – transformed data
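These residual plots suggest few departures from the
model assumptions, and so we can be more confident about
the validity of our conclusions for the transformed data:
both main effects are highly significant (P < 0.001), and
there is no evidence of interaction (P = 0.387).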
Reference: Box and Cox (1964) An analysis of
transformations. J. Roy. Stat. Soc. B, 26, 211-252.
The data were part of a larger investigation to combat
the effects of toxic agents.