The Analysis of Variance for Simple Linear Regression
• the total variation in an observed
response about its mean can be written
as a sum of two parts – its deviation from
the fitted value plus the deviation of the
fitted value from the mean response
yi − ȳ = (yi − ŷi) + (ŷi − ȳ)
• squaring both sides and summing over the n cases gives the total sum
of squares on the left, and two sums of squares on the right (the
third, cross-product, term vanishes)
• this is the analysis of variance
decomposition for simple linear
regression
SST = SSE + SSR
• as always, the total sum of squares is
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = SS_{YY}$$
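• the residual sum of squares is
$$
\begin{aligned}
SSE &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
    &= \sum_{i=1}^{n} \left(y_i - \bar{y} - \hat{\beta}_1 (x_i - \bar{x})\right)^2 \\
    &= SS_{YY} - 2\hat{\beta}_1 SS_{XY} + \hat{\beta}_1^2 SS_{XX} \\
    &= SS_{YY} - \hat{\beta}_1^2 SS_{XX} \\
    &= SS_{YY} - \hat{\beta}_1 SS_{XY} \\
    &= SS_{YY} - \frac{SS_{XY}^2}{SS_{XX}}
\end{aligned}
$$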
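• the regression sum of squares is
$$
\begin{aligned}
SSR &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
    &= \sum_{i=1}^{n} \left(\hat{\beta}_1 (x_i - \bar{x})\right)^2 \\
    &= \sum_{i=1}^{n} \hat{\beta}_1^2 (x_i - \bar{x})^2 \\
    &= \hat{\beta}_1^2 SS_{XX} = \hat{\beta}_1 SS_{XY} = \frac{SS_{XY}^2}{SS_{XX}}
\end{aligned}
$$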
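• in expanding the square above, the third (cross-product) term is
$$
\begin{aligned}
2\sum_{i=1}^{n} (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
  &= 2\sum_{i=1}^{n} (y_i - \hat{y}_i)\,\hat{\beta}_1 (x_i - \bar{x}) \\
  &= 2\hat{\beta}_1 \sum_{i=1}^{n} \hat{e}_i (x_i - \bar{x}) = 2\hat{\beta}_1 SS_{\hat{e}X} \\
  &= 0
\end{aligned}
$$
using the result that the residuals are
uncorrelated with the predictor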
• the degrees of freedom are n − 1, n − 2
and 1 corresponding to SST, SSE and
SSR
• the results can be summarized in tabular
form
Source       DF      SS    MS
Regression   1       SSR   MSR = SSR/1
Residual     n − 2   SSE   MSE = SSE/(n − 2)
Total        n − 1   SST
Example: For the Ozone data
• SST = SSY Y = 1014.75
• $SSR = SS_{XY}^2 / SS_{XX} = (-2.7225)^2 / .009275 = 799.1381$
• SSE = SST − SSR =
1014.75 − 799.1381 = 215.61
• degrees of freedom: total = 4-1=3,
regression = 1, error = 2
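The decomposition can be checked numerically. The sketch below is a minimal illustration, not part of the original example. It assumes four raw (x, y) pairs that reproduce the summary statistics quoted above (SSXX = .009275, SSXY = −2.7225, SSYY = 1014.75). The pairs themselves are not listed in this section, so treat them as an assumption. All variable names are illustrative.

```python
# Assumed raw data: four (x, y) pairs that reproduce the summary statistics
# quoted for the Ozone example. The pairs are not listed in this section.
x = [0.02, 0.07, 0.11, 0.15]
y = [242.0, 237.0, 231.0, 201.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx        # least squares slope
b0 = y_bar - b1 * x_bar   # least squares intercept
fitted = [b0 + b1 * xi for xi in x]

sst = ss_yy
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
# Cross-product term from expanding the square; should be zero up to rounding
cross = 2 * sum((yi - fi) * (fi - y_bar) for yi, fi in zip(y, fitted))

# ANOVA table rows: (source, degrees of freedom, sum of squares, mean square)
anova = [
    ("Regression", 1, ssr, ssr / 1),
    ("Residual", n - 2, sse, sse / (n - 2)),
    ("Total", n - 1, sst, None),
]
for source, df, ss, ms in anova:
    ms_text = "" if ms is None else f"{ms:12.4f}"
    print(f"{source:<12}{df:>3}{ss:12.4f}{ms_text}")
print(f"cross-product term = {cross:.2e}")
```

Computing SSE and SSR directly from the fitted values, rather than by subtraction, is what makes this a check of SST = SSE + SSR and not a restatement of it.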
• goodness of fit of the regression line is
measured by the coefficient of
determination
$$R^2 = \frac{SSR}{SST}$$
• this is the proportion of variation in y
explained by the regression on x
• $R^2$ always lies between 0, indicating that
nothing is explained, and 1, indicating that all
points lie exactly on a straight line
• for simple linear regression $R^2$ is just the
square of the (Pearson) correlation
coefficient
$$R^2 = \frac{SSR}{SST} = \frac{SS_{XY}^2 / SS_{XX}}{SS_{YY}} = \frac{SS_{XY}^2}{SS_{XX}\,SS_{YY}} = r^2$$
• this gives another interpretation of the
correlation coefficient – its square is the
coefficient of determination, the
proportion of variation explained by the
regression
• note that with $R^2$ and SST, one can
calculate
$$SSR = R^2\,SST$$
and
$$SSE = (1 - R^2)\,SST$$
Example: Ozone data
• we saw r = −.8874, so $R^2 = (-.8874)^2 = .7875$, that is,
78.75% of the variation in y is explained by the
regression
• with SST = 1014.75, we can get
$$SSR = R^2\,SST = .7875(1014.75) = 799.12$$
and
$$SSE = (1 - R^2)\,SST = (1 - .7875)1014.75 = 215.63$$
• these answers differ slightly from above
due to round-off error
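The relationship between r, R² and the sums of squares can be reproduced from the three summary statistics alone. This is a minimal sketch using the Ozone values quoted earlier. The variable names are illustrative.

```python
import math

# Summary statistics quoted for the Ozone data
ss_xx = 0.009275
ss_xy = -2.7225
ss_yy = 1014.75   # this is SST

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # Pearson correlation coefficient
r_squared = r ** 2                     # coefficient of determination

ssr = r_squared * ss_yy                # SSR = R^2 * SST
sse = (1 - r_squared) * ss_yy          # SSE = (1 - R^2) * SST

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}")
```

Carrying r at full precision avoids most of the round-off that appears when a rounded r is squared by hand.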
A statistical model for simple linear regression
• we assume that an observed response
value yi is related to its predictor xi
according to the model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
• where β0 and β1 are the intercept and
slope
• $\varepsilon_i$ is an additive random deviation or
‘error’, assumed to have zero mean and
constant variance $\sigma^2$
• any two deviations $\varepsilon_i$ and $\varepsilon_j$ are assumed
to be independent
• the mean of $y_i$ is
$$\mu_{x_i} = \beta_0 + \beta_1 x_i$$
which is linear in $x_i$
• the variance is assumed to be the same
for each case, and this justifies giving
each case the same weight when
minimizing SSE
• under these assumptions, the least
squares estimators
$$\hat{\beta}_1 = \frac{SS_{XY}}{SS_{XX}}$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
have good statistical properties
• among all linear unbiased estimators,
they have minimum variance
• an unbiased estimator has a sampling
distribution with mean equal to the
parameter being estimated
• the variance of the deviations $\sigma^2$ is
estimated using the average squared
residual,
$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{SSE}{n-2} = MSE$$
where division is by n − 2 here because
two β’s have been estimated
• to make inferences about the model
parameters we also need to assume that
the deviations $\varepsilon_i$ are normally distributed
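A small simulation can make the unbiasedness claims concrete. The sketch below is illustrative only: the design points, the true values of β0, β1 and σ, and the number of replications are all hypothetical choices, not values from these notes. It generates many data sets from the model, fits each by least squares, and averages the estimates.

```python
import random

random.seed(1)

# Hypothetical design and parameter values, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
beta0, beta1, sigma = 2.0, 0.5, 1.5
n = len(x)
x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

def fit(y):
    """Return (b0, b1, mse) for the least squares line through (x, y)."""
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2)   # divide by n - 2, as in s^2 = MSE

reps = 20_000
results = []
for _ in range(reps):
    # y_i = beta0 + beta1 * x_i + eps_i with independent N(0, sigma^2) errors
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    results.append(fit(y))

mean_b0 = sum(res[0] for res in results) / reps
mean_b1 = sum(res[1] for res in results) / reps
mean_mse = sum(res[2] for res in results) / reps

print(f"average intercept estimate: {mean_b0:.3f} (true {beta0})")
print(f"average slope estimate:     {mean_b1:.3f} (true {beta1})")
print(f"average MSE:                {mean_mse:.3f} (true sigma^2 {sigma ** 2})")
```

The averages should sit close to the true values, which is what unbiasedness means in practice. Dividing SSE by n instead of n − 2 in `fit` would make the average MSE fall systematically below σ².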
Statistical inferences for regression
Standard errors for regression coefficients
• regression coefficient values, β̂0 and β̂1,
are point estimates of the true intercept
and slope, β0 and β1 respectively.
• using our assumptions about the
deviations, and the rules for mean and
variance, the sampling distribution of the
slope estimator can be derived to be
$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{SS_{xx}}\right)$$
• this means that if we had a large number
of data sets and calculated the slope
estimate each time, their histogram
would look normal, be centered around
the true slope and have variance as given
above
• the standard deviation of $\hat{\beta}_1$ is
$$\sqrt{\frac{\sigma^2}{SS_{xx}}}$$
• the value of σ2 is unknown, so the
estimator MSE is used in its place to
produce the standard error of the
estimate β̂1, as
$$SE_{\hat{\beta}_1} = \frac{\sqrt{MSE}}{\sqrt{SS_{xx}}} = \frac{s}{\sqrt{SS_{xx}}}$$
• the standard error for the intercept
estimator β̂0 is
$$SE_{\hat{\beta}_0} = \sqrt{MSE\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)}$$
Example: Ozone data
• standard errors for the regression
coefficients are estimated below.
• SSxx = .009275 and MSE = 107.80
• $SE_{\hat{\beta}_1} = \sqrt{MSE / SS_{xx}} = \sqrt{107.80 / .009275} = 107.81$
• $SE_{\hat{\beta}_0} = \sqrt{MSE\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)} = \sqrt{107.80\left((1/4) + ((.0875)^2 / .009275)\right)} = 10.77$
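The two standard errors can be reproduced with a few lines of Python. This is a minimal sketch using the Ozone values quoted in these notes. The mean x̄ = .0875 is the value used in the later mean-response example, and the variable names are illustrative.

```python
import math

# Values quoted in the notes for the Ozone data
n = 4
x_bar = 0.0875
ss_xx = 0.009275
mse = 107.80

# SE of the slope: sqrt(MSE / SSxx)
se_b1 = math.sqrt(mse / ss_xx)

# SE of the intercept: sqrt(MSE * (1/n + x_bar^2 / SSxx))
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / ss_xx))

print(f"SE(slope)     = {se_b1:.2f}")
print(f"SE(intercept) = {se_b0:.2f}")
```

Note that the intercept formula uses the square of the mean, x̄², not the sum of the squared x values.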
Tests for regression coefficients
• the most common and useful test is
whether or not the relationship between
the response and predictor is significant
• $H_0: \beta_1 = 0$, there is no linear
relationship
• $H_a: \beta_1 \neq 0$, there is a linear relationship
• the alternative is usually two sided
• the test statistic is
$$T = \frac{\hat{\beta}_1}{SE_{\hat{\beta}_1}}$$
and this is compared to the $t_{n-2}$
distribution
• on occasion, we specify a value $\beta_{1,0}$
other than 0 in the null hypothesis
• then the test statistic becomes
$$T = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE_{\hat{\beta}_1}}$$
• one can also test hypotheses about the
intercept
• $H_0: \beta_0 = \beta_{0,0}$
• $H_a: \beta_0 \neq \beta_{0,0}$
• often we are interested in whether the
intercept is zero
• the test statistic is
$$T = \frac{\hat{\beta}_0 - \beta_{0,0}}{SE_{\hat{\beta}_0}}$$
and this is compared to the $t_{n-2}$
distribution
Example: Ozone data
• we saw $\hat{\beta}_1 = -293.531$ and
$SE_{\hat{\beta}_1} = 107.81$
• the test of $H_0: \beta_1 = 0$ versus
$H_a: \beta_1 \neq 0$ gives
$$T = \frac{-293.531}{107.81} = -2.7227$$
• comparing to the $t_{4-2} = t_2$ distribution
gives P = .11 exactly, or .10 < P < .20
using the tables
• in spite of the high correlation calculated
earlier, the relationship between ozone
and yield is not significant using α = .10
or smaller
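The test statistic and its two-sided P value can be computed without tables. The sketch below uses only the Python standard library. It integrates the t density numerically with Simpson's rule, which is one reasonable approach among several and not the method used to produce the P value quoted above. The function names are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P value, 1 - P(-|t| < T < |t|), by Simpson's rule on [0, |t|]."""
    b = abs(t_stat)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_half = total * h / 3       # approximates P(0 < T < |t|)
    return max(0.0, 1 - 2 * central_half)

def t_statistic(estimate, null_value, se):
    """T = (estimate - hypothesized value) / standard error."""
    return (estimate - null_value) / se

# Ozone data: test H0: beta1 = 0 against a two-sided alternative
n = 4
t_stat = t_statistic(-293.531, 0.0, 107.81)
p_value = two_sided_p(t_stat, n - 2)

print(f"T = {t_stat:.4f}, df = {n - 2}, two-sided P = {p_value:.3f}")
```

The same `t_statistic` function covers a nonzero null value for the slope, or a test on the intercept, by changing `null_value` and the standard error passed in.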
Example: Tree data.
• earlier we obtained β̂1 = 11.036, n = 20,
r = .976, sy = 91.7 and sx = 8.1 for the
straight line fit
• we can determine that
$$SS_{XX} = 19 s_x^2 = 19(8.1)^2 = 1246.59$$
and
$$SST = SS_{YY} = 19(91.7)^2 = 159768.9$$
• from this we can calculate
$$SSE = (1 - R^2)\,SST = (1 - .976^2)159768.9 = 7576.88$$
and
$$MSE = \frac{SSE}{n-2} = \frac{7576.88}{18} = 420.9378$$
• the standard error of the slope estimate
is
$$SE_{\hat{\beta}_1} = \sqrt{\frac{MSE}{SS_{XX}}} = \sqrt{\frac{420.9378}{1246.59}} = .5811$$
• the test statistic for an association
between diameter and usable volume is
$$T = \frac{11.036}{.5811} = 18.99$$
and there are 20 − 2 = 18 degrees of
freedom
• the P value is less than .01, using the
tables, so we conclude that the linear
association between usable volume and
diameter at chest height is statistically
significant
• if you compare with the computer output
shown earlier, you will see that the
values calculated by hand are slightly
different, due to round-off error
MTB > regress c2 1 c1;
SUBC> residuals c3.
The regression equation is
volume = - 191 + 11.0 diameter
Predictor       Coef     Stdev   t-ratio       p
Constant     -191.12     16.98    -11.25   0.000
diameter     11.0413    0.5752     19.19   0.000
s = 20.33   R-sq = 95.3%   R-sq(adj) = 95.1%
Analysis of Variance
SOURCE        DF       SS       MS        F       p
Regression     1   152259   152259   368.43   0.000
Error         18     7439      413
Total         19   159698
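The hand calculation for the Tree data chains several formulas together, so it is easy to script. This minimal sketch starts from the rounded summary statistics quoted in the example, which is why its results match the hand values rather than the Minitab output. The variable names are illustrative.

```python
import math

# Rounded summary statistics quoted for the Tree data
n = 20
b1 = 11.036     # slope estimate
r = 0.976       # correlation between diameter and volume
s_y = 91.7      # sample standard deviation of volume
s_x = 8.1       # sample standard deviation of diameter

ss_xx = (n - 1) * s_x ** 2          # SSXX = (n - 1) * s_x^2
sst = (n - 1) * s_y ** 2            # SST = SSYY = (n - 1) * s_y^2
sse = (1 - r ** 2) * sst            # SSE = (1 - R^2) * SST
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / ss_xx)      # standard error of the slope
t_stat = b1 / se_b1                 # test statistic for H0: beta1 = 0

print(f"SSXX = {ss_xx:.2f}, SST = {sst:.1f}, SSE = {sse:.2f}")
print(f"MSE = {mse:.4f}, SE(slope) = {se_b1:.4f}, T = {t_stat:.2f}")
```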
Confidence intervals for regression coefficients
• confidence intervals are constructed
using the standard errors as follows
$$\hat{\beta}_i \pm t_{\alpha/2,\,n-2}\,SE_{\hat{\beta}_i}$$
for i = 0 or i = 1
• the degrees of freedom for the t
distribution are the same as the degrees
of freedom associated with MSE
Example: Ozone data
• 95% confidence intervals for β1 and β0
are computed as follows
• $t_{\alpha/2,\,n-2} = t_{.025,2} = 4.303$
• for the slope, β1:
−293.531 ± 4.303(107.81)
(−757.4, 170.3)
• note that this interval contains zero,
which confirms that the slope is not
significantly different from zero
• for the intercept, β0:
253.434 ± 4.303(10.77)
(207.1, 299.8)
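Both intervals follow the same recipe, estimate ± critical value × standard error, so one small helper covers them. This is a minimal sketch with illustrative names. The critical value is also checked against the closed-form quantile of the t distribution with 2 degrees of freedom, a formula that is specific to df = 2 and is included here only as a cross-check on the table value.

```python
import math

def t_interval(estimate, se, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

# Ozone data: table value t_{.025, 2}
t_crit = 4.303

slope_ci = t_interval(-293.531, 107.81, t_crit)
intercept_ci = t_interval(253.434, 10.77, t_crit)

# Cross-check of the table value. For df = 2 only, the upper alpha/2 point
# is u * sqrt(2 / (1 - u^2)) with u = 1 - alpha.
alpha = 0.05
u = 1 - alpha
t_crit_df2 = u * math.sqrt(2 / (1 - u ** 2))

print(f"slope:     ({slope_ci[0]:.1f}, {slope_ci[1]:.1f})")
print(f"intercept: ({intercept_ci[0]:.1f}, {intercept_ci[1]:.1f})")
print(f"t_(.025, 2) from closed form: {t_crit_df2:.3f}")
```

An interval for the slope that straddles zero corresponds to failing to reject H0: β1 = 0 in the two-sided test at the matching level.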
Estimating the mean of Y at x = x∗
• the estimated mean of Y when $x = x^*$ is
$$\hat{\mu}_{x^*} = \hat{\beta}_0 + \hat{\beta}_1 x^* = \bar{y} + \hat{\beta}_1 (x^* - \bar{x})$$
• because both $\hat{\beta}_0$ and $\hat{\beta}_1$ have normal
sampling distributions, $\hat{\mu}_{x^*}$ does as well
• the mean of this distribution is the true
mean
$$\mu_{x^*} = \beta_0 + \beta_1 x^*$$
because both $\hat{\beta}_0$ and $\hat{\beta}_1$ have means
equal to their population values
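• the variance of $\hat{\mu}_{x^*}$ is
$$\sigma^2 \left(\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SS_{xx}}\right)$$
which is the sum of the variances of $\bar{y}$
and $\hat{\beta}_1 (x^* - \bar{x})$, since $\bar{y}$ and $\hat{\beta}_1$ are uncorrelated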
• in short
$$\hat{\mu}_{x^*} \sim N\left(\beta_0 + \beta_1 x^*,\ \sigma^2 \left(\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SS_{xx}}\right)\right)$$
• the standard error of $\hat{\mu}_{x^*}$ is
$$SE_{\hat{\mu}_{x^*}} = \sqrt{MSE \left(\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SS_{xx}}\right)}$$
• a confidence interval for the mean
$\mu_{x^*} = \beta_0 + \beta_1 x^*$ when $x = x^*$ is given by
$$\hat{\mu}_{x^*} \pm t_{\alpha/2,\,n-2}\,SE_{\hat{\mu}_{x^*}}$$
Example: Ozone data
• a 95% confidence interval for the mean
yield at x = 0.10 is obtained as follows
• when $x^* = 0.10$, the estimated mean is
$$\hat{\mu}_{.1} = 253.434 - 293.531(0.1) = 224.08$$
• the standard error of this estimate is
$$SE_{\hat{\mu}_{.1}} = \sqrt{107.8 \left(\frac{1}{4} + \frac{(0.1 - .0875)^2}{.009275}\right)} = 5.36$$
• the table value is
$$t_{\alpha/2,\,n-2} = t_{.025,2} = 4.303$$
• the half width of the interval, or margin
of error, is
$$t_{\alpha/2,\,n-2}\,SE_{\hat{\mu}_{.1}} = 4.303(5.36) = 23.08$$
• so the interval is 224.08 ± 23.08 or
(201.00, 247.16)
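The steps of this example can be collected into one function. This is a minimal sketch with an illustrative function name and argument order. The inputs are the Ozone values quoted in these notes.

```python
import math

def mean_response_ci(x_star, b0, b1, mse, n, x_bar, ss_xx, t_crit):
    """Estimated mean at x_star, its standard error, and the confidence interval."""
    fit = b0 + b1 * x_star
    se = math.sqrt(mse * (1 / n + (x_star - x_bar) ** 2 / ss_xx))
    margin = t_crit * se
    return fit, se, (fit - margin, fit + margin)

# Ozone data, 95% interval for the mean yield at x* = 0.10
fit, se_fit, ci = mean_response_ci(
    x_star=0.10, b0=253.434, b1=-293.531,
    mse=107.8, n=4, x_bar=0.0875, ss_xx=0.009275, t_crit=4.303,
)

print(f"estimated mean = {fit:.2f}, SE = {se_fit:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because of the (x* − x̄)² term, the standard error is smallest at x* = x̄ and grows as x* moves away from the centre of the data.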
Predicting a new response value at x = x∗
• in making a prediction interval for a
future observation on y when x = x∗, we
need to incorporate two sources of
variation
• the first is the variation in the estimate
µ̂x∗ about the actual mean µx∗
• the second is the variation of the new
response y about its mean
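• the error of prediction is
$$y - (\hat{\beta}_0 + \hat{\beta}_1 x^*) = \left(y - (\beta_0 + \beta_1 x^*)\right) - \left(\hat{\beta}_0 + \hat{\beta}_1 x^* - (\beta_0 + \beta_1 x^*)\right)$$
• the first term in brackets on the right
hand side of this expression is $\varepsilon^*$, which
has a $N(0, \sigma^2)$ distribution.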
• the second term is the deviation of $\hat{\mu}_{x^*}$
from the actual mean $\mu_{x^*}$ which we have
seen is
$$N\left(0,\ \sigma^2 \left(\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SS_{xx}}\right)\right)$$
• as y represents a future observation, the
distributions of the two terms are
independent, and it follows that the
distribution of the prediction error
$y - (\hat{\beta}_0 + \hat{\beta}_1 x^*)$ is
$$N\left(0,\ \sigma^2 \left(1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SS_{xx}}\right)\right)$$
• the standard error of the prediction error
is estimated by
$$\sqrt{MSE \left(1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SS_{xx}}\right)}$$
• and the prediction interval for y is given
by
$$\hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{\alpha/2,\,n-2} \sqrt{MSE \left(1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SS_{xx}}\right)}$$
Ozone example: A 95% prediction interval
for y when x = 0.10 is calculated.
• when $x^* = 0.10$, the prediction is
$$\hat{\mu}_{x^*} = 253.434 - 293.531(0.1) = 224.08$$
• the standard error of prediction is
$$SE_{y^*} = \sqrt{107.8 \left(1 + \frac{1}{4} + \frac{(0.1 - .0875)^2}{.009275}\right)} = 11.69$$
• the margin of error is
$$t_{\alpha/2,\,n-2}\,SE_{y^*} = 4.303(11.69) = 50.29$$
• so the prediction interval is
224.08 ± 50.29
or (173.79, 274.37)
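The prediction interval differs from the confidence interval for the mean only by the extra 1 inside the square root. The sketch below computes both standard errors side by side for the Ozone data so the difference is visible. The function and variable names are illustrative.

```python
import math

def se_mean_and_prediction(x_star, mse, n, x_bar, ss_xx):
    """Return (SE of the estimated mean, SE of prediction) at x_star."""
    leverage = 1 / n + (x_star - x_bar) ** 2 / ss_xx
    se_mean = math.sqrt(mse * leverage)
    se_pred = math.sqrt(mse * (1 + leverage))   # adds the new observation's own variance
    return se_mean, se_pred

# Ozone data at x* = 0.10
b0, b1 = 253.434, -293.531
t_crit = 4.303                                   # t_{.025, 2}
fit = b0 + b1 * 0.10
se_mean, se_pred = se_mean_and_prediction(0.10, 107.8, 4, 0.0875, 0.009275)

margin = t_crit * se_pred
pred_interval = (fit - margin, fit + margin)

print(f"prediction = {fit:.2f}")
print(f"SE(mean) = {se_mean:.2f}, SE(prediction) = {se_pred:.2f}")
print(f"95% PI = ({pred_interval[0]:.2f}, {pred_interval[1]:.2f})")
```

The two standard errors are linked by SE(prediction)² = MSE + SE(mean)², so the prediction interval is always the wider of the two.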
Tree example: Minitab can be used to find
confidence intervals for the mean at x∗ and
for prediction intervals for a new value at x∗.
• the output below was obtained using
Stat > Regression > Options, where a
diameter of 30 in. was used
MTB > Name c3 "CLIM1" c4 "CLIM2" c5 "PLIM1" c6 "PLIM2"
MTB > Regress c2 1 c1;
SUBC> Constant;
SUBC> Predict 30;
SUBC> CLimits 'CLIM1'-'CLIM2';
SUBC> PLimits 'PLIM1'-'PLIM2';
SUBC> Brief 2.
Regression Analysis: C2 versus C1
The regression equation is
C2 = - 191 + 11.0 C1
Predictor       Coef   SE Coef        T       P
Constant     -191.12     16.98   -11.25   0.000
C1           11.0413    0.5752    19.19   0.000
S = 20.3290   R-Sq = 95.3%   R-Sq(adj) = 95.1%
Analysis of Variance
Source            DF       SS       MS        F       P
Regression         1   152259   152259   368.43   0.000
Residual Error    18     7439      413
Total             19   159698
Predicted Values for
      Fit   SE Fit        95% CI              95% PI
1  140.11     4.63   (130.38, 149.85)   (96.31, 183.92)
Values of Predictors for
     C1
1  30.0
• for this dataset we previously saw that
n = 20, SSXX = 1246.59 and
MSE = 420.9378
• the mean diameter is x̄ = 28.45, so the
standard error for estimating the mean
at X = 30 is
$$SE_{\hat{\mu}_{x^*}} = \sqrt{420.9378 \left(\frac{1}{20} + \frac{(30 - 28.45)^2}{1246.59}\right)} = 4.6753$$
• this is close to the SE Fit given in the
output
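The intervals in the Minitab output can be approximately reconstructed from the other numbers it prints. The sketch below is a rough check, not a description of how Minitab computes them. It assumes the table value t.025,18 = 2.101, which is a standard t-table entry not quoted in these notes, and it uses the identity SE(prediction)² = S² + (SE Fit)² that follows from the two standard error formulas. The variable names are illustrative.

```python
import math

# Values read from the Minitab output
b0, b1 = -191.12, 11.0413
s = 20.3290          # S, the square root of MSE
se_fit = 4.63        # SE Fit at diameter 30
x_star = 30.0

# Assumption: t_{.025, 18} from a standard t table (not quoted in the notes)
t_crit = 2.101

fit = b0 + b1 * x_star
se_pred = math.sqrt(s ** 2 + se_fit ** 2)

ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)

# Hand calculation of SE Fit from the rounded summary statistics
se_hand = math.sqrt(420.9378 * (1 / 20 + (30 - 28.45) ** 2 / 1246.59))

print(f"fit = {fit:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
print(f"SE Fit by hand = {se_hand:.4f}")
```

Small differences from the printed output in the second decimal place are expected, since the inputs here are already rounded.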