
The Analysis of Variance for Simple Linear
Regression

• the total variation in an observed
response about its mean can be written
as a sum of two parts – its deviation from
the fitted value plus the deviation of the
fitted value from the mean response

yi − ȳ = (yi − ŷi) + (ŷi − ȳ)

• squaring both sides and summing over
all n observations gives the total sum
of squares on the left, and two sums of
squares on the right (the cross-product
term vanishes)

• this is the analysis of variance
decomposition for simple linear
regression

SST = SSE + SSR

• as always, the total is

SST = ∑ⁿᵢ₌₁ (yi − ȳ)² = SSYY

• the residual sum of squares is

SSE = ∑ⁿᵢ₌₁ (yi − ŷi)²
    = ∑ⁿᵢ₌₁ (yi − ȳ − β̂1(xi − x̄))²
    = SSYY − 2β̂1SSXY + β̂1²SSXX
    = SSYY − β̂1²SSXX
    = SSYY − β̂1SSXY
    = SSYY − SSXY²/SSXX

• the regression sum of squares is

SSR = ∑ⁿᵢ₌₁ (ŷi − ȳ)²
    = ∑ⁿᵢ₌₁ (β̂1(xi − x̄))²
    = β̂1² ∑ⁿᵢ₌₁ (xi − x̄)²
    = β̂1²SSXX = β̂1SSXY = SSXY²/SSXX

• in completing the square above, the
cross-product term is

2 ∑ⁿᵢ₌₁ (yi − ŷi)(ŷi − ȳ) = 2 ∑ⁿᵢ₌₁ (yi − ŷi)β̂1(xi − x̄)
= 2β̂1 ∑ⁿᵢ₌₁ êi(xi − x̄) = 2β̂1SSêX = 0

using the result that the residuals are
uncorrelated with the predictor

• the degrees of freedom are n − 1, n − 2
and 1 corresponding to SST, SSE and
SSR


• the results can be summarized in tabular
form

Source      DF     SS    MS
Regression  1      SSR   MSR = SSR/1
Residual    n − 2  SSE   MSE = SSE/(n − 2)
Total       n − 1  SST

Example: For the Ozone data

• SST = SSY Y = 1014.75

• SSR = SSXY²/SSXX = (−2.7225)²/.009275 = 799.1381

• SSE = SST − SSR =
1014.75 − 799.1381 = 215.61

• degrees of freedom: total = 4 − 1 = 3,
regression = 1, error = 2
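The ANOVA decomposition for the ozone example can be checked numerically from the summary statistics quoted above (SSXY = −2.7225, SSXX = .009275, SSYY = 1014.75); this is a minimal sketch, and the variable names are illustrative only.

```python
# Summary statistics for the ozone data, as quoted in the notes
SS_xy = -2.7225
SS_xx = 0.009275
SS_yy = 1014.75   # SST

# Regression and residual sums of squares via the ANOVA identities
SSR = SS_xy**2 / SS_xx   # regression sum of squares, SSXY^2 / SSXX
SSE = SS_yy - SSR        # residual sum of squares, from SST = SSE + SSR

print(SSR, SSE)   # approximately 799.14 and 215.61
```

The decomposition SST = SSE + SSR holds by construction here; the small differences from the slide values are rounding.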


• goodness of fit of the regression line is
measured by the coefficient of
determination

R² = SSR/SST

• this is the proportion of variation in y
explained by the regression on x

• R² is always between 0, indicating
nothing is explained, and 1, indicating all
points must lie on a straight line

• for simple linear regression R² is just the
square of the (Pearson) correlation
coefficient

R² = SSR/SST = (SSXY²/SSXX)/SSYY = SSXY²/(SSXX·SSYY) = r²

• this gives another interpretation of the
correlation coefficient – its square is the
coefficient of determination, the
proportion of variation explained by the
regression

• note that with R² and SST, one can
calculate

SSR = R²·SST

and

SSE = (1 − R²)SST
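The identity between R² and the squared correlation can be verified directly from the ozone summary statistics; a small sketch, computing R² both from the ANOVA decomposition and from the correlation coefficient.

```python
import math

# Ozone summary statistics from the notes
SS_xy = -2.7225
SS_xx = 0.009275
SS_yy = 1014.75

# Coefficient of determination, two equivalent ways
R2_from_anova = (SS_xy**2 / SS_xx) / SS_yy     # SSR / SST
r = SS_xy / math.sqrt(SS_xx * SS_yy)           # Pearson correlation

print(R2_from_anova, r)   # approximately 0.7875 and -0.8874
```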

Example: Ozone data

• we saw r = −.8874, so R² = .78748 of
the variation in y is explained by the
regression

• with SST = 1014.75, we can get

SSR = R²·SST = .78748(1014.75) = 799.10

and

SSE = (1 − R²)SST
= (1 − .78748)1014.75 = 215.65

• these answers differ slightly from above
due to round-off error

A statistical model for simple linear regression

• we assume that an observed response
value yi is related to its predictor xi
according to the model

yi = β0 + β1xi + εi

• where β0 and β1 are the intercept and
slope

• εi is an additive random deviation or
‘error’, assumed to have zero mean and
constant variance σ²

• any two deviations εi and εj are assumed
to be independent

• the mean of yi is

µxi = β0 + β1xi

which is linear in xi

• the variance is assumed to be the same
for each case, and this justifies giving
each case the same weight when
minimizing SSE

• under these assumptions, the least
squares estimators

β̂1 = SSXY/SSXX and β̂0 = ȳ − β̂1x̄

have good statistical properties

• among all linear unbiased estimators,
they have minimum variance
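The least squares formulas can be applied to the ozone summary statistics as a sketch; note that the mean response ȳ = 227.75 below is an assumed value, implied by the fitted intercept 253.434 quoted in the examples, not a figure stated directly in the notes.

```python
# Ozone summary statistics from the notes; ybar is an assumed value
# implied by the reported intercept estimate 253.434
SS_xy = -2.7225
SS_xx = 0.009275
xbar = 0.0875
ybar = 227.75   # assumption, see lead-in

beta1_hat = SS_xy / SS_xx              # slope: SSXY / SSXX
beta0_hat = ybar - beta1_hat * xbar    # intercept: ybar - beta1_hat * xbar

print(beta1_hat, beta0_hat)   # approximately -293.531 and 253.434
```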


• an unbiased estimator has a sampling
distribution with mean equal to the
parameter being estimated

• the variance of the deviations σ² is
estimated using the average squared
residual,

s² = (1/(n − 2)) ∑ⁿᵢ₌₁ (yi − ŷi)² = SSE/(n − 2) = MSE

where division is by n − 2 here because
two β’s have been estimated

• to make inferences about the model
parameters we also need to assume that
the deviations εi are normally distributed

Statistical inferences for regression

Standard errors for regression coefficients

• regression coefficient values, β̂0 and β̂1,
are point estimates of the true intercept
and slope, β0 and β1 respectively.

• using our assumptions about the
deviations, and the rules for mean and
variance, the sampling distribution of the
slope estimator can be derived to be

β̂1 ∼ N(β1, σ²/SSXX)

• this means that if we had a large number
of data sets and calculated the slope
estimate each time, their histogram
would look normal, be centered around
the true slope and have variance as given
above

• the standard deviation of β̂1 is

√(σ²/SSXX) = σ/√SSXX

• the value of σ² is unknown, so the
estimator MSE is used in its place to
produce the standard error of the
estimate β̂1, as

SEβ̂1 = √(MSE/SSXX) = s/√SSXX

• the standard error for the intercept
estimator β̂0 is

SEβ̂0 = √(MSE(1/n + x̄²/SSXX))

Example: Ozone data

• standard errors for the regression
coefficients are estimated below.

• SSXX = .009275 and MSE = 107.80

• SEβ̂1 = √(MSE/SSXX) = √(107.80/.009275) = 107.81

• SEβ̂0 = √(MSE(1/n + x̄²/SSXX)) =
√(107.80((1/4) + (.00766/.009275))) =
10.77
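These standard-error calculations for the ozone data can be reproduced with a short script, using the MSE and summary statistics quoted above.

```python
import math

# Ozone quantities from the notes
MSE = 107.80
SS_xx = 0.009275
n = 4
xbar = 0.0875

se_b1 = math.sqrt(MSE / SS_xx)                      # SE of the slope
se_b0 = math.sqrt(MSE * (1/n + xbar**2 / SS_xx))    # SE of the intercept

print(se_b1, se_b0)   # approximately 107.81 and 10.77
```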

Tests for regression coefficients

• the most common and useful test is
whether or not the relationship between
the response and predictor is significant

• H0 : β1 = 0, there is no linear
relationship

• Ha : β1 ≠ 0, there is a linear relationship

• the alternative is usually two sided

• the test statistic is

T = β̂1/SEβ̂1

and this is compared to the tn−2
distribution

• on occasion, we specify a value β1,0
other than 0 in the null hypothesis

• then the test statistic becomes

T = (β̂1 − β1,0)/SEβ̂1

• one can also test hypotheses about the
intercept

• H0 : β0 = β0,0,

• Ha : β0 ≠ β0,0

• often we are interested in whether the
intercept is zero

• the test statistic is

T = (β̂0 − β0,0)/SEβ̂0

and this is compared to the tn−2
distribution

Example: Ozone data

• we saw β̂1 = −293.531 and
SEβ̂1 = 107.81

• the test of H0 : β1 = 0 versus
Ha : β1 6= 0 gives

T = −293.531/107.81 = −2.7227

• comparing to the t4−2=2 distribution
gives P = .11 exactly, or .10 < P < .20
using the tables

• in spite of the high correlation
calculated earlier, the relationship
between ozone and yield is not significant
using α = .10 or smaller

Example: Tree data

• earlier we obtained β̂1 = 11.036, n = 20,
r = .976, sy = 91.7 and sx = 8.1 for the
straight line fit

• we can determine that

SSXX = 19s²x = 19(8.1)² = 1246.59

and

SST = SSYY = 19(91.7)² = 159,768.9

• from this we can calculate

SSE = (1 − R²)SST = (1 − .976²)159768.9 = 7576.88

and

MSE = SSE/(n − 2) = 7576.88/18 = 420.9378

• the standard error of the slope estimate
is

SEβ̂1 = √(MSE/SSXX) = √(420.9378/1246.59) = .5811

• the test statistic for an association
between diameter and usable volume is

T = 11.036/.5811 = 18.99

and there are 20 − 2 = 18 degrees of
freedom

• the P value is less than .01, using the
tables, so we conclude that the linear
association between usable volume and
diameter at chest height is statistically
significant

• if you compare with the computer
output shown earlier, you will see that
the values calculated by hand are slightly
different, due to round-off error

MTB > regress c2 1 c1;

SUBC> residuals c3.

The regression equation is

volume = – 191 + 11.0 diameter

Predictor      Coef    Stdev  t-ratio      p
Constant    -191.12    16.98   -11.25  0.000
diameter    11.0413   0.5752    19.19  0.000

s = 20.33   R-sq = 95.3%   R-sq(adj) = 95.1%

Analysis of Variance

SOURCE      DF       SS       MS       F      p
Regression   1   152259   152259  368.43  0.000
Error       18     7439      413
Total       19   159698
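The hand calculations for the tree data can be checked with a short script built from the quoted summaries; a sketch only, the numbers come from the slides.

```python
import math

# Tree-data summaries from the notes
n, r = 20, 0.976
s_x, s_y = 8.1, 91.7
beta1_hat = 11.036

SS_xx = (n - 1) * s_x**2       # 19 * 8.1^2
SST = (n - 1) * s_y**2         # 19 * 91.7^2
SSE = (1 - r**2) * SST         # residual sum of squares
MSE = SSE / (n - 2)            # mean squared error on 18 df
se_b1 = math.sqrt(MSE / SS_xx) # standard error of the slope
T = beta1_hat / se_b1          # t statistic on n - 2 = 18 df

print(SS_xx, MSE, se_b1, T)
```

The computed values agree with the hand calculations (SSXX = 1246.59, MSE ≈ 420.94, SE ≈ .5811, T ≈ 18.99), differing slightly from the Minitab output because of rounding in the summaries.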

Confidence intervals for regression coefficients

• confidence intervals are constructed
using the standard errors as follows

β̂i ± tα/2,n−2 SEβ̂i,  for i = 0 or i = 1

• the degrees of freedom for the t
distribution are the same as the degrees
of freedom associated with MSE

Example: Ozone data

• 95% confidence intervals for β1 and β0
are computed as follows

• tα/2,n−2 = t.025,2 = 4.303

• for the slope, β1:

−293.531 ± 4.303(107.81), giving (−757.4, 170.3)

• note that this interval contains zero,
which confirms that the slope is not
significantly different from zero

• for the intercept, β0:

253.434 ± 4.303(10.77), giving (207.1, 299.8)
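These confidence intervals are straightforward to reproduce from the estimates and standard errors computed earlier; the critical value t.025,2 = 4.303 is taken from the notes.

```python
# 95% confidence intervals for the ozone regression coefficients
t_crit = 4.303                  # t_{.025, 2} from the tables
b1, se_b1 = -293.531, 107.81    # slope estimate and its SE
b0, se_b0 = 253.434, 10.77      # intercept estimate and its SE

ci_slope = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
ci_intercept = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)

print(ci_slope, ci_intercept)
```

The slope interval straddles zero, matching the non-significant t test above.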

Estimating the mean of Y at x = x∗

• the estimated mean of Y when x = x∗ is

µ̂x∗ = β̂0 + β̂1x∗ = ȳ + β̂1(x∗ − x̄)

• because both β̂0 and β̂1 have normal
sampling distributions, µ̂x∗ does as well

• the mean of this distribution is the true
mean

µx∗ = β0 + β1x∗

because both β̂0 and β̂1 have means
equal to their population values

• the variance of µ̂x∗ is

σ²(1/n + (x∗ − x̄)²/SSXX)

which is the sum of the variances of ȳ
and β̂1(x∗ − x̄)
• in short

µ̂x∗ ∼ N(β0 + β1x∗, σ²(1/n + (x∗ − x̄)²/SSXX))

• the standard error of µ̂x∗ is

SEµ̂x∗ = √(MSE(1/n + (x∗ − x̄)²/SSXX))

• a confidence interval for the mean
µx∗ = β0 + β1x∗ when x = x∗ is given by

µ̂x∗ ± tα/2,n−2 SEµ̂x∗

Example: Ozone data

• a 95% confidence interval for the mean
yield at x = 0.10 is obtained as follows

• when x∗ = 0.10, the estimated mean is

µ̂.1 = 253.434 − 293.531(0.1) = 224.08

• the standard error of this estimate is

SEµ̂.1 = √(107.8(1/4 + (0.1 − .0875)²/.009275)) = 5.36

• the table value is
tα/2,n−2 = t.025,2 = 4.303

• the half width of the interval, or margin
of error, is

tα/2,n−2 SEµ̂.1 = 4.303(5.36) = 23.08

• so the interval is 224.08 ± 23.08 or
(201.00, 247.16)
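The whole confidence-interval calculation for the mean at x∗ = 0.10 can be sketched in a few lines, using the ozone fit and summaries quoted above.

```python
import math

# Ozone fit and summaries from the notes
b0, b1 = 253.434, -293.531
MSE, SS_xx = 107.8, 0.009275
n, xbar = 4, 0.0875
t_crit = 4.303       # t_{.025, 2}
x_star = 0.10

mu_hat = b0 + b1 * x_star                                  # estimated mean at x*
se_mu = math.sqrt(MSE * (1/n + (x_star - xbar)**2 / SS_xx))  # its standard error
margin = t_crit * se_mu                                    # half width of the CI

print(mu_hat, se_mu, margin)   # approximately 224.08, 5.36, 23.08
```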

Predicting a new response value at x = x∗

• in making a prediction interval for a
future observation on y when x = x∗, we
need to incorporate two sources of
variation

• the first is the variation in the estimate
µ̂x∗ about the actual mean µx∗

• the second is the variation of the new
response y about its mean

• the error of prediction is

y − (β̂0 + β̂1x∗) = (y − (β0 + β1x∗)) − (β̂0 + β̂1x∗ − (β0 + β1x∗))

• the first term in brackets on the right
hand side of this expression is ε∗, which
has a N(0, σ²) distribution.

• the second term is the deviation of µ̂x∗
from the actual mean µx∗ which we have
seen is

N(0, σ²(1/n + (x∗ − x̄)²/SSXX))

• as y represents a future observation, the
distributions of the two terms are
independent, and it follows that the
distribution of the prediction error
y − (β̂0 + β̂1x∗) is

N(0, σ²(1 + 1/n + (x∗ − x̄)²/SSXX))

• the standard error of the prediction error
is estimated by

√(MSE(1 + 1/n + (x∗ − x̄)²/SSXX))

• and the prediction interval for y is given
by

β̂0 + β̂1x∗ ± tα/2,n−2 √(MSE(1 + 1/n + (x∗ − x̄)²/SSXX))

Ozone example: A 95% prediction interval
for y when x = 0.10 is calculated.

• when x∗ = 0.10, the prediction is

µ̂x∗ = 253.434 − 293.531(0.1) = 224.08

• the standard error of prediction is

SEy∗ = √(107.8(1 + 1/4 + (0.1 − .0875)²/.009275)) = 11.69

• the margin of error is
tα/2,n−2 SEy∗ = 4.303(11.69) = 50.29

• so the prediction interval is

224.08 ± 50.29

• or (173.79, 274.37)
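The prediction interval differs from the confidence interval for the mean only by the extra "1 +" inside the square root; a sketch of the calculation, using the same ozone quantities as before.

```python
import math

# Ozone prediction interval at x* = 0.10, quantities from the notes
b0, b1 = 253.434, -293.531
MSE, SS_xx = 107.8, 0.009275
n, xbar = 4, 0.0875
t_crit = 4.303       # t_{.025, 2}
x_star = 0.10

y_hat = b0 + b1 * x_star
# note the extra 1 + ... for the variance of the new observation
se_pred = math.sqrt(MSE * (1 + 1/n + (x_star - xbar)**2 / SS_xx))
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

print(se_pred, pi)   # SE approximately 11.69, interval approximately (173.79, 274.37)
```

The prediction interval is much wider than the confidence interval for the mean because it must also cover the scatter of an individual observation.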
Tree example: Minitab can be used to find
confidence intervals for the mean at x∗ and
prediction intervals for a new value at x∗.

• the output below was obtained using
Stat > Regression > Options, where a
diameter of 30 in. was used

MTB > Name c3 "CLIM1" c4 "CLIM2" c5 "PLIM1" c6 "PLIM2"

MTB > Regress c2 1 c1;

SUBC> Constant;

SUBC> Predict 30;

SUBC> CLimits 'CLIM1'-'CLIM2';

SUBC> PLimits 'PLIM1'-'PLIM2';

SUBC> Brief 2.

Regression Analysis: C2 versus C1

The regression equation is

C2 = – 191 + 11.0 C1

Predictor Coef SE Coef T P

Constant -191.12 16.98 -11.25 0.000

C1 11.0413 0.5752 19.19 0.000


S = 20.3290 R-Sq = 95.3% R-Sq(adj) = 95.1%

Analysis of Variance

Source DF SS MS F P

Regression 1 152259 152259 368.43 0.000

Residual Error 18 7439 413

Total 19 159698

Predicted Values for

Fit SE Fit 95% CI 95% PI

1 140.11 4.63 (130.38, 149.85) (96.31, 183.92)

Values of Predictors for

C1

1 30.0

• for this dataset we previously saw that
n = 20, SSXX = 1246.59 and
MSE = 420.9378


• the mean diameter is x̄ = 28.45, so the
standard error for estimating the mean
at X = 30 is

SEµ̂x∗ = √(420.9378(1/20 + (30 − 28.45)²/1246.59)) = 4.6753

• this is close to the SE Fit given in the
output
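This hand calculation of the SE Fit can be verified with a short script using the tree-data summaries; the small discrepancy from Minitab's 4.63 reflects rounding in the quoted summaries.

```python
import math

# Tree-data check of the SE Fit at a diameter of 30 in.
MSE = 420.9378
SS_xx = 1246.59
n, xbar = 20, 28.45
x_star = 30.0

# standard error of the estimated mean response at x*
se_fit = math.sqrt(MSE * (1/n + (x_star - xbar)**2 / SS_xx))

print(se_fit)   # approximately 4.6753
```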
