Properties of the least squares fit

1. the fitted line passes through (x̄, ȳ)

• see this by substituting xi = x̄
into the fitted line

ŷi = ȳ + β̂1(xi − x̄)

which gives a fitted value of ȳ at x̄

2. the mean of the fitted values is the
same as the mean of the observed
responses

• the mean of the fitted values is

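$$
\bar{\hat{y}} = \frac{1}{n}\sum_{i=1}^{n} \hat{y}_i
= \frac{1}{n}\sum_{i=1}^{n} \bigl(\bar{y} + \hat\beta_1 (x_i - \bar{x})\bigr)
= \bar{y} + \frac{\hat\beta_1}{n}\sum_{i=1}^{n} (x_i - \bar{x})
= \bar{y}
$$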


3. the mean of the residuals is zero

• the residuals are êi = yi − ŷi

• so, using (2) above

$$
\bar{\hat{e}} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i) = \bar{y} - \bar{\hat{y}} = 0
$$

4. the residuals have zero correlation
with the predictor

• we can show SSêX = 0

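$$
\begin{aligned}
SS_{\hat{e}X} &= \sum_{i=1}^{n} (\hat{e}_i - \bar{\hat{e}})(x_i - \bar{x}) \\
&= \sum_{i=1}^{n} (y_i - \hat{y}_i)(x_i - \bar{x}) \\
&= \sum_{i=1}^{n} \bigl(y_i - \bar{y} - \hat\beta_1 (x_i - \bar{x})\bigr)(x_i - \bar{x}) \\
&= SS_{XY} - \hat\beta_1 SS_{XX} = 0
\end{aligned}
$$

• the last step uses β̂1 = SSXY / SSXX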

5. the residuals have zero
correlation with the fitted values

• we can show SSêŷ = 0

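$$
\begin{aligned}
SS_{\hat{e}\hat{y}} &= \sum_{i=1}^{n} \hat{e}_i (\hat{y}_i - \bar{y}) \\
&= \sum_{i=1}^{n} (y_i - \hat{y}_i)\,\hat\beta_1 (x_i - \bar{x}) \\
&= \sum_{i=1}^{n} \bigl(y_i - \bar{y} - \hat\beta_1 (x_i - \bar{x})\bigr)\,\hat\beta_1 (x_i - \bar{x}) \\
&= \hat\beta_1 SS_{XY} - \hat\beta_1^2 SS_{XX} \\
&= \hat\beta_1 SS_{XY} - \hat\beta_1 SS_{XY} = 0
\end{aligned}
$$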


Ozone example: the fitted values,
residuals, sums and crossproducts are
shown below

  xi      yi      ŷi       êi = yi − ŷi    êi xi
 0.02    242    247.563      -5.563       -0.1113
 0.07    237    232.887       4.113        0.28791
 0.11    231    221.146       9.854        1.0840
 0.15    201    209.404      -8.404       -1.2606
 Sum     911    911           0            0

• the observed and fitted responses
have the same sum

• the residuals have zero sum

• the correlation between residuals and
predictors will be zero because the
sum of cross products is zero
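
The properties above can be checked numerically on the ozone data. The following is a minimal Python sketch, not part of the original notes: the variable names are illustrative, and it assumes the slope is computed as SSXY / SSXX and the intercept as ȳ − β̂1 x̄.

```python
# Ozone data (x, y) taken from the table above
x = [0.02, 0.07, 0.11, 0.15]
y = [242, 237, 231, 201]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept (assumed formulas: b1 = SSXY / SSXX)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Property 1: the fitted line evaluated at x_bar equals y_bar
at_mean = b0 + b1 * x_bar
# Property 2: fitted and observed responses have the same sum
sum_fitted = sum(fitted)
# Property 3: the residuals sum to zero
sum_resid = sum(resid)
# Property 4: cross products of residuals with the predictor sum to zero
ss_ex = sum(e * (xi - x_bar) for e, xi in zip(resid, x))
# Property 5: cross products of residuals with the fitted values sum to zero
ss_ey = sum(e * (fi - y_bar) for e, fi in zip(resid, fitted))

print("fitted values:", [round(f, 3) for f in fitted])
print("sum of fitted:", round(sum_fitted, 6), " sum of observed:", sum(y))
print("sum of residuals:", round(sum_resid, 6))
print("SS_eX:", round(ss_ex, 6), " SS_ey:", round(ss_ey, 6))
```

The sums are zero only up to floating-point rounding, so the sketch rounds them before printing.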


Plotting residuals to assess fit

• from (3) above, the residuals have
zero mean, and from (4) and (5)
they are uncorrelated with the
predictor x and the fitted values ŷ

• a scatterplot of the residuals versus x
should show random scatter about 0,
with no linear association with x

• the scatterplot of residuals versus
fitted values should be similar

• various problems can be revealed
from the plot of ê versus x or ŷ

– curvature indicates that the form
of the model is not correct

∗ this can be fixed by adding
the term x² to the model or
by transforming the response
variable


– the magnitude of the residuals
may increase or decrease with
the predictor, sometimes called
'fanning out'

∗ when we use least squares
and minimize SSE, we give
equal weight to all n
deviations

∗ this implicitly assumes that
the deviations are all roughly
the same size

∗ this problem can be fixed
using a weighted least
squares criterion (giving
smaller weight to the larger
deviations) or by
transformation
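
The weighted criterion mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used in the notes: the function name `weighted_least_squares`, the data, and the choice of weights 1/x² (reasonable only if the spread of the deviations grows roughly in proportion to x) are all hypothetical.

```python
def weighted_least_squares(x, y, w):
    """Fit y = b0 + b1*x by minimizing sum(w_i * (y_i - b0 - b1*x_i)**2)."""
    sw = sum(w)
    # Weighted means of x and y
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted analogues of SSXY and SSXX
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data whose scatter grows with x (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.8, 6.5, 7.2, 11.0, 10.9]

# Equal weights reproduce ordinary least squares
ols = weighted_least_squares(x, y, [1.0] * len(x))

# Assumed weights 1/x^2: observations with larger x count for less
w = [1.0 / xi ** 2 for xi in x]
wls = weighted_least_squares(x, y, w)

print("equal weights (b0, b1):", ols)
print("weights 1/x^2 (b0, b1):", wls)
```

With equal weights the function reduces to the ordinary least squares fit, which makes the effect of down-weighting the larger deviations easy to compare.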


Example: lumber data, useable
volume versus diameter at chest height
MTB > plot c3 c1

[MINITAB character plot: C3 (vertical axis, about -25 to 25) versus C1 (horizontal axis, 15.0 to 40.0)]

• there is clearly some curvature here

• one remedy is to add a quadratic
term in the equation, giving

$$y = \beta_0 + \beta_1 x + \beta_2 x^2$$


• MINITAB can fit this too

MTB > let c3 = c1**2

MTB > regress c2 2 c1 c3;

SUBC> residuals c4.

The regression equation is
volume = 29.7 - 5.62 diameter + 0.290 C3

Predictor       Coef     Stdev   t-ratio       p
Constant       29.74     51.39      0.58   0.570
diameter      -5.620     3.792     -1.48   0.157
C3           0.29037   0.06572      4.42   0.000

s = 14.27    R-sq = 97.8%    R-sq(adj) = 97.6%

Analysis of Variance

SOURCE        DF        SS        MS         F       p
Regression     2    156236     78118    383.54   0.000
Error         17      3463       204
Total         19    159698

SOURCE        DF    SEQ SS
diameter       1    152259
C3             1      3976
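
The summary figures in this output follow from the analysis of variance table. The Python sketch below is not part of the original notes; it recomputes them from the printed sums of squares, assuming the usual definitions (MS = SS / DF, F = MS regression / MS error, s = √MS error, R-sq = 1 − SSE / SST). Because the printed sums of squares are rounded, the results agree with the MINITAB values only approximately.

```python
import math

# Sums of squares and degrees of freedom as printed in the output above
ss_reg, df_reg = 156236, 2
ss_err, df_err = 3463, 17
ss_tot, df_tot = 159698, 19

ms_reg = ss_reg / df_reg          # mean square for regression
ms_err = ss_err / df_err          # mean square for error
f_ratio = ms_reg / ms_err         # F statistic
s = math.sqrt(ms_err)             # residual standard deviation

r_sq = 1 - ss_err / ss_tot                    # R-sq
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)     # R-sq(adj)

print(f"MS regression = {ms_reg:.0f}, MS error = {ms_err:.0f}")
print(f"F = {f_ratio:.2f}")
print(f"s = {s:.2f}")
print(f"R-sq = {100 * r_sq:.1f}%, R-sq(adj) = {100 * r_sq_adj:.1f}%")
```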

MTB > plot c4 c1

[MINITAB character plot: C4 (residuals, vertical axis, about -20 to 20) versus diameter (horizontal axis, 15.0 to 40.0)]


• the new residual plot shows no
curvature
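
The effect of adding a quadratic term can be sketched in Python. This is an illustration only: the lumber measurements are not reproduced in these notes, so the data below are made up from an assumed curve, and the helper names `fit_line` and `fit_quadratic` are hypothetical. The quadratic fit solves the normal equations directly, which is adequate for a small example.

```python
def fit_line(x, y):
    """Least squares fit of y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


def fit_quadratic(x, y):
    """Least squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Columns of the design matrix: 1, x, x^2
    cols = [[1.0] * len(x), list(x), [xi ** 2 for xi in x]]
    # Augmented normal equations [X'X | X'y]
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols]
         + [sum(p * yi for p, yi in zip(ci, y))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(3):
        pivot = max(range(k, 3), key=lambda r: abs(a[r][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for r in range(3):
            if r != k:
                factor = a[r][k] / a[k][k]
                a[r] = [v - factor * u for v, u in zip(a[r], a[k])]
    return [a[k][3] / a[k][k] for k in range(3)]


# Made-up data from an assumed curve (NOT the lumber measurements)
x = [15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
y = [30.0 - 5.0 * xi + 0.3 * xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
line_resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

c0, c1, c2 = fit_quadratic(x, y)
quad_resid = [yi - (c0 + c1 * xi + c2 * xi ** 2) for xi, yi in zip(x, y)]

print("straight-line residuals:", [round(e, 2) for e in line_resid])
print("quadratic residuals:    ", [round(e, 2) for e in quad_resid])
```

The straight-line residuals are positive at both ends and negative in the middle, which is the curved pattern a residual plot reveals; the quadratic fit removes it.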

Example: PCBs in lake trout

• consider the PCB concentration in
Cayuga Lake Trout, plotted against
the age of the fish






• ••

••
• •




age

pc
bs

(
pp

m
)

2 4 6 8 10 12

0
5

10
15

20
25

30

• the fitted least squares line is

PCB = −1.45 + 1.56 age


• the residuals, however, show problems






••

••



• •

pcb.age

p
cb

re
g

$
re

si
d

u
a

ls

2 4 6 8 10 12

-1
0

-5
0

5
1

0
1

5

• the residuals are larger at larger ages

• there is some curvature in the plot

• the plot of log(PCB) versus age, with
the least squares line, is shown


• the least squares fit is

log(PCB) = 0.03 + 0.259 age


••




• •

age

lo
g

e
(p

cb
)

2 4 6 8 10 12

0
1

2
3


• the residual plot shows even spread
for all ages



age

re
si

d
u

a
l

2 4 6 8 10 12

-1
.0

-0
.5

0
.0

0
.5

1
.0

• the model says

$$\mathrm{PCB} = e^{0.03 + 0.259\,\mathrm{age}}$$

• comparing model predictions at age
and age + 1 gives

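$$
\frac{\mathrm{PCB}_{\mathrm{age}+1}}{\mathrm{PCB}_{\mathrm{age}}}
= \frac{e^{0.03 + 0.259(\mathrm{age}+1)}}{e^{0.03 + 0.259\,\mathrm{age}}}
= e^{0.259} \approx 1.3
$$

so

$$\mathrm{PCB}_{\mathrm{age}+1} \approx 1.3\,\mathrm{PCB}_{\mathrm{age}}$$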

• this is an example of exponential
growth

– where growth increases by a fixed
percentage of the previous total

– linear growth increases by a fixed
amount

– bacterial growth and compound
interest are both examples of
exponential growth
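
The contrast between the two kinds of growth can be checked with the fitted equations from this example. The Python sketch below is an illustration, not part of the original notes; the function names `pcb_log_model` and `pcb_linear_model` are hypothetical, and the coefficients are the rounded values quoted above.

```python
import math


def pcb_log_model(age):
    """Prediction from the log-scale fit: log(PCB) = 0.03 + 0.259 * age."""
    return math.exp(0.03 + 0.259 * age)


def pcb_linear_model(age):
    """Prediction from the straight-line fit: PCB = -1.45 + 1.56 * age."""
    return -1.45 + 1.56 * age


growth_factor = math.exp(0.259)   # multiplicative change per extra year

for age in (2, 6, 10):
    ratio = pcb_log_model(age + 1) / pcb_log_model(age)
    step = pcb_linear_model(age + 1) - pcb_linear_model(age)
    print(f"age {age}: exponential ratio = {ratio:.3f}, linear step = {step:.2f}")

print(f"growth factor e^0.259 = {growth_factor:.3f}")
```

The ratio of successive predictions from the log-scale model is the same at every age (a fixed percentage increase), while the straight-line model adds the same amount each year.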
