Multiple Linear Regression

• is used to relate a continuous response
(or dependent) variable Y to several
explanatory (or independent, or
predictor) variables X1, X2, . . . , Xk

• assumes a linear relationship between
mean of Y and the X’s with additive
normal errors

• Xij is the value of independent variable
j for subject i.

• Yi is the value of the dependent variable
for subject i, i = 1, 2, . . . , n.

• Statistical model

Yi = β0 + β1Xi1 + β2Xi2 + . . . + βkXik + εi

the additive errors are assumed to be a
random sample from N(0, σ²)

• the mean of Yi at Xi1, . . . , Xik is

β0 + β1Xi1 + β2Xi2 + . . . + βkXik


• as before β0 is the intercept, the value of
the mean when all predictors are zero

• βj , j = 1, . . . , k, is the partial slope for
predictor Xj , giving the change in the
mean for a unit change in Xj when all
other predictors are held fixed
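
• as a concrete illustration (not from the original notes), a minimal Python/NumPy sketch of this model: the mean of Y is a linear function of the predictors, and the observed Y adds a normal error; the sample size, coefficients and σ below are made-up values

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 100, 2                        # subjects and predictors (illustrative sizes)
X = rng.uniform(0, 10, size=(n, k))  # Xij: value of predictor j for subject i

beta0 = 1.0                          # intercept (made-up value)
beta = np.array([0.5, -0.3])         # partial slopes beta1, beta2 (made-up values)
sigma = 0.4                          # error standard deviation

mean_Y = beta0 + X @ beta                  # mean of Y at X1, ..., Xk
Y = mean_Y + rng.normal(0, sigma, size=n)  # additive N(0, sigma^2) errors
```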

Types of (Linear) Regression Models

• there are many possible model forms

• choosing the best one is a complicated
process

• the predictors can be continuous
variables, or counts, or indicators

• indicator or “dummy” variables take the
values 0 or 1 and are used to include, and
contrast, groups defined by a binary
variable such as gender (a short coding
sketch follows this list)

• some examples are shown below
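
• before those examples, a minimal sketch (Python/NumPy rather than Minitab; the variable names are illustrative) of how a binary variable such as gender can be coded as a 0/1 indicator

```python
import numpy as np

sex = np.array(["M", "F", "F", "M", "F"])  # illustrative binary variable
X = (sex == "F").astype(int)               # indicator: 0 for Males, 1 for Females
print(X)                                   # [0 1 1 0 1]
```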


One continuous predictor
Quadratic curve

• Conc = β0 + β1t + β2t²

[Figure: Conc versus time, 0 to 60; fitted curve
Conc = 0 + 0.00460*t − 0.00004*t²]
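
• a quadratic curve is still a linear model because it is linear in the β's: t and t² simply enter as two predictor columns; below is a sketch (Python/NumPy, not Minitab) that fits this curve to data simulated from the equation above, with made-up noise

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 60, 30)
conc = 0 + 0.00460 * t - 0.00004 * t**2 + rng.normal(0, 0.005, t.size)

# design matrix with columns 1, t, t^2: two predictors built from one variable
X = np.column_stack([np.ones_like(t), t, t**2])
b, *_ = np.linalg.lstsq(X, conc, rcond=None)
print(b)  # least-squares estimates of beta0, beta1, beta2
```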

One continuous, one binary predictor
Two parallel lines

• Conc = β0 + β1time + β2X, where X =
0 for Males, 1 for Females

[Figure: Conc versus Time, 0 to 60; two parallel lines
for Males (X = 0) and Females (X = 1), fitted model
Conc = .01 + .0015*t + .2*X]
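
• evaluating the fitted model for the two groups shows why the lines are parallel: the indicator shifts the intercept by β2 = .2 but leaves the slope in time unchanged; a small check using the coefficients above (Python, illustrative only)

```python
import numpy as np

t = np.array([0.0, 30.0, 60.0])        # a few time points
conc_m = 0.01 + 0.0015 * t + 0.2 * 0   # Males,   X = 0
conc_f = 0.01 + 0.0015 * t + 0.2 * 1   # Females, X = 1
print(conc_f - conc_m)                 # constant gap of 0.2 at every time
```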


Two nonparallel lines

• Conc = β0 + β1time + β2X + β3time*X,
where X = 0 for Males, 1 for Females

[Figure: Conc versus Time, 0 to 60; two nonparallel lines
for Males (X = 0) and Females (X = 1), fitted model
Conc = .01 + .0015*t + .12*X + .0030*t*X]
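
• the time*X term lets the slope differ by group: for Males (X = 0) the slope in time is β1, for Females (X = 1) it is β1 + β3; a small check using the coefficients above (Python, illustrative only)

```python
# slope of Conc in time for each group, using the coefficients shown above
beta1, beta3 = 0.0015, 0.0030
slope_males = beta1            # X = 0
slope_females = beta1 + beta3  # X = 1, i.e. 0.0045
print(slope_males, slope_females)
```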

Two continuous predictors
First order

• Conc = β0 + β1time + β2Dose

• effect of dose constant over time

[Figure: Conc versus Time, 0 to 60; parallel lines for
Dose = .01 and Dose = .10, fitted model
Conc = .01 + .0015*t + 20*dose]
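
• because there is no time-by-dose term, the vertical gap between the two dose lines is β2*(0.10 − 0.01) at every time point; a small check using the coefficients above (Python, illustrative only)

```python
import numpy as np

t = np.array([0.0, 30.0, 60.0])
conc_lo = 0.01 + 0.0015 * t + 20 * 0.01   # Dose = .01
conc_hi = 0.01 + 0.0015 * t + 20 * 0.10   # Dose = .10
print(conc_hi - conc_lo)                  # constant gap 20*(0.10 - 0.01) = 1.8
```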


Interaction

• Conc = β0 + β1time + β2Dose + β3*time*Dose

• effect of dose changes with time

[Figure: Conc versus Time, 0 to 60; diverging lines for
Dose = .01 and Dose = .1, fitted model
Conc = .09 + .0015*t + 1.1*dose + .0185*t*dose]
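
• with the interaction term the slope of Conc in time is β1 + β3*dose, so it changes with dose and the lines spread apart over time; a small check using the coefficients above (Python, illustrative only)

```python
# slope of Conc in time at each dose, using the coefficients shown above
beta1, beta3 = 0.0015, 0.0185
for dose in (0.01, 0.1):
    print(dose, beta1 + beta3 * dose)   # slope increases with dose
```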

Estimation and ANOVA

• The regression parameters are estimated
using least squares.

• That is, we choose β0, β1, . . . , βk to
minimize

SSE = Σ_{i=1}^{n} (yi − β0 − β1xi1 − . . . − βkxik)²
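
• a sketch of this least-squares computation outside Minitab (Python/NumPy with simulated data; np.linalg.lstsq returns the coefficients that minimize SSE numerically)

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 2
X = rng.uniform(0, 10, size=(n, k))                          # two predictors
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(0, 0.4, n)  # simulated response

# design matrix with a leading column of ones for the intercept
X1 = np.column_stack([np.ones(n), X])
bhat, *_ = np.linalg.lstsq(X1, y, rcond=None)   # coefficients that minimize SSE

resid = y - X1 @ bhat
sse = np.sum(resid**2)                          # the minimized SSE
print(bhat, sse)
```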


• Minitab can fit multiple regression
models easily

• we will soon learn a formula for these
estimates using matrices

• the error variance is estimated as before

s² = SSE / (n − k − 1) = MSE

• The ANOVA table is similar to that for
simple linear regression, with changes to the
degrees of freedom to match the number
of predictor variables.

Source      d.f.   SS    MS
Regression  k      SSR   MSR = SSR/k
Residual    n-k-1  SSE   MSE = SSE/(n-k-1)
Total       n-1    SST
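
• a sketch of how these sums of squares, degrees of freedom and mean squares can be computed (Python/NumPy on simulated data; illustrative, not Minitab output)

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 2
X = rng.uniform(0, 10, size=(n, k))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(0, 0.4, n)

X1 = np.column_stack([np.ones(n), X])
bhat, *_ = np.linalg.lstsq(X1, y, rcond=None)
yhat = X1 @ bhat

sst = np.sum((y - y.mean())**2)   # Total sum of squares,      df = n - 1
sse = np.sum((y - yhat)**2)       # Residual sum of squares,   df = n - k - 1
ssr = sst - sse                   # Regression sum of squares, df = k
msr = ssr / k
mse = sse / (n - k - 1)           # this is s^2, the estimate of sigma^2
print(ssr, sse, sst, msr, mse)
```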

• later we will see that SSR can be
partitioned into a part explained by one
set of predictors, SSR(X1), and the
remainder, SSR(X2|X1), explained by
the rest of the variables

• the coefficient of determination R² is

R² = SSR / SST

as before, and is the fraction of the total
variability in y accounted for by the
regression model

• it ranges between 0 and 1

• R² = 1.00 indicates a perfect (linear) fit

• R² = 0.00 indicates a complete lack of linear fit.
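
• a numeric illustration with made-up sums of squares (hypothetical values, not from the notes)

```python
ssr, sst = 80.0, 100.0   # hypothetical sums of squares, not from the notes
r2 = ssr / sst
print(r2)                # 0.8: 80% of the variability in y is explained
```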
