Example (2.3.1 revisited). We can use the stepAIC function in the MASS package in R to carry out a forward and backward search, which stops when no further step reduces the AIC.
> library(MASS)
> stepAIC(lm(Y ~X1+X2+X3+X4+X5+X6+X7+X8+X9, data=house.price) )
[ … ]
       Df Sum of Sq    RSS    AIC
<none>              139.60 52.257
- X5    1    20.836 160.43 53.596
- X7    1    21.669 161.27 53.720
- X2    1    47.409 187.01 57.274
- X1    1   156.606 296.20 68.312
Call:
lm(formula = Y ~ X1 + X2 + X5 + X7, data = house.price)
Coefficients:
(Intercept) X1 X2 X5 X7
13.621 2.412 8.459 2.060 -2.215
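For a linear model the AIC values in the table above are computed as n log(RSS/n) + 2p, where p is the number of estimated coefficients (this differs from the value returned by AIC() only by an additive constant, which does not affect the comparison). As a check, the retained model has RSS = 139.60 and p = 5, and with the n = 24 observations here this gives 24 log(139.60/24) + 10 ≈ 52.26, agreeing with the reported 52.257.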
> summary(lm(formula = Y ~ X1 + X2 + X5 + X7, data = house.price))
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   13.6212     3.6725   3.709 0.001489 **
X1             2.4123     0.5225   4.617 0.000188 ***
X2             8.4589     3.3300   2.540 0.019970 *
X5             2.0604     1.2235   1.684 0.108541
X7            -2.2154     1.2901  -1.717 0.102176
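The same search can also be written with the scope and direction spelled out. The sketch below is illustrative (the object name null.fit and the use of trace = FALSE are our additions): it starts from the intercept-only model and allows variables to be both added and dropped, and setting k = log(n) in the call would give a BIC-type penalty instead of AIC's factor of 2.
> null.fit <- lm(Y ~ 1, data = house.price)
> stepAIC(null.fit, scope = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
+         direction = "both", trace = FALSE)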
Example (2.5.5 revisited). If we look again at the simulated exercise, we can see how well the AIC-based search works.
> stepAIC(lm(y~-1+x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+
x11+x12+x13+x14+x15+x16+x17+x18+x19+x20 ))
[ … ]
lm(formula = y ~ x1 + x2 + x3 + x4 + x5 - 1)
Coefficients:
x1 x2 x3 x4 x5
0.9878 0.6347 0.4192 0.3909 0.2943
It picks out the p = 5 model, which Fig. 2.7 shows has good prediction error, but of course, unlike that figure, it has done so without using any knowledge of the true model.
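To see how such a search behaves on data we generate ourselves, a minimal sketch is given below. It does not reproduce the exact design of Example 2.5.5; the sample size, noise level and true coefficients are illustrative assumptions, chosen only so that the first five predictors matter and the rest do not, with the model again fitted without an intercept.
> set.seed(1)
> n <- 50; p <- 20
> X <- matrix(rnorm(n * p), n, p,
+             dimnames = list(NULL, paste0("x", 1:p)))
> beta <- c(1, 0.7, 0.5, 0.4, 0.3, rep(0, p - 5))  # assumed true coefficients
> y <- drop(X %*% beta) + rnorm(n)
> dat <- data.frame(y, X)
> fit <- stepAIC(lm(y ~ . - 1, data = dat), trace = FALSE)
> formula(fit)  # typically retains x1-x5, perhaps with a few noise variables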