程序代写代做代考 data mining Bayesian algorithm chain Modeling Issues in Linear Regression

Modeling Issues in Linear Regression

Modeling Issues in Linear Regression
Dr. Randall R. Rojas

Note: To access your textbook resources type the following on the console:

#library(car)
#carWeb()

1 Residuals and Influence Measures

When diagnosing a regression fit, we have to carefully assess the impact that unusual observations may have
the fit. Typically, we characterize Unusual observations in terms of outliers, leverage, and influence. Next
we discuss each one in detail.

A. Outliers
They are easy to find visually because their y value is relatively large (conditioned on the predictor(s)),
given the distribution of other y values in the vicinity.

*
* *

* * *
*
*
*

*
*

**
** *

**
*

*
*

*

*

**

*

*
*

*
*

**

*
**

0 5 10 15 20 25

0
5

0
1

0
0

1
5

0
2

0
0

Speed

D
is

ta
n

c
e

Outliers

B. Leverage
In this case we are looking at observations that have an extreme x (i.e., predictor) value relative to the
mean.

1

*

*
*

*
*
*

*

*

*

*

*

*
*
*
* *

**

*

*

*

*

*

*
*

*

*

*

*

*

*

0 5 10 15 20 25 30

0
2

0
4

0
6

0
8

0

Speed

D
is

ta
n

c
e

High Leverage

C. Influence
An influential observation can be considered as an outlier with high leverage. Therefore, they can alter
the estimates of the regression coefficients. In this case, when we remove them, the regression fit may
become more stable.

*
* *

* * *
*
*
*

*
*
**
** *
**
*

*
*

*

*

**

*

*
*
*
*

*

0 5 10 15 20 25 30

0
5

0
1

0
0

1
5

0
2

0
0

Speed

D
is

ta
n

c
e

Influential

There are several metrics we can use to better classify unusual observations, however, since these tools
depend on analyzing the residuals, we will first look at understanding residuals and then we can
discuss the related metrics.

1.1 Residual Plots

The traditional residuals are given by ei = yi − ŷi, i = 1, . . . , N. They are a good diagnostic of how well the
model fit the data if when plotted vs. e.g., a predictor, they are randomly distributed about y = 0 and have a
constant variance. In more general terms, we can propose a functional form for the residuals such as

var(ei) = σ
2(1− hi)

2

where hi is called the leverage or hat-value is bounded between 0 and 1. Large values of hi (a.k.a high leverage)
are associated with unusual values of xi. If hi is close to zero, we then obtain the classic linear regression SR1
condition of constant variance.

For the case of SLR, i.e., only one predictor, hi is given by:

hi =
(xi − x)2

∑Ni=1(xi − x)2
+

1
N

From the formula we can interpret hi (leverage) as the proportion of the total sum of squares of the
explanatory variable contributed by the iˆ{th} case.

# Example 1: Food expenditure vs. income (non-constant variance)

library(car) #for hccm() robust standard errors

##
## Attaching package: ‘car’

## The following object is masked from ‘package:purrr’:
##
## some

## The following object is masked from ‘package:dplyr’:
##
## recode

library(PoEdata) #for PoE4 datasets

data(“food”,package=”PoEdata”)
mreg.mod <- lm(food_exp~income, data=food) plot(food$income,food$food_exp, pch=20, xlab="Income", ylab="Food Expenditure") abline(mreg.mod,lwd=2,col="blue") 5 10 15 20 25 30 1 0 0 2 0 0 3 0 0 4 0 0 5 0 0 6 0 0 Income F o o d E xp e n d itu re 3 plot(food$income, mreg.mod$residuals, pch=20,ylab="Resdiduals", xlab="Income") abline(h=0,col="red", lwd=2) 5 10 15 20 25 30 − 2 0 0 − 1 0 0 0 1 0 0 2 0 0 Income R e sd id u a ls # From the residuals plot we can see that the variance is not constant # Example 2: US Investment Data ## Chapter 3 in Greene (2003) ## transform (and round) data to match Table 3.1 data("USInvest", package = "AER") us <- as.data.frame(USInvest) #us$invest <- round(0.1 * us$invest/us$price, digits = 3) #us$gnp <- round(0.1 * us$gnp/us$price, digits = 3) plot(us$gnp, us$invest , ylab="Investment", xlab="GNP",pch=20) mreg.mod=lm(us$invest ~ us$gnp, data = us) abline(mreg.mod,lwd=2,col="blue") 4 1000 1500 2000 2500 3000 1 5 0 2 5 0 3 5 0 4 5 0 GNP In ve st m e n t plot(us$gnp, mreg.mod$residuals, pch=20,ylab="Resdiduals", xlab="GNP",ylim=c(-45,45)) abline(h=0,col="red",lwd=2) 1000 1500 2000 2500 3000 − 4 0 − 2 0 0 2 0 4 0 GNP R e sd id u a ls # These residuals also show some hint of non-constant variance. There are transformations that can be applied to the residuals to make the variance more constant. However, because they are applied to the residuals, the underlying regression model still needs to be reevaluated. For example, we can consider the Standardized Residuals as given by: eSi = ei σ̂2 √ 1− hi , # Example 3: Standardized Residuals 5 library(car) data("USInvest", package = "AER") us <- as.data.frame(USInvest) mreg.mod=lm(us$invest ~ us$gnp, data = us) library(car) residualPlots(mreg.mod,type = "rstandard") 1000 2000 3000 − 2 − 1 0 1 us$gnp R st a n d a rd r e si d u a ls 150 250 350 450 − 2 − 1 0 1 Fitted values R st a n d a rd r e si d u a ls ## Test stat Pr(>|Test stat|)
## us$gnp -1.8139 0.09477 .
## Tukey test -1.8139 0.06970 .
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

or the Studentized Residuals:

eTi =
ei

σ̂(−i)

1− hi
.

# Example 4: Studentized Residuals
library(car)
data(“USInvest”, package = “AER”)
us <- as.data.frame(USInvest) mreg.mod=lm(us$invest ~ us$gnp, data = us) residualPlots(mreg.mod,type = "rstudent") 6 1000 2000 3000 − 3 − 2 − 1 0 1 2 us$gnp R st u d e n t re si d u a ls 150 250 350 450 − 3 − 2 − 1 0 1 2 Fitted values R st u d e n t re si d u a ls ## Test stat Pr(>|Test stat|)
## us$gnp -1.8139 0.09477 .
## Tukey test -1.8139 0.06970 .
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

Another alternative is to compute the Pearson Residuals which are obtained from the Weighted Least Squares
(WLS) Regression, where the weights are equal to wi and the residuals can be computed from:

ePi =

wiei

. As can be seen, when wi = 1, we recover the standard residuals.

# Example 5: Pearson Residuals
library(car)
data(“USInvest”, package = “AER”)
us <- as.data.frame(USInvest) mreg.mod=lm(us$invest ~ us$gnp, data = us) library(car) residualPlots(mreg.mod) 7 1000 2000 3000 − 6 0 − 4 0 − 2 0 0 2 0 4 0 us$gnp P e a rs o n r e si d u a ls 150 250 350 450 − 6 0 − 4 0 − 2 0 0 2 0 4 0 Fitted values P e a rs o n r e si d u a ls ## Test stat Pr(>|Test stat|)
## us$gnp -1.8139 0.09477 .
## Tukey test -1.8139 0.06970 .
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

# Example 6: All Residuals
#We can also graph all the relevant residual plots
prestige.mod.2 <- lm(prestige ~ education + income + type, data=Prestige) residualPlots(prestige.mod.2) 8 6 8 10 12 14 16 − 1 5 0 1 0 education P e a rs o n r e si d u a ls 5000 15000 25000 − 1 5 0 1 0 income P e a rs o n r e si d u a ls bc prof wc − 1 5 0 1 0 type P e a rs o n r e si d u a ls 30 40 50 60 70 80 90 − 1 5 0 1 0 Fitted values P e a rs o n r e si d u a ls ## Test stat Pr(>|Test stat|)
## education -0.6836 0.495942
## income -2.8865 0.004854 **
## type
## Tukey test -2.6104 0.009043 **
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

1.2 Identifying and Classifying Unusual Observations

Unusual observations can be somewhat difficult to quantify through visual inspection but fortunately there
are several robust metrics that can be used in a more objective fashion.

A. Q-Q Plot
This plot shows the studentized residuals vs. the corresponding quantiles of a t-distribution with
N − K − 2 degrees of freedom, and allows us to identify outliers (std. residuals & 2 are considered
large).

# Example 6: Q-Q Plot
library(car)
mreg.mod <- lm(prestige ~ income + education, data=Duncan) qqPlot(mreg.mod, id=list(n=3)) 9 −2 −1 0 1 2 − 2 − 1 0 1 2 3 t Quantiles S tu d e n tiz e d R e si d u a ls (m re g .m o d ) minister reporter contractor ## minister reporter contractor ## 6 9 17 # It looks like minister might be a potential outlier. To be sure, # we can perform an outlier test known as the Bonferroni-corrcted t-test: outlierTest(mreg.mod) ## No Studentized residuals with Bonferonni p < 0.05 ## Largest |rstudent|: ## rstudent unadjusted p-value Bonferonni p ## minister 3.134519 0.0031772 0.14297 # The large p-value (Bonferroni) suggests that this value is not surpsing. B. Cook’s Distance Plot Cook’s Distance is defined as: Di = e2Si k + 1 × hi 1− hi where large values of Di (larger than 1 and/or larger relative to the majority of the values) are typically associated with influential observations. # Example 7: Cook's Distance Plot library(car) mreg.mod <- lm(prestige ~ income + education, data=Duncan) influenceIndexPlot(mreg.mod, id=list(n=3),vars="Cook") 10 0 .0 0 .2 0 .4 C o o k' s d is ta n ce 0 10 20 30 40 minister conductor reporter Diagnostic Plots Index # It looks like minister, conductor and reporter are potential outliers # Note: if you don't specify the flag "vars", it will plot all of them: influenceIndexPlot(mreg.mod, id=list(n=3)) 0 .0 0 .3 C o o k' s d is ta n ce minister conductor reporter − 2 0 2 S tu d e n tiz e d r e si d u a ls minister reporter contractor 0 .2 0 .6 1 .0 B o n fe rr o n i p − va lu e minister reporteraccountant 0 .0 5 0 .2 0 h a t− va lu e s 0 10 20 30 40 RR.engineer conductorminister Diagnostic Plots Index C. Hat-Values Plot This plot is designed for identifying high leverage observations (some of which could also be influential). On average hi = p/N, therefore, hi & 2p/N is considered large. # Example 8: Hat-Values Plot library(car) 11 mreg.mod <- lm(prestige ~ income + education, data=Duncan) influenceIndexPlot(mreg.mod, id=list(n=3),vars="hat") 0 .0 5 0 .1 5 0 .2 5 h a t− va lu e s 0 10 20 30 40 RR.engineer conductor minister Diagnostic Plots Index # It looks like minister, conductor and RR.engineer are influential # We can check if e.g., minister is influential by comparing the # coefficient estimates w/ and w/o 'minister' mod.duncan <- lm(prestige ~ income + education, data=Duncan) mod.duncan.2 <- update(mod.duncan, subset= rownames(Duncan) != "minister") compareCoefs(mod.duncan, mod.duncan.2) ## Calls: ## 1: lm(formula = prestige ~ income + education, data = Duncan) ## 2: lm(formula = prestige ~ income + education, data = Duncan, subset = ## rownames(Duncan) != "minister") ## ## Model 1 Model 2 ## (Intercept) -6.06 -6.63 ## SE 4.27 3.89 ## ## income 0.599 0.732 ## SE 0.120 0.117 ## ## education 0.5458 0.4330 ## SE 0.0983 0.0963 ## # We can also depict the hat-values via: influencePlot(mreg.mod, id=list(n=3)) 12 0.05 0.10 0.15 0.20 0.25 − 2 − 1 0 1 2 3 Hat−Values S tu d e n tiz e d R e si d u a ls minister reporter conductor contractor RR.engineer StudRes Hat CookD minister 3.1345186 0.1730582 0.5663797 reporter -2.3970224 0.0543936 0.0989846 conductor -1.7040324 0.1945416 0.2236412 contractor 2.0438046 0.0432552 0.0585235 RR.engineer 0.8089221 0.2690896 0.0809681 D. Added Variable Plot (Partial Regression) Depicts the relationship between y and one predictor x, adjusting for the effects of the other predictors. These can be used to identify influential observations and high leverage observations. For example, high leverage observations appear the in added variable plots as points horizontally distant from the rest of the data. They are also useful for identifying jointly influential points. An alternative method is to look at the individual coefficient differences between estimates from the original data (bj) and those where observation i is removed (b(−i)j) . d f βij = b(−i)j − bj for j = 0, . . . , k # Example 9: Added-Variable Plot library(car) mreg.mod <- lm(prestige ~ income + education, data=Duncan) avPlots(mreg.mod, id=list(n=3, method="mahal")) 13 −40 0 20 40 − 3 0 − 1 0 0 1 0 2 0 3 0 4 0 income | others p re st ig e | o th e rs minister RR.engineer conductor −60 −20 0 20 40 − 4 0 − 2 0 0 2 0 4 0 6 0 education | others p re st ig e | o th e rs minister RR.engineer conductor Added−Variable Plots # It looks like minister, conductor and RR.engineer are influential # minister & conductor --> decrease the income slope
# minister & conductor –> increase the education slope
# RR.engineer seems fine for the most part

# We can check if e.g., minister & conductor are jointly influential
# by comparing the coefficient estimates w/ and w/o ‘minister & conductor’
mod.duncan <- lm(prestige ~ income + education, data=Duncan) mod.duncan.3 <- update(mod.duncan, subset = - whichNames(c("minister", "conductor"), Duncan)) compareCoefs(mod.duncan, mod.duncan.2, mod.duncan.3, se=FALSE) ## Calls: ## 1: lm(formula = prestige ~ income + education, data = Duncan) ## 2: lm(formula = prestige ~ income + education, data = Duncan, subset = ## rownames(Duncan) != "minister") ## 3: lm(formula = prestige ~ income + education, data = Duncan, subset = ## -whichNames(c("minister", "conductor"), Duncan)) ## ## Model 1 Model 2 Model 3 ## (Intercept) -6.06 -6.63 -6.41 ## income 0.599 0.732 0.867 ## education 0.546 0.433 0.332 # Based on the comparison is does appear that these points were # jointly influential. # We can also depict the hat-values via: influencePlot(mreg.mod, id=list(n=3)) 14 0.05 0.10 0.15 0.20 0.25 − 2 − 1 0 1 2 3 Hat−Values S tu d e n tiz e d R e si d u a ls minister reporter conductor contractor RR.engineer StudRes Hat CookD minister 3.1345186 0.1730582 0.5663797 reporter -2.3970224 0.0543936 0.0989846 conductor -1.7040324 0.1945416 0.2236412 contractor 2.0438046 0.0432552 0.0585235 RR.engineer 0.8089221 0.2690896 0.0809681 # Lastly, we can compute the df_betas and cehck the individual differences mreg.mod <- lm(prestige ~ income + education, data=Duncan) dfbs <- dfbetas(mreg.mod) # We can plot the influence on the income coefficient against the # influence on the education coefficient plot(dfbs[ ,c("income", "education")]) # for b1 and b2 showLabels(dfbs[ , "income"], dfbs[ , "education"], labels=rownames(Duncan), method=c(6, 16, 27)) 15 −1.0 −0.5 0.0 0.5 0 .0 0 .5 1 .0 income e d u ca tio n minister conductor RR.engineer ## minister conductor RR.engineer ## 6 16 27 # Notice that once again, minister and conductor stand out # We can alos plot them all via: avPlots(lm(prestige~income+education+type, data=Duncan)) −40 −20 0 20 40 60 − 2 0 0 2 0 4 0 income | others p re st ig e | o th e rs minister machinist RR.engineer minister −30 −20 −10 0 10 20 − 2 0 0 2 0 education | others p re st ig e | o th e rs minister machinist contractor store.manager −0.4 −0.2 0.0 0.2 0.4 0.6 − 1 0 1 0 3 0 typeprof | others p re st ig e | o th e rs minister machinist store.manager contractor −0.4 −0.2 0.0 0.2 0.4 0.6 − 2 0 0 2 0 typewc | others p re st ig e | o th e rs minister machinist conductorstore.clerk Added−Variable Plots 16 # Example 10: Other Plots library(car) mreg.mod <- lm(prestige ~ income + education + type, data=Prestige) # Marginal Plots marginalModelPlots(mreg.mod, sd=TRUE) ## Warning in mmps(...): Interactions and/or factors skipped 5000 15000 25000 2 0 4 0 6 0 8 0 income p re st ig e Data Model 6 8 10 12 14 16 2 0 4 0 6 0 8 0 education p re st ig e Data Model 30 40 50 60 70 80 90 2 0 4 0 6 0 8 0 Fitted values p re st ig e Data Model Marginal Model Plots # Marginal & Added-Variable Plots mcPlots(mreg.mod, ~ education, overlaid=FALSE) 17 −4 −2 0 2 4 − 3 0 − 1 0 0 1 0 2 0 3 0 4 0 Marginal plot of education Centered education C e n te re d p re st ig e −4 −2 0 2 4 − 3 0 − 1 0 0 1 0 2 0 3 0 4 0 Added−Variable plot of education education | others p re st ig e | o th e rs mcPlots(mreg.mod, ~ income, overlaid=FALSE) −5000 5000 15000 − 3 0 − 1 0 0 1 0 2 0 3 0 4 0 Marginal plot of income Centered income C e n te re d p re st ig e −5000 5000 15000 − 3 0 − 1 0 0 1 0 2 0 3 0 4 0 Added−Variable plot of income income | others p re st ig e | o th e rs 18 2 Predictor Transformations Suppose we have a linear model y = β1 + β2x2 + β3x3 + e, but are interested in the relationship between y and x3 only. Therefore, we would like to plot y vs. x3 but with the effect of x2 removed. This is equivalent to plotting y− β1 + β2x2 vs. x3. We can accomplish this with a Component-Plus-Residual Plot. Mathematically, we are really doing an approximation by estimating a partial residual for x3. Algorithmically, here is how we can estimate it: i. Obtain the coefficient estimates βk, for k = 1, 2, 3 from E[y|x] = β1 + β2x2 + β3x3 and respective residuals, e ii. Estimate the partial residual (epartial) according to: epartial = y− β̂1 + β̂2x2 iii. The Component-Plus-Residual is given by: epartial = β̂2x2 + e In general, we can express the partial residual as epartial,ij = ei + bjxij. This plot is often used to help detect the need for transformations of the predictor variables by looking for any departures from linearity in the plots of epartial,ij vs. xij. If we identify any nonlinear dynamics, we can then apply the techniques from Lecture 1 to the respective predictor variables. # Example 11: Component-Plus-Residual Plot library(car) mreg.mod <- lm(prestige ~ income + education + women, data=Prestige) crPlots(mreg.mod, order=2) 0 5000 15000 25000 − 2 0 0 2 0 income C o m p o n e n t+ R e si d u a l( p re st ig e ) 6 8 10 12 14 16 − 2 0 0 2 0 education C o m p o n e n t+ R e si d u a l( p re st ig e ) 0 20 40 60 80 100 − 2 0 0 1 0 women C o m p o n e n t+ R e si d u a l( p re st ig e ) Component + Residual Plots # The dashed blue line is a partial fit, and the magenta curve a lowess smoother. Of the # 3 predicators, income seems have the largest degree of curvature. Therefore, we might # consider transforming it using, e.g., the bulging rule. 19 # We can try applying a log tranformation to income mreg.mod2 <- update(mreg.mod, . ~ . + log2(income) - income) crPlots(mreg.mod2, order=2) 6 8 10 12 14 16 − 2 0 0 2 0 education C o m p o n e n t+ R e si d u a l( p re st ig e ) 0 20 40 60 80 100 − 1 5 0 1 0 women C o m p o n e n t+ R e si d u a l( p re st ig e ) 10 11 12 13 14 − 2 0 0 2 0 log2(income) C o m p o n e n t+ R e si d u a l( p re st ig e ) Component + Residual Plots # It looks better, but maybe we should also apply a quadratic transformation to women mreg.mod3 <- update(mreg.mod2, . ~ . - women + poly(women, 2)) crPlots(mreg.mod3, order=1) 20 6 8 10 12 14 16 − 2 0 0 2 0 education C o m p o n e n t+ R e si d u a l( p re st ig e ) 10 11 12 13 14 − 2 0 0 2 0 log2(income) C o m p o n e n t+ R e si d u a l( p re st ig e ) −2 0 2 4 6 − 1 5 0 1 0 poly(women, 2) C o m p o n e n t+ R e si d u a l( p re st ig e ) Component + Residual Plots # Note: When there are strong non-linear relations between predictors, the # component-plus-residual plots may not work, in this case, we instead look # at CERES Plots (Combining conditional Expectations and REsiduals) ceresPlots(lm(prestige~income+education+type, data=Prestige)) ## Warning in ceresPlots(lm(prestige ~ income + education + type, data = ## Prestige)): Factors skipped in drawing CERES plots. 21 5000 15000 25000 − 1 0 0 1 0 2 0 income C E R E S R e si d u a l( p re st ig e ) 6 8 10 12 14 16 − 3 0 − 2 0 − 1 0 0 1 0 2 0 education C E R E S R e si d u a l( p re st ig e ) CERES Plots A more general method for dealing with nonlinear predictors is to a apply the Box-Tidwell method for transforming predictors. Their proposed model assumes the form: y = β0 + β1TBBC(x1, γ1) + · · ·+ βkTBC(xk, γk) + ε, where TBC is the Box-Cox power transformation introduced in Lecture 1. # Example 12: Box-Tidwell Transformation library(car) # In R we can use the the functon boxTidwell to suggest the # appropriate transformation(s) via MLE, while also allowing # for user specifed transformations # For example, let's apply a quadratic tranformation to women # and see what it recommmends for income and education: boxTidwell(prestige ~ income + education, other.x = ~poly(women,2),data=Prestige) ## MLE of lambda Score Statistic (z) Pr(>|z|)
## income -0.037775 -5.3013 1.15e-07 ***
## education 2.192827 2.4056 0.01615 *
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## iterations = 12

# According to the output: lambda = -0.04 for income–> log
# According to the output: lambda = 2.19 for education–> quadratic

22

# To visualize these results, we can plot the “Constructed-Variable Plots”
# ‘construction’ in this context means transformation

3 Omitted Variable Bias

The Omitted Variable Bias quantifies the impact of omitting a relevant (e.g., statistically and/or economically
significant) variable in a regression model. For example, suppose the model true underlying population
model is y = β1 + β2×2 + β3×3 + e where both x2 and x3 should be included, but instead we only estimated
the model y = β1 + β2×2 + e. How can we quantify the effect that omitting x3 would have on the model?
We can show that mathematically it corresponds to a bias, namely:

bias(b∗2) = E(b2)
∗ − β2 = β3

̂cov(x2, x3)
var(x2)

The example below shows how omitting a relevant variable such as the wife’s years of education contribution
to annual family income, leads to a ∼ $2000 bias.

# Example 12: Omitted Variable Bias
library(car)
library(AER)
library(broom)
data(“edu_inc”, package=”PoEdata”)
mreg.mod1 <- lm(faminc~he+we, data=edu_inc) mreg.mod2 <- lm(faminc~he, data=edu_inc) # Correct Model summary(mreg.mod1) ## ## Call: ## lm(formula = faminc ~ he + we, data = edu_inc) ## ## Residuals: ## Min 1Q Median 3Q Max ## -88299 -23663 -6800 17244 232085 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5533.6 11229.5 -0.493 0.622426
## he 3131.5 802.9 3.900 0.000112 ***
## we 4522.6 1066.3 4.241 2.73e-05 ***
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## Residual standard error: 40500 on 425 degrees of freedom
## Multiple R-squared: 0.1613, Adjusted R-squared: 0.1574
## F-statistic: 40.87 on 2 and 425 DF, p-value: < 2.2e-16 #Incorrect Model summary(mreg.mod2) ## ## Call: ## lm(formula = faminc ~ he, data = edu_inc) ## 23 ## Residuals: ## Min 1Q Median 3Q Max ## -78985 -26196 -6924 18372 250934 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26191.3 8541.1 3.066 0.0023 **
## he 5155.5 658.5 7.830 3.92e-14 ***
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## Residual standard error: 41300 on 426 degrees of freedom
## Multiple R-squared: 0.1258, Adjusted R-squared: 0.1237
## F-statistic: 61.3 on 1 and 426 DF, p-value: 3.921e-14

# We can see that the effect of omitting wedu translates to an
# ovesstatement of ~$2000 which would correspond to the bias in this case.

# Q: What if we include kl6 ( = number of chidren under the age of 6)?
mreg.mod3 <- lm(faminc~he+we+kl6, data=edu_inc) summary(mreg.mod3) ## ## Call: ## lm(formula = faminc ~ he + we + kl6, data = edu_inc) ## ## Residuals: ## Min 1Q Median 3Q Max ## -91760 -23745 -6642 17636 243255 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7755.3 11162.9 -0.695 0.48760
## he 3211.5 796.7 4.031 6.58e-05 ***
## we 4776.9 1061.2 4.502 8.73e-06 ***
## kl6 -14310.9 5003.9 -2.860 0.00445 **
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## Residual standard error: 40160 on 424 degrees of freedom
## Multiple R-squared: 0.1772, Adjusted R-squared: 0.1714
## F-statistic: 30.43 on 3 and 424 DF, p-value: < 2.2e-16 # A: In this case, the coefficients of hedu and wedu did not change by much, # yet kl6 is both statistically and economically significant. # This suggests that the estimates of wedu and hedu are robust. 4 Irrelevant Variables Typically, including irrelevant variables into the model has the effect of increasing the variance of the other variables, and therefore, potentially rendering the respective coefficient estimates statistically insignificant. # Example 13: Irrelevant Variables library(car) 24 library(AER) library(broom) library(PoEdata) data("edu_inc", package="PoEdata") mreg.mod1 <- lm(faminc~he+we+kl6, data=edu_inc) # The variables xtra_x5 and _x6 are contain random values that are not associated with $y$ # We can would expect that the variances of th relevant variable will increase as we introduce # these irrlevant variables. mreg.mod2 <- lm(faminc~he+we+kl6+xtra_x5, data=edu_inc) mreg.mod3 <- lm(faminc~he+we+kl6+xtra_x5+xtra_x6, data=edu_inc) #summary(mreg.mod1) #summary(mreg.mod2) #summary(mreg.mod3) #stargazer(mreg.mod1, mreg.mod2, mreg.mod3) Below we summarize the results from the 3 models: Table 1: Dependent variable: faminc (1) (2) (3) he 3,211.526∗∗∗ 3,377.219∗∗∗ 3,339.792∗∗∗ (796.703) (1,247.058) (1,250.039) we 4,776.907∗∗∗ 4,783.930∗∗∗ 5,868.677∗∗ (1,061.164) (1,063.156) (2,278.067) kl6 −14,310.920∗∗∗ −14,216.440∗∗∗ −14,200.180∗∗∗ (5,003.928) (5,039.395) (5,043.720) xtra_x5 −180.162 888.843 (1,042.335) (2,242.491) xtra_x6 −1,067.186 (1,981.685) Constant −7,755.330 −7,682.601 −7,558.613 (11,162.930) (11,183.650) (11,195.410) Observations 428 428 428 R2 0.177 0.177 0.178 Adjusted R2 0.171 0.169 0.168 Residual Std. Error 40,160.080 (df = 424) 40,206.100 (df = 423) 40,239.890 (df = 422) F Statistic 30.432∗∗∗ (df = 3; 424) 22.779∗∗∗ (df = 4; 423) 18.251∗∗∗ (df = 5; 422) Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01 25 5 Multicollinearity Multicollinearity occurs when there are strong linear relationships between the predictors. For example, consider the model y = β1 + β2x2 + β3x3 + ε, where x3 = α1 + α2x2. In this case, Ordinary Least Squares cannot generate estimates of the regression coefficients because the predictor x2 and x3 move together. In this case (which is considered Perfect Multicollinearity) the solution would be to drop of the two predictors from the model. In general, Imperfect Multicollinearity is more common. This occurs when e.g., using the previous case, x3 = α1 + α2x2 + u, where u represents some unknown random component. The effects of Multicollinearity can be summarized as follows: 1. The variances and standard errors of the regression coefficient estimates will increase. This leads to lower t−statistics. 2. The overall fit of the regression equation will be largely unaffected by multicollinearity. Therefore, forecasting and prediction will be largely unaffected. 3. Regression coefficients can change substantially when variables are added or dropped. 5.1 Detecting Multicollinearity There are several ways in which we can detect multicollinearity. Below are the 3 most common ones: 1. Large Correlation Coefficients High pairwise correlations among independent variables is a strong sign of multicollinearity. Usually |rxi ,xj | & 0.8 is considered large. 2. Large R2 with Low t−Stats It is possible for individual coefficients to be insignificant but for the overall fit of the equation to be high. 3. Variance Inflation Factor (VIF) VIF is a measure of how much the variance of the estimated regression coefficient bk is inflated by the existence of correlations among the predictor variables in the model. Consider for example v̂ar(bj) as given by: v̂ar(bj) = σ̂2 (n− 1)s2j × 1 1− R2j . The term 1/(1− R2j ) is called the *VIF*. Values of VIF= 1 suggest no correlation whereas VIF& 10 (and in some cases & 4) is considered large, and thus, we might consider removing those variables with large VIFs from the model. The R2j are obtained from the *Auxiliary Regressions*, i.e., xj regressed on the other predcitors. An important limitation to of VIF is that it does not apply to nonlinear regressors or contrasts. For these cases we would instead use the Generalized VIF (GVIF) which is given by GVIF1/2p, where p represents the number of regressors in a term. # Example 13: Multicollinearity (Fuel Efficiency) library(car) data("cars", package="PoEdata") mreg.mod=lm(mpg~cyl+eng+wgt, data=cars) summary(mreg.mod) 26 ## ## Call: ## lm(formula = mpg ~ cyl + eng + wgt, data = cars) ## ## Residuals: ## Min 1Q Median 3Q Max ## -12.5568 -2.8703 -0.3649 2.2708 16.4338 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.3709616 1.4806851 29.967 < 2e-16 *** ## cyl -0.2677968 0.4130673 -0.648 0.517 ## eng -0.0126740 0.0082501 -1.536 0.125 ## wgt -0.0057079 0.0007139 -7.995 1.5e-14 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.297 on 388 degrees of freedom ## Multiple R-squared: 0.6993, Adjusted R-squared: 0.697 ## F-statistic: 300.8 on 3 and 388 DF, p-value: < 2.2e-16 tidy(vif(mreg.mod)) ## Warning: 'tidy.numeric' is deprecated. ## See help("Deprecated") names x cyl 10.515508 eng 15.786455 wgt 7.788716 # We can see that both cyl and eng should be removed since VIF>10

# Example 14: Multicollinearity (Median household value in Boston)
data(“Boston”, package = “MASS”)
mreg.mod <- lm(medv ~., data = Boston) summary(mreg.mod) ## ## Call: ## lm(formula = medv ~ ., data = Boston) ## ## Residuals: ## Min 1Q Median 3Q Max ## -15.595 -2.730 -0.518 1.777 26.199 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.646e+01 5.103e+00 7.144 3.28e-12 ***
## crim -1.080e-01 3.286e-02 -3.287 0.001087 **
## zn 4.642e-02 1.373e-02 3.382 0.000778 ***
## indus 2.056e-02 6.150e-02 0.334 0.738288
## chas 2.687e+00 8.616e-01 3.118 0.001925 **
## nox -1.777e+01 3.820e+00 -4.651 4.25e-06 ***
## rm 3.810e+00 4.179e-01 9.116 < 2e-16 *** ## age 6.922e-04 1.321e-02 0.052 0.958229 27 ## dis -1.476e+00 1.995e-01 -7.398 6.01e-13 *** ## rad 3.060e-01 6.635e-02 4.613 5.07e-06 *** ## tax -1.233e-02 3.760e-03 -3.280 0.001112 ** ## ptratio -9.527e-01 1.308e-01 -7.283 1.31e-12 *** ## black 9.312e-03 2.686e-03 3.467 0.000573 *** ## lstat -5.248e-01 5.072e-02 -10.347 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.745 on 492 degrees of freedom ## Multiple R-squared: 0.7406, Adjusted R-squared: 0.7338 ## F-statistic: 108.1 on 13 and 492 DF, p-value: < 2.2e-16 tidy(vif(mreg.mod)) ## Warning: 'tidy.numeric' is deprecated. ## See help("Deprecated") names x crim 1.792191 zn 2.298758 indus 3.991596 chas 1.073995 nox 4.393720 rm 1.933744 age 3.100826 dis 3.955945 rad 7.484496 tax 9.008554 ptratio 1.799084 black 1.348521 lstat 2.941491 # We can see that nox, rad, and tax might be worth removing since VIF>4
# Also note that idus, and dis are very close to 4, so we might want to
# consider removing them as well.

# Example 15: Multicollinearity (US Census Data) -see Ericksen et. al., 1989
mreg.mod <- lm(undercount ~ ., data=Ericksen) summary(mreg.mod) ## ## Call: ## lm(formula = undercount ~ ., data = Ericksen) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.8356 -0.8033 -0.0553 0.7050 4.2467 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.611411 1.720843 -0.355 0.723678
## minority 0.079834 0.022609 3.531 0.000827 ***
## crime 0.030117 0.012998 2.317 0.024115 *
## poverty -0.178369 0.084916 -2.101 0.040117 *

28

## language 0.215125 0.092209 2.333 0.023200 *
## highschool 0.061290 0.044775 1.369 0.176415
## housing -0.034957 0.024630 -1.419 0.161259
## citystate -1.159982 0.770644 -1.505 0.137791
## conventional 0.036989 0.009253 3.997 0.000186 ***
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## Residual standard error: 1.426 on 57 degrees of freedom
## Multiple R-squared: 0.7077, Adjusted R-squared: 0.6667
## F-statistic: 17.25 on 8 and 57 DF, p-value: 1.044e-12

tidy(vif(mreg.mod))

## Warning: ‘tidy.numeric’ is deprecated.
## See help(“Deprecated”)

names x
minority 5.009065
crime 3.343586
poverty 4.625178
language 1.635568
highschool 4.619169
housing 1.871745
city 3.537750
conventional 1.691320

# We can see that minority, poverty, and highschool might be worth removing since VIF>4

6 Model Misspecification: Ramsey RESET

As we have seen previously, sometimes adding nonlinear expressions for the predictors can help improve
our model in terms of economic and statistical significance. A popular test to evaluate is such nonlinear
terms are needed, is known as the REgression Specification Error Test (RESET). For example, assume we
have the model y = β1 + β2×2 + β3×3 + ε and we are interested in determining if we can improve the
model by including higher order terms of the predictors. We can easily assess this by estimating the model
y = β1 + β2×2 + β3×3 + γŷ2 + ε, and testing the null hypothesis H0 : γ = 0 against H0 : γ 6= 0. If we reject
H0, then we should consider including e.g., quadratic terms. We can then perform the same test but instead
including the term δŷ3, and so on, until we fail to reject H0. In practice, however, we should not have to
include high order terms beyond 2 or 3.

# Example 16: Functional Form (RESET)
library(car)
library(PoEdata)
library(lmtest)

# We will create to models: A quadratic one and a linear one, and test each for model misspecification
x <- c(1:30) # Model 1: Quadratic y1 <- 1 + x + x^2 + rnorm(30) # Test if a quadratic model is appropriate resettest(y1 ~ x , power=2, type="regressor") 29 ## ## RESET test ## ## data: y1 ~ x ## RESET = 137780, df1 = 1, df2 = 27, p-value < 2.2e-16 # According to the test, we can improve the model # Test if a quadratic model is appropriate y2 <- 1 + x + rnorm(30) resettest(y2 ~ x , power=2, type="regressor") ## ## RESET test ## ## data: y2 ~ x ## RESET = 0.36652, df1 = 1, df2 = 27, p-value = 0.55 # According to the test, our model is well specified, no need for adding a quadratic term # Example 17: Family Income (RESET) data("edu_inc", package="PoEdata") mreg.mod <- lm(faminc~he+we+kl6, data=edu_inc) resettest(mreg.mod , power=2, type="regressor") ## ## RESET test ## ## data: mreg.mod ## RESET = 2.6046, df1 = 3, df2 = 421, p-value = 0.05144 # According to the test, our model is fine 7 Model Selection We now discuss how to select the best model among competing ones using robust statistical metrics. 7.1 Akaike Information Criterion (AIC) AIC estimates the quality of the models being considered for the data relative to each other. The smaller the value of AIC, the better the model. AIC = ln ( SSE N ) + 2K N # Example 18: AIC (Family Income) library(car) library(AER) library(broom) library(PoEdata) data("edu_inc", package="PoEdata") mreg.mod1 <- lm(faminc~he, data=edu_inc) mreg.mod2 <- lm(faminc~he+we, data=edu_inc) mreg.mod3 <- lm(faminc~he+we+kl6, data=edu_inc) 30 mreg.mod4 <- lm(faminc~he+we+kl6+xtra_x5, data=edu_inc) mreg.mod5 <- lm(faminc~he+we+kl6+xtra_x5+xtra_x6, data=edu_inc) AIC(mreg.mod1, mreg.mod2, mreg.mod3, mreg.mod4, mreg.mod5) df AIC mreg.mod1 3 10316.65 mreg.mod2 4 10300.91 mreg.mod3 5 10294.73 mreg.mod4 6 10296.70 mreg.mod5 7 10298.41 # As expected, Model 3 (faminc~he+we+kl6) is the preferred choice 7.2 Bayesian Information Criterion (BIC) Similarly to AIC, BIC does the same bu penalizes extra variables more heavily than AIC. BIC = ln ( SSE N ) + K ln(N) N As with AIC, the model with the smallest BIC is preferred. # Example 18: BIC (Family Income) library(car) library(AER) library(broom) library(PoEdata) data("edu_inc", package="PoEdata") mreg.mod1 <- lm(faminc~he, data=edu_inc) mreg.mod2 <- lm(faminc~he+we, data=edu_inc) mreg.mod3 <- lm(faminc~he+we+kl6, data=edu_inc) mreg.mod4 <- lm(faminc~he+we+kl6+xtra_x5, data=edu_inc) mreg.mod5 <- lm(faminc~he+we+kl6+xtra_x5+xtra_x6, data=edu_inc) BIC(mreg.mod1, mreg.mod2, mreg.mod3, mreg.mod4, mreg.mod5) df BIC mreg.mod1 3 10328.83 mreg.mod2 4 10317.15 mreg.mod3 5 10315.03 mreg.mod4 6 10321.06 mreg.mod5 7 10326.82 # As expected, Model 3 (faminc~he+we+kl6) is also the preferred choice 7.3 Mallows’ CP Mallows’ Cp can be used to assess model fits using different number of parameters. The statistic is given by: Cp = SSE(p) σ2 − N + 2p, where SSE is the sum squared error, N the number of observations, and p the number of parameters being used to estimate the model. The smaller the value of the statistic Cp, the better. Therefore, if model p is the best choice, Cp will tend to be close to or smaller than p. 31 # Example 18: Mallows Cp (Family Income) library(car) library(AER) library(broom) library(PoEdata) library(leaps) data("edu_inc", package="PoEdata") mreg.mod5 <- lm(faminc~he+we+kl6+xtra_x5+xtra_x6, data=edu_inc) ss=regsubsets(faminc~he+we+kl6+xtra_x5+xtra_x6,method=c("exhaustive"),nbest=3,data=edu_inc) subsets(ss,statistic="cp",legend=F,main="Mallows CP",col="steelblue4") Abbreviation he h we w kl6 k xtra_x5 x_5 xtra_x6 x_6 legend(3,25,bty="y",legend=c('h=hedu','w=wedu','k=kl6','x_5 =xtra5', 'x_6=extr6'),col="steelblue4",cex=1.25) 1 2 3 4 5 5 1 0 1 5 2 0 2 5 Mallows CP Subset Size S ta tis tic : cp w h x_6 h−w w−kw−x_5 h−w−k w−k−x_5h−w−x_6 h−w−k−x_6h−w−k−x_5 h−k−x_5−x_6 h−w−k−x_5−x_6 h=hedu w=wedu k=kl6 x_5 =xtra5 x_6=extr6 # As expected, Model 3 (faminc~he+we+kl6) is also the preferred choice by Cp 7.4 Stepwise Regression Since our goal is ultimately about making the best predictions possible with our model, Stepwise Regression consists of iteratively adding and removing predictors until the model with the lowest prediction error is obtained. There are 3 popular ways to perform this: 1. Forward Selection Start with no predictors in the model, iteratively add the most contributive predictors, and stop when 32 the improvement is no longer statistically significant. 2. Backward Selection Starts with all predictors in the model (full model), iteratively remove the least contributive predictors, and stop when you have a model where all predictors are statistically significant. 3. Stepwise Selection Is a combination of forward and backward selections. You start with no predictors, then sequentially add the most contributive predictors (like forward selection). After adding each new variable, remove any variables that no longer provide an improvement in the model fit (like backward selection). Below we show a brute force approach by manually comparing models based on their prediction accuracy. # Example 19: Evaluating the Prediction Error # See http://www.sthda.com/ for details of the exmaple below library(car) library(AER) library(broom) library(PoEdata) library(leaps) library(tidyverse) library(caret) ## Loading required package: lattice ## ## Attaching package: 'caret' ## The following object is masked from 'package:purrr': ## ## lift ## The following object is masked from 'package:survival': ## ## cluster library(MASS) suppressMessages(library("tidyverse")) # Load the data data("Boston", package = "MASS") # Split the data into training and test set set.seed(123) training.samples <- Boston$medv %>%

createDataPartition(p = 0.8, list = FALSE)
train.data <- Boston[training.samples, ] test.data <- Boston[-training.samples, ] # Build the model model1 <- lm(medv ~., data = train.data) # Make predictions predictions <- model1 %>% predict(test.data)
# Model performance
data.frame(

RMSE = RMSE(predictions, test.data$medv),
R2 = R2(predictions, test.data$medv)

)

RMSE R2
4.992775 0.6704608

33

# So far we have R^2 = 0.67, can we do better by removing e.g., any multicoliinear varaiables?
vif(model1)

## crim zn indus chas nox rm age dis
## 1.865531 2.364859 3.901322 1.064429 4.471619 2.010665 3.018555 3.961686
## rad tax ptratio black lstat
## 7.799919 9.163102 1.907071 1.311933 2.967784

# Remove tax

model2 <- lm(medv ~. -tax, data = train.data) # Make predictions predictions <- model2 %>% predict(test.data)
# Model performance
data.frame(

RMSE = RMSE(predictions, test.data$medv),
R2 = R2(predictions, test.data$medv)

)

RMSE R2
5.013736 0.6705781

# Looks like it didn’t help much

# Technically we would need to manually run through all the plausible models
# until we converge on the best one. Or we can simply use ‘step(model1)’ as
# shown in Example 20.

#step(model1)

In R we can use the function stepAIC (from the leaps library) with the flag direction set to forward,
backward, or both (Stepwise) to automatically do the model selection process for us. The function step is
often used as well.

# Example 20: Stepwise (Family Income)
library(car)
library(AER)
library(broom)
library(PoEdata)
library(leaps)

data(“edu_inc”, package=”PoEdata”)
mreg.mod <- lm(faminc~he+we+kl6+xtra_x5+xtra_x6, data=edu_inc) # Forward Selection step.model <- stepAIC(mreg.mod, direction = "forward", trace = FALSE) summary(step.model) ## ## Call: ## lm(formula = faminc ~ he + we + kl6 + xtra_x5 + xtra_x6, data = edu_inc) ## ## Residuals: ## Min 1Q Median 3Q Max ## -90853 -24027 -6555 18014 243361 ## ## Coefficients: 34 ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7558.6 11195.4 -0.675 0.49995
## he 3339.8 1250.0 2.672 0.00784 **
## we 5868.7 2278.1 2.576 0.01033 *
## kl6 -14200.2 5043.7 -2.815 0.00510 **
## xtra_x5 888.8 2242.5 0.396 0.69204
## xtra_x6 -1067.2 1981.7 -0.539 0.59050
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## Residual standard error: 40240 on 422 degrees of freedom
## Multiple R-squared: 0.1778, Adjusted R-squared: 0.1681
## F-statistic: 18.25 on 5 and 422 DF, p-value: < 2.2e-16 # Backward Selection step.model <- stepAIC(mreg.mod, direction = "backward", trace = FALSE) summary(step.model) ## ## Call: ## lm(formula = faminc ~ he + we + kl6, data = edu_inc) ## ## Residuals: ## Min 1Q Median 3Q Max ## -91760 -23745 -6642 17636 243255 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7755.3 11162.9 -0.695 0.48760
## he 3211.5 796.7 4.031 6.58e-05 ***
## we 4776.9 1061.2 4.502 8.73e-06 ***
## kl6 -14310.9 5003.9 -2.860 0.00445 **
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## Residual standard error: 40160 on 424 degrees of freedom
## Multiple R-squared: 0.1772, Adjusted R-squared: 0.1714
## F-statistic: 30.43 on 3 and 424 DF, p-value: < 2.2e-16 # Stepwise Selection step.model <- stepAIC(mreg.mod, direction = "both", trace = FALSE) summary(step.model) ## ## Call: ## lm(formula = faminc ~ he + we + kl6, data = edu_inc) ## ## Residuals: ## Min 1Q Median 3Q Max ## -91760 -23745 -6642 17636 243255 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7755.3 11162.9 -0.695 0.48760

35

## he 3211.5 796.7 4.031 6.58e-05 ***
## we 4776.9 1061.2 4.502 8.73e-06 ***
## kl6 -14310.9 5003.9 -2.860 0.00445 **
## —
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
## Residual standard error: 40160 on 424 degrees of freedom
## Multiple R-squared: 0.1772, Adjusted R-squared: 0.1714
## F-statistic: 30.43 on 3 and 424 DF, p-value: < 2.2e-16 # Autmated way to evaluate all the models: step(mreg.mod) ## Start: AIC=9081.8 ## faminc ~ he + we + kl6 + xtra_x5 + xtra_x6 ## ## Df Sum of Sq RSS AIC ## - xtra_x5 1 2.5439e+08 6.8358e+11 9080.0 ## - xtra_x6 1 4.6960e+08 6.8379e+11 9080.1 ## 6.8332e+11 9081.8
## – we 1 1.0746e+10 6.9407e+11 9086.5
## – he 1 1.1559e+10 6.9488e+11 9087.0
## – kl6 1 1.2835e+10 6.9616e+11 9087.8
##
## Step: AIC=9079.95
## faminc ~ he + we + kl6 + xtra_x6
##
## Df Sum of Sq RSS AIC
## – xtra_x6 1 2.6350e+08 6.8384e+11 9078.1
## 6.8358e+11 9080.0
## – kl6 1 1.2698e+10 6.9628e+11 9085.8
## – he 1 1.5563e+10 6.9914e+11 9087.6
## – we 1 2.0830e+10 7.0441e+11 9090.8
##
## Step: AIC=9078.12
## faminc ~ he + we + kl6
##
## Df Sum of Sq RSS AIC
## 6.8384e+11 9078.1
## – kl6 1 1.3192e+10 6.9703e+11 9084.3
## – he 1 2.6207e+10 7.1005e+11 9092.2
## – we 1 3.2683e+10 7.1652e+11 9096.1

##
## Call:
## lm(formula = faminc ~ he + we + kl6, data = edu_inc)
##
## Coefficients:
## (Intercept) he we kl6
## -7755 3212 4777 -14311

# As expected, Model 3 (faminc~he+we+kl6) is the best one for all cases

# Note: The leaps package has other functions such as ‘leapSeq’ which give
# you more choices for performing the selection.

36

8 Heteroskedasticity

The classic signature of Heteroskedasticity is a growing spread in y with increasing values of x. In this
case the Constant Variance assumption no longer holds, and we therefore need to apply other techniques to
mitigate this problem.

Consider the MR model: yi = β1 + β2xi2 + · · ·+ βKxiK + ei. If the variance var(yi) = E(e2i ) is no longer
constant, this suggests that there is a function h() that depends on predictors z2, . . . , zS that can be used to
characterize the behavior of var(yi). For example, we can propose

var(yi) = E(e
2
i ) = h(α1 + α2zi2 + · · ·+ αSziS).

This is more general result since the constant variance assumption holds provided α2 = α3 = · · · = αS = 0.

Next, we will discuss a couple of statistical methods to help us identify if heteroskedasticity is present by
evaluating hypotheses such as H0 : α2 = α3 = · · · = αS = 0 and examining plausible function forms for h()
such as linear, quadratic and exponential functions.

8.1 Detecting Heteroskedasticity

Next, we will discuss a couple of statistical methods to help us identify if heteroskedasticity is present by
evaluating hypotheses such as H0 : α2 = α3 = · · · = αS = 0 and examining plausible function forms for h()
such as linear, quadratic and exponential functions.

8.1.1 Spread Level Plots

These can be used as a diagnostic for nonconstant variance with the advantage that it also yields an estimate
for the power law transformation to use for the response variable.

# Example 21: Spread Level Plot (Family Income)
library(car)
library(PoEdata)

# Traditional LS
foodeq <- lm(food_exp~income,data=food) tidy(foodeq) term estimate std.error statistic p.value (Intercept) 83.41600 43.410163 1.921578 0.0621824 income 10.20964 2.093263 4.877381 0.0000195 spreadLevelPlot(foodeq) 37 150 200 250 300 350 400 0 .0 5 0 .2 0 0 .5 0 2 .0 0 Spread−Level Plot for foodeq Fitted Values A b so lu te S tu d e n tiz e d R e si d u a ls ## ## Suggested power transformation: -1.106871 # It suggests a lambda = -1.107 for the power low to use in transforming y. 8.1.2 Score Tests for Nonconstant Error Variance There are several popular tests for Heteroskedasticity depending on various characteristics of the data. In this section, however, we will focus on: 1. Breusch-Pagan Test 2. White Test 3. Goldfeld-Quandt Test 8.1.2.1 Lagrange Multiplier Test (Breusch-Pagan Test) Assuming we know what the zS’s are, this test is for the null H0 : α2 = α3 = · · · = αS = 0 against the alternative H1 : not all the αS are zero. The respective test-statistic is χ2 ∼ N × R2 ∼ χ2S−1, where R 2 is obtained from the regression: êi 2 = α1 + α2zi2 + · · ·+ αSziS + vi. The challenge with this test is that we may not not know (or have) other predictors zS for estimating the αS’s. # Example 22: BP Test: Food Expenditure vs. Family Income library(car) library(PoEdata) # The exmaple below shows how ot compute everything from scracth, however, we will shortly # instead use the R function 'bptest' 38 alpha <- 0.05 mod1 <- lm(food_exp~income, data=food) ressq <- resid(mod1)^2 #The test equation: modres <- lm(ressq~income, data=food) N <- nobs(modres) gmodres <- glance(modres) S <- gmodres$df #Number of Betas in model #Chi-square is always a right-tail test chisqcr <- qchisq(1-alpha, S-1) Rsqres <- gmodres$r.squared chisq <- N*Rsqres pval <- 1-pchisq(chisq,S-1) # Since p-val = 0.0066 < alpha=0.05, we reject the null # and conclude tht heteroskedasticity is present. # You can also use the function ncvTest from the car package # to test for Non Constant Variance (hence ncv) ncvTest(mod1) ## Non-constant Variance Score Test ## Variance formula: ~ fitted.values ## Chisquare = 7.343935, Df = 1, p = 0.0067289 # We can see that we reject the null H0: variance = constant 8.1.2.2 White Test Given the problem we saw with the Breusch-Pagan (BP) Test, the White Test solves this by suggesting we start by using our existing predictors from the original model. For example, if our regression model is E(y) = β1 + β2x2 + β3x3, then we set z2 = x2, z3 = x3, z4 = x22, and z5 = x 2 3. # Example 22: White Test: Food Expenditure vs. Family Income library(car) library(PoEdata) alpha <- 0.05 mod1 <- lm(food_exp~income, data=food) ressq <- resid(mod1)^2 modres <- lm(ressq~income+I(income^2), data=food) gmodres <- glance(modres) Rsq <- gmodres$r.squared S <- gmodres$df #Number of Betas in model chisq <- N*Rsq pval <- 1-pchisq(chisq, S-1) # Since p-val = 0.023 < alpha=0.05, we reject the null # and conclude tht heteroskedasticity is present. # We can directly test the Robust version of the BP Test mod1 <- lm(food_exp~income, data=food) bptest(mod1) ## ## studentized Breusch-Pagan test ## ## data: mod1 39 ## BP = 7.3844, df = 1, p-value = 0.006579 # Consissten with the previous results, we reject H0. 8.1.3 Goldfeld-Quandt Test When our model includes a factor variable(s), and suffers from heteroskedasticty, the Goldfeld-Quandt Test can be useful for comparing groups based on the factor levels. For example, consider the case of a binary indicator variable (I) that takes on the values 0 or 1. The test will compares the variances of the two groups by testing the null hypothesis H0 : σ21 = σ 2 0 against the alternative: H0 : σ 2 1 6= σ 2 0 . The respective test statistic is F = σ21 /σ 2 0 which is then compared against Fc = FN1−K,N0−K. This test can also be applied to a model with only quantitative predictors by simply splitting the observations according to a desired rule. The two examples below show these cases. # Example 23: Goldfeld-Quandt Test (w/ Inidicator variable) # wage = educ + exper + metro (=1 metropolitan and =0 urban) library(car) library(PoEdata) alpha <- 0.05 #two tail, will take alpha/2 data("cps2", package="PoEdata") #Create the two groups, m (metro) and r (rural) m <- cps2[which(cps2$metro==1),] r <- cps2[which(cps2$metro==0),] wg1 <- lm(wage~educ+exper, data=m) wg0 <- lm(wage~educ+exper, data=r) df1 <- wg1$df.residual #Numerator degrees of freedom df0 <- wg0$df.residual #Denominatot df sig1squared <- glance(wg1)$sigma^2 sig0squared <- glance(wg0)$sigma^2 fstat <- sig1squared/sig0squared Flc <- qf(alpha/2, df1, df0)#Left (lower) critical F Fuc <- qf(1-alpha/2, df1, df0) #Right (upper) critical F # Since F = 2.09 (in the rejection region), we reject H0 and conclude # that the variances are different, and therefore, heteroskedasticity # is an issue. # Example 24: Goldfeld-Quandt Test (only quantitative predictors) # Food Expenditure vs. Family Income # Divide the data into two groups: below median income and above library(car) library(PoEdata) alpha <- 0.05 data("food", package="PoEdata") medianincome <- median(food$income) li <- food[which(food$income<=medianincome),] hi <- food[which(food$income>=medianincome),]
eqli <- lm(food_exp~income, data=li) eqhi <- lm(food_exp~income, data=hi) dfli <- eqli$df.residual dfhi <- eqhi$df.residual sigsqli <- glance(eqli)$sigma^2 sigsqhi <- glance(eqhi)$sigma^2 40 fstat <- sigsqhi/sigsqli #The larger var in numerator Fc <- qf(1-alpha, dfhi, dfli) pval <- 1-pf(fstat, dfhi, dfli) # Since p-val = 0.0046 < alpha=0.05, we reject the null # and conclude tht heteroskedasticity is present. In this case, # we can see that the test suggests that there is a hiher variance # at higher incomes. # An easier way to conduct the GQ test is using the function gqtest # from the lmtest package library(lmtest) foodeq <- lm(food_exp~income, data=food) gqtest(foodeq, point=0.5, alternative="greater", order.by=food$income) ## ## Goldfeld-Quandt test ## ## data: foodeq ## GQ = 3.6148, df1 = 18, df2 = 18, p-value = 0.004596 ## alternative hypothesis: variance increases from segment 1 to 2 8.2 Mitigating Heteroskedasticity In the previous sections we learned how to detect heteroskedasticity, now we will discuss what to with our model to mitigate it its impact. 8.2.1 Heteroskedasticity-Consistent Standard Errors Recall that heteroskedasticity has the two effects that (1) the LS estimators are no longer the best, and (2) the LS standard errors are not correct since the variance is no longer constant. A more robust (in the sense that it applies to both heteroskedastic and homoskedastic errors) is the White Statndard Error as given by: var(b̂2) = N N − 2 ∑Ni=1 [ (xi − x)2 ê2i ] ∑Ni=1 [(xi − x)2] 2 For example, we look at the results from the food expenditure case and compare the traditional vs. the White standard errors, as well as the respective confidence intervals for the model coefficients. Keep in mind that hypotheses such as e.g., λ = c1β1 + c2β2 = c0 would also require us to update the standard errors, and therefore, our conclusions may change. # Example 25: White SE # Food Expenditure vs. Family Income 41 # To compute the Robust White SE, the car package has a function # hccm (heteroskedasticity-robust coefficient covariance matrix) that will # compute them library(car) library(PoEdata) # Traditional Standard Errors foodeq <- lm(food_exp~income,data=food) tidy(foodeq) term estimate std.error statistic p.value (Intercept) 83.41600 43.410163 1.921578 0.0621824 income 10.20964 2.093263 4.877381 0.0000195 # Robust White Standard Errors cov1 <- hccm(foodeq, type="hc1") food.HC1 <- coeftest(foodeq, vcov.=cov1) tidy(food.HC1) term estimate std.error statistic p.value (Intercept) 83.41600 27.463748 3.037313 0.0042989 income 10.20964 1.809077 5.643566 0.0000018 # Hypotheiss Test for a linear combination of paramters: H0 = beta_2 + beta_3 = 0 data("andy", package="PoEdata") andy.eq <- lm(sales~price+advert, data=andy) bp <- bptest(andy.eq) #Heteroskedsticity test b2 <- coef(andy.eq)[["price"]] b3 <- coef(andy.eq)[["advert"]] H0 <- "price+advert=0" # Traditional Standard Errors linearHypothesis(andy.eq, H0) Res.Df RSS Df Sum of Sq F Pr(>F)
73 2254.715 NA NA NA NA
72 1718.943 1 535.7719 22.44145 1.06e-05

# Robust White Standard Errors
linearHypothesis(andy.eq, H0, vcov=hccm(andy.eq, type=”hc1″))

Res.Df Df F Pr(>F)
73 NA NA NA
72 1 23.38698 7.3e-06

# In this case the p-value is larger using the traditional errors.

8.2.2 Generalized (Weighted) Least Squares

To properly deal with Heteroskedasticity, we would need to estimate a different variance of the error term
for each observation. On a practical level this is not possible, and instead we model the dependence of the
variance on one or more of the predictors in the model. We will exams two forms of dependence depending
on whether we know or not the actual relationship between the variance and predictor(s).

8.2.2.1 Known Form or Variance

42

For simplicity, let’s assume we have a SLR model yi = β1 + β2xi + ei and that var(ei) = σ2i = σ
2xi. If we

divide our original model by

xi, then we show that in the transformed model: y∗i = β1x

i1 + β2x


i2 + e


i , e

i

are now homoskedastic since var(e∗i ) =varei/

xi =var(ei)/xi = σ2xi/xi = σ2.Also, the desired properties
E(e∗i ) = 0 and cov(e


i , e

j ) = 0, i 6= j still hold.

The transformation we proposed is part of a more general method known as Weighted Least Squares where
we weighted the errors by x−1/2i .

For example, looking again at the food expenditure example, applying this method we obtain:

# Example 25: Weighted Least Squares -Known Variance
# Food Expenditure vs. Family Income
# To compute the Robust White SE, the car package has a function
# hccm (heteroskedasticity-robust coefficient covariance matrix) that will
# compute them

library(car)
library(PoEdata)

# Traditional LS
foodeq <- lm(food_exp~income,data=food) tidy(foodeq) term estimate std.error statistic p.value (Intercept) 83.41600 43.410163 1.921578 0.0621824 income 10.20964 2.093263 4.877381 0.0000195 # Weighted LS w <- 1/food$income # Weights : R takes the sqrt by default already, hence w=1/x food.wls <- lm(food_exp~income, weights=w, data=food) vcvfoodeq <- coeftest(foodeq, vcov.=cov1) tidy(vcvfoodeq) term estimate std.error statistic p.value (Intercept) 83.41600 27.463748 3.037313 0.0042989 income 10.20964 1.809077 5.643566 0.0000018 # Traditional LS 8.2.2.2 Unknown Form of Variance If we do not know the form of the variance, we can propose a more general power law form such as var(ei) = σ2i = σ 2xγi , where γ is an unknown parameter that we can estimate. The trick to estimating γ is to take logs on both side of the variance equation, ln(σ2i ) = ln(σ 2) + γ ln(xi), and take the exponential on both sides, σ2i = exp[ln(σ 2) + γ ln(xi)] = exp(α1 + α2zi), 43 where α1 = ln(σ2), α2 = γ, and zi = ln(xi). For the MR case, we simply have more terms, ziS to include in the exponential function. We can use traditional LS to estimate γ from the log-linear model ln(σ2i ) = α1 + α2zi. The estimated equation is l̂n(σ2i ) = 0.9378 + 2.39zi (recall that we used γ = 1 in our previous example). Now we can use σ̂2i = exp(0.9378 + 2.39zi) to transform our original model instead of the x −1/2 i weights. The estimated model is now: # Example 25: Weighted Least Squares -Uknown Variance # Food Expenditure vs. Family Income # To compute the Robust White SE, the car package has a function # hccm (heteroskedasticity-robust coefficient covariance matrix) that will # compute them library(car) library(PoEdata) foodeq <- lm(food_exp~income,data=food) data("food", package="PoEdata") food.ols <- lm(food_exp~income, data=food) ehatsq <- resid(food.ols)^2 sighatsq.ols <- lm(log(ehatsq)~log(income), data=food) vari <- exp(fitted(sighatsq.ols)) food.fgls <- lm(food_exp~income, weights=1/vari, data=food) # This case is also known as Feasible Generalized Least Squares library(stargazer) # stargazer(food.ols, food.HC1, food.wls, food.fgls) Table 2: Comparing various ’food’ models Dependent variable: ’food expenditure’ OLS HC1 WLS FGLS Constant 83.416 83.416 78.684 76.054 (43.410) (27.464) (23.789) (9.713) income 10.210 10.210 10.451 10.633 (2.093) (1.809) (1.386) (0.972) Observations 40 40 40 9 Missing Observations There are several ways to deal with missing data depending on their characteristics and volume. Below we briefly discuss for of them. 1. Remove Missing Observations If the number of missing observations is small relative to the dataset, we can simply remove these 44 observations. Also, we need to make sure not to introduce a bias due to their removal by e.g., failing to represent certain characteristics of the data. 2. Remove the Variable This may be appropriate if most of the missing values occur mainly in a particular variable. We would need to weigh its removal against other considerations such as economic significance, etc. 3. Imputation via Mean/Median/Mode Depending on the distribution of the observations, we can impute the missing data with estimates of the Mean, Median or Mode. 4. Prediction We can predict the missing observations using kNN (k-Nearest Neighbor), regression, interpolation, etc. However, of all these methods, kNN seems to work well provided the variables are not factor variables. The basic idea behind the method is to replace a missing observation with a weighted average of the closest (using the Euclidean distance) k neighbors. In the case of factor variables, we can instead use *Recursive Partitioning* (We normally cover this when discussing *Regression Trees*). # Example 25: Boston Housing Data library(mlbench) #Machine Learning Benchmark # For refence on this example, look at: # http://r-statistics.co/Missing-Value-Treatment-With-R.html#4.%20Prediction data ("BostonHousing", package="mlbench") # initialize the data # load the data original <- BostonHousing # backup original data # Introduce missing values set.seed(100) BostonHousing[sample(1:nrow(BostonHousing), 40), "rad"] <- NA BostonHousing[sample(1:nrow(BostonHousing), 40), "ptratio"] <- NA head(BostonHousing) crim zn indus chas nox rm age dis rad tax ptratio b lstat medv 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21 28.7 # Before going through the methods, we should first look at # the pattern of missing observations library(mice) ## ## Attaching package: 'mice' ## The following object is masked from 'package:tidyr': ## ## complete ## The following objects are masked from 'package:base': ## ## cbind, rbind md.pattern(BostonHousing) # pattern or missing values in data. 45 crim 0 zn 0 indus 0 chas 0 nox 0 rm 0 age 0 dis 0 tax 0 b 0 lstat 0 medv 0 rad 40 ptratio 40 25 135 135 0431 80 crim zn indus chas nox rm age dis tax b lstat medv rad ptratio 431 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 35 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 35 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 5 1 1 1 1 1 1 1 1 1 1 1 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 40 40 80 # Method 1: Remove Missing Observations lm(medv ~ ptratio + rad, data=BostonHousing, na.action=na.omit) # ## ## Call: ## lm(formula = medv ~ ptratio + rad, data = BostonHousing, na.action = na.omit) ## ## Coefficients: ## (Intercept) ptratio rad ## 57.2724 -1.7836 -0.2035 # Method 2: Remove the variable # Simply leave it out of the regression model -not much more to do 🙂 # Method 3: Imputation via Mean/Median/Mode library(Hmisc) ## Loading required package: Formula ## ## Attaching package: 'Hmisc' ## The following objects are masked from 'package:xtable': ## ## label, label<- ## The following object is masked from 'package:quantmod': ## ## Lag ## The following objects are masked from 'package:tis': ## ## capitalize, Lag ## The following objects are masked from 'package:plyr': ## ## is.discrete, summarize ## The following objects are masked from 'package:dplyr': ## 46 ## src, summarize ## The following objects are masked from 'package:base': ## ## format.pval, units #impute(BostonHousing$ptratio, mean) # replace with mean #impute(BostonHousing$ptratio, median) # median #impute(BostonHousing$ptratio, 20) # replace specific number # or if you want to impute manually #BostonHousing$ptratio[is.na(BostonHousing$ptratio)] <- mean(BostonHousing$ptratio, na.rm = T) # Compute the accuracy when NAs are imputed with the mean library(DMwR) #Data Mining w/ R ## Loading required package: grid ## ## Attaching package: 'DMwR' ## The following object is masked from 'package:broom': ## ## bootstrap ## The following object is masked from 'package:plyr': ## ## join # This library offers many fancy ways of dealing with NAs # As a side note, if you're interested in trading, try their trading.simulator actuals <- original$ptratio predicted <- rep(mean(BostonHousing$ptratio, na.rm=T), length(actuals)) regr.eval(actuals, predicted) ## mae mse rmse mape ## 1.7854298 4.6778718 2.1628388 0.1034322 # Looking at mape, we have about a 10% (mean absolute error) -not too bad # Method 4: Prediction # Perform knn imputation. knnOutput <- knnImputation(BostonHousing[, !names(BostonHousing) %in% "medv"]) anyNA(knnOutput) # Check that we don't have any NAs left in the entire dataset ## [1] FALSE actuals <- original$ptratio predicted <- knnOutput$ptratio regr.eval(actuals, predicted) ## mae mse rmse mape ## 0.079200565 0.156450738 0.395538542 0.004632037 # Based on mape, we are now down to 0.5% error! # Prediction for Factor Variables library(rpart) 47 ## ## Attaching package: 'rpart' ## The following object is masked from 'package:survival': ## ## solder class_mod <- rpart(rad ~ . - medv, data=BostonHousing[!is.na(BostonHousing$rad), ], method="class", na.action=na.omit) # since rad is a factor anova_mod <- rpart(ptratio ~ . - medv, data=BostonHousing[!is.na(BostonHousing$ptratio), ], method="anova", na.action=na.omit) # since ptratio is numeric. rad_pred <- predict(class_mod, BostonHousing[is.na(BostonHousing$rad), ]) ptratio_pred <- predict(anova_mod, BostonHousing[is.na(BostonHousing$ptratio), ]) # Accuracy for ptratio actuals <- original$ptratio[is.na(BostonHousing$ptratio)] predicteds <- ptratio_pred regr.eval(actuals, predicteds) ## mae mse rmse mape ## 0.71061673 0.99693845 0.99846805 0.04099908 # A mape of about 4% not bad # Acccuracy for rad (factor variable) actuals <- original$rad[is.na(BostonHousing$rad)] predicteds <- as.numeric(colnames(rad_pred)[apply(rad_pred, 1, which.max)]) mean(actuals != predicteds) # compute misclass error. ## [1] 0.25 # About 25% mis-classification error -pretty good # Method 5: Advanced Method (Multivariate Imputation by Chained Equations) # Use the R package mice library(mice) # Perform mice imputation, based on random forests miceMod <- mice(BostonHousing[, !names(BostonHousing) %in% "medv"], method="rf") ## ## iter imp variable ## 1 1 rad ptratio ## 1 2 rad ptratio ## 1 3 rad ptratio ## 1 4 rad ptratio ## 1 5 rad ptratio ## 2 1 rad ptratio ## 2 2 rad ptratio ## 2 3 rad ptratio ## 2 4 rad ptratio ## 2 5 rad ptratio ## 3 1 rad ptratio ## 3 2 rad ptratio ## 3 3 rad ptratio 48 ## 3 4 rad ptratio ## 3 5 rad ptratio ## 4 1 rad ptratio ## 4 2 rad ptratio ## 4 3 rad ptratio ## 4 4 rad ptratio ## 4 5 rad ptratio ## 5 1 rad ptratio ## 5 2 rad ptratio ## 5 3 rad ptratio ## 5 4 rad ptratio ## 5 5 rad ptratio miceOutput <- complete(miceMod) # generate the completed data. anyNA(miceOutput) ## [1] FALSE # Accuracy for ptratio actuals <- original$ptratio[is.na(BostonHousing$ptratio)] predicteds <- miceOutput[is.na(BostonHousing$ptratio), "ptratio"] regr.eval(actuals, predicteds) ## mae mse rmse mape ## 0.35000000 0.77700000 0.88147603 0.01965896 # Acccuracy for rad (factor variable) actuals <- original$rad[is.na(BostonHousing$rad)] predicteds <- miceOutput[is.na(BostonHousing$rad), "rad"] mean(actuals != predicteds) # compute misclass error. ## [1] 0.225 49 Residuals and Influence Measures Residual Plots Identifying and Classifying Unusual Observations Predictor Transformations Omitted Variable Bias Irrelevant Variables Multicollinearity Detecting Multicollinearity Model Misspecification: Ramsey RESET Model Selection Akaike Information Criterion (AIC) Bayesian Information Criterion (BIC) Mallows' CP Stepwise Regression Heteroskedasticity Detecting Heteroskedasticity Spread Level Plots Score Tests for Nonconstant Error Variance Goldfeld-Quandt Test Mitigating Heteroskedasticity Heteroskedasticity-Consistent Standard Errors Generalized (Weighted) Least Squares Missing Observations