Assignment 11, Question 4: Instrumental variables
Part (a): Reading data from the Excel file:
library(readxl)
## Warning: package ‘readxl’ was built under R version 4.0.2
data=read_excel("Mroz.xls",na=".")   # "." marks missing values in the file
my_data=subset(data,!is.na(wage))    # keep only observations with a recorded wage
suppressMessages(attach(my_data))
Part (b): OLS regression:
suppressMessages(library(AER))
m=lm(lwage~educ+exper+expersq)
summary(m)
##
## Call:
## lm(formula = lwage ~ educ + exper + expersq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.08404 -0.30627 0.04952 0.37498 2.37115
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5220407 0.1986321 -2.628 0.00890 **
## educ 0.1074896 0.0141465 7.598 1.94e-13 ***
## exper 0.0415665 0.0131752 3.155 0.00172 **
## expersq -0.0008112 0.0003932 -2.063 0.03974 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6664 on 424 degrees of freedom
## Multiple R-squared: 0.1568, Adjusted R-squared: 0.1509
## F-statistic: 26.29 on 3 and 424 DF, p-value: 1.302e-15
coeftest(m,vcov=hccm(m,type="hc0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.52204068 0.20070596 -2.6010 0.009620 **
## educ 0.10748965 0.01315705 8.1697 3.591e-15 ***
## exper 0.04156651 0.01520150 2.7344 0.006512 **
## expersq -0.00081119 0.00041810 -1.9402 0.053022 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
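• The OLS estimate of the return to education is 0.1075, i.e. roughly a 10.7% increase in wages per additional year of education, holding experience fixed.
• The homoskedastic and heteroskedasticity-robust standard errors are similar. The robust standard errors are somewhat larger for the intercept, exper and expersq, but slightly smaller for educ. As a result, expersq is less significant with the robust standard error (p-value 0.053, compared with 0.040 under homoskedasticity).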
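To make the comparison above concrete, here is a minimal base-R sketch of what the HC0 option computes: White's sandwich (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}, next to the conventional s^2 (X'X)^{-1}. It uses hypothetical simulated, deliberately heteroskedastic data rather than Mroz.xls, so the variable names mirror the assignment but the numbers are made up. We assume, without checking here, that the HC0 column corresponds to what hccm(m,type="hc0") from the car package returns.

```r
# Hypothetical simulated data standing in for the Mroz sample (not the real data).
set.seed(1)
n <- 428
educ <- rnorm(n, mean = 12.5, sd = 2.3)
exper <- runif(n, min = 0, max = 30)
expersq <- exper^2
# Error spread grows with educ, so the homoskedasticity assumption fails.
u <- rnorm(n, sd = 0.3 + 0.05 * abs(educ - 12))
lwage <- -0.5 + 0.1 * educ + 0.04 * exper - 0.0008 * expersq + u

fit <- lm(lwage ~ educ + exper + expersq)
X <- model.matrix(fit)            # n x k design matrix, including the intercept
e <- residuals(fit)
k <- ncol(X)
XtX_inv <- solve(crossprod(X))    # (X'X)^{-1}

# Conventional covariance: s^2 (X'X)^{-1}, with s^2 = RSS / (n - k)
V_conv <- sum(e^2) / (n - k) * XtX_inv
# HC0 sandwich: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}
V_hc0 <- XtX_inv %*% crossprod(X * e) %*% XtX_inv

print(cbind(conventional = sqrt(diag(V_conv)), HC0 = sqrt(diag(V_hc0))))
```

The only difference between the two columns is the "meat" of the sandwich: the conventional formula replaces each squared residual by their common average.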
Part (c): IV estimates:
m_iv=ivreg(lwage~educ+exper+expersq | fatheduc+motheduc+exper+expersq)
summary(m_iv)
##
## Call:
## ivreg(formula = lwage ~ educ + exper + expersq | fatheduc + motheduc +
## exper + expersq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0986 -0.3196 0.0551 0.3689 2.3493
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0481003 0.4003281 0.120 0.90442
## educ 0.0613966 0.0314367 1.953 0.05147 .
## exper 0.0441704 0.0134325 3.288 0.00109 **
## expersq -0.0008990 0.0004017 -2.238 0.02574 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6747 on 424 degrees of freedom
## Multiple R-Squared: 0.1357, Adjusted R-squared: 0.1296
## Wald test: 8.141 on 3 and 424 DF, p-value: 2.787e-05
• The IV estimate of the return to education is 0.0614.
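• The IV estimate is smaller than the OLS estimate. This is plausible if education is endogenous because unobserved ability is in the error term: if ability is positively correlated with both education and wages, OLS overstates the return to education.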
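The IV estimate can be reproduced as two OLS steps. The sketch below, on hypothetical simulated data with an unobserved ability term in both the education equation and the wage error, checks that regressing lwage on the first-stage fitted values of educ gives the same coefficients as the matrix formula (X'P_Z X)^{-1} X'P_Z y, where P_Z is the projection onto the instruments and exogenous regressors. Only base R is used, and the data-generating values are assumptions chosen for illustration.

```r
# Hypothetical simulated data: 'ability' is unobserved and affects both educ and lwage.
set.seed(2)
n <- 500
fatheduc <- rnorm(n, mean = 10, sd = 3)
motheduc <- rnorm(n, mean = 10, sd = 3)
exper <- runif(n, min = 0, max = 30)
expersq <- exper^2
ability <- rnorm(n)
educ <- 6 + 0.2 * fatheduc + 0.15 * motheduc + ability + rnorm(n)
lwage <- 0.1 * educ + 0.04 * exper - 0.0008 * expersq + 0.2 * ability + rnorm(n, sd = 0.5)

# Step 1: first stage, educ on the instruments and the exogenous regressors
stage1 <- lm(educ ~ fatheduc + motheduc + exper + expersq)
educ_hat <- fitted(stage1)

# Step 2: structural equation with educ replaced by its fitted values
stage2 <- lm(lwage ~ educ_hat + exper + expersq)

# Matrix form: b = (X' P_Z X)^{-1} X' P_Z y, where P_Z X = Z (Z'Z)^{-1} Z'X
X <- cbind(1, educ, exper, expersq)
Z <- cbind(1, fatheduc, motheduc, exper, expersq)
PZX <- Z %*% solve(crossprod(Z), crossprod(Z, X))
b_2sls <- drop(solve(crossprod(PZX, X), crossprod(PZX, lwage)))

ols <- coef(lm(lwage ~ educ + exper + expersq))
print(cbind(OLS = ols, two_step = coef(stage2), matrix_2sls = b_2sls))
```

The point estimates match, but the standard errors printed by summary(stage2) should not be used: they are based on residuals computed with the fitted rather than the actual values of educ, whereas ivreg forms residuals with the actual educ.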
Robust standard errors for the IV estimates:
library(ivpack)
## Warning: package ‘ivpack’ was built under R version 4.0.2
robust.se(m_iv)
## [1] “Robust Standard Errors”
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.04810032 0.42778460 0.1124 0.910527
## educ 0.06139663 0.03318243 1.8503 0.064969 .
## exper 0.04417039 0.01547356 2.8546 0.004521 **
## expersq -0.00089897 0.00042807 -2.1001 0.036314 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
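• The heteroskedasticity-robust standard errors are somewhat larger than the conventional ones (0.0332 vs 0.0314 for educ), which reduces the significance of education (p-value 0.065 vs 0.051).
• The standard error of the IV estimate of the return to education is more than twice the OLS standard error (0.0314 vs 0.0141), reflecting the loss of precision from instrumenting educ.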
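A heteroskedasticity-robust covariance for 2SLS has the same sandwich form as in OLS, with the fitted regressors P_Z X in place of X and residuals computed from the actual educ. The sketch below is an HC0-type version on hypothetical simulated data whose error variance grows with experience. It is meant to show the structure of the calculation; whether it matches robust.se from ivpack exactly (for example, its degrees-of-freedom convention) is an assumption, not something checked here.

```r
# Hypothetical simulated data with a heteroskedastic structural error.
set.seed(3)
n <- 500
fatheduc <- rnorm(n, mean = 10, sd = 3)
motheduc <- rnorm(n, mean = 10, sd = 3)
exper <- runif(n, min = 0, max = 30)
expersq <- exper^2
ability <- rnorm(n)
educ <- 6 + 0.2 * fatheduc + 0.15 * motheduc + ability + rnorm(n)
u <- 0.2 * ability + rnorm(n, sd = 0.3 + 0.02 * exper)   # spread grows with exper
lwage <- 0.1 * educ + 0.04 * exper - 0.0008 * expersq + u

X <- cbind(const = 1, educ, exper, expersq)
Z <- cbind(const = 1, fatheduc, motheduc, exper, expersq)
Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))   # P_Z X
A_inv <- solve(crossprod(Xhat))                      # (X' P_Z X)^{-1}
b <- drop(A_inv %*% crossprod(Xhat, lwage))          # 2SLS coefficients

# Residuals use the actual educ, not its first-stage fitted values
uhat <- drop(lwage - X %*% b)

V_conv <- sum(uhat^2) / (n - ncol(X)) * A_inv          # conventional 2SLS covariance
V_hc0 <- A_inv %*% crossprod(Xhat * uhat) %*% A_inv    # HC0-type sandwich
print(cbind(estimate = b, conventional_se = sqrt(diag(V_conv)), robust_se = sqrt(diag(V_hc0))))
```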
Part (d): The first stage:
fstg=lm(educ~fatheduc+motheduc+exper+expersq)
summary(fstg)
##
## Call:
## lm(formula = educ ~ fatheduc + motheduc + exper + expersq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.8057 -1.0520 -0.0371 1.0258 6.3787
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.102640 0.426561 21.340 < 2e-16 ***
## fatheduc 0.189548 0.033756 5.615 3.56e-08 ***
## motheduc 0.157597 0.035894 4.391 1.43e-05 ***
## exper 0.045225 0.040251 1.124 0.262
## expersq -0.001009 0.001203 -0.839 0.402
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.039 on 423 degrees of freedom
## Multiple R-squared: 0.2115, Adjusted R-squared: 0.204
## F-statistic: 28.36 on 4 and 423 DF, p-value: < 2.2e-16
coeftest(fstg)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.1026401 0.4265614 21.3396 < 2.2e-16 ***
## fatheduc 0.1895484 0.0337565 5.6152 3.562e-08 ***
## motheduc 0.1575970 0.0358941 4.3906 1.430e-05 ***
## exper 0.0452254 0.0402507 1.1236 0.2618
## expersq -0.0010091 0.0012033 -0.8386 0.4022
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(fstg,c("fatheduc=0","motheduc=0"),vcov=hccm(fstg,type="hc0"))
## Linear hypothesis test
##
## Hypothesis:
## fatheduc = 0
## motheduc = 0
##
## Model 1: restricted model
## Model 2: educ ~ fatheduc + motheduc + exper + expersq
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 425
## 2 423 2 50.112 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
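• Conventional and heteroskedasticity-robust first-stage inference are very similar: the robust joint F statistic for fatheduc and motheduc is 50.11, versus the conventional 55.40 reported as the “Weak instruments” statistic in part (e). (Note that coeftest(fstg) without a vcov argument only reproduces the conventional standard errors from summary(fstg).)
• The IVs fatheduc and motheduc are individually significant in the first stage (t statistics of 5.62 and 4.39) and jointly significant (robust F = 50.11, p < 2.2e-16), well above the common rule-of-thumb threshold of F > 10 for weak instruments.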
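The joint test in part (d) is a Wald test of R b = 0, with R selecting the fatheduc and motheduc coefficients: F = (Rb)'(R V R')^{-1}(Rb)/q. The base-R sketch below computes it with the conventional and HC0 covariance matrices on hypothetical simulated first-stage data, and checks that the conventional version coincides with the nested-model F test (the restricted model drops both instruments). Passing vcov=hccm(fstg,type="hc0") to linearHypothesis plays the role of the HC0 branch here; the simulation design is an assumption.

```r
# Hypothetical simulated first-stage data; the error spread depends on fatheduc.
set.seed(4)
n <- 428
fatheduc <- rnorm(n, mean = 10, sd = 3)
motheduc <- rnorm(n, mean = 10, sd = 3)
exper <- runif(n, min = 0, max = 30)
expersq <- exper^2
educ <- 9 + 0.19 * fatheduc + 0.16 * motheduc + 0.04 * exper - 0.001 * expersq +
  rnorm(n, sd = exp(0.05 * fatheduc))

fstg <- lm(educ ~ fatheduc + motheduc + exper + expersq)
b <- coef(fstg)
R <- rbind(c(0, 1, 0, 0, 0),   # fatheduc = 0
           c(0, 0, 1, 0, 0))   # motheduc = 0
q <- nrow(R)

# Wald F statistic for R b = 0 given a coefficient covariance matrix V
wald_F <- function(V) {
  Rb <- R %*% b
  drop(t(Rb) %*% solve(R %*% V %*% t(R), Rb)) / q
}

X <- model.matrix(fstg)
e <- residuals(fstg)
XtX_inv <- solve(crossprod(X))
V_hc0 <- XtX_inv %*% crossprod(X * e) %*% XtX_inv

F_conv <- wald_F(vcov(fstg))
F_hc0 <- wald_F(V_hc0)
# Nested-model F test: restricted model without the two instruments
F_nested <- anova(lm(educ ~ exper + expersq), fstg)$F[2]
print(c(conventional = F_conv, nested = F_nested, HC0 = F_hc0))
```

With the conventional covariance the Wald and nested-model statistics are algebraically identical; the robust version differs only through V.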
Parts (e) and (f): Diagnostic tests:
summary(m_iv,diagnostics=TRUE)
##
## Call:
## ivreg(formula = lwage ~ educ + exper + expersq | fatheduc + motheduc +
## exper + expersq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0986 -0.3196 0.0551 0.3689 2.3493
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0481003 0.4003281 0.120 0.90442
## educ 0.0613966 0.0314367 1.953 0.05147 .
## exper 0.0441704 0.0134325 3.288 0.00109 **
## expersq -0.0008990 0.0004017 -2.238 0.02574 *
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 423 55.400 <2e-16 ***
## Wu-Hausman 1 423 2.793 0.0954 .
## Sargan 1 NA 0.378 0.5386
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6747 on 424 degrees of freedom
## Multiple R-Squared: 0.1357, Adjusted R-squared: 0.1296
## Wald test: 8.141 on 3 and 424 DF, p-value: 2.787e-05
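• Note that the “Weak instruments” statistic (55.40) is the first-stage F statistic for the joint significance of the excluded instruments fatheduc and motheduc, computed with conventional (homoskedastic) standard errors. Its heteroskedasticity-robust counterpart in part (d) is 50.11.
• The IVs may nevertheless be endogenous if ability is passed on from parents to children. In that case, fatheduc and motheduc would be correlated with the child's ability, which is part of the error term.
• The IVs pass Sargan's overidentification test (statistic 0.378, p-value 0.5386). This should not be taken as strong evidence that the IVs are valid: the test does not control the probability of a type II error (failing to reject the null that the IVs are valid when they are in fact invalid). With one overidentifying restriction, the test essentially checks whether fatheduc and motheduc lead to similar estimates, so it has little power if both are invalid for the same reason, such as inherited ability.
• The Wu-Hausman statistic is 2.793 (p-value 0.0954), so the null that educ is exogenous is rejected at the 10% level, but not at 5%. Type II error is again the main concern here: if educ is endogenous, the OLS estimates are inconsistent, so failing to reject exogeneity would not be enough to justify OLS. Combining these two considerations, we conclude that OLS should not be used in this case.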
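The Sargan statistic can be computed by hand as n times the R^2 from regressing the 2SLS residuals on all exogenous variables, compared with a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions (2 instruments minus 1 endogenous regressor = 1, as in the df1 column above). The sketch below does this in base R on hypothetical simulated data in which both instruments are valid by construction. It is the textbook n R^2 form; we assume, without having checked AER's source, that it matches the statistic reported by summary(m_iv,diagnostics=TRUE).

```r
# Hypothetical simulated data: both instruments are valid by construction.
set.seed(5)
n <- 500
fatheduc <- rnorm(n, mean = 10, sd = 3)
motheduc <- rnorm(n, mean = 10, sd = 3)
exper <- runif(n, min = 0, max = 30)
expersq <- exper^2
ability <- rnorm(n)
educ <- 6 + 0.2 * fatheduc + 0.15 * motheduc + ability + rnorm(n)
lwage <- 0.1 * educ + 0.04 * exper - 0.0008 * expersq + 0.2 * ability + rnorm(n, sd = 0.5)

# 2SLS coefficients and residuals (residuals use the actual educ)
X <- cbind(1, educ, exper, expersq)
Z <- cbind(1, fatheduc, motheduc, exper, expersq)
PZX <- Z %*% solve(crossprod(Z), crossprod(Z, X))
b <- solve(crossprod(PZX, X), crossprod(PZX, lwage))
uhat <- drop(lwage - X %*% b)

# Sargan: n * R^2 from regressing the 2SLS residuals on all exogenous variables
aux <- lm(uhat ~ fatheduc + motheduc + exper + expersq)
sargan <- n * summary(aux)$r.squared
sargan_df <- 2 - 1   # excluded instruments minus endogenous regressors
p_value <- pchisq(sargan, df = sargan_df, lower.tail = FALSE)
print(c(statistic = sargan, df = sargan_df, p_value = p_value))
```

A large p-value here only says the two instruments agree with each other; as noted above, it cannot rule out that both are invalid in the same direction.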
Part (g): Control function approach:
mh=lm(formula = lwage ~ educ + exper + expersq + fstg$residuals)
summary(mh)
##
## Call:
## lm(formula = lwage ~ educ + exper + expersq + fstg$residuals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.03743 -0.30775 0.04191 0.40361 2.33303
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0481003 0.3945753 0.122 0.903033
## educ 0.0613966 0.0309849 1.981 0.048182 *
## exper 0.0441704 0.0132394 3.336 0.000924 ***
## expersq -0.0008990 0.0003959 -2.271 0.023672 *
## fstg$residuals 0.0581666 0.0348073 1.671 0.095440 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.665 on 423 degrees of freedom
## Multiple R-squared: 0.1624, Adjusted R-squared: 0.1544
## F-statistic: 20.5 on 4 and 423 DF, p-value: 1.888e-15
linearHypothesis(mh,c("fstg$residuals=0"))
## Linear hypothesis test
##
## Hypothesis:
## fstg$residuals = 0
##
## Model 1: restricted model
## Model 2: lwage ~ educ + exper + expersq + fstg$residuals
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 424 188.31
## 2 423 187.07 1 1.235 2.7926 0.09544 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
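• The F statistic (2.7926, p-value 0.09544) is the same as the Wu-Hausman statistic in part (f); it is also the square of the t statistic on fstg$residuals (1.671^2 ≈ 2.79).
• The estimates of the intercept and the coefficients on educ, exper and expersq are identical to the 2SLS estimates in part (c). They are known as control function estimates: the first-stage residuals act as a control function that absorbs the endogenous part of educ. In the linear IV model the 2SLS and control function point estimates always coincide, because educ equals its first-stage fitted value plus the first-stage residual, and those residuals are orthogonal to all the exogenous variables. The conventional standard errors differ slightly (e.g. 0.0310 vs 0.0314 for educ), because the lm standard errors do not account for fstg$residuals being estimated.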
cbind(TwoSLS=m_iv$coefficients,ContrFunc=mh$coefficients[1:4])
## TwoSLS ContrFunc
## (Intercept) 0.0481003171 0.0481003171
## educ 0.0613966277 0.0613966277
## exper 0.0441703940 0.0441703940
## expersq -0.0008989696 -0.0008989696
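The equivalence between 2SLS and the control function can also be checked outside the Mroz data. The base-R sketch below, on hypothetical simulated data, adds the first-stage residuals (called v_hat here, a name chosen for illustration) to the wage equation, compares the first four coefficients with 2SLS from the matrix formula, and confirms that the F statistic for v_hat = 0 is the square of its t statistic. The simulation design is an assumption made for illustration.

```r
# Hypothetical simulated data with an unobserved 'ability' confounder.
set.seed(6)
n <- 500
fatheduc <- rnorm(n, mean = 10, sd = 3)
motheduc <- rnorm(n, mean = 10, sd = 3)
exper <- runif(n, min = 0, max = 30)
expersq <- exper^2
ability <- rnorm(n)
educ <- 6 + 0.2 * fatheduc + 0.15 * motheduc + ability + rnorm(n)
lwage <- 0.1 * educ + 0.04 * exper - 0.0008 * expersq + 0.2 * ability + rnorm(n, sd = 0.5)

# First-stage residuals serve as the control function
v_hat <- residuals(lm(educ ~ fatheduc + motheduc + exper + expersq))

# Control function regression: structural equation plus v_hat
cf <- lm(lwage ~ educ + exper + expersq + v_hat)

# 2SLS via the matrix formula (X' P_Z X)^{-1} X' P_Z y
X <- cbind(1, educ, exper, expersq)
Z <- cbind(1, fatheduc, motheduc, exper, expersq)
PZX <- Z %*% solve(crossprod(Z), crossprod(Z, X))
b_2sls <- drop(solve(crossprod(PZX, X), crossprod(PZX, lwage)))

# Regression-based Hausman test: F for v_hat = 0 versus the squared t statistic
F_haus <- anova(lm(lwage ~ educ + exper + expersq), cf)$F[2]
t_vhat <- summary(cf)$coefficients["v_hat", "t value"]

print(cbind(TwoSLS = b_2sls, ContrFunc = coef(cf)[1:4]))
print(c(F = F_haus, t_squared = t_vhat^2))
```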