ECONOMETRICS I ECON GR5411
Lecture 20: Instrumental Variables III, GMM, and Starting MLE
by
Seyhan Erden, Columbia University MA in Economics
Instrumental Variables
Today we will discuss the J-statistic for overidentifying restrictions. We will conclude with a discussion of efficient IV estimation and the test of overidentifying restrictions when the errors are heteroskedastic, a situation in which the efficient IV estimator is known as the efficient generalized method of moments (GMM) estimator.
Two-stage Least Squares:
Here we will consider 2SLS again, from a different point of view.
If $Z$ contains more variables than $X$, i.e., when $l > k$, then $Z'X$ is $l \times k$ with rank $k$, which is less than $l$, so $Z'X$ cannot be inverted.
Hence $\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$ is not usable.
The crucial result was $\operatorname{plim} \frac{1}{n} Z'\varepsilon = 0$, i.e., every column of $Z$ is asymptotically uncorrelated with $\varepsilon$. This also means that every linear combination of the columns of $Z$ is uncorrelated with $\varepsilon$.
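To see this, note that for any fixed $l \times k$ matrix $A$, the $k$ linear combinations $ZA$ of the columns of $Z$ satisfy
$\operatorname{plim} \frac{1}{n} (ZA)'\varepsilon = A' \operatorname{plim} \frac{1}{n} Z'\varepsilon = A' \cdot 0 = 0$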
Two-stage Least Squares:
One can choose $k$ variables among the $l$ in $Z$ (discarding the information contained in the extra $l - k$ columns), but this is inefficient.
A better possibility, one that uses all the instruments, is the projection of the columns of $X$ onto the column space of $Z$:
$\hat{X} = Z(Z'Z)^{-1} Z'X$
Two-stage Least Squares:
$\hat{X} = Z(Z'Z)^{-1} Z'X$
The instruments are linear combinations of the variables (columns) of $Z$. With this choice of instrumental variables we have
$\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1} \hat{X}'y$
Two-stage Least Squares:
Thus the stages of the 2SLS estimator are:
• First stage: let $\hat{X}$ be the $k$ linear combinations of $Z$,
$\hat{X} = Z(Z'Z)^{-1} Z'X$
• Second stage: $\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1} \hat{X}'y$
Substituting for $\hat{X}$,
$\hat{\beta}_{2SLS} = \left[ \left( Z(Z'Z)^{-1}Z'X \right)' Z(Z'Z)^{-1}Z'X \right]^{-1} \left( Z(Z'Z)^{-1}Z'X \right)' y$
$= \left[ X'Z(Z'Z)^{-1} Z'Z (Z'Z)^{-1} Z'X \right]^{-1} X'Z(Z'Z)^{-1}Z'y = \left[ X'Z(Z'Z)^{-1}Z'X \right]^{-1} X'Z(Z'Z)^{-1}Z'y$
Two-stage Least Squares:
$\hat{\beta}_{2SLS} = \left[ X'Z(Z'Z)^{-1}Z'X \right]^{-1} X'Z(Z'Z)^{-1}Z'y = \left[ X'P_Z X \right]^{-1} X'P_Z y = (\hat{X}'X)^{-1} \hat{X}'y$
It is also true that
$\hat{\beta}_{2SLS} = \left[ X'P_Z P_Z X \right]^{-1} X'P_Z y = (\hat{X}'\hat{X})^{-1} \hat{X}'y$
Two-stage Least Squares:
Since $P_Z$ is a projection matrix (symmetric and idempotent) and $\hat{X} = P_Z X$, this result says that, when (and only when) $\hat{X}$, the projection of $X$ onto the column space of the instruments, is used as the regressor matrix, the 2SLS estimator is computed by a least squares regression of $y$ on $\hat{X}$.
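The equality of the two forms above follows in one line from the symmetry and idempotency of $P_Z$:
$\hat{X}'\hat{X} = X' P_Z' P_Z X = X' P_Z X = \hat{X}'X$
so $(\hat{X}'\hat{X})^{-1}\hat{X}'y = (\hat{X}'X)^{-1}\hat{X}'y$.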
Two-stage Least Squares:
This result shows that $\hat{\beta}_{IV}$ can be computed in two stages as given above: first compute $\hat{X}$, then run the least squares regression of $y$ on $\hat{X}$.
For this reason, this IV estimator is called the 2SLS estimator.
However, be careful!...
Standard Errors of the 2SLS Estimator:
However, be careful, because in the computation of the asymptotic covariance matrix, $\hat{\sigma}^2$ should not be based on $\hat{X}$; the quantity
$\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{x}_i' \hat{\beta}_{2SLS} \right)^2 \neq \hat{\sigma}^2_{2SLS}$
is not a consistent estimator of $\sigma^2$.
Recall, however,
$s^2 = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i' \hat{\beta}_{IV} \right)^2 \xrightarrow{p} \sigma^2$
The appropriate calculation is built into modern software (e.g., Stata's ivreg/ivregress command calculates standard errors correctly under 2SLS).
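A minimal sketch on simulated data (all variable names here are hypothetical) illustrating that the manual second stage reproduces the 2SLS coefficients but not the correct standard errors:

* Simulate a simple endogenous-regressor design
clear
set seed 12345
set obs 1000
generate z = rnormal()                     // instrument
generate v = rnormal()                     // first-stage error
generate x = z + v                         // x depends on z and on v
generate y = 1 + 2*x + 0.5*v + rnormal()   // v also enters the error, so x is endogenous

* First stage: project x on z and save the fitted values
regress x z
predict xhat, xb

* Second stage by hand: 2SLS coefficient, but the WRONG standard errors,
* because the residual variance is computed from xhat rather than x
regress y xhat

* Built-in 2SLS: same coefficient, correct standard errors
ivregress 2sls y (x = z)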
The class of IV estimators that use linear combinations of 𝑍:
The class of IV estimators that use linear combinations of $Z$ as instruments can be generated in two equivalent ways.
Both start with the same moment equations. Under the exogeneity assumption, the errors $\varepsilon = Y - X\beta$ are uncorrelated with the exogenous variables; that is, at the true value of $\beta$, $E(Z_i \varepsilon_i) = 0$, which implies that
$E\left[ (Y - X\beta)'Z \right] = 0.$
The class of IV estimators that use linear combinations of 𝑍:
Recall that when there is exact identification (the number of IVs equals the number of endogenous regressors, $l = k$), the value of $b$ that solves $(Y - Xb)'Z = 0$ is the IV estimator of $\beta$.
However, when there is overidentification ($l > k$), the equations in the system cannot be simultaneously satisfied by the same value of $b$ because of sampling variation; there are more equations than unknowns, and, in general, this system does not have a unique solution.
Linear Combinations of Z’s:
One approach to the problem of estimating $\beta$ when there is overidentification is to trade off the desire to satisfy each equation by minimizing a quadratic form involving all the equations.
Specifically, let $A$ be a symmetric positive definite matrix, and let $\hat{\beta}^{A}_{IV}$ denote the estimator that minimizes
$\min_b \; S = (Y - Xb)' Z A Z' (Y - Xb)$
Expanding,
$S = Y'ZAZ'Y - Y'ZAZ'Xb - b'X'ZAZ'Y + b'X'ZAZ'Xb$
Linear Combinations of Z’s:
$S = Y'ZAZ'Y - Y'ZAZ'Xb - b'X'ZAZ'Y + b'X'ZAZ'Xb$
$\frac{\partial S}{\partial b} = -2X'ZAZ'Y + 2X'ZAZ'Xb$
Setting this derivative to zero gives the IV estimator based on the weight matrix $A$:
$\hat{\beta}^{A}_{IV} = (X'ZAZ'X)^{-1} X'ZAZ'Y$
Comparing this with the 2SLS estimator shows that the 2SLS estimator is the IV estimator with
$A = (Z'Z)^{-1}$
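As a quick check, substituting $A = (Z'Z)^{-1}$ into $\hat{\beta}^{A}_{IV}$ gives
$\hat{\beta}^{A}_{IV} = \left[ X'Z(Z'Z)^{-1}Z'X \right]^{-1} X'Z(Z'Z)^{-1}Z'Y = (X'P_Z X)^{-1} X'P_Z Y = \hat{\beta}_{2SLS}$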
Hausman (Endogeneity) Test:
• IV is consistent both when $\operatorname{plim} \frac{X'\varepsilon}{n} = 0$ and when it is not zero.
• OLS is consistent only when $\operatorname{plim} \frac{X'\varepsilon}{n} = 0$, but it is efficient when that condition is true.
• Under the assumption that $E(x_i \varepsilon_i) = 0$, the difference between the two estimators should be small.
• The Hausman test statistic is
$H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \left[ \widehat{Avar}(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) \right]^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS})$
Hausman (Endogeneity) Test:
Hausman showed that the covariance between an efficient estimator $\hat{\beta}_E$ of a parameter vector $\beta$ and its difference from an inefficient estimator $\hat{\beta}_I$ of the same parameter vector, $\hat{\beta}_E - \hat{\beta}_I$, is zero. Thus,
$Cov(\hat{\beta}_E, \hat{\beta}_E - \hat{\beta}_I) = Var(\hat{\beta}_E) - Cov(\hat{\beta}_E, \hat{\beta}_I) = 0$
or
$Var(\hat{\beta}_E) = Cov(\hat{\beta}_E, \hat{\beta}_I)$
Let us show this for $\hat{\beta}_{OLS}$ and $\hat{\beta}_{IV}$:
$\hat{\beta}_{OLS} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon)$
so $\hat{\beta}_{OLS} - \beta = (X'X)^{-1}X'\varepsilon$
Similarly,
$\hat{\beta}_{IV} = (Z'X)^{-1}Z'Y = (Z'X)^{-1}Z'(X\beta + \varepsilon)$
so $\hat{\beta}_{IV} - \beta = (Z'X)^{-1}Z'\varepsilon$
Hausman (Endogeneity) Test:
Now,
$Cov(\hat{\beta}_{OLS}, \hat{\beta}_{OLS} - \hat{\beta}_{IV}) = Var(\hat{\beta}_{OLS}) - Cov(\hat{\beta}_{OLS}, \hat{\beta}_{IV})$
$= E\left[ (\hat{\beta}_{OLS} - \beta)(\hat{\beta}_{OLS} - \beta)' \right] - E\left[ (\hat{\beta}_{IV} - \beta)(\hat{\beta}_{OLS} - \beta)' \right]$
$= E\left[ (X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \right] - E\left[ (Z'X)^{-1}Z'\varepsilon\varepsilon'X(X'X)^{-1} \right] = \sigma^2 (X'X)^{-1} - \sigma^2 (X'X)^{-1} = 0$
Then we can write
$Var(\hat{\beta}_{OLS}) = Cov(\hat{\beta}_{OLS}, \hat{\beta}_{IV})$
Also,
$Avar(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) = Avar(\hat{\beta}_{IV}) + Avar(\hat{\beta}_{OLS}) - 2\,ACov(\hat{\beta}_{OLS}, \hat{\beta}_{IV})$
Since $Avar(\hat{\beta}_{OLS}) = ACov(\hat{\beta}_{OLS}, \hat{\beta}_{IV})$, we can write the following:
$Avar(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) = Avar(\hat{\beta}_{IV}) - Avar(\hat{\beta}_{OLS})$
Hausman (Endogeneity) Test:
Thus,
$Avar(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) = Avar(\hat{\beta}_{IV}) - Avar(\hat{\beta}_{OLS})$
Inserting this useful result into the Wald statistic, we have
$H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \left[ \widehat{Avar}(\hat{\beta}_{IV}) - \widehat{Avar}(\hat{\beta}_{OLS}) \right]^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS})$
Now, to find a useful expression for $Avar(\hat{\beta}_{IV} - \hat{\beta}_{OLS})$:
$Avar(\hat{\beta}_{IV}) - Avar(\hat{\beta}_{OLS}) = \frac{\sigma^2}{n} \operatorname{plim} \left( \frac{X'Z(Z'Z)^{-1}Z'X}{n} \right)^{-1} - \frac{\sigma^2}{n} \operatorname{plim} \left( \frac{X'X}{n} \right)^{-1}$
$= \frac{\sigma^2}{n} \operatorname{plim} n \left[ \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} - (X'X)^{-1} \right] = \frac{\sigma^2}{n} \operatorname{plim} n \left[ (X'P_Z X)^{-1} - (X'X)^{-1} \right]$
An estimator of this expression is
$\widehat{Avar}(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) = s^2 \left[ (\hat{X}'\hat{X})^{-1} - (X'X)^{-1} \right]$
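In Stata, a sketch of this classic (non-robust) contrast, assuming Card's dataset, with the variable names used in the output later in these slides, is in memory; the sigmamore option bases both covariance matrices on the same $s^2$, as in the formula above:

* IV: less efficient, consistent under H0 and H1
ivregress 2sls lwage (educ = nearc4) exper expersq_scaled black south urban
estimates store iv
* OLS: efficient under H0, inconsistent under H1
regress lwage educ exper expersq_scaled black south urban
estimates store ols
* Hausman contrast H = d'[Avar(b_IV) - Avar(b_OLS)]^(-1) d
hausman iv ols, sigmamore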
The Logic of the Hausman Test for Endogeneity:
What is the null hypothesis for the Hausman test here?
$H_0$: we have two consistent estimators of $\beta$ ($\hat{\beta}_{OLS}$ and $\hat{\beta}_{IV}$, i.e., both OLS and IV are consistent)
$H_1$: only $\hat{\beta}_{IV}$ is consistent
The suggestion, then, is to examine $\hat{\beta}_{IV} - \hat{\beta}_{OLS}$. Under the null hypothesis $\operatorname{plim}(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) = 0$, whereas under the alternative $\operatorname{plim}(\hat{\beta}_{IV} - \hat{\beta}_{OLS}) \neq 0$.
We will test this with a Wald statistic,
$H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \left[ \widehat{Avar}(\hat{\beta}_{IV}) - \widehat{Avar}(\hat{\beta}_{OLS}) \right]^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS})$
Now, to find a useful expression, we use the third rule of distribution theory again:
$H = \frac{ (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \left[ (\hat{X}'\hat{X})^{-1} - (X'X)^{-1} \right]^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS}) }{ s^2 }$
The Logic of the Hausman Test for Endogeneity:
What is the null hypothesis for the Hausman test here?
$H_0$: we have two consistent estimators of $\beta$ (both OLS and IV are consistent)
$H_1$: only $\hat{\beta}_{IV}$ is consistent (the OLS estimator is not consistent)
The suggestion, then, is to examine $d = \hat{\beta}_{IV} - \hat{\beta}_{OLS}$:
$H_0$: $\operatorname{plim}\, d = 0$
$H_1$: $\operatorname{plim}\, d \neq 0$
The Logic of the Hausman Test for Endogeneity:
OLS fails under endogeneity (correlation between the regressors and the errors). How do we test for the presence of endogeneity? The Hausman test.
So the null hypothesis is $Cov(X_i, u_i) = 0$. There are several forms of this test.
$H = \frac{ (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \left[ (\hat{X}'\hat{X})^{-1} - (X'X)^{-1} \right]^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS}) }{ s^2 }$
However, the ordinary inverse of $(\hat{X}'\hat{X})^{-1} - (X'X)^{-1}$ will not exist in general; we need a generalized inverse.
In the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$, we wish to know whether $X_i$ is correlated with $u_i$. Let $Z_1$ and $Z_2$ be instrumental variables for $X$. Then carry out the following steps:
1. Estimate the first-stage model $X = \gamma_0 + \gamma_1 Z_1 + \gamma_2 Z_2 + \varepsilon$ by OLS, including all IVs and all exogenous variables on the right-hand side. Obtain the residuals $\hat{\varepsilon} = X - \hat{\gamma}_0 - \hat{\gamma}_1 Z_1 - \hat{\gamma}_2 Z_2$. If more than one explanatory variable is being tested for endogeneity, repeat this estimation for each one.
2. Include the residual computed in step 1 as an explanatory variable in the original regression, $Y = \beta_0 + \beta_1 X + \delta \hat{\varepsilon} + e$. Estimate this "artificial regression" by OLS, and employ the usual t-test for the hypothesis of significance: $H_0$: $\delta = 0$ (no correlation between $X$ and $e$); $H_1$: $\delta \neq 0$ (correlation between $X$ and $e$).
3. If more than one variable is being tested for endogeneity, the test will be an F-test of the joint significance of the coefficients on the included residuals, as in the sketch below.
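A sketch of steps 1-3 in Stata, assuming Card's dataset is in memory; this is the regression-based version of the estat endogenous output reported later in these slides:

* Step 1: first stage with all IVs and all exogenous regressors
regress educ nearc4 nearc2 exper expersq_scaled black south urban
predict ehat, resid
* Step 2: artificial regression, adding the first-stage residual
regress lwage educ exper expersq_scaled black south urban ehat
* Step 3: test the residual's coefficient
test ehat          // H0: delta = 0, i.e., educ is exogenous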
The Logic of the Hausman Test for Endogeneity:
Let us explore how this test works. Consider the simple regression model:
$Y = \beta_0 + \beta_1 X + e \quad (1)$
If $X$ is correlated with the error, then $X$ is endogenous and the OLS estimator $\hat{\beta}_1$ is biased and inconsistent.
An instrumental variable $Z$ must be correlated with $X$ but uncorrelated with $e$ in order to be valid. A correlation between $X$ and $Z$ implies
$X = \gamma_0 + \gamma_1 Z + \varepsilon \quad (2)$
This is the first stage. Hence $E(X \mid Z) = \gamma_0 + \gamma_1 Z$.
We can divide $X$ into two parts, a systematic part and a random part:
$X = E(X \mid Z) + \varepsilon \quad (3)$
Then we can write eq. (1) as
$Y = \beta_0 + \beta_1 \left[ E(X \mid Z) + \varepsilon \right] + e = \beta_0 + \beta_1 E(X \mid Z) + \beta_1 \varepsilon + e$
The Logic of the Hausman Test for Endogeneity:
We do not know $\gamma_0$ and $\gamma_1$; however, we can consistently estimate the first stage by OLS:
$\hat{X} = \widehat{E(X \mid Z)} = \hat{\gamma}_0 + \hat{\gamma}_1 Z$
and the residuals
$\hat{\varepsilon} = X - \hat{X}$
Rearranging these, we obtain an estimated analog of (3):
$X = \hat{X} + \hat{\varepsilon} \quad (4)$
Substitute (4) into (1):
$Y = \beta_0 + \beta_1 (\hat{X} + \hat{\varepsilon}) + e$
$Y = \beta_0 + \beta_1 \hat{X} + \beta_1 \hat{\varepsilon} + e$
The Logic of the Hausman Test for Endogeneity:
To avoid $\beta_1$ appearing twice, let the coefficient of $\hat{\varepsilon}$ be denoted by $\gamma$:
$Y = \beta_0 + \beta_1 \hat{X} + \gamma \hat{\varepsilon} + e \quad (5)$
If we omit $\hat{\varepsilon}$ from (5), we have the second stage:
$Y = \beta_0 + \beta_1 \hat{X} + e \quad (6)$
The least squares estimates of $\beta_0$ and $\beta_1$ in (6) are the IV/2SLS estimates. Recall that if we omit from a regression a variable that is uncorrelated with the included variables, there is no omitted variable bias (OVB). This holds for (6) because $\hat{X}$ and $\hat{\varepsilon}$ are uncorrelated. Hence the least squares estimates of $\beta_0$ and $\beta_1$ are consistent.
The Logic of the Hausman Test for Endogeneity:
What about $\gamma$?
If $X$ is exogenous, and hence $\varepsilon$ and $e$ are uncorrelated, then the LS estimator of $\gamma$ will also converge in large samples to $\beta_1$. However, if $X$ is endogenous, then the LS estimator of $\gamma$ will not converge to $\beta_1$ in large samples, because $\hat{\varepsilon}$, like $\varepsilon$, is correlated with the error term $e$.
This observation makes it possible to test whether $X$ is exogenous by testing the equality of the estimates of $\beta_1$ and $\gamma$. If we reject the null hypothesis $H_0$: $\beta_1 = \gamma$, then we reject the exogeneity of $X$.
Carrying out the test is made simpler by playing a trick on (5). Add and subtract $\beta_1 \hat{\varepsilon}$ on the right-hand side to obtain
$Y = \beta_0 + \beta_1 \hat{X} + \gamma \hat{\varepsilon} + e + \beta_1 \hat{\varepsilon} - \beta_1 \hat{\varepsilon}$
$= \beta_0 + \beta_1 (\hat{X} + \hat{\varepsilon}) + (\gamma - \beta_1) \hat{\varepsilon} + e$
$= \beta_0 + \beta_1 (\hat{X} + \hat{\varepsilon}) + \delta \hat{\varepsilon} + e$
Since $\hat{X} + \hat{\varepsilon} = X$, this is exactly the artificial regression of step 2. Thus testing $H_0$: $\delta = 0$ is the same as testing $H_0$: $\beta_1 = \gamma$.
Overidentifying test (Exogeneity test), J-Test:
The $J$-statistic tests the null hypothesis that all the overidentifying restrictions hold against the alternative that some or all of them do not hold.
Take the model $y = X\beta + \varepsilon$, where part or all of $X$ is endogenous, so we decide to use instrumental variables $Z$. We run 2SLS and obtain the residuals $\hat{\varepsilon} = y - X\hat{\beta}_{2SLS}$. Then regress $\hat{\varepsilon}$ on $Z$:
$\hat{\varepsilon} = Z\alpha + u$
The null hypothesis is
$H_0: \alpha = q$
where $\alpha$ is the $l \times 1$ vector of coefficients from the above regression and $q$ is an $l \times 1$ vector of zeros. The $F$ statistic from this regression, multiplied by $l$, has an asymptotic chi-squared distribution:
$l \times F \xrightarrow{d} \chi^2_{l-k}$
Overidentifying test (Exogeneity test), J-Test:
The idea of this test: if all the overidentifying restrictions hold, that is, if the IVs are really exogenous, then the errors will be uncorrelated with the instruments, and thus a regression of the errors on the IVs will have population regression coefficients $\alpha$ that are all equal to zero.
In practice, however, the errors are not observed, but they can be estimated by the 2SLS residuals $\hat{\varepsilon}$, so a regression of $\hat{\varepsilon}$ on $Z$ will yield statistically significant coefficients if the IVs are not exogenous.
Overidentifying test (Exogeneity test), J-Test:
Accordingly, the 2SLS $J$-test is the homoskedasticity-only $F$ statistic testing the hypothesis that the coefficients on the $Z$'s are all zero, in the regression of $\hat{\varepsilon}$ on the $Z$'s, multiplied by $l$ (the number of IVs) so that the statistic is in its asymptotic chi-squared form:
$J = l \times F = \frac{SSR_r - SSR_u}{SSR_u / (n - l)} = \frac{\hat{\varepsilon}' P_Z \hat{\varepsilon}}{\hat{\varepsilon}' M_Z \hat{\varepsilon} / (n - l)}$
Under $H_0$: $E(z_i \varepsilon_i) = 0$,
$J \xrightarrow{d} \chi^2_{l-k}$
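Stata can compute this test directly after an overidentified 2SLS fit. A sketch, assuming Card's dataset is in memory (the manual version appears on the slides that follow):

ivregress 2sls lwage (educ = nearc4 nearc2) exper expersq_scaled black south urban
estat overid        // Sargan and Basmann chi2 tests; df = l - k = 1 here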
Proof of $J \xrightarrow{d} \chi^2_{l-k}$:
Recall that under $H_0$: $E(z_i \varepsilon_i) = 0$, $\frac{1}{\sqrt{n}} Z'\varepsilon$ has mean zero and the CLT applies, so
$\frac{1}{\sqrt{n}} Z'\varepsilon \xrightarrow{d} N(0, \sigma^2 Q_{ZZ})$
In addition:
$\frac{1}{n} Z'Z \xrightarrow{p} Q_{ZZ}$ and $\frac{1}{n} Z'X \xrightarrow{p} Q_{ZX}$. Thus,
$(Z'Z)^{-1/2} Z'\varepsilon = (Z'Z/n)^{-1/2} \left( Z'\varepsilon / \sqrt{n} \right) \xrightarrow{d} \sigma z$
where $z \sim N(0_l, I_l)$.
Proof of $J \xrightarrow{d} \chi^2_{l-k}$ (continued):
In addition,
$(Z'Z)^{-1/2} Z'X / \sqrt{n} = (Z'Z/n)^{-1/2} (Z'X/n) \xrightarrow{p} Q_{ZZ}^{-1/2} Q_{ZX}$
Combining these results,
$\frac{\hat{\varepsilon}' P_Z \hat{\varepsilon}}{\sigma^2} \xrightarrow{d} z' M_{Q_{ZZ}^{-1/2} Q_{ZX}} z$
where $M_{Q_{ZZ}^{-1/2} Q_{ZX}}$ is the symmetric idempotent residual-maker matrix associated with $Q_{ZZ}^{-1/2} Q_{ZX}$, and
$\frac{\hat{\varepsilon}' M_Z \hat{\varepsilon}}{n - l} \xrightarrow{p} \sigma^2$
Therefore
$J = \frac{\hat{\varepsilon}' P_Z \hat{\varepsilon}}{\hat{\varepsilon}' M_Z \hat{\varepsilon} / (n - l)} \xrightarrow{d} \chi^2_{l-k}$
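The degrees of freedom come from the rank of the idempotent matrix: writing $M_{Q_{ZZ}^{-1/2} Q_{ZX}} = I_l - Q_{ZZ}^{-1/2} Q_{ZX} (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1} Q_{XZ} Q_{ZZ}^{-1/2}$, its rank equals its trace,
$\operatorname{tr}(I_l) - \operatorname{tr}\left[ (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1} Q_{XZ} Q_{ZZ}^{-1} Q_{ZX} \right] = l - \operatorname{tr}(I_k) = l - k$
so $z' M_{Q_{ZZ}^{-1/2} Q_{ZX}} z$ is a sum of $l - k$ squared independent standard normals, i.e., $\chi^2_{l-k}$.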
Back to College Proximity Example:
Recall that David Card (1995) suggested proximity to college as an IV for education.
Is education really endogenous? Let's test this! (Endogeneity test)
Is the IV relevant? If a potential student lives close to a college, this increases the likelihood that the student will attend college. Let's test this! (First-stage $F > 10$)
Is it exogenous? College proximity does not directly affect a student's skills or abilities, so it should not have a direct effect on her or his market wage. Let's test this! ($J$-test)
College Proximity OLS results
. reg lwage educ exper expersq_scaled black south urban, r

Linear regression                               Number of obs =      3,010
                                                F(6, 3003)    =     217.74
                                                Prob > F      =     0.0000
                                                R-squared     =     0.2905
                                                Root MSE      =     .37419

--------------------------------------------------------------------------------
               |               Robust
         lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
          educ |    .074009    .003642    20.32   0.000     .0668679    .0811501
         exper |   .0835958   .0067326    12.42   0.000     .0703948    .0967969
expersq_scaled |  -.2240885   .0318114    -7.04   0.000    -.2864627   -.1617142
         black |  -.1896315   .0174324   -10.88   0.000    -.2238123   -.1554508
         south |  -.1248615   .0153508    -8.13   0.000    -.1549606   -.0947625
         urban |    .161423   .0151751    10.64   0.000     .1316683    .1911776
         _cons |   4.733664   .0701577    67.47   0.000     4.596102    4.871226
--------------------------------------------------------------------------------
Card’s College Proximity Example 2SLS results and Endogeneity Tests:
. ivregress 2sls lwage (educ = nearc4 age) exper expersq_scaled black south urban, r
note: age dropped due to collinearity

Instrumental variables (2SLS) regression        Number of obs =      3,010
                                                Wald chi2(6)  =     792.07
                                                Prob > chi2   =     0.0000
                                                R-squared     =     0.2252
                                                Root MSE      =     .39058

--------------------------------------------------------------------------------
               |               Robust
         lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
          educ |   .1322888   .0485213     2.73   0.006     .0371888    .2273889
         exper |    .107498   .0211129     5.09   0.000     .0661175    .1488785
expersq_scaled |  -.2284072   .0346338    -6.59   0.000    -.2962883   -.1605261
         black |  -.1308019   .0514513    -2.54   0.011    -.2316445   -.0299592
         south |  -.1049005   .0228997    -4.58   0.000    -.1497831   -.0600179
         urban |   .1313237   .0297684     4.41   0.000     .0729787    .1896686
         _cons |   3.752781   .8167498     4.59   0.000     2.151981    5.353582
--------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq_scaled black south urban nearc4

. estat endogenous

Tests of endogeneity
Ho: variables are exogenous

Robust score chi2(1)        = 1.60908  (p = 0.2046)
Robust regression F(1,3002) = 1.60609  (p = 0.2051)
Acemoglu et al. Example
2SLS Results and Endogeneity Tests:
. ivregress 2sls loggdp (risk = mortnaval1), r

Instrumental variables (2SLS) regression        Number of obs =         53
                                                Wald chi2(1)  =       6.35
                                                Prob > chi2   =     0.0118
                                                R-squared     =          .
                                                Root MSE      =      1.115

------------------------------------------------------------------------------
             |               Robust
      loggdp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        risk |   1.071826   .4254925     2.52   0.012      .237876    1.905776
       _cons |    .938585   2.823759     0.33   0.740    -4.595882    6.473052
------------------------------------------------------------------------------
Instrumented:  risk
Instruments:   mortnaval1

. estat endogenous

Tests of endogeneity
Ho: variables are exogenous

Robust score chi2(1)      = 5.3865   (p = 0.0203)
Robust regression F(1,50) = 3.49527  (p = 0.0674)
Relevance Test:
We run the first stage $X = Z\Gamma + V$. The null hypothesis is
$H_0: \Gamma = q$
where $q$ is an $l \times 1$ vector of zeros.
The $F$ test statistic for this null hypothesis needs to be larger than 10 (the usual rule of thumb for weak instruments) for us to call the $Z$'s relevant.
We assume a single problematic endogenous $X$ here. If there is more than one, then the test needs to be done for each $X$.
Relevance Test for Card's Example:
. reg educ nearc4 nearc2 exper expersq_scaled black south urban, r

Linear regression                               Number of obs =      3,010
                                                F(7, 3002)    =     522.11
                                                Prob > F      =     0.0000
                                                R-squared     =     0.4748
                                                Root MSE      =     1.9421

--------------------------------------------------------------------------------
               |               Robust
          educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        nearc4 |   .3312388   .0805747     4.11   0.000     .1732516    .4892261
        nearc2 |   .1076585   .0731378     1.47   0.141    -.0357467    .2510637
         exper |   -.409533   .0319212   -12.83   0.000    -.4721226   -.3469435
expersq_scaled |     .06956   .1699638     0.41   0.682    -.2636974    .4028174
         black |   -1.01298   .0877548   -11.54   0.000    -1.185046   -.8409146
         south |  -.2786568   .0787816    -3.54   0.000    -.4331282   -.1241855
         urban |   .3886608   .0856653     4.54   0.000     .2206922    .5566295
         _cons |   16.62244   .1494693   111.21   0.000     16.32936    16.91551
--------------------------------------------------------------------------------

. test nearc4 nearc2

 ( 1)  nearc4 = 0
 ( 2)  nearc2 = 0

       F(  2,  3002) =    9.72
            Prob > F =  0.0001

Since 9.72 < 10, NO! By the rule of thumb, nearc4 and nearc2 are not relevant (they are weak instruments).
J-Test (Exogeneity) for Card's Example:
. ivreg lwage (educ = nearc4 nearc2) exper expersq_scaled black south urban, r

Instrumental variables (2SLS) regression        Number of obs =      3,010
                                                F(6, 3003)    =     119.78
                                                Prob > F      =     0.0000
                                                R-squared     =     0.1455
                                                Root MSE      =     .41065

--------------------------------------------------------------------------------
               |               Robust
         lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
          educ |   .1608487   .0485705     3.31   0.001     .0656139    .2560835
         exper |   .1192112   .0213279     5.59   0.000     .0773923      .16103
expersq_scaled |  -.2305236    .036906    -6.25   0.000    -.3028872     -.15816
         black |  -.1019726   .0520797    -1.96   0.050    -.2040881    .0001429
         south |  -.0951187   .0234332    -4.06   0.000    -.1410654     -.049172
         urban |   .1165736   .0302929     3.85   0.000     .0571767    .1759705
         _cons |   3.272102   .8178286     4.00   0.000     1.668541    4.875663
--------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq_scaled black south urban nearc4 nearc2
--------------------------------------------------------------------------------

. predict eps_hat, resid
J-Test (Exogeneity) for Card's Example:
. reg eps_hat nearc4 nearc2 exper expersq_scaled black south urban, r

Linear regression                               Number of obs =      3,010
                                                F(7, 3002)    =       0.38
                                                Prob > F      =     0.9138
                                                R-squared     =     0.0009
                                                Root MSE      =     .41054

--------------------------------------------------------------------------------
               |               Robust
       eps_hat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        nearc4 |  -.0109657   .0172524    -0.64   0.525    -.0447934    .0228621
        nearc2 |    .023575   .0155222     1.52   0.129    -.0068603    .0540103
         exper |   .0001006   .0072761     0.01   0.989     -.014166    .0143672
expersq_scaled |  -.0007784   .0366433    -0.02   0.983    -.0726268    .0710701
         black |  -.0015923   .0185915    -0.09   0.932    -.0380457     .034861
         south |   .0013469   .0167946     0.08   0.936    -.0315833     .034277
         urban |  -.0001166   .0178718    -0.01   0.995    -.0351588    .0349256
         _cons |  -.0031499   .0364562    -0.09   0.931    -.0746315    .0683318
--------------------------------------------------------------------------------

. test nearc4 nearc2

 ( 1)  nearc4 = 0
 ( 2)  nearc2 = 0

       F(  2,  3002) =    1.33
            Prob > F =  0.2653

So $J = l \times F = 2 \times 1.33 = 2.66$, to be compared with the chi-squared table with $l - k = 2 - 1 = 1$ degree of freedom. The 10% critical value of $\chi^2_1$ is 2.71 > 2.66, so we DO NOT REJECT $H_0$ at the 1%, 5%, or 10% level: the $Z$'s are exogenous.
Generalized Method of Moments
We will conclude with a discussion of efficient IV estimation and the test of overidentifying restrictions when the errors are heteroskedastic, a situation in which the efficient IV estimator is known as the efficient generalized method of moments (GMM) estimator.
GMM Estimation in Linear Models:
If the errors are heteroskedastic, then the 2SLS estimator is NOT efficient.
Under heteroskedasticity, the GMM estimator is efficient.
In addition, if the errors are heteroskedastic, then the $J$-statistic as defined above no longer has a chi-squared distribution. However, an alternative formulation of the $J$-statistic, constructed using the efficient GMM estimator, does have a chi-squared distribution with $l - k$ degrees of freedom.
GMM Estimation:
GMM estimation is a general method for estimating the parameters of linear or nonlinear models, in which the parameters are chosen to provide the best fit to multiple equations, each of which sets a sample moment to zero.
These equations are called moment conditions (typically they cannot all be satisfied simultaneously). The GMM estimator trades off the desire to satisfy each of the equations by minimizing a quadratic objective function.
In the linear IV regression model with exogenous variables Z, the class of GMM estimators consists of all the estimators that are solutions to the quadratic minimization problem given above by
$\min_b \; S = (Y - Xb)' Z A Z' (Y - Xb)$
$S = Y'ZAZ'Y - Y'ZAZ'Xb - b'X'ZAZ'Y + b'X'ZAZ'Xb$
where $A$ is a symmetric positive definite matrix. Letting $\hat{\beta}^{A}_{IV}$ denote the estimator that minimizes $S$, we found that
$\hat{\beta}^{A}_{IV} = (X'ZAZ'X)^{-1} X'ZAZ'Y$
Comparing this with the 2SLS estimator shows that the 2SLS estimator is the IV estimator with $A = (Z'Z)^{-1}$.
GMM Estimation under IV Model:
Thus the class of GMM estimators based on the full set of instruments $Z$, with different weight matrices $A$, is the same as the class of IV estimators in which the instruments are linear combinations of $Z$.
In the linear IV regression model, GMM is just another name for the class of estimators we have been studying, that is, estimators that solve
$\min_b \; S = (Y - Xb)' Z A Z' (Y - Xb)$
Asymptotically efficient GMM estimator
Recall that if the errors are homoskedastic, then
$\Omega = \sigma^2 Q_{ZZ}$
Then the variance of $\hat{\beta}^{A}_{IV}$ is
$V^{A}_{IV} = \sigma^2 (Q_{XZ} A Q_{ZX})^{-1} Q_{XZ} A Q_{ZZ} A Q_{ZX} (Q_{XZ} A Q_{ZX})^{-1}$
The asymptotically efficient weight matrix is obtained by setting $A = (Z'Z)^{-1}$. In large samples this is equivalent to using the weight $A = (\sigma^2 Q_{ZZ})^{-1} = \Omega^{-1}$.
Hence $V^{A}_{IV}$ becomes
$V_{2SLS} = \sigma^2 (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1}$
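The last step is a direct substitution, using $Q_{ZZ}^{-1} Q_{ZZ} Q_{ZZ}^{-1} = Q_{ZZ}^{-1}$:
$V^{A}_{IV} \big|_{A = Q_{ZZ}^{-1}} = \sigma^2 (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1} Q_{XZ} Q_{ZZ}^{-1} Q_{ZZ} Q_{ZZ}^{-1} Q_{ZX} (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1} = \sigma^2 (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1}$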
Asymptotically efficient GMM estimator
This representation of the 2SLS estimator suggests that, by analogy, the efficient IV estimator under heteroskedasticity can be obtained by setting
$A = \Omega^{-1}$
and solving
$\min_b \; S = (Y - Xb)' Z \Omega^{-1} Z' (Y - Xb)$
The solution is the efficient GMM estimator; it can be found by replacing $A$ with $\Omega^{-1}$ in the solution above for $\hat{\beta}^{A}_{IV}$:
$\hat{\beta}^{Eff}_{GMM} = (X'Z \Omega^{-1} Z'X)^{-1} X'Z \Omega^{-1} Z'Y$
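In practice $\Omega$ is unknown, so feasible two-step GMM replaces it with a heteroskedasticity-robust estimate built from first-step residuals. A sketch in Stata, assuming Card's dataset is in memory:

* Two-step efficient GMM with a heteroskedasticity-robust weight matrix
ivregress gmm lwage (educ = nearc4 nearc2) exper expersq_scaled black south urban, wmatrix(robust)
estat overid        // Hansen's J statistic, valid under heteroskedasticity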
Asymptotically efficient GMM estimator
The asymptotic distribution of $\hat{\beta}^{Eff}_{GMM}$ can be obtained by substituting $A = \Omega^{-1}$ into the earlier result; that is,
$\sqrt{n} \left( \hat{\beta}^{Eff}_{GMM} - \beta \right) \xrightarrow{d} N(0, V^{Eff}_{GMM})$
where
$V^{Eff}_{GMM} = (Q_{XZ} \Omega^{-1} Q_{ZX})^{-1}$
The proof of efficiency follows by showing that
$c' V^{A}_{IV} c \geq c' V^{Eff}_{GMM} c$
for all vectors $c$. Next we show the proof of efficiency of 2SLS; the proof of efficiency of GMM is the same.
Proof of efficiency of 2SLS estimator
First we will prove the efficiency of 2SLS under homoskedasticity; the efficiency proof for GMM is very similar, and we will do it later.
When the errors are homoskedastic, the difference between $V^{A}_{IV}$ and $V_{2SLS}$ is given by
$V^{A}_{IV} - V_{2SLS} = \sigma^2 (Q_{XZ} A Q_{ZX})^{-1} Q_{XZ} A Q_{ZZ} A Q_{ZX} (Q_{XZ} A Q_{ZX})^{-1} - \sigma^2 (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1}$
Proof of efficiency of 2SLS estimator
Then, using the square root matrix $F$ of $Q_{ZZ}$ defined on the next slide ($Q_{ZZ} = F'F$), we can rewrite the final expression above as
$V^{A}_{IV} - V_{2SLS} = \sigma^2 (Q_{XZ} A Q_{ZX})^{-1} Q_{XZ} A F' \left[ I - F^{-1\prime} Q_{ZX} \left( Q_{XZ} F^{-1} F^{-1\prime} Q_{ZX} \right)^{-1} Q_{XZ} F^{-1} \right] F A Q_{ZX} (Q_{XZ} A Q_{ZX})^{-1}$
where the second expression within the brackets uses $F' F^{-1\prime} = I$.
Thus,
$c' \left( V^{A}_{IV} - V_{2SLS} \right) c = \sigma^2 d' \left[ I - D(D'D)^{-1} D' \right] d$
where
$d = F A Q_{ZX} (Q_{XZ} A Q_{ZX})^{-1} c$
and
$D = F^{-1\prime} Q_{ZX}$
Now, $I - D(D'D)^{-1}D'$ is a symmetric idempotent matrix; hence it has eigenvalues that are 0 or 1, and $d' \left[ I - D(D'D)^{-1}D' \right] d \geq 0$. Thus $c' \left( V^{A}_{IV} - V_{2SLS} \right) c \geq 0$, proving that 2SLS is efficient under homoskedasticity.
Proof of efficiency of GMM estimator
Now, the proof of efficiency of the (infeasible) efficient GMM estimator. We must show that
$c' \left( V^{A}_{IV} - V^{Eff}_{GMM} \right) c \geq 0$
When the errors are homoskedastic, the difference between $V^{A}_{IV}$ and $V^{Eff}_{GMM}$ is given by
$V^{A}_{IV} - V^{Eff}_{GMM} = \sigma^2 (Q_{XZ} A Q_{ZX})^{-1} Q_{XZ} A Q_{ZZ} A Q_{ZX} (Q_{XZ} A Q_{ZX})^{-1} - \sigma^2 (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1}$
$= \sigma^2 (Q_{XZ} A Q_{ZX})^{-1} Q_{XZ} A \left[ Q_{ZZ} - Q_{ZX} (Q_{XZ} Q_{ZZ}^{-1} Q_{ZX})^{-1} Q_{XZ} \right] A Q_{ZX} (Q_{XZ} A Q_{ZX})^{-1}$
where the second term within the brackets follows from
$(Q_{XZ} A Q_{ZX})^{-1} Q_{XZ} A Q_{ZX} = I$
Let $F$ be the square root matrix of $Q_{ZZ}$, so that $Q_{ZZ} = F'F$ and $Q_{ZZ}^{-1} = F^{-1} F^{-1\prime}$.
Maximum Likelihood Estimation:
The probability density function (pdf) of a random variable $y$, conditioned on a set of parameters $\theta$, is denoted $f(y \mid \theta)$. This function identifies the data generating process that underlies an observed sample of data and, at the same time, provides a mathematical description of the data that the process will produce.
The joint density of $n$ independent and identically distributed (i.i.d.) observations from this process is the product of the individual densities,
$f(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = L(\theta \mid y)$
This joint density is the likelihood function.
Note that we write the joint density as a function of the data conditioned on the parameters, $f(y \mid \theta)$, whereas we write the likelihood function in reverse, as a function of the parameters conditioned on the data, $L(\theta \mid y)$. Why?
The reason? The two functions, the joint density and the likelihood function, are the same; but we write the likelihood as a function of the parameters conditioned on the data, $L(\theta \mid y)$, in order to emphasize our interest in the parameters and the information about them that is contained in the observed data.
It is simpler to work with the log of the likelihood function:
$\ln L(\theta \mid y) = \sum_{i=1}^{n} \ln f(y_i \mid \theta)$
Again, to emphasize our interest in the parameters given the observed data, we denote by $L(\theta \mid data) = L(\theta \mid y)$ the likelihood function, and by $\ln L(\theta \mid y)$ its logarithm, evaluated at $\theta$.
Sometimes this is denoted just $L(\theta)$ and $\ln L(\theta)$, respectively, or, when the context is clear, just $L$ and $\ln L$.
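As a simple worked example, suppose $y_i \sim$ i.i.d. $N(\mu, \sigma^2)$. Then
$\ln L(\mu, \sigma^2 \mid y) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2$
and setting the derivatives with respect to $\mu$ and $\sigma^2$ to zero gives the maximum likelihood estimators $\hat{\mu}_{ML} = \bar{y}$ and $\hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2$.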