DS-UA 201: Causal Inference: More Instrumental Variables
Parijat Dube
New York University Center for Data Science
August 15, 2022
Acknowledgement: Slides include material from DS-UA 201 Fall 2021 offered by .
Experiments with non-compliance
Last lecture we talked about randomized experiments where units do not comply with the treatment.
▶ E.g., randomizing students to attend a class: some don’t go.
We would like to estimate the treatment effect; however, unobserved confounders could be affecting both compliance and the outcome.
The setup:
▶ Zi is our binary instrument
▶ Di(z) is our binary treatment, which now has potential outcomes
▶ Yi(d,z) are potential outcomes, defined in terms of both treatment and instrument
The IV assumptions
We discussed the following assumptions:
▶ Randomization of instrument: Zi ⊥ (Di(z), Yi(d,z)) for all d, z
▶ Exclusion restriction: Yi(d,z) = Yi(d)
▶ First-stage relationship: E [Di (1) − Di (0)] ̸= 0
▶ Monotonicity: Di(1) ≥ Di(0) for all i
We showed that under the IV assumptions the Local Average Treatment Effect (LATE) is identified:
E[Yi(1,1) − Yi(0,0) | Di(1) > Di(0)] = (E[Yi | Zi = 1] − E[Yi | Zi = 0]) / (E[Di | Zi = 1] − E[Di | Zi = 0])
▶ This is the treatment effect on the compliers
▶ Likely not representative of the entire sample, but still useful
▶ How do we estimate the LATE?
▶ What are weak instruments?
Binary instrument
With a binary instrument, we can use the sample analog of the LATE:
τˆ = (Eˆ[Yi | Zi = 1] − Eˆ[Yi | Zi = 0]) / (Eˆ[Di | Zi = 1] − Eˆ[Di | Zi = 0]),
where
Eˆ[Di | Zi = z] = Σi Di 1(Zi = z) / Σi 1(Zi = z),
and Eˆ[Yi | Zi = z] is defined in an analogous way.
▶ The numerator is an estimate of the intent-to-treat effect:
Eˆ[Yi | Zi = 1] − Eˆ[Yi | Zi = 0]
▶ The denominator is an estimate of the first-stage effect (the effect of the instrument on the treatment):
Eˆ[Di | Zi = 1] − Eˆ[Di | Zi = 0]
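To make this concrete, here is a minimal simulation sketch in Python (all variable names and parameter values are invented for illustration) that computes the sample-analog estimator on data with non-compliance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical experiment with non-compliance; U is an unobserved
# confounder of treatment take-up and the outcome.
Z = rng.binomial(1, 0.5, n)                                    # randomized assignment (instrument)
U = rng.normal(size=n)
D = (0.5 * Z + 0.3 * U + rng.normal(size=n) > 0).astype(int)   # actual take-up
Y = 1.0 + 2.0 * D + 1.5 * U + rng.normal(size=n)               # true effect of D is 2.0

# Sample-analog (Wald) estimator: ITT effect divided by the first-stage effect.
itt = Y[Z == 1].mean() - Y[Z == 0].mean()
first_stage = D[Z == 1].mean() - D[Z == 0].mean()
late_hat = itt / first_stage
print(late_hat)   # should be close to 2.0
```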
The Wald estimator
The estimator we just saw can be generalized to settings where Zi can have many different values:
τˆ = Cov(Yi, Zi) / Cov(Di, Zi)
where Cov(Yi, Zi) is the sample covariance of Yi and Zi, and Cov(Di, Zi) is the sample covariance of Di and Zi.
The Wald Estimator as a ratio of regressions
Recall that when we only have one regressor, i.e. Yi = Xiβ + εi, the estimated regression coefficient can be written as:
βˆ = Cov(Yi, Xi) / Var(Xi)
Because of this, we can write the Wald estimator as a ratio of two regression coefficients:
τˆIV = [Cov(Yi, Zi) / Var(Zi)] / [Cov(Di, Zi) / Var(Zi)]
▶ The numerator is the regression coefficient from a regression of Y on Z.
▶ The denominator is the regression coefficient from a regression of D on Z.
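A compact numerical version of this equivalence (a sketch only; the function name is ours, and Y, D, Z are 1-D numpy arrays like those in the earlier simulation):

```python
import numpy as np

def wald_via_slopes(Y, D, Z):
    """Wald estimator written as a ratio of two bivariate OLS slopes:
    the reduced-form slope (Y on Z) over the first-stage slope (D on Z)."""
    slope_yz = np.cov(Y, Z)[0, 1] / np.var(Z, ddof=1)   # coefficient from Y ~ Z
    slope_dz = np.cov(D, Z)[0, 1] / np.var(Z, ddof=1)   # coefficient from D ~ Z
    return slope_yz / slope_dz                          # equals the Wald ratio
```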
Properties of the Wald estimator
The Wald estimator satisfies two of the three usual statistical properties we like:
▶ It is consistent for the LATE
▶ It is asymptotically normal
▶ There exists an analytical formula for the variance
▶ But the bootstrap also works
But it is biased in small samples.
▶ The bias depends inversely on Cov(Zi, Di): the smaller this covariance, the larger the bias.
Regardless, it only requires mild assumptions:
▶ No need to assume we know the form of Yi or Di.
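For instance, a minimal bootstrap sketch for the Wald estimator's standard error (function names and the number of replications are our own choices):

```python
import numpy as np

def wald(Y, D, Z):
    """Sample-analog Wald estimator for a binary instrument."""
    return (Y[Z == 1].mean() - Y[Z == 0].mean()) / \
           (D[Z == 1].mean() - D[Z == 0].mean())

def bootstrap_se(Y, D, Z, n_boot=2000, seed=1):
    """Nonparametric bootstrap standard error: resample units with
    replacement and recompute the Wald estimator each time."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        draws.append(wald(Y[idx], D[idx], Z[idx]))
    return np.std(draws, ddof=1)
```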
Regression and IV
Another way of thinking about IV is in terms of the linear model framework.
This has the useful function of allowing us to include additional covariates, Xi, in our instrumental variables estimator.
▶ But it comes at the cost of having to make stronger assumptions about Yi.
IV with constant effects
Let’s write down a model for Yi with treatment Di and an unobserved confounder Ui:
Yi = α + τDi + γUi + ηi
Important: We assume that Cov(Di, ηi) = 0 – in other words, if we knew Ui, we’d be able to estimate this directly and get τ.
▶ Same as assuming E [ηi |Di ] = 0, as we usually do in regression.
The instrumental variable
However, what we actually have is
Yi = α + τDi + εi
εi = γUi + ηi
Our assumption is violated! Since Ui is a confounder, Cov(Di, γUi + ηi) ≠ 0, and the bivariate regression of Yi on Di will not identify the causal effect.
▶ ηi is just statistical noise with E[ηi | Di] = 0
▶ But E[εi | Di] = γE[Ui | Di] ≠ 0.
The instrumental variable
Our model is:
Yi = α + τDi + εi
▶ Note that Zi does not appear directly in the equation for Yi (exclusion restriction).
Furthermore, if instrument Zi satisfies exogeneity and the exclusion restriction, we can say:
Cov(Zi, γUi + ηi) = 0
Unlike Di , Zi is not correlated with the error term εi .
We can use the properties of covariances to get an expression for τ in terms of Cov(Yi,Zi)
Cov(Yi, Zi) = Cov(α + τDi + γUi + ηi, Zi)
            = Cov(α, Zi) + Cov(τDi, Zi) + Cov(γUi + ηi, Zi)
            = 0 + τCov(Di, Zi) + 0
Therefore:
τ = Cov(Yi, Zi) / Cov(Di, Zi)
This is the LATE!
▶ And we can estimate it with the Wald estimator we saw before.
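A small self-contained simulation sketch of this argument (all parameter values are invented): with an unobserved confounder Ui, the naive ratio Cov(Yi, Di)/Var(Di) is biased, while Cov(Yi, Zi)/Cov(Di, Zi) recovers τ:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Constant-effects model with an unobserved confounder U (made-up numbers).
Z = rng.binomial(1, 0.5, n)                            # valid instrument
U = rng.normal(size=n)                                 # unobserved confounder
D = (0.8 * Z + 1.0 * U + rng.normal(size=n) > 0).astype(int)
Y = 0.5 + 2.0 * D + 3.0 * U + rng.normal(size=n)       # true tau = 2.0

naive = np.cov(Y, D)[0, 1] / np.var(D, ddof=1)         # biased: picks up gamma*U
iv = np.cov(Y, Z)[0, 1] / np.cov(D, Z)[0, 1]           # consistent by the derivation above
print(naive, iv)                                       # naive is well above 2.0; iv is near 2.0
```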
IV with covariates
What if we have covariates that we want to include in both the instrument and outcome regressions?
▶ The instrument is valid only conditional on Xi: (Yi(d,z), Di(z)) ⊥ Zi | Xi
▶ We want to improve the precision of our estimates (maybe Zi is a stronger instrument conditional on Xi ):
Cov(Zi,Di|Xi) ≥ Cov(Zi,Di)
▶ Or we have multiple instruments: Zi = [Ai , Bi ].
Regression offers a simple way to include covariates in IV estimators.
IV with covariates
We can generalize the IV estimator with two linear equations – one for the outcome Yi and the other for the treatment Di
Di = Xi′α + γZi + νi
Yi = Xi′β + τDi + εi
Xi goes into both equations (they’re neither instruments nor treatments).
Plug one into the other to get the “reduced form” equation:
Yi = Xi′β + τ[Xi′α + γZi + νi] + εi
   = Xi′β + τ[Xi′α + γZi] + [τνi + εi]
▶ So the LATE, τ, is the coefficient on the term (Xi′α + γZi)
▶ And [τνi + εi] is a combined statistical error term
IV with covariates
Our outcome model is:
Yi = Xi′β + τ[Xi′α + γZi] + [τνi + εi]
▶ We can estimate Xi′α + γZi, and
▶ Cov(Xi′α + γZi, τνi + εi | Xi) = 0 …
Then we can estimate τ by:
1. First estimating Xi′α + γZi ,
2. Then running a regression of Yi on Xi and Xi′α + γZi
Can we estimate Xi′α + γZi?
Back to our assumption on Di :
Di = Xi′α + γZi + νi,
with E[νi | Xi, Zi] = 0, therefore:
E[Di | Xi, Zi] = Xi′α + γZi
It’s the predicted value from the regression of Di on Xi and Zi!
▶ We can estimate it consistently by regressing Di on Xi and Zi to obtain coefficient estimates αˆ and γˆ
▶ And then obtaining predicted values: Dˆi = Xi′αˆ + γˆZi
Is Xi′α + γZi independent of the error term?
Recall our models:
Di = Xi′α + γZi + νi
Yi = Xi′β + τDi + εi
By randomization and the exclusion restriction: Cov(Zi, νi | Xi) = 0 and Cov(Zi, εi | Xi) = 0,
therefore:
Cov(Xi′α + γZi, τνi + εi | Xi) = Cov(Xi′α, τνi + εi | Xi) + Cov(γZi, τνi | Xi) + Cov(γZi, εi | Xi)
                               = 0 + τγCov(Zi, νi | Xi) + γCov(Zi, εi | Xi)
                               = 0.
Two-stage least squares
Stage 1: Regress treatment Di on instrument Zi and covariates Xi using an OLS estimator
E[Di | Xi, Zi] = Xi′α + γZi
Get the predicted values from the regression: Dˆi = Xi′αˆ + γˆZi.
Stage 2: Regress outcome Yi on the fitted values Dˆi and covariates Xi
E[Yi | Xi, Dˆi] = Xi′β + τDˆi
The coefficient on Dˆi is our estimate of the effect of Di using IV.
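A minimal numpy sketch of these two stages (the function name is ours; it assumes 1-D arrays Y, D, Z and a covariate matrix X that already includes a constant column). As the next slide explains, the second-stage standard errors from such a manual procedure should not be used:

```python
import numpy as np

def two_stage_ls(Y, D, Z, X):
    """Manual two-stage least squares: point estimate of tau only.

    Y: outcome (n,), D: treatment (n,), Z: instrument (n,),
    X: covariates (n, k), including a constant column.
    """
    # Stage 1: regress D on [X, Z] and form fitted values D_hat.
    W1 = np.column_stack([X, Z])
    first_stage_coef, *_ = np.linalg.lstsq(W1, D, rcond=None)
    D_hat = W1 @ first_stage_coef

    # Stage 2: regress Y on [X, D_hat]; the coefficient on D_hat is tau_hat.
    W2 = np.column_stack([X, D_hat])
    second_stage_coef, *_ = np.linalg.lstsq(W2, Y, rcond=None)
    return second_stage_coef[-1]
```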
Two-stage least squares SEs
▶ However, this isn’t exactly what a canned 2SLS procedure (like iv_robust in estimatr) will do: the point estimate will be correct, but the standard errors from that manual second regression will be wrong.
▶ Let X be the matrix of all second-stage regressors (the covariates and D). Let Z be the matrix of all first-stage regressors (the covariates and Z); the covariates appear in both.
▶ We can write the linear projection from the first stage (the fitted values) as
Xˆ = Z(Z′Z)−1Z′X
▶ Then the second stage coefficients can be estimated by substituting the projection Z(Z′Z)−1Z′X for X – 2SLS routines will do this all in one step (and get correct SEs).
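A sketch of that one-step matrix computation (notation follows this slide; the function name is ours, and no standard errors are computed here):

```python
import numpy as np

def tsls_one_step(Y, X, Z):
    """2SLS coefficients via the projection X_hat = Z(Z'Z)^{-1}Z'X.

    X: (n, p) second-stage regressors (covariates plus D),
    Z: (n, q) first-stage regressors (covariates plus the instrument), q >= p.
    """
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)      # Z(Z'Z)^{-1}Z'X
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)   # (X_hat'X)^{-1} X_hat'Y
    return beta                                        # includes the coefficient on D
```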
Forbidden regressions
IV analysis with 2SLS is extremely popular in applied research, but there are some common pitfalls to avoid:
▶ Don’t include covariates Xi only in one stage but not the other.
▶ Don’t use non-linear transformations of the fitted values in the second stage (remember E[X²] ≠ E[X]²)
▶ Don’t use a non-linear first stage in 2SLS (expectations/linear projections don’t propagate directly through non-linear functions).
Weak instruments
Recall that one of the IV assumptions is that E[Di(1) − Di(0)] ≠ 0,
i.e., the instrument must have some effect on the treatment.
▶ The magnitude of this effect influences the accuracy of our estimates
▶ When this effect is small, we say that our instrument is weak
We will now see how and why this happens.
Weak instrument problem
Our Wald estimator is the ratio of two covariances:
τˆ = Cov(Yi, Zi) / Cov(Di, Zi)
What happens when that denominator is really close to 0? Wildly variable estimates!
Weak instrument problem
One can show that the Wald estimator converges in probability to
τˆ → τ + Cov(Zi, Ui) / Cov(Zi, Di)
▶ Since Cov(Zi, Ui) is 0 by the exclusion restriction/exogeneity, IV is consistent.
▶ But in finite samples, that bias term can be large. If the instrument is weak (Cov(Zi, Di) ≈ 0), the divergence from the true treatment effect can sometimes be worse than just the naive regression of Yi on Di!
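A small simulation sketch of this problem (the data-generating process and numbers are invented): the same Wald estimator is well behaved when the first stage is strong and wildly variable when it is weak:

```python
import numpy as np

def wald_sim(gamma, n=500, reps=2000, seed=3):
    """Sampling distribution of the Wald estimator when the instrument
    shifts treatment take-up by (roughly) gamma; true tau is 1.0."""
    rng = np.random.default_rng(seed)
    est = []
    for _ in range(reps):
        Z = rng.binomial(1, 0.5, n)
        U = rng.normal(size=n)                                   # unobserved confounder
        D = (gamma * Z + U + rng.normal(size=n) > 0).astype(int)
        Y = 1.0 * D + 2.0 * U + rng.normal(size=n)
        est.append((Y[Z == 1].mean() - Y[Z == 0].mean()) /
                   (D[Z == 1].mean() - D[Z == 0].mean()))
    return np.array(est)

strong = wald_sim(gamma=1.5)    # strong first stage: estimates concentrate near 1
weak = wald_sim(gamma=0.05)     # weak first stage: huge spread, occasional sign flips
print(np.percentile(strong, [5, 50, 95]))
print(np.percentile(weak, [5, 50, 95]))
```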
Diagnosing weak instruments
Simulation evidence (Stock and Yogo, 2005) suggests thresholds for how “strong” our first-stage relationship should be to get valid t-ratio inference.
▶ Threshold criteria are based on the F-test statistic for the excluded instruments in the first-stage regression.
▶ Stock and Yogo (2005) suggest that when the first-stage F-statistic is 10 or more, the relative bias of IV is small (though stronger = better).
▶ However! Lee, McCrary, Moreira, and Porter (2020) show that this is far too low – we need closer to an F of 104.7! In general, the usual asymptotic CIs of τˆ ± 1.96 × SE(τˆ) will under-cover the true value.
▶ Weak-instrument-robust confidence intervals are becoming increasingly common.
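As an illustration, a minimal sketch of the first-stage F-statistic for a single excluded instrument (homoskedastic version, i.e. the squared first-stage t-ratio; the function name is ours, and the thresholds above are often applied to a heteroskedasticity-robust version of this statistic):

```python
import numpy as np

def first_stage_F(D, Z, X):
    """Classical (homoskedastic) F-statistic for excluding the single
    instrument Z from the first-stage regression of D on [X, Z].

    D: treatment (n,), Z: instrument (n,), X: covariates (n, k) incl. constant.
    With one instrument this is just the squared t-ratio on Z.
    """
    W = np.column_stack([X, Z])
    coef, *_ = np.linalg.lstsq(W, D, rcond=None)
    resid = D - W @ coef
    n, k = W.shape
    sigma2 = resid @ resid / (n - k)               # residual variance
    var_coef = sigma2 * np.linalg.inv(W.T @ W)     # classical OLS variance matrix
    t_gamma = coef[-1] / np.sqrt(var_coef[-1, -1])
    return t_gamma ** 2
```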