MFIN6201
Empirical Techniques and Applications in Finance
DOBRA
Week 5
Be•e•e•Ba
March 19, 2018
Outline
• IV Regression: Why and What; Two Stage Least Squares
• The General IV Regression Model
• Checking Instrument Validity
– Weak and strong instruments
– Instrument exogeneity
• Application: Demand for cigarettes
• Examples: Where Do Instruments Come From?
MFIN6201 – Empirical Techniques and Applications in Finance
IV Regression: Why?
Three important threats to internal validity are:
• Omitted variable bias from a variable that is correlated with X but is unobserved (so cannot be included in the regression) and for which there are inadequate control variables;
• Simultaneous causality bias (X causes Y, Y causes X);
• Errors-in-variables bias (X is measured with error)
All three problems result in E(u|X) ≠ 0.
• Instrumental variables regression can eliminate bias when E(u|X) ≠ 0, using an instrumental variable (IV), Z.
IV Estimator with Single Regressor and Single Instrument
$Y_i = \beta_0 + \beta_1 X_i + u_i$
• IV regression breaks X into two parts: a part that might be correlated with u, and a part that is not. By isolating the part that is not correlated with u, it is possible to estimate $\beta_1$.
• This is done using an instrumental variable, Zi, which is correlated with Xi but uncorrelated with ui.
Terminology: Endogeneity and Exogeneity
An endogenous variable is one that is correlated with u.
An exogenous variable is one that is uncorrelated with u.
In IV regression, we focus on the case that X is endogenous and there is an instrument, Z, which is exogenous.
Digression on terminology –
“Endogenous” literally means “determined within the system.” If X is jointly determined with Y, then a regression of Y on X is subject to simultaneous causality bias. But this definition of endogeneity is too narrow because IV regression can be used to address OV bias and errors-in-variable bias. Thus we use the broader definition of endogeneity above.
Two Conditions for a Valid Instrument
$Y_i = \beta_0 + \beta_1 X_i + u_i$
For an instrumental variable (an “instrument”) Z to be valid, it must satisfy two conditions:
1. Instrument relevance: corr(Zi, Xi) ≠ 0
1.a Exclusion restriction: Zi affects Yi only through Xi
2. Instrument exogeneity: corr(Zi, ui) = 0
Suppose for now that you have such a Zi (we’ll discuss how to find instrumental variables later).
How can you use Zi to estimate $\beta_1$?
IV estimator with one X and one Z
Explanation #1: Two Stage Least Squares (TSLS)
As it sounds, TSLS has two stages, that is, two regressions:
(1) Isolate the part of X that is uncorrelated with u by regressing X on Z using OLS:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
– Because $Z_i$ is uncorrelated with $u_i$, $\pi_0 + \pi_1 Z_i$ is uncorrelated with $u_i$. We don’t know $\pi_0$ or $\pi_1$ but we have estimated them, so …
– Compute the predicted values of $X_i$, where $\hat X_i = \hat\pi_0 + \hat\pi_1 Z_i$, i = 1,…,n.
Two Stage Least Squares, ctd.
(2) Replace $X_i$ by $\hat X_i$ in the regression of interest: regress $Y_i$ on $\hat X_i$ using OLS:
$Y_i = \beta_0 + \beta_1 \hat X_i + u_i$
• Because $\hat X_i$ is uncorrelated with $u_i$, the first least squares assumption holds for regression (2). (This requires n to be large so that $\pi_0$ and $\pi_1$ are precisely estimated.)
• Thus, in large samples, $\beta_1$ can be estimated by OLS using regression (2).
• The resulting estimator is called the Two Stage Least Squares (TSLS) estimator, $\hat\beta_1^{TSLS}$.
Two Stage Least Squares: Summary
Suppose Zi satisfies the two conditions for a valid instrument:
1. Instrument relevance: corr(Zi, Xi) ≠ 0
2. Instrument exogeneity: corr(Zi, ui) = 0
Two-stage least squares:
Stage 1: Regress $X_i$ on $Z_i$ (including an intercept), obtain the predicted values $\hat X_i$
Stage 2: Regress $Y_i$ on $\hat X_i$ (including an intercept); the coefficient on $\hat X_i$ is the TSLS estimator, $\hat\beta_1^{TSLS}$.
$\hat\beta_1^{TSLS}$ is a consistent estimator of $\beta_1$.
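The two stages can be sketched in a few lines of plain Python. This is an illustration only: the data are simulated, the true coefficients (intercept 1, slope 2), the sample size, and the helper names `ols_fit` and `tsls_fit` are all assumptions made for the example, not part of the course material.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def tsls_fit(y, x, z):
    """Two stage least squares with one regressor x and one instrument z."""
    pi0_hat, pi1_hat = ols_fit(z, x)              # Stage 1: regress X on Z
    x_hat = [pi0_hat + pi1_hat * zi for zi in z]  # predicted values of X
    return ols_fit(x_hat, y)                      # Stage 2: regress Y on X-hat

# Simulated (hypothetical) data: true beta_0 = 1 and beta_1 = 2.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
c = [rng.gauss(0, 1) for _ in range(n)]  # unobserved factor in both X and u
x = [zi + ci + 0.5 * rng.gauss(0, 1) for zi, ci in zip(z, c)]
u = [ci + rng.gauss(0, 1) for ci in c]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

ols_slope = ols_fit(x, y)[1]
tsls_slope = tsls_fit(y, x, z)[1]
print(f"OLS slope:  {ols_slope:.3f}")   # biased upward because cov(X, u) > 0
print(f"TSLS slope: {tsls_slope:.3f}")  # should be close to the true value 2
```

Because the unobserved factor `c` enters both X and u, OLS overstates the slope, while the instrument recovers it. The second-stage standard errors from this manual approach would not be correct, so the sketch reports point estimates only.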
IV Estimator, one X and one Z, ctd.
Explanation #2: A direct algebraic derivation
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Thus:
$cov(Y_i, Z_i) = cov(\beta_0 + \beta_1 X_i + u_i, Z_i)$
$= cov(\beta_0, Z_i) + cov(\beta_1 X_i, Z_i) + cov(u_i, Z_i)$
$= 0 + cov(\beta_1 X_i, Z_i) + 0$
$= \beta_1 cov(X_i, Z_i)$
where $cov(u_i, Z_i) = 0$ by instrument exogeneity; thus
$\beta_1 = \frac{cov(Y_i, Z_i)}{cov(X_i, Z_i)}$
IV Estimator, one X and one Z, ctd.
$\beta_1 = \frac{cov(Y_i, Z_i)}{cov(X_i, Z_i)}$
The IV estimator replaces these population covariances with sample covariances:
$\hat\beta_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}$,
where $s_{YZ}$ and $s_{XZ}$ are the sample covariances. This is the TSLS estimator, just a different derivation!
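A small numerical check of the claim that the covariance ratio and the two-stage procedure give the same number. The four observations below are made up for illustration, and the function names are hypothetical.

```python
def sample_cov(a, b):
    """Sample covariance with the usual n - 1 divisor."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

def iv_ratio(y, x, z):
    """IV estimator s_YZ / s_XZ for one regressor and one instrument."""
    return sample_cov(y, z) / sample_cov(x, z)

# Tiny made-up data set with a binary instrument.
z = [0, 0, 1, 1]
x = [1, 3, 4, 6]
y = [2, 4, 9, 11]

beta_iv = iv_ratio(y, x, z)

# Same estimate via two stages: regress X on Z, then Y on the fitted values.
pi1_hat = sample_cov(x, z) / sample_cov(z, z)
x_bar = sum(x) / len(x)
z_bar = sum(z) / len(z)
x_hat = [x_bar + pi1_hat * (zi - z_bar) for zi in z]
beta_two_stage = sample_cov(y, x_hat) / sample_cov(x_hat, x_hat)

print(beta_iv, beta_two_stage)  # both equal 7/3 up to rounding
```

With a binary instrument the ratio is also the difference in mean Y across the two groups divided by the difference in mean X, here (10 - 3) / (5 - 2).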
IV Estimator, one X and one Z, ctd.
Explanation #3: Derivation from the “reduced form”
The “reduced form” relates Y to Z and X to Z:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
where $w_i$ is an error term. Because Z is exogenous, Z is uncorrelated with both $v_i$ and $w_i$.
The idea: A unit change in $Z_i$ results in a change in $X_i$ of $\pi_1$ and a change in $Y_i$ of $\gamma_1$. Because that change in $X_i$ arises from the exogenous change in $Z_i$, that change in $X_i$ is exogenous. Thus an exogenous change in $X_i$ of $\pi_1$ units is associated with a change in $Y_i$ of $\gamma_1$ units, so the effect on Y of an exogenous unit change in X is $\beta_1 = \gamma_1 / \pi_1$ units.
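IV estimator from the reduced form, ctd.
The math:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
Solve the X equation for Z:
$Z_i = -\frac{\pi_0}{\pi_1} + \frac{1}{\pi_1} X_i - \frac{1}{\pi_1} v_i$
Substitute this into the Y equation and collect terms:
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
$= \gamma_0 + \gamma_1 \left[ -\frac{\pi_0}{\pi_1} + \frac{1}{\pi_1} X_i - \frac{1}{\pi_1} v_i \right] + w_i$
$= \left[ \gamma_0 - \frac{\pi_0 \gamma_1}{\pi_1} \right] + \frac{\gamma_1}{\pi_1} X_i + \left[ w_i - \frac{\gamma_1}{\pi_1} v_i \right]$
$= \beta_0 + \beta_1 X_i + u_i$
where $\beta_0 = \gamma_0 - \frac{\pi_0 \gamma_1}{\pi_1}$, $\beta_1 = \frac{\gamma_1}{\pi_1}$, and $u_i = w_i - \frac{\gamma_1}{\pi_1} v_i$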
IV estimator from the reduced form, ctd.
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
yields
$Y_i = \beta_0 + \beta_1 X_i + u_i$, where $\beta_1 = \frac{\gamma_1}{\pi_1}$
Interpretation: An exogenous change in $X_i$ of $\pi_1$ units is associated with a change in $Y_i$ of $\gamma_1$ units, so the effect on Y of an exogenous unit change in X is $\beta_1 = \gamma_1 / \pi_1$.
Example 1: Effect of Studying on Grades
What is the effect on grades of studying for an additional hour per day?
Y = GPA
X = study time (hours per day)
Data: grades and study hours of college freshmen.
Would you expect the OLS estimator of $\beta_1$ (the effect on GPA of studying an extra hour per day) to be unbiased? Why or why not?
Studying on grades, ctd.
Stinebrickner, Ralph and Stinebrickner, Todd R. (2008) “The Causal Effect of Studying on Academic Performance,” The B.E. Journal of Economic Analysis & Policy: Vol. 8: Iss. 1 (Frontiers), Article 14.
• n = 210 freshmen at Berea College (Kentucky) in 2001
• Y = first-semester GPA
• X = average study hours per day (time use survey)
• Roommates were randomly assigned
• Z = 1 if roommate brought video game, = 0 otherwise
Do you think Zi (whether a roommate brought a video game) is a valid instrument?
1. Is it relevant (correlated with X)?
2. Is it exogenous (uncorrelated with u)?
Studying on grades, ctd.
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
Y = GPA (4 point scale)
X = time spent studying (hours per day)
Z = 1 if roommate brought video game, = 0 otherwise
Stinebrickner and Stinebrickner’s findings:
$\hat\pi_1 = -0.668$
$\hat\gamma_1 = -0.241$
$\hat\beta_1^{IV} = \frac{\hat\gamma_1}{\hat\pi_1} = \frac{-0.241}{-0.668} = 0.36$
What are the units? Do these estimates make sense in a real-world way? (Note: They actually ran the regressions including additional regressors; more on this later.)
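The IV estimate on this slide is just the ratio of the two reduced-form coefficients. The snippet below redoes the arithmetic, taking both coefficients to be negative (an assumption about sign: a roommate's video game lowers both study time and GPA); the variable names are illustrative.

```python
# Reduced-form coefficients from the slide, with assumed negative signs.
pi1_hat = -0.668     # change in study hours per day when the roommate brought a game
gamma1_hat = -0.241  # change in GPA points when the roommate brought a game

beta1_iv = gamma1_hat / pi1_hat
print(round(beta1_iv, 2))  # 0.36 GPA points per additional hour of study per day
```

The units follow from the ratio: GPA points per (hour per day).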
Example 2: Supply and demand for butter
IV regression was first developed to estimate demand elasticities for agricultural goods, for example, butter:
$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$
• $\beta_1$ = price elasticity of butter = percent change in quantity for a 1% change in price
• Data: observations on price and quantity of butter for different years
• The OLS regression of $\ln(Q_i^{butter})$ on $\ln(P_i^{butter})$ suffers from simultaneous causality bias (why?)
Simultaneous causality bias in the OLS regression of $\ln(Q_i^{butter})$ on $\ln(P_i^{butter})$ arises because price and quantity are determined by the interaction of demand and supply:
This interaction of demand and supply produces data like …
Would a regression using these data produce the demand curve?
But…what would you get if only supply shifted?
• TSLS estimates the demand curve by isolating shifts in price and quantity that arise from shifts in supply.
• Z is a variable that shifts supply but not demand.
TSLS in the supply-demand example:
$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$
Let Z = rainfall in dairy-producing regions. Is Z a valid instrument?
(1) Relevant? $corr(rain_i, \ln(P_i^{butter})) \neq 0$?
Plausibly: insufficient rainfall means less grazing means less butter means higher prices
(2) Exogenous? $corr(rain_i, u_i) = 0$?
Plausibly: whether it rains in dairy-producing regions shouldn’t affect demand for butter
TSLS in the supply-demand example:
$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$
$Z_i = rain_i$ = rainfall in dairy-producing regions.
Stage 1: regress $\ln(P_i^{butter})$ on $rain_i$, get $\widehat{\ln(P_i^{butter})}$
– $\widehat{\ln(P_i^{butter})}$ isolates changes in log price that arise from supply (part of supply, at least)
– The regression counterpart of using shifts in the supply curve to trace out the demand curve.
Stage 2: regress $\ln(Q_i^{butter})$ on $\widehat{\ln(P_i^{butter})}$
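A simulated market makes the supply-shifter logic concrete. Everything below is hypothetical: the demand elasticity of -1.0, the supply slope of 0.5, the unit effect of rainfall on supply, and the variable names are assumptions chosen only to illustrate how a supply shifter traces out the demand curve.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

# Hypothetical market in logs:
#   demand: q = 5.0 - 1.0 * p + u_d          (true elasticity -1.0)
#   supply: q = 1.0 + 0.5 * p + rain + u_s   (rain shifts supply only)
rng = random.Random(1)
n = 5000
rain = [rng.gauss(0, 1) for _ in range(n)]
u_d = [rng.gauss(0, 1) for _ in range(n)]
u_s = [rng.gauss(0, 1) for _ in range(n)]

# Equilibrium price solves demand = supply; quantity then follows from demand.
p = [(4.0 - r + d - s) / 1.5 for r, d, s in zip(rain, u_d, u_s)]
q = [5.0 - pi + d for pi, d in zip(p, u_d)]

ols_elasticity = cov(q, p) / cov(p, p)       # mixes demand and supply shifts
iv_elasticity = cov(q, rain) / cov(p, rain)  # uses only rain-driven price changes
print(f"OLS: {ols_elasticity:.2f}, IV: {iv_elasticity:.2f}")
```

In this design OLS is pulled toward zero because demand shocks move price and quantity in the same direction, while the rainfall instrument should recover a value near -1.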
Example 3: Test scores and class size
• The California test score/class size regressions still could have OV bias (e.g. parental involvement).
• In principle, this bias can be eliminated by IV regression (TSLS).
• IV regression requires a valid instrument, that is, an instrument that is:
1. relevant: $corr(Z_i, STR_i) \neq 0$
2. exogenous: $corr(Z_i, u_i) = 0$
Example 3: Test scores and class size, ctd.
Here is a (hypothetical) instrument:
• some districts, randomly hit by an earthquake, “double up”
classrooms: Zi = Quakei = 1 if hit by quake, = 0 otherwise
• Do the two conditions for a valid instrument hold?
• The earthquake makes it as if the districts were in a random assignment experiment. Thus, the variation in STR arising from the earthquake is exogenous.
• The first stage of TSLS regresses STR against Quake, thereby isolating the part of STR that is exogenous (the part that is “as if” randomly assigned)
Inference using TSLS
• In large samples, the sampling distribution of the TSLS estimator is normal
• Inference (hypothesis tests, confidence intervals) proceeds in the usual way, e.g. ± 1.96SE
• The idea behind the large-sample normal distribution of the TSLS estimator is that – like all the other estimators we have considered – it involves an average of mean zero i.i.d. random variables, to which we can apply the CLT.
• Here is a sketch of the math (see SW App. 12.3 for the details) …
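$\hat\beta_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}} = \frac{\frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)(Z_i - \bar Z)}{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)(Z_i - \bar Z)} = \frac{\sum_{i=1}^n Y_i (Z_i - \bar Z)}{\sum_{i=1}^n X_i (Z_i - \bar Z)}$
Substitute in $Y_i = \beta_0 + \beta_1 X_i + u_i$ and simplify:
$\hat\beta_1^{TSLS} = \frac{\beta_1 \sum_{i=1}^n X_i (Z_i - \bar Z) + \sum_{i=1}^n u_i (Z_i - \bar Z)}{\sum_{i=1}^n X_i (Z_i - \bar Z)}$
so …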
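$\hat\beta_1^{TSLS} = \beta_1 + \frac{\sum_{i=1}^n u_i (Z_i - \bar Z)}{\sum_{i=1}^n X_i (Z_i - \bar Z)}$
so
$\hat\beta_1^{TSLS} - \beta_1 = \frac{\sum_{i=1}^n u_i (Z_i - \bar Z)}{\sum_{i=1}^n X_i (Z_i - \bar Z)}$
Multiply through by $\sqrt{n}$:
$\sqrt{n}(\hat\beta_1^{TSLS} - \beta_1) = \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^n (Z_i - \bar Z) u_i}{\frac{1}{n} \sum_{i=1}^n X_i (Z_i - \bar Z)}$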
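$\sqrt{n}(\hat\beta_1^{TSLS} - \beta_1) = \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^n (Z_i - \bar Z) u_i}{\frac{1}{n} \sum_{i=1}^n X_i (Z_i - \bar Z)}$
• $\frac{1}{n} \sum_{i=1}^n X_i (Z_i - \bar Z) = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)(Z_i - \bar Z) \xrightarrow{p} cov(X, Z) \neq 0$
• $\frac{1}{\sqrt{n}} \sum_{i=1}^n (Z_i - \bar Z) u_i$ is distributed $N(0, var[(Z - \mu_Z) u])$ (CLT)
so: $\hat\beta_1^{TSLS}$ is approx. distributed $N(\beta_1, \sigma^2_{\hat\beta_1^{TSLS}})$, where
$\sigma^2_{\hat\beta_1^{TSLS}} = \frac{1}{n} \frac{var[(Z_i - \mu_Z) u_i]}{[cov(Z_i, X_i)]^2}$
where $cov(X, Z) \neq 0$ because the instrument is relevant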
Inference using TSLS, ctd.
$\hat\beta_1^{TSLS}$ is approx. distributed $N(\beta_1, \sigma^2_{\hat\beta_1^{TSLS}})$
• Statistical inference proceeds in the usual way.
• The justification is (as usual) based on large samples
• This all assumes that the instruments are valid; we’ll discuss what happens if they aren’t valid shortly.
• Important note on standard errors:
– The OLS standard errors from the second stage regression aren’t right: they don’t take into account the estimation in the first stage ($\hat X_i$ is estimated).
– Instead, use a single specialized command that computes the TSLS estimator and the correct SEs.
– As usual, use heteroskedasticity-robust SEs
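As a rough sketch of what a "correct" standard error involves in the one-X, one-Z case, the function below plugs sample analogues into the large-sample variance formula var[(Z - mu_Z)u] / (n [cov(Z, X)]^2). It is an assumption-laden illustration: the function name is hypothetical, no degrees-of-freedom correction is applied, and regression software may differ in such details. The key point is that the residuals are computed with the actual X, not the first-stage fitted values.

```python
import math
import random

def tsls_with_robust_se(y, x, z):
    """IV slope and a heteroskedasticity-robust SE (one X, one Z)."""
    n = len(y)
    y_bar, x_bar, z_bar = sum(y) / n, sum(x) / n, sum(z) / n
    s_yz = sum((yi - y_bar) * (zi - z_bar) for yi, zi in zip(y, z)) / n
    s_xz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / n
    beta1 = s_yz / s_xz
    beta0 = y_bar - beta1 * x_bar
    # Residuals use the actual X, not the first-stage predicted values.
    u_hat = [yi - beta0 - beta1 * xi for yi, xi in zip(y, x)]
    # Sample analogue of var[(Z - mu_Z) u].
    var_zu = sum(((zi - z_bar) * ui) ** 2 for zi, ui in zip(z, u_hat)) / n
    se = math.sqrt(var_zu / n) / abs(s_xz)
    return beta1, se

# Simulated (hypothetical) data with true slope 2.
rng = random.Random(2)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
c = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
y = [1.0 + 2.0 * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

beta1_hat, se_hat = tsls_with_robust_se(y, x, z)
print(f"estimate {beta1_hat:.3f}, SE {se_hat:.3f}")
print(f"95% CI: [{beta1_hat - 1.96 * se_hat:.3f}, {beta1_hat + 1.96 * se_hat:.3f}]")
```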
Example 4: Demand for Cigarettes
$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + u_i$
Why is the OLS estimator of $\beta_1$ likely to be biased?
• Data set: Panel data on annual cigarette consumption and average prices paid (including tax), by state, for the 48 continental US states, 1985-1995.
• Proposed instrumental variable:
– $Z_i$ = general sales tax per pack in the state = $SalesTax_i$
– Do you think this instrument is plausibly valid?
* Relevant? $corr(SalesTax_i, \ln(P_i^{cigarettes})) \neq 0$?
* Exogenous? $corr(SalesTax_i, u_i) = 0$?
Cigarette demand, ctd.
For now, use data from 1995 only.
First stage OLS regression:
$\widehat{\ln(P_i^{cigarettes})} = 4.63 + 0.031\, SalesTax_i$, n = 48
Second stage OLS regression:
$\widehat{\ln(Q_i^{cigarettes})} = 9.72 - 1.08\, \widehat{\ln(P_i^{cigarettes})}$, n = 48
Combined TSLS regression with correct, heteroskedasticity-robust standard errors (in parentheses):
$\widehat{\ln(Q_i^{cigarettes})} = 9.72\ (1.49) - 1.08\ (0.31)\, \ln(P_i^{cigarettes})$, n = 48
STATA Example: Cigarette demand, First stage
Instrument = Z = rtaxso = general sales tax (real $/pack)
Second stage
Combined into a single command:
Summary of IV Regression with a Single X and Z
• A valid instrument Z must satisfy two conditions:
1. relevance: corr(Zi, Xi) ≠ 0
2. exogeneity: corr(Zi, ui) = 0
• TSLS proceeds by first regressing X on Z to get $\hat X$, then regressing Y on $\hat X$
• The key idea is that the first stage isolates part of the variation in X that is uncorrelated with u
• If the instrument is valid, then the large-sample sampling distribution of the TSLS estimator is normal, so inference proceeds as usual
General IV Regression Model (SW Section 12.2)
• So far we have considered IV regression with a single endogenous regressor (X) and a single instrument (Z).
• We need to extend this to:
– multiple endogenous regressors (X1,…,Xk)
– multiple included exogenous variables (W1,…,Wr) or control variables, which need to be included for the usual OV reason
– multiple instrumental variables (Z1,…,Zm). More (relevant) instruments can produce a smaller variance of TSLS: the $R^2$ of the first stage increases, so you have more variation in $\hat X$.
• New terminology: identification & overidentification
Identification
• In general, a parameter is said to be identified if different values of the parameter produce different distributions of the data.
• In IV regression, whether the coefficients are identified depends on the relation between the number of instruments (m) and the number of endogenous regressors (k)
• Intuitively, if there are fewer instruments than endogenous regressors, we can’t estimate $\beta_1, \dots, \beta_k$
– For example, suppose k = 1 but m = 0 (no instruments)!
Identification, ctd.
The coefficients $\beta_1, \dots, \beta_k$ are said to be:
• exactly identified if m = k.
There are just enough instruments to estimate $\beta_1, \dots, \beta_k$.
• overidentified if m > k.
There are more than enough instruments to estimate $\beta_1, \dots, \beta_k$. If so, you can test whether the instruments are valid (a test of the “overidentifying restrictions”); we’ll return to this later
• underidentified if m < k.
There are too few instruments to estimate $\beta_1, \dots, \beta_k$. If so, you need to get more instruments!
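The counting rule above can be written as a tiny helper. The function name is hypothetical, and the rule only compares counts: it says nothing about whether the instruments are actually relevant or exogenous.

```python
def identification_status(m, k):
    """Classify identification from m instruments and k endogenous regressors."""
    if m > k:
        return "overidentified"
    if m == k:
        return "exactly identified"
    return "underidentified"

print(identification_status(m=1, k=1))  # one instrument, one endogenous regressor
print(identification_status(m=2, k=1))  # two instruments, one endogenous regressor
print(identification_status(m=0, k=1))  # no instruments at all
```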
The General IV Regression Model: Summary of Jargon
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$
• $Y_i$ is the dependent variable
• $X_{1i}, \dots, X_{ki}$ are the endogenous regressors (potentially correlated with $u_i$)
• $W_{1i}, \dots, W_{ri}$ are the included exogenous regressors (uncorrelated with $u_i$) or control variables (included so that $Z_i$ is uncorrelated with $u_i$, once the W’s are included)
• $\beta_0, \beta_1, \dots, \beta_{k+r}$ are the unknown regression coefficients
• $Z_{1i}, \dots, Z_{mi}$ are the m instrumental variables (the excluded exogenous variables)
• The coefficients are overidentified if m > k; exactly identified if m = k; and underidentified if m < k.
TSLS with a Single Endogenous Regressor
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$
• m instruments: $Z_{1i}, \dots, Z_{mi}$
• First stage
– Regress X1 on all the exogenous regressors: regress X1 on W1,...,Wr, Z1,...,Zm, and an intercept, by OLS
– Compute predicted values $\hat X_{1i}$, i = 1,...,n
• Second stage
– Regress Y on $\hat X_{1i}$, $W_{1i}, \dots, W_{ri}$, and an intercept, by OLS
– The coefficients from this second stage regression are the TSLS estimators, but SEs are wrong
• To get correct SEs, do this in a single step in your regression software
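A minimal standard-library sketch of TSLS with one endogenous regressor, included controls, and several instruments. The helper names (`solve`, `ols`, `predict`, `tsls`) and the simulated data are assumptions for illustration; as the slide warns, a manual second stage gives the right coefficients but not the right standard errors, so none are computed here.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(columns, y):
    """OLS of y on an intercept plus the given regressor columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coefs, columns):
    """Fitted values from coefficients [intercept, slopes...]."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], vals))
            for vals in zip(*columns)]

def tsls(y, x, controls, instruments):
    """Return [beta_0, beta_1 (on X), then one coefficient per control]."""
    exog = controls + instruments
    x_hat = predict(ols(exog, x), exog)  # Stage 1: X on all W's and Z's
    return ols([x_hat] + controls, y)    # Stage 2: Y on X-hat and the W's

# Simulated (hypothetical) data: Y = 1 + 2 X - 1 W + u, with two instruments.
rng = random.Random(3)
n = 5000
w = [rng.gauss(0, 1) for _ in range(n)]
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
c = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * wi + a + 0.5 * b + ci + rng.gauss(0, 1)
     for wi, a, b, ci in zip(w, z1, z2, c)]
y = [1.0 + 2.0 * xi - wi + ci + rng.gauss(0, 1) for xi, wi, ci in zip(x, w, c)]

beta = tsls(y, x, controls=[w], instruments=[z1, z2])
print([round(b, 2) for b in beta])  # should be near [1, 2, -1]
```

Note that the controls appear in both stages, while the instruments appear only in the first.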
Example 4: Demand for cigarettes, ctd.
Suppose income is exogenous (this is plausible; why?), and we also want to estimate the income elasticity:
$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + \beta_2 \ln(Income_i) + u_i$
We actually have two instruments:
$Z_{1i}$ = general sales tax$_i$
$Z_{2i}$ = cigarette-specific tax$_i$
• Endogenous variable: $\ln(P_i^{cigarettes})$ (“one X”)
• Included exogenous variable: $\ln(Income_i)$ (“one W”)
• Instruments (excluded exogenous variables): general sales tax, cigarette-specific tax (“two Zs”)
• Is $\beta_1$ over-, under-, or exactly identified?
Example: Cigarette demand, one instrument
Example: Cigarette demand, two instruments
TSLS estimates, Z = sales tax (m = 1)
$\widehat{\ln(Q_i^{cigarettes})} = 9.43\ (1.26) - 1.14\ (0.37)\, \ln(P_i^{cigarettes}) + 0.21\ (0.31)\, \ln(Income_i)$
TSLS estimates, Z = sales tax & cig-only tax (m = 2)
$\widehat{\ln(Q_i^{cigarettes})} = 9.89\ (0.96) - 1.28\ (0.25)\, \ln(P_i^{cigarettes}) + 0.28\ (0.25)\, \ln(Income_i)$
(standard errors in parentheses)
• Smaller SEs for m = 2. Using 2 instruments gives more information, that is, more “as-if random variation.”
• Low income elasticity (not a luxury good); income elasticity not statistically significantly different from 0
• Surprisingly high price elasticity
The IV Regression Assumptions
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$
1. $E(u_i | W_{1i}, \dots, W_{ri}) = 0$
– #1 says “the exogenous regressors are exogenous.”
2. $(Y_i, X_{1i}, \dots, X_{ki}, W_{1i}, \dots, W_{ri}, Z_{1i}, \dots, Z_{mi})$ are i.i.d.
– #2 is not new
3. The Xs, Ws, Zs, and Y have nonzero, finite 4th moments
– #3 is not new
4. The instruments $(Z_{1i}, \dots, Z_{mi})$ are valid.
– We have discussed this
• Under 1-4, TSLS and its t-statistic are normally distributed in large samples
• The critical requirement is that the instruments be valid
Example 1: Effect of studying on grades, ctd.
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Y = first-semester GPA
X = average study hours per day
Z = 1 if roommate brought video game, = 0 otherwise
Roommates were randomly assigned
Can you think of a reason that Z might be correlated with u, even though it is randomly assigned? What else enters the error term? What are other determinants of grades, beyond time spent studying?
Example 1: Effect of studying on grades, ctd.
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Why might Z be correlated with u?
• Here’s a hypothetical possibility: gender. Suppose:
– Women get better grades than men, holding constant hours spent studying
– Men are more likely to bring a video game than women
– Then corr(Zi, ui) < 0 (males are more likely to have a [male] roommate who brings a video game, but males also tend to have lower grades, holding constant the amount of studying).
• This is just a version of OV bias. The solution to OV bias is to control for (or include) the OV, in this case gender.
Example 1: Effect of studying on grades, ctd.
• This logic leads you to include W = gender as a control variable in the IV regression:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$
• The TSLS estimate reported above is from a regression that included gender as a W variable, along with other variables such as individual i’s major.
Recall the two requirements for valid instruments:
1. Relevance (special case of one X)
At least one instrument must enter the population counterpart of the first stage regression.
2. Exogeneity
• All the instruments must be uncorrelated with the error term: corr(Z1i, ui) = 0, . . . , corr(Zmi, ui) = 0
What happens if one of these requirements isn’t satisfied? How can you check? What do you do?
If you have multiple instruments, which should you use?
Checking Assumption 1: Instrument Relevance
We will focus on a single included endogenous regressor:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$
First stage regression:
$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i$
• The instruments are relevant if at least one of $\pi_1, \dots, \pi_m$ is nonzero.
• The instruments are said to be weak if all the $\pi_1, \dots, \pi_m$ are either zero or nearly zero.
• Weak instruments explain very little of the variation in X, beyond that explained by the W’s
What are the consequences of weak instruments?
If instruments are weak, the sampling distribution of TSLS and its t-statistic are not (at all) normal, even with n large.
Consider the simplest case:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \pi_0 + \pi_1 Z_i + v_i$
• The IV estimator is $\hat\beta_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}$
• If cov(X,Z) is zero or small, then $s_{XZ}$ will be small: With weak instruments, the denominator is nearly zero.
• If so, the sampling distribution of $\hat\beta_1^{TSLS}$ (and its t-statistic) is not well approximated by its large-n normal approximation ...
Why does our trusty normal approximation fail us?
$\hat\beta_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}$
• If cov(X,Z) is small, small changes in $s_{XZ}$ (from one sample to the next) can induce big changes in $\hat\beta_1^{TSLS}$
• Suppose in one sample you calculate $s_{XZ}$ = .00001 . . .
• Thus the large-n normal approximation is a poor approximation to the sampling distribution of $\hat\beta_1^{TSLS}$
• A better approximation is that $\hat\beta_1^{TSLS}$ is distributed as the ratio of two correlated normal random variables (see SW App. 12.4)
• If instruments are weak, the usual methods of inference are unreliable, potentially very unreliable.
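A small Monte Carlo sketch of this instability. The design is entirely hypothetical (true slope 2, n = 200 per sample, 500 replications, first-stage coefficient 1.0 for the "strong" case and 0.02 for the "weak" case); it only illustrates how a near-zero denominator spreads out the estimates.

```python
import random
import statistics

def iv_ratio(y, x, z):
    """IV estimator s_YZ / s_XZ for one regressor and one instrument."""
    n = len(y)
    y_bar, x_bar, z_bar = sum(y) / n, sum(x) / n, sum(z) / n
    s_yz = sum((yi - y_bar) * (zi - z_bar) for yi, zi in zip(y, z))
    s_xz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    return s_yz / s_xz

def simulate_estimates(pi1, n=200, reps=500, seed=0):
    """Draw `reps` samples and return the IV estimate from each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        c = [rng.gauss(0, 1) for _ in range(n)]  # makes X endogenous
        x = [pi1 * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
        y = [2.0 * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]
        estimates.append(iv_ratio(y, x, z))
    return estimates

def interquartile_range(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

strong = simulate_estimates(pi1=1.0)
weak = simulate_estimates(pi1=0.02)
print("IQR, strong instrument:", round(interquartile_range(strong), 2))
print("IQR, weak instrument:  ", round(interquartile_range(weak), 2))
print("most extreme weak-instrument estimate:", round(max(weak, key=abs), 1))
```

The interquartile range is used instead of the standard deviation because the weak-instrument estimates have very heavy tails, so a handful of extreme draws would dominate a variance calculation.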
Measuring the Strength of Instruments in Practice:
The First-Stage F-statistic
• The first stage regression (one X):
• Regress X on Z1,...,Zm, W1,...,Wr.
• Totally irrelevant instruments ⇔ all the coefficients on Z1,...,Zm are zero.
• The first-stage F-statistic tests the hypothesis that Z1,...,Zm do not enter the first stage regression.
• Weak instruments imply a small first stage F-statistic.
Checking for Weak Instruments with a Single X
• Compute the first-stage F-statistic.
Rule-of-thumb: If the first stage F-statistic is less than
10, then the set of instruments is weak.
• If so, the TSLS estimator will be biased, and statistical inferences (standard errors, hypothesis tests, confidence intervals) can be misleading.
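For the simplest case (one X, one Z, no W's), the first-stage F-statistic can be computed from the first-stage R-squared. The sketch below uses the homoskedasticity-only formula F = [R²/(1 − R²)](n − 2), which is an assumption made to keep the example short; with controls or robust standard errors, regression software should be used instead. The function name and simulated data are illustrative.

```python
import random

def first_stage_f(x, z):
    """Homoskedasticity-only first-stage F for one instrument and no controls."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    s_xz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    r_squared = s_xz ** 2 / (s_xx * s_zz)  # R^2 from regressing X on Z
    return r_squared / (1.0 - r_squared) * (n - 2)

# Simulated (hypothetical) first stages with a strong and a weak instrument.
rng = random.Random(4)
n = 200
z = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1.4) for _ in range(n)]
x_strong = [1.0 * zi + e for zi, e in zip(z, noise)]
x_weak = [0.02 * zi + e for zi, e in zip(z, noise)]

for label, x in [("strong", x_strong), ("weak", x_weak)]:
    f_stat = first_stage_f(x, z)
    verdict = "looks strong" if f_stat > 10 else "weak by the rule of thumb"
    print(f"{label}: F = {f_stat:.1f} ({verdict})")
```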
What to do if you have weak instruments?
• Get better instruments (often easier said than done!)
• If you have many instruments, some are probably weaker than others and it’s a good idea to drop the weaker ones (dropping an irrelevant instrument will increase the first-stage F)
• If you only have a few instruments, and all are weak, then you need to do some IV analysis other than TSLS ...
Checking Assumption 2: Instrument Exogeneity
• Instrument exogeneity: All the instruments are uncorrelated with the error term: corr(Z1i, ui) = 0, . . . , corr(Zmi, ui) = 0
• If the instruments are correlated with the error term, the first stage of TSLS cannot isolate a component of X that is uncorrelated with the error term, so $\hat X$ is correlated with u and TSLS is inconsistent.
• If there are more instruments than endogenous regressors, it is possible to test, partially, for instrument exogeneity.
Testing Overidentifying Restrictions
Consider the simplest case:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
• Suppose there are two valid instruments: Z1i, Z2i
• Then you could compute two separate TSLS estimates.
• Intuitively, if these 2 TSLS estimates are very different from each other, then something must be wrong: one or the other (or both) of the instruments must be invalid.
• The J-test of overidentifying restrictions makes this comparison in a statistically precise way.
• This can only be done if #Z’s > #X’s (overidentified).
The J-test of Overidentifying Restrictions
Suppose # instruments = m > # X’s = k (overidentified)
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$
The J-test is the Anderson-Rubin test, using the TSLS estimator instead of the hypothesized value $\beta_{1,0}$. The recipe:
• First estimate the equation of interest using TSLS and all m instruments; compute the predicted values $\hat Y_i$, using the actual X’s (not the $\hat X$’s used to estimate the second stage)
• Compute the residuals $\hat u_i = Y_i - \hat Y_i$
• Regress $\hat u_i$ against $Z_{1i}, \dots, Z_{mi}, W_{1i}, \dots, W_{ri}$.
• Compute the F-statistic testing the hypothesis that the coefficients on $Z_{1i}, \dots, Z_{mi}$ are all zero;
• The J-statistic is J = mF
The J-test, ctd
J = mF, where F = the F-statistic testing the coefficients on $Z_{1i}, \dots, Z_{mi}$ in a regression of the TSLS residuals against $Z_{1i}, \dots, Z_{mi}, W_{1i}, \dots, W_{ri}$.
Distribution of the J-statistic
• Under the null hypothesis that all the instruments are exogenous, J has a chi-squared distribution with m − k degrees of freedom
• If some instruments are exogenous and others are endogenous, the J statistic will be large, and the null hypothesis that all instruments are exogenous will be rejected.
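The recipe can be sketched for the smallest overidentified case: one X, two instruments, no W's, so J has one degree of freedom. This is an illustration under stated assumptions: the F-statistic is the homoskedasticity-only version, the helper names are hypothetical, and the data are simulated with a parameter `delta` that controls how strongly the second instrument enters the error term (`delta = 0` means both instruments are exogenous).

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def fitted_two(a, b, y):
    """Fitted values from OLS of y on an intercept and two regressors a, b."""
    a_bar, b_bar, y_bar = mean(a), mean(b), mean(y)
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    say = sum((ai - a_bar) * (yi - y_bar) for ai, yi in zip(a, y))
    sby = sum((bi - b_bar) * (yi - y_bar) for bi, yi in zip(b, y))
    det = saa * sbb - sab ** 2
    ca = (say * sbb - sby * sab) / det
    cb = (sby * saa - say * sab) / det
    c0 = y_bar - ca * a_bar - cb * b_bar
    return [c0 + ca * ai + cb * bi for ai, bi in zip(a, b)]

def j_test(y, x, z1, z2):
    """J = mF and its chi-squared(1) p-value for m = 2 instruments, k = 1."""
    n, m = len(y), 2
    x_hat = fitted_two(z1, z2, x)                       # first stage
    xh_bar, y_bar = mean(x_hat), mean(y)
    beta1 = (sum((h - xh_bar) * (yi - y_bar) for h, yi in zip(x_hat, y))
             / sum((h - xh_bar) ** 2 for h in x_hat))   # second stage slope
    beta0 = y_bar - beta1 * xh_bar
    u_hat = [yi - beta0 - beta1 * xi for yi, xi in zip(y, x)]  # actual X
    fit = fitted_two(z1, z2, u_hat)                     # residuals on Z1, Z2
    u_bar = mean(u_hat)
    tss = sum((ui - u_bar) ** 2 for ui in u_hat)
    ssr = sum((ui - fi) ** 2 for ui, fi in zip(u_hat, fit))
    f_stat = ((tss - ssr) / m) / (ssr / (n - m - 1))
    j = max(m * f_stat, 0.0)
    p_value = math.erfc(math.sqrt(j / 2.0))  # chi-squared, m - k = 1 df
    return j, p_value

def simulate(delta, seed, n=5000):
    """Hypothetical data; delta != 0 makes z2 correlated with the error."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    c = [rng.gauss(0, 1) for _ in range(n)]
    x = [a + b + ci + rng.gauss(0, 1) for a, b, ci in zip(z1, z2, c)]
    u = [ci + rng.gauss(0, 1) + delta * b for ci, b in zip(c, z2)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return y, x, z1, z2

j_ok, p_ok = j_test(*simulate(delta=0.0, seed=5))
j_bad, p_bad = j_test(*simulate(delta=0.5, seed=5))
print(f"both instruments exogenous: J = {j_ok:.2f}, p = {p_ok:.3f}")
print(f"one instrument endogenous:  J = {j_bad:.2f}, p = {p_bad:.3g}")
```

A rejection does not say which instrument is the problem, and the test has no power if all instruments are invalid in the same way, which is why judgment is still required.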
Checking Instrument Validity: Summary
This summary considers the case of a single X.
The two requirements for valid instruments are:
1. Relevance
– At least one instrument must enter the population counterpart of the first stage regression.
– If instruments are weak, then the TSLS estimator is biased and the t-statistic has a non-normal distribution
– To check for weak instruments with a single included endogenous regressor, check the first-stage F
– If F > 10, instruments are strong: use TSLS
– If F < 10, weak instruments: take some action.
2. Exogeneity
• All the instruments must be uncorrelated with the error term: corr(Z1i, ui) = 0, . . . , corr(Zmi, ui) = 0
• We can partially test for exogeneity: if m > 1, we can test the null hypothesis that all the instruments are exogenous, against the alternative that as many as m − 1 are endogenous (correlated with u)
• The test is the J-test, which is constructed using the TSLS residuals.
• If the J-test rejects, then at least some of your instruments are endogenous, so you must make a difficult decision and jettison some (or all) of your instruments.
How should we interpret the J-test rejection?
• J-test rejects the null hypothesis that both the instruments are exogenous
• This means that either rtaxso is endogenous, or rtax is endogenous, or both!
• The J-test doesn’t tell us which! You must exercise judgment
• Why might rtax (cig-only tax) be endogenous?
– Political forces: history of smoking or lots of smokers ⇒ political pressure for low cigarette taxes
– If so, cig-only tax is endogenous
• This reasoning doesn’t apply to general sales tax
• ⇒ use just one instrument, the general sales tax
The Demand for Cigarettes: Summary of Empirical Results
• Use the estimated elasticity based on TSLS with the general sales tax as the only instrument:
Elasticity = -.94, SE = .21
• This elasticity is surprisingly large (not inelastic) – a 1% increase in prices reduces cigarette sales by nearly 1%. This is much more elastic than conventional wisdom in the health economics literature.
• This is a long-run (ten-year change) elasticity. What would you expect a short-run (one-year change) elasticity to be – more or less elastic?
Where Do Valid Instruments Come From?
General comments
The hard part of IV analysis is finding valid instruments
– Method 1: “variables in another equation” (e.g. supply shifters that do not affect demand)
– Method 2: look for exogenous variation (Z) that is “as if” randomly assigned (does not directly affect Y) but affects X.
– These two methods are different ways to think about the same issues – see the link
* Rainfall shifts the supply curve for butter but not the demand curve; rainfall is “as if” randomly assigned
* Sales tax shifts the supply curve for cigarettes but not the demand curve; sales taxes are “as if” randomly assigned
Conclusion (SW Section 12.6)
• A valid instrument lets us isolate a part of X that is uncorrelated with u, and that part can be used to estimate the effect of a change in X on Y
• IV regression hinges on having valid instruments:
– Relevance: Check via first-stage F
– Researchers must argue the validity of exclusion restriction
– Exogeneity: Test overidentifying restrictions via the J-statistic
• A valid instrument isolates variation in X that is “as if” randomly assigned.
• The critical requirement of at least m valid instruments cannot be tested – you must use your head.
Some IV FAQs
1. When might I want to use IV regression?
Any time that X is correlated with u and you have a valid instrument. The primary reasons for correlation between X and u could be:
• Omitted variable(s) that lead to OV bias
Example: ability bias in returns to education
• Measurement error
Example: measurement error in years of education
• Selection bias
Example: Patients select treatment
• Simultaneous causality bias
Example: supply and demand for butter, cigarettes
2. Threats to the internal validity of an IV regression?
• The main threat to the internal validity of IV is the failure of the assumption of valid instruments. Given a set of control variables W, instruments are valid if they are relevant and exogenous.
– Instrument relevance can be assessed by checking if instruments are weak or strong: Is the first-stage F-statistic > 10?
– Instrument exogeneity can be checked using the J-statistic – as long as you have m exogenous instruments to start with! In general, instrument exogeneity must be assessed using expert knowledge of the application
Practice questions
• Try questions 12.1, 12.5, 12.7, 12.9.
• Answers will be provided next week.
• This is not an assessment.
• They are just for practice.