
MFIN6201
Empirical Techniques and Applications in
Finance
DOBRA
Week 5
March 19, 2018

Outline
• IV Regression: Why and What; Two Stage Least Squares
• The General IV Regression Model
• Checking Instrument Validity
– Weak and strong instruments
– Instrument exogeneity
• Application: Demand for cigarettes
• Examples: Where Do Instruments Come From?
MFIN6201 – Empirical Techniques and Applications in Finance

IV Regression: Why?
Three important threats to internal validity are:
• Omitted variable bias from a variable that is correlated with X but is unobserved (so cannot be included in the regression) and for which there are inadequate control variables;
• Simultaneous causality bias (X causes Y, Y causes X);
• Errors-in-variables bias (X is measured with error)
All three problems result in $E(u \mid X) \neq 0$.
• Instrumental variables regression can eliminate bias when $E(u \mid X) \neq 0$, using an instrumental variable (IV), Z.

IV Estimator with Single Regressor and Single Instrument
$Y_i = \beta_0 + \beta_1 X_i + u_i$
• IV regression breaks X into two parts: a part that might be correlated with u, and a part that is not. By isolating the part that is not correlated with u, it is possible to estimate $\beta_1$.
• This is done using an instrumental variable, $Z_i$, which is correlated with $X_i$ but uncorrelated with $u_i$.

Terminology: Endogeneity and Exogeneity
An endogenous variable is one that is correlated with u.
An exogenous variable is one that is uncorrelated with u.
In IV regression, we focus on the case that X is endogenous and there is an instrument, Z, which is exogenous.
Digression on terminology:
“Endogenous” literally means “determined within the system.” If X is jointly determined with Y, then a regression of Y on X is subject to simultaneous causality bias. But this definition of endogeneity is too narrow because IV regression can also be used to address OV bias and errors-in-variables bias. Thus we use the broader definition of endogeneity above.

Two Conditions for a Valid Instrument
$Y_i = \beta_0 + \beta_1 X_i + u_i$
For an instrumental variable (an “instrument”) Z to be valid, it must satisfy two conditions:
1. Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
1.a Exclusion restriction: $Z_i$ affects $Y_i$ only through $X_i$
2. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
Suppose for now that you have such a $Z_i$ (we’ll discuss how to find instrumental variables later).
How can you use $Z_i$ to estimate $\beta_1$?

IV estimator with one X and one Z
Explanation #1: Two Stage Least Squares (TSLS)
As it sounds, TSLS has two stages, that is, two regressions:
(1) Isolate the part of X that is uncorrelated with u by regressing X on Z using OLS:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
– Because $Z_i$ is uncorrelated with $u_i$, $\pi_0 + \pi_1 Z_i$ is uncorrelated with $u_i$. We don’t know $\pi_0$ or $\pi_1$ but we have estimated them, so …
– Compute the predicted values of $X_i$, where $\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$, $i = 1, \dots, n$.

Two Stage Least Squares, ctd.
(2) Replace $X_i$ by $\hat{X}_i$ in the regression of interest: regress $Y_i$ on $\hat{X}_i$ using OLS:
$Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i$
• Because $\hat{X}_i$ is uncorrelated with $u_i$, the first least squares assumption holds for regression (2). (This requires n to be large so that $\pi_0$ and $\pi_1$ are precisely estimated.)
• Thus, in large samples, $\beta_1$ can be estimated by OLS using regression (2)
• The resulting estimator is called the Two Stage Least Squares (TSLS) estimator, $\hat{\beta}_1^{TSLS}$.

Two Stage Least Squares: Summary
Suppose $Z_i$ satisfies the two conditions for a valid instrument:
1. Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
Two-stage least squares:
Stage 1: Regress $X_i$ on $Z_i$ (including an intercept), obtain the predicted values $\hat{X}_i$
Stage 2: Regress $Y_i$ on $\hat{X}_i$ (including an intercept); the coefficient on $\hat{X}_i$ is the TSLS estimator, $\hat{\beta}_1^{TSLS}$.
$\hat{\beta}_1^{TSLS}$ is a consistent estimator of $\beta_1$.
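The two stages can be sketched on simulated data. This is a minimal illustration, not the course's own code: the data-generating process, the coefficient values, and the helper name `ols` are all assumptions made for the example. X is built to be correlated with u, and Z is built to be relevant and exogenous.

```python
import numpy as np

def ols(y, x):
    """Return (intercept, slope) from an OLS regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1 = 1.0, 2.0            # hypothetical true coefficients

z = rng.normal(size=n)             # instrument: relevant and exogenous by construction
e = rng.normal(size=n)             # unobserved factor that drives both X and u
x = 0.5 * z + e + rng.normal(size=n)
u = e + rng.normal(size=n)         # corr(X, u) != 0, corr(Z, u) = 0
y = beta0 + beta1 * x + u

# OLS of Y on X is inconsistent here because X is endogenous
_, b_ols = ols(y, x)

# Stage 1: regress X on Z and form the predicted values
pi0_hat, pi1_hat = ols(x, z)
x_hat = pi0_hat + pi1_hat * z

# Stage 2: regress Y on the predicted values
_, b_tsls = ols(y, x_hat)

print(f"OLS slope:  {b_ols:.3f}")
print(f"TSLS slope: {b_tsls:.3f}")
```

In this design the OLS slope settles well above the true value of 2, while the TSLS slope is close to it. The standard errors from a hand-run second stage are not valid, so this sketch reports point estimates only.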

IV Estimator, one X and one Z, ctd.
Explanation #2: A direct algebraic derivation
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Thus:
$\mathrm{cov}(Y_i, Z_i) = \mathrm{cov}(\beta_0 + \beta_1 X_i + u_i, Z_i)$
$= \mathrm{cov}(\beta_0, Z_i) + \mathrm{cov}(\beta_1 X_i, Z_i) + \mathrm{cov}(u_i, Z_i)$
$= 0 + \mathrm{cov}(\beta_1 X_i, Z_i) + 0$
$= \beta_1 \mathrm{cov}(X_i, Z_i)$
where $\mathrm{cov}(u_i, Z_i) = 0$ by instrument exogeneity; thus
$\beta_1 = \frac{\mathrm{cov}(Y_i, Z_i)}{\mathrm{cov}(X_i, Z_i)}$

IV Estimator, one X and one Z, ctd.
$\beta_1 = \frac{\mathrm{cov}(Y_i, Z_i)}{\mathrm{cov}(X_i, Z_i)}$
The IV estimator replaces these population covariances with sample covariances:
$\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}$,
where $s_{YZ}$ and $s_{XZ}$ are the sample covariances. This is the TSLS estimator, just a different derivation!
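The claim that the covariance ratio and the two-stage procedure give the same number can be checked numerically. The following is a small sketch on simulated data; the design and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.8 * z + e
y = 1.0 + 2.0 * x + (e + rng.normal(size=n))

# Sample covariances s_YZ and s_XZ (np.cov uses the n-1 divisor by default)
s_yz = np.cov(y, z)[0, 1]
s_xz = np.cov(x, z)[0, 1]
b_ratio = s_yz / s_xz

# Two-stage version for comparison
s_zz = np.var(z, ddof=1)
pi1_hat = s_xz / s_zz                          # first-stage slope
x_hat = x.mean() + pi1_hat * (z - z.mean())    # first-stage fitted values
b_two_stage = np.cov(y, x_hat)[0, 1] / np.var(x_hat, ddof=1)

print(b_ratio, b_two_stage)
```

The two numbers agree up to floating-point error, because the covariance of Y with the fitted values is $\hat{\pi}_1 s_{YZ}$ and the variance of the fitted values is $\hat{\pi}_1^2 s_{ZZ}$.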

IV Estimator, one X and one Z, ctd.
Explanation #3: Derivation from the “reduced form”
The “reduced form” relates Y to Z and X to Z:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
where $w_i$ is an error term. Because Z is exogenous, Z is uncorrelated with both $v_i$ and $w_i$.
The idea: A unit change in $Z_i$ results in a change in $X_i$ of $\pi_1$ and a change in $Y_i$ of $\gamma_1$. Because that change in $X_i$ arises from the exogenous change in $Z_i$, that change in $X_i$ is exogenous. Thus an exogenous change in $X_i$ of $\pi_1$ units is associated with a change in $Y_i$ of $\gamma_1$ units, so the effect on Y of an exogenous unit change in X is $\beta_1 = \gamma_1 / \pi_1$.

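IV estimator from the reduced form, ctd.
The math:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
Solve the X equation for Z:
$Z_i = -\frac{\pi_0}{\pi_1} + \frac{1}{\pi_1} X_i - \frac{1}{\pi_1} v_i$
Substitute this into the Y equation and collect terms:
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
$= \gamma_0 + \gamma_1 \left[ -\frac{\pi_0}{\pi_1} + \frac{1}{\pi_1} X_i - \frac{1}{\pi_1} v_i \right] + w_i$
$= \left[ \gamma_0 - \frac{\pi_0 \gamma_1}{\pi_1} \right] + \frac{\gamma_1}{\pi_1} X_i + \left[ w_i - \frac{\gamma_1}{\pi_1} v_i \right]$
$= \beta_0 + \beta_1 X_i + u_i$
where $\beta_0 = \gamma_0 - \frac{\pi_0 \gamma_1}{\pi_1}$, $\beta_1 = \frac{\gamma_1}{\pi_1}$, and $u_i = w_i - \frac{\gamma_1}{\pi_1} v_i$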

IV estimator from the reduced form, ctd.
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
yields
$Y_i = \beta_0 + \beta_1 X_i + u_i$, where $\beta_1 = \frac{\gamma_1}{\pi_1}$
Interpretation: An exogenous change in $X_i$ of $\pi_1$ units is associated with a change in $Y_i$ of $\gamma_1$ units, so the effect on Y of an exogenous unit change in X is $\beta_1 = \gamma_1 / \pi_1$.

Example 1: Effect of Studying on Grades
What is the effect on grades of studying for an additional hour per day?
Y = GPA
X = study time (hours per day)
Data: grades and study hours of college freshmen.
Would you expect the OLS estimator of $\beta_1$ (the effect on GPA of studying an extra hour per day) to be unbiased? Why or why not?

Studying on grades, ctd.
Stinebrickner, Ralph and Stinebrickner, Todd R. (2008) “The Causal Effect of Studying on Academic Performance,” The B.E. Journal of Economic Analysis & Policy: Vol. 8: Iss. 1 (Frontiers), Article 14.
• n = 210 freshmen at Berea College (Kentucky) in 2001
• Y = first-semester GPA
• X = average study hours per day (time use survey)
• Roommates were randomly assigned
• Z = 1 if roommate brought video game, = 0 otherwise
Do you think $Z_i$ (whether a roommate brought a video game) is a valid instrument?
1. Is it relevant (correlated with X)?
2. Is it exogenous (uncorrelated with u)?

Studying on grades, ctd.
$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \gamma_0 + \gamma_1 Z_i + w_i$
Y = GPA (4 point scale)
X = time spent studying (hours per day)
Z = 1 if roommate brought video game, = 0 otherwise
Stinebrickner and Stinebrickner’s findings:
$\hat{\pi}_1 = -0.668$
$\hat{\gamma}_1 = -0.241$
$\hat{\beta}_1^{IV} = \frac{\hat{\gamma}_1}{\hat{\pi}_1} = \frac{-0.241}{-0.668} = 0.36$
What are the units? Do these estimates make sense in a real-world way? (Note: They actually ran the regressions including additional regressors; more on this later.)
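The ratio of the two reduced-form estimates can be verified with a line of arithmetic. In this small sketch the variable names are illustrative, and both estimates are entered as negative numbers, which is an assumption about sign that leaves the ratio unchanged.

```python
# Reduced-form estimates from the slide, entered as negative values
# (assumption: a roommate's video game lowers both study time and GPA)
pi1_hat = -0.668      # change in study hours per day when Z goes from 0 to 1
gamma1_hat = -0.241   # change in GPA points when Z goes from 0 to 1

# IV estimate: GPA points per additional hour of study per day
beta1_iv = gamma1_hat / pi1_hat
print(round(beta1_iv, 2))  # 0.36
```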

Example 2: Supply and demand for butter
IV regression was first developed to estimate demand elasticities for agricultural goods, for example, butter:
$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$
• $\beta_1$ = price elasticity of butter = percent change in quantity for a 1% change in price
• Data: observations on price and quantity of butter for different years
• The OLS regression of $\ln(Q_i^{butter})$ on $\ln(P_i^{butter})$ suffers from simultaneous causality bias (why?)

Simultaneous causality bias in the OLS regression of $\ln(Q_i^{butter})$ on $\ln(P_i^{butter})$ arises because price and quantity are determined by the interaction of demand and supply.

This interaction of demand and supply produces data like …
Would a regression using these data produce the demand curve?

But…what would you get if only supply shifted?
• TSLS estimates the demand curve by isolating shifts in price and quantity that arise from shifts in supply.
• Z is a variable that shifts supply but not demand.

TSLS in the supply-demand example:
$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$
Let Z = rainfall in dairy-producing regions. Is Z a valid instrument?
(1) Relevant? $\mathrm{corr}(rain_i, \ln(P_i^{butter})) \neq 0$?
Plausibly: insufficient rainfall means less grazing means less butter means higher prices
(2) Exogenous? $\mathrm{corr}(rain_i, u_i) = 0$?
Plausibly: whether it rains in dairy-producing regions shouldn’t affect demand for butter

TSLS in the supply-demand example:
$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$
$Z_i = rain_i$ = rainfall in dairy-producing regions.
Stage 1: regress $\ln(P_i^{butter})$ on rain, get $\widehat{\ln(P_i^{butter})}$
– $\widehat{\ln(P_i^{butter})}$ isolates changes in log price that arise from supply (part of supply, at least)
– The regression counterpart of using shifts in the supply curve to trace out the demand curve.
Stage 2: regress $\ln(Q_i^{butter})$ on $\widehat{\ln(P_i^{butter})}$
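The supply-and-demand logic can be illustrated with a simulated market. This is a sketch under assumed parameter values (a demand elasticity of -1, a supply elasticity of 0.5, and a rainfall effect of 1 on log supply); none of these numbers come from butter data, and intercepts are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
beta1 = -1.0    # hypothetical demand elasticity
delta = 0.5     # hypothetical supply elasticity
theta = 1.0     # hypothetical effect of rainfall on log supply

rain = rng.normal(size=n)    # supply shifter, used as the instrument
u = rng.normal(size=n)       # demand shock
v = rng.normal(size=n)       # supply shock

# Demand: q = beta1 * p + u ; Supply: q = delta * p + theta * rain + v
# Equating the two gives the equilibrium log price, then log quantity
p = (u - v - theta * rain) / (delta - beta1)
q = beta1 * p + u

# OLS slope of q on p mixes movements along both curves
b_ols = np.cov(q, p)[0, 1] / np.var(p, ddof=1)

# IV slope uses only the price variation induced by rainfall
b_iv = np.cov(q, rain)[0, 1] / np.cov(p, rain)[0, 1]

print(f"OLS: {b_ols:.3f}, IV: {b_iv:.3f}")
```

In this design the OLS slope converges to about -0.5, which is neither the demand nor the supply elasticity, while the IV slope recovers the demand elasticity of -1.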

Example 3: Test scores and class size
• The California test score/class size regressions still could have OV bias (e.g. parental involvement).
• In principle, this bias can be eliminated by IV regression (TSLS).
• IV regression requires a valid instrument, that is, an instrument that is:
1. relevant: $\mathrm{corr}(Z_i, STR_i) \neq 0$?
2. exogenous: $\mathrm{corr}(Z_i, u_i) = 0$?
Example 3: Test scores and class size, ctd.
Here is a (hypothetical) instrument:
• some districts, randomly hit by an earthquake, “double up”
classrooms: Zi = Quakei = 1 if hit by quake, = 0 otherwise
• Do the two conditions for a valid instrument hold?
• The earthquake makes it as if the districts were in a random assignment experiment. Thus, the variation in STR arising from the earthquake is exogenous.
• The first stage of TSLS regresses STR against Quake, thereby isolating the part of STR that is exogenous (the part that is “as if” randomly assigned)

Inference using TSLS
• In large samples, the sampling distribution of the TSLS estimator is normal
• Inference (hypothesis tests, confidence intervals) proceeds in the usual way, e.g. $\hat{\beta}_1^{TSLS} \pm 1.96\,SE$
• The idea behind the large-sample normal distribution of the TSLS estimator is that – like all the other estimators we have considered – it involves an average of mean zero i.i.d. random variables, to which we can apply the CLT.
• Here is a sketch of the math (see SW App. 12.3 for the details) …

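$\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}} = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(Z_i - \bar{Z})}{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z})} = \frac{\sum_{i=1}^{n} Y_i (Z_i - \bar{Z})}{\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$
Substitute in $Y_i = \beta_0 + \beta_1 X_i + u_i$ and simplify:
$\hat{\beta}_1^{TSLS} = \frac{\beta_1 \sum_{i=1}^{n} X_i (Z_i - \bar{Z}) + \sum_{i=1}^{n} u_i (Z_i - \bar{Z})}{\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$
so …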

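$\hat{\beta}_1^{TSLS} = \beta_1 + \frac{\sum_{i=1}^{n} u_i (Z_i - \bar{Z})}{\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$
so
$\hat{\beta}_1^{TSLS} - \beta_1 = \frac{\sum_{i=1}^{n} u_i (Z_i - \bar{Z})}{\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$
Multiply through by $\sqrt{n}$:
$\sqrt{n} (\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (Z_i - \bar{Z}) u_i}{\frac{1}{n} \sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$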

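$\sqrt{n} (\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (Z_i - \bar{Z}) u_i}{\frac{1}{n} \sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$
• $\frac{1}{n} \sum_{i=1}^{n} X_i (Z_i - \bar{Z}) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z}) \xrightarrow{p} \mathrm{cov}(X, Z) \neq 0$
• $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (Z_i - \bar{Z}) u_i$ is distributed $N(0, \mathrm{var}[(Z - \mu_Z) u])$ (CLT)
so: $\hat{\beta}_1^{TSLS}$ is approx. distributed $N(\beta_1, \sigma^2_{\hat{\beta}_1^{TSLS}})$, where
$\sigma^2_{\hat{\beta}_1^{TSLS}} = \frac{1}{n} \frac{\mathrm{var}[(Z_i - \mu_Z) u_i]}{[\mathrm{cov}(Z_i, X_i)]^2}$
where $\mathrm{cov}(X, Z) \neq 0$ because the instrument is relevant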
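The large-sample variance formula can be compared with a small Monte Carlo experiment. This is a sketch under an assumed design in which Z, e, and the extra noise are independent standard normals, so that $\mathrm{var}[(Z - \mu_Z)u] = 2$ and $\mathrm{cov}(Z, X) = 1$; the sample size and number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 2_000
beta1 = 2.0                      # hypothetical true slope
estimates = np.empty(reps)

for r in range(reps):
    z = rng.normal(size=n)
    e = rng.normal(size=n)
    x = z + e                    # cov(Z, X) = 1
    u = e + rng.normal(size=n)   # var(u) = 2, independent of Z
    y = 1.0 + beta1 * x + u
    estimates[r] = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]

# Formula: sigma^2 = (1/n) * var[(Z - mu_Z) u] / cov(Z, X)^2 = (1/n) * 2 / 1
asymptotic_sd = np.sqrt(2.0 / n)
print(estimates.mean(), estimates.std(ddof=1), asymptotic_sd)
```

The standard deviation of the simulated estimates should be close to the value implied by the formula, and their mean close to the true slope.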

Inference using TSLS, ctd.
$\hat{\beta}_1^{TSLS}$ is approx. distributed $N(\beta_1, \sigma^2_{\hat{\beta}_1^{TSLS}})$
• Statistical inference proceeds in the usual way.
• The justification is (as usual) based on large samples
• This all assumes that the instruments are valid; we’ll discuss what happens if they aren’t valid shortly.
• Important note on standard errors:
– The OLS standard errors from the second stage regression aren’t right: they don’t take into account the estimation in the first stage ($\hat{X}_i$ is estimated).
– Instead, use a single specialized command that computes the TSLS estimator and the correct SEs.
– As usual, use heteroskedasticity-robust SEs
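To see why the second-stage OLS standard error is not the right one, the sketch below computes both a naive second-stage SE and a heteroskedasticity-robust SE based on the large-sample variance formula for one X and one Z. The simulated design and all variable names are assumptions for illustration; in practice a dedicated TSLS routine would be used.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = z + e
u = e + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

zc = z - z.mean()
s_xz = np.cov(x, z)[0, 1]
b1 = np.cov(y, z)[0, 1] / s_xz          # TSLS slope
b0 = y.mean() - b1 * x.mean()           # TSLS intercept

# Correct residuals use the actual X, not the first-stage fitted values
u_hat = y - b0 - b1 * x
se_robust = np.sqrt(np.mean(zc**2 * u_hat**2) / n) / abs(s_xz)

# Naive second-stage OLS SE: residuals are built from X-hat instead of X
x_hat = x.mean() + (s_xz / np.var(z, ddof=1)) * zc
resid_naive = y - b0 - b1 * x_hat
se_naive = np.sqrt(resid_naive @ resid_naive / (n - 2)
                   / np.sum((x_hat - x_hat.mean())**2))

print(f"robust TSLS SE: {se_robust:.4f}, naive second-stage SE: {se_naive:.4f}")
```

In this particular design the naive SE is too large; in other designs it can be too small. Either way it is built from the wrong residuals.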

Example 4: Demand for Cigarettes
$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + u_i$
Why is the OLS estimator of $\beta_1$ likely to be biased?
• Data set: Panel data on annual cigarette consumption and average prices paid (including tax), by state, for the 48 continental US states, 1985-1995.
• Proposed instrumental variable:
– $Z_i$ = general sales tax per pack in the state = $SalesTax_i$
– Do you think this instrument is plausibly valid?
* Relevant? $\mathrm{corr}(SalesTax_i, \ln(P_i^{cigarettes})) \neq 0$?
* Exogenous? $\mathrm{corr}(SalesTax_i, u_i) = 0$?
Cigarette demand, ctd.
For now, use data from 1995 only.
First stage OLS regression:
$\widehat{\ln(P_i^{cigarettes})} = 4.63 + 0.031\,SalesTax_i$, n = 48
Second stage OLS regression:
$\widehat{\ln(Q_i^{cigarettes})} = 9.72 - 1.08\,\widehat{\ln(P_i^{cigarettes})}$, n = 48
Combined TSLS regression with correct, heteroskedasticity-robust standard errors:
$\widehat{\ln(Q_i^{cigarettes})} = 9.72 - 1.08\,\widehat{\ln(P_i^{cigarettes})}$, n = 48
SEs: (1.49) (0.31)
STATA Example: Cigarette demand, First stage
Instrument = Z = rtaxso = general sales tax (real $/pack)

Second stage

Combined into a single command:

Summary of IV Regression with a Single X and Z
• A valid instrument Z must satisfy two conditions:
1. relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
• TSLS proceeds by first regressing X on Z to get $\hat{X}$, then regressing Y on $\hat{X}$
• The key idea is that the first stage isolates part of the variation in X that is uncorrelated with u
• If the instrument is valid, then the large-sample sampling distribution of the TSLS estimator is normal, so inference proceeds as usual
General IV Regression Model (SW Section 12.2)
• So far we have considered IV regression with a single endogenous regressor (X) and a single instrument (Z).
• We need to extend this to:
– multiple endogenous regressors (X1,…,Xk)
– multiple included exogenous variables (W1,…,Wr) or control variables, which need to be included for the usual OV reason
– multiple instrumental variables ($Z_1, \dots, Z_m$). More (relevant) instruments can produce a smaller variance of TSLS: the $R^2$ of the first stage increases, so you have more variation in $\hat{X}$.
• New terminology: identification & overidentification
Identification
• In general, a parameter is said to be identified if different values of the parameter produce different distributions of the data.
• In IV regression, whether the coefficients are identified depends on the relation between the number of instruments (m) and the number of endogenous regressors (k)
• Intuitively, if there are fewer instruments than endogenous regressors, we can’t estimate $\beta_1, \dots, \beta_k$
– For example, suppose k = 1 but m = 0 (no instruments)!
Identification, ctd.
The coefficients $\beta_1, \dots, \beta_k$ are said to be:
• exactly identified if m = k.
There are just enough instruments to estimate $\beta_1, \dots, \beta_k$.
• overidentified if m > k.
There are more than enough instruments to estimate $\beta_1, \dots, \beta_k$. If so, you can test whether the instruments are valid (a test of the “overidentifying restrictions”); we’ll return to this later
• underidentified if m < k.
There are too few instruments to estimate $\beta_1, \dots, \beta_k$. If so, you need to get more instruments!

The General IV Regression Model: Summary of Jargon
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$
• $Y_i$ is the dependent variable
• $X_{1i}, \dots, X_{ki}$ are the endogenous regressors (potentially correlated with $u_i$)
• $W_{1i}, \dots, W_{ri}$ are the included exogenous regressors (uncorrelated with $u_i$) or control variables (included so that $Z_i$ is uncorrelated with $u_i$, once the W’s are included)
• $\beta_0, \beta_1, \dots, \beta_{k+r}$ are the unknown regression coefficients
• $Z_{1i}, \dots, Z_{mi}$ are the m instrumental variables (the excluded exogenous variables)
• The coefficients are overidentified if m > k; exactly identified if m = k; and underidentified if m < k.

TSLS with a Single Endogenous Regressor
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$
• m instruments: $Z_{1i}, \dots, Z_{mi}$
• First stage
– Regress $X_1$ on all the exogenous regressors: regress $X_1$ on $W_1, \dots, W_r, Z_1, \dots, Z_m$, and an intercept, by OLS
– Compute predicted values $\hat{X}_{1i}$, $i = 1, \dots, n$
• Second stage
– Regress Y on $\hat{X}_1, W_1, \dots, W_r$, and an intercept, by OLS
– The coefficients from this second stage regression are the TSLS estimators, but SEs are wrong
• To get correct SEs, do this in a single step in your regression software

Example 4: Demand for cigarettes, ctd.
Suppose income is exogenous (this is plausible; why?), and we also want to estimate the income elasticity:
$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + \beta_2 \ln(Income_i) + u_i$
We actually have two instruments:
$Z_{1i}$ = general sales tax$_i$
$Z_{2i}$ = cigarette-specific tax$_i$
• Endogenous variable: $\ln(P_i^{cigarettes})$ (“one X”)
• Included exogenous variable: $\ln(Income_i)$ (“one W”)
• Instruments (excluded exogenous variables): general sales tax, cigarette-specific tax (“two Zs”)
• Is $\beta_1$ over-, under-, or exactly identified?

Example: Cigarette demand, one instrument

Example: Cigarette demand, two instruments

TSLS estimates, Z = sales tax (m = 1)
$\widehat{\ln(Q_i^{cigarettes})} = 9.43 - 1.14 \ln(P_i^{cigarettes}) + 0.21 \ln(Income_i)$
SEs: (1.26) (0.37) (0.31)
TSLS estimates, Z = sales tax & cig-only tax (m = 2)
$\widehat{\ln(Q_i^{cigarettes})} = 9.89 - 1.28 \ln(P_i^{cigarettes}) + 0.28 \ln(Income_i)$
SEs: (0.96) (0.25) (0.25)
• Smaller SEs for m = 2. Using 2 instruments gives more information: more “as-if random variation.”
• Low income elasticity (not a luxury good); income elasticity not statistically significantly different from 0
• Surprisingly high price elasticity

The IV Regression Assumptions
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$
1. $E(u_i \mid W_{1i}, \dots, W_{ri}) = 0$
– #1 says “the exogenous regressors are exogenous.”
2. $(Y_i, X_{1i}, \dots, X_{ki}, W_{1i}, \dots, W_{ri}, Z_{1i}, \dots, Z_{mi})$ are i.i.d.
– #2 is not new
3. The Xs, Ws, Zs, and Y have nonzero, finite 4th moments
– #3 is not new
4. The instruments ($Z_{1i}, \dots, Z_{mi}$) are valid.
– We have discussed this
• Under 1-4, TSLS and its t-statistic are normally distributed in large samples
• The critical requirement is that the instruments be valid

Example 1: Effect of studying on grades, ctd.
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Y = first-semester GPA
X = average study hours per day
Z = 1 if roommate brought video game, = 0 otherwise
Roommates were randomly assigned
Can you think of a reason that Z might be correlated with u, even though it is randomly assigned? What else enters the error term, that is, what are other determinants of grades, beyond time spent studying?

Example 1: Effect of studying on grades, ctd.
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Why might Z be correlated with u?
• Here’s a hypothetical possibility: gender. Suppose:
– Women get better grades than men, holding constant hours spent studying
– Men are more likely to bring a video game than women
– Then $\mathrm{corr}(Z_i, u_i) < 0$ (males are more likely to have a [male] roommate who brings a video game, but males also tend to have lower grades, holding constant the amount of studying).
• This is just a version of OV bias. The solution to OV bias is to control for (or include) the OV, in this case gender.

Example 1: Effect of studying on grades, ctd.
• This logic leads you to include W = gender as a control variable in the IV regression:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$
• The TSLS estimate reported above is from a regression that included gender as a W variable, along with other variables such as individual i’s major.

Recall the two requirements for valid instruments:
1. Relevance (special case of one X)
At least one instrument must enter the population counterpart of the first stage regression.
2. Exogeneity
• All the instruments must be uncorrelated with the error term: $\mathrm{corr}(Z_{1i}, u_i) = 0, \dots, \mathrm{corr}(Z_{mi}, u_i) = 0$
What happens if one of these requirements isn’t satisfied? How can you check? What do you do? If you have multiple instruments, which should you use?

Checking Assumption 1: Instrument Relevance
We will focus on a single included endogenous regressor:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$
First stage regression:
$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i$
• The instruments are relevant if at least one of $\pi_1, \dots, \pi_m$ is nonzero.
• The instruments are said to be weak if all the $\pi_1, \dots, \pi_m$ are either zero or nearly zero.
• Weak instruments explain very little of the variation in X, beyond that explained by the W’s

What are the consequences of weak instruments?
If instruments are weak, the sampling distribution of TSLS and its t-statistic are not (at all) normal, even with n large.
Consider the simplest case:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \pi_0 + \pi_1 Z_i + v_i$
• The IV estimator is $\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}$
• If cov(X,Z) is zero or small, then $s_{XZ}$ will be small: With weak instruments, the denominator is nearly zero.
• If so, the sampling distribution of $\hat{\beta}_1^{TSLS}$ (and its t-statistic) is not well approximated by its large-n normal approximation ...

Why does our trusty normal approximation fail us?
$\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}$
• If cov(X,Z) is small, small changes in $s_{XZ}$ (from one sample to the next) can induce big changes in $\hat{\beta}_1^{TSLS}$
• Suppose in one sample you calculate $s_{XZ}$ = .00001 . . .
• Thus the large-n normal approximation is a poor approximation to the sampling distribution of $\hat{\beta}_1^{TSLS}$
• A better approximation is that $\hat{\beta}_1^{TSLS}$ is distributed as the ratio of two correlated normal random variables (see SW App. 12.4)
• If instruments are weak, the usual methods of inference are unreliable, potentially very unreliable.

Measuring the Strength of Instruments in Practice: The First-Stage F-statistic
• The first stage regression (one X): regress X on $Z_1, \dots, Z_m, W_1, \dots, W_r$.
• Totally irrelevant instruments $\Leftrightarrow$ all the coefficients on $Z_1, \dots, Z_m$ are zero.
• The first-stage F-statistic tests the hypothesis that $Z_1, \dots, Z_m$ do not enter the first stage regression.
• Weak instruments imply a small first stage F-statistic.

Checking for Weak Instruments with a Single X
• Compute the first-stage F-statistic.
Rule-of-thumb: If the first stage F-statistic is less than 10, then the set of instruments is weak.
• If so, the TSLS estimator will be biased, and statistical inferences (standard errors, hypothesis tests, confidence intervals) can be misleading.

What to do if you have weak instruments?
• Get better instruments (often easier said than done!)
• If you have many instruments, some are probably weaker than others and it’s a good idea to drop the weaker ones (dropping an irrelevant instrument will increase the first-stage F)
• If you only have a few instruments, and all are weak, then you need to do some IV analysis other than TSLS ...

Checking Assumption 2: Instrument Exogeneity
• Instrument exogeneity: All the instruments are uncorrelated with the error term: $\mathrm{corr}(Z_{1i}, u_i) = 0, \dots, \mathrm{corr}(Z_{mi}, u_i) = 0$
• If the instruments are correlated with the error term, the first stage of TSLS cannot isolate a component of X that is uncorrelated with the error term, so $\hat{X}$ is correlated with u and TSLS is inconsistent.
• If there are more instruments than endogenous regressors, it is possible to test, partially, for instrument exogeneity.

Testing Overidentifying Restrictions
Consider the simplest case:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
• Suppose there are two valid instruments: $Z_{1i}, Z_{2i}$
• Then you could compute two separate TSLS estimates.
• Intuitively, if these 2 TSLS estimates are very different from each other, then something must be wrong: one or the other (or both) of the instruments must be invalid.
• The J-test of overidentifying restrictions makes this comparison in a statistically precise way.
• This can only be done if #Z’s > #X’s (overidentified).

The J-test of Overidentifying Restrictions
Suppose # instruments = m > # X’s = k (overidentified)
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$
The J-test is the Anderson-Rubin test, using the TSLS estimator instead of the hypothesized value $\beta_{1,0}$. The recipe:
• First estimate the equation of interest using TSLS and all m instruments; compute the predicted values $\hat{Y}_i$, using the actual X’s (not the $\hat{X}$’s used to estimate the second stage)
• Compute the residuals $\hat{u}_i = Y_i - \hat{Y}_i$
• Regress $\hat{u}_i$ against $Z_{1i}, \dots, Z_{mi}, W_{1i}, \dots, W_{ri}$.
• Compute the F-statistic testing the hypothesis that the coefficients on $Z_{1i}, \dots, Z_{mi}$ are all zero;
• The J-statistic is J = mF

The J-test, ctd
J = mF, where F = the F-statistic testing the coefficients on $Z_{1i}, \dots, Z_{mi}$ in a regression of the TSLS residuals against $Z_{1i}, \dots, Z_{mi}, W_{1i}, \dots, W_{ri}$.
Distribution of the J-statistic
• Under the null hypothesis that all the instruments are exogenous, J has a chi-squared distribution with m − k degrees of freedom
• If some instruments are exogenous and others are endogenous, the J statistic will be large, and the null hypothesis that all instruments are exogenous will be rejected.
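The J-test recipe can be sketched for the simplest overidentified case: one endogenous X, two instruments, and no W’s. This is an illustration only; the function names `fit` and `j_statistic` and the simulated data are assumptions, and the F-statistic used here is the homoskedasticity-only version.

```python
import numpy as np

def fit(y, X):
    """OLS coefficients and sum of squared residuals for y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid @ resid

def j_statistic(y, x, Z):
    """J = m * F for one endogenous regressor x and instruments Z (n x m), no W's."""
    n, m = Z.shape
    const = np.ones(n)
    ZX = np.column_stack([const, Z])
    # TSLS: first-stage fitted values, then second stage
    pi_hat, _ = fit(x, ZX)
    x_hat = ZX @ pi_hat
    b, _ = fit(y, np.column_stack([const, x_hat]))
    # TSLS residuals use the actual x, not x_hat
    u_hat = y - b[0] - b[1] * x
    # Homoskedasticity-only F for the coefficients on Z in u_hat on (1, Z)
    _, ssr_u = fit(u_hat, ZX)
    _, ssr_r = fit(u_hat, const[:, None])
    F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return m * F

rng = np.random.default_rng(5)
n = 5_000
Z = rng.normal(size=(n, 2))
e = rng.normal(size=n)
w = rng.normal(size=n)
x = Z[:, 0] + Z[:, 1] + e

y_valid = 1.0 + 2.0 * x + (e + w)                    # both instruments exogenous
y_invalid = 1.0 + 2.0 * x + (e + w + 0.5 * Z[:, 1])  # second instrument enters u

CRIT_5PCT_CHI2_1 = 3.84   # 5% critical value, chi-squared with m - k = 1 d.f.
j_valid = j_statistic(y_valid, x, Z)
j_invalid = j_statistic(y_invalid, x, Z)
print(j_valid, j_invalid, CRIT_5PCT_CHI2_1)
```

With one instrument deliberately made endogenous, J is far above the critical value. With both instruments exogenous, J behaves like a chi-squared draw with one degree of freedom, so it exceeds 3.84 only about 5% of the time.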

Checking Instrument Validity: Summary
This summary considers the case of a single X. The two requirements for valid instruments are:
1. Relevance
– At least one instrument must enter the population counterpart of the first stage regression.
– If instruments are weak, then the TSLS estimator is biased and the t-statistic has a non-normal distribution
– To check for weak instruments with a single included endogenous regressor, check the first-stage F
– If F > 10, instruments are strong: use TSLS
– If F < 10, weak instruments: take some action.

2. Exogeneity
• All the instruments must be uncorrelated with the error term: $\mathrm{corr}(Z_{1i}, u_i) = 0, \dots, \mathrm{corr}(Z_{mi}, u_i) = 0$
• We can partially test for exogeneity: if m > 1, we can test the null hypothesis that all the instruments are exogenous, against the alternative that as many as m − 1 are endogenous (correlated with u)
• The test is the J-test, which is constructed using the TSLS residuals.
• If the J-test rejects, then at least some of your instruments are endogenous, so you must make a difficult decision and jettison some (or all) of your instruments.
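The first-stage F check can be sketched as follows. The helper `first_stage_F` is a hypothetical name, the data are simulated, and the statistic is the homoskedasticity-only F; a robust version would normally come from regression software.

```python
import numpy as np

def first_stage_F(x, Z, W=None):
    """Homoskedasticity-only F-statistic for H0: all coefficients on the columns
    of Z are zero in a regression of x on (1, W, Z)."""
    n, m = Z.shape
    cols_r = [np.ones(n)] if W is None else [np.ones(n), W]
    Xr = np.column_stack(cols_r)          # restricted: intercept and W's only
    Xu = np.column_stack([Xr, Z])         # unrestricted: adds the instruments

    def ssr(X):
        coef, *_ = np.linalg.lstsq(X, x, rcond=None)
        r = x - X @ coef
        return r @ r

    ssr_r, ssr_u = ssr(Xr), ssr(Xu)
    return ((ssr_r - ssr_u) / m) / (ssr_u / (n - Xu.shape[1]))

rng = np.random.default_rng(6)
n = 500
Z = rng.normal(size=(n, 3))
e = rng.normal(size=n)

x_strong = Z @ np.array([0.5, 0.5, 0.5]) + e     # instruments clearly relevant
x_weak = Z @ np.array([0.01, 0.01, 0.01]) + e    # instruments nearly irrelevant

F_strong = first_stage_F(x_strong, Z)
F_weak = first_stage_F(x_weak, Z)
print(f"strong: F = {F_strong:.1f}, weak: F = {F_weak:.1f}")
```

Under the rule of thumb, the first design passes (F well above 10) and the second fails.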

How should we interpret the J-test rejection?
• J-test rejects the null hypothesis that both the instruments are exogenous
• This means that either rtaxso is endogenous, or rtax is endogenous, or both!
• The J-test doesn’t tell us which! You must exercise judgment
• Why might rtax (cig-only tax) be endogenous?
– Political forces: history of smoking or lots of smokers $\Rightarrow$ political pressure for low cigarette taxes
– If so, cig-only tax is endogenous
• This reasoning doesn’t apply to general sales tax
• $\Rightarrow$ use just one instrument, the general sales tax

The Demand for Cigarettes: Summary of Empirical Results
• Use the estimated elasticity based on TSLS with the general sales tax as the only instrument:
Elasticity = -.94, SE = .21
• This elasticity is surprisingly large (not inelastic) – a 1% increase in prices reduces cigarette sales by nearly 1%. This is much more elastic than conventional wisdom in the health economics literature.
• This is a long-run (ten-year change) elasticity. What would you expect a short-run (one-year change) elasticity to be – more or less elastic?

Where Do Valid Instruments Come From?
General comments
The hard part of IV analysis is finding valid instruments
– Method 1: “variables in another equation” (e.g. supply shifters that do not affect demand)
– Method 2: look for exogenous variation (Z) that is “as if” randomly assigned (does not directly affect Y) but affects X.
– These two methods are different ways to think about the same issues; see the link:
* Rainfall shifts the supply curve for butter but not the demand curve; rainfall is “as if” randomly assigned
* Sales tax shifts the supply curve for cigarettes but not the demand curve; sales taxes are “as if” randomly assigned

Conclusion (SW Section 12.6)
• A valid instrument lets us isolate a part of X that is uncorrelated with u, and that part can be used to estimate the effect of a change in X on Y
• IV regression hinges on having valid instruments:
– Relevance: Check via first-stage F
– Researchers must argue the validity of the exclusion restriction
– Exogeneity: Test overidentifying restrictions via the J-statistic
• A valid instrument isolates variation in X that is “as if” randomly assigned.
• The critical requirement of at least m valid instruments cannot be tested; you must use your head.

Some IV FAQs
1. When might I want to use IV regression?
Any time that X is correlated with u and you have a valid instrument. The primary reasons for correlation between X and u could be:
• Omitted variable(s) that lead to OV bias
Example: ability bias in returns to education
• Measurement error
Example: measurement error in years of education
• Selection bias
Example: Patients select treatment
• Simultaneous causality bias
Example: supply and demand for butter, cigarettes

2. Threats to the internal validity of an IV regression?
• The main threat to the internal validity of IV is the failure of the assumption of valid instruments. Given a set of control variables W, instruments are valid if they are relevant and exogenous.
– Instrument relevance can be assessed by checking if instruments are weak or strong: Is the first-stage F-statistic > 10?
– Instrument exogeneity can be checked using the J-statistic – as long as you have m exogenous instruments to start with! In general, instrument exogeneity must be assessed using expert knowledge of the application

Practice questions
• Try questions 12.1, 12.5, 12.7, 12.9.
• Answers will be provided next week.
• This is not an assessment.
• They are just for practice.