An Introduction to Causality
Chris Hansman
Empirical Finance: Methods and Applications Imperial College Business School
January 17-18, 2022
Last Week’s Lecture: Two Parts
(1) Introduction to the conditional expectation function (CEF)
(2) Ordinary Least Squares and the CEF
Today’s Lecture: Four Parts
(1) Analyzing an experiment in R
Comparing means (t-test) and regression
(2) Causality and the potential outcomes framework
What do we mean when we say X causes Y?
(3) Linear regression, the CEF, and causality
How do we think about causality in a regression framework?
(4) Instrumental Variables (if time)
The intuition behind instrumental variables
Part 1: Analyzing an Experiment
Last year I performed an experiment for my MODES scores
Can I bribe students to get great teaching evaluations?
Two sections: morning (9:00-12:00) vs. afternoon (1:00-4:00)
Evaluation day: gave candy only to the morning students
Compared evaluations (scored 1-5) across the two sections
Part 1: Analyzing an Experiment
Let’s define a few variables:
yi : Teaching evaluation for student i from 1-5
Di : Treatment status (candy vs. no candy)
Di = 1 if student received candy
Di = 0 otherwise
How do we see if the bribe was effective?
Are evaluations higher, on average, for students who got candy?
E[yi|Di = 1] > E[yi|Di = 0]
Equivalently:
E[yi|Di = 1]−E[yi|Di = 0] > 0
Plotting the Difference in (Conditional) Means
[Figure: MODES scores by treatment group (candy or not)]
Estimating the Difference in (Conditional) Means
Two exercises in R
What is the difference in means between the two groups?
What is the magnitude of the t-statistic from a t-test for a difference in these means (two-sample, equal variance)?
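A minimal sketch of both exercises in R, assuming a hypothetical data frame modes with a numeric score column (the 1-5 evaluation) and a 0/1 candy treatment indicator:

```r
# Difference in means: treated (candy) minus control
mean_diff <- mean(modes$score[modes$candy == 1]) -
  mean(modes$score[modes$candy == 0])
mean_diff

# Two-sample t-test assuming equal variances
# (t.test reports mean of group 0 minus mean of group 1, so the sign flips)
tt <- t.test(score ~ factor(candy), data = modes, var.equal = TRUE)
tt$statistic
```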
Regression Provides a Simple Way to Analyze an Experiment
yi = β0 + β1 Di + vi
β1 gives the difference in means
The t-statistic on β1 is equivalent to the (two-sample, equal-variance) t-test
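The same comparison as a regression, again using the hypothetical modes data frame from above; the slope on candy is the treated-minus-control difference in means, and its t-statistic matches the equal-variance t-test up to sign:

```r
# Intercept = control-group mean; slope on candy = difference in means
fit <- lm(score ~ candy, data = modes)
summary(fit)
coef(summary(fit))["candy", c("Estimate", "t value")]
```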
Part 2: Causality and the potential outcomes framework
Differences in conditional means often represent correlations, not causal effects
The potential outcomes framework helps us define a notion of causality
And understand the assumptions necessary for conditional expectations to reflect causal effects
Experiments allow the CEF to capture causal effects
What's really key is a certain assumption: conditional independence
Conditional Means and Causality
In economics/finance we often want more than conditional means
Interested in causal questions:
Does a change in X cause a change in Y?
How do corporate acquisitions affect the value of the acquirer?
How does a firm’s capital structure impact investment?
Does corporate governance affect firm performance?
Thinking Formally About Causality
Consider the example we just studied
Compare evaluations (1-5) for two (equal sized) groups: treated with candy vs. not treated

Group      Sample Size   Evaluation   Standard Error
No Candy   80            3.32         0.065
Candy      80            4.33         0.092

Treated group provided significantly higher evaluations
Difference in conditional means:
E[yi|Di = 1] − E[yi|Di = 0] = 1.01
So does candy cause higher scores?
Any Potential Reasons Why This Might not be Causal?
What if students in the morning class would have given me higher ratings even without candy?
Maybe I teach better in the morning?
Or morning students are more generous
We call this a “selection effect”
What if students in the morning respond better to candy?
Perhaps they are hungrier
For both of these, we would need to answer the following question:
What scores would morning students have given me without candy?
I’ll never know…
The Potential Outcomes Framework
Ideally, how would we find the impact of candy on evaluations (yi)?
Imagine we had access to two parallel universes and could observe:
The exact same student (i)
At the exact same time
In one universe they receive candy; in the other they do not
And suppose we could see the student's evaluations in both worlds
Define the variables we would like to see, for each individual i:
yi1 = evaluation with candy
yi0 = evaluation without candy
The Potential Outcomes Framework
If we could see both yi1 and yi0, the impact would be easy to find:
The causal effect or treatment effect for individual i is defined as yi1 − yi0
This would answer our question, but we never see both yi1 and yi0!
Some people call this the "fundamental problem of causal inference"
Intuition: there are two "potential" worlds out there
The treatment variable Di decides which one we see:
yi = yi1 if Di = 1
yi = yi0 if Di = 0
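A small simulated sketch of this switching equation (all numbers made up): each student has both potential outcomes, but the observed yi reveals only one of them.

```r
set.seed(1)
n  <- 160
y0 <- rnorm(n, mean = 3.3, sd = 0.6)  # evaluation without candy
y1 <- y0 + 1                          # evaluation with candy
D  <- rbinom(n, 1, 0.5)               # treatment status

# The switching equation: D picks which potential outcome we observe
y <- D * y1 + (1 - D) * y0

# y1 - y0 is the individual treatment effect, but for each i we only
# ever observe one of the two pieces
```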
The Potential Outcomes Framework
We can never see the individual treatment effect
We are typically happy with population level alternatives
For example, the average treatment effect:
Average Treatment Effect = E[yi1 −yi0] = E[yi1]−E[yi0]
This is usually what’s meant by the “effect” of x on y
We often aren't even able to see the average treatment effect
We typically only see conditional means
So What Do Differences in Conditional Means Tell You?
In the MODES example, we compared:
E[yi|Di = 1] − E[yi|Di = 0] = 1.01
(both conditional means can be estimated from the data)
Or, written in terms of potential outcomes:
⇒ E[yi1|Di = 1] − E[yi0|Di = 0] = 1.01
≠ E[yi1] − E[yi0]
Why is this not equal to E[yi1] − E[yi0]?
E[yi1|Di = 1] − E[yi0|Di = 0]
= E[yi1|Di = 1] − E[yi0|Di = 1]   (Average Treatment Effect for the Treated Group)
+ E[yi0|Di = 1] − E[yi0|Di = 0]   (Selection Effect)
So What Do Differences in Conditional Means Tell You?
E[yi1|Di = 1] − E[yi0|Di = 0]
= E[yi1|Di = 1] − E[yi0|Di = 1]   (Average Treatment Effect for the Treated Group)
+ E[yi0|Di = 1] − E[yi0|Di = 0]   (Selection Effect)
≠ E[yi1] − E[yi0]   (Average Treatment Effect)
So our estimate could differ from the average effect of treatment E[yi1] − E[yi0] for two reasons:
(1) The morning section might have given better reviews anyway:
E[yi0|Di = 1] − E[yi0|Di = 0] > 0   (Selection Effect)
(2) Candy matters more in the morning:
E[yi1|Di = 1] − E[yi0|Di = 1] ≠ E[yi1] − E[yi0]
(Average Treatment Effect for the Treated Group ≠ Average Treatment Effect)
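A simulation (illustrative numbers only) makes the decomposition concrete: morning students rate higher even without candy and are the only ones treated, so the raw difference in means is the 0.5 treatment effect plus a 0.6 selection effect.

```r
set.seed(2)
n       <- 1e5
morning <- rbinom(n, 1, 0.5)
y0      <- 3.3 + 0.6 * morning + rnorm(n, 0, 0.3)  # selection: morning rates higher anyway
y1      <- y0 + 0.5                                # true treatment effect = 0.5
D       <- morning                                 # candy only in the morning
y       <- D * y1 + (1 - D) * y0

# Difference in conditional means: ATT (0.5) + selection effect (0.6)
mean(y[D == 1]) - mean(y[D == 0])  # roughly 1.1
```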
What are the Benefits of Experiments?
Truly random experiments solve this "identification" problem. Why?
Suppose Di is chosen randomly for each individual
This means that Di is independent of (yi1, yi0) in a statistical sense:
(yi1, yi0) ⊥ Di
Intuition: potential outcomes yi1 and yi0 are unrelated to treatment
What are the Benefits of Experiments?
(yi1,yi0) ⊥ Di
Sidenote: if two random variables are independent (X ⊥ Z ):
E[X|Z] = E[X]
Hence in an experiment:
E[yi1|Di = 1] = E[yi1]
E[yi0|Di = 0] = E[yi0]
E[yi1|Di = 1] − E[yi0|Di = 0] = E[yi1] − E[yi0]
(both conditional means can be estimated, and their difference is the Average Treatment Effect!)
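Continuing the hypothetical simulation above: with the same potential outcomes but a randomly assigned treatment, the difference in conditional means now recovers the average treatment effect.

```r
# Random assignment: D is independent of (y1, y0)
D_rand <- rbinom(n, 1, 0.5)
y_rand <- D_rand * y1 + (1 - D_rand) * y0

mean(y_rand[D_rand == 1]) - mean(y_rand[D_rand == 0])  # roughly 0.5: the ATE
```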
What are the Benefits of Experiments?
Why does independence fix the two problems with using E[yi1|Di = 1] − E[yi0|Di = 0]?
(1) The selection effect is now 0
E[yi0|Di = 1] − E[yi0|Di = 0] = 0   (Selection Effect)
(2) Average treatment effect for treated group now accurately measures average treatment effect in the whole sample:
E[yi1|Di = 1] − E[yi0|Di = 1] = E[yi1] − E[yi0]
(Average Treatment Effect for the Treated Group = Average Treatment Effect)
The Conditional Independence Assumption
Of course, an experiment is not strictly necessary, as long as (yi1, yi0) ⊥ Di
This happens, but is unlikely in most practical applications
Slightly more reasonable is the conditional independence assumption
Let Xi be a set of control variables
Conditional Independence: (yi1, yi0) ⊥ Di | Xi
Independence holds within a group with the same characteristics Xi:
E[yi1|Di = 1, Xi] − E[yi0|Di = 0, Xi] = E[yi1 − yi0|Xi]
A (silly) Example of Conditional Independence
Suppose I randomly treat (Di = 1) 75% of the morning class (with candy)
And randomly treat (Di = 1) 25% of the afternoon class
And suppose I am a much better teacher in the morning
Then (yi1, yi0) ̸⊥ Di
because E[yi0|Di = 1] > E[yi0|Di = 0]
A (silly) Example of Conditional Independence
Let xi = 1 for the morning class, xi = 0 for the afternoon
We can estimate the means for all four groups:
Afternoon, no candy: E[yi|Di = 0, xi = 0] = 3.28
Afternoon, with candy: E[yi|Di = 1, xi = 0] = 3.78
Morning, no candy: E[yi|Di = 0, xi = 1] = 3.95
Morning, with candy: E[yi|Di = 1, xi = 1] = 4.45
A (silly) Example of Conditional Independence
If we try to calculate the difference in means directly:
E[yi|Di = 1] = (1/4) × E[yi|Di = 1, xi = 0] + (3/4) × E[yi|Di = 1, xi = 1] = 4.2825
E[yi|Di = 0] = (3/4) × E[yi|Di = 0, xi = 0] + (1/4) × E[yi|Di = 0, xi = 1] = 3.4475
Our estimate is contaminated because the morning class is better:
E[yi|Di = 1] − E[yi|Di = 0] = 4.2825 − 3.4475 = 0.835
A (silly) Example of Conditional Independence
E[yi|Di = 0, xi = 0] = 3.28 and E[yi|Di = 1, xi = 0] = 3.78
E[yi|Di = 0, xi = 1] = 3.95 and E[yi|Di = 1, xi = 1] = 4.45
However, within each class treatment is random: (yi1, yi0) ⊥ Di | xi
So we may recover the average treatment effect conditional on xi:
For the afternoon:
E[yi1 −yi0|xi = 0] = 3.78−3.28 = 0.5
For the morning
E[yi1 −yi0|xi = 1] = 4.45−3.95 = 0.5
In this case:
E[yi1 − yi0] = (1/2) E[yi1 − yi0|xi = 0] + (1/2) E[yi1 − yi0|xi = 1] = 0.5
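The whole calculation in R, using the four conditional means from the example (classes are equal sized, so each within-class effect gets weight 1/2):

```r
# Conditional means: afternoon (x = 0) and morning (x = 1)
m_a0 <- 3.28; m_a1 <- 3.78   # afternoon: no candy / candy
m_m0 <- 3.95; m_m1 <- 4.45   # morning:   no candy / candy

effect_afternoon <- m_a1 - m_a0  # 0.5
effect_morning   <- m_m1 - m_m0  # 0.5

# Average treatment effect: equal-weighted average of the two
ate <- 0.5 * effect_afternoon + 0.5 * effect_morning
ate    # 0.5

# The naive pooled comparison instead weights by treatment shares
naive <- (0.25 * m_a1 + 0.75 * m_m1) - (0.75 * m_a0 + 0.25 * m_m0)
naive  # 0.835: contaminated by selection
```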
Part 3: Causality and Regression
When does regression recover a causal effect?
Need conditional mean independence
Threats to recovering causal effects
Omitted variables and measurement error
Controlling for confounding variables
Causality and Regression
When does linear regression capture a causal effect?
Start with a simple case: constant treatment effects
Suppose yi depends only on two random variables: Di ∈ {0, 1} and vi
yi = α + ρ Di + vi
Di is some treatment (say candy)
vi is absolutely everything else that impacts yi
How good at R you are, how much you liked Paolo's course, etc.
Set: E[vi] = 0
Causality and Regression
yi = α + ρ Di + vi
We can then write potential outcomes:
yi1 = α + ρ + vi
yi0 = α + vi
Because ρ is constant, individual and average treatment effects coincide:
yi1 − yi0 = E[yi1 − yi0] = ρ
So ρ is what we want, the effect of treatment
But suppose we don't know ρ, and only see yi and Di
Causality and Regression
yi = α + ρ Di + vi
Suppose we regress yi on Di, and recover β1^OLS. When will β1^OLS = ρ?
Because Di is binary, we have:
β1^OLS = E[yi|Di = 1] − E[yi|Di = 0]
= E[α + ρ + vi|Di = 1] − E[α + vi|Di = 0]
= ρ + E[vi|Di = 1] − E[vi|Di = 0]
Causality and Regression
β1^OLS = ρ + E[vi|Di = 1] − E[vi|Di = 0]   (Selection Effect)
So when does β1^OLS = ρ?
Holds under the independence assumption (yi1, yi0) ⊥ Di
Since yi1 = α + ρ + vi and yi0 = α + vi:
(yi1, yi0) ⊥ Di ⇐⇒ vi ⊥ Di
This independence means:
⇒ E[vi|Di = 1] = E[vi|Di = 0] = E[vi]
Causality and Regression
β1^OLS = ρ + E[vi|Di = 1] − E[vi|Di = 0]   (Selection Effect)
So when does β1^OLS = ρ?
β1^OLS = ρ even under a weaker assumption than independence:
Mean Independence: E[vi|Di] = E[vi]
When will β1^OLS = ρ?
Suppose we don't know ρ
yi = α + ρ Di + vi
Our regression coefficient captures the causal effect (β1^OLS = ρ) if:
E[vi|Di] = E[vi]
The conditional mean is the same for every Di
More intuitive: E[vi|Di] = E[vi] implies that Corr(vi, Di) = 0
What if vi and Di are Correlated?
Our regression coefficient captures the causal effect (β1^OLS = ρ) if:
E[vi|Di] = E[vi]
This implies that Corr(vi, Di) = 0
So anytime Di and vi are correlated:
β1^OLS ≠ ρ
β0^OLS ≠ α
Anytime Di is correlated with anything else unobserved that impacts yi
Causality and Regression: Continuous xi
Suppose there is a continuous xi with a causal relationship with yi:
A 1 unit increase in xi increases yi by a fixed amount β1
e.g. an hour of studying increases your final grade by β1
Tempting to write:
yi = β0 + β1 xi
But in practice other things impact yi: again call these vi
yi = β0 + β1 xi + vi
e.g. intelligence also matters for your final grade
OLS Estimator Fits a Line Through the Data
[Figure: scatter of data with the fitted OLS line β0^OLS + β1^OLS X]
Causality and Regression: Continuous xi
yi = β0 + β1 xi + vi
The regression coefficient captures the causal effect (β1^OLS = β1) if:
E[vi|xi] = E[vi]
Fails anytime Corr(xi, vi) ≠ 0
An aside: we have used similar notation for 3 different things:
1. β1: the causal effect on yi of a 1 unit change in xi
2. β1^OLS = Cov(xi, yi)/Var(xi): the population regression coefficient
3. βˆ1^OLS: the sample regression coefficient, the same ratio computed with sample covariance and variance
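The distinction between 2 and 3 is easy to see in R: the sample coefficient is just the sample moment ratio, and it is exactly what lm() computes (simulated data with made-up parameters):

```r
set.seed(3)
x <- rnorm(500)
y <- 1 + 2 * x + rnorm(500)

cov(x, y) / var(x)     # sample moment ratio
coef(lm(y ~ x))["x"]   # identical up to floating point
```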
The Causal Relationship Between X and Y
[Figure: one data point shown against the causal line yi = β0 + β1 xi, with vi the vertical distance from the line]
If E[vi|xi] = E[vi] then β1^OLS = β1: the fitted line β0^OLS + β1^OLS X recovers the causal relationship
What if vi and xi are Positively Correlated?
If Corr(vi, xi) ≠ 0 then β1^OLS ≠ β1: the fitted line β0^OLS + β1^OLS X no longer recovers β1
An Example from Economics
Consider the model for wages:
Wagesi = β0 + β1 Si + vi
where Si is years of schooling
Are there any reasons that Si might be correlated with vi?
If so, this regression won't uncover β1
Examples from Corporate Finance
Consider the model for leverage:
Leveragei = α + β Profitabilityi + vi
Why might we have trouble recovering β?
(1) Unprofitable firms tend to have higher bankruptcy risk and should have lower leverage than more profitable firms (tradeoff theory)
⇒ Corr(Profitabilityi, vi) > 0
⇒ E[vi|Profitabilityi] ≠ E[vi]
(2) Unprofitable firms have accumulated lower profits in the past and may have to use debt financing, implying higher leverage (pecking order theory)
⇒ Corr(Profitabilityi, vi) < 0
⇒ E[vi|Profitabilityi] ≠ E[vi]
One reason vi and xi might be correlated?
Suppose that we know yi is generated by the following:
yi = β0 + β1 xi + γ ai + ei
where xi and ei are uncorrelated, but Corr(ai, xi) > 0
Could think of yi as wages, xi as years of schooling, ai as ability
Suppose we see yi and xi but not ai, and have to consider the model:
yi = β0 + β1 xi + vi
where vi = γ ai + ei
A Quick Review: Properties of Covariance
A few properties of covariance. If W, X, Z are random variables:
Cov(W, X + Z) = Cov(W, X) + Cov(W, Z)
Cov(X, X) = Var(X)
If a and b are constants:
Cov(aW, bX) = ab Cov(W, X)
Cov(a + W, X) = Cov(W, X)
Finally, remember that correlation is just the covariance scaled:
Corr(X, Z) = Cov(X, Z)/√(Var(X) Var(Z))
I'll switch back and forth between them sometimes
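A toy numerical sanity check of these identities in R (sample moments obey the same bilinearity, so each line should print approximately zero):

```r
set.seed(4)
W <- rnorm(1000); X <- rnorm(1000); Z <- rnorm(1000)

cov(W, X + Z) - (cov(W, X) + cov(W, Z))        # additivity
cov(2 * W, 3 * X) - 6 * cov(W, X)              # scaling by constants
cov(5 + W, X) - cov(W, X)                      # additive constants drop out
cor(X, Z) - cov(X, Z) / sqrt(var(X) * var(Z))  # correlation = scaled covariance
```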
Omitted Variables Bias
So if we have:
yi = β0 + β1 xi + vi
What will the regression of yi on xi give us?
Recall that the regression coefficient is β1^OLS = Cov(yi, xi)/Var(xi):
β1^OLS = Cov(yi, xi)/Var(xi)
= Cov(β0 + β1 xi + vi, xi)/Var(xi)
= β1 Cov(xi, xi)/Var(xi) + Cov(vi, xi)/Var(xi)
= β1 + Cov(vi, xi)/Var(xi)
Omitted Variables Bias
So β1^OLS is biased:
β1^OLS = β1 + Cov(vi, xi)/Var(xi)
If vi = γ ai + ei with
Corr(ai, xi) ≠ 0 and Corr(ei, xi) = 0
we can characterize this bias in simple terms:
β1^OLS = β1 + Cov(γ ai + ei, xi)/Var(xi)
= β1 + γ Cov(ai, xi)/Var(xi)   (Bias = γ Cov(ai, xi)/Var(xi))
Omitted Variables Bias
β1^OLS = β1 + γ Cov(ai, xi)/Var(xi)
= β1 + γ δ1^OLS
where δ1^OLS is the slope coefficient from the regression:
ai = δ0^OLS + δ1^OLS xi + ηi^OLS
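A simulation (parameter values made up) verifying the bias formula: the short-regression coefficient lines up with β1 + γ δ1^OLS.

```r
set.seed(5)
n <- 1e5
a <- rnorm(n)                           # unobserved ability
x <- 0.8 * a + rnorm(n)                 # schooling, correlated with ability
y <- 1 + 0.5 * x + 0.7 * a + rnorm(n)   # beta1 = 0.5, gamma = 0.7

b1_short <- coef(lm(y ~ x))["x"]   # biased: omits a
delta1   <- coef(lm(a ~ x))["x"]   # regression of omitted variable on x

c(b1_short, 0.5 + 0.7 * delta1)    # the two should agree (about 0.84)
```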
Omitted Variables Bias
A good heuristic for evaluating OLS estimates:
β1^OLS = β1 + γ δ1^OLS
γ: the relationship between ai and yi
δ1^OLS: the relationship between ai and xi
We might not be able to measure γ or δ1^OLS, but we can often make a good guess
Impact of Schooling on Wages
Suppose wages (yi) are determined by:
yi = β0 + β1 xi + γ ai + ei
and we see years of schooling (xi) but not ability (ai)
Corr(xi, ai) > 0 and Corr(yi, ai) > 0
We estimate:
yi = β0 + β1 xi + vi
and recover:
β1^OLS = β1 + γ δ1^OLS
Impact of Schooling on Wages
β1^OLS = β1 + γ δ1^OLS
Is our estimated β1^OLS larger or smaller than β1?
menti.com
Controlling for a Confounding Variable
yi = β0 + β1 xi + γ ai + ei
Suppose we are able to observe ability
e.g. an IQ test is all that matters
For simplicity, let xi be binary:
xi = 1 if individual i has an MSc, 0 otherwise
Suppose we regress yi on xi and ai
β1^OLS = E[yi|xi = 1, ai] − E[yi|xi = 0, ai]
= E[β0 + β1 + γ ai + ei|xi = 1, ai] − E[β0 + γ ai + ei|xi = 0, ai]
Controlling for a Confounding Variable
β1^OLS = E[β0 + β1 + γ ai + ei|xi = 1, ai] − E[β0 + γ ai + ei|xi = 0, ai]
Canceling out terms gives:
β1^OLS = β1 + E[ei|xi = 1, ai] − E[ei|xi = 0, ai]
So β1^OLS = β1 if the following condition holds:
E[ei|xi, ai] = E[ei|ai]
This is called Conditional Mean Independence
A slightly weaker version of our conditional independence assumption (yi1, yi0) ⊥ xi | ai
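Reusing the simulated data from the omitted-variables sketch above: once ability is observed and included as a control, the long regression recovers β1 (there xi is continuous rather than binary, but the logic is the same).

```r
# Short regression: biased (about 0.84)
coef(lm(y ~ x))["x"]

# Long regression: controlling for a recovers beta1 = 0.5
coef(lm(y ~ x + a))["x"]
```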
Example: Controlling for a Confounding Variable
[Figure: Republican votes vs. income, South and North]
Controlling for a Confounding Variable
Suppose we run the following regression:
repvotesi = β0 + β1 incomei + vi
What is βˆ1^OLS? menti.com…
So does being rich decrease Republican votes?
Suppose we run it separately in the South and North:
South: repvotesi = β0 + β1 incomei + vi
North: repvotesi = β0 + β1 incomei + vi
What is βˆ1^OLS in the South?
Within regions, income positively predicts Republican votes