
An Introduction to Causality
Chris Hansman
Empirical Finance: Methods and Applications Imperial College Business School
January 17-18, 2022


Last Week’s Lecture: Two Parts
(1) Introduction to the conditional expectation function (CEF)
(2) Ordinary Least Squares and the CEF

Today’s Lecture: Four Parts
(1) Analyzing an experiment in R
- Comparing means (t-test) and regression
(2) Causality and the potential outcomes framework
- What do we mean when we say X causes Y?
(3) Linear regression, the CEF, and causality
- How do we think about causality in a regression framework?
(4) Instrumental variables (if time)
- The intuition behind instrumental variables

Part 1: Analyzing an Experiment
- Last year I performed an experiment for my MODES scores
- Can I bribe students to get great teaching evaluations?
- Two sections: morning (9:00-12:00) vs. afternoon (1:00-4:00)
- Evaluation day: gave candy only to the morning students
- Compared evaluations across the two sections: scores range from 1 to 5

Part 1: Analyzing an Experiment
- Let's define a few variables:
- yi: teaching evaluation for student i, from 1 to 5
- Di: treatment status (candy vs. no candy):

  Di = 1 if student received candy, Di = 0 otherwise

- How do we see if the bribe was effective?
- Are evaluations higher, on average, for students who got candy?

  E[yi|Di = 1] > E[yi|Di = 0]

- Equivalently:

  E[yi|Di = 1] − E[yi|Di = 0] > 0

Plotting the Difference in (Conditional) Means

[Figure: mean MODES score by treatment group (candy or not)]

Estimating the Difference in (Conditional) Means
- Two exercises in R (sketched below):
- What is the difference in means between the two groups?
- What is the magnitude of the t-statistic from a t-test for a difference in these means (two-sample, equal variance)?
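A minimal sketch of both exercises in R, using simulated scores (the group sizes match the slides, but the simulated values and all variable names are illustrative):

```r
# Simulate 80 "candy" and 80 "no candy" evaluations (illustrative numbers)
set.seed(1)
n  <- 80
y0 <- rnorm(n, mean = 3.3, sd = 0.6)   # no-candy group
y1 <- rnorm(n, mean = 4.3, sd = 0.8)   # candy group

# (1) Difference in means between the two groups
mean(y1) - mean(y0)

# (2) Two-sample t-test assuming equal variances
t.test(y1, y0, var.equal = TRUE)
```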

Regression Provides a Simple Way to Analyze an Experiment

  yi = β0 + β1Di + vi

- β1 gives the difference in means
- The t-statistic on β1 is equivalent to the (two-sample, equal variance) t-test
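Continuing the simulated data above, a sketch of the regression version of the same analysis:

```r
# Stack the two groups into one data set with a treatment dummy
dat <- data.frame(y = c(y1, y0),
                  D = c(rep(1, n), rep(0, n)))

fit <- lm(y ~ D, data = dat)
summary(fit)   # the slope on D equals mean(y1) - mean(y0), and its
               # t-statistic matches the equal-variance two-sample t-test
```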

Part 2: Causality and the potential outcomes framework
- Differences in conditional means often represent correlations, not causal effects
- The potential outcomes framework helps us define a notion of causality
- And understand the assumptions necessary for conditional expectations to reflect causal effects
- Experiments allow the CEF to capture causal effects
- What's really key is a certain assumption: conditional independence

Conditional Means and Causality
- In economics/finance we often want more than conditional means
- We are interested in causal questions: does a change in X cause a change in Y?
- How do corporate acquisitions affect the value of the acquirer?
- How does a firm's capital structure impact investment?
- Does corporate governance affect firm performance?

Thinking Formally About Causality
- Consider the example we just studied
- Compare evaluations (1-5) for two (equal-sized) groups: treated with candy vs. not treated

  Group      Sample Size   Evaluation   Standard Error
  No Candy   80            3.32         0.065
  Candy      80            4.33         0.092

- The treated group provided significantly higher evaluations
- Difference in conditional means:

  E[yi|Di = 1] − E[yi|Di = 0] = 1.01

- So does candy cause higher scores?

Any Potential Reasons Why This Might not be Causal?
- What if students in the morning class would have given me higher ratings even without candy?
- Maybe I teach better in the morning?
- Or morning students are more generous
- We call this a "selection effect"
- What if students in the morning respond better to candy?
- Perhaps they are hungrier
- For both of these, we would need to answer the following question:
- What scores would morning students have given me without candy?
- I'll never know…

The Potential Outcomes Framework
- Ideally, how would we find the impact of candy on evaluations (yi)?
- Imagine we had access to two parallel universes and could observe:
- The exact same student (i)
- At the exact same time
- In one universe they receive candy; in the other they do not
- And suppose we could see the student's evaluations in both worlds
- Define the variables we would like to see, for each individual i:

  yi1 = evaluation with candy
  yi0 = evaluation without candy

The Potential Outcomes Framework
- If we could see both yi1 and yi0, the impact would be easy to find
- The causal effect or treatment effect for individual i is defined as:

  yi1 − yi0

- This would answer our question, but we never see both yi1 and yi0!
- Some people call this the "fundamental problem of causal inference"
- Intuition: there are two "potential" worlds out there
- The treatment variable Di decides which one we see:

  yi = yi1 if Di = 1,  yi = yi0 if Di = 0

The Potential Outcomes Framework
- We can never see the individual treatment effect
- We are typically happy with population-level alternatives
- For example, the average treatment effect:

  Average Treatment Effect = E[yi1 − yi0] = E[yi1] − E[yi0]

- This is usually what's meant by the "effect" of x on y
- We often aren't even able to see the average treatment effect
- We typically only see conditional means

So What Do Differences in Conditional Means Tell You?
- In the MODES example, we compared two quantities we can estimate:

  E[yi|Di = 1] − E[yi|Di = 0] = 1.01

- Or, written in terms of potential outcomes:

  E[yi1|Di = 1] − E[yi0|Di = 0] = 1.01 ≠ E[yi1] − E[yi0]

- Why is this not equal to E[yi1] − E[yi0]?

  E[yi1|Di = 1] − E[yi0|Di = 0]
    = E[yi1|Di = 1] − E[yi0|Di = 1]   (Average Treatment Effect for the Treated Group)
      + E[yi0|Di = 1] − E[yi0|Di = 0]   (Selection Effect)

So What Do Differences in Conditional Means Tell You?
  E[yi1|Di = 1] − E[yi0|Di = 0]
    = E[yi1|Di = 1] − E[yi0|Di = 1]   (Average Treatment Effect for the Treated Group)
      + E[yi0|Di = 1] − E[yi0|Di = 0]   (Selection Effect)
    ≠ E[yi1] − E[yi0]   (Average Treatment Effect)

- So our estimate could differ from the average treatment effect E[yi1] − E[yi0] for two reasons (a simulation illustrating both follows below):
(1) The morning section might have given better reviews anyway:

  E[yi0|Di = 1] − E[yi0|Di = 0] > 0   (Selection Effect)

(2) Candy matters more in the morning:

  E[yi1|Di = 1] − E[yi0|Di = 1] ≠ E[yi1] − E[yi0]
  (Average Treatment Effect for the Treated Group ≠ Average Treatment Effect)
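A small R simulation can make the decomposition concrete. Because we generate both potential outcomes ourselves, we can compute the ATT, the selection effect, and the ATE directly; every parameter value below is made up for illustration:

```r
set.seed(2)
N  <- 1e5
x  <- rbinom(N, 1, 0.5)                  # 1 = morning class
y0 <- 3 + 0.6 * x + rnorm(N, sd = 0.3)   # morning rates higher even untreated
y1 <- y0 + 0.5 + 0.3 * x                 # candy matters more in the morning

D <- x                                   # non-random: only the morning gets candy
y <- ifelse(D == 1, y1, y0)              # what we actually observe

mean(y[D == 1]) - mean(y[D == 0])        # naive difference: ~1.4

mean(y1[D == 1] - y0[D == 1])            # ATT: 0.8 (the treated are all morning)
mean(y0[D == 1]) - mean(y0[D == 0])      # selection effect: ~0.6
mean(y1 - y0)                            # ATE: 0.65, not equal to the ATT
```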

What are the Benefits of Experiments?
- Truly random experiments solve this "identification" problem. Why?
- Suppose Di is chosen randomly for each individual
- This means that Di is independent of (yi1, yi0) in a statistical sense:

  (yi1, yi0) ⊥ Di

- Intuition: the potential outcomes yi1 and yi0 are unrelated to treatment

What are the Benefits of Experiments?
  (yi1, yi0) ⊥ Di

- Sidenote: if two random variables are independent (X ⊥ Z):

  E[X|Z] = E[X]

- Hence in an experiment:

  E[yi1|Di = 1] = E[yi1]
  E[yi0|Di = 0] = E[yi0]

  E[yi1|Di = 1] − E[yi0|Di = 0] = E[yi1] − E[yi0]
  (both terms on the left can be estimated; the right-hand side is the Average Treatment Effect!)
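The same kind of sketch with a randomized treatment (numbers again illustrative): the difference in conditional means now recovers the ATE:

```r
set.seed(3)
N  <- 1e5
y0 <- 3 + rnorm(N, sd = 0.5)
y1 <- y0 + 0.5                      # true ATE = 0.5

D <- rbinom(N, 1, 0.5)              # randomized: D independent of (y0, y1)
y <- ifelse(D == 1, y1, y0)

mean(y[D == 1]) - mean(y[D == 0])   # ~0.5: the ATE
```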

What are the Benefits of Experiments?
- Why does independence fix the two problems with using E[yi1|Di = 1] − E[yi0|Di = 0]?
(1) The selection effect is now 0:

  E[yi0|Di = 1] − E[yi0|Di = 0] = 0   (Selection Effect)

(2) The average treatment effect for the treated group now accurately measures the average treatment effect in the whole sample:

  E[yi1|Di = 1] − E[yi0|Di = 1] = E[yi1] − E[yi0]
  (Average Treatment Effect for the Treated Group = Average Treatment Effect)

The Conditional Independence Assumption
- Of course, an experiment is not strictly necessary, as long as (yi1, yi0) ⊥ Di
- This happens, but it is not likely in most practical applications
- Slightly more reasonable is the conditional independence assumption
- Let Xi be a set of control variables

  Conditional Independence: (yi1, yi0) ⊥ Di | Xi

- Independence holds within a group with the same characteristics Xi:

  E[yi1|Di = 1, Xi] − E[yi0|Di = 0, Xi] = E[yi1 − yi0|Xi]

A (silly) Example of Conditional Independence
- Suppose I randomly treat (Di = 1) 75% of the morning class (with candy)
- And randomly treat (Di = 1) 25% of the afternoon class
- And suppose I am a much better teacher in the morning
- Then:

  (yi1, yi0) ̸⊥ Di

- Because E[yi0|Di = 1] > E[yi0|Di = 0]

[Figure: mean MODES score (1-5) by class (morning vs. afternoon) and treatment status]

A (silly) Example of Conditional Independence
- Let xi = 1 for the morning class and xi = 0 for the afternoon
- We can estimate the means for all four groups:
- Afternoon, no candy: E[yi|Di = 0, xi = 0] = 3.28
- Afternoon, with candy: E[yi|Di = 1, xi = 0] = 3.78
- Morning, no candy: E[yi|Di = 0, xi = 1] = 3.95
- Morning, with candy: E[yi|Di = 1, xi = 1] = 4.45

A (silly) Example of Conditional Independence
- If we try to calculate the difference in means directly:

  E[yi|Di = 1] = (1/4)·E[yi|Di = 1, xi = 0] + (3/4)·E[yi|Di = 1, xi = 1] = 4.2825
  E[yi|Di = 0] = (3/4)·E[yi|Di = 0, xi = 0] + (1/4)·E[yi|Di = 0, xi = 1] = 3.4475

- Our estimate is contaminated because the morning class is better:

  E[yi|Di = 1] − E[yi|Di = 0] = 4.2825 − 3.4475 = 0.835

A (silly) Example of Conditional Independence
  E[yi|Di = 0, xi = 0] = 3.28 and E[yi|Di = 1, xi = 0] = 3.78
  E[yi|Di = 0, xi = 1] = 3.95 and E[yi|Di = 1, xi = 1] = 4.45

- However, within each class treatment is random: (yi1, yi0) ⊥ Di | xi
- So we may recover the average treatment effect conditional on xi:

  E[yi1 − yi0|xi = 0] = ?
  E[yi1 − yi0|xi = 1] = ?

A (silly) Example of Conditional Independence
  E[yi|Di = 0, xi = 0] = 3.28 and E[yi|Di = 1, xi = 0] = 3.78
  E[yi|Di = 0, xi = 1] = 3.95 and E[yi|Di = 1, xi = 1] = 4.45

- However, within each class treatment is random: (yi1, yi0) ⊥ Di | xi
- So we may recover the average treatment effect conditional on xi:
- For the afternoon:

  E[yi1 − yi0|xi = 0] = 3.78 − 3.28 = 0.5

- For the morning:

  E[yi1 − yi0|xi = 1] = 4.45 − 3.95 = 0.5

- In this case (with equal-sized classes):

  E[yi1 − yi0] = (1/2)·E[yi1 − yi0|xi = 0] + (1/2)·E[yi1 − yi0|xi = 1] = 0.5
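The arithmetic of this slide in a few lines of R, using the group means from the slides:

```r
m_a0 <- 3.28; m_a1 <- 3.78   # afternoon: no candy, candy
m_m0 <- 3.95; m_m1 <- 4.45   # morning:   no candy, candy

# Naive comparison: 3/4 of the treated are morning students,
# 3/4 of the controls are afternoon students
(0.25 * m_a1 + 0.75 * m_m1) - (0.75 * m_a0 + 0.25 * m_m0)   # 0.835

# Condition on class first, then average over the two (equal-sized) classes
0.5 * (m_a1 - m_a0) + 0.5 * (m_m1 - m_m0)                   # 0.5
```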

Part 3: Causality and Regression
- When does regression recover a causal effect?
- We need conditional mean independence
- Threats to recovering causal effects:
- Omitted variables and measurement error
- Controlling for confounding variables

Causality and Regression
- When does linear regression capture a causal effect?
- Start with a simple case: constant treatment effects
- Suppose yi depends only on two random variables, Di ∈ {0, 1} and vi:

  yi = α + ρDi + vi

- Di is some treatment (say candy)
- vi is absolutely everything else that impacts yi
- How good at R you are, how much you liked Paolo's course, etc.
- Set E[vi] = 0

Causality and Regression
  yi = α + ρDi + vi

- We can then write the potential outcomes:

  yi1 = α + ρ + vi
  yi0 = α + vi

- Because ρ is constant, individual and average treatment effects coincide:

  yi1 − yi0 = E[yi1 − yi0] = ρ

- So ρ is what we want: the effect of treatment
- But suppose we don't know ρ, and only see yi and Di

Causality and Regression
  yi = α + ρDi + vi

- Suppose we regress yi on Di and recover β1^OLS. When will β1^OLS = ρ?
- Because Di is binary, we have:

  β1^OLS = E[yi|Di = 1] − E[yi|Di = 0]
         = E[α + ρ + vi|Di = 1] − E[α + vi|Di = 0]
         = ρ + E[vi|Di = 1] − E[vi|Di = 0]

Causality and Regression
  β1^OLS = ρ + E[vi|Di = 1] − E[vi|Di = 0]   (the last two terms are the Selection Effect)

- So when does β1^OLS = ρ?
- It holds under the independence assumption (yi1, yi0) ⊥ Di
- Since yi1 = α + ρ + vi and yi0 = α + vi:

  (yi1, yi0) ⊥ Di ⟺ vi ⊥ Di

- This independence means:

  E[vi|Di = 1] = E[vi|Di = 0] = E[vi]

Causality and Regression
  β1^OLS = ρ + E[vi|Di = 1] − E[vi|Di = 0]   (the last two terms are the Selection Effect)

- So when does β1^OLS = ρ?
- β1^OLS = ρ even under a weaker assumption than independence:

  Mean Independence: E[vi|Di] = E[vi]

When will β1^OLS = ρ?

- Suppose we don't know ρ:

  yi = α + ρDi + vi

- Our regression coefficient captures the causal effect (β1^OLS = ρ) if:

  E[vi|Di] = E[vi]

- The conditional mean of vi is the same for every Di
- More intuitively: E[vi|Di] = E[vi] implies that Corr(vi, Di) = 0

What if vi and Di are Correlated?
- Our regression coefficient captures the causal effect (β1^OLS = ρ) if:

  E[vi|Di] = E[vi]

- This implies that Corr(vi, Di) = 0
- So anytime Di and vi are correlated:

  β1^OLS ≠ ρ
  β0^OLS ≠ α

- That is, anytime Di is correlated with anything else unobserved that impacts yi

Causality and Regression: Continuous xi
- Suppose there is a continuous xi with a causal relationship with yi:
- A 1-unit increase in xi increases yi by a fixed amount β1
- e.g., an hour of studying increases your final grade by β1
- It is tempting to write:

  yi = β0 + β1xi

- But in practice other things impact yi; again call these vi:

  yi = β0 + β1xi + vi

- e.g., intelligence also matters for your final grade

OLS Estimator Fits a Line Through the Data

[Figure: scatter plots of the data with the fitted OLS line β0^OLS + β1^OLS·X]

Causality and Regression: Continuous xi

  yi = β0 + β1xi + vi

- The regression coefficient captures the causal effect (β1^OLS = β1) if:

  E[vi|xi] = E[vi]

- This fails anytime Corr(xi, vi) ≠ 0
- An aside: we have used similar notation for 3 different things:
  1. β1: the causal effect on yi of a 1-unit change in xi
  2. β1^OLS = Cov(xi, yi)/Var(xi): the population regression coefficient
  3. β̂1^OLS: the sample regression coefficient, the sample covariance of xi and yi divided by the sample variance of xi (a quick numerical check follows below)
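A quick numerical check of the link between items 2 and 3, on simulated data (slope and sample size are arbitrary):

```r
set.seed(6)
x <- rnorm(500)
y <- 2 + 0.7 * x + rnorm(500)

cov(x, y) / var(x)     # the sample regression coefficient, computed by hand
coef(lm(y ~ x))["x"]   # identical (up to floating-point error)
```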

The Causal Relationship Between X and Y

[Figure: the true causal line β0 + β1·X]

One Data Point

[Figure: a single data point; vi is the vertical distance between yi = β0 + β1xi + vi and the causal line]

If E[vi|xi] = E[vi], then β1^OLS = β1

[Figure: the fitted OLS line β0^OLS + β1^OLS·X coincides with the causal line]

What if vi and xi are Positively Correlated?

[Figure: observations with high xi tend to have high vi, lying above the causal line]

If Corr(vi, xi) ≠ 0, then β1^OLS ≠ β1

[Figure: the fitted OLS line β0^OLS + β1^OLS·X diverges from the causal line]

An Example from Economics
- Consider the model for wages:

  Wagesi = β0 + β1Si + vi

- where Si is years of schooling
- Are there any reasons that Si might be correlated with vi?
- If so, this regression won't uncover β1

Examples from Corporate Finance
- Consider the model for leverage:

  Leveragei = α + β·Profitabilityi + vi

- Why might we have trouble recovering β?
(1) Unprofitable firms tend to have higher bankruptcy risk and should have lower leverage than more profitable firms (tradeoff theory)

  ⇒ Corr(Profitabilityi, vi) > 0
  ⇒ E[vi|Profitabilityi] ≠ E[vi]

(2) Unprofitable firms have accumulated lower profits in the past and may have to use debt financing, implying higher leverage (pecking order theory)

  ⇒ Corr(Profitabilityi, vi) < 0
  ⇒ E[vi|Profitabilityi] ≠ E[vi]

One Reason vi and xi Might be Correlated

- Suppose that we know yi is generated by the following:

  yi = β0 + β1xi + γai + ei

- where xi and ei are uncorrelated, but Corr(ai, xi) > 0
- We could think of yi as wages, xi as years of schooling, and ai as ability
- Suppose we see yi and xi but not ai, and have to consider the model:

  yi = β0 + β1xi + vi,   where vi = γai + ei

A Quick Review: Properties of Covariance
- A few properties of covariance: if W, X, and Z are random variables:

  Cov(W, X + Z) = Cov(W, X) + Cov(W, Z)
  Cov(X, X) = Var(X)

- If a and b are constants:

  Cov(aW, bX) = ab·Cov(W, X)
  Cov(a + W, X) = Cov(W, X)

- Finally, remember that correlation is just the covariance scaled:

  Corr(X, Z) = Cov(X, Z) / (√Var(X)·√Var(Z))

- I'll switch back and forth between them sometimes; a quick numerical check follows below
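These rules are easy to verify numerically, since the sample covariance obeys the same algebra; a throwaway check in R:

```r
set.seed(4)
W <- rnorm(1000); X <- rnorm(1000); Z <- rnorm(1000)
a <- 2; b <- -3

all.equal(cov(W, X + Z), cov(W, X) + cov(W, Z))    # additivity
all.equal(cov(X, X), var(X))                       # Cov(X, X) = Var(X)
all.equal(cov(a * W, b * X), a * b * cov(W, X))    # scaling by constants
all.equal(cov(a + W, X), cov(W, X))                # additive shifts drop out
all.equal(cor(X, Z), cov(X, Z) / (sd(X) * sd(Z)))  # correlation = scaled cov
```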

Omitted Variables Bias
- So if we have:

  yi = β0 + β1xi + vi

- What will the regression of yi on xi give us?
- Recall that the regression coefficient is β1^OLS = Cov(yi, xi)/Var(xi):

  β1^OLS = Cov(yi, xi)/Var(xi)
         = Cov(β0 + β1xi + vi, xi)/Var(xi)
         = β1·Cov(xi, xi)/Var(xi) + Cov(vi, xi)/Var(xi)
         = β1 + Cov(vi, xi)/Var(xi)

Omitted Variables Bias
- So β1^OLS is biased:

  β1^OLS = β1 + Cov(vi, xi)/Var(xi)

- If vi = γai + ei with

  Corr(ai, xi) ≠ 0 and Corr(ei, xi) = 0

  we can characterize this bias in simple terms:

  β1^OLS = β1 + Cov(γai + ei, xi)/Var(xi)
         = β1 + γ·Cov(ai, xi)/Var(xi)   (the second term is the Bias)

Omitted Variables Bias
  β1^OLS = β1 + γ·Cov(ai, xi)/Var(xi)
         = β1 + γδ1^OLS

- where δ1^OLS is the slope coefficient from the regression:

  ai = δ0^OLS + δ1^OLS·xi + ηi^OLS
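A sketch of the bias formula in a simulation, with made-up parameter values (β1 = 0.08, γ = 0.5, and the schooling equation below are illustrative, not estimates):

```r
set.seed(5)
N     <- 1e5
beta1 <- 0.08                    # true causal effect of a year of schooling
gamma <- 0.5                     # effect of ability on wages

a <- rnorm(N)                    # unobserved ability
x <- 12 + 1.5 * a + rnorm(N)     # schooling, positively related to ability
y <- 1 + beta1 * x + gamma * a + rnorm(N)

coef(lm(y ~ x))["x"]             # short regression: biased upward (~0.31)

delta1 <- coef(lm(a ~ x))["x"]   # auxiliary regression of a on x
beta1 + gamma * delta1           # the OVB formula reproduces the bias
```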

Omitted Variables Bias
- A good heuristic for evaluating OLS estimates:

  β1^OLS = β1 + γδ1^OLS

- γ: the relationship between ai and yi
- δ1^OLS: the relationship between ai and xi
- We might not be able to measure γ or δ1^OLS, but we can often make a good guess

Impact of Schooling on Wages
- Suppose wages (yi) are determined by:

  yi = β0 + β1xi + γai + ei

- and we see years of schooling (xi) but not ability (ai), with

  Corr(xi, ai) > 0 and Corr(yi, ai) > 0

- We estimate:

  yi = β0 + β1xi + vi

- And recover:

  β1^OLS = β1 + γδ1^OLS

Impact of Schooling on Wages
  β1^OLS = β1 + γδ1^OLS

- Is our estimated β1^OLS larger or smaller than β1?
- menti.com

Controlling for a Confounding Variable
  yi = β0 + β1xi + γai + ei

- Suppose we are able to observe ability
- e.g., an IQ test is all that matters
- For simplicity, let xi be binary:
- xi = 1 if individual i has an MSc, 0 otherwise
- Suppose we regress yi on xi and ai:

  β1^OLS = E[yi|xi = 1, ai] − E[yi|xi = 0, ai]
         = E[β0 + β1 + γai + ei|xi = 1, ai] − E[β0 + γai + ei|xi = 0, ai]

Controlling for a Confounding Variable
  β1^OLS = E[β0 + β1 + γai + ei|xi = 1, ai] − E[β0 + γai + ei|xi = 0, ai]

- Canceling out terms gives:

  β1^OLS = β1 + E[ei|xi = 1, ai] − E[ei|xi = 0, ai]

- So β1^OLS = β1 if the following condition holds:

  E[ei|xi, ai] = E[ei|ai]

- This is called Conditional Mean Independence
- It is a slightly weaker version of our conditional independence assumption (yi1, yi0) ⊥ xi | ai
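Continuing the schooling/ability simulation sketched in the omitted variables section: once the confounder ai is included, the bias disappears (here conditional mean independence holds by construction, since the error is independent of x and a):

```r
coef(lm(y ~ x))["x"]       # ~0.31: biased, ability omitted
coef(lm(y ~ x + a))["x"]   # ~0.08: recovers beta1 once we control for a
```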

Example: Controlling for a Confounding Variable
[Figure: scatter plot of Republican vote share against median county income]

Republican Votes and Income: South and North
[Figure: Republican vote share vs. median county income, with Southern and Northern counties shown separately]

Republican Votes and Income: South and North
[Figure: Republican vote share vs. median county income, with separate patterns for the South and the North]

Controlling for a Confounding Variable
- Suppose we run the following regression:

  repvotesi = β0 + β1incomei + vi

- What is β̂1^OLS? menti.com…
- So does being rich decrease Republican votes?
- Suppose we run the regression separately in the South and in the North:

  repvotesi = β0 + β1incomei + vi   (South only)
  repvotesi = β0 + β1incomei + vi   (North only)

- What is β̂1^OLS in the South?
- Within regions, income is positively related to Republican vote share: the region is the confounder behind the pooled slope
