
MFIN6201
Week 9 Causality
Leo Liu
April 12, 2021

Outline
• Potential Outcomes, Causal Effects, and Idealized Experiments
• Threats to Validity of Experiments
• Quasi-Experiments: Differences-in-Differences, IV Estimation, Regression Discontinuity Design
• Threats to Validity of Quasi-Experiments

Causality
• We are often interested in causal effects: how does A cause B, as opposed to how B causes A.
• What is the effect of investing more in R&D on firm performance?
• Less interesting is how a firm’s performance affects its R&D decisions.
• Knowing the causal effect lets us advise a company on how much performance improvement to expect from a given amount of R&D input.

Claiming Causality is Not Easy
Several problems stand in the way, including omitted variables and reverse causality (simultaneous equation bias).
Suppose we want to test the effect of a firm’s charity donations on firm performance. We gather data on firms’ ROA and their donations and estimate the following regression:
ROA = β0 + β1DONATION + ε
• Suppose β1 is positive. Can we claim that donating *leads to* better ROA?
• Reverse Causality?
• Omitted Variables?

Causality
• The only way to claim causality is to extract exogenous variation in X
– Exogenous variation is not caused by Y, so there is no reverse causality
– It is also uncorrelated with any omitted variables, so there is no omitted variable bias
– We need to create or look for random variation in X.

IV example
• One way to extract exogenous variation is to use an instrumental variable (IV)
• Suppose we first regress donations on natural disaster occurrences, to extract the variation in donations caused by natural disasters.
– This variation is not caused by firm performance
– This variation is unlikely to be correlated with variables that we omit (for example, firm cash holdings)
We estimate X = α0 + γZ + v, where Z is the instrumental variable. The fitted value, X̂, represents the exogenous variation in X. We then estimate:
Y = β0 + β1X̂ + ε
This equation gives a consistent estimate of β1 if our assumptions about Z (relevance and exogeneity) are satisfied.
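The two-step procedure above can be sketched on simulated data. This is only an illustration under assumed numbers: the data-generating process, the omitted variable c (standing in for cash holdings) and the helper ols are hypothetical and not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (all coefficients are assumptions).
z = rng.binomial(1, 0.3, size=n).astype(float)    # instrument: disaster indicator
c = rng.normal(size=n)                            # omitted variable (e.g. cash holdings)
x = 1.0 + 2.0 * z + 1.5 * c + rng.normal(size=n)  # donations
y = 0.5 + 0.3 * x + 2.0 * c + rng.normal(size=n)  # performance, true beta1 = 0.3


def ols(dep, *regressors):
    """Return OLS coefficients with the intercept first."""
    X = np.column_stack([np.ones(len(dep)), *regressors])
    return np.linalg.lstsq(X, dep, rcond=None)[0]


# Naive OLS of Y on X is biased because X is correlated with the omitted c.
beta_ols = ols(y, x)[1]

# First stage: X = alpha0 + gamma * Z + v, then keep the fitted values.
alpha0_hat, gamma_hat = ols(x, z)
x_hat = alpha0_hat + gamma_hat * z

# Second stage: Y = beta0 + beta1 * X_hat + e.
beta_iv = ols(y, x_hat)[1]

print(f"OLS estimate: {beta_ols:.3f}")
print(f"IV estimate:  {beta_iv:.3f}")
```

Running the second stage by hand gives the right point estimate but not the right standard errors, so in practice a dedicated IV routine would be used for inference.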

Using experiments to obtain exogenous variation
• If I were a god and wanted to test the effect of charity donations on firm performance, I would run an identical firm in two parallel worlds: in one world the firm donates, in the other it does not. Then I would compare the difference in performance between the two.
• In this way, there is no omitted variable bias
• First, using identical firms ensures there is no firm-specific omitted variable.
• Second, running parallel worlds ensures there are no other omitted variables, such as country GDP.
• There is also no reverse causality
• As a god, I decide whether the firm donates or not; firm performance does not.
In sum, as a god, I can create exogenous variation myself by running an idealized experiment.

Using experiments to obtain exogenous variation
However, I am not a god and I cannot run parallel worlds. What should I do?
• Well, I can pick firms that are similar enough, say Woolies and Coles
• I can ask the CEO of Woolies not to donate, and ask the CEO of Coles to donate
• This is not too bad compared to being a god, provided the firms we pick are similar enough
• Let’s go through how to claim causality using experiments in more detail.

Terminology: experiments and quasi-experiments
• An experiment is consciously designed and implemented by human researchers. An experiment randomly assigns subjects to treatment (firms that donate) and control groups (firms that do not donate).
• A quasi-experiment or natural experiment is an experiment not conducted by the researchers, or at least not for the purpose of the study. The selection into treatment and control groups is nevertheless (close to) random, so the two groups are likely to be similar.
• In the experimental setting, we call X the treatment variable.
Why does randomness allow us to claim causality?

Examples of experiments
• Clinical drug trial (lab experiment): does a proposed drug treat COVID-19?
– Y = level of COVID-19 virus
– X = treatment or control group (or dose of drug)
• Job training program (program evaluation through government intervention)
– Y = has a job, or not (or Y = wage income)
– X = went through experimental program, or not
• Class size effect (Natural experiment)
– Y = test score (Stanford Achievement Test)
– X = class size treatment group (large vs. small)

Average Treatment Effect
• In general, different entities have different treatment effects (the effect of making donations differs across firms). For entities drawn from a population, the average treatment effect is the population mean of the individual treatment effects.
• For now, consider the case of a single treatment effect: everyone in the population under study has the same treatment effect.
• It is easy to imagine that treatment effects differ across individuals. For example, drugs can work differently for different age groups, and general-purpose drugs often work better for younger people. However, researchers are often interested in the average effectiveness of a drug.

Estimation
Estimating the average treatment effect in an ideal randomized controlled experiment
• Let X be the treatment variable and Y the outcome variable of interest. If X is randomly assigned (for example by computer) then X is independent of all individual characteristics (no entity-specific omitted variable bias).
• Thus, in the regression model,
Yi = β0 + β1Xi + ui,
if Xi is randomly assigned, then Xi is independent of ui and of any other potential confounders or covariates, so E(ui|Xi) = 0 and OLS yields an unbiased estimator of β1.
• The causal effect is the population value of β1 in an ideal randomized controlled experiment.

Estimation
Yi = β0 + β1Xi + ui
• When the treatment is binary, the OLS estimate of β1 is just the difference in the sample mean outcome (Y) between the treatment and control groups, (Ȳtreated − Ȳcontrol)
• This difference in means is sometimes called the differences estimator.
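A small simulation can confirm that, with a binary treatment, the OLS slope and the difference in group means are the same number. The sample size, the true effect of 2.0 and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

# Random assignment: a coin flip decides who is treated.
x = rng.binomial(1, 0.5, size=n).astype(float)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u  # assumed true treatment effect = 2.0

# Differences estimator: mean outcome of treated minus mean outcome of controls.
diff_means = y[x == 1].mean() - y[x == 0].mean()

# OLS of Y on a constant and X.
X = np.column_stack([np.ones(n), x])
beta0_hat, beta1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"difference in means: {diff_means:.4f}")
print(f"OLS slope:           {beta1_hat:.4f}")
```

The two numbers agree up to floating-point error, and both are close to the assumed true effect because assignment is random.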

Additional regressors
Let Xi = treatment variable and Wi = control variable(s).
Yi = β0 + β1Xi + β2Wi + ui
Two reasons to include W in a regression analysis of the effect of a randomly assigned treatment:
• If Xi is randomly assigned, then Xi is uncorrelated with Wi, so omitting Wi does not result in omitted variable bias. But including Wi reduces the error variance and can result in smaller standard errors (making it more likely that we reject the null when the effect is nonzero).
• If the probability of assignment depends on Wi, so that Xi is randomly assigned given Wi, then omitting Wi can lead to omitted variable bias, but including it eliminates that bias. This situation is called…

Randomization based on covariates
Example: Social planners want to evaluate the effect of a table manners course. Due to the course enrolment procedure, men (Wi = 0) and women (Wi = 1) are randomly assigned to the course (Xi), but women are assigned with a higher probability than men (so Xi is correlated with Wi). Suppose women have better table manners than men prior to the course. Then even if the course has no effect, the treatment group will have better post-course table manners than the control group, because the treatment group has a higher fraction of women than the control group. That is, the OLS estimator of β1 in the regression,
Yi = β0 + β1Xi + ui
has omitted variable bias, which is eliminated by the regression,
Yi = β0 + β1Xi + β2Wi + ui
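The table manners example can be reproduced in a short simulation. The assignment probabilities (0.8 for women, 0.2 for men), the one-point advantage for women and the helper ols are assumptions made for this sketch; the course is given a true effect of zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

w = rng.binomial(1, 0.5, size=n).astype(float)  # 1 = woman, 0 = man
p_treat = np.where(w == 1, 0.8, 0.2)            # women are more likely to be assigned
x = rng.binomial(1, p_treat).astype(float)      # random assignment given W

# The course has no effect (coefficient 0.0); women score 1.0 higher regardless.
y = 5.0 + 0.0 * x + 1.0 * w + rng.normal(size=n)


def ols(dep, *regressors):
    """Return OLS coefficients with the intercept first."""
    X = np.column_stack([np.ones(len(dep)), *regressors])
    return np.linalg.lstsq(X, dep, rcond=None)[0]


beta1_short = ols(y, x)[1]    # omits W: picks up the gender composition
beta1_long = ols(y, x, w)[1]  # includes W: close to the true value of zero

print(f"beta1 without W: {beta1_short:.3f}")
print(f"beta1 with W:    {beta1_long:.3f}")
```

Under these assumed numbers the short regression reports a sizeable positive "effect" even though the course does nothing, while the regression that includes W does not.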

Randomization based on covariates, ctd.
Yi = β0 + β1Xi + β2Wi + ui
• In this example, Xi is randomly assigned, given Wi , so E(ui|Xi,Wi) = E(ui|Wi) = 0
• In words: among women, treatment is randomly assigned, so among women the error term is independent of Xi and its mean does not depend on Xi. The same is true among men.
• Thus if randomization is based on covariates, conditional mean independence holds, so that once Wi is included in the regression the OLS estimator of β1 is unbiased, as was discussed in Ch. 7.

Estimating causal effects that depend on observables
The causal effect in the previous example might depend on observables, perhaps β1,men > β1,women (men could benefit more from the course than women). We already know how to estimate different coefficients for different groups: use interactions. In the table manners example, we would simply estimate the interactions model,
Yi = β0 + β1Xi + β2Xi × Wi + β3Wi + ui
In this regression, we can obtain the estimated effect for each gender group. β1 estimates the average treatment effect for men (Wi = 0), and the average treatment effect for women (Wi = 1) is β1 + β2.
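As an illustration of the interactions model, the sketch below simulates a course that helps men more than women and recovers the two effects from the coefficients on X and X × W. The effect sizes (1.0 for men, 0.4 for women) and all variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40_000

w = rng.binomial(1, 0.5, size=n).astype(float)  # 1 = woman, 0 = man
x = rng.binomial(1, 0.5, size=n).astype(float)  # randomly assigned course

# Assumed effects: 1.0 for men, 1.0 - 0.6 = 0.4 for women.
y = 1.0 + 1.0 * x - 0.6 * x * w + 0.5 * w + rng.normal(size=n)

# Regressors: constant, X, X*W, W (same order as in the interactions model).
X = np.column_stack([np.ones(n), x, x * w, w])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

effect_men = b1         # W = 0
effect_women = b1 + b2  # W = 1

# Cross-check: difference in means among men only.
men = w == 0
diff_means_men = y[men & (x == 1)].mean() - y[men & (x == 0)].mean()

print(f"effect for men:   {effect_men:.3f}")
print(f"effect for women: {effect_women:.3f}")
```

Because X and W are both binary, the model is fully saturated, so the estimated effect for men coincides with the simple difference in means computed within the group of men.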

Threats to Validity of Experiments
Threats to Internal Validity – issues within the experimental design
• Failure to randomize (or imperfect randomization)
– for example, openings in job training program are filled on first-come, first-serve basis; latecomers are controls
– result is correlation between X and u
• Failure to follow treatment protocol (or “partial compliance”)
– some controls get the treatment
– some of those who should be treated aren’t

• Attrition (some subjects drop out)
– Suppose the controls who get jobs move out of town and we lose those observations; then corr(X, u) ≠ 0.
– This is a reincarnation of sample selection bias (the sample is selected in a way related to the outcome variable).
• Experimental effects
– experimenter bias (conscious or subconscious): treatment X is associated with “extra effort” or “extra care,” so corr(X, u) ≠ 0. Think of doctors paying extra attention to the patients getting the drug (treatment group).
– subject behavior might be affected by being in an experiment, so corr(X, u) ≠ 0 (Hawthorne effect)
Just as in regression analysis with observational data, threats to the internal validity of regression with experimental data imply that corr(X, u) ≠ 0, so OLS (the differences estimator) is biased.

Threats to External Validity
External validity is just another term for how applicable the estimated effect is to the general population.
• Non-representative sample (sample selection bias; for example, the effect of a COVID-19 drug is estimated only in the elderly group)
• Non-representative “treatment” (program effects could change when the program moves to a large scale; for example, if a job training program is tested in only one city, its effect could differ when applied nationwide)
• General equilibrium effects (generalising the program to a large scale may change the environment in which the experiment is conducted, so the effect of the program could differ; this is often observed in environmental studies: as environmental protection programs roll out, the environment improves, and the effects of further programs get smaller and smaller)

Quasi-Experiments
A quasi-experiment or natural experiment has a source of variation that is “as if” randomly assigned, but this variation was not the result of an explicit randomized treatment and control design.

Two types of quasi-experiments
• Type 1 : Treatment (X) is “as if” randomly assigned (perhaps conditional on some control variables W)
– Example: A government program forces certain firms to make charity donations, and the selection of firms is random
• Type 2: A variable (Z) that influences (but does not determine) receipt of treatment (X) is “as if” randomly assigned, so we can run IV and use Z as an instrument for X (just as we extracted exogenous variation using IV).
– Example: Natural disasters as Z for a firm’s charity donations X, to test their effect on firm performance (Y).
– Natural disasters are random in time and location, so disaster-induced donations (treatment) are “as if” randomly assigned

The differences-in-differences estimator – A fall-back solution
• Consider the parallel worlds example again. Ideally we would use identical firms, one treated (makes donations) and one control (does not make donations), and the difference in firm performance between the two would be the treatment effect.
• Here, with a quasi-experiment, the treatment and control may not be “as if” randomly assigned (for example, a government program uses some selection criteria to choose which firms must make charitable donations)
• Therefore, an additional adjustment (a fall-back solution) is needed compared with the ideal case. Because the treatment and control groups do not contain the same set of firms, the difference in firm performance may contain inherent differences between the two sets of firms.

The differences-in-differences estimator
β̂1^DID = (Ȳtreated,after − Ȳtreated,before) − (Ȳcontrol,after − Ȳcontrol,before)

The differences-in-differences estimator
“Differences” regression formulation:
∆Yi = β0 + β1Xi + ui
where
∆Yi = (Yi,after − Yi,before)
Xi = 1 if treated, = 0 otherwise
β̂1 is the diffs-in-diffs estimator
The differences-in-differences estimator allows for systematic differences in pre-treatment characteristics, which can happen in a quasi-experiment because treatment is not randomly assigned.
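The sketch below computes the diffs-in-diffs estimate twice on simulated firms, once from the four group means and once from the “differences” regression, and compares both with a naive post-treatment comparison. The level gap of 2.0 between groups, the common time trend of 0.3 and the true effect of 0.7 are assumed values.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

x = rng.binomial(1, 0.4, size=n).astype(float)  # 1 = treated firm

# Treated firms start from a higher level (systematic pre-treatment difference).
firm_level = 2.0 * x + rng.normal(size=n)
y_before = 1.0 + firm_level + rng.normal(scale=0.5, size=n)
# Common trend of 0.3 for everyone, plus an assumed treatment effect of 0.7.
y_after = 1.0 + firm_level + 0.3 + 0.7 * x + rng.normal(scale=0.5, size=n)

# (1) Diffs-in-diffs from the four group means.
did_means = (y_after[x == 1].mean() - y_before[x == 1].mean()) - (
    y_after[x == 0].mean() - y_before[x == 0].mean()
)

# (2) "Differences" regression: regress the change in Y on the treatment dummy.
dy = y_after - y_before
X = np.column_stack([np.ones(n), x])
beta1_did = np.linalg.lstsq(X, dy, rcond=None)[0][1]

# Naive comparison of post-treatment levels mixes the effect with the level gap.
naive = y_after[x == 1].mean() - y_after[x == 0].mean()

print(f"DID from means:      {did_means:.3f}")
print(f"DID from regression: {beta1_did:.3f}")
print(f"naive difference:    {naive:.3f}")
```

The two DID numbers are identical, and differencing removes the pre-existing level gap that contaminates the naive comparison.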

Underlying assumptions for diffs-in-diffs
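To obtain an unbiased differences-in-differences estimator, the difference in the outcome between the treated and control groups has to be constant over the pre-treatment period, so that it would plausibly have stayed constant had there been no treatment.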

Underlying assumptions for diffs-in-diffs
• A good way to check this assumption is to plot the time series of the outcome for both groups over the pre-treatment period.
• If you do not have pre-treatment time series, you can check the balance of covariates. The rationale is that even if the outcome variable differs between the two groups, if they are very similar in other dimensions, we can take some comfort that the difference in the outcome variable would have persisted.
• For example, if firms making charitable donations performed better than firms that did not prior to the government program, we should expect the performance difference to persist over time, given that the two groups are similar in other dimensions (such as firm size, firm age, leverage ratios, capital expenditures, etc.)
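Both checks can be sketched numerically rather than graphically. The example below is an assumption-laden illustration: it simulates four pre-treatment periods with a constant gap of 1.5 between groups and one covariate (firm_size, a hypothetical name) that is balanced by construction, then reports the gap per period and a standardized mean difference.

```python
import numpy as np

rng = np.random.default_rng(5)
n_firms, n_pre = 2_000, 4

treated = rng.binomial(1, 0.5, size=n_firms).astype(bool)
trend = 0.2 * np.arange(n_pre)  # common trend shared by both groups

# Outcome in each pre-treatment period: constant gap of 1.5 (assumed).
y_pre = (
    1.0
    + 1.5 * treated[:, None]
    + trend[None, :]
    + rng.normal(scale=0.5, size=(n_firms, n_pre))
)

# Check 1: the treated-minus-control gap, period by period.
gap = y_pre[treated].mean(axis=0) - y_pre[~treated].mean(axis=0)
gap_drift = gap.max() - gap.min()  # should be small if the gap is constant

# Check 2: covariate balance via the standardized mean difference.
firm_size = rng.normal(loc=10.0, scale=1.0, size=n_firms)
pooled_sd = np.sqrt((firm_size[treated].var() + firm_size[~treated].var()) / 2)
smd = (firm_size[treated].mean() - firm_size[~treated].mean()) / pooled_sd

print("gap by pre-treatment period:", np.round(gap, 3))
print(f"drift in the gap: {gap_drift:.3f}")
print(f"standardized mean difference in firm size: {smd:.3f}")
```

A gap that drifts over the pre-treatment periods, or a large standardized difference in covariates, would be a warning sign for the DID assumption.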

Differences-in-differences with control variables
∆Yi = β0 + β1Xi + β2W1it + … + β1+rWrit + ui
Xi = 1 if treated, = 0 otherwise
Why include control variables? For the usual reasons:
1. If the assignment to treatment and control does not depend on the controls, including them can lead to smaller standard errors.
2. If the treatment happens to be assigned based on those controls, we can restore the randomness of the assignment by including them.
3. Including controls does no harm in this case, as long as there are no multicollinearity issues.

Differences-in-differences in panel data
The drunk driving law analysis of Ch. 10 can be thought of as a quasi-experiment panel data design: if (given the control variables and fixed effects) the beer tax is as if randomly assigned, then the causal effect of the beer tax (the elasticity) can be estimated by panel data regression.
• Assignment of the beer tax is not random across states: some states have higher beer taxes and some lower
• As long as the beer tax is as if randomly assigned within states, we can restore the randomness by adding state fixed effects
• The estimates should be reliable if the assumptions for panel data regression are met.
1. E(uit|Xi1, Xi2, …, XiT, αi) = 0
2. (Xi1, Xi2, …, XiT, ui1, ui2, …, uiT) are i.i.d. draws from their joint distribution
3. and assumptions 3 and 4 as in OLS

Differences-in-differences in panel data
The tools of Ch. 10 apply. Ignoring the W’s, the differences-in-differences estimator is obtained by including individual fixed effects and time effects:
Yit = αi + δt + β1Xit + uit
* This regression is suitable for the Type 1 quasi-experiment with panel data, i.e. X is “as if” randomly assigned
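For a balanced panel, the regression with individual and time effects can be estimated by subtracting unit means and period means from both Y and X (the two-way within transformation) and regressing the transformed Y on the transformed X. The sketch below does this on simulated data; the panel dimensions, the true effect of 0.7 and the helper demean_two_way are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, T = 500, 6

treated = rng.binomial(1, 0.5, size=(n, 1))  # which units are ever treated
post = (np.arange(T) >= 3)[None, :]          # treatment starts in period 3
x = (treated * post).astype(float)           # X_it, shape (n, T)

# Unit effects are larger for treated units; time effects rise over time.
alpha = rng.normal(size=(n, 1)) + 2.0 * treated
delta = np.linspace(0.0, 1.0, T)[None, :]
y = alpha + delta + 0.7 * x + rng.normal(scale=0.5, size=(n, T))


def demean_two_way(a):
    """Remove unit means and period means from a balanced (n, T) panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


x_tilde = demean_two_way(x)
y_tilde = demean_two_way(y)
beta1_fe = (x_tilde * y_tilde).sum() / (x_tilde**2).sum()

# Pooled OLS without any fixed effects, for comparison.
X = np.column_stack([np.ones(n * T), x.ravel()])
beta1_pooled = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

print(f"two-way fixed effects estimate: {beta1_fe:.3f}")
print(f"pooled OLS estimate:            {beta1_pooled:.3f}")
```

The demeaning shortcut relies on the panel being balanced; with missing observations one would include the unit and time dummies explicitly or use a panel regression routine.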

IV estimation
For a Type 2 quasi-experiment, i.e. if a variable (Z) that influences treatment (X) is “as if” randomly assigned, conditional on W, then Z can be used as an instrumental variable for X in an IV regression that includes the control variables W.
Going back to the example of charitable donations and firm performance, one would like to estimate the IV regression
Yit = αi + δt + β1X̂it + uit
and use natural disasters (Z) as the instrumental variable, where X̂ is the fitted value from the first-stage regression.
* There is nothing really new here relative to the IV estimation we learnt in Lec 6, other than adding fixed effects as controls.

Regression Discontinuity Estimators
If treatment occurs when some continuous variable W crosses a threshold w0, then you can estimate the treatment effect by comparing individuals with W just on the treated side of the threshold to those with W just on the untreated side. If the direct effect of W on Y is continuous, the effect of treatment should show up as a jump in the outcome at w0. The magnitude of this jump estimates the treatment effect.
• In sharp regression discontinuity design, everyone above (or below) the threshold w0 gets treatment.
• In fuzzy regression discontinuity design, crossing the threshold w0 influences the probability of treatment, but that probability is between 0 and 1.

Sharp RDD example
One would like to estimate the effect of admission to UNSW on career outcomes. The admission criterion is a university admission test score above 500. Students with a test score above 500 are definitely admitted to UNSW, and all others are not. The treatment group is those with a score above 500, and the control group is those with a score below or equal to 500.
• The test score (W) is what we call the forcing or running variable
• The score of 500 is the threshold w0
• The treatment is as if randomly assigned if we look only at students with scores close to the threshold, say between 499 and 501
• Therefore we obtain students with similar characteristics in the treatment and control groups
• There may be a jump in career outcomes at the threshold, as treated students are admitted into UNSW
• This jump in outcome is the treatment effect.

Sharp Regression Discontinuity
Everyone with W > w0 gets treated, so Xi = 1 if Wi > w0 and Xi = 0 otherwise, and the treatment effect is the jump or “discontinuity” in the outcome at w0.
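One common way to measure the jump in a sharp design is to keep only observations within a bandwidth of the threshold and fit a line on each side. The sketch below does this on simulated test scores; the local linear specification, the bandwidth h = 20, the true jump of 3.0 and the slope of the outcome in the score are all assumptions of this illustration rather than something prescribed in the slides.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
w0 = 500.0  # admission threshold

w = rng.uniform(400.0, 600.0, size=n)  # running variable: test score
x = (w > w0).astype(float)             # sharp design: treated iff W > w0

# Outcome varies smoothly with the score and jumps by 3.0 at the threshold.
y = 10.0 + 0.02 * (w - w0) + 3.0 * x + rng.normal(size=n)

# Keep observations within a bandwidth h of the threshold.
h = 20.0
near = np.abs(w - w0) <= h
wc = w[near] - w0  # score centred at the threshold

# Regressors: constant, treatment dummy, centred score, and their interaction
# (the interaction lets the slope differ on the two sides of the threshold).
X = np.column_stack([np.ones(near.sum()), x[near], wc, wc * x[near]])
jump = np.linalg.lstsq(X, y[near], rcond=None)[0][1]

# Comparing all treated with all untreated also picks up the effect of the score.
naive = y[x == 1].mean() - y[x == 0].mean()

print(f"estimated jump at the threshold: {jump:.3f}")
print(f"naive treated-vs-untreated gap:  {naive:.3f}")
```

The choice of bandwidth trades bias against variance: a narrower window makes the two groups more comparable but leaves fewer observations.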

Fuzzy Regression Discontinuity
Let’s again consider the example of admission to UNSW and career outcomes. Now the test score does not determine admission; it only increases the probability of admission. Students with a score lower than 500 but with a high English score may also be admitted, and students with a score higher than 500 are not guaranteed admission.

Fuzzy RDD estimation
Let Xi = binary treatment variable, W the forcing variable, and w0 the threshold.
If crossing the threshold w0 has no direct effect on Yi, and affects Yi only by influencing the probability of treatment, then this naturally leads to IV estimation. Suppose Zi = 1 if Wi > w0 and 0 otherwise:
first stage: Xi = α0 + γZi + vi
second stage: Yi = β0 + β1X̂i + εi
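The fuzzy design can be sketched with the same two-stage logic, using the threshold indicator Z as the instrument. In this simulation the admission probability jumps from about 0.2 to about 0.8 at the threshold and also depends on an unobserved ability term; these numbers, the narrow score window and the true effect of 2.0 are assumptions. For simplicity the outcome is assumed not to depend on the score directly.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 40_000
w0 = 500.0

w = rng.uniform(480.0, 520.0, size=n)  # scores already close to the threshold
z = (w > w0).astype(float)             # Z = 1 if the score crosses the threshold
ability = rng.normal(size=n)           # unobserved, affects admission and outcome

# Crossing the threshold raises the admission probability but does not fix it.
p_admit = np.clip(0.2 + 0.6 * z + 0.1 * ability, 0.0, 1.0)
x = rng.binomial(1, p_admit).astype(float)

y = 1.0 + 2.0 * x + 1.0 * ability + rng.normal(size=n)  # true effect = 2.0


def ols(dep, *regressors):
    """Return OLS coefficients with the intercept first."""
    X = np.column_stack([np.ones(len(dep)), *regressors])
    return np.linalg.lstsq(X, dep, rcond=None)[0]


beta_ols = ols(y, x)[1]  # biased: admission is correlated with ability

alpha0_hat, gamma_hat = ols(x, z)   # first stage
x_hat = alpha0_hat + gamma_hat * z
beta_iv = ols(y, x_hat)[1]          # second stage

# With a binary instrument this equals the ratio of the two jumps.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}  ratio of jumps: {wald:.3f}")
```

The IV estimate is the jump in the outcome divided by the jump in the treatment probability, which is why a weak first-stage jump makes the estimate noisy.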

Potential Problems with Quasi-Experiments
The threats to the internal validity of a quasi-experiment are the same as for a true experiment, with one addition.
• Failure to randomize (imperfect randomization)
• Failure to follow treatment protocol
• Attrition
• Experimental effects
• Instrument invalidity (relevance + exogeneity)

The threats to the external validity of a quasi-experiment are the same as for an observational study.
• Non-representative sample
• Non-representative “treatment” (that is, program or policy)

Summary: Experiments and Quasi-Experiments
Ideal experiments and potential outcomes
• The average treatment effect is the population mean of the individual treatment effect, which is the difference in potential outcomes when treated and not treated.
• The treatment effect estimated in an ideal randomized controlled experiment is unbiased for the average treatment effect.
Actual experiments
• Actual experiments have threats to internal validity
• Depending on the threat, these threats to internal validity can be addressed by:
– panel data regression (differences-in-differences)
– multiple regression (including control variables), and
– IV (using initial assignment as an instrument, possibly with control variables)
• External validity also can be an important threat to the validity of experiments
Quasi-experiments
• Quasi-experiments have an “as-if” randomly assigned source of variation.
• This as-if random variation can generate:
– Xi which plausibly satisfies E(ui|Xi) = 0 (so estimation proceeds using OLS); or
– instrumental variable(s) which plausibly satisfy E(ui|Zi) = 0 (so estimation proceeds using TSLS)
• Quasi-experiments also have threats to internal validity

Homework
For diffs-in-diffs, what assumption about the treatment and control groups in the pre-treatment period is needed for the DID estimator to be unbiased? How can we test whether this assumption is satisfied? Write a short paragraph.