DS-UA 201: Causal Inference: Differences-in-differences
University Center for Data Science
August 8, 2022
Acknowledgement: Slides include material from DS-UA 201 Fall 2021 offered by .
Fixed-effects regression
Last lecture: we talked about regression with grouped data
▶ Grouped data is any data whose units of analysis can be divided into groups
▶ and unobserved confounding is constant within groups
▶ even if we don’t observe all confounders causal inference is still possible
We can use linear regression to analyze grouped data
▶ We can de-mean outcomes and treatments and then use regression on the de-meaned variables (“within” estimator)
▶ or we can one-hot encode the group indicators and then run a linear regression on the new variables (Fixed effects estimator)
▶ we need the OLS assumptions in both cases
▶ . . . but we can account for correlation of units within groups
▶ . . . and deal with multiple groupings at once
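A minimal sketch of why the "within" estimator and the one-hot (fixed effects) estimator give the same coefficient, using simulated data. The variable names, sample sizes, and coefficient values are illustrative assumptions, not taken from the lecture.

```r
# Simulated grouped data (all names and values are illustrative assumptions)
set.seed(1)
n_groups <- 20
n_per <- 10
g <- rep(seq_len(n_groups), each = n_per)      # group label for each unit
alpha <- rnorm(n_groups)[g]                    # unobserved group-level confounder
d <- rbinom(length(g), 1, plogis(alpha))       # treatment depends on the confounder
y <- 2 * d + 3 * alpha + rnorm(length(g))      # outcome; true effect of d is 2

# "Within" estimator: de-mean outcome and treatment by group, then regress
y_dm <- y - ave(y, g)
d_dm <- d - ave(d, g)
tau_within <- unname(coef(lm(y_dm ~ d_dm - 1)))

# Fixed effects estimator: one-hot encode the group indicators
tau_fe <- unname(coef(lm(y ~ d + factor(g)))["d"])

# The two point estimates agree up to floating-point error
all.equal(tau_within, tau_fe)
```

The point estimates coincide, but the default standard errors from the de-meaned regression do not account for the estimated group means, so in practice the fixed effects version is usually the more convenient one to report.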
Today: time-series data
Today and next week, we will talk about one specific kind of grouping: time-series data.
▶ This is data in which the same unit, i, was observed over multiple time-periods t
▶ Intuitively, there are two groups in this data
▶ The unit itself
▶ and the time of the observation
▶ We will see that, in this setting, the assumption of constant unobserved confounding within unit can be relaxed even further
▶ FE regression is going to be our tool of choice for this setting as well
Card and Krueger (1994)
Question: Does increasing the minimum wage reduce employment?
▶ Classical theoretical models predict that wage floors ⇝ people unemployed that would otherwise be employed.
▶ But what does the empirical evidence tell us?
Card and Krueger (1994) analyze a policy change that occurred in NJ in 1992, raising the MW from $4.25 to $5.05.
▶ They survey 410 fast food restaurants in NJ and neighboring Eastern Pennsylvania.
▶ Compare change in employment in NJ before and after the policy to the change in employment in PA restaurants (which experienced no minimum wage change) before and after NJ’s policy.
▶ Finding: The minimum wage increase didn’t decrease employment (in fact, there was a slight but not statistically significant increase).
Card and Krueger: Research design
Key Idea: not only do Card and Krueger compare “treated” (NJ) restaurants to “control” (PA) restaurants, but also address concerns that there is something about NJ restaurants that affects both the MW change, and employment.
Underlying assumption
Here the key assumption is that whatever was special about NJ did not change in the period before and after the policy was implemented.
▶ In other words, if there was truly no effect of the MW change, the difference in employment before and after the MW policy should be the same in the two states.
▶ Alternatively, if there was something special about the NJ restaurants, we should see those districts have a different employment rate before the policy was implemented.
Difference-in-differences
Two groups (treated/control); two time periods (0, 1).
▶ $D_i = 1$: treated in time 1; $D_i = 0$: control in time 1. All units are under control in time 0. We can also think in terms of a treatment indicator for each time period: $D_{i1} = D_i$, $D_{i0} = 0$.
▶ Observe two outcomes for each unit $i$: $Y_{i1}$, the outcome in period 1, and $Y_{i0}$, the outcome in period 0.
Potential outcomes:
$$Y_{i1}(d) = Y_{i1} \quad \text{if } D_i = d$$
Treatment in time 1 has no effect on the outcome in time 0 (everyone is under control in time 0):
$$Y_{i0}(1) = Y_{i0}(0) = Y_{i0}$$
Types of datasets for DiD
The most typical dataset used for DiD is one with repeated observations of the same unit
▶ Known in economics as “panel” data
▶ We will work with this throughout the rest of this lecture
Alternatively, DiD works also for repeated cross-sections sampled from treated/untreated units.
▶ Here we have different samples of units in the two time periods
▶ Inferences still valid as long as the samples are from the same population
▶ Example: Two different samples of restaurants from NJ, one before and one after the NJ minimum wage change, and two samples from PA during the same period
Identifying assumptions
Causal effect of interest is the Average Treatment Effect on the Treated (ATT) in time 1:
$$\tau_1 = E[Y_{i1}(1) \mid D_i = 1] - E[Y_{i1}(0) \mid D_i = 1]$$
The first part we can estimate directly from the data (observed outcome among the treated group):
$$\tau_1 = E[Y_{i1} \mid D_i = 1] - E[Y_{i1}(0) \mid D_i = 1]$$
The second part we don’t observe directly and need additional assumptions to identify from the observed data.
Identifying assumptions
Remember the selection bias formula for the ATT:
$$\tau_1 = \underbrace{\{E[Y_{i1} \mid D_i = 1] - E[Y_{i1} \mid D_i = 0]\}}_{\text{Difference-in-means in time 1}} - \underbrace{\{E[Y_{i1}(0) \mid D_i = 1] - E[Y_{i1}(0) \mid D_i = 0]\}}_{\text{Selection bias}}$$
▶ Can we estimate the selection bias?
Key Assumption: Parallel trends
The selection bias in time 1 (difference in $Y_{i1}(0)$ between treated and control) is the same as the selection bias in time 0 (difference in $Y_{i0}(0)$ between treated and control).
Identifying assumptions
Our identifying assumption lets us write
$$E[Y_{i1}(0) \mid D_i = 1] - E[Y_{i1}(0) \mid D_i = 0] = E[Y_{i0}(0) \mid D_i = 1] - E[Y_{i0}(0) \mid D_i = 0]$$
Since $Y_{i0}(0) = Y_{i0}(1)$ (no effect of future on past):
$$E[Y_{i1}(0) \mid D_i = 1] - E[Y_{i1}(0) \mid D_i = 0] = E[Y_{i0}(1) \mid D_i = 1] - E[Y_{i0}(0) \mid D_i = 0]$$
Then consistency yields:
$$E[Y_{i1}(0) \mid D_i = 1] - E[Y_{i1}(0) \mid D_i = 0] = E[Y_{i0} \mid D_i = 1] - E[Y_{i0} \mid D_i = 0]$$
Identifying assumptions
Substituting back into the ATT formula yields the differences-in-differences formula
$$\tau_1 = \underbrace{\{E[Y_{i1} \mid D_i = 1] - E[Y_{i1} \mid D_i = 0]\}}_{\text{Difference-in-means in time 1}} - \underbrace{\{E[Y_{i0} \mid D_i = 1] - E[Y_{i0} \mid D_i = 0]\}}_{\text{Difference-in-means in time 0}}$$
▶ Each of these can be estimated non-parametrically using the conditional sample means.
▶ But: the Neyman variance won’t work. We will need a special type of bootstrap.
Visualizing DiD
Identifying assumptions
The key identifying assumption of DiD is often referred to as the “parallel trends” assumption.
▶ We cannot assume treatment is ignorable with respect to the potential outcomes at time 1.
▶ We instead assume that the trend in the potential outcomes under control from time 0 to time 1 in the treated group is the same as the observed trend in the control group.
Parallel trends assumption:
$$\underbrace{E[Y_{i1}(0) \mid D_i = 1] - E[Y_{i0}(0) \mid D_i = 1]}_{\text{Trend in control counterfactual for treated}} = \underbrace{E[Y_{i1}(0) \mid D_i = 0] - E[Y_{i0}(0) \mid D_i = 0]}_{\text{Trend in control counterfactual for control}}$$
Identifying assumptions
Another way of phrasing the parallel trends assumption is that the trends are independent of treatment assignment (but not the levels):
$$\{Y_{i1}(0) - Y_{i0}(0)\} \perp\!\!\!\perp D_i$$
But it is not necessarily true that:
$$Y_{i1}(0) \perp\!\!\!\perp D_i \quad \text{and} \quad Y_{i0}(0) \perp\!\!\!\perp D_i$$
Parallel trends is equivalent to saying that confounding is constant over time.
▶ So parallel trends is a special case of the more general assumption we made for grouped data
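One way to see why constant confounding gives parallel trends is to write down a simple additive model for the control potential outcomes. The model below is an illustrative assumption (the symbols $\alpha_i$, $\delta_t$, and $\varepsilon_{it}$ do not come from the slides above), and it is only one of many models under which parallel trends holds.

```latex
\begin{align*}
Y_{it}(0) &= \alpha_i + \delta_t + \varepsilon_{it},
  \qquad E[\varepsilon_{it} \mid D_i] = 0 \\
E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = d] &= \delta_1 - \delta_0
  \quad \text{for } d \in \{0, 1\} \\
E[Y_{it}(0) \mid D_i = 1] - E[Y_{it}(0) \mid D_i = 0]
  &= E[\alpha_i \mid D_i = 1] - E[\alpha_i \mid D_i = 0]
  \quad \text{for } t \in \{0, 1\}
\end{align*}
```

The second line says the trend is the same in both groups. The third line says the levels can still differ between groups, but by an amount that does not depend on $t$, which is exactly the "same selection bias in both periods" statement.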
Estimation
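In the most general setting, we can just estimate the four different means used in DiD using the sample averages. Here $t_i$ is the period in which observation $i$ was recorded, $Y_i$ is its outcome, and $N_{d,t}$ is the number of observations with $D_i = d$ in period $t$:
▶ $\hat{E}[Y_{i1} \mid D_i = 1] = \frac{1}{N_{1,1}} \sum_{i: t_i = 1} Y_i D_i$
▶ $\hat{E}[Y_{i0} \mid D_i = 1] = \frac{1}{N_{1,0}} \sum_{i: t_i = 0} Y_i D_i$
▶ $\hat{E}[Y_{i1} \mid D_i = 0] = \frac{1}{N_{0,1}} \sum_{i: t_i = 1} Y_i (1 - D_i)$
▶ $\hat{E}[Y_{i0} \mid D_i = 0] = \frac{1}{N_{0,0}} \sum_{i: t_i = 0} Y_i (1 - D_i)$
and then take the difference of the differences to estimate $\tau_1$:
$$\hat{\tau}_1 = \{\hat{E}[Y_{i1} \mid D_i = 1] - \hat{E}[Y_{i1} \mid D_i = 0]\} - \{\hat{E}[Y_{i0} \mid D_i = 1] - \hat{E}[Y_{i0} \mid D_i = 0]\}$$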
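As a small worked example, the four cell means and their difference of differences can be computed in a few lines of base R. The data frame below is made up for illustration (the column names `D`, `t`, and `Y` and all values are assumptions), and the rows need not be the same units in both periods.

```r
# Hypothetical long-format data: one row per observation
dat <- data.frame(
  D = c(1, 1, 1, 1, 0, 0, 0, 0),        # treated-group indicator
  t = c(0, 0, 1, 1, 0, 0, 1, 1),        # time period of the observation
  Y = c(10, 12, 15, 17, 8, 10, 9, 11)   # observed outcome
)

# 2 x 2 table of cell means: rows indexed by D, columns by t
m <- tapply(dat$Y, list(D = dat$D, t = dat$t), mean)

# (treated - control in time 1) minus (treated - control in time 0)
tau_hat <- (m["1", "1"] - m["0", "1"]) - (m["1", "0"] - m["0", "0"])
tau_hat
```

Here the cell means are 16 and 10 in time 1 and 11 and 9 in time 0, so the estimate is (16 - 10) - (11 - 9) = 4.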
Estimation
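If the data consist of repeated observations of the same unit, then we can use a simple difference-in-means estimator on the differenced outcomes:
$$\hat{\tau}_1 = \frac{1}{N_t} \sum_{i=1}^{N} (Y_{i1} - Y_{i0}) D_i - \frac{1}{N_c} \sum_{i=1}^{N} (Y_{i1} - Y_{i0})(1 - D_i)$$
where $N_t$ and $N_c$ are the numbers of treated and control units.
No parametric assumptions required!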
▶ We did not have to assume that we know the functional form of our outcome!
The SE of this estimator can be estimated with the Neyman variance estimator applied to the differenced outcomes.
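A minimal sketch of the differenced-outcome estimator and its Neyman standard error on simulated panel data. The data-generating process (a common trend of 0.5 and an effect of 1.5 on the treated) and all variable names are assumptions chosen for illustration.

```r
# Simulated two-period panel (all values are illustrative assumptions)
set.seed(2)
n <- 200
D <- rbinom(n, 1, 0.5)                      # treated-group indicator
unit_fx <- rnorm(n) + D                     # treated units differ in levels
Y0 <- unit_fx + rnorm(n)                    # period 0: everyone under control
Y1 <- unit_fx + 0.5 + 1.5 * D + rnorm(n)    # period 1: common trend plus effect

# Difference-in-means on the differenced outcomes
dY <- Y1 - Y0
tau_hat <- mean(dY[D == 1]) - mean(dY[D == 0])

# Neyman variance estimator applied to the differenced outcomes
se_hat <- sqrt(var(dY[D == 1]) / sum(D == 1) + var(dY[D == 0]) / sum(D == 0))

c(estimate = tau_hat, se = se_hat)
```

Differencing removes the unit-level term `unit_fx`, so the level difference between groups does not enter the estimate.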
Connection to fixed-effects estimators
Suppose our dataset is organized so that each row is a unit/time period, $it$ (just like we just did).
We can write a model with “two-way fixed effects”:
$$Y_{it} = \gamma_i + \delta_t + \tau D_{it} + \varepsilon_{it}$$
▶ Estimating this regression and obtaining τˆ is mathematically equivalent to the nonparametric DiD estimator in the two-period/two-treatment case.
▶ This means that this specific regression is valid even without the OLS assumptions, as it’s just a shortcut to another estimator.
▶ More complicated when we have many time periods and treatment initiation at different times (additional hidden assumptions to estimating the “two-way fixed effects” model)
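The equivalence in the two-period case can be checked numerically. The sketch below simulates a balanced panel, computes the nonparametric DiD estimate, and compares it with the coefficient on $D_{it}$ from a two-way fixed effects regression fit with base R's `lm`. The simulated values and column names (`unit`, `time`, `Dit`) are illustrative assumptions.

```r
# Simulated balanced two-period panel (values are illustrative assumptions)
set.seed(3)
n <- 100
D <- rep(c(0, 1), each = n / 2)          # treated-group indicator
a <- rnorm(n) + 2 * D                    # unit-level term, differs by group
Y0 <- a + rnorm(n)                       # period 0 outcomes
Y1 <- a + 1 + 3 * D + rnorm(n)           # period 1 outcomes

# Nonparametric DiD on the differenced outcomes
tau_did <- mean((Y1 - Y0)[D == 1]) - mean((Y1 - Y0)[D == 0])

# Long format: one row per unit/time period
long <- data.frame(
  unit = factor(rep(seq_len(n), times = 2)),
  time = factor(rep(c(0, 1), each = n)),
  Y    = c(Y0, Y1),
  Dit  = c(rep(0, n), D)                 # D_i0 = 0 and D_i1 = D_i
)

# Two-way fixed effects regression: unit dummies, time dummies, treatment
tau_twfe <- unname(coef(lm(Y ~ Dit + unit + time, data = long))["Dit"])

all.equal(tau_did, tau_twfe)
```

The match is exact only in this balanced two-group, two-period layout; with more periods or staggered treatment timing the two quantities need not agree.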
Standard Errors
In the two-way FE model, we have correlated errors ($\text{Cov}(\varepsilon_{i1}, \varepsilon_{i0}) \neq 0$)
▶ This is because the same unit appears at multiple times
▶ It is unrealistic to believe that there will be no error correlation within the same unit!
Two Solutions:
▶ “Cluster-robust” standard errors: the estimator we saw during our last lecture.
▶ Block-bootstrap (bootstrapping, but resampling all observations within a cluster rather than individual $it$ rows)
The block bootstrap
The block bootstrap is a version of the bootstrap for grouped data.
▶ The key is that we sample whole groups instead of individual observations
Algorithm: For b = 1, . . . , B bootstrap iterations:
1. From the n units, randomly sample n with replacement
2. For each unit sampled, store both Yi1 and Yi0, and Di.
3. On the bootstrapped data, estimate $\hat{\tau}_1^{(b)}$ using either a difference in de-meaned outcomes or a two-way FE regression
4. Store $\hat{\tau}_1^{(b)}$
The standard deviation of the vector $(\hat{\tau}_1^{(1)}, \ldots, \hat{\tau}_1^{(B)})$ will be a consistent estimator of the standard error of $\hat{\tau}_1$.
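The algorithm above can be sketched in base R for a two-period panel stored in wide format, where each row is one unit and resampling rows therefore resamples whole units. The simulated data, the helper name `did`, and the choice of `B = 500` are assumptions made for illustration.

```r
# Simulated two-period panel in wide format (values are illustrative assumptions)
set.seed(4)
n <- 150
D <- rbinom(n, 1, 0.4)
a <- rnorm(n) + D
wide <- data.frame(
  Y0 = a + rnorm(n),                 # period 0 outcome
  Y1 = a + 0.5 + 1 * D + rnorm(n),   # period 1 outcome
  D  = D
)

# Hypothetical helper: DiD estimate from a wide data frame
did <- function(df) {
  dY <- df$Y1 - df$Y0
  mean(dY[df$D == 1]) - mean(dY[df$D == 0])
}

B <- 500
tau_boot <- numeric(B)
for (b in seq_len(B)) {
  # Step 1: sample n units with replacement
  idx <- sample.int(n, size = n, replace = TRUE)
  # Steps 2-4: keep each sampled unit's (Y0, Y1, D), re-estimate, store
  tau_boot[b] <- did(wide[idx, ])
}

# Bootstrap standard error of the DiD estimate
se_boot <- sd(tau_boot)
c(estimate = did(wide), se = se_boot)
```

Because both periods of a unit always travel together, the within-unit correlation between $Y_{i1}$ and $Y_{i0}$ is preserved in every bootstrap sample.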
Block bootstrap in R (example from )
unit  group  D  Y
1     A      0   0.5
2     A      1   0.06
3     B      1  -0.14
4     B      0  -1.3
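library(estimatr)
# dat: data frame with columns unit, group, D, Y (as in the table above)
B <- 1000  # number of bootstrap iterations (value assumed here)
tau_boot <- rep(NA, B)
# Row indices of dat belonging to each group
lookup <- split(1:nrow(dat), dat$group)
gnames <- names(lookup)
for (b in 1:B) {
  # Resample whole groups with replacement
  star <- sample(gnames, size = length(gnames), replace = TRUE)
  dat.star <- dat[unlist(lookup[star]), ]
  # Re-estimate on the bootstrapped data and keep the coefficient on D
  tau_boot[b] <- lm_robust(Y ~ D + group, data = dat.star)$coefficients["D"]
}
# Bootstrap standard error
sd(tau_boot)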
The Card/Krueger Minimum Wage Study
Question: Does raising the minimum wage raise unemployment?
Data:
▶ Fast food restaurants in PA and NJ before and after the MW policy
▶ treatment: a restaurant is treated if it is in NJ after the policy
▶ outcome: number of full-time employees
Analyzing the Card/Krueger data
### Analyze the Card/Krueger minimum wage study
library(tidyverse)
library(haven)
library(estimatr)
## Read in the data (missing data denoted with a .)
minwage <- read_table2("public.tab", na = ".")
## Keep only units with available wage + employment data in waves 1 and 2
minwage <- subset(minwage, !is.na(WAGE_ST) & !is.na(WAGE_ST2) &
  !is.na(EMPFT) & !is.na(EMPFT2) & !is.na(EMPPT) & !is.na(EMPPT2))
## STATE = 1: NJ (treated), STATE = 0: Pennsylvania (control)
## Outcome is FT employment
Analyzing the Card/Krueger data
## STATE = 1: NJ (treated), STATE = 0: Pennsylvania (control)
## Outcome is FT employment
## Naive difference-in-means
lm_robust(EMPFT2 ~ STATE, data = minwage)
### But NJ and PA differ - we want to look at the change
minwage$CHGEMPFT <- minwage$EMPFT2 - minwage$EMPFT
## Get the DiD estimate
lm_robust(CHGEMPFT ~ STATE, data = minwage)
Method  Estimate  SE    95% CI
Naive   0.23      1.16  -2.05, 2.5
DiD                     -0.47, 6.32
▶ Assessing DiD assumptions + what happens with multiple time periods.
▶ Instrumental variables - what happens when treatment is confounded but we have a variable that’s as-good-as randomized and could only affect the outcome through the treatment?