DS-UA 201: Causal Inference: Regression Discontinuity
University Center for Data Science
August 15, 2022
Acknowledgement: Slides include material from DS-UA 201, Fall 2021.
So far: Identification under unobserved confounding
In the last few weeks we have been focusing on identifying treatment effects under some (structured) forms of unobserved confounding.
▶ DiD: Unobserved confounding is time-invariant
▶ IV: Unobserved confounding affects the treatment but not the
instrument
We have seen how linear regression is a standard tool in these settings.
▶ Sometimes just a short-hand for difference-in-means estimators (works under mild assumptions)
▶ Sometimes an actual outcome model (stronger assumptions needed!)
Today: Regression Discontinuity Designs
Regression Discontinuity Designs (RDD) is another approach to identification when there is unobserved confounding.
▶ Goal: Use knowledge about particular treatment assignment mechanisms to find a subset where treatment is “quasi-randomly” assigned.
▶ Intuition: Exploit the fact that treatment is sometimes assigned based on a cut-off or threshold value of a continuous running variable
▶ Merit aid scholarships awarded based on a test-score cut-off
▶ Geographic boundaries affect policy implementation.
▶ Candidates receiving a plurality of the vote get elected.
As in previous weeks, we allow unobserved confounding, but we restrict its structure with assumptions.
Regression Discontinuity Designs (RDD)
▶ Regression: We’re working with a conditional expectation (regression) function (CEF) of the outcome given a running variable.
▶ Discontinuity: There is a jump in the CEF that is driven by treatment-assignment at some cut-off
▶ Design: The assumptions are informed by some substantive knowledge of a particular situation.
Quasi-randomization at a cut-off
Key intuition: The continuous variable is a confounder, but units that are close to the cut-off are very similar in characteristics except for the fact that some get treatment vs. control.
▶ Not too much of a difference between candidates receiving 49.9% of the vote and candidates receiving 50.1% of the vote. Except one wins and the other loses.
Key assumption: The relationship between the continuous variable and outcome would be continuous but for the discontinuity in treatment assignment.
▶ Units are not able to control their score precisely such that near the discontinuity, the assignment is “as-good-as-random”
Example: Incumbency advantage (Lee, 2008)
How much does being an incumbent boost a candidate’s probability of winning an election?
Problem: Hard to disentangle effect of incumbency over other district characteristics.
▶ Is a Democrat (or Republican) more likely to hold on to a seat because they’re incumbents or because they align with district preferences?
Lee (2008) exploits the fact that U.S. congressional elections are “first-past-the-post” – the plurality vote-getter wins the seat.
▶ Districts where the Democratic candidate won by a tiny margin and ones where they lost by a tiny margin are very similar…except for the fact that the incumbent party is different!
Lee 2008: Research Design
Design: Examine the probability of Democratic victory in an election at time t + 1 conditional on the Democratic margin-of-victory in an election at t.
▶ Only look at those districts where Democrats won or lost by a small margin in the previous election.
Placebo test: If there were something particular about the districts where Democrats win at t + 1, it should have affected victory in previous elections as well.
▶ If there is no discontinuity in previous margin of victory, then there is evidence that there is no confounding.
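A minimal sketch of this placebo check in R (not from the original slides): it reuses the local regression introduced later today, but with a pre-treatment outcome. The data frame elections, the lagged outcome y_lag, and the ±0.25 bandwidth are hypothetical placeholders; the Lee data used later contains only x and y.

library(estimatr)

# Placebo RD: regress a PRE-treatment outcome (e.g., Democratic vote
# share before election t) on the same discontinuity at zero.
# 'elections' and 'y_lag' are hypothetical placeholders.
placebo <- subset(elections, x > -0.25 & x < 0.25)
placebo$d <- as.integer(placebo$x >= 0)
placebo_reg <- lm_robust(y_lag ~ d * x, data = placebo)
summary(placebo_reg)
# A coefficient on d near zero is evidence against confounding at the cut-off.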
Example: Incumbency advantage (Lee, 2008)
[Figure from Lee (2008): outcomes plotted against the Democratic margin of victory at t, showing a discontinuity at the zero cut-off]
We will be working with the following setting:
▶ Binary treatment Di ∈ {0, 1}
▶ Potential outcomes Yi (1), Yi (0).
▶ Observed outcomes/consistency:
Yi = Yi(1)Di + Yi(0)(1 − Di)
We also have Xi ∈ R, which is a continuous variable that affects
treatment assignment.
▶ Sometimes referred to as the running or forcing variable.
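To make this setup concrete, here is a small simulated example (mine, not from the slides): continuous potential-outcome CEFs, a running variable, and a deterministic cut-off rule.

# Simulated sharp RD: the true effect at the cut-off is 0.1
set.seed(123)
n <- 1000
cutoff <- 0                                 # the threshold c
x  <- runif(n, -1, 1)                       # running variable Xi
y0 <- 0.4 + 0.3 * x + rnorm(n, sd = 0.1)    # Yi(0): E[Yi(0)|Xi = x] continuous in x
y1 <- y0 + 0.1                              # Yi(1): constant treatment effect
d  <- as.integer(x >= cutoff)               # sharp assignment at the cut-off
y  <- d * y1 + (1 - d) * y0                 # observed outcome (consistency)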
Assumption 1: Treatment assignment
Sharp RD – Treatment assignment is perfectly determined by the value of the forcing variable Xi and the threshold c
Assignment at the discontinuity
Di = 1 if Xi ≥ c
Di = 0 if Xi < c
Units with Xi above the cut-off/threshold c get the treatment. Units with Xi below the cut-off get control.
Assumption 2: Continuity in potential outcomes
Assumption 2: The conditional expectation of the potential outcomes given Xi are continuous
Continuity
E[Yi(0)|Xi = x] and E[Yi(1)|Xi = x] are continuous in x
In other words:
lim_{x→c} E[Yi(d)|Xi = x] = E[Yi(d)|Xi = c] for d ∈ {0, 1}
As x gets infinitesimally close to c, E[Yi(d)|Xi = x] approaches E[Yi(d)|Xi = c].
Visualizing the regression functions
Identification using limits
Our identification strategy leverages the fact that the limit of the CEF as we approach the discontinuity will be different depending on whether it is from the right vs. left.
τ = lim_{x→c+} E[Yi|Xi = x] − lim_{x→c−} E[Yi|Xi = x]
▶ The limit from the right (x → c+) identifies E[Yi(1)|Xi = c] (all values of Xi > c are “treated”)
▶ The limit from the left (x → c−) identifies E[Yi(0)|Xi = c] (all values of Xi < c are “control”)
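One way to see this numerically, using the simulated data from the setup slide: compare mean outcomes in ever-smaller windows on each side of c. As the window shrinks, the difference approaches the jump at the cut-off (here 0.1).

# Difference in means within a window of half-width h on each side of c
for (h in c(0.5, 0.25, 0.1, 0.05)) {
  right <- mean(y[x >= cutoff & x < cutoff + h])  # approximates the limit from the right
  left  <- mean(y[x <  cutoff & x > cutoff - h])  # approximates the limit from the left
  cat("h =", h, " estimate =", round(right - left, 3), "\n")
}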
Identification using limits
How do the assumptions get us identification?
E[Yi(0)|Xi = c] = lim_{x→c−} E[Yi(0)|Xi = x]    (continuity)
= lim_{x→c−} E[Yi(0)|Di = 0, Xi = x]    (Di = 0 whenever Xi < c)
= lim_{x→c−} E[Yi|Xi = x]    (consistency)
Same intuition for the limit from the right (x → c+) for identifying E[Yi(1)|Xi = c]
RDD and conditional ignorability
RDD implicitly makes a conditional ignorability assumption. We know:
Pr(Di = 1 | Xi) = 1(Xi ≥ c),
therefore:
Pr(Di = 1 | Yi(d), Xi = x) = 1(x ≥ c), which does not depend on the potential outcomes once we condition on Xi.
Extrapolation
However, positivity is violated!
▶ Probability of treatment is 0 for units with Xi < c and 1 for units with Xi ≥ c.
▶ There is no covariate overlap between treated and untreated units, i.e., all treated units have Xi ≥ c and all control units have Xi < c.
▶ Estimation at the cut-off therefore requires extrapolating the two regression functions toward Xi = c rather than relying on overlap.
Illustration: Lee (2008) Election RDD
## We're just using this for the Lee (2008) data
library(rddtools)
library(estimatr)
library(tidyverse)

## Load the Lee (2008) data
data(house)

# What does it look like?
# x (vote share at time t-1)
# y (vote share at time t)
> head(house)
        x      y
1  0.1049 0.5810
2  0.1393 0.4611
3 -0.0736 0.5434
4  0.0868 0.5846
5  0.3994 0.5803
6  0.1681 0.6244
Illustration: Lee (2008) Election RDD
# The regular scatterplot is really noisy - let's bin the points instead
bin_scatter <- ggplot(aes(x = x, y = y), data = house) +
  stat_summary_bin(fun.y = "mean", bins = 200, size = 2, geom = "point") +
  geom_vline(xintercept = 0, col = "red", lty = 2) +
  xlab("Democratic vote share margin of victory, Election t") +
  ylab("Democratic vote share, Election t+1") +
  theme_bw()
Illustration: Lee (2008) Election RDD
[Figure: binned scatterplot of Democratic vote share, Election t+1 (y-axis) against Democratic vote share margin of victory, Election t (x-axis), with a dashed red line at the zero cut-off]
Illustration: Lee (2008) Election RDD
Now change the outcome variable:
▶ X: Democratic margin of victory in time t
▶ Y: Democratic vote share in time t + 1
▶ D: Victory in time t (margin > 0).
Illustration: Lee (2008) Election RDD
[Figure: the same binned scatterplot restricted to margins between −0.2 and 0.2]
Local linear regression
How do we get point estimates for the TE?
▶ Goal: Estimate lim_{x→c+} E[Yi|Xi = x]
▶ Fit a model to the treated units and get the prediction at the cut-point.
▶ This requires assumptions on the functional form of E[Yi|Xi]!
▶ To reduce dependence on getting the correct model, use only
units with Xi close to c (within some “bandwidth” h)
▶ Use a regression to account for how E[Yi|Xi = x] changes with x (even for those “close” units).
Local linear regression
▶ Let μˆ+(x) denote the predicted value from a regression of Yi on Xi fit to observations within the bandwidth above the cut-point (c,c +h].
▶ Let μˆ−(x) denote the predicted value from a regression of Yi on Xi fit to observations within the bandwidth below the cut-point [c −h,c).
▶ Our estimate of the ATE is the difference between the predictions at c
τˆRD = μˆ+(c) − μˆ−(c)
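A direct translation of this estimator into R, continuing the simulated example (the bandwidth h = 0.25 is arbitrary here):

h   <- 0.25
sim <- data.frame(x, y)
above <- subset(sim, x >= cutoff & x <= cutoff + h)  # within (c, c + h]
below <- subset(sim, x <  cutoff & x >= cutoff - h)  # within [c - h, c)

mu_plus  <- lm(y ~ x, data = above)   # regression above the cut-point
mu_minus <- lm(y ~ x, data = below)   # regression below the cut-point

# Difference between the two predictions at the cut-off
tau_hat <- predict(mu_plus,  newdata = data.frame(x = cutoff)) -
           predict(mu_minus, newdata = data.frame(x = cutoff))
tau_hat   # should be close to the true effect of 0.1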
Local linear regression
Can get this (+ valid SEs) from a single regression with interactions:
▶ Subset the data to only observations with Xi within h of the cut-point c (the “close” observations). Then fit:
Yi = α + τDi + β(Xi − c) + γ(Xi − c)Di + εi
▶ τˆ is our estimate of the RD effect – usual methods for getting SEs.
Illustration: Lee (2008) Election RDD
# Generate a treatment indicator
house$d <- as.integer(house$x > 0)

# Subset to the close observations
house_close <- subset(house, x > -.25 & x < .25)

# Fit the regression model w/ interaction
rd_reg <- lm_robust(y ~ d + x + d * x, data = house_close)
summary(rd_reg)
             Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper   DF
(Intercept)    0.4509    0.00558   80.82  0.00e+00   0.4399   0.4618 2757
d              0.0827    0.00838    9.87  1.31e-22   0.0663   0.0991 2757
x              0.3665    0.04135    8.86  1.37e-18   0.2854   0.4476 2757
d:x            0.0760    0.06288    1.21  2.27e-01  -0.0473   0.1993 2757

The coefficient on d is the RD estimate: τˆ ≈ 0.083, so winning at t is associated with roughly an 8 percentage point higher Democratic vote share at t + 1.
Illustration: Lee (2008) Election RDD
# Add it to the plot
# Scatterplot w/ regression
bin_scatter_close_reg <- ggplot(aes(x = x, y = y), data = house_close) +
  stat_summary_bin(fun.y = "mean", bins = 50, size = 2, geom = "point") +
  geom_vline(xintercept = 0, col = "red", lty = 2) +
  geom_smooth(data = subset(house_close, d == 1), formula = y ~ x,
              method = "lm_robust") +
  geom_smooth(data = subset(house_close, d == 0), formula = y ~ x,
              method = "lm_robust", col = "orange") +
  xlab("Democratic vote share margin of victory, Election t") +
  ylab("Democratic vote share, Election t+1") +
  theme_bw()
Illustration: Lee (2008) Election RDD
[Figure: binned scatterplot with the two local linear regression fits, restricted to margins between −0.2 and 0.2; the vertical gap between the fitted lines at zero is the RD estimate]
Today we have introduced Regression Discontinuity Designs
▶ Assumes that treatment is assigned past a threshold c on a continuous variable X
▶ Identification at the threshold possible with limits
▶ Inference with local linear regression
▶ Implementing and making RDD plots
▶ Fuzzy RDD (when the forcing variable increases the probability of treatment in a discontinuous way)
▶ Diagnosing RDD assumptions
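As a preview of the diagnostics (a sketch, not the formal procedure): one standard worry is that units sort themselves just past the cut-off, which would show up as a jump in the density of the running variable there. Here is a quick visual check with the Lee data; formal versions exist, e.g., McCrary's density test.

# Histogram of the running variable near the cut-off: bunching just above
# zero would suggest manipulation of the margin of victory.
ggplot(house, aes(x = x)) +
  geom_histogram(binwidth = 0.02, boundary = 0) +
  geom_vline(xintercept = 0, col = "red", lty = 2) +
  xlab("Democratic vote share margin of victory, Election t")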