DS-UA 201: Causal Inference: Regression and grouping

University Center for Data Science
August 8, 2022

Acknowledgement: Slides include material from DS-UA 201 Fall 2021 offered by .

So far we have studied selection-on-observables designs. These are settings in which:
▶ $Y_i(d) \perp\!\!\!\perp D_i \mid X_i$ …
▶ …and Xi is observed in full.
Starting today, we will look at settings in which Xi is not fully observed.
▶ We have seen that we often incur omitted variable bias in such settings
▶ However, we will see that if the unobserved confounders satisfy certain conditions, then causal inference is still possible!

What types of unobservables?
Today and in the next few lectures we will see that causal inference is still possible if there is unobserved confounding but:
▶ Unobservables are constant within groups. . .
▶ Unobservables are constant over time. . .
▶ The relationship between unobservables and the treatment assignment can be accounted for with some other variable . . .
▶ Treatment is assigned across an arbitrary threshold.
Today: Observations are grouped in such a way that unobserved confounders are constant within groups.
▶ We will also see how to use regression to estimate the ATE in this setting.

Recall: Regression
Last lecture we talked about linear regression: $Y_i = \alpha + X_i\beta + D_i\tau + \varepsilon_i$
▶ We saw how linear regression can be used as an imputation estimator for the ATE
▶ We also saw that the coefficient on Di can be interpreted as the ATE under certain conditions . . .
▶ . . . and what to do when we don’t think those conditions are met
▶ In all cases: OLS requires some strong assumptions!

Grouped observations
Suppose we are in this setting:
▶ We have our usual n units
▶ But they are grouped into G groups: $S_1, \dots, S_G$.
▶ There is an unobserved confounder, U
Key Assumption
Assume that U is constant within groups, that is, if $i, j \in S_g$, then $U_i = U_j = U_g$.

Grouped observations
We do not observe U
▶ We cannot condition on U explicitly
▶ But if we do not account for U then we will have OVB
Can we exploit the fact that U is constant within groups to solve
this problem?
▶ Intuitively…

What kinds of groups could we have?
There are many examples of grouped data that exist in real-world research:
▶ The same unit is observed multiple times over time
▶ Households, cities, villages, neighborhoods . . . (geographic groups)
▶ Days, weeks, years . . . (temporal groups)
▶ Social networks, clusters . . .

Stratified vs. Grouped data
In this class we will use the word “grouped” to denote data such that:
▶ The units can be divided into mutually exclusive groups
▶ Unobserved confounders are constant within groups
What are the differences between grouped and stratified data?
▶ Differences are almost exclusively conceptual
▶ In practice, methods for stratified data will work on grouped data
▶ The distinction is still useful to design and understand a study

Stratified vs. Grouped data
Stratified vs. grouped
In stratified data we observe the covariates that we stratify on, whereas in grouped data we do not.
▶ In the first case, we create strata so that observed confounders are constant within each stratum
▶ In the second case, we assume that confounders are constant within a group
▶ In stratified data we know the value of the observed confounder that is associated with a stratum
▶ In grouped data we do not know the value of the confounder

Grouped observations and linear models
Today we make an additional assumption, that our outcome model is linear:
$$Y_i = \alpha + D_i\beta + U_i + \varepsilon_i,$$
with $E[\varepsilon_i] = 0$.
▶ This enables us to estimate the ATE with OLS
▶ Grouped observations can also be used without this assumption, but other estimators are needed.
If we use our assumption that $U_i = U_g$ for all $i \in S_g$:
$$Y_i = \alpha + D_i\beta + \sum_{g=1}^{G} \mathbb{1}(i \in S_g)\,U_g + \varepsilon_i$$
Can we use this to get rid of U somehow?

“Within” estimator
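What happens if we average all the outcomes within one group?
$$\bar{Y}_g = \alpha + \beta\bar{D}_g + U_g + \bar{\varepsilon}_g,$$
where $\bar{D}_g = \frac{1}{N_g}\sum_{i \in S_g} D_i$ and $\bar{\varepsilon}_g = \frac{1}{N_g}\sum_{i \in S_g} \varepsilon_i$.
If we de-mean the units in group g with $\bar{Y}_g$ we get:
$$\begin{aligned}
\tilde{Y}_i &= Y_i - \bar{Y}_g \\
&= \alpha + \beta D_i + U_g + \varepsilon_i - \alpha - \beta\bar{D}_g - U_g - \bar{\varepsilon}_g \\
&= \beta(D_i - \bar{D}_g) + (\varepsilon_i - \bar{\varepsilon}_g) \\
&= \beta\tilde{D}_i + \tilde{\varepsilon}_i.
\end{aligned}$$
▶ Ug is gone!
▶ $E[\varepsilon_i - \bar{\varepsilon}_g] = 0$ by assumption
▶ This is still a linear regression!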

“Within” estimator: summary
To fit a within estimator to grouped data you must:
1. Compute the mean outcome Y ̄g and mean treatment D ̄g in each group g
2. For each unit, subtract the respective group means from outcome and treatment:
▶ $\tilde{Y}_i = Y_i - \bar{Y}_g$
▶ $\tilde{D}_i = D_i - \bar{D}_g$
3. Estimate the regression $\tilde{Y}_i = \alpha + \beta\tilde{D}_i + \tilde{\varepsilon}_i$
The resulting coefficient on $\tilde{D}_i$, $\hat{\beta}$, will be a consistent estimate of the ATE.
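The three steps above can be sketched in base R on simulated data. Everything in this block is an illustrative assumption (the number of groups, the true effect of 2, the way U drives treatment), and plain `lm()` is used rather than the course's `estimatr` functions so the sketch has no dependencies.

```r
# Simulated grouped data (all numbers are illustrative assumptions)
set.seed(1)
G <- 20; n_per <- 50
g <- rep(1:G, each = n_per)                 # group label of each unit
U <- rnorm(G)[g]                            # unobserved confounder, constant within group
D <- rbinom(G * n_per, 1, plogis(U))        # treatment probability depends on U
Y <- 1 + 2 * D + 3 * U + rnorm(G * n_per)   # true effect beta = 2

# Steps 1-2: subtract group means from outcome and treatment
Ytilde <- Y - ave(Y, g)
Dtilde <- D - ave(D, g)

# Step 3: regress the de-meaned outcome on the de-meaned treatment
fit_within  <- lm(Ytilde ~ Dtilde)
beta_within <- unname(coef(fit_within)["Dtilde"])

# For comparison: the naive regression that ignores U
beta_naive <- unname(coef(lm(Y ~ D))["D"])
```

In this simulation the naive coefficient absorbs the effect of U, while the within coefficient should land close to the assumed effect. The intercept of the de-meaned regression is zero up to rounding, because both variables have mean zero by construction.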

Fixed-effects regression
There is an alternative way to formulate and implement regression for grouped data
▶ Results are going to be equivalent to the “within” estimator
▶ Important to know because the two are used interchangeably both in science and industry
For each unit, create a new set of G binary variables $W_{i1}, \dots, W_{iG}$ such that $W_{ig} = \mathbb{1}(i \in S_g)$.
▶ For each unit, only one of these variables is 1 and all others are 0
Then we estimate the model:
$$Y_i = \alpha + \beta D_i + \sum_{g=1}^{G} \lambda_g W_{ig} + \varepsilon_i$$

Fixed-effects estimator
The fixed-effects model looks like this:
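$$Y_i = \alpha + \beta D_i + \sum_{g=1}^{G} \lambda_g W_{ig} + \varepsilon_i$$
▶ Essentially fitting a different intercept for each group: $Y_i = \beta D_i + \lambda_{g[i]} + \varepsilon_i$, where $g[i]$ is the group of unit i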
▶ Again, this is equivalent to the “within” estimator.
▶ Coefficients on each dummy variable (λg ) are often referred to as “fixed-effects”

Fixed-effects estimator: summary
To fit a fixed-effects estimator to grouped data you must:
1. Create G binary variables for each unit, such that $W_{ig} = \mathbb{1}(i \in S_g)$.
2. Estimate the regression: $Y_i = \alpha + \beta D_i + \sum_{g=1}^{G} \lambda_g W_{ig} + \varepsilon_i$
The resulting coefficient on D, $\hat{\beta}$, will be a consistent estimate of the ATE.
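A minimal sketch, on simulated data with assumed parameters, of the claim that the dummy-variable regression and the "within" estimator give the same coefficient on D. It uses base R `lm()`; `factor(g)` builds the group dummies and R drops one of them as a reference level, since the full set of G dummies is collinear with the intercept.

```r
# Simulated grouped data (illustrative assumptions throughout)
set.seed(2)
g <- rep(1:10, each = 30)
U <- rnorm(10)[g]                       # group-level unobserved confounder
D <- rbinom(300, 1, plogis(U))
Y <- 1 + 2 * D + 3 * U + rnorm(300)

# Fixed-effects regression: one dummy per group (minus a reference level)
beta_fe <- unname(coef(lm(Y ~ D + factor(g)))["D"])

# "Within" estimator: de-mean outcome and treatment by group, then regress
Yt <- Y - ave(Y, g)
Dt <- D - ave(D, g)
beta_within <- unname(coef(lm(Yt ~ Dt))["Dt"])

# The two coefficients agree up to floating-point error
all.equal(beta_fe, beta_within)
```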

Treatment effect heterogeneity
We’ve assumed so far that the TE is constant: β.
What if we have heterogeneous effects $\beta_i$?
Recall: last lecture we saw that if $Y_i = \alpha + \beta_i D_i + \varepsilon_i$, then $\hat{\beta} \to_p E[w_i\beta_i]$,
and this is not the ATE!
This problem still persists in fixed-effects regression, i.e., if $Y_i = \alpha + \beta_i D_i + \lambda_g + \varepsilon_i$, then $\hat{\beta}$ will still not be consistent for the ATE.
▶ Intuitively, this happens because Pr(Di = 1) is different for units in different groups!
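To make the intuition concrete, here is a sketch on simulated data (group sizes, treatment probabilities and group-level effects are all assumptions). With a binary treatment, the fixed-effects coefficient can be written as a weighted average of the group-level differences in means, with weights proportional to $N_g \hat{p}_g (1 - \hat{p}_g)$, where $\hat{p}_g$ is the treated share in group g. Groups with treated shares near 0 or 1 are therefore down-weighted relative to their size.

```r
set.seed(3)
g   <- rep(1:3, each = 100)
p   <- c(0.1, 0.5, 0.9)[g]      # treatment probability differs by group
tau <- c(1, 2, 6)[g]            # group-specific treatment effects (assumed)
D <- rbinom(300, 1, p)
Y <- tau * D + c(0, 5, 10)[g] + rnorm(300)

# Fixed-effects regression coefficient on D
beta_fe <- unname(coef(lm(Y ~ D + factor(g)))["D"])

# Group-level difference in means, treated share, and group size
tau_g <- sapply(1:3, function(k) mean(Y[g == k & D == 1]) - mean(Y[g == k & D == 0]))
p_g <- as.numeric(tapply(D, g, mean))
N_g <- as.numeric(table(g))

# FE regression weights versus size weights
w <- N_g * p_g * (1 - p_g)
beta_weighted <- sum(w * tau_g) / sum(w)      # reproduces beta_fe
ate_hat       <- sum(N_g * tau_g) / sum(N_g)  # size-weighted average of group effects
```

Here `beta_weighted` matches `beta_fe`, while `ate_hat` is a different number because it weights groups by size only.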

Treatment effect heterogeneity
Solution: Adapt the Lin estimator to this setting (Gibbons, Serrato, and Urbancic, 2019).
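▶ We create the G variables denoting group membership
▶ We compute the mean of each $W_{ig}$ across all data: $\bar{W}_g = \frac{1}{n}\sum_{i=1}^{n} W_{ig}$
▶ We de-mean each unit’s binary variable like we did with covariates: $\tilde{W}_{ig} = W_{ig} - \bar{W}_g$.
Then we fit the regression:
$$Y_i = \alpha + \beta D_i + \sum_{g=1}^{G} \lambda_g \tilde{W}_{ig} + \sum_{g=1}^{G} \omega_g \left(\tilde{W}_{ig} D_i\right) + \varepsilon_i$$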

Treatment effect heterogeneity
The regression:
$$Y_i = \alpha + \beta D_i + \sum_{g=1}^{G} \lambda_g \tilde{W}_{ig} + \sum_{g=1}^{G} \omega_g \left(\tilde{W}_{ig} D_i\right) + \varepsilon_i$$
▶ It’s the “Lin” estimator with the “dummy” variables as the covariates!
▶ Easily implemented in lm_lin() in estimatr.
This model will lead to consistent estimates of the ATE even when individual TEs vary across units.

Relationship with stratification
The regression:
$$Y_i = \alpha + \beta D_i + \sum_{g=1}^{G} \lambda_g \tilde{W}_{ig} + \sum_{g=1}^{G} \omega_g \left(\tilde{W}_{ig} D_i\right) + \varepsilon_i$$
is in effect the same as applying the stratification estimator to the strata defined by the groups.
▶ Estimate $\hat{\tau}(g)$ by taking the difference-in-means in each group
▶ Aggregate with a weighted average: $\hat{\tau}_{block} = \sum_{g=1}^{G} \hat{\tau}(g)\frac{N_g}{n}$
▶ $\hat{\beta}$ and $\hat{\tau}_{block}$ will have the same value
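The equality between the interacted regression and the stratification estimator can be checked numerically. The sketch below uses simulated data with assumed parameters and builds the centered dummies by hand in base R (one group is dropped as the reference level so the design matrix has full rank); it is an illustration of the idea, not the internals of `lm_lin()`.

```r
set.seed(4)
g <- rep(1:4, each = 50)
D <- rbinom(200, 1, c(0.2, 0.4, 0.6, 0.8)[g])
Y <- c(1, 2, 3, 4)[g] * D + g + rnorm(200)   # group-specific effects (assumed)

# Group dummies (reference level dropped), centered at their sample means
W  <- model.matrix(~ factor(g))[, -1]
Wc <- sweep(W, 2, colMeans(W))
WD <- Wc * D                                 # interactions of centered dummies with D

# Interacted regression: coefficient on D
beta_lin <- unname(coef(lm(Y ~ D + Wc + WD))["D"])

# Stratification estimator: size-weighted average of group differences in means
tau_g <- sapply(1:4, function(k) mean(Y[g == k & D == 1]) - mean(Y[g == k & D == 0]))
N_g <- as.numeric(table(g))
tau_block <- sum(tau_g * N_g) / length(Y)

all.equal(beta_lin, tau_block)
```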

Relationship with stratification
If de-meaned FE regression is the same as the stratification estimator then why use FE regression at all?
▶ FE regression requires stronger assumptions on Y than the stratification estimator!
▶ FE regression can give us precise variance estimates even when outcomes are correlated within groups
▶ In this case the Neyman variance estimator will not be consistent for the ATE variance
▶ FE regression can handle multiple groups at once

Fixed-effects regression variance
For simplicity define: Xi = [1,Di,Wi1,…,WiG]. Then: Yi =Xiγ+εi.
Recall: Estimating the variance of γˆ in OLS requires at least one assumption, that errors in the outcome are uncorrelated across units:
Cov(εi,εj|Xi,Xj) = 0,
However, sometimes it makes sense to believe that errors for units within the same group are correlated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid i, j \in S_g) \neq 0$
▶ For example, when groups are social networks and units are individuals
▶ In general reasonable to assume this when there is significant interaction between units

Fixed-effects regression variance
Recall: Estimating the variance of γˆ in OLS requires at least one assumption, that errors in the outcome are uncorrelated across units:
Cov(εi,εj|Xi,Xj) = 0,
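Under this assumption, we saw that a consistent estimator for the variance matrix of $\hat{\gamma}$ is:
$$\widehat{\mathrm{Var}}_{HC0}[\hat{\gamma}] = (X'X)^{-1}(X'SX)(X'X)^{-1},$$
where
$$S = \begin{bmatrix} \hat{\varepsilon}_1^2 & 0 & \cdots & 0 \\ 0 & \hat{\varepsilon}_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{\varepsilon}_n^2 \end{bmatrix}$$
and $\hat{\varepsilon}_i = Y_i - X_i\hat{\gamma}$.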
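The HC0 sandwich formula can be computed directly with matrix algebra. The following is a minimal base R sketch on simulated data (the sample size and the heteroskedastic error structure are assumptions), using a design matrix with only an intercept and the treatment for brevity.

```r
set.seed(5)
n <- 200
D <- rbinom(n, 1, 0.5)
Y <- 1 + 2 * D + rnorm(n, sd = 1 + D)      # error variance differs by treatment (assumed)

X <- cbind(1, D)                            # design matrix: intercept and treatment
bread <- solve(t(X) %*% X)                  # (X'X)^{-1}
gamma_hat <- bread %*% t(X) %*% Y           # OLS coefficients
e_hat <- as.vector(Y - X %*% gamma_hat)     # residuals

S <- diag(e_hat^2)                          # diagonal matrix of squared residuals
V_hc0 <- bread %*% t(X) %*% S %*% X %*% bread
se_hc0 <- sqrt(diag(V_hc0))
```

With a single binary regressor, the HC0 variance of the coefficient on D reduces to the sum of squared residuals among treated units divided by the squared number of treated units, plus the same quantity for control units, which gives a quick sanity check on the matrix computation.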

Fixed-effects regression variance
To account for correlation of errors within the same group, we change our assumption to
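$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid G_i \neq G_j) = 0.$$
That is, errors are uncorrelated only if units are in different groups.
Under this assumption, we can extend the previous variance estimator by defining:
$$\hat{\varepsilon}_{i,j} = \begin{cases} (Y_i - X_i\hat{\gamma})(Y_j - X_j\hat{\gamma}) & \text{if } G_i = G_j \\ 0 & \text{otherwise.} \end{cases}$$
Note that $\hat{\varepsilon}_{i,i} = \hat{\varepsilon}_i^2$.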

Fixed-effects regression variance
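Then we construct the matrix:
$$S_{grp} = \begin{bmatrix} \hat{\varepsilon}_1^2 & \hat{\varepsilon}_{1,2} & \cdots & \hat{\varepsilon}_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\varepsilon}_{n,1} & \hat{\varepsilon}_{n,2} & \cdots & \hat{\varepsilon}_n^2 \end{bmatrix},$$
and finally, we can just plug $S_{grp}$ into the previous variance estimator to obtain a cluster-robust estimator:
$$\widehat{\mathrm{Var}}_{grp}[\hat{\gamma}] = (X'X)^{-1}(X'S_{grp}X)(X'X)^{-1}.$$
▶ This will account for error correlation across units in the same group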
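A sketch of the cluster-robust sandwich in base R, again on simulated data with assumed parameters. It builds $S_{grp}$ literally as an n-by-n matrix, then recomputes the same quantity from per-group score sums, which avoids the n-by-n matrix. No finite-sample correction is applied here, so packaged routines that do apply one may report somewhat different standard errors.

```r
set.seed(6)
G <- 30; m <- 10
grp <- rep(1:G, each = m)
D <- rbinom(G * m, 1, 0.5)
Y <- 1 + 2 * D + rnorm(G)[grp] + rnorm(G * m)   # shared group shock: correlated errors

X <- cbind(1, D)
bread <- solve(t(X) %*% X)
e_hat <- as.vector(Y - X %*% (bread %*% t(X) %*% Y))

# S_grp[i, j] = e_i * e_j if i and j are in the same group, 0 otherwise
S_grp <- outer(e_hat, e_hat) * outer(grp, grp, "==")
V_grp <- bread %*% t(X) %*% S_grp %*% X %*% bread

# Equivalent form: sum over groups of (X_g' e_g)(X_g' e_g)'
scores <- rowsum(X * e_hat, grp)                # one row of summed scores per group
V_grp2 <- bread %*% crossprod(scores) %*% bread

se_grp <- sqrt(diag(V_grp))
```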

Multiple groups
Sometimes multiple groups are present within the same dataset
▶ For example, if the same units are observed at multiple times, one group is the unit and another group is the day of the observation
▶ The same unit is a member of multiple groups at once
▶ Mutually-exclusive strata would have only one observation in them!
▶ Simple stratification won’t work!

Many fixed effects
Fixed-effects regression can handle this setting easily.
▶ We simply include as many fixed effects as we have grouping variables:
$$Y_i = \beta D_i + \sum_{g=1}^{G} \lambda_g W_{ig} + \sum_{h=1}^{H} \delta_h V_{ih} + \varepsilon_i$$
▶ There are two grouping variables, g and h
▶ G is the total number of groups defined by g and H is the same for groups defined by h
▶ $W_{ig}$ is 1 if $i \in S_g$ and $V_{ih}$ is 1 if $i \in L_h$.
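A minimal sketch of a regression with two sets of fixed effects, on a simulated balanced panel (20 hypothetical units observed on 5 days; all parameters are assumptions). In this balanced case, where every unit appears exactly once per day, the coefficient on D also equals the one from a "double de-meaned" regression, which subtracts unit means and day means and adds back the grand mean.

```r
set.seed(7)
unit <- rep(1:20, times = 5)            # 20 units ...
day  <- rep(1:5, each = 20)             # ... each observed once on each of 5 days
a <- rnorm(20)[unit]                    # unit-level unobserved confounder
b <- rnorm(5)[day]                      # day-level unobserved confounder
D <- rbinom(100, 1, plogis(a + b))
Y <- 2 * D + 3 * a + 3 * b + rnorm(100)

# Two sets of fixed effects: one per unit, one per day
fit <- lm(Y ~ D + factor(unit) + factor(day))
beta_twfe <- unname(coef(fit)["D"])

# Balanced-panel equivalent: double de-meaning
dd <- function(v) v - ave(v, unit) - ave(v, day) + mean(v)
Ydd <- dd(Y)
Ddd <- dd(D)
beta_dd <- unname(coef(lm(Ydd ~ Ddd))["Ddd"])

all.equal(beta_twfe, beta_dd)
```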

Many fixed effects
Fixed-effects regression can handle multiple groups easily…
…but we have to be careful:
▶ Intuitively, FE regression solves the problem of unique groups having only one member by extrapolating from the observations
▶ Units in different groups contribute to estimates for units in other groups
▶ If we believe there is error correlation within multiple groups, variance estimation is a problem
▶ Many strong modeling assumptions!

The BIMAS program
Dataset of 1027 observations of rice farms in Indonesia
▶ Question: does an intensive rice production program increase rice output?
▶ Treatment: is the farm taking part in the program?
▶ Outcome: Gross rice output in kilograms
We will group on the geographic region of the farm
▶ Some regions are inherently more favorable to rice production
▶ Because of weather, economic conditions…

Data loading and Naive estimator
library(estimatr)
library(plm)
data(RiceFarms)
# Define treatment, outcome and grouping variable
D = ifelse(RiceFarms$bimas == "no", 0, 1)
Y = RiceFarms$goutput
S = RiceFarms$region
# Naive estimate
lm_robust(Y ~ D)
▶ Estimate: 770.5
▶ SE: 164.2
▶ 95% CI: [448.2, 1092.8]

“within” estimator
# De-mean outcome and treatment within each region
Ytilde = rep(NA, nrow(RiceFarms))
Dtilde = rep(NA, nrow(RiceFarms))
for (chk in unique(S)) {
  ss = which(S == chk)
  Ytilde[ss] = Y[ss] - mean(Y[ss])
  Dtilde[ss] = D[ss] - mean(D[ss])
}
lm_robust(Ytilde ~ Dtilde)
▶ Estimate: 632.2
▶ SE: 167.5
▶ 95% CI: [303.5, 961.0]

FE estimator
# FE estimator
lm_robust(Y ~ D + factor(S))
▶ Estimate: 632.2
▶ SE: 167.5
▶ 95% CI: [303.5, 961.0]

Lin estimator
# Lin estimator
lm_lin(Y ~ D, covariates = ~ S)
▶ Estimate: 539.6
▶ SE: 150.9
▶ 95% CI: [243.4, 835.7]

Allowing for in-group correlations
# Robust variance
lm_robust(Y ~ D + factor(S), clusters = S)
▶ Estimate: 632.2
▶ SE: 258.9
▶ 95% CI: [-81.7, 1346.3]

Multiple groupings
Now we also group by the variety of rice produced by the farm
▶ traditional, high-yield, and mixed rice varieties
# Multiple FEs
L = RiceFarms$varieties
lm_robust(Y ~ D + factor(S) + factor(L))
▶ Estimate: 665.7
▶ SE: 163.3
▶ 95% CI: [345.2, 986.3]

Today we have introduced linear regression for grouped data
▶ When there are unobserved confounders that are constant
within groups, then we can still estimate the ATE
▶ Fixed effects or “within” estimators extend regression to these settings
▶ The Lin estimator still allows for valid OLS with TE heterogeneity
▶ Variance of FE regression can account for correlation inside groups
▶ FE regression can handle multiple groups
