MFIN6201 Lecture 7
Panel Data
Leo Liu
April 7, 2021
Outline
• Panel Data: What and Why
• Panel Data with Two Time Periods
• Fixed Effects Regression
• Regression with Time Fixed Effects
• Standard Errors for Fixed Effects Regression
• Application to Drunk Driving and Traffic Safety
Panel Data: What and Why
A panel dataset contains observations on multiple entities (individuals, states, companies…), where each entity is observed at two or more points in time. Hypothetical examples:
• Data on 420 California school districts in 1999 and again in 2000, for 840 observations total.
• Data on 50 U.S. states, each observed in 3 years, for a total of 150 observations.
• Data on 1000 individuals, in four different months, for 4000 observations total.
Notation for panel data
A double subscript distinguishes entities (states) and time periods (years)
i = entity (state), n = number of entities, so i = 1, …, n
t = time period (year), T = number of time periods so t = 1, …, T
Data: Suppose we have 1 regressor. The data are: (Xit, Yit), i = 1, …, n, t = 1, …, T
Panel data notation, ctd
Panel data with k regressors:
(X1it, X2it, …, Xkit, Yit), i = 1, …, n, t = 1, …, T
n = number of entities (states)
T = number of time periods (years)
Some jargon…
• Another term for panel data is longitudinal data
• balanced panel: no missing observations, that is, all variables are observed for all entities (states) and all time periods (years)
Why are panel data useful?
With panel data we can control for factors that:
• Vary across entities but do not vary over time
– Could cause omitted variable bias if they are omitted
– Are unobserved or unmeasured and therefore cannot be included in the regression using multiple regression
Here’s the key idea:
If an omitted variable does not change over time, then any changes in Y over time cannot be caused by the omitted variable.
Suppose we study the effects of education on salary, and we want to control for the effects of gender. We are sure that the change in salary over time cannot be driven by the change of gender because gender does not vary over time.
Example of a panel data set:
Traffic deaths and alcohol taxes
Observational unit: a year in a U.S. state
• 48 U.S. states, so n = number of entities = 48
• 7 years (1982,…, 1988), so T = number of time periods = 7
• Balanced panel, so total number of observations = 7 × 48 = 336
Variables:
• Traffic fatality rate (number of traffic deaths in that state in that year, per 10,000 state residents)
• Tax on a case of beer
• Other (legal driving age, drunk driving laws, etc.)
U.S. traffic death data for 1982:
Higher alcohol taxes, more traffic deaths?
Does it make economic sense?
Is this result likely to be driven by confounding factors?
Why might there be more traffic deaths in states that have higher alcohol taxes?
Other factors that determine traffic fatality rate:
• Quality (age) of automobiles
• Quality of roads
• “Culture” around drinking and driving
• Density of cars on the road
Omitted variables (OV) could cause omitted variable bias (OVB)
Example 1: traffic density. Suppose:
• High traffic density means more traffic deaths
• (Western) states with lower traffic density have lower alcohol taxes
• In sum, an OV causes OVB if it is a determinant of Y and is correlated with X.
• Then the two conditions for omitted variable bias are satisfied. Specifically, “high taxes” could reflect “high traffic density” (so the OLS coefficient would be biased positively – high taxes, more deaths)
• Panel data allow us to eliminate omitted variable bias when the omitted variables are constant over time within a given state.
Example 2: Cultural attitudes towards drinking and driving:
1. arguably are a determinant of traffic deaths; and
2. potentially are correlated with the beer tax (why would the state set the high beer tax in the first place?).
• Then the two conditions for omitted variable bias are satisfied. Specifically, “high taxes” could pick up the effect of “cultural attitudes towards drinking”, so the OLS coefficient would be biased.
• Panel data allow us to eliminate omitted variable bias when the omitted variables are constant over time within a given state.
Panel Data with Two Time Periods
Consider the panel data model,
FatalityRateit = β0 + β1BeerTaxit + β2Zi + uit
Zi is a factor that does not change over time (e.g. density, culture), at least during the years on which we have data, and it correlates with BeerTax and FatalityRate.
• Suppose Zi is not observed. Its omission could result in omitted variable bias, because what you estimate is
FatalityRateit = β0 + β1BeerTaxit + εit
where εit = β2Zi + uit, and Cov(BeerTaxit, εit) ≠ 0
• The effect of Zi can be eliminated using T = 2 years.
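The size of the bias from omitting Zi can be made explicit. The following is a short derivation sketch, not taken from the slides: it assumes in addition that Cov(BeerTaxit, uit) = 0, treats the data as a single cross-section, and uses probability-limit notation introduced here.

```latex
\hat{\beta}_1 \;\xrightarrow{\;p\;}\;
\frac{\operatorname{Cov}(BeerTax_{it},\, FatalityRate_{it})}{\operatorname{Var}(BeerTax_{it})}
= \beta_1 + \beta_2\,\frac{\operatorname{Cov}(BeerTax_{it},\, Z_i)}{\operatorname{Var}(BeerTax_{it})}
```

The second term is nonzero exactly when Zi is a determinant of the fatality rate (β2 ≠ 0) and is correlated with the beer tax, which are the two conditions for omitted variable bias.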
The key idea
Any change in the fatality rate from 1982 to 1988 cannot be caused by Zi, because Zi (by assumption) does not change between 1982 and 1988.
The math: consider fatality rates in 1988 and 1982:
FatalityRatei1988 = β0 + β1BeerTaxi1988 + β2Zi + ui1988
FatalityRatei1982 = β0 + β1BeerTaxi1982 + β2Zi + ui1982
Suppose E(uit|BeerTaxit,Zi) = 0. Subtracting 1988 – 1982 (that is, calculating the change), eliminates the effect of Zi…
FatalityRatei1988 = β0 + β1BeerTaxi1988 + β2Zi + ui1988
FatalityRatei1982 = β0 + β1BeerTaxi1982 + β2Zi + ui1982
So,
FatalityRatei1988 − FatalityRatei1982 = β1(BeerTaxi1988 − BeerTaxi1982) + (ui1988 − ui1982)
• The new error term, (ui1988 − ui1982), is uncorrelated with either BeerTaxi1988 or BeerTaxi1982.
• This “difference” equation can be estimated by OLS , even though Zi is not observed.
• The omitted variable Zi doesn’t change, so it cannot be a determinant of the change in Y.
• The differences regression does not have an intercept – it was eliminated by the subtraction step.
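The differencing argument above can be checked on simulated data. This is a minimal sketch, assuming a made-up data-generating process (true β1 = −1, β2 = 2, and a Zi that is correlated with the regressor); the variable names and numbers are hypothetical and are not the lecture's traffic data. The differences regression here is run with an intercept, which leaves the slope estimate consistent.

```python
import random

random.seed(0)

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical simulation: n entities observed in two periods.
n, beta0, beta1, beta2 = 500, 2.0, -1.0, 2.0
z = [random.gauss(0, 1) for _ in range(n)]            # unobserved, time-invariant Z_i
x1 = [0.8 * zi + random.gauss(0, 1) for zi in z]      # period-1 regressor, correlated with Z_i
x2 = [0.8 * zi + random.gauss(0, 1) for zi in z]      # period-2 regressor, correlated with Z_i
y1 = [beta0 + beta1 * a + beta2 * zi + random.gauss(0, 0.5) for a, zi in zip(x1, z)]
y2 = [beta0 + beta1 * a + beta2 * zi + random.gauss(0, 0.5) for a, zi in zip(x2, z)]

# A single-period regression omits Z_i and is biased.
slope_cross = ols_slope(x2, y2)

# Differencing removes beta0 and beta2 * Z_i before estimation.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
slope_diff = ols_slope(dx, dy)

print(f"cross-section slope: {slope_cross:.2f}")
print(f"differences slope:   {slope_diff:.2f}")
```

In this design the cross-section slope is pulled towards zero by the omitted Zi, while the differences slope stays close to the true value of −1.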
Example: Traffic deaths and beer taxes
1982 data:
FatalityRate = 2.01 + 0.15BeerTax (n = 48)
(0.15) (0.13)
1988 data:
FatalityRate = 1.86 + 0.44BeerTax (n = 48)
(0.11) (0.13)
Difference regression (n = 48):
FR1988 − FR1982 = −0.072 − 1.04(BeerTax1988 − BeerTax1982)
(0.065) (0.36)
Including an intercept in this differences regression allows the mean change in FR to be nonzero – more on this later…
∆FatalityRate v. ∆BeerTax
Note that the intercept is nearly zero
Fixed Effects Regression
What if you have more than 2 time periods (T > 2)?
Yit = β0 + β1Xit + β2Zi + uit, i = 1, …, n, t = 1, …, T
We can rewrite this in three ways:
1. Take differences between time periods, such as between periods 1 and 2 and between periods 2 and 3.
2. “n-1 binary regressor” regression model
3. “Fixed Effects” regression model
Methods 2 and 3 give identical estimates, and method 1 coincides with them when T = 2. In terms of computation, 3 is more efficient than 2, which is more efficient than 1. For T > 2, we normally do not use 1, because it has no advantages over the other two.
Therefore, let’s discuss the other two.
Suppose we have n = 3 states: California, Texas, and Massachusetts.
Yit = β0 + β1Xit + β2Zi + uit, i = 1, …, n, t = 1, …, T
Population regression for California (that is, i = CA):
YCA,t = β0 + β1XCA,t + β2ZCA + uCA,t = (β0 + β2ZCA) + β1XCA,t + uCA,t
Or, the regression model becomes
YCA,t = αCA + β1XCA,t + uCA,t
• αCA = β0 + β2ZCA does not change over time
• αCA is the intercept for CA, and β1 is the slope
• The intercept is unique to CA, but the slope is the same in all states: parallel lines
For TX:
YTX,t = β0 + β1XTX,t + β2ZTX + uTX,t = (β0 + β2ZTX) + β1XTX,t + uTX,t
or
YTX,t = αTX + β1XTX,t + uTX,t
where αTX = β0 + β2ZTX.
Collecting the lines for all three states:
YCA,t = αCA + β1XCA,t + uCA,t
YTX,t = αTX + β1XTX,t + uTX,t
YMA,t = αMA + β1XMA,t + uMA,t
or
Yit = αi + β1Xit + uit, i = CA, TX, MA, t = 1, …, T
The regression lines for each state in a picture
Recall that shifts in the intercept can be represented using binary regressors…
The regression lines for each state in a picture
In binary regressor form:
Yit = β0 + γCADummyCAi + γTX DummyTXi + β1Xit + uit
where DummyCAi = 1 if state is CA, = 0 otherwise; DummyTXi = 1 if state is TX, = 0 otherwise. Note that we leave out DummyMAi (why?)
n-1 Binary regressors
Now, we can formally write down this type of fixed effects regression:
Yit = β0 + β1Xit + γ2D2i + … + γnDni + uit (1)
where D2i = 1 for i = 2 (state 2), else D2i = 0
• First create the binary variables D2i, …, Dni
• Then estimate (1) by OLS
• Inference (hypothesis tests, confidence intervals) is as usual (using heteroskedasticity-robust standard errors)
• This is impractical when n is very large (for example if n = 1000 workers) (this is essentially a degree of freedom problem)
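The binary-regressor recipe above can be sketched in plain Python. This is an illustration under stated assumptions: the panel is a made-up, noise-free example with three entities whose intercepts are 1, 4 and −2 and whose common slope is 2, and the helper names `solve` and `ols` are hypothetical, written here only to avoid external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical noise-free panel: 3 entities, 3 periods, common slope 2.
alphas = [1.0, 4.0, -2.0]                       # entity intercepts
x = [[1.0, 2.0, 4.0], [0.0, 3.0, 5.0], [2.0, 2.5, 6.0]]

rows, y = [], []
for i, alpha in enumerate(alphas):
    for xit in x[i]:
        d2 = 1.0 if i == 1 else 0.0             # D2 = 1 for entity 2
        d3 = 1.0 if i == 2 else 0.0             # D3 = 1 for entity 3
        rows.append([1.0, xit, d2, d3])         # entity 1 is the omitted category
        y.append(alpha + 2.0 * xit)

b0, b1, g2, g3 = ols(rows, y)
print(b0, b1, g2, g3)
```

The intercept recovers the omitted entity's intercept, and each γ is the difference between that entity's intercept and the omitted one. With n entities the design matrix has n + 1 columns, which is why this route becomes unwieldy for large n.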
The third way: Entity-demeaned OLS regression
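(by the way, “demean” means “subtract the mean”)
The fixed effects regression model:
$$Y_{it} = \beta_0 + \beta_1 X_{it} + Z_i + u_{it} \qquad (1)$$
Taking the average of both sides over t for each i gives:
$$\frac{1}{T}\sum_{t=1}^{T} Y_{it} = \beta_0 + \beta_1 \frac{1}{T}\sum_{t=1}^{T} X_{it} + Z_i + \frac{1}{T}\sum_{t=1}^{T} u_{it} \qquad (2)$$
Eqn (1) – Eqn (2):
$$Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it} = \beta_1\left(X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it}\right) + \left(u_{it} - \frac{1}{T}\sum_{t=1}^{T} u_{it}\right) \qquad (3)$$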
Entity-demeaned OLS regression, ctd.
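$$Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it} = \beta_1\left(X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it}\right) + \left(u_{it} - \frac{1}{T}\sum_{t=1}^{T} u_{it}\right)$$
or
$$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$$
where $\tilde{Y}_{it} = Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it}$ and $\tilde{X}_{it} = X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it}$
• $\tilde{Y}_{it}$ and $\tilde{X}_{it}$ are “entity-demeaned” data
• For i = 1 and t = 1982, $\tilde{Y}_{it}$ is the difference between the fatality rate in Alabama in 1982 and its average value in Alabama over all 7 years.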
Entity-demeaned OLS regression, ctd.
$$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$$
• First construct the entity-demeaned variables $\tilde{Y}_{it}$ and $\tilde{X}_{it}$
• Then estimate the above equation by regressing $\tilde{Y}_{it}$ on $\tilde{X}_{it}$ using OLS
• This is like the “changes” approach, but Yit is deviated from the state average instead of from Yi1.
• Standard errors need to be computed in a way that accounts for the panel nature of the data set (more later)
• This can be done in a single command in STATA
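The two steps above (demean within each entity, then run OLS on the demeaned data) can be sketched as follows. The six-observation panel is hypothetical and constructed so that the within-entity slope is exactly −1 while the entity with larger x also has the larger intercept; it is not the lecture's data.

```python
from collections import defaultdict

# Hypothetical panel of (entity, x, y). Within each entity y = alpha_i - x,
# with alpha_A = 10 and alpha_B = 16.
panel = [("A", 1.0, 9.0), ("A", 2.0, 8.0), ("A", 3.0, 7.0),
         ("B", 4.0, 12.0), ("B", 5.0, 11.0), ("B", 6.0, 10.0)]

def demean(values):
    """Subtract the mean of the list from each element."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def slope_through_origin(x, y):
    """OLS slope of y on x without an intercept (inputs are already demeaned)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Pooled OLS ignores the entity structure: demean by the overall means only.
pooled = slope_through_origin(demean([r[1] for r in panel]),
                              demean([r[2] for r in panel]))

# Entity demeaning: subtract each entity's own time average.
by_entity = defaultdict(list)
for entity, x, y in panel:
    by_entity[entity].append((x, y))

x_tilde, y_tilde = [], []
for obs in by_entity.values():
    x_tilde += demean([x for x, _ in obs])
    y_tilde += demean([y for _, y in obs])

fixed_effects = slope_through_origin(x_tilde, y_tilde)
print(f"pooled slope: {pooled:.3f}, fixed effects slope: {fixed_effects:.3f}")
```

The pooled slope is positive while the entity-demeaned slope is −1, the same kind of sign reversal that the beer tax example shows once state fixed effects are included.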
Fixed Effects Regression: Estimation
Three estimation methods:
• “Changes” specification, without an intercept (only works for T = 2) (method 1)
• “n-1 binary regressors” OLS regression (method 2)
• “Entity-demeaned” OLS regression (method 3)
• These three methods produce identical estimates of the regression coefficients, and identical standard errors.
• We already did the “changes” specification (1988 minus 1982) – but this only works for T = 2 years; it is not efficient for T > 2 years
• Methods 2 and 3 work for general T
• Method 2 is only practical when n isn’t too big
Example: Traffic deaths and beer taxes in STATA
First let STATA know you are working with panel data by defining the entity variable (state) and time variable (year).
Example, ctd.
For n = 48, T = 7:
FatalityRate = −0.66BeerTax + State fixed effects
(0.29)
• Should you report the intercept?
• How many binary regressors would you include to estimate this using the “binary regressor” method?
• Compare slope, standard error to the estimate for the 1988 v. 1982 “changes” specification (T = 2, n = 48) (note that this includes an intercept – return to this below):
FR1988 − FR1982 = −0.072 − 1.04(BeerTax1988 − BeerTax1982)
(0.065) (0.36)
Regression with Time Fixed Effects
An omitted variable might vary over time but not across states:
• Safer cars (air bags, etc.); changes in national laws
• These produce intercepts that change over time
• Let St denote the combined effect of variables which change over time but not across states (safer cars, national laws, GDP, unemployment rate, etc.).
• The resulting population regression model is:
Yit = β0 + β1Xit + β2Zi + β3St + uit
Time fixed effects only
Yit = β0 + β1Xit + β3St + uit
This model can be recast as having an intercept that varies from one year to the next:
Yi,1982 = β0 + β1Xi,1982 + β3S1982 + ui,1982
= (β0 + β3S1982) + β1Xi,1982 + ui,1982
= λ1982 + β1Xi,1982 + ui,1982
where λ1982 = β0 + β3S1982. Similarly,
Yi,1983 = λ1983 + β1Xi,1983 + ui,1983
where λ1983 = β0 + β3S1983, etc.
Time fixed effects: estimation methods
Let 1, 2, 3, …, T denote the time periods (e.g. 1982, 1983, 1984, …) and let B denote the binary variables.
• “T-1 binary regressor” OLS regression
Yit = β0 + β1Xit + γ2B2t + … + γTBTt + uit
– Create binary variables B2, …, BT
– B2 = 1 if t = year 2, = 0 otherwise
– Regress Y on X, B2, …, BT using OLS
– Where’s B1?
• “Year-demeaned” OLS regression
– Deviate Yit, Xit from year (not state) averages
– Estimate by OLS using “year-demeaned” data
Estimation with both entity and time fixed effects
Yit = αi + λt + β1Xit + uit
where αi + λt = β0 + β2Zi + β3St
• When T = 2, computing the first difference and including an intercept is equivalent to (gives exactly the same regression as) including entity and time fixed effects.
• When T > 2, there are various equivalent ways to incorporate both entity and time fixed effects:
– entity demeaning and T – 1 time indicators (this is done in the following STATA example)
– time demeaning and n – 1 entity indicators
– T – 1 time indicators and n – 1 entity indicators
– entity and time demeaning
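The last option in the list, entity and time demeaning, can be sketched as follows for a balanced panel. This is a hedged illustration: the data are made up and noise-free, the slope value −0.5 is arbitrary, and the helper `two_way_demean` is a hypothetical name. The simple "subtract both means, add back the grand mean" transformation used here relies on the panel being balanced.

```python
# Hypothetical balanced panel: y[i][t] = alpha[i] + lam[t] + beta * x[i][t], no noise.
alpha = [1.0, 4.0, -2.0]       # entity effects
lam = [0.0, 0.5, -1.0]         # time effects
beta = -0.5                    # slope to recover (arbitrary illustrative value)
x = [[1.0, 2.0, 4.0],
     [0.0, 3.0, 5.0],
     [2.0, 2.5, 7.0]]
n, T = len(x), len(x[0])
y = [[alpha[i] + lam[t] + beta * x[i][t] for t in range(T)] for i in range(n)]

def two_way_demean(v):
    """Subtract entity means and time means, then add back the grand mean."""
    row_mean = [sum(row) / T for row in v]
    col_mean = [sum(v[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(row_mean) / n
    return [[v[i][t] - row_mean[i] - col_mean[t] + grand for t in range(T)]
            for i in range(n)]

xt, yt = two_way_demean(x), two_way_demean(y)

# OLS without an intercept on the doubly demeaned data.
num = sum(xt[i][t] * yt[i][t] for i in range(n) for t in range(T))
den = sum(xt[i][t] ** 2 for i in range(n) for t in range(T))
beta_hat = num / den
print(round(beta_hat, 6))
```

Both αi and λt drop out of the transformed variables, so the slope is recovered without estimating any of the n + T − 1 intercept shifts.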
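OLS Estimated Coefficient
$$\begin{aligned}
\tilde{\beta}_{ols} &= \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}Y_{it}}{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2} \\
&= \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}(X_{it}\beta + \varepsilon_{it})}{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2} \\
&= \beta + \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}\varepsilon_{it}}{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2}
\end{aligned}$$
* Without loss of generality, assume zero mean for Xit and Yit.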
LS Assumptions for Panel Data
Here we consider the case of entity fixed effects. Time fixed effects can simply be included as additional binary regressors. Consider a single X:
Yit =αi +β1Xit +uit
1. E(uit | Xi1, …, XiT, αi) = 0.
2. (Xi1, …, XiT, ui1, …, uiT), i = 1, …, n, are i.i.d. draws from their joint distribution.
3. (Xit, uit) have finite fourth moments.
4. There is no perfect multicollinearity (multiple X’s)
Assumptions 3 and 4 are the same as least squares assumptions 3 and 4 for multiple regression. Assumptions 1 and 2 differ.
Assumptions
The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression
Under a panel data version of the least squares assumptions, the OLS fixed effects estimator of β1 is asymptotically normally distributed. However, a new standard error formula needs to be introduced: the “clustered” standard error formula. This new formula is needed because observations for the same entity are not independent (it’s the same entity!), even though observations across entities are independent if entities are drawn by simple random sampling.
Independence and autocorrelation in Panel data
• If entities are sampled by simple random sampling, then (ui1, …, uiT) is independent of (uj1, …, ujT) for different entities i ≠ j.
• But if the omitted factors comprising uit are serially correlated, then uit is serially correlated.
Under the LS assumptions for panel data:
• The OLS fixed effect estimator βˆ1 is unbiased, consistent, and asymptotically normally distributed
• However, the usual OLS standard errors (both homoskedasticity-only and heteroskedasticity-robust) will in general be wrong because they assume that uit is serially uncorrelated.
– This problem is solved by using “clustered” standard errors.
Clustered Standard Errors
• Clustered standard errors estimate the variance of βˆ1when the variables are i.i.d. across entities but are potentially autocorrelated within an entity.
• Clustered SEs are easiest to understand if we first consider the simpler problem of estimating the mean of Y using panel data…
Clustered SEs for the mean estimated using panel data
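Let’s first derive the clustered SEs in the simplest case:
$$Y_{it} = \mu + u_{it}, \quad i = 1, \ldots, n, \; t = 1, \ldots, T$$
The estimator of the mean μ is $\bar{Y} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} Y_{it}$.
It is useful to write $\bar{Y}$ as the average across entities of the mean value for each entity:
$$\bar{Y} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} Y_{it} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{T}\sum_{t=1}^{T} Y_{it}\right) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y}_i$$
where $\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it}$ is the sample mean for entity i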
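Because observations are i.i.d. across entities, $(\bar{Y}_1, \ldots, \bar{Y}_n)$ are i.i.d. Thus, if n is large, the CLT applies and
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} \bar{Y}_i \to N\left(\mu, \frac{\sigma^2_{\bar{Y}_i}}{n}\right)$$
• The SE of $\bar{Y}$ is the square root of an estimator of $\sigma^2_{\bar{Y}_i}/n$.
• The natural estimator of $\sigma^2_{\bar{Y}_i}$ is the sample variance of $\bar{Y}_i$, $S^2_{\bar{Y}_i}$.
• This delivers the clustered standard error formula for $\bar{Y}$ computed using panel data:
$$\text{Clustered SE of } \bar{Y} = \sqrt{\frac{S^2_{\bar{Y}_i}}{n}}, \quad \text{where } S^2_{\bar{Y}_i} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{Y}_i - \bar{Y}\right)^2$$
* If you do not use clustered SEs, the pooled standard error is based on
$$S_Y^2 = \frac{1}{nT-1}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(Y_{it} - \bar{Y}\right)^2$$
which is incorrect…! (why?)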
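The two formulas above can be compared numerically. This is a minimal sketch on a made-up panel of 4 entities and 3 periods in which observations within an entity are close to each other (strong positive correlation over time); the numbers are hypothetical. The pooled SE is computed as the square root of the pooled sample variance divided by nT.

```python
import math

# Hypothetical Y values: rows are entities, columns are time periods.
data = [[1.0, 1.2, 0.9],
        [2.1, 2.0, 2.3],
        [2.9, 3.1, 3.0],
        [4.0, 3.8, 4.1]]
n, T = len(data), len(data[0])

entity_means = [sum(row) / T for row in data]
grand_mean = sum(entity_means) / n      # equals the mean of all nT values in a balanced panel

# Clustered SE: treat the n entity means as the i.i.d. data.
s2_entity = sum((m - grand_mean) ** 2 for m in entity_means) / (n - 1)
clustered_se = math.sqrt(s2_entity / n)

# Pooled SE: treat all nT observations as if they were i.i.d.
s2_pooled = sum((y - grand_mean) ** 2 for row in data for y in row) / (n * T - 1)
pooled_se = math.sqrt(s2_pooled / (n * T))

print(f"clustered SE = {clustered_se:.3f}, pooled SE = {pooled_se:.3f}")
```

In this example the pooled SE is roughly half the clustered SE: it counts 12 observations as independent pieces of information when there are really only 4 independent entities.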
What’s special about clustered SEs?
• Not much, really – the previous derivation is the same as was used in Ch. 3 to derive the SE of the sample average, except that here the “data” are the i.i.d. entity averages $(\bar{Y}_1, \ldots, \bar{Y}_n)$ instead of a single i.i.d. observation for each entity.
• But in fact there is one key feature: in the cluster SE derivation we never assumed that observations are i.i.d. within an entity. Thus we have implicitly allowed for serial correlation within an entity.
• What happened to that serial correlation – where did it go? It determines $\sigma^2_{\bar{Y}_i}$, the variance of $\bar{Y}_i$…
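Serial correlation in $Y_{it}$ enters $\sigma^2_{\bar{Y}_i}$
$$\begin{aligned}
\sigma^2_{\bar{Y}_i} &= \operatorname{var}(\bar{Y}_i) = \operatorname{var}\left(\frac{1}{T}\sum_{t=1}^{T} Y_{it}\right) = \frac{1}{T^2}\operatorname{var}(Y_{i1} + Y_{i2} + \ldots + Y_{iT}) \\
&= \frac{1}{T^2}\Big\{\operatorname{var}(Y_{i1}) + \operatorname{var}(Y_{i2}) + \ldots + \operatorname{var}(Y_{iT}) \\
&\qquad\quad + 2\operatorname{cov}(Y_{i1}, Y_{i2}) + 2\operatorname{cov}(Y_{i1}, Y_{i3}) + \ldots + 2\operatorname{cov}(Y_{iT-1}, Y_{iT})\Big\}
\end{aligned}$$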
• If Yit is serially uncorrelated, all the autocovariances = 0 and we have the usual (Ch. 3) derivation.
• If these autocovariances are nonzero, the usual formula (which sets them to 0) will be wrong.
• If these autocovariances are positive, the usual formula will understate the variance of Y i .
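A worked special case makes the last bullet concrete. This illustration is added here under extra assumptions that are not in the slides: T = 2, the two observations share a common variance $\sigma_Y^2$, and their correlation is $\rho$.

```latex
\operatorname{var}(\bar{Y}_i)
  = \frac{1}{4}\left[\operatorname{var}(Y_{i1}) + \operatorname{var}(Y_{i2})
      + 2\operatorname{cov}(Y_{i1}, Y_{i2})\right]
  = \frac{\sigma_Y^2}{2}\,(1 + \rho)
```

With ρ = 0 this is the usual $\sigma_Y^2/2$. With ρ > 0 the variance is larger, and in the limit ρ = 1 the second observation adds no information, so the variance of the entity average equals $\sigma_Y^2$.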
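The “magic” of clustered SEs is that, by working at the level of the entities and their averages $\bar{Y}_i$, you never need to worry about estimating any of the underlying autocovariances – they are in effect estimated automatically by the cluster SE formula.
Here’s the math: Clustered SE of $\bar{Y} = \sqrt{S^2_{\bar{Y}_i}/n}$, where
$$\begin{aligned}
S^2_{\bar{Y}_i} &= \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{Y}_i - \bar{Y}\right)^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{1}{T}\sum_{t=1}^{T} Y_{it} - \bar{Y}\right)^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n}\left[\frac{1}{T}\sum_{t=1}^{T}\left(Y_{it} - \bar{Y}\right)\right]\left[\frac{1}{T}\sum_{s=1}^{T}\left(Y_{is} - \bar{Y}\right)\right] \\
&= \frac{1}{n-1}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\left(Y_{it} - \bar{Y}\right)\left(Y_{is} - \bar{Y}\right) \\
&= \frac{1}{T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(Y_{it} - \bar{Y}\right)\left(Y_{is} - \bar{Y}\right)\right]
\end{aligned}$$
• The final term in brackets, $\frac{1}{n-1}\sum_{i=1}^{n}(Y_{it} - \bar{Y})(Y_{is} - \bar{Y})$, estimates the autocovariance between Yis and Yit. Thus the clustered SE formula implicitly is estimating all the autocovariances, then using them to estimate $\sigma^2_{\bar{Y}_i}$!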
• In contrast, the “usual” SE formula zeros out these autocovariances by omitting all the cross terms – which is only valid if those autocovariances are all zero.
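The algebra above can be verified numerically. This is a small sketch on a made-up 4-entity, 3-period panel (hypothetical numbers): it computes the sample variance of the entity means directly, rebuilds it from the estimated autocovariances, and then shows what is left when the cross terms (t ≠ s) are dropped.

```python
# Hypothetical Y values: rows are entities, columns are time periods.
data = [[1.0, 1.2, 0.9],
        [2.1, 2.0, 2.3],
        [2.9, 3.1, 3.0],
        [4.0, 3.8, 4.1]]
n, T = len(data), len(data[0])
grand = sum(sum(row) for row in data) / (n * T)

# Direct route: sample variance of the entity means.
s2_direct = sum((sum(row) / T - grand) ** 2 for row in data) / (n - 1)

def autocov(t, s):
    """Estimated covariance between periods t and s, taken across entities."""
    return sum((row[t] - grand) * (row[s] - grand) for row in data) / (n - 1)

# Same quantity rebuilt from all T * T estimated (auto)covariances.
s2_from_autocov = sum(autocov(t, s) for t in range(T) for s in range(T)) / T ** 2

# Keeping only the t == s terms zeros out the autocovariances.
s2_no_cross_terms = sum(autocov(t, t) for t in range(T)) / T ** 2

print(s2_direct, s2_from_autocov, s2_no_cross_terms)
```

The first two numbers agree up to floating-point error, while the third is much smaller because the positive autocovariances have been discarded.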
Clustered SEs for the FE estimator in panel data regression
• The idea of clustered SEs in panel data is completely analogous to the case of the panel-data mean above – just a lot messier notation and formulas. See SW Appendix 10.2.
• Clustered SEs for panel data are the logical extension of HR SEs for cross-section. In cross-section regression, HR SEs are valid whether or not there is heteroskedasticity. In panel data regression, clustered SEs are valid whether or not there is heteroskedasticity and/or serial correlation.
• By the way . . . The term “clustered” comes from allowing correlation within a “cluster” of observations (within an entity), but not across clusters.
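The same idea carries over to the fixed effects slope. The sketch below is an illustration under stated assumptions, not the formula from SW Appendix 10.2 verbatim: it uses a single regressor, simulated data with a made-up design (entity-specific drifts in both X and u, drawn independently), and a cluster-robust variance with no degrees-of-freedom adjustment, which statistical software may apply. The only difference between the two standard errors is whether the products $\tilde{X}_{it}\hat{u}_{it}$ are summed within an entity before or after squaring.

```python
import math
import random

random.seed(1)

# Hypothetical simulation: n entities, T periods, true slope beta1 = -0.5.
n, T, beta1 = 200, 7, -0.5
panel = []
for i in range(n):
    alpha_i = random.gauss(0, 2)     # entity fixed effect
    a_i = random.gauss(0, 1)         # entity-specific drift in X
    b_i = random.gauss(0, 1)         # entity-specific drift in u (serial correlation)
    xs = [a_i * t + random.gauss(0, 0.3) for t in range(T)]
    us = [b_i * t + random.gauss(0, 0.3) for t in range(T)]
    ys = [alpha_i + beta1 * x + u for x, u in zip(xs, us)]
    panel.append((xs, ys))

def demean(values):
    """Subtract the mean of the list from each element."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Entity-demeaned data and the fixed effects (within) estimate.
x_t = [demean(xs) for xs, _ in panel]
y_t = [demean(ys) for _, ys in panel]
sxx = sum(x * x for xi in x_t for x in xi)
beta_hat = sum(x * y for xi, yi in zip(x_t, y_t) for x, y in zip(xi, yi)) / sxx

# Products of demeaned X and the residual, grouped by entity.
scores = [[x * (y - beta_hat * x) for x, y in zip(xi, yi)]
          for xi, yi in zip(x_t, y_t)]

# Heteroskedasticity-robust: square each product, then sum (cross terms dropped).
se_hr = math.sqrt(sum(s * s for si in scores for s in si)) / sxx
# Clustered: sum the products within each entity first, then square.
se_cluster = math.sqrt(sum(sum(si) ** 2 for si in scores)) / sxx

print(f"beta_hat = {beta_hat:.3f}, HR SE = {se_hr:.3f}, clustered SE = {se_cluster:.3f}")
```

In this design the within-entity products share the same sign, so the clustered SE comes out larger than the heteroskedasticity-robust one, which ignores the serial correlation.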
Clustered SEs: Visualization
Clustered SEs: Implementation in STATA
Application: Drunk Driving Laws and Traffic Deaths
Some facts
• Approx. 40,000 traffic fatalities annually in the U.S.
• 1/3 of traffic fatalities involve a drinking driver
• 25% of drivers on the road between 1am and 3am have been drinking (estimate)
• A drunk driver is 13 times as likely to cause a fatal crash as a non-drinking driver (estimate)
The drunk driving panel data set
n = 48 U.S. states, T = 7 years (1982 to 1988) (balanced)
Variables
• Traffic fatality rate (deaths per 10,000 residents)
• Tax on a case of beer (Beertax)
• Minimum legal drinking age
• Minimum sentencing laws for first DWI violation:
– Mandatory Jail
– Mandatory Community Service
– otherwise, sentence will just be a monetary fine
• vehicle miles per driver (US DOT)
• State economic data (real per capita income, etc.)
Why might panel data help?
• Potential OV bias from variables that vary across states but are constant over time:
– culture of drinking and driving
– quality of roads
– vintage of autos on the road
* use state fixed effects
• Potential OV bias from variables that vary over time but are constant across states:
– improvements in auto safety over time
– changing national attitudes towards drunk driving
* use time fixed effects
Empirical Analysis: Main Results
• Sign of the beer tax coefficient changes when fixed state effects are included
• Time effects are statistically significant but including them doesn’t have a big impact on the estimated coefficients
• Estimated effect of beer tax drops when other laws are included.
• The only policy variable that seems to have an impact is the tax on beer – not minimum drinking age, not mandatory sentencing, etc. However, the beer tax is not significant even at the 10% level using clustered SEs in the specifications which control for state economic conditions (unemployment rate, personal income).
Empirical results, ctd.
• In particular, the minimum legal drinking age has a small coefficient which is precisely estimated – reducing the MLDA doesn’t seem to have much effect on overall driving fatalities
• What are the threats to internal validity? How about:
1. Omitted variable bias
2. Wrong functional form
3. Errors-in-variables bias
4. Sample selection bias
5. Simultaneous causality bias
What do you think?
Summary: Regression with Panel Data
Advantages and limitations of fixed effects regression
Advantages
• You can control for unobserved variables that:
1. vary across states but not over time, and/or
2. vary over time but not across states
• More observations give you more information
• Estimation involves relatively straightforward extensions of multiple regression
Limitations/challenges
• Need variation in X over time within entities
– This can be problematic if the X of interest is time-invariant or has little variation across time (multicollinear with fixed effects!)
• You need to use clustered standard errors to guard against the often-plausible possibility that uit is autocorrelated
Estimation and interpretation
• fixed effects regression can be done three ways:
1. “changes” method when T = 2
2. “entity-demeaned” regression
3. “n-1 binary regressors” method when n is small
• similar methods apply to regression with time fixed effects and to both time and state fixed effects
• statistical inference: like multiple regression.
Practice questions
• Try question 10.1, 10.2, 10.8, 10.11.
• Answers will be provided next week.
• This is not an assessment. They are just for practice.
• I provide a nice paper (Petersen 2008) that explains clustered standard errors in much more detail, since we only covered the simplest case. However, reading the paper is entirely optional.