ECONOMETRICS I ECON GR5411
Lecture 24 – Panel Data III and Big Data (Ridge Regression)
by Seyhan Erden, Columbia University, MA in Economics
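The Fixed Effects Covariance Matrix:
$$\operatorname{Var}_{fe} < \operatorname{Var}_{pool}$$
$$\operatorname{Var}_{fe} = \left(\sum_{i=1}^{n} X_i'\Omega_i^{-1}X_i\right)^{-1}$$
$$\operatorname{Var}_{pool} = \left(\sum_{i=1}^{n} X_i'X_i\right)^{-1}\left(\sum_{i=1}^{n} X_i'\Omega_i X_i\right)\left(\sum_{i=1}^{n} X_i'X_i\right)^{-1}$$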
Lecture 24 - GR5411 by Seyhan Erden
2
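Asymptotic Distribution of Fixed Effects Estimator:
Assumptions:
• $y_{it} = x_{it}'\beta + u_i + \varepsilon_{it}$ for $i = 1,\dots,n$ and $t = 1,\dots,T$, $T \ge 2$
• The variables $(X_i, \varepsilon_i)$ are i.i.d. across $i$
• $E[x_{is}\varepsilon_{it}] = 0$ for all $s = 1,\dots,T$
• $Q_T = E[\ddot X_i'\ddot X_i] > 0$
• $E[\varepsilon_{it}^4] < \infty$
• $E\|x_{it}\|^4 < \infty$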
Asymptotic Distribution of Fixed Effects Estimator:
Given the above assumptions, we can establish asymptotic normality of $\hat\beta_{fe}$:
$$\sqrt{n}\left(\hat\beta_{fe} - \beta\right) \xrightarrow{d} N(0, V_\beta), \qquad V_\beta = Q_T^{-1}\Omega_T Q_T^{-1},$$
where
$$\Omega_T = E\left[\ddot X_i'\varepsilon_i\varepsilon_i'\ddot X_i\right].$$
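Proof:
Since $(X_i, \varepsilon_i)$ are i.i.d. across $i$ and have finite fourth moments, by the WLLN
$$\frac{1}{n}\sum_{i=1}^{n}\ddot X_i'\ddot X_i \xrightarrow{p} E\left[\ddot X_i'\ddot X_i\right] = Q_T.$$
From the third assumption, using $\ddot x_{it} = x_{it} - \frac{1}{T}\sum_{s=1}^{T}x_{is}$,
$$E\left[\ddot X_i'\varepsilon_i\right] = \sum_{t=1}^{T}E\left[\ddot x_{it}\varepsilon_{it}\right] = \sum_{t=1}^{T}E\left[x_{it}\varepsilon_{it}\right] - \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}E\left[x_{is}\varepsilon_{it}\right] = 0.$$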
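Proof (cont.):
The last two assumptions imply that $\ddot X_i'\varepsilon_i$ has a finite covariance matrix, $\Omega_T$. Thus the assumptions of the CLT hold, and
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\ddot X_i'\varepsilon_i \xrightarrow{d} N(0, \Omega_T).$$
Then,
$$\sqrt{n}\left(\hat\beta_{fe} - \beta\right) = \left(\frac{1}{n}\sum_{i=1}^{n}\ddot X_i'\ddot X_i\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\ddot X_i'\varepsilon_i\right) \xrightarrow{d} Q_T^{-1}N(0, \Omega_T) = N(0, V_\beta).$$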
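A minimal simulation sketch of this result, assuming a hypothetical data-generating process with one regressor, an entity effect $u_i$ correlated with $x_{it}$, and AR(1) errors within each entity. It computes $\hat\beta_{fe}$ from within-transformed data and a standard error from the sandwich $\hat Q_T^{-1}\hat\Omega_T\hat Q_T^{-1}/n$, where $\hat\Omega_T$ averages $(\ddot X_i'\hat\varepsilon_i)^2$ over entities (the clustered-by-entity form). The function names and DGP parameters are illustrative, not from the lecture.

```python
import random

def simulate_panel(n=2000, T=5, beta=1.5, seed=0):
    """Hypothetical DGP: y_it = beta*x_it + u_i + eps_it, x_it correlated with u_i, AR(1) eps."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        u = rng.gauss(0, 1)                                # entity effect u_i
        x = [0.5 * u + rng.gauss(0, 1) for _ in range(T)]  # regressor correlated with u_i
        eps, prev = [], 0.0
        for _ in range(T):
            prev = 0.6 * prev + rng.gauss(0, 1)            # serially correlated errors
            eps.append(prev)
        y = [beta * xt + u + et for xt, et in zip(x, eps)]
        panel.append((x, y))
    return panel

def demean(v):
    """Within transformation for one entity: subtract its time average."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def fe_with_robust_se(panel):
    """FE slope and its sandwich (cluster-by-entity) standard error, single regressor."""
    dd = [(demean(x), demean(y)) for x, y in panel]
    q = sum(xt * xt for xd, _ in dd for xt in xd)          # sum over i of Xdd_i' Xdd_i
    b = sum(xt * yt for xd, yd in dd for xt, yt in zip(xd, yd)) / q
    omega = sum(sum(xt * (yt - b * xt) for xt, yt in zip(xd, yd)) ** 2
                for xd, yd in dd)                          # sum over i of (Xdd_i' e_i)^2
    # Q_hat = q/n and Omega_hat = omega/n, so Q_hat^-1 Omega_hat Q_hat^-1 / n = omega / q^2
    return b, (omega / q ** 2) ** 0.5

b, se = fe_with_robust_se(simulate_panel())
print(f"beta_fe = {b:.4f}, robust s.e. = {se:.4f}")
```

Because $u_i$ is removed by demeaning, the estimate should land near the true value 1.5 even though $x_{it}$ and $u_i$ are correlated; the entity-level sum in `omega` keeps the standard error valid under the serial correlation built into the errors.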
Example 1 to discuss:
Consider a model for new capital investment in a particular industry (say, manufacturing), where the cross-section observations are at the county level and there are $T$ years of data for each county:
$$\ln(invest_{it}) = \theta_t + \mathbf{z}_{it}\gamma + \delta_1 tax_{it} + \delta_2 disaster_{it} + c_i + u_{it}$$
where $tax_{it}$ is a measure of the marginal tax rate on capital in county $i$ at time $t$, $disaster_{it}$ is a dummy indicator for a significant natural disaster (for example, a major flood, a hurricane, or an earthquake), $\mathbf{z}_{it}$ includes other variables that affect capital investment, and the $\theta_t$ are time-specific intercepts (controlling for factors that vary over time but are common to all counties).
• Why is it important to allow for aggregate time effects in the model?
• Answer: investment is likely to be affected by aggregate macroeconomic factors (say, a recession), so it is important to allow for these by including time intercepts. This is done by including $T - 1$ year dummies.
• What kinds of variables are captured by $c_i$?
• Answer: time-invariant, county-specific variables, such as the economic climate of a county. These are likely correlated with tax rates. So if we only had cross-section data, we would need an instrumental variable for $tax_i$ that is uncorrelated with $c_i$ but correlated with $tax_i$.
• What is the expected sign of $\delta_1$?
• Answer: standard investment theories suggest that, ceteris paribus, larger marginal tax rates decrease investment, so $\delta_1$ is expected to be negative.
• How would you estimate this model? Under what assumptions?
• Answer: start with FE, which allows arbitrary correlation between all time-varying explanatory variables and $c_i$ (one can also start with pooled OLS to compare the results with FE). This assumes strict exogeneity of $\mathbf{z}_{it}$, $tax_{it}$, and $disaster_{it}$, in the sense that these are uncorrelated with the errors $u_{is}$ for all $t$ and $s$. Standard errors should be robust (clustered by county), since serial correlation in the errors is likely.
• Discuss whether strict exogeneity is reasonable for the two variables $tax_{it}$ and $disaster_{it}$, assuming neither of these variables has a lagged effect on capital investment.
• Answer: if $tax_{it}$ and $disaster_{it}$ do not have lagged effects on investment, then the only possible violation of strict exogeneity is if future values of these variables are correlated with $u_{it}$. It seems reasonable not to worry that future natural disasters are determined by past investment. On the other hand, state officials might look at the levels of past investment in determining future tax policy, especially if there is a target level of tax revenue the officials are trying to achieve. This could be similar to setting property tax rates: sometimes property tax rates are set depending on recent housing values, because a larger base means a smaller rate can achieve the same amount of tax revenue. Given that we allow $tax_{it}$ to be correlated with $c_i$, feedback might not be much of a problem, but it cannot be ruled out ahead of time.
Example 2: panel_country_macro_data.dta
. xtset country year
panel variable: country (unbalanced)
time variable: year, 2010 to 2018, but with a gap
delta: 1 unit
Macro variables for 131 countries from 2010 to 2018 with some gaps for some countries.
. describe
Contains data from /Users/seyhanerden/Documents/COLUMBIA ECONOMETRICS I (GR5411) MA/Lectures Fall 2020 ONLINE/Lecture 22 _ Fall 2019 _ GR5411 _ Panel DataI /panel_country_macro_data.dta
  obs:         1,016
 vars:            10                          25 Nov 2020 16:53
------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
CountryName     str30   %30s                  Country Name
year            float   %9.0g
irate           double  %10.0g                Deposit interest rate
consumption     double  %10.0g                Consumption (Billions 2000 US$)
gdp             double  %10.0g                GDP (Billions 2000 US$)
country         long    %30.0g     country    Country Name
ln_cons         float   %9.0g                 Log of consumption
ln_gdp          float   %9.0g                 Log of gdp
ln_irate        float   %9.0g                 Log of irate
region          long    %12.0g     region     Regional groups
------------------------------------------------------------------------------
Sorted by: country  year
Example 2, Random Effects:
. xtreg ln_cons ln_gdp ln_irate, re
Random-effects GLS regression                   Number of obs     =      1,016
Group variable: country                         Number of groups  =        131
R-sq:                                           Obs per group:
     within  = 0.8033                                         min =          1
     between = 0.9859                                         avg =        7.8
     overall = 0.9847                                         max =          9
                                                Wald chi2(2)      =   13277.81
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
------------------------------------------------------------------------------
     ln_cons |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ln_gdp |    .958856   .0084128   113.98   0.000     .9423672    .9753449
    ln_irate |  -.0039294   .0041147    -0.95   0.340     -.011994    .0041352
       _cons |    .760708   .2065915     3.68   0.000     .3557961     1.16562
-------------+----------------------------------------------------------------
     sigma_u |  .2339765
     sigma_e |  .05205235
         rho |  .95284182   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Example 2, interpretation:
• The Wald chi2(df) statistic is the equivalent of the F statistic: it tests the overall significance of the regression (that all slope coefficients are jointly zero).
• The three R-sq statistics measure how much of the variability of y is explained by its predicted values, for three possible measures of y:
   – $y_{it}$: overall
   – $\bar y_i$: between
   – $y_{it} - \bar y_i$: within
• corr(u_i, X) refers to the correlation between the time-invariant component $u_i$ and the regressors. For random effects, we must assume this is zero.
• sigma_u $= \sigma_u$, sigma_e $= \sigma_\varepsilon$, rho $= \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}$
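As a quick check on the rho formula, the sketch below recomputes it from the sigma_u and sigma_e that Stata reports; the function name is hypothetical. Up to the rounding of the displayed sigmas, it reproduces the rho in this RE output (.95284182) and in the FE output for the same model (.95709727).

```python
def rho(sigma_u, sigma_e):
    """Fraction of the composite error variance due to the entity effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

rho_re = rho(0.2339765, 0.05205235)    # sigma_u, sigma_e from the RE output
rho_fe = rho(0.24585324, 0.05205235)   # sigma_u, sigma_e from the FE output
print(f"RE rho = {rho_re:.6f}, FE rho = {rho_fe:.6f}")
```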
Example 2, LM Test (RE vs. Pooled OLS):
. xttest0
Breusch and Pagan Lagrangian multiplier test for random effects
        ln_cons[country,t] = Xb + u[country] + e[country,t]
        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                 ln_cons |   4.027035       2.006747
                       e |   .0027094       .0520524
                       u |    .054745       .2339765
        Test:   Var(u) = 0
                             chibar2(01) =  3108.85
                          Prob > chibar2 =   0.0000   → Reject $H_0$
$H_0: \sigma_u^2 = 0$, meaning there is no variation across panels, so there is no panel (country) effect.
If you reject $H_0$, there is a panel (country) effect, so random effects is preferred to pooled OLS.
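The label chibar2(01) reflects that under $H_0: \sigma_u^2 = 0$ the variance sits on the boundary of the parameter space, so the LM statistic is compared with a 50:50 mixture of a point mass at zero and a $\chi^2_1$ distribution. A minimal sketch of the resulting p-value (the function name is hypothetical):

```python
import math

def chibar2_01_pvalue(stat):
    """P-value under a 50:50 mixture of chi2(0) (point mass at 0) and chi2(1)."""
    if stat <= 0:
        return 1.0
    # P(chi2(1) > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)); the mixture halves it
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))

print(chibar2_01_pvalue(3108.85))   # LM statistic from the xttest0 output
print(chibar2_01_pvalue(2.7055))    # close to the 5% critical value of the mixture
```

The 5% critical value of the mixture, about 2.71, is smaller than the 3.84 of a plain $\chi^2_1$, so ignoring the boundary would make the test conservative.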
Example 2, Fixed Effects: $cov(u_i, X_i) \neq 0$
The regressors are correlated with the unobserved time-invariant component $u_i$. In the random effects model this correlation was assumed to be zero. We still need to assume strict exogeneity:
$$E\left[\varepsilon_{it} \mid x_{i1},\dots,x_{iT}, u_i\right] = 0$$
(so there cannot be any lagged dependent variables on the right-hand side). The model is
$$\ln(consumption_{it}) = \beta_0 + \beta_1\ln(gdp_{it}) + \beta_2\ln(irate_{it}) + u_i + \varepsilon_{it}$$
For a particular model, it is difficult to maintain that the unobserved component $u_i$ is independent of all regressors. Under the fixed effects model, we find a way to eliminate the unobserved component $u_i$.
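Example 2, Fixed Effects: $cov(u_i, X_i) \neq 0$
$$y_{it} = \beta_0 + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + u_i + \varepsilon_{it}$$
• If we take the average over the $T$ observations of each panel, we obtain
$$\bar y_i = \beta_0 + \beta_1\bar x_{i1} + \cdots + \beta_k\bar x_{ik} + u_i + \bar\varepsilon_i$$
where
$$\bar y_i = \frac{1}{T}\sum_{t=1}^{T}y_{it}, \qquad \bar x_{ik} = \frac{1}{T}\sum_{t=1}^{T}x_{itk}$$
• We can construct the following:
$$y_{it} - \bar y_i = (\beta_0 - \beta_0) + \beta_1(x_{it1} - \bar x_{i1}) + \cdots + \beta_k(x_{itk} - \bar x_{ik}) + (u_i - u_i) + (\varepsilon_{it} - \bar\varepsilon_i)$$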
Example 2, Fixed Effects:
• We can then estimate the parameters of interest from the demeaned model:
$$\tilde y_{it} = \beta_1\tilde x_{it1} + \cdots + \beta_k\tilde x_{itk} + \tilde\varepsilon_{it}$$
There is no more $u_i$ in the model.
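A tiny hand-checkable sketch of the within transformation, using a hypothetical two-entity panel with $T = 3$, $y_{it} = 2x_{it} + u_i$, and no noise. Entity 2 has larger $x$ values but a smaller $u_i$, so $cov(u_i, x_{it}) \neq 0$. Demeaning removes $u_i$ and recovers the slope exactly, while pooled OLS with an intercept (which ignores $u_i$) does not.

```python
def within(groups):
    """Subtract each entity's time mean from its observations (the within transformation)."""
    out = []
    for obs in groups:
        m = sum(obs) / len(obs)
        out.append([o - m for o in obs])
    return out

def slope(xs, ys):
    """OLS slope through the origin on already-demeaned data: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

def flat(groups):
    return [v for row in groups for v in row]

# Hypothetical panel: 2 entities, T = 3, y_it = 2*x_it + u_i with u_1 = 10, u_2 = -5
x = [[1, 2, 3], [2, 4, 6]]
u = [10, -5]
y = [[2 * xt + ui for xt in xi] for xi, ui in zip(x, u)]

b_fe = slope(flat(within(x)), flat(within(y)))

# Pooled OLS with an intercept: demean over all N*T observations instead of by entity
xp, yp = flat(x), flat(y)
b_pooled = slope([v - sum(xp) / len(xp) for v in xp], [v - sum(yp) / len(yp) for v in yp])
print(b_fe, b_pooled)   # 2.0 -0.8125
```

The pooled slope even has the wrong sign here, because the entity with higher $x$ values has the lower entity effect.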
Example 2, Fixed Effects with Stata:
. xtreg ln_cons ln_gdp ln_irate, fe
Fixed-effects (within) regression               Number of obs     =      1,016
Group variable: country                         Number of groups  =        131
R-sq:                                           Obs per group:
     within  = 0.8034                                         min =          1
     between = 0.9858                                         avg =        7.8
     overall = 0.9845                                         max =          9
                                                F(2,883)          =    1804.60
corr(u_i, Xb)  = -0.0175                        Prob > F          =     0.0000
  → estimated corr
------------------------------------------------------------------------------
     ln_cons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ln_gdp |    .958713    .016523    58.02   0.000      .926284     .991142
    ln_irate |  -.0074047   .0042761    -1.73   0.084    -.0157972    .0009878
       _cons |   .7750608   .4063998     1.91   0.057    -.0225615    1.572683
-------------+----------------------------------------------------------------
     sigma_u |  .24585324
     sigma_e |  .05205235
         rho |  .95709727   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(130, 883) = 152.63                  Prob > F = 0.0000
  → $H_0$: all $u_i = 0$, i.e., the time-invariant components are jointly zero
Example 2, Fixed Effects vs Random Effects:
. quietly xtreg ln_cons ln_gdp ln_irate, fe
. estimates store eq_fe
. quietly xtreg ln_cons ln_gdp ln_irate, re
. estimates store eq_re
. hausman eq_fe eq_re        → the consistent estimator goes first, the efficient estimator goes last
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     eq_fe        eq_re        Difference          S.E.
-------------+----------------------------------------------------------------
      ln_gdp |     .958713      .958856        -.000143        .0142209
    ln_irate |   -.0074047    -.0039294       -.0034753        .0011638
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg
    Test:  Ho:  difference in coefficients not systematic
                → no systematic difference between FE and RE
                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       17.25
                Prob>chi2 =      0.0002   → reject $H_0$, so you should go with FE
→ If there is no correlation between $u_i$ and the regressors, then RE is more efficient, and with RE you are able to recover the effects of time-invariant, country-specific variables on consumption.
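The Hausman statistic is the quadratic form $(b-B)'[(V_b - V_B)^{-1}](b-B)$. The output shows only the diagonal of $V_b - V_B$, so the joint chi2(2) = 17.25 cannot be reproduced from it; the sketch below applies the same formula to the ln_irate coefficient alone as an illustration, and converts statistics to p-values. The helper functions are hypothetical and use no external libraries; for df = 2 the chi-square upper tail is exactly $e^{-x/2}$, which matches Prob>chi2 = 0.0002 for 17.25.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (fine for small k)."""
    k = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x

def hausman(b, B, V_diff):
    """H = (b - B)' (V_b - V_B)^{-1} (b - B)."""
    d = [bi - Bi for bi, Bi in zip(b, B)]
    return sum(di * wi for di, wi in zip(d, solve(V_diff, d)))

def chi2_sf(x, df):
    """Upper-tail chi-square probability; only df = 1 and df = 2 are handled here."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")

# ln_irate only: FE (b), RE (B), and sqrt(diag(V_b - V_B)) from the hausman output
h_irate = hausman([-0.0074047], [-0.0039294], [[0.0011638 ** 2]])
print(f"H(ln_irate) = {h_irate:.3f}, p = {chi2_sf(h_irate, 1):.4f}")
print(f"p for the joint chi2(2) = 17.25: {chi2_sf(17.25, 2):.4f}")
```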
Example 2, Marginal Effects:
. quietly xtreg ln_cons ln_gdp ln_irate, fe
. margins, dydx(*)        → instead of * you can write any variable name
Average marginal effects                        Number of obs     =      1,016
Model VCE    : Conventional
Expression   : Linear prediction, predict()
dy/dx w.r.t. : ln_gdp ln_irate
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ln_gdp |    .958713    .016523    58.02   0.000     .9263284    .9910976
    ln_irate |  -.0074047   .0042761    -1.73   0.083    -.0157857    .0009763
------------------------------------------------------------------------------
Since we have a log-log form, these coefficients are elasticities.
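Because the coefficient is an elasticity, $\hat\beta_1 \times$ (percent change in GDP) approximates the percent change in consumption, holding the other regressors and $u_i$ fixed; the approximation is accurate for small changes. A small sketch comparing it with the exact change implied by the log-log model, $100[(1 + g/100)^{\hat\beta_1} - 1]$, using the FE coefficient on ln_gdp; the function name is hypothetical.

```python
def pct_change_consumption(pct_gdp, elasticity):
    """Exact % change in consumption implied by ln C = b*ln GDP + ... for a pct_gdp % GDP change."""
    return 100.0 * ((1.0 + pct_gdp / 100.0) ** elasticity - 1.0)

b_gdp = 0.958713   # FE coefficient on ln_gdp from the output above
for g in (1, 10, 50):
    print(f"GDP +{g}%: approx {b_gdp * g:.3f}%, exact {pct_change_consumption(g, b_gdp):.3f}%")
```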
Example 2, Interactions:
. xtreg ln_cons ln_gdp i.region#c.ln_gdp ln_irate, fe
Fixed-effects (within) regression               Number of obs     =        986
Group variable: country                         Number of groups  =        116
R-sq:                                           Obs per group:
     within  = 0.8089                                         min =          5
     between = 0.6329                                         avg =        8.5
     overall = 0.6392                                         max =          9
                                                F(6,864)          =     609.45
corr(u_i, Xb)  = -0.3037                        Prob > F          =     0.0000
------------------------------------------------------------------------------
     ln_cons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ln_gdp |   1.003669   .0253091    39.66   0.000     .9539948    1.053344
             |
     region#|
    c.ln_gdp |
     America |  -.1075157   .0481679    -2.23   0.026    -.2020555   -.0129759
        Asia |   -.059629   .0361194    -1.65   0.099     -.130521     .011263
 Aust_Ocea~a |   .0143233   .1638698     0.09   0.930    -.3073061    .3359528
      Europe |  -.1306809   .0856627    -1.53   0.127    -.2988122    .0374504
             |
    ln_irate |  -.0091313   .0048491    -1.88   0.060    -.0186486    .0003861
       _cons |   1.087914   .4680887     2.32   0.020     .1691903    2.006638
-------------+----------------------------------------------------------------
     sigma_u |  1.2747145
     sigma_e |  .05170026
         rho |  .99835773   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(115, 864) = 151.98                  Prob > F = 0.0000
This regression can be run quietly; the next step computes the marginal effects of log GDP on log consumption by region.
Example 2, Marginal effects with interactions:
. margins, dydx(ln_gdp) over(region)
Average marginal effects                        Number of obs     =        986
Model VCE    : Conventional
Expression   : Linear prediction, predict()
dy/dx w.r.t. : ln_gdp
over         : region
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_gdp       |
      region |
      Africa |   1.003669   .0253091    39.66   0.000     .9540644    1.053274
     America |   .8961536   .0409304    21.89   0.000     .8159314    .9763758
        Asia |   .9440403   .0260334    36.26   0.000     .8930157    .9950649
 Aust_Ocea~a |   1.017993   .1622033     6.28   0.000       .70008    1.335905
      Europe |   .8729883   .0837015    10.43   0.000     .7089363     1.03704
------------------------------------------------------------------------------
Marginal effects are elasticities here because it is a log-log regression.
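With the interaction specification, the elasticity for a region is the base ln_gdp coefficient (Africa, the omitted region) plus that region's interaction coefficient. The sketch below reproduces the dy/dx column from the coefficients in the interaction output; the names are illustrative. The delta-method standard errors would also need the coefficient covariances, which the output does not show.

```python
# Coefficients from the FE interaction output (Africa is the base region)
b_ln_gdp = 1.003669
interaction = {"America": -0.1075157, "Asia": -0.059629,
               "Aust_Oceania": 0.0143233, "Europe": -0.1306809}

def region_elasticity(region):
    """d ln_cons / d ln_gdp: base slope plus the region's interaction term (0 for Africa)."""
    return b_ln_gdp + interaction.get(region, 0.0)

for r in ("Africa", "America", "Asia", "Aust_Oceania", "Europe"):
    print(f"{r:<13} {region_elasticity(r):.7f}")
```

The last-digit differences from the margins table come from the rounding of the displayed coefficients.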
Example 2, Marginal effects with marginsplot:
. marginsplot
[Figure: Average Marginal Effects of ln_gdp with 95% CIs, by regional group (Africa, America, Asia, Aust_Oceania, Europe); y-axis: Effects on Linear Prediction]
Example 2, Predictive Margins with limits on vars:
. quietly xtreg ln_cons ln_gdp i.region##c.ln_irate, re
  → ## (two hashtags) requests a full factorial interaction: the main effects of region and ln_irate plus their interaction. RE is used because with FE the time-invariant region main effect would disappear in the within transformation.
. margins region, at(ln_irate=(-4(2)0))
  → predictive margins when the log interest rate goes from -4 to 0 in increments of 2 (i.e., at -4, -2, and 0)
Predictive margins                              Number of obs     =        986
Model VCE    : Conventional
Expression   : Linear prediction, predict()
1._at        : ln_irate        =          -4
2._at        : ln_irate        =          -2
3._at        : ln_irate        =           0
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#region |
    1#Africa |   24.43755   .0723811   337.62   0.000     24.29568    24.57941
   1#America |    24.4076   .0650437   375.25   0.000     24.28012    24.53509
      1#Asia |    24.1763   .0638297   378.76   0.000      24.0512    24.30141
           1#|
 Aust_Ocea~a |   24.35519   .2790555    87.28   0.000     23.80825    24.90213
    1#Europe |   24.39827    .064778   376.64   0.000     24.27131    24.52524
    2#Africa |   24.39772   .0545192   447.51   0.000     24.29087    24.50458
   2#America |    24.3948   .0541505   450.50   0.000     24.28867    24.50093
      2#Asia |   24.17952    .049138   492.07   0.000     24.08321    24.27583
           2#|
 Aust_Ocea~a |   24.33569    .195698   124.35   0.000     23.95213    24.71925
    2#Europe |   24.39851   .0606394   402.35   0.000     24.27966    24.51736
    3#Africa |    24.3579   .0412137   591.01   0.000     24.27712    24.43868
   3#America |     24.382   .0474905   513.41   0.000     24.28892    24.47508
      3#Asia |   24.18273   .0403068   599.97   0.000     24.10373    24.26173
           3#|
 Aust_Ocea~a |   24.31619   .1380183   176.18   0.000     24.04568    24.58671
    3#Europe |   24.39874   .0585828   416.48   0.000     24.28392    24.51356
------------------------------------------------------------------------------
These results are easier to interpret when visualized with marginsplot.
Predictive Marginal Effect Plot
. marginsplot, noci        → noci suppresses the confidence intervals
[Figure: Predictive Margins of region; x-axis: Log of irate (-4, -2, 0); y-axis: Linear Prediction; one line each for Africa, America, Asia, Aust_Oceania, Europe]
The lines are not parallel and some of them cross, so there is an interaction effect: the relationship between the log interest rate and log consumption differs by region, and for some regions the negative relationship is steeper than for others.
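To quantify what the crossing lines show, the sketch below takes the margins at ln_irate = -4 and 0 from the output above and computes each region's implied slope of the linear prediction with respect to ln_irate. The prediction is linear in ln_irate in this specification, so the two end points suffice; the variable names are illustrative.

```python
# Predictive margins of ln_cons at ln_irate = -4 and ln_irate = 0, from the margins output
margins = {
    "Africa": (24.43755, 24.3579),
    "America": (24.4076, 24.382),
    "Asia": (24.1763, 24.18273),
    "Aust_Oceania": (24.35519, 24.31619),
    "Europe": (24.39827, 24.39874),
}

def implied_slope(at_minus4, at_0):
    """Slope of the linear prediction in ln_irate between ln_irate = -4 and ln_irate = 0."""
    return (at_0 - at_minus4) / 4.0

slopes = {r: implied_slope(*m) for r, m in margins.items()}
for r, s in sorted(slopes.items(), key=lambda kv: kv[1]):
    print(f"{r:<13} {s:+.5f}")
```

Africa has the steepest negative slope, while Asia and Europe are essentially flat or slightly positive, which is why their lines cross in the plot.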
Instrumental Variables under Panel Data:
Take the fixed effects model
$$y_{it} = x_{it}'\beta + u_i + \varepsilon_{it}$$
We say $x_{it}$ is exogenous for $\varepsilon_{it}$ if $E[x_{it}\varepsilon_{it}] = 0$, and we say $x_{it}$ is endogenous for $\varepsilon_{it}$ if $E[x_{it}\varepsilon_{it}] \neq 0$. Under the instrumental variables topic, we discussed several examples of endogeneity. The same issues apply to panel data. The only difference is that under the fixed effects model, endogeneity has to do with correlation between $x_{it}$ and $\varepsilon_{it}$, not correlation between $x_{it}$ and $u_i$, because the latter is eliminated in the fixed effects model.
We can use instrumental variables to cure endogeneity in panel data models as well.
Instrumental Variables under Panel Data:
Let $z_{it}$ be an $l \times 1$ vector of instrumental variables, where $l \ge k$. As in the cross-section case, $z_{it}$ may contain both included exogenous variables, $w_{it}$, and excluded exogenous variables (instrumental variables) that are exogenous and correlated with the endogenous variables.
Let $Z_i$ be the stacked instruments for entity $i$, and $Z$ be the stacked instruments for the full sample.
The dummy variable formulation of the fixed effects model is
$$y_{it} = x_{it}'\beta + d_i'u + \varepsilon_{it}$$
where $d_i$ is an $n \times 1$ vector of dummies, one for each entity in the sample.
Instrumental Variables under Panel Data:
The model in matrix notation for the full sample is
$$y = X\beta + Du + \varepsilon$$
We can calculate the fixed effects estimator for $\beta$ by least squares. The dummies $D$ should be viewed as included exogenous variables, so they should also be part of the first-stage regression in 2SLS.
The 2SLS estimator of $\beta$ using the instruments $Z$ for $X$ is
$$\hat\beta_{2sls} = \left(X'M_D Z\left(Z'M_D Z\right)^{-1}Z'M_D X\right)^{-1}X'M_D Z\left(Z'M_D Z\right)^{-1}Z'M_D y$$
where $M_D = I - D\left(D'D\right)^{-1}D'$.
Recall that we defined $\ddot y = M_D y$ and $\ddot X = M_D X$; now let $\ddot Z = M_D Z$. Then the 2SLS estimator can be written as
$$\hat\beta_{2sls} = \left(\ddot X'\ddot Z\left(\ddot Z'\ddot Z\right)^{-1}\ddot Z'\ddot X\right)^{-1}\ddot X'\ddot Z\left(\ddot Z'\ddot Z\right)^{-1}\ddot Z'\ddot y$$
This conveniently shows that the 2SLS estimator for the fixed effects model can be calculated by applying the standard 2SLS formula to the within-transformed $y_{it}$, $x_{it}$, and $z_{it}$. The 2SLS residual is $\hat\varepsilon = \ddot y - \ddot X\hat\beta_{2sls}$.
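A simulation sketch of this equivalence, assuming numpy and a hypothetical DGP in which $x_{it}$ is correlated with both $u_i$ and $\varepsilon_{it}$, with one excluded instrument $z_{it}$. It computes $\hat\beta_{2sls}$ twice, once from the within-transformed data and once through the projection $M_D$, and compares both with FE-OLS. The helper names are illustrative.

```python
import numpy as np

def two_sls(y, X, Z):
    """Standard 2SLS: (X'Pz X)^{-1} X'Pz y with Pz = Z (Z'Z)^{-1} Z'."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)       # first-stage fitted values
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def within(a):
    """Within transformation of an (n, T) array, stacked entity by entity into (n*T, 1)."""
    return (a - a.mean(axis=1, keepdims=True)).reshape(-1, 1)

def col(a):
    """Stack an (n, T) array entity by entity into an (n*T, 1) column."""
    return a.reshape(-1, 1)

# Hypothetical DGP: x correlated with u_i (handled by FE) and with eps via v (needs IV)
rng = np.random.default_rng(0)
n, T, beta = 200, 5, 1.5
u = rng.normal(size=(n, 1))                  # entity effects u_i
z = rng.normal(size=(n, T))                  # excluded instrument z_it
v = rng.normal(size=(n, T))                  # shock shared by x_it and eps_it
x = 0.8 * z + u + v
eps = 0.7 * v + 0.5 * rng.normal(size=(n, T))
y = beta * x + u + eps

y_dd, x_dd, z_dd = within(y), within(x), within(z)
b_fe_ols = np.linalg.solve(x_dd.T @ x_dd, x_dd.T @ y_dd).item()
b_fe_2sls = two_sls(y_dd, x_dd, z_dd).item()

# Same estimator through the dummy-variable projection M_D = I - D (D'D)^{-1} D'
D = np.kron(np.eye(n), np.ones((T, 1)))      # (n*T, n) entity dummies, rows ordered i then t
M_D = np.eye(n * T) - D @ np.linalg.solve(D.T @ D, D.T)
b_md = two_sls(M_D @ col(y), M_D @ col(x), M_D @ col(z)).item()

print(f"FE-OLS: {b_fe_ols:.3f}  FE-2SLS (within): {b_fe_2sls:.3f}  FE-2SLS (M_D): {b_md:.3f}")
```

FE-OLS removes $u_i$ but not the correlation between $x_{it}$ and $\varepsilon_{it}$, so it stays biased; the two 2SLS computations agree up to floating-point error, as the algebra above implies.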
Instrumental Variables under Panel Data:
This estimator can be obtained using the Stata command xtivreg, fe, or using the Stata command ivregress after making the within transformations.
So far we have focused on the one-way fixed effects model. However, there is no substantial difference in the two-way fixed effects model:
$$y_{it} = x_{it}'\beta + u_i + v_t + \varepsilon_{it}$$
The easiest way to estimate the two-way model is to add $T - 1$ time dummies to the regression model. We must include these dummies both as regressors and as instruments (i.e., in both the first- and second-stage regressions).
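For a balanced panel, the entity and time dummies can equivalently be partialled out by two-way demeaning, $\ddot a_{it} = a_{it} - \bar a_i - \bar a_t + \bar a$ (a Frisch-Waugh-Lovell argument; the shortcut does not carry over to unbalanced panels, where adding dummies is the safer route). A numpy sketch with hypothetical data, confirming that the transformation removes both $u_i$ and $v_t$:

```python
import numpy as np

def two_way_demean(a):
    """Balanced (n, T) panel: a_it - mean over t - mean over i + grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(1)
n, T = 6, 4
x = rng.normal(size=(n, T))
u = rng.normal(size=(n, 1))      # entity effects u_i
v = rng.normal(size=(1, T))      # time effects v_t
y = 2.0 * x + u + v              # no idiosyncratic error, so the check is exact

x_dd, y_dd = two_way_demean(x), two_way_demean(y)
b = (x_dd * y_dd).sum() / (x_dd ** 2).sum()
print(np.allclose(y_dd, 2.0 * x_dd), b)
```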
Example with Instruments and Panel Data:
Consider the example of cigarette demand (cigarettes_extra.dta) for the 48 continental U.S. states from 1985–1995:
$$y_{it} = x_{it}'\beta + u_i + v_t + \varepsilon_{it}$$
where $y$ is the quantity consumed, measured by annual per capita cigarette sales in packs per fiscal year. The regressors $x$ include price, the average retail cigarette price per pack during the fiscal year including taxes, and per capita income. The instrumental variables are the general sales tax, the average tax in cents per pack due to the broad-based state sales tax applied to all consumption goods, and the cigarette-specific tax, the tax applied to cigarettes only. All prices, income, and taxes used in the regressions are deflated by the Consumer Price Index and thus are in constant (real) dollars.
FE with IVs
. xtivreg dlpackpc (dlavgprs = drtax drtaxso) dlperinc, fe
Fixed-effects (within) IV regression            Number of obs     =         86
Group variable: state_id                        Number of groups  =         48
R-sq:                                           Obs per group:
     within  = 0.6526                                         min =          1
     between = 0.1285                                         avg =        1.8
     overall = 0.0922                                         max =          2
                                                Wald chi2(2)      =      81.07
corr(u_i, Xb)  = -0.9155                        Prob > chi2       =     0.0000
------------------------------------------------------------------------------
    dlpackpc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    dlavgprs |  -1.224156   .1473582    -8.31   0.000    -1.512973   -.9353393
    dlperinc |  -.3356518   .1397079    -2.40   0.016    -.6094743   -.0618294
       _cons |   .0471912   .0127962     3.69   0.000     .0221111    .0722713
-------------+----------------------------------------------------------------
     sigma_u |  .71738492
     sigma_e |  .08261503
         rho |  .98691144   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(47,36) = 16.16                      Prob > F = 0.0000
------------------------------------------------------------------------------
Instrumented:   dlavgprs
Instruments:    dlperinc drtax drtaxso
------------------------------------------------------------------------------
. estimate store fe
RE with IVs
. xtivreg dlpackpc (dlavgprs = drtax drtaxso) dlperinc, re
G2SLS random-effects IV regression              Number of obs     =         86
Group variable: state_id                        Number of groups  =         48
R-sq:                                           Obs per group:
     within  = 0.6002                                         min =          1
     between = 0.2682                                         avg =        1.8
     overall = 0.2436                                         max =          2
                                                Wald chi2(2)      =      70.16
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
------------------------------------------------------------------------------
    dlpackpc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    dlavgprs |  -1.039522   .1263077    -8.23   0.000     -1.28708   -.7919632
    dlperinc |   .0293745   .0237501     1.24   0.216    -.0171749     .075924
       _cons |   .0620523   .0404391     1.53   0.125    -.0172069    .1413115
-------------+----------------------------------------------------------------
     sigma_u |  .24231051
     sigma_e |  .08261503
         rho |  .89586081   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:   dlavgprs
Instruments:    dlperinc drtax drtaxso
------------------------------------------------------------------------------
. estimates store re
Hausman test
. hausman fe re
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
    dlavgprs |   -1.224156    -1.039522       -.1846342        .0758999
    dlperinc |   -.3356518     .0293745       -.3650264        .1376744
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtivreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtivreg
    Test:  Ho:  difference in coefficients not systematic
                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        9.74
                Prob>chi2 =      0.0077   → reject $H_0$, so you should go with FE
Big Data and Ridge
Outline:
1. What is "big data"?
2. Prediction with many predictors: the MSPE, OLS and the principle of shrinkage
3. Ridge regression
1. What is "Big Data"?
"Big Data" means many things:
• Data sets with many observations (millions)
• Data sets with many variables (thousands or more)
• Data sets with nonstandard data types, like text, voice, or images
"Big Data" has many different applications:
• Prediction using many predictors
   – Given your browsing history, what products are you most likely to shop for now?
   – Given your loan application profile, how likely are you to repay a bank loan?
• Prediction using highly nonlinear models (for which you need many observations)
• Recognition problems, like facial and voice recognition
"Big Data" has different jargon
"Big Data" has different jargon, which makes it seem very different from statistics and econometrics…
• "Machine learning": a computer (machine) uses a large data set to learn (e.g., about your online shopping preferences)
But at its core, machine learning builds on familiar tools of prediction:
• One of the major big data applications is prediction with many predictors. We treat this as a regression problem, but we need new methods that go beyond OLS.
• For prediction, we do not need causal coefficients.
2. Prediction with many predictors:
The MSPE, OLS, and the principle of shrinkage
The many-predictor problem:
• The goal is to provide a good prediction of some outcome $Y$ given a large number of $X$'s, when the number of $X$'s ($k$) is large relative to the number of observations ($n$); in fact, maybe $k > n$!
The goal is good out-of-sample prediction
• The estimation sample is the $n$ observations used to estimate the prediction model.
• The prediction is made using the estimated model, for an out-of-sample (OOS) observation, i.e., an observation not in the estimation sample.
The Predictive Regression Model
The standardized predictive regression model is the linear model with all the $X$'s standardized to have a mean of zero and a standard deviation of one, and with $Y$ deviated from its mean:
$$Y_i = \beta_1X_{1i} + \beta_2X_{2i} + \cdots + \beta_kX_{ki} + u_i$$
The intercept is excluded because all the variables have mean zero. Let $X^*_{1i},\dots,X^*_{ki},Y^*_i$, $i = 1,\dots,n$, denote the data as originally collected, where $X^*_{ji}$ is the $i^{th}$ observation on the $j^{th}$ original regressor.
In matrix form:
$$y = X\beta + u$$
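A minimal sketch of how the original data $X^*_{ji}$, $Y^*_i$ could be turned into the standardized $X$'s and demeaned $Y$ used in the model. It assumes the sample standard deviation (divisor $n - 1$) is used for scaling, and the small data set is hypothetical.

```python
import statistics

def standardize(col):
    """(X* - mean) / sd, with the sample s.d. (n - 1 divisor) as an assumed convention."""
    m = statistics.mean(col)
    s = statistics.stdev(col)
    return [(v - m) / s for v in col]

def demean(col):
    """Y* minus its sample mean."""
    m = statistics.mean(col)
    return [v - m for v in col]

# Hypothetical raw data: two regressors X*_1, X*_2 and outcome Y* for n = 5 observations
X1_raw = [10.0, 12.0, 9.0, 15.0, 14.0]
X2_raw = [0.5, 0.7, 0.2, 0.9, 0.4]
Y_raw = [3.0, 4.1, 2.5, 5.2, 4.4]

X1, X2 = standardize(X1_raw), standardize(X2_raw)
Y = demean(Y_raw)
print(X1, X2, Y)
```

After this transformation every column has mean zero, which is why the intercept can be dropped from the predictive regression.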
The Predictive Regression Model (cont.)
Throughout this lecture, we use standardized $X$'s, demeaned $Y$'s, and the standardized predictive regression model:
$$Y_i = \beta_1X_{1i} + \beta_2X_{2i} + \cdots + \beta_kX_{ki} + u_i$$
• We assume $E(Y|X) = \beta_1X_1 + \cdots + \beta_kX_k$, so $E(u_i|X_i) = 0$.
• Because all the variables, including $Y$, are deviated from their means, the intercept is zero, so it is omitted from the model.
• The model allows for nonlinearities in $X$ by letting some of the $X$'s be squares, cubes, logs, interactions, etc.
The Mean Squared Prediction Error
The Mean Squared Prediction Error (MSPE) is the expected value of the squared error made by predicting $Y$ for an observation not in the estimation data set:
$$MSPE = E\left[\left(Y^{oos} - \hat Y\left(X^{oos}\right)\right)^2\right]$$
where:
• $Y$ is the variable to be predicted.
• $X$ denotes the $k$ variables used to make the prediction; $X^{oos}, Y^{oos}$ are the values of $X$ and $Y$ in the out-of-sample data set.
• The prediction $\hat Y(X^{oos})$ uses a model estimated using the estimation data set, evaluated at $X^{oos}$.
• The MSPE measures the expected quality of the prediction made for an out-of-sample observation.
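The Mean Squared Prediction Error
$$MSPE = E\left[\left(Y^{oos} - \hat Y\left(X^{oos}\right)\right)^2\right]$$
where
$$\hat Y\left(X^{oos}\right) = \hat\beta_1X_1^{oos} + \cdots + \hat\beta_kX_k^{oos}$$
The prediction error is
$$\begin{aligned}
Y^{oos} - \left(\hat\beta_1X_1^{oos} + \cdots + \hat\beta_kX_k^{oos}\right) &= \left(\beta_1X_1^{oos} + \cdots + \beta_kX_k^{oos} + u^{oos}\right) - \left(\hat\beta_1X_1^{oos} + \cdots + \hat\beta_kX_k^{oos}\right)\\
&= u^{oos} - \left[\left(\hat\beta_1 - \beta_1\right)X_1^{oos} + \cdots + \left(\hat\beta_k - \beta_k\right)X_k^{oos}\right]
\end{aligned}$$
Then, $MSPE = E\left[\left(Y^{oos} - \hat Y\left(X^{oos}\right)\right)^2\right]$ is: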
The Mean Squared Prediction Error
$$\begin{aligned}
MSPE &= E\left[\left(u^{oos} - \left[\left(\hat\beta_1 - \beta_1\right)X_1^{oos} + \cdots + \left(\hat\beta_k - \beta_k\right)X_k^{oos}\right]\right)^2\right]\\
&= \sigma_u^2 + E\left[\left(\left(\hat\beta_1 - \beta_1\right)X_1^{oos} + \cdots + \left(\hat\beta_k - \beta_k\right)X_k^{oos}\right)^2\right]
\end{aligned}$$
The cross term drops out because $u^{oos}$ is independent of the estimation sample (and hence of $\hat\beta$) and $E(u^{oos}|X^{oos}) = 0$.
The first term, $\sigma_u^2$, is the variance of the oracle prediction error, i.e., the prediction error made using the true (unknown) conditional mean $E(Y|X)$. It is the MSPE of the oracle forecast and cannot be beaten.
The second term is the contribution to the prediction error arising from the estimated regression coefficients. It represents the cost, measured in terms of increased mean squared prediction error, of needing to estimate the coefficients instead of using the oracle prediction. It arises because the $\beta$'s are not known, so they must be estimated using the estimation sample.
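A Monte Carlo sketch of this decomposition for OLS, assuming numpy, i.i.d. standard normal regressors (already standardized), homoskedastic normal errors with $\sigma_u = 1$, and the hypothetical values $n = 100$, $k = 20$. The average out-of-sample squared error should be close to $\sigma_u^2$ plus the average of the second term. Under these Gaussian assumptions the second term has expectation $\sigma_u^2 k/(n-k-1) \approx 0.25$, roughly in line with the approximation $(1 + k/n)\sigma_u^2$ often quoted for the MSPE of OLS.

```python
import numpy as np

def oos_mspe_ols(n=100, k=20, sigma_u=1.0, reps=500, n_oos=200, seed=0):
    """Monte Carlo estimate of the OOS MSPE of OLS and of its estimation-cost term."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=k)                          # hypothetical true coefficients
    sq_err, est_cost = [], []
    for _ in range(reps):
        X = rng.normal(size=(n, k))                    # standardized regressors
        y = X @ beta + sigma_u * rng.normal(size=n)
        b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS without intercept
        X_oos = rng.normal(size=(n_oos, k))
        y_oos = X_oos @ beta + sigma_u * rng.normal(size=n_oos)
        sq_err.append(np.mean((y_oos - X_oos @ b_hat) ** 2))
        est_cost.append(np.mean((X_oos @ (b_hat - beta)) ** 2))  # second term
    return float(np.mean(sq_err)), float(np.mean(est_cost))

mspe, cost = oos_mspe_ols()
print(f"MSPE = {mspe:.3f}; sigma_u^2 = 1.0; estimation cost = {cost:.3f}")
```

The estimation cost grows with $k/n$, which is the motivation for shrinkage estimators when there are many predictors.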
The Oracle Prediction
The oracle prediction is the best possible prediction, i.e., the prediction that minimizes the MSPE, if you knew the joint distribution of $Y$ and $X$.
The oracle prediction is the conditional expectation of $Y$ given $X$, $E\left(Y^{oos}|X = X^{oos}\right)$.
• Suppose not. Then the forecast error could be predicted using $X^{oos}$; if so, the forecast couldn't have been the best possible, because it could be improved using the predicted error.
• The math: suppose that your prediction of $Y$, given the random variable $X$, is $g(X)$. The prediction error is $Y - g(X)$, and the quadratic loss associated with this prediction is defined as $Loss = E\left[\left(Y - g(X)\right)^2\right]$. We must show that, of all possible functions $g(X)$, the $Loss$ is minimized by $g(X) = E(Y|X)$.
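One standard way to finish the argument is to add and subtract $E(Y|X)$ inside the loss:

```latex
\begin{aligned}
E\left[(Y - g(X))^2\right]
&= E\left[\left(Y - E(Y|X) + E(Y|X) - g(X)\right)^2\right]\\
&= E\left[\left(Y - E(Y|X)\right)^2\right]
 + 2E\left[\left(E(Y|X) - g(X)\right)\left(Y - E(Y|X)\right)\right]
 + E\left[\left(E(Y|X) - g(X)\right)^2\right]\\
&= E\left[\left(Y - E(Y|X)\right)^2\right] + E\left[\left(E(Y|X) - g(X)\right)^2\right]
 \;\ge\; E\left[\left(Y - E(Y|X)\right)^2\right]
\end{aligned}
```

The cross term is zero by the law of iterated expectations: conditional on $X$, the factor $E(Y|X) - g(X)$ is fixed and $E[Y - E(Y|X) \mid X] = 0$. Equality holds only when $g(X) = E(Y|X)$ with probability one.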
The MSPE for Linear Regression Estimated by OLS:
Let the $k \times 1$ vector $X^{oos}$ denote the values of the $X$'s for the out-of-sample ("oos") observation to be predicted. With this notation, the MSPE above can be written in matrix notation as
$$MSPE = \sigma_u^2 + E\left[\left(\left(\hat\beta - \beta\right)'X^{oos}\right)^2\right] \qquad (1)$$
where $\hat\beta$ denotes any estimator of $\beta$, not just the OLS estimator.
Under the least squares assumptions for prediction, the out-of-sample observation is assumed to be an i.i.d. draw from the same population as the estimation sample.