ECONOMETRICS I ECON GR5411
Lecture 15 – Finishing Restricted Least Squares and Generalized Regression Model
by
Seyhan Erden Columbia University
Hypothesis Testing Example 2:

. reg testscr str el_pct comp_stu

      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(3, 416)       =    106.29
       Model |  66004.0238         3  22001.3413   Prob > F        =    0.0000
    Residual |  86105.5698       416  206.984543   R-squared       =    0.4339
-------------+----------------------------------   Adj R-squared   =    0.4298
       Total |  152109.594       419  363.030056   Root MSE        =    14.387

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.8489998   .3932246    -2.16   0.031    -1.621955   -.0760449
      el_pct |  -.6303601    .039997   -15.76   0.000    -.7089814   -.5517387
    comp_stu |   27.26961   11.62113     2.35   0.019     4.426158    50.11307
       _cons |   677.0642   8.303396    81.54   0.000     660.7424    693.3861
------------------------------------------------------------------------------
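Each reported t statistic is just the coefficient divided by its standard error. A quick sketch (in Python, using only the numbers from the table above) confirms the printed values:

```python
# Recompute the t statistics from the Stata output: t = coefficient / std. err.
coef = {"str": -0.8489998, "el_pct": -0.6303601, "comp_stu": 27.26961, "_cons": 677.0642}
se   = {"str":  0.3932246, "el_pct":  0.0399970, "comp_stu": 11.62113, "_cons":  8.303396}

t = {name: coef[name] / se[name] for name in coef}
# Rounded to two decimals these match the t column: -2.16, -15.76, 2.35, 81.54
```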
11/5/20 Lecture 15 GR5411 by Seyhan Erden 2
Finite Sample Properties of Restricted LS:
Under the linear model
$$y = X\beta + \varepsilon$$
with the usual assumptions,
$$E[\varepsilon|X] = 0, \qquad Var(\varepsilon|X) = E[\varepsilon\varepsilon'|X] = \sigma^2 I,$$
$(x_i, y_i)$ are i.i.d. with finite fourth moments, and $E[x_i x_i'] = Q_{XX}$ is positive definite.
First, some useful properties:

Let
$$A = (X'X)^{-1}R\,[R'(X'X)^{-1}R]^{-1}R'(X'X)^{-1}$$
where $\hat\beta$ is the OLS estimator, $\hat\beta_R$ is the restricted LS estimator, $P = X(X'X)^{-1}X'$, there are $k$ coefficients, and $R$ is $k \times j$ so that $R'\beta = r$ imposes $j$ restrictions. Then, under $H_0: R'\beta = r$:

1. $R'\hat\beta - r = R'(X'X)^{-1}X'\varepsilon$
2. $\hat\beta_R - \beta = \big[(X'X)^{-1}X' - AX'\big]\varepsilon$
3. $\hat\varepsilon_R = (I - P + XAX')\,\varepsilon$
4. $I - P + XAX'$ is symmetric and idempotent
5. $tr(I - P + XAX') = n - k + j$
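Properties 4 and 5 are purely algebraic, so they can be verified numerically. The sketch below uses a simulated design matrix and an arbitrary restriction matrix $R$ (both illustrative choices, not from the lecture):

```python
import numpy as np

# Numerical check of properties 4 and 5 for M = I - P + X A X'.
rng = np.random.default_rng(0)
n, k, j = 200, 4, 2                      # n obs, k coefficients, j restrictions
X = rng.normal(size=(n, k))
R = rng.normal(size=(k, j))              # illustrative k x j restriction matrix

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R) @ R.T @ XtX_inv

M = np.eye(n) - X @ XtX_inv @ X.T + X @ A @ X.T   # I - P + X A X'

sym_err   = np.abs(M - M.T).max()                 # property 4: symmetry
idem_err  = np.abs(M @ M - M).max()               # property 4: idempotency
trace_val = np.trace(M)                           # property 5: should be n - k + j
```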
Proof of these useful properties:

1. $R'\hat\beta - r = R'(X'X)^{-1}X'\varepsilon$

$$R'\hat\beta - r = R'(X'X)^{-1}X'y - r$$
$$= R'(X'X)^{-1}X'(X\beta + \varepsilon) - r$$
$$= R'\beta + R'(X'X)^{-1}X'\varepsilon - r = R'(X'X)^{-1}X'\varepsilon$$
since $R'\beta = r$ under the null.
Proof of these useful properties:

2. $\hat\beta_R - \beta = \big[(X'X)^{-1}X' - AX'\big]\varepsilon$

$$\hat\beta_R = \hat\beta - (X'X)^{-1}R\,[R'(X'X)^{-1}R]^{-1}(R'\hat\beta - r)$$
Substituting $\hat\beta = \beta + (X'X)^{-1}X'\varepsilon$ and, from property 1, $R'\hat\beta - r = R'(X'X)^{-1}X'\varepsilon$:
$$\hat\beta_R = \beta + (X'X)^{-1}X'\varepsilon - (X'X)^{-1}R\,[R'(X'X)^{-1}R]^{-1}R'(X'X)^{-1}X'\varepsilon$$
$$= \beta + \big[(X'X)^{-1}X' - AX'\big]\varepsilon$$
Proof of these useful properties:

3. $\hat\varepsilon_R = (I - P + XAX')\,\varepsilon$

$$\hat\varepsilon_R = y - X\hat\beta_R$$
$$= X\beta + \varepsilon - X\beta - X\big[(X'X)^{-1}X' - AX'\big]\varepsilon$$
$$= \varepsilon - X(X'X)^{-1}X'\varepsilon + XAX'\varepsilon$$
$$= (I - P + XAX')\,\varepsilon$$
Proof of these useful properties:

4. $I - P + XAX'$ is symmetric and idempotent.

Symmetry is immediate, since $P$ and $XAX'$ are symmetric. For idempotency, expand
$$(I - P + XAX')(I - P + XAX') = I - P + XAX' - P + P^2 - PXAX' + XAX' - XAX'P + XAX'XAX'$$
Since $P^2 = P$ and
$$PXAX' = X(X'X)^{-1}X'XAX' = XAX' = XAX'P,$$
and the last term is
$$XAX'XAX' = X(X'X)^{-1}R\,[R'(X'X)^{-1}R]^{-1}R'(X'X)^{-1}(X'X)(X'X)^{-1}R\,[R'(X'X)^{-1}R]^{-1}R'(X'X)^{-1}X' = XAX',$$
the extra terms cancel and
$$(I - P + XAX')(I - P + XAX') = I - P + XAX'$$
Proof of these useful properties:

5. $tr(I - P + XAX') = n - k + j$

$$tr(P) = tr\big(X(X'X)^{-1}X'\big) = tr\big((X'X)^{-1}X'X\big) = tr(I_k) = k$$
$$tr(XAX') = tr\big(X(X'X)^{-1}R\,[R'(X'X)^{-1}R]^{-1}R'(X'X)^{-1}X'\big)$$
$$= tr\big([R'(X'X)^{-1}R]^{-1}R'(X'X)^{-1}(X'X)(X'X)^{-1}R\big) = tr\big([R'(X'X)^{-1}R]^{-1}R'(X'X)^{-1}R\big) = tr(I_j) = j$$
Thus,
$$tr(I - P + XAX') = tr(I_n) - tr(P) + tr(XAX') = n - k + j$$
Finite Sample Properties of Restricted LS:

From property 2,
$$\hat\beta_R = \beta + \big[(X'X)^{-1}X' - AX'\big]\varepsilon$$
Hence,
$$E[\hat\beta_R|X] = \beta + \big[(X'X)^{-1}X' - AX'\big]E[\varepsilon|X] = \beta$$
since $E[\varepsilon|X] = 0$.
Finite Sample Properties of Restricted LS:

From property 2, $\hat\beta_R - \beta = \big[(X'X)^{-1}X' - AX'\big]\varepsilon$. Hence,
$$Var(\hat\beta_R|X) = E\big[(\hat\beta_R - \beta)(\hat\beta_R - \beta)'\,\big|\,X\big]$$
$$= E\Big[\big[(X'X)^{-1}X' - AX'\big]\varepsilon\varepsilon'\big[(X'X)^{-1}X' - AX'\big]'\,\Big|\,X\Big]$$
$$= \big[(X'X)^{-1}X' - AX'\big]E[\varepsilon\varepsilon'|X]\big[(X'X)^{-1}X' - AX'\big]'$$
$$= \sigma^2\big[(X'X)^{-1} - (X'X)^{-1}X'XA - AX'X(X'X)^{-1} + AX'XA\big]$$
$$= \sigma^2\big[(X'X)^{-1} - A\big]$$
since $E[\varepsilon\varepsilon'|X] = \sigma^2 I$ and $AX'XA = A$.
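The unbiasedness and variance results can be checked by simulation. The sketch below is a small Monte Carlo under an illustrative DGP (the dimensions, true $\beta$, and restriction are all arbitrary choices; the restriction holds in the DGP, as the derivation assumes):

```python
import numpy as np

# Monte Carlo check: the restricted LS estimator is unbiased and has
# conditional variance sigma^2 [(X'X)^{-1} - A].
rng = np.random.default_rng(1)
n, k, j, sigma = 100, 3, 1, 2.0
X = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5, 0.25])
R = np.array([[1.0], [1.0], [0.0]])      # illustrative restriction: beta_1 + beta_2 = r
r = R.T @ beta                           # the restriction holds in the DGP

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R) @ R.T @ XtX_inv

draws = []
for _ in range(5000):
    y = X @ beta + sigma * rng.normal(size=n)
    b_ols = XtX_inv @ X.T @ y
    b_r = b_ols - XtX_inv @ R @ np.linalg.solve(R.T @ XtX_inv @ R, R.T @ b_ols - r)
    draws.append(b_r)
draws = np.array(draws)

bias = draws.mean(axis=0) - beta          # should be ~ 0
var_mc = np.cov(draws.T)                  # Monte Carlo variance of beta_R
var_theory = sigma**2 * (XtX_inv - A)     # the formula derived above
```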
Finite Sample Properties of Restricted LS:

$$Var(\hat\beta_R|X) = \sigma^2\big[(X'X)^{-1} - A\big]$$
can be estimated by
$$\widehat{Var}(\hat\beta_R|X) = s_R^2\big[(X'X)^{-1} - A\big]$$
where
$$s_R^2 = \frac{1}{n-k+j}\sum_{i=1}^{n}\hat\varepsilon_{R,i}^2 = \frac{\hat\varepsilon_R'\hat\varepsilon_R}{n-k+j}$$
Finite Sample Properties of Restricted LS:

$$s_R^2 = \frac{1}{n-k+j}\sum_{i=1}^{n}\hat\varepsilon_{R,i}^2 = \frac{\hat\varepsilon_R'\hat\varepsilon_R}{n-k+j}$$
Note that, from property 3 above, we have
$$\hat\varepsilon_R = (I - P + XAX')\,\varepsilon$$
Then,
$$\hat\varepsilon_R'\hat\varepsilon_R = \varepsilon'(I - P + XAX')'(I - P + XAX')\,\varepsilon$$
From property 4, the matrix in parentheses is symmetric and idempotent, so
$$\hat\varepsilon_R'\hat\varepsilon_R = \varepsilon'(I - P + XAX')\,\varepsilon$$
Is $s_R^2$ an unbiased estimator of $\sigma^2$?

$$E[s_R^2|X] = \frac{1}{n-k+j}E\big[\hat\varepsilon_R'\hat\varepsilon_R\,\big|\,X\big]$$
$$= \frac{1}{n-k+j}E\big[tr\big(\varepsilon'(I - P + XAX')\,\varepsilon\big)\,\big|\,X\big]$$
$$= \frac{1}{n-k+j}tr\big((I - P + XAX')\,E[\varepsilon\varepsilon'|X]\big)$$
$$= \frac{\sigma^2}{n-k+j}tr(I - P + XAX') = \frac{n-k+j}{n-k+j}\,\sigma^2 = \sigma^2$$
since $\hat\varepsilon_R'\hat\varepsilon_R = \varepsilon'(I - P + XAX')\,\varepsilon$ and $tr(I - P + XAX') = n - k + j$ from property 5.
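Property 3, which drives this result, can be checked on a single simulated sample: when the restriction holds in the DGP, the restricted residuals computed from the data equal $(I - P + XAX')\varepsilon$ exactly. The DGP below is illustrative:

```python
import numpy as np

# One-sample check of property 3 and the construction of s_R^2.
rng = np.random.default_rng(2)
n, k, j = 50, 3, 1
X = rng.normal(size=(n, k))
beta = np.array([1.0, 2.0, 3.0])
R = np.array([[0.0], [1.0], [-1.0]])     # illustrative restriction: beta_2 - beta_3 = r
r = R.T @ beta                           # holds in the DGP
eps = rng.normal(size=n)
y = X @ beta + eps

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R) @ R.T @ XtX_inv
P = X @ XtX_inv @ X.T

b_ols = XtX_inv @ X.T @ y
b_r = b_ols - XtX_inv @ R @ np.linalg.solve(R.T @ XtX_inv @ R, R.T @ b_ols - r)
resid_r = y - X @ b_r                                 # restricted residuals
resid_formula = (np.eye(n) - P + X @ A @ X.T) @ eps   # property 3
s2_r = resid_r @ resid_r / (n - k + j)                # the estimator s_R^2
```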
Distributional Properties:

If, in addition, $\varepsilon|X \sim N(0, \sigma^2 I)$, then by the linearity of property 2, conditional on $X$, $\hat\beta_R - \beta$ is normal. Given the mean and the variance above, we deduce
$$\hat\beta_R \sim N\big(\beta,\; \sigma^2\big[(X'X)^{-1} - A\big]\big)$$
We know that $\hat\varepsilon_R = (I - P + XAX')\,\varepsilon$ is linear in $\varepsilon$, so it is also conditionally normal. Since
$$(I - P + XAX')\big[(X'X)^{-1}X' - AX'\big]' = (I - P + XAX')\big[X(X'X)^{-1} - XA\big] = 0,$$
$\hat\varepsilon_R$ and $\hat\beta_R$ are uncorrelated and thus (under normality) independent. Thus, $s_R^2$ and $\hat\beta_R$ are independent.
Distributional Properties:

Since $\hat\varepsilon_R'\hat\varepsilon_R = \varepsilon'(I - P + XAX')\,\varepsilon$ and $I - P + XAX'$ is idempotent with rank $n - k + j$, it follows that
$$\frac{(n-k+j)\,s_R^2}{\sigma^2} \sim \chi^2_{n-k+j}$$
and therefore, for a single coefficient $\ell$,
$$t = \frac{\hat\beta_{R,\ell} - \beta_\ell}{se(\hat\beta_{R,\ell})} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-k+j}/(n-k+j)}} \sim t_{n-k+j}$$
Since there are $j$ restrictions, there are $k - j$ free parameters instead of $k$: estimating a model with $k$ coefficients and $j$ restrictions is equivalent to estimating one with $k - j$ coefficients.
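The degrees-of-freedom claim can also be seen by simulation: $(n-k+j)\,s_R^2/\sigma^2 = \hat\varepsilon_R'\hat\varepsilon_R/\sigma^2$ should have mean $n-k+j$ and variance $2(n-k+j)$, as a $\chi^2_{n-k+j}$ variable does. The setup below is an illustrative sketch:

```python
import numpy as np

# Monte Carlo: eps_R' eps_R / sigma^2 should behave like chi-squared(n-k+j).
rng = np.random.default_rng(3)
n, k, j, sigma = 40, 4, 2, 1.5
df = n - k + j
X = rng.normal(size=(n, k))
beta = np.zeros(k)
R = np.hstack([np.eye(j), np.zeros((j, k - j))]).T   # restrict the first j coefficients
r = R.T @ beta                                       # holds in the DGP

XtX_inv = np.linalg.inv(X.T @ X)
RXR_inv = np.linalg.inv(R.T @ XtX_inv @ R)

stats = []
for _ in range(20000):
    y = X @ beta + sigma * rng.normal(size=n)
    b = XtX_inv @ X.T @ y
    b_r = b - XtX_inv @ R @ RXR_inv @ (R.T @ b - r)
    e_r = y - X @ b_r
    stats.append(e_r @ e_r / sigma**2)   # = (n - k + j) s_R^2 / sigma^2
stats = np.array(stats)

mean_stat, var_stat = stats.mean(), stats.var()      # ~ df and ~ 2*df
```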
Is $\hat\beta_R$ more efficient?

An interesting relationship under the homoskedastic regression model:
$$Cov\big(\hat\beta_R,\; \hat\beta - \hat\beta_R\big) = E\big[(\hat\beta - \hat\beta_R)(\hat\beta_R - \beta)'\big] = 0$$
You will show this in the problem set; it is easy using property 2 and the fact that $AX'XA = A$.
One corollary is
$$cov(\hat\beta_R, \hat\beta) = var(\hat\beta_R)$$
A second corollary is
$$var(\hat\beta - \hat\beta_R) = var(\hat\beta) + var(\hat\beta_R) - 2\,cov(\hat\beta, \hat\beta_R) = var(\hat\beta) - var(\hat\beta_R)$$
Hausman Equality:

The second corollary is known as the Hausman equality:
$$var(\hat\beta - \hat\beta_R) = var(\hat\beta) - var(\hat\beta_R)$$
It will appear again with the GLS and IV estimators this semester.

The expression says that the variance of the difference between the estimators equals the difference between their variances. It holds (generally) when we compare an efficient estimator with an inefficient one.
Is $\hat\beta_R$ more efficient?

$$var(\hat\beta - \hat\beta_R) = var(\hat\beta) - var(\hat\beta_R)$$
$$= \sigma^2(X'X)^{-1} - \sigma^2\big[(X'X)^{-1} - A\big] = \sigma^2 A$$
Recall that $A$ is a positive semidefinite matrix.
Is $\hat\beta_R$ more efficient?

$$var(\hat\beta) - var(\hat\beta_R) = \sigma^2(X'X)^{-1}R\,[R'(X'X)^{-1}R]^{-1}R'(X'X)^{-1} \geq 0$$
Hence,
$$var(\hat\beta) \geq var(\hat\beta_R)$$
in the positive semidefinite sense. Thus restricted LS is more efficient than the OLS estimator (in the linear homoskedastic model, when the restrictions hold).
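Both facts used here, that $AX'XA = A$ and that $A$ is positive semidefinite (with rank $j$, so the efficiency gain is confined to the restricted directions), can be checked numerically. The design and restriction matrix below are illustrative:

```python
import numpy as np

# Check AX'XA = A and that A is PSD with rank j.
rng = np.random.default_rng(4)
n, k, j, sigma = 60, 5, 2, 1.0
X = rng.normal(size=(n, k))
R = rng.normal(size=(k, j))              # illustrative restriction matrix

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R) @ R.T @ XtX_inv

axa_err = np.abs(A @ (X.T @ X) @ A - A).max()   # should be ~ 0

# Eigenvalues of sigma^2 A: all >= 0, with exactly j of them positive.
eigs = np.linalg.eigvalsh(sigma**2 * A)
```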
Why do we need the Generalized Linear Regression Model?

The assumption of i.i.d. sampling fits many applications. For example, $y$ and $X$ may contain information about individuals, such as wages, education, and personal characteristics. If individuals are selected by simple random sampling, $(x_i, y_i)$ are i.i.d.: $(x_i, y_i)$ and $(x_j, y_j)$ are independently distributed for $i \neq j$, and hence $\varepsilon_i$ and $\varepsilon_j$ are independently distributed for $i \neq j$.

In the context of the Gauss-Markov assumptions, the assumption that $E[\varepsilon\varepsilon'|X]$ is diagonal is therefore appropriate if the data are collected in a way that makes the observations independently distributed.
Some sampling schemes encountered in econometrics do not, however, result in independent observations and instead can lead to error terms that are correlated. The leading example is time series data.
The presence of correlated errors creates two problems for inference based on OLS:
1. Neither the heteroskedasticity-robust nor the homoskedasticity-only standard errors produced by OLS provide a valid basis for inference.
2. If the error term is correlated across observations, then $E[\varepsilon\varepsilon'|X]$ is not diagonal, hence $E[\varepsilon\varepsilon'|X] \neq \sigma^2 I$, and OLS is not BLUE.
In this lecture we will study an estimator, generalized least squares (GLS), that is BLUE (at least asymptotically) when the conditional covariance matrix of the errors is no longer proportional to the identity matrix (the errors are non-spherical, $E[\varepsilon\varepsilon'|X] \neq \sigma^2 I$).

A special case of GLS is weighted least squares (WLS), in which the conditional covariance matrix of the errors is diagonal and the $i$th diagonal element is a function of $x_i$.

Like WLS, GLS transforms the regression model so that the errors of the transformed model satisfy the Gauss-Markov conditions. The GLS estimator is the OLS estimator of the transformed model.
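The transformation idea can be sketched in a few lines: with a known diagonal $\Omega$ (an illustrative heteroskedastic case), premultiplying by $\Omega^{-1/2}$ makes the errors spherical, and OLS on the transformed model reproduces the direct GLS formula $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$:

```python
import numpy as np

# Sketch: GLS equals OLS on the Omega^{-1/2}-transformed model.
rng = np.random.default_rng(5)
n, k = 80, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -1.0, 0.5])
omega_diag = rng.uniform(0.5, 4.0, size=n)    # illustrative Var(eps_i | X)
eps = rng.normal(size=n) * np.sqrt(omega_diag)
y = X @ beta + eps

Omega_inv = np.diag(1.0 / omega_diag)

# Direct GLS formula: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Equivalent: OLS on the transformed model Omega^{-1/2} y = Omega^{-1/2} X b + u
w = 1.0 / np.sqrt(omega_diag)
Xs, ys = X * w[:, None], y * w
b_ols_transformed = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
```

In the diagonal case this is exactly WLS: each observation is weighted by the inverse of its error standard deviation.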
Recall Gauss-Markov Conditions for Multiple Regression:

1. $E[\varepsilon|X] = 0$  (an $n \times 1$ vector of zeros)
2. $E[\varepsilon\varepsilon'|X] = \sigma^2 I$  (an $n \times n$ matrix with $\sigma^2$ on the diagonal)
3. $X$ has full column rank

Recall that under these conditions OLS is BLUE.
Generalized Linear Regression Model:

Recall that when $\varepsilon|X$ is not spherical, the model is
$$y = X\beta + \varepsilon, \qquad E[\varepsilon|X] = 0, \qquad Var(\varepsilon|X) = E[\varepsilon\varepsilon'|X] = \Omega$$
where $\Omega$ is an $n \times n$ positive definite matrix that can depend on $X$. When the errors are spherical we have the special case $\Omega = \sigma^2 I$.

Two leading cases with $\Omega \neq \sigma^2 I$ are heteroskedasticity and autocorrelation.
Generalized Linear Regression Model:

Heteroskedasticity arises when the errors do not have the same variance. This can happen with cross-section as well as time-series data: for example, with volatile high-frequency data such as daily observations of financial markets, or with cross-section data where the scale of the dependent variable depends on the level of a regressor. Under heteroskedasticity the disturbances are still assumed to be uncorrelated across observations, so $\Omega$ would be
$$\Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
Generalized Linear Regression Model:

Autocorrelation is more of a time-series issue. Let's see how $\Omega$ would look under the following autocorrelation structure. Suppose
$$\varepsilon_1 = v_1$$
but thereafter the errors follow an AR(1) model:
$$\varepsilon_t = \rho\varepsilon_{t-1} + v_t$$
where $v_t$ is i.i.d. with mean zero and variance 1. Hence,
$$\varepsilon_t = \rho(\rho\varepsilon_{t-2} + v_{t-1}) + v_t = \rho^2\varepsilon_{t-2} + \rho v_{t-1} + v_t$$
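Iterating this recursion shows that the correlation between $\varepsilon_t$ and $\varepsilon_{t-h}$ decays like $\rho^h$, which is the pattern that fills in the off-diagonals of $\Omega$. A quick simulation (with an illustrative $\rho$) makes the point:

```python
import numpy as np

# Simulate AR(1) errors and check that the lag-h autocorrelation is ~ rho^h.
rng = np.random.default_rng(6)
rho, n = 0.7, 100_000
v = rng.normal(size=n)
eps = np.empty(n)
eps[0] = v[0]
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + v[t]

def sample_autocorr(x, h):
    x = x - x.mean()
    return (x[:-h] @ x[h:]) / (x @ x)

ac1 = sample_autocorr(eps, 1)   # should be close to rho
ac2 = sample_autocorr(eps, 2)   # should be close to rho**2
```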