ECONOMETRICS I ECON GR5411
Lecture 7 – Finite Sample Properties and Asymptotic Properties of OLS
by
Seyhan Erden, Columbia University, MA in Economics
Orthogonal Projection Matrix:
While $P$ creates fitted values, $M$ creates residuals:

$$Py = X(X'X)^{-1}X'y = X\hat{\beta} = \hat{y}$$

$$My = \big(I - X(X'X)^{-1}X'\big)y = y - X\hat{\beta} = y - \hat{y} = \hat{e}$$

Show that $\mathrm{trace}(M) = n - k$.
Projection:

$$\hat{e} = y - X\hat{\beta}, \qquad \hat{\beta} = (X'X)^{-1}X'y$$

$$X'\hat{e} = X'y - X'X\hat{\beta} = X'y - X'X(X'X)^{-1}X'y = 0$$

$$\hat{e} = y - X(X'X)^{-1}X'y = \big(I - X(X'X)^{-1}X'\big)y = (I - P)y = My$$

Recall that both $P$ and $M$ are symmetric and idempotent, i.e. $P = P'$, $M = M'$, $P = PP$ and $M = MM$.

$M$ is fundamental in regression analysis.
Projection:
Note that since
$$\hat{e} = y - X\hat{\beta} = y - X(X'X)^{-1}X'y = My,$$
$M$ is also called the residual-maker matrix: $My = \hat{e}$, where $\hat{e}$ are the residuals from the least squares regression of $y$ on $X$.

Recall
$$MX = 0$$
Show this (intuitively, when $X$ is regressed on $X$, the fit is perfect and the residuals are zero):
$$MX = \big(I - X(X'X)^{-1}X'\big)X = X - X = 0$$
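The following numerical sketch (not part of the original slides; the simulated data and variable names are illustrative) checks these projection facts with numpy: $Py$ gives fitted values, $My$ gives residuals, $MX = 0$, and $\mathrm{trace}(M) = n - k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # n x k regressor matrix
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection matrix
M = np.eye(n) - P                       # residual-maker matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(P @ y, X @ beta_hat))       # P y = fitted values
print(np.allclose(M @ y, y - X @ beta_hat))   # M y = residuals
print(np.allclose(M @ X, 0))                  # M X = 0
print(np.isclose(np.trace(M), n - k))         # trace(M) = n - k
```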
Projection:
Note that since $MX = 0$,
$$\hat{e} = My = M(X\beta + e) = Me.$$

A special case is when $X = i$, an $n \times 1$ vector of ones. Then $P$ creates a vector of sample means:
$$Py = i(i'i)^{-1}i'y = i\bar{y}$$
and $M$ creates demeaned values:
$$My = \big(I - i(i'i)^{-1}i'\big)y = y - i\bar{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \begin{pmatrix} \bar{y} \\ \bar{y} \\ \vdots \\ \bar{y} \end{pmatrix} = \begin{pmatrix} y_1 - \bar{y} \\ y_2 - \bar{y} \\ \vdots \\ y_n - \bar{y} \end{pmatrix}$$
Some textbooks refer to $I - i(i'i)^{-1}i'$ as $M^0$ (the centering matrix, see below).
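A small numerical check of this special case (an illustrative sketch with a made-up data vector, not from the slides):

```python
import numpy as np

y = np.array([1.0, 4.0, 7.0, 2.0])
n = len(y)
i = np.ones(n)

P_i = np.outer(i, i) / n      # i (i'i)^{-1} i' = (1/n) i i'
M0 = np.eye(n) - P_i          # centering matrix M0

print(np.allclose(P_i @ y, i * y.mean()))   # P y is a vector of sample means
print(np.allclose(M0 @ y, y - y.mean()))    # M0 y is the demeaned vector
```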
Projection:
The least squares results partition $y$ into two parts:
(1) the fitted values: $\hat{y} = X\hat{\beta}$
(2) the residuals: $\hat{e} = My$

Because $MX = 0$, these two parts are orthogonal: $\hat{y}$ is orthogonal to $\hat{e}$ (and $X$ is also orthogonal to $\hat{e}$). Recall also that $M$ and $P$ are orthogonal and that $P$ reproduces $X$:
$$MP = PM = 0, \qquad PX = X$$
From the above, least squares partitions the vector $y$ into two orthogonal parts:
$$y = X\hat{\beta} + \hat{e} = \hat{y} + \hat{e}$$
$$y = Py + My = \text{projection} + \text{residual}$$
Projection:

$$y = \hat{y} + \hat{e}$$
$$y = Py + My = \text{projection} + \text{residual}$$

Useful results:

1. Premultiplying by $y'$:
$$y'y = y'P'Py + y'M'My$$
$$y'y = \hat{y}'\hat{y} + \hat{e}'\hat{e}$$
(this is the Pythagorean theorem)

2. The following equivalent expressions for the sum of squared residuals are useful:
$$\hat{e}'\hat{e} = y'M'My = y'My = y'\hat{e} = \hat{e}'y$$
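A quick numerical check of these identities (an illustrative sketch with simulated data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
e_hat = y - y_hat

print(np.isclose(y @ y, y_hat @ y_hat + e_hat @ e_hat))   # y'y = yhat'yhat + ehat'ehat
print(np.isclose(e_hat @ e_hat, y @ e_hat))               # ehat'ehat = y'ehat
```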
Projection Theorems:

There exist $\hat{y}$ and $\hat{e}$ such that
$$y = \hat{y} + \hat{e}, \qquad \hat{y}'\hat{e} = 0$$
- $\hat{y} = X\hat{\beta}$, where $\hat{\beta} = (X'X)^{-1}X'y$
- $X$ is orthogonal to $\hat{e}$, and therefore $\hat{y}$ is orthogonal to $\hat{e}$
- $\hat{e}$ has mean zero when the first column of $X$ is all ones
- the mean of $\hat{y}$ is equal to the mean of $y$
Centering matrix: $M^0$

$$M^0 = I - i(i'i)^{-1}i'$$
where $i$ is an $n \times 1$ vector of ones (the first column of the regressor matrix $X$).

Then the deviations of $x$ from the sample mean are $x^* = M^0 x$.
Show this:
$$x^* = \big(I - i(i'i)^{-1}i'\big)x = x - i(i'i)^{-1}i'x = x - i\,\tfrac{1}{n}\,i'x = x - i\bar{x}$$
Centering matrix: $M^0$

$M^0$ is primarily used for computing sums of squared deviations.

Note that $M^0$ can be written in two equivalent ways:
1. $M^0 = I - i(i'i)^{-1}i'$
2. $M^0 = I - \tfrac{1}{n}ii'$

In the second form, $M^0$ is an $n \times n$ matrix with diagonal elements equal to $1 - \tfrac{1}{n}$ and off-diagonal elements equal to $-\tfrac{1}{n}$.
Centering matrix: $M^0$

Then we can write the sum of squared deviations as follows:
$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = x'M^0x$$
Showing this (working backwards):
$$x'M^0x = x'\Big(I - \tfrac{1}{n}ii'\Big)x = x'x - \tfrac{1}{n}x'ii'x = \sum_{i=1}^{n}x_i^2 - \tfrac{1}{n}\Big(\sum_{i=1}^{n}x_i\Big)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2$$
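A one-line numerical confirmation (an illustrative sketch with made-up numbers, not from the slides):

```python
import numpy as np

x = np.array([2.0, 5.0, 1.0, 8.0, 4.0])
n = len(x)
M0 = np.eye(n) - np.ones((n, n)) / n          # M0 = I - (1/n) i i'

print(np.isclose(x @ M0 @ x, np.sum((x - x.mean()) ** 2)))   # x'M0x equals the sum of squared deviations
```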
Centering matrix: $M^0$

Similarly, for two vectors $x$ and $y$:
$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = (M^0x)'(M^0y) = x'M^0y$$
So,
$$\begin{pmatrix} \sum_i (x_i - \bar{x})^2 & \sum_i (x_i - \bar{x})(y_i - \bar{y}) \\ \sum_i (y_i - \bar{y})(x_i - \bar{x}) & \sum_i (y_i - \bar{y})^2 \end{pmatrix} = \begin{pmatrix} x'M^0x & x'M^0y \\ y'M^0x & y'M^0y \end{pmatrix}$$
Centering matrix: $M^0$

If we put the two column vectors $x$ and $y$ in an $n \times 2$ matrix $Z = \begin{pmatrix} x & y \end{pmatrix}$, then $M^0Z$ is the $n \times 2$ matrix in which the two columns of data are in mean-deviation form.
Then,
$$(M^0Z)'(M^0Z) = Z'M^0Z$$
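A quick check of this matrix identity (an illustrative sketch with hypothetical data, not from the slides):

```python
import numpy as np

x = np.array([2.0, 5.0, 1.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 6.0])
n = len(x)

Z = np.column_stack([x, y])              # n x 2 data matrix
M0 = np.eye(n) - np.ones((n, n)) / n     # centering matrix
Zc = Z - Z.mean(axis=0)                  # columns in mean-deviation form, i.e. M0 Z

print(np.allclose(Z.T @ M0 @ Z, Zc.T @ Zc))   # Z'M0Z = (M0 Z)'(M0 Z)
```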
Finite sample properties of $\hat{\beta}$:

Under A6,
$$\hat{\beta} \mid X \;\sim\; N\big(\beta,\ \sigma^2(X'X)^{-1}\big)$$
- $E[\hat{\beta} \mid X] = E[\hat{\beta}] = \beta$
- $E[s^2 \mid X] = E[s^2] = \sigma^2$
- $\mathrm{Var}(\hat{\beta} \mid X) = \sigma^2(X'X)^{-1}$
- $\mathrm{Var}(\hat{\beta}) = \sigma^2\,E_X\big[(X'X)^{-1}\big]$
- $\hat{\beta}$ is B.L.U.E. under A6
Least Squares Regression: OLS is unbiased
Using matrix notation:
$$y = X\beta + e, \qquad E[y \mid X] = X\beta$$
We already showed that the OLS estimator of $\beta$ is
$$\hat{\beta} = (X'X)^{-1}X'y$$
Then
$$E[\hat{\beta} \mid X] = E\big[(X'X)^{-1}X'y \mid X\big] = (X'X)^{-1}X'E[y \mid X] = (X'X)^{-1}X'X\beta = \beta$$
Least Squares Regression: OLS is unbiased
Another way to show that $\hat{\beta}$ is unbiased: since $\hat{\beta} = (X'X)^{-1}X'y$,
$$E[\hat{\beta} \mid X] = E\big[(X'X)^{-1}X'(X\beta + e) \mid X\big] = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'E[e \mid X] = \beta + 0 = \beta$$
due to A3 (exogeneity of regressors): $E[e \mid X] = 0$.
Least Squares Regression: OLS is unbiased
Another way to show that $\hat{\beta}$ is unbiased is to show that $E[(\hat{\beta} - \beta) \mid X] = 0$. Because
$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + e) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'e = \beta + (X'X)^{-1}X'e,$$
we can show that
$$E[(\hat{\beta} - \beta) \mid X] = E\big[(X'X)^{-1}X'e \mid X\big] = (X'X)^{-1}X'E[e \mid X] = 0$$
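A short Monte Carlo illustration of unbiasedness (a sketch, not part of the slides; the design, coefficient values, and sample size are made up): holding $X$ fixed and redrawing the errors, the average of the OLS estimates is close to the true $\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 5000
beta = np.array([1.0, 2.0, -0.5])                        # true coefficients (illustrative)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

A = np.linalg.inv(X.T @ X) @ X.T                         # (X'X)^{-1} X'
estimates = np.empty((reps, 3))
for r in range(reps):
    e = rng.normal(size=n)                               # errors with E[e|X] = 0
    estimates[r] = A @ (X @ beta + e)                    # OLS estimate for this draw

print(estimates.mean(axis=0))                            # close to (1.0, 2.0, -0.5)
```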
Theorem: Mean of the Least Squares Estimator

In the linear regression model with the usual assumptions A1 through A6,
$$E[\hat{\beta} \mid X] = \beta$$
Also, the unconditional expectation of the OLS estimator is equal to $\beta$:
$$E[\hat{\beta}] = E\big[E[\hat{\beta} \mid X]\big] = E[\beta] = \beta$$
Variance of Least Squares Estimator under spherical errors:

$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$$
$$\hat{\beta} - \beta = (X'X)^{-1}X'\varepsilon$$
Conditional variance:
$$\mathrm{Var}(\hat{\beta} \mid X) = E\big[(\hat{\beta} - \beta)(\hat{\beta} - \beta)' \mid X\big] = E\Big[(X'X)^{-1}X'\varepsilon\,\big((X'X)^{-1}X'\varepsilon\big)' \mid X\Big]$$
Variance of Least Squares Estimator under spherical errors:

$$\mathrm{Var}(\hat{\beta} \mid X) = E\Big[(X'X)^{-1}X'\varepsilon\,\big((X'X)^{-1}X'\varepsilon\big)' \mid X\Big] = E\big[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X\big] = (X'X)^{-1}X'E[\varepsilon\varepsilon' \mid X]\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}$$
This is valid under homoskedasticity; it is not valid under heteroskedasticity (more on this later).
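A numerical sketch (illustrative, not from the slides): under homoskedastic errors with a hypothetical $\sigma$, the Monte Carlo covariance of the OLS estimates is close to $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma = 200, 20000, 1.5
beta = np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])

V_theory = sigma**2 * np.linalg.inv(X.T @ X)             # sigma^2 (X'X)^{-1}

A = np.linalg.inv(X.T @ X) @ X.T
draws = np.array([A @ (X @ beta + sigma * rng.normal(size=n)) for _ in range(reps)])
V_mc = np.cov(draws.T)                                   # Monte Carlo covariance of beta-hat

print(np.round(V_theory, 4))
print(np.round(V_mc, 4))                                 # close to V_theory
```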
Gauss-Markov Theorem (we will prove this later):

The Gauss-Markov conditions for multiple regression are
1. $E[\varepsilon \mid X] = 0$
2. $E[\varepsilon\varepsilon' \mid X] = \sigma^2 I$
3. $X$ has full rank

The Gauss-Markov theorem says that under these conditions, for the linear regression model, $\hat{\beta}$ is the best (minimum-variance) linear unbiased estimator.
$$\mathrm{Var}(\hat{\beta} \mid X) = \sigma^2(X'X)^{-1}$$
$$\mathrm{Var}(\hat{\beta}) = E_X\big[\mathrm{Var}(\hat{\beta} \mid X)\big] + \mathrm{Var}_X\big[E(\hat{\beta} \mid X)\big]$$
Variance of Least Squares Estimator under non-spherical errors:
What happens to the variance under heteroskedasticity?
$$E[\varepsilon\varepsilon' \mid X] = \Omega$$
where
$$\Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix} = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$$
Then, from above,
$$\mathrm{Var}(\hat{\beta} \mid X) = (X'X)^{-1}X'E[\varepsilon\varepsilon' \mid X]\,X(X'X)^{-1} = (X'X)^{-1}(X'\Omega X)(X'X)^{-1}$$
Non-spherical errors:

$\hat{\beta}$ is still unbiased, $E[\hat{\beta} \mid X] = \beta$, provided $E[\varepsilon \mid X] = 0$; however, it is no longer efficient when the errors are not spherical.
$$\mathrm{Var}(\hat{\beta} \mid X) = E\big[(\hat{\beta} - \beta)(\hat{\beta} - \beta)' \mid X\big]$$
$$= E\Big[(X'X)^{-1}X'\varepsilon\,\big((X'X)^{-1}X'\varepsilon\big)' \mid X\Big]$$
$$= E\big[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X\big]$$
$$= (X'X)^{-1}X'E[\varepsilon\varepsilon' \mid X]\,X(X'X)^{-1} = (X'X)^{-1}(X'\Omega X)(X'X)^{-1} \;\neq\; \sigma^2(X'X)^{-1}$$
Under normality (A6),
$$\hat{\beta} \mid X \;\sim\; N\big(\beta,\ (X'X)^{-1}(X'\Omega X)(X'X)^{-1}\big)$$
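A numerical sketch of the sandwich form (illustrative only; the heteroskedasticity pattern is made up): compute $(X'X)^{-1}(X'\Omega X)(X'X)^{-1}$ for a hypothetical $\Omega$ and compare it with the homoskedastic formula.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma_i = 0.5 + np.abs(X[:, 1])              # error std. dev. varies with the regressor (made up)
Omega = np.diag(sigma_i**2)                  # E[eps eps'|X] = diag(sigma_1^2, ..., sigma_n^2)

XtX_inv = np.linalg.inv(X.T @ X)
V_sandwich = XtX_inv @ (X.T @ Omega @ X) @ XtX_inv   # correct conditional variance
V_naive = np.mean(sigma_i**2) * XtX_inv              # sigma^2 (X'X)^{-1}, generally wrong here

print(np.round(V_sandwich, 5))
print(np.round(V_naive, 5))
```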
Estimating error variance:
The error variance $\sigma^2 = E[e_i^2]$ is a moment, so a natural estimator is a moment estimator. If the $e_i$'s were observed, we would estimate $\sigma^2$ by
$$\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2 = \frac{1}{n}e'e$$
However, the $e_i$'s are not observed, hence we replace them with the residuals:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{e}_i^2 = \frac{1}{n}\hat{e}'\hat{e} = \frac{1}{n}e'Me$$
Estimating error variance:
Since
$$\hat{e} = My = M(X\beta + e) = MX\beta + Me = Me,$$
where $M = I - X(X'X)^{-1}X'$ and $MX = 0$, and since $M$ is idempotent, $\hat{e}'\hat{e} = e'Me$.
Then we can show
$$\tilde{\sigma}^2 - \hat{\sigma}^2 = \frac{1}{n}e'e - \frac{1}{n}e'Me = \frac{1}{n}e'Pe \geq 0$$
That is, the feasible estimator is no larger than the idealized estimator (because $P$ is positive semi-definite, the quadratic form $e'Pe$ is nonnegative).
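A sketch comparing the three estimators on one simulated sample (not from the slides; the design and error variance are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
e = rng.normal(scale=2.0, size=n)                 # true errors, sigma^2 = 4 (illustrative)
y = X @ np.array([1.0, 0.5, -1.0]) + e

e_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
sigma2_tilde = e @ e / n                          # infeasible: uses the unobserved errors
sigma2_hat = e_hat @ e_hat / n                    # feasible: uses the residuals
s2 = e_hat @ e_hat / (n - k)                      # degrees-of-freedom-corrected estimator (next slide)

print(sigma2_tilde, sigma2_hat, s2)
print(sigma2_hat <= sigma2_tilde)                 # feasible <= infeasible, as shown above
```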
Estimating $\sigma^2$:

Definition: The LS estimator of $\sigma^2$ is
$$s^2 = \frac{\hat{e}'\hat{e}}{n-k}$$
Theorem: $s^2$ is unbiased.
Proof:
$$\hat{e} = My = M(X\beta + \varepsilon) = MX\beta + M\varepsilon = M\varepsilon$$
where $M = I - X(X'X)^{-1}X'$, hence $MX = 0$.
Then $\hat{e}'\hat{e} = \varepsilon'M\varepsilon$ since $M$ is idempotent.
LS estimator of $\sigma^2$:

$$E(\hat{e}'\hat{e}) = E(\varepsilon'M\varepsilon)$$
Question: How do you take the expected value of a quadratic form?
Answer: By using the trace.
$$E[\varepsilon'M\varepsilon] = E[\mathrm{trace}(\varepsilon'M\varepsilon)] = E[\mathrm{tr}(M\varepsilon\varepsilon')] = \mathrm{tr}\big(M\,E(\varepsilon\varepsilon')\big) = \mathrm{tr}(\sigma^2 M)$$
$$= \sigma^2\,\mathrm{tr}\big(I_n - X(X'X)^{-1}X'\big) = \sigma^2\Big(\mathrm{tr}(I_n) - \mathrm{tr}\big((X'X)^{-1}X'X\big)\Big) = \sigma^2(n - k)$$
using the cyclic property of the trace. Hence $E(s^2) = E(\hat{e}'\hat{e})/(n-k) = \sigma^2$.
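A Monte Carlo sketch of this result (illustrative; the design and $\sigma^2$ are made up): averaging $\varepsilon'M\varepsilon$ over many error draws gives approximately $\sigma^2(n-k)$, so $s^2$ averages to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma2, reps = 50, 4, 2.0, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.isclose(np.trace(M), n - k))                  # trace(M) = n - k

quad_forms = np.empty(reps)
for r in range(reps):
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    quad_forms[r] = eps @ M @ eps                      # eps'M eps = ehat'ehat

print(quad_forms.mean() / (n - k))                     # approximately sigma^2 = 2.0
```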