ECONOMETRICS I ECON GR5411
Lecture 10 – Efficiency of OLS: The Gauss-Markov Theorem
by
Seyhan Erden, Columbia University, MA in Economics
Efficiency of OLS:
Under the Gauss-Markov conditions for multiple regression, the OLS estimator of $\beta$ is efficient among all linear conditionally unbiased estimators; that is, the OLS estimator is the best linear unbiased estimator (BLUE).
The Gauss-Markov conditions for multiple regression are:
1. $E(\varepsilon \mid X) = 0$
2. $E(\varepsilon\varepsilon' \mid X) = \sigma^2 I$
3. $X$ has full column rank.
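To make these conditions concrete, here is a minimal Python sketch that simulates a data set satisfying them by construction: the errors are drawn independently of $X$ (so $E(\varepsilon \mid X) = 0$), homoskedastic and uncorrelated (so $E(\varepsilon\varepsilon' \mid X) = \sigma^2 I$), and the design matrix has full column rank. The sample size, coefficient vector, and error variance are arbitrary illustrative choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                      # sample size and number of regressors (illustrative)
beta = np.array([1.0, -0.5, 2.0])  # true coefficient vector (illustrative)
sigma = 1.5                        # common error standard deviation

X = rng.normal(size=(n, k))                    # regressors, drawn independently of the errors
eps = rng.normal(scale=sigma, size=n)          # E[eps | X] = 0 and E[eps eps' | X] = sigma^2 I by construction
Y = X @ beta + eps

assert np.linalg.matrix_rank(X) == k           # condition 3: X has full column rank

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # OLS: (X'X)^{-1} X'Y
print(beta_hat)
```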
Linear Conditionally Unbiased Estimator
We start by describing the class of linear unbiased estimators and by showing that OLS is in that class.
An estimator of $\beta$ is said to be linear if it is a linear function of $Y_1, \dots, Y_n$. Accordingly, the estimator $\tilde\beta$ is linear in $Y$ if it can be written in the form
$$\tilde\beta = A'Y,$$
where $A$ is an $n \times k$ matrix of weights that may depend on $X$ and on nonrandom constants, but not on $Y$.
An estimator is conditionally unbiased if the mean of its conditional sampling distribution given $X$ is $\beta$. That is, $\tilde\beta$ is conditionally unbiased if $E(\tilde\beta \mid X) = \beta$.
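As an illustration of a linear conditionally unbiased estimator other than OLS, consider a weighted least squares estimator $\tilde\beta = (X'WX)^{-1}X'WY = A'Y$ with $A = WX(X'WX)^{-1}$, where $W$ is any symmetric positive definite weight matrix that depends only on $X$. Since $A'X = I$, this estimator is conditionally unbiased. The sketch below, with an arbitrary choice of $W$, checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
beta = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(n, k))
Y = X @ beta + rng.normal(size=n)

# An arbitrary symmetric positive definite weight matrix that depends only on X
W = np.diag(1.0 + X[:, 0] ** 2)

# Linear estimator beta_tilde = A'Y with A = W X (X'WX)^{-1}
A = W @ X @ np.linalg.inv(X.T @ W @ X)
beta_tilde = A.T @ Y

print(np.allclose(A.T @ X, np.eye(k)))   # A'X = I, so E[beta_tilde | X] = beta
print(beta_tilde)
```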
The OLS estimator, given as
$$\hat\beta = (X'X)^{-1}X'Y,$$
is linear in $Y$; specifically, $\hat\beta = \hat A'Y$, where $\hat A' = (X'X)^{-1}X'$, or equivalently $\hat A = X(X'X)^{-1}$.
Using the first Gauss-Markov condition, $E(\varepsilon \mid X) = 0$, we can show that $\hat\beta$ is conditionally unbiased:
$$\hat\beta = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon,$$
so
$$E(\hat\beta \mid X) = \beta + E\big[(X'X)^{-1}X'\varepsilon \mid X\big] = \beta + (X'X)^{-1}X'E(\varepsilon \mid X) = \beta.$$
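A quick numerical check of both properties, using simulated data with arbitrary parameter values: the OLS weight matrix satisfies $\hat A'X = I$, and averaging $\hat\beta$ over repeated error draws with $X$ held fixed recovers $\beta$, as conditional unbiasedness implies.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
beta = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(n, k))                   # held fixed across replications

A_hat = X @ np.linalg.inv(X.T @ X)            # OLS weights: beta_hat = A_hat'Y
print(np.allclose(A_hat.T @ X, np.eye(k)))    # A_hat'X = I

# Monte Carlo over the error draw, conditional on X
reps = 5000
draws = np.empty((reps, k))
for r in range(reps):
    eps = rng.normal(size=n)
    draws[r] = A_hat.T @ (X @ beta + eps)
print(draws.mean(axis=0))                     # approximately equal to beta
```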
Gauss-Markov Theorem
Suppose the Gauss-Markov conditions for multiple regression, given as 1–3 above, hold. Then the OLS estimator $\hat\beta$ is BLUE. That is, let $\tilde\beta$ be a linear, conditionally unbiased estimator of $\beta$, and let $c$ be a nonrandom $k$-dimensional vector. Then
$$\mathrm{var}(c'\hat\beta \mid X) \le \mathrm{var}(c'\tilde\beta \mid X)$$
for every nonzero vector $c$, where the inequality holds with equality for all $c$ only if $\tilde\beta = \hat\beta$.
Gauss-Markov Theorem
The Gauss-Markov theorem for multiple regression provides conditions under which the OLS estimator is efficient among the class of linear conditionally unbiased estimators.
A subtle point arises, however, because $\hat\beta$ is a vector and its “variance” is a covariance matrix.
When the variance of an estimator is a matrix, just what does it mean to say that one estimator has a smaller variance than the other?
Gauss-Markov Theorem
The Gauss-Markov theorem handles this problem by comparing the variance of a candidate estimator of a linear combination of the elements of $\beta$ to the variance of the corresponding linear combination of $\hat\beta$.
Specifically, let $c$ be a $k \times 1$ vector, and consider the problem of estimating the linear combination $c'\beta$ using the candidate estimator $c'\tilde\beta$ (where $\tilde\beta$ is a linear conditionally unbiased estimator) on the one hand and $c'\hat\beta$ on the other hand.
Gauss-Markov Theorem
Because $c'\tilde\beta$ and $c'\hat\beta$ are both scalars and are both linear conditionally unbiased estimators of $c'\beta$, it now makes sense to compare their variances.
The Gauss-Markov theorem says that $c'\hat\beta$ has the smallest conditional variance. Remarkably, this is true no matter what the linear combination is. It is in this sense that the OLS estimator is BLUE in multiple regression.
Proof of Gauss-Markov Theorem
Let $\tilde\beta$ be a linear conditionally unbiased estimator of $\beta$, so that $\tilde\beta = A'Y$ and $E(\tilde\beta \mid X) = \beta$, where $A$ is an $n \times k$ matrix that can depend on $X$ and on nonrandom constants.
We show that $\mathrm{var}(c'\hat\beta \mid X) \le \mathrm{var}(c'\tilde\beta \mid X)$ for all $k$-dimensional vectors $c$, where the inequality holds with equality only if $\tilde\beta = \hat\beta$.
Because $\tilde\beta$ is linear, it can be written as
$$\tilde\beta = A'Y = A'(X\beta + \varepsilon) = (A'X)\beta + A'\varepsilon.$$
Proof of Gauss-Markov Theorem
$$\tilde\beta = A'Y = A'(X\beta + \varepsilon) = (A'X)\beta + A'\varepsilon.$$
By the first Gauss-Markov condition, $E(\varepsilon \mid X) = 0$, so
$$E(\tilde\beta \mid X) = (A'X)\beta;$$
but because $\tilde\beta$ is conditionally unbiased,
$$E(\tilde\beta \mid X) = \beta = (A'X)\beta,$$
which implies that $A'X = I$. Thus, $\tilde\beta = \beta + A'\varepsilon$.
Proof of Gauss-Markov Theorem
Thus, $\tilde\beta = \beta + A'\varepsilon$, so
$$\mathrm{var}(\tilde\beta \mid X) = \mathrm{var}(A'\varepsilon \mid X) = E(A'\varepsilon\varepsilon'A \mid X) = A'E(\varepsilon\varepsilon' \mid X)A = \sigma^2 A'A,$$
where the third equality follows because $A$ can depend on $X$ but not on $\varepsilon$, and the final equality follows from the second Gauss-Markov condition. That is, if $\tilde\beta$ is linear and unbiased, then under the Gauss-Markov conditions
$$A'X = I \quad \text{and} \quad \mathrm{var}(\tilde\beta \mid X) = \sigma^2 A'A.$$
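The sketch below illustrates these two facts numerically for the OLS weights and for the weighted-least-squares weights introduced earlier (both arbitrary illustrative choices): in each case $A'X = I$, and the conditional variance is $\sigma^2 A'A$, which for OLS reduces to $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 3
sigma2 = 2.0
X = rng.normal(size=(n, k))

A_ols = X @ np.linalg.inv(X.T @ X)                 # OLS weights
W = np.diag(1.0 + X[:, 0] ** 2)                    # arbitrary weights depending only on X
A_alt = W @ X @ np.linalg.inv(X.T @ W @ X)         # alternative linear, conditionally unbiased weights

for A in (A_ols, A_alt):
    assert np.allclose(A.T @ X, np.eye(k))         # A'X = I in both cases

var_ols = sigma2 * np.linalg.inv(X.T @ X)          # sigma^2 (X'X)^{-1}
var_alt = sigma2 * (A_alt.T @ A_alt)               # sigma^2 A'A
print(np.linalg.eigvalsh(var_alt - var_ols))       # all >= 0 (up to rounding): OLS variance is smaller
```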
Proof of Gauss-Markov Theorem
This result also applies to $\hat\beta$, with $A = \hat A = X(X'X)^{-1}$, where $(X'X)^{-1}$ exists by the third Gauss-Markov condition.
Now let $A = \hat A + D$, so that $D$ is the difference between $A$ and $\hat A$.
Note that
$$\hat A'A = (X'X)^{-1}X'A = (X'X)^{-1}(A'X)' = (X'X)^{-1},$$
because $A'X = I$, and
$$\hat A'\hat A = (X'X)^{-1}X'X(X'X)^{-1} = (X'X)^{-1}.$$
Proof of Gauss-Markov Theorem
So,
$$\hat A'D = \hat A'(A - \hat A) = \hat A'A - \hat A'\hat A = 0.$$
Substituting $A = \hat A + D$ into the formula for the conditional variance yields
$$\begin{aligned}
\mathrm{var}(\tilde\beta \mid X) &= \sigma^2 A'A = \sigma^2(\hat A + D)'(\hat A + D) \\
&= \sigma^2\big(\hat A'\hat A + \hat A'D + D'\hat A + D'D\big) \\
&= \sigma^2(X'X)^{-1} + \sigma^2 D'D.
\end{aligned}$$
Proof of Gauss-Markov Theorem
Since $\mathrm{var}(\hat\beta \mid X) = \sigma^2(X'X)^{-1}$ and we found that
$$\mathrm{var}(\tilde\beta \mid X) = \sigma^2(X'X)^{-1} + \sigma^2 D'D,$$
this implies
$$\mathrm{var}(\tilde\beta \mid X) - \mathrm{var}(\hat\beta \mid X) = \sigma^2 D'D.$$
The difference between the variances of the two estimators of the linear combination $c'\beta$ is therefore
$$\mathrm{var}(c'\tilde\beta \mid X) - \mathrm{var}(c'\hat\beta \mid X) = \sigma^2 c'D'Dc \ge 0.$$
Proof of Gauss-Markov Theorem
The inequality holds for all linear combinations $c'\beta$, and the inequality holds with equality for all nonzero $c$ only if $D = 0$; that is, if $A = \hat A$ or, equivalently, $\tilde\beta = \hat\beta$. Thus $c'\hat\beta$ has the smallest variance of all linear conditionally unbiased estimators of $c'\beta$; that is, the OLS estimator is BLUE.
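The following sketch assembles the proof numerically for one particular alternative estimator (a weighted least squares estimator with an arbitrary weight matrix): it forms $D = A - \hat A$, checks $\hat A'D = 0$, and verifies that $\mathrm{var}(c'\tilde\beta \mid X) - \mathrm{var}(c'\hat\beta \mid X) = \sigma^2 c'D'Dc \ge 0$ for randomly drawn vectors $c$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 3
sigma2 = 2.0
X = rng.normal(size=(n, k))

A_hat = X @ np.linalg.inv(X.T @ X)                 # OLS weights
W = np.diag(1.0 + X[:, 0] ** 2)                    # arbitrary weight matrix
A = W @ X @ np.linalg.inv(X.T @ W @ X)             # alternative linear, conditionally unbiased weights
D = A - A_hat

print(np.allclose(A_hat.T @ D, 0.0))               # A_hat'D = 0

var_hat = sigma2 * np.linalg.inv(X.T @ X)          # var(beta_hat | X)
var_tilde = sigma2 * (A.T @ A)                     # var(beta_tilde | X) = sigma^2 A'A
for _ in range(5):
    c = rng.normal(size=k)
    diff = c @ (var_tilde - var_hat) @ c           # var(c'beta_tilde|X) - var(c'beta_hat|X)
    print(diff >= -1e-10, np.isclose(diff, sigma2 * (D @ c) @ (D @ c)))   # nonnegative, equals sigma^2 c'D'Dc
```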
Irrelevant Variables in Regression
True model:
$$y = X\beta + \varepsilon \tag{1}$$
Fitting a wrong model:
$$y = X\beta + Z\gamma + u \tag{2}$$
Assume $E(\varepsilon \mid X, Z) = 0$ and that $Z$ does not belong in (1).
From (2) we get (recall the FWL theorem)
$$\hat\beta = (X'M_Z X)^{-1}X'M_Z y,$$
where $M_Z = I - Z(Z'Z)^{-1}Z'$ is the annihilator matrix for $Z$. But the true $y$ is given in (1), so
$$\hat\beta = (X'M_Z X)^{-1}X'M_Z(X\beta + \varepsilon) = \beta + (X'M_Z X)^{-1}X'M_Z\varepsilon.$$
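As a numerical sanity check of the FWL step (with simulated data and arbitrary parameter values), the partialled-out formula $(X'M_ZX)^{-1}X'M_Zy$ reproduces the coefficients on $X$ from the long regression of $y$ on $[X, Z]$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, m = 200, 3, 2
beta = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(n, k))
Z = rng.normal(size=(n, m))                        # irrelevant regressors: true gamma = 0
y = X @ beta + rng.normal(size=n)

# Long regression of y on [X, Z]: keep the coefficients on X
coef_long = np.linalg.lstsq(np.hstack([X, Z]), y, rcond=None)[0][:k]

# FWL route: annihilate Z, then regress
M_Z = np.eye(n) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
coef_fwl = np.linalg.solve(X.T @ M_Z @ X, X.T @ M_Z @ y)

print(np.allclose(coef_long, coef_fwl))            # True: the two routes agree
```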
$$E(\hat\beta) = \beta + E\big[(X'M_Z X)^{-1}X'M_Z\varepsilon\big].$$
Now apply the law of iterated expectations to the second term, keeping in mind that $E(\varepsilon \mid X, Z) = 0$:
$$E\big[(X'M_Z X)^{-1}X'M_Z\varepsilon\big] = E\Big[E\big((X'M_Z X)^{-1}X'M_Z\varepsilon \mid X, Z\big)\Big] = E\big[(X'M_Z X)^{-1}X'M_Z E(\varepsilon \mid X, Z)\big] = 0.$$
Hence,
$$E(\hat\beta) = \beta.$$
It is also easy to show that $\hat\beta \xrightarrow{p} \beta$, i.e., $\hat\beta$ is consistent.
$$\begin{aligned}
\mathrm{var}(\hat\beta \mid X, Z) &= \mathrm{var}\big((X'M_Z X)^{-1}X'M_Z\varepsilon \mid X, Z\big) \\
&= E\Big[(X'M_Z X)^{-1}X'M_Z\varepsilon\,\big((X'M_Z X)^{-1}X'M_Z\varepsilon\big)' \,\Big|\, X, Z\Big] \\
&= (X'M_Z X)^{-1}X'M_Z\,E(\varepsilon\varepsilon' \mid X, Z)\,M_Z X(X'M_Z X)^{-1} \\
&= (X'M_Z X)^{-1}X'M_Z\,\sigma^2 I\,M_Z X(X'M_Z X)^{-1} \\
&= \sigma^2(X'M_Z X)^{-1}.
\end{aligned}$$
This $\hat\beta$ is not efficient: because $M_Z$ is a projection matrix (so $I - M_Z$ is positive semidefinite), $X'M_Z X \le X'X$, and hence $\sigma^2(X'M_Z X)^{-1} \ge \sigma^2(X'X)^{-1}$ in the positive semidefinite sense.
The covariance matrix of the estimator that excludes $Z$ is therefore never larger than the covariance matrix of the estimator that includes it. Intuition: including irrelevant regressors does not bias OLS, but it inflates the sampling variance of the OLS estimator.
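The sketch below illustrates this with simulated data in which $Z$ is irrelevant ($\gamma = 0$) but correlated with $X$ in sample: the long-regression variance $\sigma^2(X'M_ZX)^{-1}$ exceeds the short-regression variance $\sigma^2(X'X)^{-1}$ by a positive semidefinite matrix. The data-generating choices are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, m = 200, 3, 2
sigma2 = 2.0
X = rng.normal(size=(n, k))
Z = 0.5 * X[:, [0]] + rng.normal(size=(n, m))      # irrelevant (gamma = 0) but correlated with X

M_Z = np.eye(n) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
var_short = sigma2 * np.linalg.inv(X.T @ X)        # sigma^2 (X'X)^{-1}: Z excluded
var_long = sigma2 * np.linalg.inv(X.T @ M_Z @ X)   # sigma^2 (X'M_Z X)^{-1}: Z included

# The difference is positive semidefinite: including the irrelevant Z
# never reduces, and here strictly increases, the variance of beta_hat.
print(np.linalg.eigvalsh(var_long - var_short))    # all eigenvalues >= 0
```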