Alastair Hall ECON61001: Autumn 2020 Econometric Methods
Solutions to Problem Set for Tutorial 2
1.(a) Note that rank(X) = k implies rank(X1) = k1 (why?), and so (X1′ X1)−1 exists. By definition, γˆT = (X1′X1)−1X1′y.
Substituting for y, it follows that
γˆT = (X1′X1)−1X1′(Xβ0 + u)
= (X1′ X1)−1X1′ (X1β0,1 + X2β0,2 + u)
= β0,1 + (X1′ X1)−1X1′ X2β0,2 + (X1′ X1)−1X1′ u
Using CA2, it follows that
E[γˆT] = β0,1 + (X1′X1)−1X1′X2β0,2 + (X1′X1)−1X1′E[u],
and so from CA4, it follows that
E[γˆT] = β0,1 + (X1′X1)−1X1′X2β0,2.
Thus γˆT is an unbiased estimator of β0,1 if either β0,2 = 0, so that X2 does not belong in the model, or X1′X2 = 0, that is, if the included regressors (X1) and the excluded regressors (X2) are orthogonal. The latter condition implies that the part of y that can be linearly predicted by X1 is linearly unrelated to the part of y that can be linearly predicted using X2.
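As an informal numerical illustration of this result (not part of the required solution), the sketch below holds X fixed and averages the short-regression estimator over repeated draws of u. The dimensions, parameter values and variable names are arbitrary choices for the example.

```python
import numpy as np

# Minimal Monte Carlo check of E[gamma_hat] = beta01 + (X1'X1)^{-1} X1'X2 beta02.
# All dimensions and parameter values below are illustrative assumptions.
rng = np.random.default_rng(0)
T, k1, k2 = 200, 2, 1
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])    # included regressors
X2 = 0.5 * X1[:, [1]] + rng.normal(size=(T, k2))           # excluded regressor, correlated with X1
beta01 = np.array([1.0, 2.0])
beta02 = np.array([0.7])

# Theoretical expectation of the short-regression OLS estimator (X fixed in repeated samples)
A = np.linalg.solve(X1.T @ X1, X1.T @ X2)                  # (X1'X1)^{-1} X1'X2
theory = beta01 + (A @ beta02)

# Average the estimator over many draws of u, holding X1 and X2 fixed
reps, sigma0 = 5000, 1.0
gamma_hats = np.zeros((reps, k1))
for r in range(reps):
    u = sigma0 * rng.normal(size=T)
    y = X1 @ beta01 + X2 @ beta02 + u
    gamma_hats[r] = np.linalg.solve(X1.T @ X1, X1.T @ y)   # OLS of y on X1 only

print("theory:   ", theory)
print("simulated:", gamma_hats.mean(axis=0))
```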
1.(b) If k2 = 1 then we have
(X1′X1)−1X1′x2 = δˆT,
where δˆT is the OLS estimator of the regression coefficient from the regression of x2 on X1. Therefore it follows from part (a) that E[γˆT] = β0,1 + δˆT β0,2.
1.(c) Let β0,1 = [α1, α2]′. Given this model, the true value of the returns to education is α2, and the estimated returns to education is given by the second element of γˆT, γˆT,2 say. From part (b) we have
E[γˆT,2] = α2 + δˆT,2 β0,2,     (1)
where δˆT,2 is the slope coefficient from the regression of ability on education. Therefore, the
bias of the estimator of the returns to education is
bias[γˆT,2] = E[γˆT,2] − α2 = δˆT,2β0,2.
It would be anticipated that δˆT,2 > 0 and β0,2 > 0, and so bias[γˆT,2] > 0. Therefore in this simple setting, we expect the omission of innate ability from the wage equation to cause the
estimator of the returns to education to be upward biased. (In this question, we maintained the assumption that X is fixed in repeated samples, so that δˆT is a constant. This is, of course, unrealistic in this example. In Lecture 4, we extend our analysis to stochastic regressors, in which case we could repeat the analysis here, subject to certain conditions, with δˆT replaced by its expectation.)
2. Recall that for an l × l matrix C, tr(C) = Σ_{i=1}^{l} Ci,i, where Ci,j is the (i, j)th element of C. Now consider C = AB and D = BA, where A and B are m × n and n × m respectively, so that C is m × m and D is n × n. By definition,
Ci,i = Σ_{j=1}^{n} Ai,jBj,i,
and so
tr(C) = Σ_{i=1}^{m} Σ_{j=1}^{n} Ai,jBj,i.
Similarly,
Dj,j = Σ_{i=1}^{m} Bj,iAi,j,
and so
tr(D) = Σ_{j=1}^{n} Σ_{i=1}^{m} Bj,iAi,j.
Since we can reverse the order both of the summations and of the scalars, tr(D) = Σ_{i=1}^{m} Σ_{j=1}^{n} Ai,jBj,i = tr(C).
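A quick numerical check of this identity, with arbitrary illustrative dimensions for A and B:

```python
import numpy as np

# Numerical check that tr(AB) = tr(BA) for conformable, non-square A and B.
rng = np.random.default_rng(1)
m, n = 3, 5                      # illustrative dimensions
A = rng.normal(size=(m, n))
B = rng.normal(size=(n, m))

print(np.trace(A @ B))           # tr(C), C = AB is m x m
print(np.trace(B @ A))           # tr(D), D = BA is n x n; equal up to rounding
```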
3. Using βˆT − β0 = (X′X)−1X′u, we have
epT+1 = uT+1 − x′T+1(X′X)−1X′u,
where u is the T × 1 vector with tth element ut. To establish the desired result, it is easiest to define the (T+1)×1 vector v whose tth element is ut, and the 1×(T+1) vector n′ given by
n′ = [−x′T+1(X′X)−1X′, 1].
Note that epT+1 = n′v. We are given that v ∼ N(0, σ02IT+1), so the prediction error is a linear combination of normal random variables, and hence from Lemma 2.1 in the Lecture Notes it follows that
epT+1 ∼ N(0, σ02n′n).
Multiplying out, we obtain
n′n = 1 + x′T+1(X′X)−1xT+1,
which gives the desired result.
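As an informal check, the sketch below simulates the prediction error and compares its sample variance with σ02(1 + x′T+1(X′X)−1xT+1); the dimensions, β0 and σ0 are illustrative assumptions.

```python
import numpy as np

# Simulate e_{T+1} = u_{T+1} - x_{T+1}'(X'X)^{-1}X'u and compare its variance with
# sigma0^2 * (1 + x_{T+1}'(X'X)^{-1}x_{T+1}). All values below are illustrative.
rng = np.random.default_rng(2)
T, k, sigma0 = 50, 3, 1.5
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
x_next = np.concatenate([[1.0], rng.normal(size=k - 1)])
beta0 = np.array([1.0, -0.5, 2.0])

XtX_inv = np.linalg.inv(X.T @ X)
theory_var = sigma0**2 * (1.0 + x_next @ XtX_inv @ x_next)

reps = 20000
errors = np.zeros(reps)
for r in range(reps):
    u = sigma0 * rng.normal(size=T)
    u_next = sigma0 * rng.normal()
    beta_hat = XtX_inv @ (X.T @ (X @ beta0 + u))           # OLS estimate
    errors[r] = (x_next @ beta0 + u_next) - x_next @ beta_hat

print("theory:   ", theory_var)
print("simulated:", errors.var())
```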
4.(a) Notice that equation (2) on the problem set represents the true model – that is, it reduces to equation (1) – if we put β1 = β1,0 and β2 = 0k2×1 (a k2 × 1 vector of zeros). Therefore, equation (2) can be viewed as a correctly specified model (with the aforementioned values for the parameters), and so from lectures we have Var[β̃] = σ02(X′X)−1. Using the partitions of β and X we have
Var[β̃] = [ Var[β̃1]       Cov[β̃1, β̃2]
            Cov[β̃2, β̃1]   Var[β̃2]    ],
and so Var[β̃1] = σ02V1, where V1 is the k1 × k1 matrix defined by
(X′X)−1 = [ V1   C
            C′   V2 ].
Using the partition X = (X1, X2), we have
(X′X)−1 = [ X1′X1   X1′X2
            X2′X1   X2′X2 ]−1,
and so, using the partitioned matrix inversion formula (Lemma 2.3 in the Lecture Notes), it follows that V1 = (X1′M2X1)−1, where M2 = IT − X2(X2′X2)−1X2′, which gives the desired result.
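As a sanity check, the sketch below (arbitrary illustrative dimensions) verifies numerically that the top-left k1 × k1 block of (X′X)−1 equals (X1′M2X1)−1:

```python
import numpy as np

# Check that the top-left k1 x k1 block of (X'X)^{-1} equals (X1' M2 X1)^{-1},
# where M2 = I - X2 (X2'X2)^{-1} X2'. Dimensions are illustrative.
rng = np.random.default_rng(3)
T, k1, k2 = 40, 2, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2))
X = np.hstack([X1, X2])

V1 = np.linalg.inv(X.T @ X)[:k1, :k1]                      # top-left block of (X'X)^{-1}
M2 = np.eye(T) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T      # residual-maker for X2
V1_alt = np.linalg.inv(X1.T @ M2 @ X1)                     # (X1' M2 X1)^{-1}

print(np.allclose(V1, V1_alt))                             # True
```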
4.(b) By similar arguments to our analysis of OLS in lectures, Var[βˆ1] = σ02(X1′X1)−1. Since σ02 > 0, Var[β̃1] − Var[βˆ1] is psd iff (X1′M2X1)−1 − (X1′X1)−1 is psd. Using the hint, with A = (X1′M2X1)−1 and B = (X1′X1)−1, it suffices to consider B−1 − A−1. We have
B−1 − A−1 = X1′X1 − X1′M2X1
          = X1′X1 − X1′(IT − X2(X2′X2)−1X2′)X1
          = X1′X1 − X1′X1 + X1′X2(X2′X2)−1X2′X1
          = X1′X2(X2′X2)−1X2′X1 = C, say.
Notice that C = D′D where D = X2(X2′X2)−1X2′X1, and so C is psd by construction (see Tutorial 1 Question 3). Therefore, Var[β̃1] − Var[βˆ1] is psd.
Note: this result implies that the inclusion of irrelevant regressors does not improve the efficiency of OLS estimators. Combining the results in Questions 1 and 4 we have the following:
• the inclusion of irrelevant regressors in general leads to less efficient OLS estimators;
• the exclusion of relevant regressors in general leads to biased OLS estimators.
The exception to these “in general” rules is where X1′X2 = 0 – when X1 and X2 are linearly unrelated – because then Var[β̃1] = Var[βˆ1] and E[γˆT] = β0,1.
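As a final numerical illustration (the dimensions and the normalisation σ02 = 1 are arbitrary assumptions), the sketch below checks that (X1′M2X1)−1 − (X1′X1)−1 is psd in general and vanishes when X1′X2 = 0:

```python
import numpy as np

# Compare Var[beta1_tilde] = sigma0^2 (X1' M2 X1)^{-1} with Var[beta1_hat] = sigma0^2 (X1'X1)^{-1}.
# The difference should be positive semi-definite, and zero when X1'X2 = 0 (sigma0 normalised to 1).
rng = np.random.default_rng(4)
T, k1, k2 = 60, 2, 2
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2))

def variance_gap(X1, X2):
    """(X1' M2 X1)^{-1} - (X1' X1)^{-1}: the efficiency loss from including X2 (with sigma0^2 = 1)."""
    T = X1.shape[0]
    M2 = np.eye(T) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
    return np.linalg.inv(X1.T @ M2 @ X1) - np.linalg.inv(X1.T @ X1)

print(np.linalg.eigvalsh(variance_gap(X1, X2)))    # all eigenvalues >= 0 (up to rounding)

# Orthogonal case: project X2 off X1 so that X1'X2 = 0; the gap is then (numerically) zero.
X2_orth = X2 - X1 @ np.linalg.inv(X1.T @ X1) @ (X1.T @ X2)
print(np.allclose(variance_gap(X1, X2_orth), 0.0, atol=1e-10))
```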