Columbia University GR 5411 Econometrics I
MA in Economics
Seyhan Erden
SOLUTIONS TO Problem Set 4
due on Nov. 16th at 10am through Gradescope
1. Show that
$$\mathrm{Cov}\big(\tilde{\beta},\,\hat{\beta}-\tilde{\beta}\big) = E\big[(\hat{\beta}-\tilde{\beta})(\tilde{\beta}-\beta)'\big] = 0,$$
where $\hat{\beta}$ is the OLS estimator of $\beta$ in the unrestricted regression model and $\tilde{\beta}$ is the restricted OLS estimator we covered in class.
Solution: See Hansen, Econometrics: Jan. 2019 draft, p. 265; Aug. 2019 draft, p. 257; Oct. 2020 draft, p. 203.
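For completeness, here is a sketch of the argument; this sketch assumes homoskedastic errors, $E(ee'|X) = \sigma^2 I_n$, and a linear restriction $R'\beta = c$, with the shorthand $A \equiv R'(X'X)^{-1}R$ introduced only for brevity (see Hansen for the general statement). The restricted estimator is
$$\tilde{\beta} = \hat{\beta} - (X'X)^{-1}R\,A^{-1}(R'\hat{\beta} - c).$$
Under the restriction, $R'\hat{\beta} - c = R'(\hat{\beta} - \beta)$, so
$$\hat{\beta} - \tilde{\beta} = (X'X)^{-1}R\,A^{-1}R'(\hat{\beta} - \beta), \qquad \tilde{\beta} - \beta = \big[I - (X'X)^{-1}R\,A^{-1}R'\big](\hat{\beta} - \beta).$$
Using $E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'|X] = \sigma^2(X'X)^{-1}$,
$$E\big[(\hat{\beta} - \tilde{\beta})(\tilde{\beta} - \beta)'\,\big|\,X\big] = \sigma^2(X'X)^{-1}R\,A^{-1}R'(X'X)^{-1}\big[I - R\,A^{-1}R'(X'X)^{-1}\big]$$
$$= \sigma^2(X'X)^{-1}R\,A^{-1}R'(X'X)^{-1} - \sigma^2(X'X)^{-1}R\,A^{-1}\big[R'(X'X)^{-1}R\big]A^{-1}R'(X'X)^{-1} = 0,$$
since the bracketed term equals $A$.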
2. Consider the regression model
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
where $u_1 = \tilde{u}_1$ and $u_i = 0.5u_{i-1} + \tilde{u}_i$ for $i = 2, 3, \ldots, n$. Suppose that $\tilde{u}_i$ is i.i.d. with mean 0 and variance 1 and is distributed independently of $X_j$ for all $i$ and $j$.
(a) Derive an expression for $E(UU') = \Omega$.
(b) Explain how to estimate the model by GLS without explicitly inverting the matrix $\Omega$. (Hint: Transform the model so that the regression errors are $\tilde{u}_1, \tilde{u}_2, \ldots, \tilde{u}_n$.)
Solution:
(a) The regression errors satisfy $u_1 = \tilde{u}_1$ and $u_i = 0.5u_{i-1} + \tilde{u}_i$ for $i = 2, 3, \ldots, n$, with the random variables $\tilde{u}_i$ being i.i.d. with mean 0 and variance 1. For $i > 1$, repeatedly substituting $u_{i-j} = 0.5u_{i-j-1} + \tilde{u}_{i-j}$ ($j = 1, 2, \ldots$) and $u_1 = \tilde{u}_1$ into the expression $u_i = 0.5u_{i-1} + \tilde{u}_i$ yields
$$u_i = 0.5u_{i-1} + \tilde{u}_i = 0.5(0.5u_{i-2} + \tilde{u}_{i-1}) + \tilde{u}_i = 0.5^2u_{i-2} + 0.5\tilde{u}_{i-1} + \tilde{u}_i = \cdots$$
$$= 0.5^{i-1}\tilde{u}_1 + 0.5^{i-2}\tilde{u}_2 + 0.5^{i-3}\tilde{u}_3 + \cdots + 0.5^2\tilde{u}_{i-2} + 0.5\tilde{u}_{i-1} + \tilde{u}_i = \sum_{j=1}^{i} 0.5^{i-j}\tilde{u}_j.$$
Although we derived the expression $u_i = \sum_{j=1}^{i} 0.5^{i-j}\tilde{u}_j$ for $i > 1$, it clearly also holds for $i = 1$. Thus the mean and variance of the random variables $u_i$ ($i = 1, 2, \ldots, n$) are
$$E(u_i) = \sum_{j=1}^{i} 0.5^{i-j}E(\tilde{u}_j) = 0,$$
$$\sigma_i^2 = \mathrm{var}(u_i) = \sum_{j=1}^{i}\big(0.5^{i-j}\big)^2\,\mathrm{var}(\tilde{u}_j) = \sum_{j=1}^{i}\big(0.5^2\big)^{i-j}\times 1 = \frac{1-(0.5^2)^i}{1-0.5^2}.$$
In calculating the variance, the second equality uses the fact that $\tilde{u}_i$ is i.i.d.
Similarly, since $u_i = \sum_{j=1}^{i} 0.5^{i-j}\tilde{u}_j$, we know that for $k > 0$
$$u_{i+k} = \sum_{j=1}^{i+k} 0.5^{i+k-j}\tilde{u}_j = 0.5^k\sum_{j=1}^{i} 0.5^{i-j}\tilde{u}_j + \sum_{j=i+1}^{i+k} 0.5^{i+k-j}\tilde{u}_j = 0.5^ku_i + \sum_{j=i+1}^{i+k} 0.5^{i+k-j}\tilde{u}_j.$$
Because $\tilde{u}_i$ is i.i.d., the covariance between the random variables $u_i$ and $u_{i+k}$ is
$$\mathrm{cov}(u_i, u_{i+k}) = \mathrm{cov}\Big(u_i,\; 0.5^ku_i + \sum_{j=i+1}^{i+k} 0.5^{i+k-j}\tilde{u}_j\Big) = 0.5^k\,\mathrm{var}(u_i) = 0.5^k\sigma_i^2.$$
The column vector of regression errors is $U = (u_1, u_2, \ldots, u_n)'$, and it is straightforward to show that
$$E(UU') = \begin{pmatrix} E(u_1^2) & E(u_1u_2) & \cdots & E(u_1u_n) \\ E(u_2u_1) & E(u_2^2) & \cdots & E(u_2u_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(u_nu_1) & E(u_nu_2) & \cdots & E(u_n^2) \end{pmatrix}.$$
Because $E(u_i) = 0$, we have $E(u_i^2) = \mathrm{var}(u_i)$ and $E(u_iu_j) = \mathrm{cov}(u_i, u_j)$. Substituting in the results on variances and covariances, we have
$$\Omega = E(UU') = \begin{pmatrix} \sigma_1^2 & 0.5\sigma_1^2 & 0.5^2\sigma_1^2 & 0.5^3\sigma_1^2 & \cdots & 0.5^{n-1}\sigma_1^2 \\ 0.5\sigma_1^2 & \sigma_2^2 & 0.5\sigma_2^2 & 0.5^2\sigma_2^2 & \cdots & 0.5^{n-2}\sigma_2^2 \\ 0.5^2\sigma_1^2 & 0.5\sigma_2^2 & \sigma_3^2 & 0.5\sigma_3^2 & \cdots & 0.5^{n-3}\sigma_3^2 \\ 0.5^3\sigma_1^2 & 0.5^2\sigma_2^2 & 0.5\sigma_3^2 & \sigma_4^2 & \cdots & 0.5^{n-4}\sigma_4^2 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0.5^{n-1}\sigma_1^2 & 0.5^{n-2}\sigma_2^2 & 0.5^{n-3}\sigma_3^2 & 0.5^{n-4}\sigma_4^2 & \cdots & \sigma_n^2 \end{pmatrix}, \qquad \sigma_i^2 = \frac{1-(0.5^2)^i}{1-0.5^2}.$$
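As a quick numerical check, $\Omega$ can be built for a small $n$ directly from these formulas. The following Mata sketch is illustrative only (it is not part of the assignment; all names are made up):

mata:
n = 5
Om = J(n, n, 0)
for (i = 1; i <= n; i++) {
    s2 = (1 - 0.25^i) / (1 - 0.25)      // sigma_i^2, using 0.5^2 = 0.25
    for (j = i; j <= n; j++) {
        Om[i, j] = 0.5^(j - i) * s2     // cov(u_i, u_j) = 0.5^(j-i) * sigma_i^2
        Om[j, i] = Om[i, j]             // Omega is symmetric
    }
}
Om
end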
(b) The original regression model is
$$Y_i = \beta_0 + \beta_1 X_i + u_i.$$
Lagging each side of the regression equation and subtracting 0.5 times this lag from each side gives
$$Y_i - 0.5Y_{i-1} = 0.5\beta_0 + \beta_1(X_i - 0.5X_{i-1}) + u_i - 0.5u_{i-1}$$
for $i = 2, \ldots, n$, with $u_i - 0.5u_{i-1} = \tilde{u}_i$. Also
$$Y_1 = \beta_0 + \beta_1 X_1 + u_1$$
with $u_1 = \tilde{u}_1$. Thus we can define the new variables
$$(\tilde{Y}_i, \tilde{X}_{1i}, \tilde{X}_{2i}) = (Y_i - 0.5Y_{i-1},\; 0.5,\; X_i - 0.5X_{i-1})$$
for $i = 2, \ldots, n$, together with $(\tilde{Y}_1, \tilde{X}_{11}, \tilde{X}_{21}) = (Y_1, 1, X_1)$, and estimate the regression equation
$$\tilde{Y}_i = \beta_0\tilde{X}_{1i} + \beta_1\tilde{X}_{2i} + \tilde{u}_i$$
using the data for $i = 1, \ldots, n$. The regression error $\tilde{u}_i$ is i.i.d. and distributed independently of the regressors, so the transformed model can be estimated directly by OLS; this is the GLS estimator, obtained without inverting $\Omega$.
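The transformation is easy to verify by simulation. A minimal Stata sketch, with illustrative variable names and parameter values (not part of the assignment):

clear
set seed 12345
set obs 500
gen x = rnormal()
gen utilde = rnormal()
gen u = utilde in 1
replace u = 0.5*u[_n-1] + utilde in 2/L     // u_i = 0.5*u_{i-1} + utilde_i
gen y = 1 + 2*x + u                         // beta0 = 1, beta1 = 2

* Quasi-differenced variables; the transformed errors are the i.i.d. utilde
gen ytilde  = y - 0.5*y[_n-1]
gen x1tilde = 0.5
gen x2tilde = x - 0.5*x[_n-1]
replace ytilde  = y in 1
replace x1tilde = 1 in 1
replace x2tilde = x in 1

* OLS on the transformed model (no added constant) is the GLS estimator
reg ytilde x1tilde x2tilde, noconstant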
3. About missing data: Consider the regression model $Y_i = X_i\beta + u_i$, $i = 1, \ldots, n$, where all variables are scalars and the constant term/intercept is omitted for convenience.
(a) Show that the least squares estimator of $\beta$ is unbiased and consistent under the usual assumptions.
(b) Now suppose that some of the observations are missing. Let $I_i$ denote a binary random variable that indicates the non-missing observations; that is, $I_i = 1$ if observation $i$ is not missing, and $I_i = 0$ if observation $i$ is missing. Assume that $[I_i, X_i, u_i]$ for $i = 1, 2, \ldots, n$ are i.i.d. random variables.
(i) Show that the OLS estimator can be written as
$$\hat{\beta} = \Big(\sum_i I_iX_iX_i'\Big)^{-1}\Big(\sum_i I_iX_iY_i\Big) = \beta + \Big(\sum_i I_iX_iX_i'\Big)^{-1}\Big(\sum_i I_iX_iu_i\Big).$$
(ii) Suppose that the data are missing "completely at random" in the sense that $\Pr(I_i = 1\,|\,X_i, u_i) = p$, where $p$ is a constant. Show that $\hat{\beta}$ is unbiased and consistent.
(iii) Suppose that the probability that the $i$th observation is missing depends on $X_i$ but not on $u_i$; that is, $\Pr(I_i = 1\,|\,X_i, u_i) = p(X_i)$. Show that $\hat{\beta}$ is unbiased and consistent.
(iv) Suppose that the probability that the $i$th observation is missing depends on both $X_i$ and $u_i$; that is, $\Pr(I_i = 1\,|\,X_i, u_i) = p(X_i, u_i)$. Is $\hat{\beta}$ unbiased? Is $\hat{\beta}$ consistent? Explain.
(c) Suppose that $\beta = 1$ and that $X_i$ and $u_i$ are mutually independent standard normal random variables [so that both $X_i$ and $u_i$ are distributed $N(0,1)$]. Suppose that $I_i = 1$ when $Y_i \geq 0$ but that $I_i = 0$ when $Y_i < 0$. Is $\hat{\beta}$ unbiased? Is $\hat{\beta}$ consistent? Explain.
Solution:
(a) With no intercept, the OLS estimator is
$$\hat{\beta} = \frac{\sum_i X_iY_i}{\sum_i X_i^2} = \beta + \frac{\sum_i X_iu_i}{\sum_i X_i^2} = \beta + \frac{\frac{1}{n}\sum_i X_iu_i}{\frac{1}{n}\sum_i X_i^2}.$$
Consistency follows by analyzing the averages in this expression,
$$\frac{1}{n}\sum_i X_iu_i \xrightarrow{\;p\;} E(X_iu_i) = 0, \qquad \frac{1}{n}\sum_i X_i^2 \xrightarrow{\;p\;} E(X_i^2) > 0,$$
together with Slutsky's theorem.
To show unbiasedness, note that
$$E\left(\frac{\sum_i X_iu_i}{\sum_i X_i^2}\right) = E\left[E\left(\frac{\sum_i X_iu_i}{\sum_i X_i^2}\,\Big|\,X_1, \ldots, X_n\right)\right] = E\left(\frac{\sum_i X_iE(u_i\,|\,X_1, \ldots, X_n)}{\sum_i X_i^2}\right) = E\left(\frac{\sum_i X_iE(u_i\,|\,X_i)}{\sum_i X_i^2}\right) = 0.$$
(b) (i) The result follows from the expression given in (a) and the definition of $I_i$.
(ii)-(iv) As in (a), consistency follows by analyzing the sample averages in the expression in (i). Specifically, we have
$$\frac{1}{n}\sum_i I_iX_iu_i \xrightarrow{\;p\;} E(I_iX_iu_i), \qquad \frac{1}{n}\sum_i I_iX_i^2 \xrightarrow{\;p\;} E(I_iX_i^2).$$
In parts (ii)-(iv), $E(I_iX_i^2) > 0$.
In part (ii), $E(I_iX_iu_i) = E[E(I_iX_iu_i\,|\,X_i, u_i)] = pE(X_iu_i) = pE[E(X_iu_i\,|\,X_i)] = 0$.
In part (iii), $E(I_iX_iu_i) = E[E(I_iX_iu_i\,|\,X_i, u_i)] = E(p(X_i)X_iu_i) = E[E(p(X_i)X_iu_i\,|\,X_i)] = 0$.
In part (iv), $E(I_iX_iu_i) = E[E(I_iX_iu_i\,|\,X_i, u_i)] = E(p(X_i, u_i)X_iu_i) = E[E(p(X_i, u_i)X_iu_i\,|\,X_i)] \neq 0$ in general.
So $\hat{\beta}$ is consistent in (ii) and in (iii), but not consistent in (iv).
The unbiasedness analysis proceeds as in part (a), but it is necessary to condition on both $I_i$ and $X_i$ when applying the law of iterated expectations. The key result is that $E(u_i\,|\,I_i, X_i) = 0$ in (ii) and (iii), but not in (iv). To see that $E(u_i\,|\,I_i, X_i) = 0$ in (ii) and (iii), notice that the conditional density of $u$ given $(X, I)$ can be written as
$$f(u\,|\,X, I) = \frac{f(u, I\,|\,X)}{f(I\,|\,X)} = \frac{f(I\,|\,X, u)\,f(u\,|\,X)}{f(I\,|\,X)} = \frac{f(I\,|\,X)\,f(u\,|\,X)}{f(I\,|\,X)} = f(u\,|\,X),$$
where the first equality is the definition of a conditional density, the second is Bayes' rule, and the third follows because $\Pr(I = 1\,|\,X, u)$ does not depend on $u$ in (ii) and (iii). This implies that $E(u_i\,|\,X_i, I_i) = E(u_i\,|\,X_i) = 0$. Unbiasedness then follows in (ii) and (iii) using an argument analogous to that in part (a), applied to
$$\hat{\beta} = \frac{\sum_i I_iX_iY_i}{\sum_i I_iX_i^2} = \beta + \frac{\sum_i I_iX_iu_i}{\sum_i I_iX_i^2} = \beta + \frac{\frac{1}{n}\sum_i I_iX_iu_i}{\frac{1}{n}\sum_i I_iX_i^2}.$$
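A quick simulation illustrates cases (ii) and (iii). This is a minimal sketch with illustrative names and parameter values (for instance, $\Pr(I_i = 1|X_i) = \Phi(X_i)$ is an arbitrary choice for case (iii)); it is not part of the assignment:

clear
set seed 12345
set obs 10000
gen x = rnormal()
gen u = rnormal()
gen y = x + u                        // beta = 1, no intercept
gen keep2 = runiform() < 0.5         // (ii): missing completely at random
gen keep3 = runiform() < normal(x)   // (iii): Pr(I=1|X) depends on X only
reg y x if keep2, noconstant         // slope close to 1
reg y x if keep3, noconstant         // slope still close to 1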
(c) In this example, $I_i = 1$ if $Y_i \geq 0$, that is, if $X_i + u_i \geq 0$, or $u_i \geq -X_i$. The error in the selected sample therefore has a nonzero conditional mean: a calculation based on the normal distribution shows that
$$E(u\,|\,u \geq -X) = \frac{\phi(-X)}{1 - \Phi(-X)} = \frac{\phi(X)}{\Phi(X)},$$
where $\Phi$ is the standard normal CDF and $\phi$ is the standard normal density (the inverse Mills ratio). Because $E(u_i\,|\,X_i, I_i = 1) \neq 0$, the conditioning argument used to prove unbiasedness in (a) and (b) fails, and $\hat{\beta}$ is biased. For consistency, the relevant unconditional moment is
$$E(I_iX_iu_i) = E\big[X_i\,E\big(u_i\mathbf{1}\{u_i \geq -X_i\}\,\big|\,X_i\big)\big] = E[X_i\phi(X_i)],$$
using $E(u\mathbf{1}\{u \geq -X\}\,|\,X) = \phi(-X) = \phi(X)$ for standard normal $u$. In this particular symmetric design, $X\phi(X)$ is an odd function and $X_i$ is symmetric about zero, so $E(I_iX_iu_i) = 0$ and the no-intercept slope estimator in fact remains consistent despite the selection. In an asymmetric design (for example, $E(X_i) \neq 0$), this moment is generally nonzero and $\hat{\beta}$ is inconsistent as well.
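The bias-versus-consistency distinction in (c) can be seen by simulation; a minimal sketch with illustrative names (not part of the assignment):

clear
set seed 12345
set obs 10000
gen x = rnormal()
gen u = rnormal()
gen y = x + u                  // beta = 1
sum u if y >= 0                // selected-sample mean of u is clearly positive
reg y x if y >= 0, noconstant  // yet the no-intercept slope stays near 1 here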
4. (Practice question for recitation; you do not need to hand this in.) Take the model
$$y = X\beta + \varepsilon, \qquad E(\varepsilon\,|\,X) = 0, \qquad E(\varepsilon\varepsilon'\,|\,X) = \Omega.$$
Assume for simplicity that $\Omega$ is known. Consider the OLS and GLS estimators
$$\hat{\beta} = (X'X)^{-1}X'y \qquad \text{and} \qquad \tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$
(a) Compute the conditional covariance between $\hat{\beta}$ and $\tilde{\beta}$.
Since
$$E(\hat{\beta}\,|\,X) = E\big((X'X)^{-1}X'y\,|\,X\big) = E\big((X'X)^{-1}X'(X\beta + \varepsilon)\,|\,X\big) = \beta + E\big((X'X)^{-1}X'\varepsilon\,|\,X\big) = \beta,$$
and similarly
$$E(\tilde{\beta}\,|\,X) = E\big((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y\,|\,X\big) = E\big((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}(X\beta + \varepsilon)\,|\,X\big) = \beta + E\big((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon\,|\,X\big) = \beta,$$
both estimators are conditionally unbiased. We also have the expressions
$$\hat{\beta} - \beta = (X'X)^{-1}X'\varepsilon \qquad \text{and} \qquad \tilde{\beta} - \beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon.$$
Then
$$\mathrm{Cov}\big(\hat{\beta}, \tilde{\beta}\,\big|\,X\big) = E\big[(\hat{\beta} - \beta)(\tilde{\beta} - \beta)'\,\big|\,X\big] = E\big[\big((X'X)^{-1}X'\varepsilon\big)\big((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon\big)'\,\big|\,X\big]$$
$$= E\big((X'X)^{-1}X'\varepsilon\varepsilon'\Omega^{-1}X(X'\Omega^{-1}X)^{-1}\,\big|\,X\big) = (X'X)^{-1}X'E(\varepsilon\varepsilon'\,|\,X)\Omega^{-1}X(X'\Omega^{-1}X)^{-1}$$
$$= (X'X)^{-1}X'\Omega\Omega^{-1}X(X'\Omega^{-1}X)^{-1} = (X'\Omega^{-1}X)^{-1}.$$
(b) Find the conditional covariance matrix of $\hat{\beta} - \tilde{\beta}$, that is, $E\big[(\hat{\beta} - \tilde{\beta})(\hat{\beta} - \tilde{\beta})'\,\big|\,X\big]$.
First note that
$$\hat{\beta} - \tilde{\beta} = \big[(X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\big]\varepsilon.$$
Then
$$\mathrm{Cov}\big(\hat{\beta} - \tilde{\beta}\,\big|\,X\big) = E\big[(\hat{\beta} - \tilde{\beta})(\hat{\beta} - \tilde{\beta})'\,\big|\,X\big]$$
$$= E\big(\big[(X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\big]\varepsilon\varepsilon'\big[X(X'X)^{-1} - \Omega^{-1}X(X'\Omega^{-1}X)^{-1}\big]\,\big|\,X\big)$$
$$= \big[(X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\big]\,E(\varepsilon\varepsilon'\,|\,X)\,\big[X(X'X)^{-1} - \Omega^{-1}X(X'\Omega^{-1}X)^{-1}\big]$$
$$= \big[(X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\big]\,\Omega\,\big[X(X'X)^{-1} - \Omega^{-1}X(X'\Omega^{-1}X)^{-1}\big]$$
$$= (X'X)^{-1}X'\Omega X(X'X)^{-1} - (X'X)^{-1}X'\Omega\Omega^{-1}X(X'\Omega^{-1}X)^{-1} - (X'\Omega^{-1}X)^{-1}X'X(X'X)^{-1} + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}X(X'\Omega^{-1}X)^{-1}$$
$$= (X'X)^{-1}X'\Omega X(X'X)^{-1} - (X'\Omega^{-1}X)^{-1} - (X'\Omega^{-1}X)^{-1} + (X'\Omega^{-1}X)^{-1}$$
$$= (X'X)^{-1}X'\Omega X(X'X)^{-1} - (X'\Omega^{-1}X)^{-1},$$
which is $\mathrm{Var}(\hat{\beta}\,|\,X) - \mathrm{Var}(\tilde{\beta}\,|\,X)$. Since a covariance matrix is positive semidefinite, this difference is positive semidefinite, confirming that GLS is weakly more efficient than OLS when $\Omega$ is known.
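Both results are easy to check numerically. A minimal Mata sketch with an arbitrary, made-up design matrix and $\Omega$ (illustrative only, not part of the assignment):

mata:
X  = (1, 0.5 \ 1, 1.3 \ 1, -0.7 \ 1, 2.1)    // arbitrary 4x2 design matrix
Om = diag((1, 2, 0.5, 1.5))                  // arbitrary known error covariance
Omi  = invsym(Om)
Vols = invsym(X'X)*X'Om*X*invsym(X'X)        // Var(bhat | X) for OLS
Vgls = invsym(X'Omi*X)                       // Var(btilde | X) = Cov(bhat, btilde | X)
Vols - Vgls                                  // equals Var(bhat - btilde | X)
symeigenvalues(Vols - Vgls)                  // nonnegative: GLS weakly more efficient
end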
5. (Attenuation) There are no known finite-sample results for measurement error; all results are asymptotic. In this problem we will use a few simple asymptotic results for the classical regression model. The simplest case to analyze is that of a regression model with a single regressor and no constant term. Although it is unrealistic, it illustrates the essential concepts. Assume that the model
$$y^* = \beta x^* + \varepsilon \qquad (1)$$
conforms to all the assumptions of the classical normal regression model. If data on $y^*$ and $x^*$ were available, then $\beta$ could be estimated by least squares. Suppose $y^*$ is $\ln(\text{output}/\text{labor})$ and $x^*$ is $\ln(\text{capital}/\text{labor})$. Neither factor input can be measured with precision, so the observed $y$ and $x$ contain errors of measurement. We assume that
$$y = y^* + v, \qquad v \sim N[0, \sigma_v^2] \qquad (2)$$
$$x = x^* + u, \qquad u \sim N[0, \sigma_u^2] \qquad (3)$$
Assume as well that $u$ and $v$ are independent of each other (and of $x^*$ and $\varepsilon$).
(a) As a first step, assume that only $y^*$ is measured with error, while $x^*$ is correctly measured. Does the resulting model still conform to the assumptions of the classical regression model? Show your work.
(b) Now assume both $y^*$ and $x^*$ contain measurement error. Show that the least squares estimator of the slope parameter, $\hat{\beta}_{OLS}$, contains attenuation bias (the effect of biasing the coefficient towards zero is called attenuation).
(c) Prove that when only $x$ is measured with error, the squared correlation between $y$ and $x$ is less than that between $y^*$ and $x^*$ (note that $y^* = y$ here since it is measured with no error). Does the same hold true if $y^*$ is also measured with error?
Solution:
(a) Rearranging (2) as $y^* = y - v$ and substituting into (1), we get
$$y - v = \beta x^* + \varepsilon,$$
$$y = \beta x^* + \varepsilon + v = \beta x^* + \varepsilon'.$$
This result still conforms to the assumptions of the classical regression model, because the new error term $\varepsilon'$ is the sum of the original error term $\varepsilon$ and $v \sim N[0, \sigma_v^2]$, both of which are independent of the regressor $x^*$; only the error variance increases, from $\sigma_\varepsilon^2$ to $\sigma_\varepsilon^2 + \sigma_v^2$.
(b) Now consider the regression of $y$ on the observed $x$. Rearranging (3) as $x^* = x - u$ and substituting into (1) (with $y = y^* + v$, so $\varepsilon' = \varepsilon + v$ as in part (a)), we get
$$y = \beta(x - u) + \varepsilon' = \beta x + (\varepsilon' - \beta u) = \beta x + w.$$
The regressor $x$ is correlated with the new error term $w$, because
$$\mathrm{Cov}[x, w] = \mathrm{Cov}[(x^* + u), (\varepsilon' - \beta u)] = -\beta\sigma_u^2.$$
This violates one of the central assumptions of the classical model, so the least squares estimator $\hat{\beta}_{OLS}$ is inconsistent. Its probability limit shows the attenuation explicitly:
$$\mathrm{plim}\,\hat{\beta}_{OLS} = \frac{\mathrm{Cov}[x, y]}{\mathrm{Var}[x]} = \frac{\beta\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_u^2} = \beta\cdot\frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_u^2},$$
which lies strictly between 0 and $\beta$ (for $\beta \neq 0$); that is, the coefficient is biased towards zero.
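A minimal simulation sketch of the attenuation result, with illustrative names and parameter values (not part of the assignment):

clear
set seed 12345
set obs 10000
gen xstar = rnormal()
gen ystar = 2*xstar + rnormal(0, 0.5)   // (1) with beta = 2
gen y = ystar + rnormal(0, 0.5)         // (2): measurement error in y
gen x = xstar + rnormal(0, 1)           // (3): measurement error in x, sigma_u^2 = 1
reg ystar xstar, noconstant             // slope near 2
reg y x, noconstant                     // slope near 2*(1/(1+1)) = 1: attenuated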
(c) The argument is straightforward. For $y$ and $x^*$ (recall that $y = \beta x^* + \varepsilon$ when only $x$ is measured with error),
$$\mathrm{Corr}^2(y, x^*) = \frac{\mathrm{Cov}^2(y, x^*)}{\mathrm{Var}(y)\,\mathrm{Var}(x^*)} = \frac{\beta^2\big(\sigma_{x^*}^2\big)^2}{\big(\beta^2\sigma_{x^*}^2 + \sigma_\varepsilon^2\big)\,\sigma_{x^*}^2}.$$
For $y$ and $x$,
$$\mathrm{Corr}^2(y, x) = \frac{\mathrm{Cov}^2(y, x)}{\mathrm{Var}(y)\,\mathrm{Var}(x)} = \frac{\mathrm{Cov}^2[(\beta x^* + \varepsilon), (x^* + u)]}{\mathrm{Var}(y)\,\mathrm{Var}(x)} = \frac{\big\{\mathrm{Cov}[y, x^*] + \mathrm{Cov}[(\beta x^* + \varepsilon), u]\big\}^2}{\mathrm{Var}(y)\,\mathrm{Var}(x)}.$$
The second term in the numerator is zero, since $y = \beta x^* + \varepsilon$ is uncorrelated with $u$. Thus
$$\mathrm{Corr}^2(y, x) = \frac{\mathrm{Cov}^2(y, x)}{\mathrm{Var}(y)\,\mathrm{Var}(x)} = \frac{\mathrm{Cov}^2[y, x^*]}{\mathrm{Var}(y)\,\mathrm{Var}(x)}.$$
The numerator is the same as before, while the denominator is larger, since
$$\mathrm{Var}(y)\,\mathrm{Var}(x) = \mathrm{Var}(y)\big(\mathrm{Var}(x^*) + \mathrm{Var}(u)\big),$$
so the squared correlation must be smaller.
If both variables are measured with error, then we are comparing
$$\frac{\mathrm{Cov}^2(y^*, x^*)}{\mathrm{Var}(y^*)\,\mathrm{Var}(x^*)} \qquad \text{to} \qquad \frac{\mathrm{Cov}^2(y, x)}{\mathrm{Var}(y)\,\mathrm{Var}(x)}.$$
The numerator of the second fraction is the squared covariance of $(\beta x^* + \varepsilon')$ with $(x^* + u)$, which is still $\beta^2\big(\sigma_{x^*}^2\big)^2$, while the denominator is now larger on both counts, since $\mathrm{Var}(y) = \mathrm{Var}(y^*) + \sigma_v^2$ and $\mathrm{Var}(x) = \mathrm{Var}(x^*) + \sigma_u^2$. So the same result holds when both variables are measured with error.
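Continuing the simulation sketch from part (b) above (same illustrative variables), the squared correlations shrink as measurement error is added:

corr ystar xstar     // largest correlation: no measurement error
corr ystar x         // smaller: error in x only
corr y x             // smallest: error in both variables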
6. In 2005, 140 primary schools in Kenya received funding to hire an extra first-grade teacher to reduce class sizes. In half of the schools (selected randomly), students were assigned to classrooms based on an initial test score ("tracking"); in the remaining schools the students were randomly assigned to classrooms. For their analysis, the authors restricted attention to the 121 schools which initially had a single first-grade class.
The key regression¹ in the paper is
$$TestScore_{ig} = \alpha + \gamma\,Tracking_g + e_{ig} \qquad (1)$$
where $TestScore_{ig}$ is the standardized test score (normalized to have mean 0 and variance 1) of student $i$ in school $g$, and $Tracking_g$ is a dummy equal to 1 if school $g$ was tracking.
¹ Duflo, Esther, Pascaline Dupas, and Michael Kremer (2011): "Peer effects, teacher incentives, and the impact of tracking: Evidence from a randomized evaluation in Kenya," American Economic Review, 101, 1739-1774.
(a) Using the DDK2011 data set, run the OLS regression and interpret the estimated effect of tracking on test scores.
. reg testscore tracking, cluster(schoolid)

Linear regression                               Number of obs   =      5,795
                                                F(1, 120)       =       3.20
                                                Prob > F        =     0.0763
                                                R-squared       =     0.0048
                                                Root MSE        =      .9977

                              (Std. Err. adjusted for 121 clusters in schoolid)
------------------------------------------------------------------------------
             |               Robust
   testscore |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    tracking |   .1380913   .0772362     1.79   0.076    -.0148311    .2910136
       _cons |  -.0710354   .0543934    -1.31   0.194    -.1787304    .0366597
------------------------------------------------------------------------------
The OLS estimate indicates that schools which tracked students had average test scores about 0.14 standard deviations higher, a meaningful effect size; note, however, that with standard errors clustered at the school level the effect is significant only at the 10% level (p = 0.076).
(b) Now, estimate a more general version of the above regression,
$$TestScore_{ig} = \alpha + \gamma\,Tracking_g + x_{ig}'\beta + e_{ig} \qquad (2)$$
where $x_{ig}$ is a set of control variables specific to the student: age, sex, and initial test score. How does your answer to part (a) change? What is your concern about regression (2)?
. reg testscore tracking agetest girl realpercentile, cluster(schoolid)

Linear regression                               Number of obs   =      5,269
                                                F(4, 110)       =     137.65
                                                Prob > F        =     0.0000
                                                R-squared       =     0.2412
                                                Root MSE        =     .86607

                              (Std. Err. adjusted for 111 clusters in schoolid)
--------------------------------------------------------------------------------
               |               Robust
     testscore |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      tracking |   .1679085    .076352     2.20   0.030     .0165969    .3192202
       agetest |  -.0400188   .0133673    -2.99   0.003    -.0665097   -.0135279
          girl |   .0809526   .0285248     2.84   0.005     .0244232     .137482
realpercentile |   .0171741    .000746    23.02   0.000     .0156956    .0186525
         _cons |  -.6442321   .1299825    -4.96   0.000    -.9018269   -.3866373
--------------------------------------------------------------------------------
With the student-level controls added, the estimated tracking effect rises slightly, to about 0.17, and is now significant at the 5% level (p = 0.030). The main concern with regression (2) is the standard errors: a difficulty with applying the classical regression framework is that student achievement is likely to be correlated within a given school. Student achievement may be affected by local demographics, individual teachers, and classmates, all of which imply within-school dependence. These concerns do not, however, suggest that achievement will be correlated across schools, so it seems reasonable to model achievement as mutually independent across schools and to cluster the standard errors at the school level.
(c) Now extend the empirical analysis reported above. Do a regression of standardized test score (totalscore normalized to have zero mean and variance 1) on tracking, age, sex, being assigned to the contract teacher, and student’s percentile in the initial distribution. (hint: the Stata command is egen testscore = std(totalscore) and note that the sample size will be smaller as some observations have missing variables.) Calculate standard errors using both the conventional robust formula, and clustering based on the school.
. egen testscore = std(totalscore)

. * Regression with standard errors clustered at the school level:
. reg testscore tracking agetest girl etpteacher realpercentile, cluster(schoolid)

Linear regression                               Number of obs   =      5,269
                                                F(5, 110)       =     123.24
                                                Prob > F        =     0.0000
                                                R-squared       =     0.2493
                                                Root MSE        =     .86148

                              (Std. Err. adjusted for 111 clusters in schoolid)
--------------------------------------------------------------------------------
               |               Robust
     testscore |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      tracking |   .1724935   .0762008     2.26   0.026     .0214815    .3235055
       agetest |  -.0407137   .0133107    -3.06   0.003    -.0670924   -.0143351
          girl |   .0812796   .0285141     2.85   0.005     .0247714    .1377878
    etpteacher |    .179798     .037461    4.80   0.000      .105559      .254037
realpercentile |   .0173177   .0007209    24.02   0.000     .0158891    .0187463
         _cons |  -.7383267   .1296685    -5.69   0.000    -.9952992   -.4813541
--------------------------------------------------------------------------------
. * Regression with robust standard errors:
. reg testscore tracking agetest girl etpteacher realpercentile, robust

Linear regression                               Number of obs   =      5,269
                                                F(5, 5263)      =     361.77
                                                Prob > F        =     0.0000
                                                R-squared       =     0.2493
                                                Root MSE        =     .86148

--------------------------------------------------------------------------------
               |               Robust
     testscore |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      tracking |   .1724935   .0240226     7.18   0.000     .1253992    .2195878
       agetest |  -.0407137   .0084907    -4.80   0.000    -.0573591   -.0240684
          girl |   .0812796    .024089     3.37   0.001     .0340551    .1285041
    etpteacher |    .179798    .0237056    7.58   0.000     .1333252    .2262709
realpercentile |   .0173177   .0004245    40.80   0.000     .0164855    .0181499
         _cons |  -.7383267   .0809566    -9.12   0.000    -.8970353   -.5796181
--------------------------------------------------------------------------------
The cluster-robust standard error on the tracking coefficient is about three times the conventional robust standard error (.076 versus .024). Only for sex (girl) are the two standard errors about the same; for age, the student's initial percentile, and assignment to the contract teacher, the cluster-robust standard errors are moderately larger. This pattern is what we expect: tracking is a school-level regressor, so within-school correlation of the errors inflates its sampling variance the most, while for student-level covariates the clustering adjustment matters less. Inference on the tracking effect that ignored the clustering would therefore substantially overstate its precision.