Econ 419: Homework 2

Due 10/2 at 2:05 p.m.

Fall 2018, Michael P. Leung

1. (50 pts) Suppose the true outcome model is $Y = X\beta + \varepsilon$. However, we only observe a mismeasured version of $X$, denoted $\tilde{X} = X + \nu$, and regress $Y$ on $\tilde{X}$. Assume $\nu \perp \varepsilon \perp X$ and $E[\varepsilon] = 0$.

(a) Let $\hat{\beta}$ be our OLS estimator. Show that

$$\hat{\beta} = \beta - \big((X+\nu)'(X+\nu)\big)^{-1}(X+\nu)'\nu\beta + (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\varepsilon.$$

(b) Let $X_1$ be the first row of $X$ and $\nu_1$ the first row of $\nu$. Using part (a), show that

$$\hat{\beta} \xrightarrow{p} \big(E[X_1'X_1] + E[\nu_1'\nu_1]\big)^{-1} E[X_1'X_1]\,\beta.$$

We can write the right side of the previous equation as Mβ. (This shows that the limit is a weighted sum of the unknown coefficients. Hence in the general multivariate case, it is generally impossible to sign omitted variable bias because it requires knowing β and M.)
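
As an illustration of the weighting matrix $M$, here is a minimal sketch of the scalar case. It assumes a single regressor with no intercept and $E[X_1^2] > 0$; these assumptions are added here and are not part of the problem. In that case $M$ is a number between 0 and 1, so the limit is shrunk toward zero and the direction of the bias is known:

```latex
% Scalar case (k = 1, an added assumption): X_1 and \nu_1 are scalars, so M is a number.
M = \frac{E[X_1^2]}{E[X_1^2] + E[\nu_1^2]} \in (0, 1],
\qquad
M\beta - \beta = -\frac{E[\nu_1^2]}{E[X_1^2] + E[\nu_1^2]}\,\beta .
```

With more than one regressor, $M$ is a matrix and each element of $M\beta$ mixes several coefficients, which is why the sign is no longer determined.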

(c) Let $W = \big((X+\nu)'(X+\nu)\big)^{-1}(X+\nu)'X$. Using part (a), show that

$$\sqrt{n}\,(\hat{\beta} - W\beta) = \big(n^{-1}(X+\nu)'(X+\nu)\big)^{-1} n^{-1/2}(X+\nu)'\varepsilon.$$

(d) Let $\sigma^2 = \mathrm{Var}(\varepsilon_1)$. Using part (c), show that

$$\sqrt{n}\,(\hat{\beta} - W\beta) \xrightarrow{d} N\Big(0,\ \sigma^2\big(E[X_1'X_1] + E[\nu_1'\nu_1]\big)^{-1}\Big).$$

(Hint: just follow the same steps as the derivation of the normal limit for the OLS estimator.)
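
The steps the hint refers to can be outlined as follows. This is only a sketch under added assumptions that the problem does not state explicitly: the rows are i.i.d., second moments are finite, and $E[\nu_1] = 0$ so that the cross moments between $X_1$ and $\nu_1$ vanish. The symbol $Q$ and the row notation $\tilde{X}_i = X_i + \nu_i$ are introduced here for convenience.

```latex
% Step 1 (law of large numbers), with \tilde{X}_i = X_i + \nu_i the i-th row of \tilde{X}:
\frac{1}{n}\tilde{X}'\tilde{X} = \frac{1}{n}\sum_{i=1}^{n} \tilde{X}_i'\tilde{X}_i
  \xrightarrow{p} E[\tilde{X}_1'\tilde{X}_1] = E[X_1'X_1] + E[\nu_1'\nu_1] \equiv Q .

% Step 2 (central limit theorem), using \varepsilon_1 \perp \tilde{X}_1 and E[\varepsilon_1] = 0:
\frac{1}{\sqrt{n}}\tilde{X}'\varepsilon = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \tilde{X}_i'\varepsilon_i
  \xrightarrow{d} N\big(0,\ \sigma^2 Q\big) .

% Step 3 (Slutsky's theorem), combining Steps 1 and 2 with the identity in part (c):
\sqrt{n}\,(\hat{\beta} - W\beta) \xrightarrow{d} Q^{-1} N\big(0,\ \sigma^2 Q\big)
  = N\big(0,\ \sigma^2 Q^{-1}\big) .
```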

2. (50 pts) In this empirical exercise, we will learn how to evaluate the robustness of linear regression by assessing the degree of overlap and computing matching estimators.

We will use data from the National Supported Work Demonstration (NSW) and Current Population Survey (CPS) to look at the effect of a job training program on earnings. Load nsw.dta and use describe to see what the variables represent. The observations for which experimental equals one come from NSW data, and those for which it equals zero come from CPS data. NSW data comes from an experiment in which treated individuals were assigned to a job training program. Treatment took place in 1975. The variables re74, re75, and re78 are measures of earnings in years 1974, 1975, and 1978 respectively, where re75 is measured prior to treatment assignment.

To complete this exercise, you will need to install several Stata programs. First input into Stata

net from http://fmwww.bc.edu/RePEc/bocode/i

Scroll down and click isvar, and a new page will pop up. Click “(click here to install)” to install the package. Next input

net from http://personalpages.manchester.ac.uk/staff/mark.lunt

and install the package propensity.

(a) We will first analyze the experimental sample. This means all your commands for this question must be restricted to observations only in this sample. To do this, use the if option, e.g. summarize y if x > 5.

  1. Regress earnings on treatment and all available controls, including pre-treatment outcomes. Comment on the results.
  2. The propensity score is $P(D = 1 \mid X)$, where $D$ is the treatment indicator and $X$ the vector of controls. This measures assignment/selection into treatment on observables for each subpopulation $X$.
    A common way to detect differences between the treatment and control subpopulations (other than balance tests) is to plot the density of the propensity score for both groups. The idea is that, in a randomized experiment, assignment/selection into treatment is the same regardless of the subpopulation $X$ because $D \perp X$.

    1. To estimate the propensity score, we regress D on X. However, to ensure that the fitted values from this regression are between 0 and 1, we use a logistic regression instead of a linear regression. To do this, use the same syntax as OLS except replace reg with logit.
    2. For each observation $i$, given $X_i$, compute the predicted value of $D_i$ (the estimated propensity score) from the logistic regression. Store the predictions in a new variable.¹
    3. Plot the estimated densities of the propensity scores for the treatment and control populations.² Remember to restrict your analysis to the experimental sample! Comment on the resulting graph.
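
The three steps above can be sketched on simulated data. This is a minimal illustration, not the analysis of nsw.dta: the variable names x, d, and ps, the seed, the sample size, and the coefficient 0.8 are all hypothetical choices made here. Treatment is made to depend on x on purpose, so the two densities should look different.

```stata
* Simulated data only: x, d, and ps are hypothetical names, not variables in nsw.dta
clear
set seed 419
set obs 2000
generate x = rnormal()

* Treatment probability depends on x, so selection on observables is built in
generate d = runiform() < invlogit(0.8*x)

* Step 1: logistic regression of the treatment indicator on the control
logit d x

* Step 2: store the predicted probabilities P(D = 1 | X) in a new variable
predict ps, pr

* Step 3: overlay the estimated propensity score densities for the two groups
twoway kdensity ps if d == 1 || kdensity ps if d == 0, legend(label(1 "treated") label(2 "control"))
```

With real data, the same commands would take the actual treatment indicator and control list, plus an if qualifier restricting each command to the relevant sample.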

¹ Hint: use predict.

² Hint: to estimate the densities of variables y and x on the same graph, use graph tw kdensity y || kdensity x, legend(label(1 "y") label(2 "x")). Combine this with the if option to restrict your analysis to the right subpopulations.

(b) The previous exercises established our experimental benchmark results. Let’s see what happens when we use some observational data instead. We will ignore observations in the control group of the experimental sample and use in their place the observations in the CPS sample as control. Your analysis below should be restricted to this new “NSW-CPS” sample.³

  1. Repeat part (a). Are your results similar to the experimental estimates? Come up with an economic story for why or why not.
  2. To further probe differences between treatment and control, run balance tests on pretreatment outcomes. Discuss your results.
  3. Based on our previous results, we might want to try a matching estimator to better pair treated units to similar control units, with the hope that units with similar observables will also have similar unobservables, thereby reducing selection bias. Match treated and control units with similar propensity scores (estimated in part (a)). You can do this with the command gmatch.⁴
  4. Using only the subset of matched units (remembering to still restrict your analysis to the NSW-CPS sample), estimate the effect of treatment on earnings with OLS, and test the null hypothesis that it is zero. How does the result compare to parts (a) and (b)1.?
  5. The quality of the matching estimator depends on the quality of our matches. To assess the latter, plot the distribution of propensity scores for treatment and control. Discuss the result.
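
As an illustration of the balance test in item 2, here is a minimal sketch on simulated data. The variable names x and d, the seed, and the data-generating process are hypothetical and stand in for a pretreatment outcome and the treatment indicator. It compares the mean of a pretreatment variable across treatment status, first with a two-sample t test and then as a regression whose slope is the difference in means.

```stata
* Simulated data only: x stands in for a pretreatment outcome, d for treatment
clear
set seed 419
set obs 2000
generate x = rnormal()

* Treatment depends on x, so the groups are unbalanced by construction
generate d = runiform() < invlogit(0.8*x)

* Two-sample t test of equal means of x across the two values of d
ttest x, by(d)

* Regression form: the coefficient on d is the treated-minus-control difference in means
regress x d, vce(robust)
```

A large and significant difference in a pretreatment variable suggests that the two groups differ before treatment, which is what the balance test is meant to detect.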

³ Hint: generate a new indicator variable analogous to experimental and use this variable along with the if option in future commands.

⁴ Hint: For a treatment variable D and control X, the command gmatch D X if [..], set(varname) generates a new variable varname. This assigns a unique number to each treated unit and then a corresponding number to each control unit matched with the treated unit. Unmatched control units are given an empty label. Note that gmatch creates a variable diff which you can ignore.
