BST 6100 Homework 2
Due February 14, 2017
For problems 1 and 2, use the data set in Lalonde1.RData. This data comes from a randomized study of
the National Supported Work program. Subjects were randomized to obtain occupational training (treat=1)
or not (treat=0), and their earnings three years later were recorded as the outcome. The current data set
includes only the male study participants; see the end of this assignment for more detail.
1. Check the balance of all pre-treatment covariates. Report the within-group proportions or means with
standard deviations as well as the effect size (this may be most easily done in a Table 1-type table).
Note any substantial discrepancies.
2. Compute the difference in average 1978 income between the treatment and control groups. Use
randomization inference to compute a p-value for the sharp null hypothesis of no individual-level effect
and display a histogram of the randomization distribution.
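The randomization inference in problem 2 can be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes complete randomization with fixed group sizes (the assignment does not state the exact mechanism), approximates the randomization distribution by Monte Carlo shuffling instead of enumerating every assignment, and runs on made-up numbers. The function names and the variables `treat` and `re78` are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcome, treat):
    """Mean outcome among treated minus mean outcome among controls."""
    treated = [y for y, t in zip(outcome, treat) if t == 1]
    control = [y for y, t in zip(outcome, treat) if t == 0]
    return mean(treated) - mean(control)


def randomization_test(outcome, treat, n_draws=2000, seed=0):
    """Monte Carlo randomization test of the sharp null of no effect.

    Under the sharp null every subject's outcome is the same whichever
    group they land in, so re-shuffling the treatment labels traces out
    the null distribution of the test statistic.
    """
    rng = random.Random(seed)
    observed = diff_in_means(outcome, treat)
    labels = list(treat)
    draws = []
    for _ in range(n_draws):
        rng.shuffle(labels)  # keeps the number of treated subjects fixed
        draws.append(diff_in_means(outcome, labels))
    # Two-sided p-value: share of shuffled statistics at least as extreme
    # as the observed one.
    p_value = sum(abs(d) >= abs(observed) for d in draws) / n_draws
    return observed, draws, p_value


# Made-up illustrative data, NOT the Lalonde earnings.
treat = [1, 1, 1, 1, 0, 0, 0, 0]
re78 = [9.0, 7.5, 8.0, 6.5, 5.0, 6.0, 4.5, 7.0]
observed, draws, p_value = randomization_test(re78, treat)
print("observed difference:", observed)
print("two-sided p-value:", p_value)
```

The list `draws` holds the simulated randomization distribution, so it can be handed to any plotting tool (for example `matplotlib.pyplot.hist`) to produce the requested histogram.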
For problems 3 through 6, use the data set in Lalonde2.RData. This data includes the treated subjects
from the first data set, along with a random sample of control subjects drawn from the Current Population
Survey. This combination breaks the randomized design of the study and yields observational data. The
initial goal of Lalonde’s project was to compare results from observational studies to those from randomized
studies.
3. Investigate the design of the observational data – compare the means of all variables across the treatment
groups. Are there any major differences?
4. Are there any positivity concerns with this data set?
5. Describe the exchangeability of the treatment groups.
6. Using the observational data set:
a. Compare the average 1978 income in the treatment and control groups.
b. Run a linear regression to predict 1978 income using treatment and all pre-treatment variables as
predictors. What is the difference in income uniquely attributable to treatment?
c. Compare your results from a and b to those from problem 2 above.
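For the covariate comparisons in problems 1 and 3, one common effect size is the standardized mean difference. The sketch below assumes that definition (difference in group means divided by the square root of the average of the two group variances); the assignment does not say which effect size is intended, so treat this choice as an assumption. The function name and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(values, treat):
    """(treated mean - control mean) / sqrt of the averaged group variances."""
    treated = [x for x, t in zip(values, treat) if t == 1]
    control = [x for x, t in zip(values, treat) if t == 0]
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Made-up illustrative data, NOT the Lalonde covariates.
treat = [1, 1, 1, 0, 0, 0]
covariates = {
    "age": [24, 26, 28, 30, 32, 34],
    "educ": [10, 11, 12, 10, 12, 14],
}
smd = {name: standardized_mean_difference(vals, treat)
       for name, vals in covariates.items()}
for name, value in smd.items():
    print(name, round(value, 3))
```

Because the measure is scale-free, it can be read the same way for age, education, and earnings; the threshold used to call a difference "substantial" is a judgment call that the analyst should state.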
Note: The variables in the Lalonde data sets are listed below. All except RE78 were recorded pre-treatment.
• treat: Treatment (occupational training) indicator
• age: Age at time of entry into occupational training
• educ: Years of education at time of occupational training
• black: Indicator for Black race
• hisp: Indicator for Hispanic ethnicity
• married: Indicator for marriage status (married or not)
• nodegree: Indicator for no high school diploma
• RE75: Earnings in 1975 (start of training)
• RE78: Earnings in 1978 (follow up)