EEP/IAS 118 – Problem Set 5
Due Friday, December 4 at 11:59PM.
Submit materials as one combined PDF on Gradescope. All work can be completed in this notebook. Run (Shift + Enter) all your answer cells before submission so that all your output is displayed.
Exercise 1. The Effect of Minimum Wage on Employment – Difference-in-Differences
Background
In this exercise, we are going to look at a classic paper in the labor economics literature (note that we are only giving you a subset of the data, so your results are not going to match the results in the paper). This paper answers a very important (and often controversial!) question in economic policy: does increasing the minimum wage increase unemployment (or conversely, reduce employment)? Proponents of minimum wage laws point to the benefits for individuals who remain employed in low-wage jobs. Opponents of minimum wage laws argue that the increases in labor costs result in higher unemployment because employers hire fewer employees to offset the cost increases. Card and Krueger (1994) test this latter hypothesis using the minimum wage increase in New Jersey that went into effect in 1992. They surveyed fast food establishments in New Jersey and Pennsylvania both before and after the policy came into effect, collecting information on wages, employment, and prices. While the interviews all occurred in 1992, the post-period interviews are coded as 1993 in your data for the sake of simplicity. These data are then used to obtain a difference-in-differences estimate of the effect of minimum wage laws on employment.
The dataset is saved as minwage_data.csv and contains the following variables:
| Variable Name | Description |
|---|---|
| $store\_id$ | Unique Store ID |
| $year$ | Year |
| $state$ | Dummy =1 if store is located in New Jersey, =0 for Pennsylvania |
| $shore$ | Dummy =1 if store is located on New Jersey Shore, =0 otherwise |
| $empft$ | Number of full-time employees in a store |
| $emppt$ | Number of part-time employees in a store |
| $nmgrs$ | Number of managers in a store |
| $wage\_st$ | Starting wage in a store |
| $pentree$ | Price of an entree |
| $fte$ | Number of full-time equivalent employees in a store ($empft + 0.5 \times emppt + nmgrs$) |
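As a quick check on the $fte$ definition, the sketch below recomputes it from $empft$, $emppt$, and $nmgrs$ for three of the 1992 stores shown in the Question 1 data preview. The helper name `compute_fte` is ours for illustration and is not part of the assignment.

```r
# Hypothetical helper: full-time equivalents as defined in the variable table.
compute_fte <- function(empft, emppt, nmgrs) {
  empft + 0.5 * emppt + nmgrs
}

# Values copied from three rows of the head(minwage) preview (stores 449, 443, 496).
preview <- data.frame(
  store_id = c(449, 443, 496),
  empft    = c(0, 4, 10),
  emppt    = c(25, 19, 15),
  nmgrs    = c(3, 4, 3)
)
preview$fte_check <- compute_fte(preview$empft, preview$emppt, preview$nmgrs)
preview
```

The recomputed values (15.5, 17.5, 20.5) agree with the `fte` column reported for those stores.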
Question 1.
Generate a summary table with two columns and two rows: one column for New Jersey (treatment) and one for Pennsylvania (control), and one row for the pre-period (year 1992) and one for the post-period (year 1993). Within each cell, compute the mean number of full-time equivalent employees (the variable $fte$).
In [5]:
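library(haven)      # read_dta(), used for the Stata file in Exercise 2
library(tidyverse)  # dplyr verbs such as summarize()
# Load the store-level survey data and preview the first rows.
minwage <- read.csv("minwage_data.csv")
head(minwage)
# Split the data into the pre-period (1992) and post-period (1993) waves.
minwage92 <- subset(minwage, year == 1992)
minwage93 <- subset(minwage, year == 1993)
# Note: merging on state alone pairs every 1992 row with every 1993 row from the
# same state (store 449 repeats in the preview of minwagewide). Add "store_id"
# to `by` if one row per store is intended.
minwagewide <- merge(minwage92, minwage93, by = c("state"), suffixes = c("92", "93"))
head(minwagewide)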
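A data.frame: 6 × 10

| | store_id | year | state | shore | empft | emppt | nmgrs | wage_st | pentree | fte |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 449 | 1992 | 0 | 0 | 0 | 25 | 3 | 4.87 | 0.00 | 15.5 |
| 2 | 462 | 1992 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.0 |
| 3 | 443 | 1992 | 0 | 0 | 4 | 19 | 4 | 4.25 | 1.80 | 17.5 |
| 4 | 496 | 1992 | 0 | 0 | 10 | 15 | 3 | 5.00 | 0.73 | 20.5 |
| 5 | 46 | 1992 | 0 | 0 | 30 | 15 | 3 | 0.00 | 0.52 | 40.5 |
| 6 | 488 | 1992 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.0 |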
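A data.frame: 6 × 19

| | state | store_id92 | year92 | shore92 | empft92 | emppt92 | nmgrs92 | wage_st92 | pentree92 | fte92 | store_id93 | year93 | shore93 | empft93 | emppt93 | nmgrs93 | wage_st93 | pentree93 | fte93 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 449 | 1992 | 0 | 0 | 25 | 3 | 4.87 | 0 | 15.5 | 449 | 1993 | 0 | 15.0 | 15 | 5 | 5.00 | 0.73 | 27.5 |
| 2 | 0 | 449 | 1992 | 0 | 0 | 25 | 3 | 4.87 | 0 | 15.5 | 462 | 1993 | 0 | 1.0 | 25 | 4 | 4.75 | 0.89 | 17.5 |
| 3 | 0 | 449 | 1992 | 0 | 0 | 25 | 3 | 4.87 | 0 | 15.5 | 443 | 1993 | 0 | 5.0 | 20 | 4 | 4.35 | 1.81 | 19.0 |
| 4 | 0 | 449 | 1992 | 0 | 0 | 25 | 3 | 4.87 | 0 | 15.5 | 496 | 1993 | 0 | 10.0 | 10 | 3 | 4.75 | 0.89 | 18.0 |
| 5 | 0 | 449 | 1992 | 0 | 0 | 25 | 3 | 4.87 | 0 | 15.5 | 46 | 1993 | 0 | 3.5 | 35 | 3 | 4.30 | 0.94 | 24.0 |
| 6 | 0 | 449 | 1992 | 0 | 0 | 25 | 3 | 4.87 | 0 | 15.5 | 488 | 1993 | 0 | 5.0 | 15 | 3 | 4.75 | 0.94 | 15.5 |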
Add your written answer for Question 1 here
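One way to build a 2 × 2 table of cell means is sketched below with base R's `tapply()`. The data frame `toy` holds made-up numbers (it is not the Card and Krueger sample), so the output only illustrates the layout, not the answer.

```r
# Toy data (made-up numbers): two stores per state-year cell.
toy <- data.frame(
  state = c(1, 1, 1, 1, 0, 0, 0, 0),  # 1 = New Jersey, 0 = Pennsylvania
  year  = c(1992, 1992, 1993, 1993, 1992, 1992, 1993, 1993),
  fte   = c(20, 22, 21, 23, 24, 26, 22, 24)
)

# tapply() averages fte within each year-by-state cell and returns a 2 x 2 matrix
# with rows "1992", "1993" and columns "0", "1".
cell_means <- tapply(toy$fte, list(toy$year, toy$state), mean)

# Put New Jersey (treatment) first, then relabel rows and columns.
cell_means <- cell_means[, c("1", "0")]
dimnames(cell_means) <- list(
  period = c("pre (1992)", "post (1993)"),
  group  = c("New Jersey (treatment)", "Pennsylvania (control)")
)
cell_means
```

With the real data, replacing `toy` by `minwage` should give the same table shape, assuming `state`, `year`, and `fte` are coded as described in the variable table.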
Question 2.
State the difference-in-differences estimator for the change in full-time equivalent employees in terms of the following quantities: $\bar Y_{NJ, pre}, \bar Y_{NJ, post}, \bar Y_{Penn, pre}, \bar Y_{Penn, post}$, where $\bar Y$ refers to the mean of $fte$. Using the means reported in Question 1, calculate a value for the estimator you just proposed.
In [20]:
#Input your code for Question 2 here
Add your written answer for Question 2 here
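The standard difference-in-differences estimator subtracts the control group's change from the treatment group's change, $(\bar Y_{NJ, post} - \bar Y_{NJ, pre}) - (\bar Y_{Penn, post} - \bar Y_{Penn, pre})$. A minimal sketch of that arithmetic follows; the function name `did_estimate` and the four input means are hypothetical placeholders, not values from the data.

```r
# Hypothetical helper: change in the treated group minus change in the control group.
did_estimate <- function(nj_pre, nj_post, penn_pre, penn_post) {
  (nj_post - nj_pre) - (penn_post - penn_pre)
}

# Made-up cell means, for illustration only.
did_estimate(nj_pre = 21, nj_post = 22, penn_pre = 25, penn_post = 23)
# (22 - 21) - (23 - 25) = 3
```

Plugging in the four means from your own Question 1 table gives the estimate asked for here.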
Question 3.
Let’s proceed with estimating the difference-in-differences estimator via a regression:
(a) Write an equation that will give you the difference-in-differences estimator for the impact of the minimum wage increase on full-time equivalent employees.
Add your written answer for Question 3a here
(b) Give the economic interpretation for each of the coefficients you wrote in part (a).
Add your written answer for Question 3b here
(c) Perform the estimation.
In [21]:
# Input your code for Question 3c here
(d) What do you conclude from the results of your estimation? Confirm that the results in this part are approximately the same as those in Question 2.
Add your written answer for Question 3d here
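A common regression form of difference-in-differences interacts a treatment-group dummy with a post-period dummy; the coefficient on the interaction is the estimate. The sketch below uses a toy data frame with made-up numbers, and the `post` variable is our own construction (it does not exist in the dataset).

```r
# Toy data (made-up numbers): two stores per state-year cell.
toy <- data.frame(
  state = c(1, 1, 1, 1, 0, 0, 0, 0),  # 1 = New Jersey, 0 = Pennsylvania
  year  = c(1992, 1992, 1993, 1993, 1992, 1992, 1993, 1993),
  fte   = c(20, 22, 21, 23, 24, 26, 22, 24)
)

# Post-period indicator (assumed coding: 1993 = after the policy change).
toy$post <- as.numeric(toy$year == 1993)

# fte = b0 + b1*state + b2*post + b3*(state x post) + u; b3 is the DiD estimate.
did_fit <- lm(fte ~ state + post + state:post, data = toy)
summary(did_fit)

# The interaction coefficient equals the difference-in-differences of cell means.
coef(did_fit)[["state:post"]]
```

Because the model is saturated in the two dummies, the interaction coefficient reproduces the difference of cell-mean changes exactly (here 3), which is the comparison part (d) asks you to confirm with the real data.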
Question 4.
In this question, we will explore the identifying assumptions for the difference-in-differences estimator.
(a) What key assumption do you need to make for your regression in Question 3 to estimate the causal effect of minimum wage laws?
Add your written answer for Question 4a here.
(b) What additional data might you need to provide evidence for this assumption?
Add your written answer for Question 4b here.
Question 5.
Let’s say that we wanted to estimate the effect of minimum wage laws on full-time equivalent employment ($fte$), but we only had data from New Jersey. Using a simple difference in means, estimate and interpret the effect of minimum wage laws on full-time equivalent employment in New Jersey. Make sure to test for significance.
In [22]:
#Input your code for Question 5 here
Add your written answer for Question 5 here
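A before/after difference in means with a significance test can be sketched with `t.test()`. The data frame `nj` below contains made-up numbers standing in for the New Jersey subset, and the choice of a Welch two-sample test is an assumption; a paired test might be more appropriate if the same stores are matched across years.

```r
# Toy New Jersey-only data (made-up numbers): four stores per year.
nj <- data.frame(
  year = rep(c(1992, 1993), each = 4),
  fte  = c(20, 22, 19, 23, 21, 24, 22, 25)
)

pre  <- nj$fte[nj$year == 1992]
post <- nj$fte[nj$year == 1993]

# Simple difference in means: post-period average minus pre-period average.
diff_means <- mean(post) - mean(pre)

# Welch two-sample t-test (t.test() default, unequal variances allowed).
test <- t.test(post, pre)

diff_means
test$p.value
```

With the real data, `nj` would be `subset(minwage, state == 1)`, assuming `state == 1` marks New Jersey as in the variable table.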
Question 6.
In no more than 3 sentences, compare your conclusion in Question 3 to your conclusion in Question 5. If you draw different conclusions from the results of Questions 3 and 5, what might explain this difference (i.e., why is one estimator preferable)?
Add your written answer for Question 6 here
Exercise 2: Schoolbus Replacements and Attendance – Panel Regression
Background
In this problem, we will look at the effect of replacing highly-polluting school buses on students’ health. Many school districts in California, particularly less wealthy school districts, have school buses that are many decades old. These buses do not have many of the pollution controls that are now standard in vehicles, exposing the students that ride them to high concentrations of pollutants. In 2006, the state of California passed a proposition that allocated funds towards replacing the oldest of these school buses with new models that had adequate pollution controls. We have data for these replacements for the years 2009-2012, with the number of replacements per year more or less increasing over the sample period. This data is combined with attendance data from all school districts in California over the same period to test the impact of reducing pollution exposure through bus replacements on student health. Attendance is used to measure student health because students who are chronically ill are often absent from school. The full dataset is described in detail below.
The dataset Schoolbuses_PS5.dta is an unbalanced panel of 209 school districts for the years 2009-2012, and contains the following variables:
| Variable Name | Description |
|---|---|
| $district\_code$ | Unique School District Identifier |
| $year$ | Year |
| $bus\_replace$ | Number of Buses Replaced |
| $attendance$ | Percent of Students in Attendance |
| $gifted$ | Number of Students in the Gifted Student Program |
| $white$ | Number of White Students |
| $college$ | Number of Students with Parents that Attended College |
| $advtgd$ | Number of Students from Higher Socio-Economic Backgrounds |
| $fleet\_size$ | Number of Buses in the District Fleet |
| $pupils\_trans$ | Average Number of Students Traveling per Day |
| $enrollment$ | Number of Students Enrolled in the District |
Some summary statistics are provided below
In [23]:
# Load the district-year panel (Stata format; read_dta() comes from haven).
schooldata <- read_dta("Schoolbuses_PS5.dta")
head(schooldata)
# Summary statistics: mean, SD, min, and max for three variables.
busrep <- summarize(schooldata,
                    mean = mean(bus_replace),
                    sd   = sd(bus_replace),
                    min  = min(bus_replace),
                    max  = max(bus_replace))
enroll <- summarize(schooldata,
                    mean = mean(enrollment),
                    sd   = sd(enrollment),
                    min  = min(enrollment),
                    max  = max(enrollment))
# fleet_size has missing values, so drop them before summarizing.
fleet <- summarize(schooldata,
                   mean = mean(fleet_size, na.rm = TRUE),
                   sd   = sd(fleet_size, na.rm = TRUE),
                   min  = min(fleet_size, na.rm = TRUE),
                   max  = max(fleet_size, na.rm = TRUE))
# Stack the three one-row summaries and label each row.
ss <- rbind(busrep, enroll, fleet)
sumstats <- cbind(c("Bus Replacements", "Enrollment", "Fleet Size"), ss)
names(sumstats)[1] <- "Variable"
print('Sum Stats')
sumstats
A tibble: 6 × 14

| district_code | year | bus_replace | attendance | gifted | white | college | advtgd | fleet_size | pupils_trans | enrollment | Latitude | Longitude | rand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5271548 | 2012 | 4.907253 | 96.82945 | 0 | 93 | 0 | 54 | 4 | 316.6 | 387 | 40.06913 | -122.1892 | 0.001363096 |
| 561556 | 2009 | 0.000000 | 91.62920 | 121 | 509 | 126 | 442 | 6 | 238.3 | 863 | 38.07851 | -120.5525 | 0.003052204 |
| 1363115 | 2011 | 0.000000 | 95.46870 | 236 | 177 | 390 | 984 | 15 | 437.5 | 4009 | 32.78138 | -115.5507 | 0.003440836 |
| 1463297 | 2012 | 11.750881 | 105.28787 | 1 | 19 | 2 | 30 | 4 | 20.0 | 66 | 36.80253 | -118.1967 | 0.005196966 |
| 4770359 | 2010 | 0.000000 | 103.56757 | 11 | 19 | 0 | 1 | 1 | 24.0 | 37 | 41.91049 | -122.5622 | 0.005227630 |
| 961945 | 2011 | 2.060723 | 93.98731 | 0 | 244 | 57 | 163 | 4 | 204.0 | 394 | 38.60527 | -120.7119 | 0.007334597 |
[1] "Sum Stats"
A data.frame: 3 × 5

| Variable | mean | sd | min | max |
|---|---|---|---|---|
| Bus Replacements | 2.524550 | 3.476494 | 0 | 12.92825 |
| Enrollment | 964.702000 | 1505.856060 | 7 | 12931.00000 |
| Fleet Size | 5.073948 | 5.039842 | 0 | 30.00000 |
Question 1.
You think that it might be important to control for the year in your regression of attendance on bus replacements. First, generate year dummy variables $(yr_{2009}, yr_{2010}, yr_{2011}, yr_{2012})$. Next, estimate the following equation for school attendance.
\begin{align*} attendance_{it} = \beta_0+ \beta_1 bus\_replace_{it} + \beta_2white_{it} &+ \beta_3college_{it} + \beta_4advtgd_{it} + \beta_5gifted_{it} \ \ \ \ \ \ (1) \\ &+ \delta_1yr_{2010} + \delta_2yr_{2011} + \delta_3yr_{2012} + u_{it} \end{align*}
(a) Estimate the model and report your results.
In [24]:
#Input your code for Question 1 here
Add any written answer for Question 1a here
(b) Give the meaning (economic interpretation) of $\beta_0$ and $\delta_{1}$.
Add your written answer for Question 1b here
(c) Interpret (SSS: sign, size, significance) your estimate for $\beta_1$.
Add your written answer for Question 1c here
(d) Why is the year 2009 dummy excluded?
Add your written answer for Question 1d here
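The year-dummy step in this question can be sketched as follows. The data frame `toy` contains made-up numbers with only one regressor besides the year dummies, so it illustrates the mechanics rather than equation (1) in full. The sketch also shows that hand-built dummies and `factor(year)` give the same coefficients, with 2009 as the omitted base year in both cases.

```r
# Toy panel (made-up numbers): three districts observed in 2009-2012.
toy <- data.frame(
  year        = rep(2009:2012, times = 3),
  bus_replace = c(0, 1, 2, 3, 1, 0, 2, 4, 0, 2, 1, 3),
  attendance  = c(94, 95, 96, 97, 93, 94, 96, 98, 95, 95, 96, 97)
)

# Hand-built year dummies; 2009 is left out as the base category.
toy$yr_2010 <- as.numeric(toy$year == 2010)
toy$yr_2011 <- as.numeric(toy$year == 2011)
toy$yr_2012 <- as.numeric(toy$year == 2012)

fit_dummies <- lm(attendance ~ bus_replace + yr_2010 + yr_2011 + yr_2012, data = toy)

# Equivalent shortcut: factor(year) creates the same dummies and drops the first level.
fit_factor <- lm(attendance ~ bus_replace + factor(year), data = toy)

# Same estimates under different coefficient names.
all.equal(unname(coef(fit_dummies)), unname(coef(fit_factor)))
```

Including all four year dummies alongside the intercept would make the regressors perfectly collinear, which is why one year has to be dropped.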
Question 2.
Consider now the following (unobserved) fixed effects model:
\begin{align*} attendance_{it} =\beta_0+ \beta_1 bus\_replace_{it} + \beta_2white_{it} + \beta_3college_{it} + \beta_4advtgd_{it}+ \beta_5gifted_{it} + \boldsymbol{\delta}_t+\mathbf{a_i} +u_{it} \ \ \ (2) \end{align*}
(a) What is the interpretation of the vector of fixed effects terms $\mathbf{a_{i}}$?
Add your written answer for Question 2a here
(b) Why are we adding these fixed effects, as opposed to estimating model (1)? In other words, what do these fixed effects control for in the regression?
Add your written answer for Question 2b here
(c) Estimate the model and interpret $\hat \beta_1$ (remember SSS).
In [25]:
# Input your code for Question 2c here
Add your written answer for Question 2c here
(d) Comment specifically on how the size of $\hat \beta_1$ changes from model (1) to model (2) and explain why you might have expected it to go in this direction.
Add your written answer for Question 2d here
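Two equivalent ways to estimate a model with district fixed effects and year effects are sketched below in base R: dummy variables for every district (LSDV), and the within transformation that subtracts district means. The data frame `toy` and its `district` column are made-up stand-ins for the real panel, and only `bus_replace` is included as a regressor to keep the example short.

```r
# Toy panel (made-up numbers): three districts observed in 2009-2012.
toy <- data.frame(
  district    = rep(c("A", "B", "C"), each = 4),
  year        = rep(2009:2012, times = 3),
  bus_replace = c(0, 1, 2, 3, 1, 0, 2, 4, 0, 2, 1, 3),
  attendance  = c(94, 95, 96, 97, 90, 90, 92, 94, 97, 98, 98, 99)
)

# (1) LSDV: one dummy per district and per year (first level of each is the base).
fit_lsdv <- lm(attendance ~ bus_replace + factor(year) + factor(district), data = toy)

# (2) Within transformation: subtract district means from the outcome and from
#     every regressor (including the year dummies), then regress without an intercept.
X    <- model.matrix(~ bus_replace + factor(year), data = toy)[, -1]
X_dm <- apply(X, 2, function(col) col - ave(col, toy$district))
y_dm <- toy$attendance - ave(toy$attendance, toy$district)
fit_within <- lm(y_dm ~ X_dm - 1)

# The bus_replace coefficient is identical across the two approaches.
b_lsdv   <- unname(coef(fit_lsdv)["bus_replace"])
b_within <- unname(coef(fit_within)[1])
all.equal(b_lsdv, b_within)
```

The point estimates match, but the standard errors from the plain `lm()` on demeaned data do not account for the degrees of freedom used up by the district means. Packages built for panel data handle this automatically.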
Question 3.
What are the assumptions necessary for the parameters of model (2) to be unbiased? Do you think they are likely to hold? Whatever position you take, give your argument.
Add your written answer for Question 3 here
Downloading your Notebook
Download a PDF copy of your notebook by using File > Print Preview.
Then print and save as PDF.