Problem Set 3
Behavioral Economics, Boston College
Your name here
Set-up and background
The assignment is worth 100 points. There are 27 questions. You should have the following packages installed:
library(tidyverse)
library(patchwork)
library(fixest)
library(lmtest)
library(sandwich)
In this problem set you will summarize the paper “Do Workers Work More if Wages Are High? Evidence from a Randomized Field Experiment” (Fehr and Goette, AER 2007) and recreate some of its findings.
1 Big picture
[Q1] What is the main question asked in this paper?
[your answer here]
[Q2] Recall the taxi cab studies where reference dependence is studied using observational data. What can an experimental study do that an observational study can’t?
[your answer here]
[Q3] Summarize the field experiment design.
[your answer here]
[Q4] Summarize the laboratory experiment design. Why was it included with the study?
[your answer here]
[Q5] Summarize the main results of the field experiment.
[your answer here]
[Q6] Summarize the main results of the laboratory experiment.
[your answer here]
[Q7] Why are these results valuable? What have we learned? Motivate your discussion with a real-world example.
[your answer here]
2 Replication
Use theme_classic() for all plots.
2.1 Correlations in revenues across firms
For this section please use dailycorrs.csv.
dailycorrs = read_csv("../data/fehr_goette_2007/dailycorrs.csv")
[Q8] The authors show that earnings at Veloblitz and Flash are correlated. Show this with a scatter plot with a regression line and no confidence interval. Label your axes and give the plot an appropriate title. Do not print the plot; assign it to an object called p1.
# your code here
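A minimal sketch of this plot type on simulated data. The column names rev_a and rev_b are hypothetical placeholders, not the names used in dailycorrs.csv, so check names(dailycorrs) first. Setting se = FALSE in geom_smooth() is what suppresses the confidence band around the regression line.

```r
library(ggplot2)

# Hypothetical toy data: correlated daily revenues for two firms (not the assignment data)
set.seed(1)
toy_daily <- data.frame(rev_a = rnorm(50, mean = 1000, sd = 150))
toy_daily$rev_b <- 0.8 * toy_daily$rev_a + rnorm(50, sd = 80)

p_scatter <- ggplot(toy_daily, aes(x = rev_a, y = rev_b)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # OLS line, confidence band suppressed
  labs(x = "Daily revenue, firm A", y = "Daily revenue, firm B",
       title = "Daily revenues of two firms") +
  theme_classic()
```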
[Q9] Next, plot the kernel density estimates of revenues for both companies. Overlay the distributions and make the densities transparent so that both are easy to see. Label your axes and give the plot an appropriate title. Do not print the plot; assign it to an object called p2.
# your code here
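Overlaid, semi-transparent densities are easiest to draw from long-format data, where a single column identifies the firm. The sketch below uses simulated revenues with hypothetical column names firm_a and firm_b; pivot_longer() turns them into one grouping column, and alpha in geom_density() controls transparency.

```r
library(ggplot2)
library(tidyr)

# Hypothetical toy data in wide format: one revenue column per firm
set.seed(2)
toy_wide <- data.frame(firm_a = rnorm(200, mean = 1000, sd = 150),
                       firm_b = rnorm(200, mean = 900, sd = 120))

# Reshape to long format so that firm becomes a grouping variable
toy_long <- pivot_longer(toy_wide, cols = c(firm_a, firm_b),
                         names_to = "firm", values_to = "revenue")

p_density <- ggplot(toy_long, aes(x = revenue, fill = firm)) +
  geom_density(alpha = 0.4) +  # alpha < 1 keeps both overlapping densities visible
  labs(x = "Daily revenue", y = "Density", fill = "Firm",
       title = "Distribution of daily revenues") +
  theme_classic()
```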
[Q11] Now combine both plots using library(patchwork) and label the plots with letters.
# your code here
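patchwork combines ggplot objects with | (side by side) or / (stacked), and plot_annotation(tag_levels = "A") adds letter tags to the panels. The two placeholder plots below use the built-in mtcars data purely for illustration; they stand in for p1 and p2.

```r
library(ggplot2)
library(patchwork)

# Two hypothetical placeholder plots built from a base R dataset
pa <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_classic()
pb <- ggplot(mtcars, aes(mpg)) + geom_density() + theme_classic()

# Place the plots side by side and tag the panels A and B
combined <- (pa | pb) + plot_annotation(tag_levels = "A")
```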
2.2 Tables 2 and 3
For this section please use tables1to4.csv.
# your code here
2.2.1 Table 2
On page 307 the authors write:
“Table 2 controls for individual fixed effects by showing how, on average, the messengers’ revenues deviate from their person-specific mean revenues. Thus, a positive number here indicates a positive deviation from the person-specific mean; a negative number indicates a negative deviation.”
[Q12] Fixed effects are a way to control for heterogeneity across individuals that is time invariant. Why would we want to control for fixed effects? Give an example of how bike messengers could differ from one another in ways that do not change over time.
[your written answer here]
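[Q13] Create a variable called totrev_fe and add it to the dataframe. This requires you to demean each messenger’s revenue, i.e. subtract each individual’s average revenue (across all blocks) from their revenue in each block: \(x_{it}^{fe} = x_{it} - \bar{x}_i\), where \(\bar{x}_i\) is the mean revenue of messenger \(i\) and \(x_{it}^{fe}\) is the fixed-effect (demeaned) revenue of \(i\) in block \(t\).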
# your code here
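A minimal sketch of the demeaning step on a six-row toy panel. It assumes, hypothetically, that the raw revenue column is called totrev (as the name totrev_fe suggests) and that messengers are identified by fahrer; check the actual column names in tables1to4.csv. Grouping by fahrer before mutate() makes mean() return each messenger’s own mean rather than the overall mean.

```r
library(dplyr)

# Hypothetical toy panel: 2 messengers observed in 3 blocks each
toy <- tibble(
  fahrer = c(1, 1, 1, 2, 2, 2),
  block  = c(1, 2, 3, 1, 2, 3),
  totrev = c(100, 120, 110, 300, 280, 320)
)

toy <- toy %>%
  group_by(fahrer) %>%                           # one group per messenger i
  mutate(totrev_fe = totrev - mean(totrev)) %>%  # x_it minus messenger mean
  ungroup()
```

Within each messenger the demeaned values sum to zero, which is a quick sanity check to run on the real data as well.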
[Q14] Use summarise() to recreate the findings in Table 2 for “Participating Messengers” using your new variable totrev_fe. (You do not have to calculate the differences in means.)
In addition to calculating the fixed-effect-controlled means, calculate their standard errors. Recall that the standard error of a mean is \(\frac{s_{jt}}{\sqrt{n_{jt}}}\), where \(s_{jt}\) is the standard deviation for treatment \(j\) in block \(t\) and \(n_{jt}\) is the corresponding number of observations.
(Hint: use n() to count observations.) Store each calculation in a new, named variable. Assign the resulting dataframe to an object called df_avg_revenue.
# your code here
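The standard-error formula can be checked on a small hypothetical table of demeaned revenues. The grouping variables high and block mirror names used elsewhere in this problem set, but the values are made up. summarise() can refer to columns created earlier in the same call, so se_rev reuses sd_rev and n_obs.

```r
library(dplyr)

# Hypothetical demeaned revenues for two treatment groups in two blocks
toy_fe <- tibble(
  high      = c(0, 0, 0, 0, 1, 1, 1, 1),
  block     = c(1, 1, 2, 2, 1, 1, 2, 2),
  totrev_fe = c(-4, 4, 6, 10, -2, 2, -8, -12)
)

toy_summary <- toy_fe %>%
  group_by(high, block) %>%
  summarise(
    mean_rev = mean(totrev_fe),
    sd_rev   = sd(totrev_fe),
    n_obs    = n(),
    se_rev   = sd_rev / sqrt(n_obs),  # s_jt / sqrt(n_jt)
    .groups  = "drop"
  )
```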
[Q15] Plot df_avg_revenue. Use points for the means and error bars for standard errors of the means.
To dodge the points and size them appropriately, use
geom_point(position=position_dodge(width=0.5), size=4)
To place the error bars use
geom_errorbar(aes(
x=block,
ymin = [MEAN] - [SE], ymax = [MEAN] + [SE]),
width = .1,
position=position_dodge(width=0.5))
You will need to replace [MEAN] with whatever you named your average revenues and [SE] with whatever you named your standard errors.
# your code here
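Here the two layers are combined on a hypothetical summary table, with mean_rev and se_rev standing in for [MEAN] and [SE] and high as the treatment indicator. Mapping high to colour gives position_dodge() a grouping variable, and using the same dodge width in both layers keeps each error bar centred on its point.

```r
library(ggplot2)

# Hypothetical summary table: mean and SE by treatment group and block
toy_summary <- data.frame(
  high     = factor(c(0, 0, 1, 1)),
  block    = factor(c(1, 2, 1, 2)),
  mean_rev = c(0, 8, 0, -10),
  se_rev   = c(4, 2, 2, 2)
)

p_means <- ggplot(toy_summary, aes(x = block, y = mean_rev, colour = high)) +
  geom_point(position = position_dodge(width = 0.5), size = 4) +
  geom_errorbar(aes(ymin = mean_rev - se_rev, ymax = mean_rev + se_rev),
                width = 0.1,
                position = position_dodge(width = 0.5)) +  # same width as the points
  labs(x = "Block", y = "Deviation from messenger mean revenue",
       colour = "High wage") +
  theme_classic()
```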
[Q16] Interpret the plot.
[your written answer here]
2.2.2 Table 3
[Q17] Recreate the point estimates in Model (1) in Table 3 by hand, i.e. by demeaning the variables yourself (you don’t need to worry about the standard errors). Assign the fitted model to the object m1. Recreating this model requires you to control for individual fixed effects and estimate the following equation, where \(\text{H}\) is the variable high, \(\text{B2}\) is the second block (block == 2) and \(\text{B3}\) is the third block (block == 3):
\[
y_{ijt} - \bar{y}_{ij} = \beta_1 (\text{H}_{ijt} - \bar{\text{H}}_{ij}) + \beta_2 (\text{B2}_{ijt} - \bar{\text{B2}}_{ij}) + \beta_3 (\text{B3}_{ijt} - \bar{\text{B3}}_{ij}) + (\varepsilon_{ijt} - \bar{\varepsilon}_{ij})
\]
# your code here
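The within transformation in the equation above can be sketched on simulated data. Everything here is hypothetical: 20 messengers, 3 blocks, a true high-wage effect of 10, and a design in which half the messengers receive the high wage in block 2 and the other half in block 3. across() demeans all four variables within fahrer in one step, and the regression omits the intercept because every demeaned variable has mean zero.

```r
library(dplyr)

# Hypothetical simulated panel: 20 messengers (fahrer) x 3 blocks
set.seed(42)
alpha <- rnorm(20, sd = 50)  # time-invariant messenger effects
toy_panel <- tibble(fahrer = rep(1:20, each = 3), block = rep(1:3, times = 20)) %>%
  mutate(
    b2   = as.integer(block == 2),
    b3   = as.integer(block == 3),
    high = as.integer((fahrer <= 10 & block == 2) | (fahrer > 10 & block == 3)),
    y    = alpha[fahrer] + 10 * high + 5 * b2 - 3 * b3 + rnorm(n())
  )

# Within transformation: subtract each messenger's mean from every variable
toy_dm <- toy_panel %>%
  group_by(fahrer) %>%
  mutate(across(c(y, high, b2, b3), ~ .x - mean(.x), .names = "{.col}_dm")) %>%
  ungroup()

# OLS on the demeaned variables, without an intercept
m_hand <- lm(y_dm ~ 0 + high_dm + b2_dm + b3_dm, data = toy_dm)
coef(m_hand)
```

Because lm() does not know that 20 messenger means were estimated in the demeaning step, its residual degrees of freedom are too large, so the standard errors reported for m_hand are understated. This is one reason to focus only on the point estimates here.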
[Q18] Now recreate the same point estimates using lm and assign the model to the object m2. You are estimating the model below, where \(\text{F}_i\) is the dummy variable for messenger \(i\) (fahrer). Make sure to cluster the standard errors at the messenger level. (Use lmtest and sandwich for this.)
\[
y_{ijt} = \beta_0 + \beta_1 \text{H}_{ijt} + \beta_2 \text{B2}_{ijt} + \beta_3 \text{B3}_{ijt} + \sum_{i=1}^{n} \alpha_i \text{F}_i + \varepsilon_{ijt}
\]
# your code here
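A sketch of the dummy-variable regression with messenger-clustered standard errors, on hypothetical simulated data (20 messengers, 3 blocks, true high-wage effect of 10). vcovCL() from sandwich builds the cluster-robust covariance matrix, and coeftest() from lmtest reports the coefficients with those standard errors; only the three coefficients of interest are printed.

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovCL()

# Hypothetical simulated panel: 20 messengers x 3 blocks
set.seed(42)
toy_panel <- data.frame(fahrer = rep(1:20, each = 3), block = rep(1:3, times = 20))
toy_panel$b2   <- as.integer(toy_panel$block == 2)
toy_panel$b3   <- as.integer(toy_panel$block == 3)
toy_panel$high <- as.integer((toy_panel$fahrer <= 10 & toy_panel$block == 2) |
                             (toy_panel$fahrer > 10 & toy_panel$block == 3))
alpha <- rnorm(20, sd = 50)  # time-invariant messenger effects
toy_panel$y <- alpha[toy_panel$fahrer] + 10 * toy_panel$high +
  5 * toy_panel$b2 - 3 * toy_panel$b3 + rnorm(nrow(toy_panel))

# factor(fahrer) adds one dummy per messenger; lm() drops the first level
# because a full set of dummies would be collinear with the intercept
m_dummy <- lm(y ~ high + b2 + b3 + factor(fahrer), data = toy_panel)

# Cluster-robust covariance matrix with messengers as clusters
vc_cl <- vcovCL(m_dummy, cluster = ~ fahrer)
coeftest(m_dummy, vcov. = vc_cl)[c("high", "b2", "b3"), ]
```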
[Q20] Now use feols to recreate Model (1), including the standard errors. Assign your estimates to the object m3. You are estimating the model below, where \(\alpha_i\) is the individual intercept (i.e. the individual fixed effect):
\[
y_{ijt} = \alpha_i + \beta_1 \text{H}_{ijt} + \beta_2 \text{B2}_{ijt} + \beta_3 \text{B3}_{ijt} + \varepsilon_{ijt}
\]
# your code here
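The same kind of hypothetical simulated panel can be estimated with feols(). The fixed effect goes after the | in the formula and is absorbed rather than estimated as explicit dummies, and cluster = ~ fahrer requests standard errors clustered by messenger.

```r
library(fixest)

# Hypothetical simulated panel: 20 messengers x 3 blocks
set.seed(42)
toy_panel <- data.frame(fahrer = rep(1:20, each = 3), block = rep(1:3, times = 20))
toy_panel$b2   <- as.integer(toy_panel$block == 2)
toy_panel$b3   <- as.integer(toy_panel$block == 3)
toy_panel$high <- as.integer((toy_panel$fahrer <= 10 & toy_panel$block == 2) |
                             (toy_panel$fahrer > 10 & toy_panel$block == 3))
alpha <- rnorm(20, sd = 50)  # time-invariant messenger effects
toy_panel$y <- alpha[toy_panel$fahrer] + 10 * toy_panel$high +
  5 * toy_panel$b2 - 3 * toy_panel$b3 + rnorm(nrow(toy_panel))

# Messenger fixed effect absorbed after `|`; SEs clustered by messenger
m_fe <- feols(y ~ high + b2 + b3 | fahrer, data = toy_panel, cluster = ~ fahrer)
summary(m_fe)
```

The coefficient on high matches the dummy-variable lm() fit up to numerical precision. The clustered standard errors can differ from those obtained with vcovCL() on the dummy model, because fixest and sandwich apply different small-sample corrections (for example, in how fixed effects nested within clusters are counted).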
[Q21] Compare the estimates in m1, m2 and m3. What is the same? What is different? What would you say is the main advantage of using feols()?
[your written answer]
[Q22] Explain why you need to cluster the standard errors.
[your written answer]