ECON 322: Econometric Analysis 1 Final data project: Winter 2018
General instructions
This last assignment is due on Wednesday April 4 before 11:30pm on Learn. It is a small research project and you will be evaluated on your ability to correctly use the different concepts covered during the term. It is worth approximately 3 assignments (8.5 points out of 25). The drop box will not close, so you will be allowed to submit the project late. The rule for late submissions is as follows: if you are between 0.01 and 60 minutes late you get a 2/10 penalty, between 60.01 and 120 minutes late you get a 5/10 penalty, and you get a 10/10 penalty if you submit the project more than two hours late. To avoid penalties, do not wait until the last minute to upload it. Notice that there is no justification for not submitting this assignment (I remind you that doctor's notes for being sick around the due date are not accepted for assignments).
For the final assignment, I want it to be organized like a report. I want the code and R output along with the comments and discussion in the same pdf file. If you upload your document in any other format (.doc, .docx, …), I will not mark it. If you want to see what I expect from you, download the document FinalProjectSolution W17.pdf that I uploaded in the folder “Assignment and Exam Solutions”. It is the suggested solution from last year's project. If you only put the code and output with no discussion (one sentence is not considered a discussion), you get 0 out of 10 points. To obtain the full mark, you need to justify what you are doing (choice of the model, tests, etc.), analyze the results (interpretation of the coefficients, discussion about the validity of your results, etc.), and show me that you know how to use the different concepts covered in class. The more concepts you use, the higher your mark will be.
The project
The goal of this project is to test whether wearing seat belts has an impact on the number of traffic fatalities. The data file seatbelts.rda for this project is in the Final Project folder of Learn. It was used by A. Cohen and L. Einav in 2003 in their research paper “The Effects of Mandatory Seat Belt Laws on Driving Behavior and Traffic Fatalities”.
The data set is a panel of 51 states (actually, 50 states plus the District of Columbia), running from 1983 to 1997. The variables are:
• year: indicating year.
• state: indicating US state (abbreviation).
• fatalities: number of fatalities per million of traffic miles.
• seatbelt: seat belt usage rate, as self-reported by state population surveyed.
• speed65: Is there a 65 mile per hour speed limit?
• speed70: Is there a 70 (or higher) mile per hour speed limit?
• drinkage: Is there a minimum drinking age of 21 years?
• alcohol: Is there a maximum of 0.08 blood alcohol content?
Econ 322 Final Project Page 1 of 5
• income: median per capita income (in current US dollars).
• age: mean age.
• enforce: indicating seat belt law enforcement (“no”, “primary”, “secondary”). The definition of the different enforcement levels is given on the Governors Highway Safety website. Basically, primary enforcement means that officers can issue a ticket for not wearing a seat belt even if there is no other traffic infraction. For secondary enforcement, there must be another traffic infraction before officers can issue a ticket for not wearing a seat belt.
Part I
In the first part of the project, we want to estimate a model year by year. It is not the best way when we have a panel because it is more efficient to use all the data in a single model, but since we have not covered how to estimate panel data models, it is a good way to start.
As you can see, the proportion of states that adopted a seat belt law went from 0% in 1983 to 100% in 1997. Not all States, however, chose the same level of enforcement.
law <- matrix(data$enforce, nrow=15) ## create a matrix num-years x num-states
no <- rowSums(law=="no")
prim <- rowSums(law=="primary")
sec <- rowSums(law=="secondary")
ylim <- range(c(no, sec, prim))
plot(1983:1997, no, xlab="year", ylab="states", type="l", col=1, ylim=ylim,
     main="number of states with seat belt laws")
lines(1983:1997, sec, type="l", col=2)
lines(1983:1997, prim, type="l", col=3)
legend("topright", c("no", "secondary", "primary"), col=1:3, lty=1)
[Figure: number of states with seat belt laws, by enforcement level (no, secondary, primary). x-axis: year (1983 to 1997); y-axis: states.]
It may therefore be difficult to base the model selection on the first few years because the number of states with a seat belt law was too small. For that reason, use only the year 1987 to select your model. We want to compare the effect of the law on the number of fatalities. For now, just distinguish States with and without a seat belt law. For that, create a dummy variable equal to 1 if there is a seat belt law and 0 otherwise:
data$law <- as.numeric(data$enforce != "no")
The dependent variable is fatalities and we want to measure the effect of law on it by controlling for the right variables and by using the appropriate functional form. The selection process should include the following (not necessarily in that order).
- Discussion on which variable should be included and why.
- Discussion on how each variable should enter the model (in log, with interactions, squared, etc.). It may not be obvious for all variables, but try your best.
- Estimate the model (or models if you have more than one in mind)
- Test for correct specification (Chapter 9), homoscedasticity (Chapter 8).
- Any other things to look for before going to the interpretation part?
- Interpret the results and discuss the possible weaknesses of the model.
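As an illustration of the two tests named in the list above, here is a minimal base R sketch of a RESET test for functional form and a Breusch-Pagan test for heteroscedasticity, both computed by hand. The data frame sim is simulated and only stands in for one year of the real data; its scale and the regressors law and income are assumptions chosen for the example, not a suggested model.

```r
## Simulated stand-in for one cross-section of 51 "states"; NOT the course data.
set.seed(322)
n <- 51
sim <- data.frame(law = rbinom(n, 1, 0.5),
                  income = runif(n, 10, 30))
sim$fatalities <- 3 - 0.2 * sim$law - 0.05 * sim$income + rnorm(n, sd = 0.3)

fit <- lm(fatalities ~ law + income, data = sim)

## RESET: add powers of the fitted values and test them jointly with an F test.
sim$yhat <- fitted(fit)
fit_reset <- lm(fatalities ~ law + income + I(yhat^2) + I(yhat^3), data = sim)
reset <- anova(fit, fit_reset)
reset_p <- reset[2, "Pr(>F)"]

## Breusch-Pagan (LM version): regress the squared residuals on the regressors;
## n * R-squared is compared with a chi-square with 2 degrees of freedom
## (the number of regressors in the auxiliary regression).
sim$u2 <- resid(fit)^2
aux <- lm(u2 ~ law + income, data = sim)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

c(RESET = reset_p, BP = bp_p)
```

A small p-value for RESET points to a misspecified functional form, and a small p-value for Breusch-Pagan points to heteroscedasticity. Packaged versions of these tests exist, but computing them by hand keeps the sketch free of extra dependencies.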
Once your model is selected, estimate the effect of the law on fatalities, for all years. To present the results, produce one graph on which the estimated effect of law and its confidence interval are presented in a time series format. If your model has interactions between law and other variables, you need to compute the average partial effect of law for each year and its confidence interval. Discuss the results.
Hint: Here is an example of how to do it for the simplest possible model:
form <- fatalities~law
res <- vector()
for (y in 1983:1997)
{
    reg <- lm(form, subset(data, year==y))   ## one regression per year
    conf <- confint(reg, 2)                  ## confidence interval of the law coefficient
    ans <- c(conf[1], coef(reg)[2], conf[2]) ## lower bound, estimate, upper bound
    res <- rbind(res, ans)
}
matplot(1983:1997, res, lty=c(2,1,2), col=c(2,1,2), lwd=2, type="l", xlab="year", ylab=expression(beta[1]),
        main="Effect of Seat Belt law on traffic\n fatalities")
abline(h=0)
[Figure: Effect of Seat Belt law on traffic fatalities. x-axis: year; y-axis: β1.]
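When the selected model interacts law with another variable, the coefficient of law alone is no longer the effect of interest. The sketch below shows one way to compute the average partial effect and its confidence interval for a single year. It rests on assumptions that are not part of the project: the data are simulated, the interaction is with income, and the sample mean of income is treated as fixed so that the average partial effect is a linear combination of the coefficients.

```r
## Simulated stand-in for one year of data; NOT the course data.
set.seed(1987)
n <- 51
sim <- data.frame(law = rbinom(n, 1, 0.5),
                  income = runif(n, 10, 30))
sim$fatalities <- 3 - 0.4 * sim$law - 0.05 * sim$income +
    0.01 * sim$law * sim$income + rnorm(n, sd = 0.3)

## Hypothetical model in which law interacts with income.
reg <- lm(fatalities ~ law * income, data = sim)

## APE of law = b_law + b_(law:income) * mean(income).
## With mean(income) treated as fixed, the variance of the APE is g' V g,
## where g holds the weights and V the matching block of vcov(reg).
g <- c("law" = 1, "law:income" = mean(sim$income))
b <- coef(reg)[names(g)]
V <- vcov(reg)[names(g), names(g)]
ape <- sum(g * b)
se <- sqrt(drop(t(g) %*% V %*% g))
crit <- qt(0.975, df.residual(reg))
ans <- c(lower = ape - crit * se, ape = ape, upper = ape + crit * se)
ans
```

Inside the year loop of the hint, a vector like ans could be stacked with rbind in the same way as the simple-model results, which gives the three columns that matplot needs.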
Part II
We have not learned how to estimate models with panel data, but we will ignore that in this part and proceed as if the data were cross-sectional. You should have realized in Part I that the sample size may be too small to identify the effect of the law on fatalities (it is not too late to add that to your previous discussion). One benefit of using panel data is the sample size. Since we have 51 states and 15 years, the sample size is equal to 765 when all years are used. There are, however, issues to take into consideration.
The main problem with panel data is that the year and state dimensions may hide unobserved heterogeneity that is relevant to the analysis. If we do not control for these unobserved characteristics, we may obtain biased estimators. We can control for unobserved year and state heterogeneity by including year and state indicators (or dummies). In R, it simply means that we have to add year and state in the regression. Dummy variables for years and states will automatically be created. Adding such dummy variables is called “adding year and state fixed effects” to the model. Notice that a model that incorporates these fixed effects will have 64 more coefficients (can you guess why it is not 66?). However, we are not interested in their values, so we do not print them in the final report. We only print the important coefficients, and add a comment that says that year and/or state fixed effects are included.
Another issue with panel data is the computation of the coefficient standard errors (covered in Chapter 8) and testing. It is very likely that you will need to compute robust standard errors and perform robust tests.
Use the same model you selected in Part I, and estimate it using all observations. Compare the effect of the seat belt law on the number of traffic fatalities when (i) neither year nor state fixed effects are included, (ii) only year fixed effects are included, and (iii) both year and state fixed effects are included. You can test if the model is correctly specified and test for heteroscedasticity once more, as the conclusion may differ when all years are used, but do not change your model (for simplicity, but if you want to try other things, go ahead, it is your project). Present and interpret the results. Which of the three models do you trust the most and why? Conclude with a discussion on the main finding of your study. Do you think it is a valid result? Can you think of a way to improve your model?
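To make the comparison of the three specifications concrete, here is a minimal base R sketch that estimates them on a simulated panel and reports the law coefficient with heteroscedasticity-robust standard errors. Everything in it is an assumption for illustration: the panel is simulated (10 units over 8 periods), the helper robust_se is hypothetical, and the HC1 (White) formula is only one possible choice of robust standard error.

```r
## Simulated stand-in panel (10 "states" x 8 "years"); NOT the course data.
set.seed(2018)
sim <- expand.grid(year = factor(1:8), state = factor(LETTERS[1:10]))
n <- nrow(sim)
state_fe <- rnorm(10)[as.integer(sim$state)]           # unobserved state effect
sim$law <- as.numeric(0.8 * state_fe + rnorm(n) > 0)   # adoption related to it
sim$fatalities <- 3 - 0.3 * sim$law + state_fe +
    rnorm(n, sd = 0.3 + 0.3 * sim$law)                 # heteroscedastic errors

## Hypothetical helper: HC1 robust standard errors computed by hand.
robust_se <- function(reg) {
    X <- model.matrix(reg)
    u <- resid(reg)
    n <- nrow(X); k <- ncol(X)
    bread <- solve(crossprod(X))      # (X'X)^{-1}
    meat <- crossprod(X * u)          # sum over i of u_i^2 x_i x_i'
    V <- bread %*% meat %*% bread * n / (n - k)
    sqrt(diag(V))
}

m1 <- lm(fatalities ~ law, data = sim)                 # (i) no fixed effects
m2 <- lm(fatalities ~ law + year, data = sim)          # (ii) year fixed effects
m3 <- lm(fatalities ~ law + year + state, data = sim)  # (iii) year and state

tab <- t(sapply(list(none = m1, year = m2, both = m3), function(m)
    c(estimate = unname(coef(m)["law"]),
      robust.se = unname(robust_se(m)["law"]))))
tab
```

In this simulation the adoption of the law is related to the unobserved state effect, so the estimate is expected to move once state dummies are added. Whether the same happens with the real data is for you to check and discuss.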
Hint: Here is how you print your results without the year and state fixed effects, using stargazer:
library(stargazer)
res <- lm(fatalities~law+year+state, data=data)
stargazer(res, type="text", omit=c("year","state"), digits=5)
##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                             fatalities
## -----------------------------------------------
## law                          -0.00059*
##                              (0.00031)
##
## Constant                    0.03181***
##                              (0.00066)
##
## -----------------------------------------------
## Observations                    765
## R2                            0.87794
## Adjusted R2                   0.86659
## Residual Std. Error     0.00225 (df = 699)
## F Statistic         77.35217*** (df = 65; 699)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
Part III
The main question is whether wearing a seat belt makes drivers feel safer and, as a result, more careless. A law dummy alone could therefore be the wrong variable to use. For this part, consider the model of Part II (same controls and same functional form) with the following variations:
• Instead of law, use the variable enforce (R will create a dummy for primary and one for secondary enforcement). Do you see a difference between the two levels of enforcement? Explain.
• Drivers can always choose not to wear a seat belt even if it is required. Therefore, use the variable seatbelt instead of law. Interpret the new result (the coefficient of seatbelt). Also, using the effects package, show on a graph how seat belt usage affects fatalities (you can also do any other analysis if you want). Interpret.
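For the first variation, the sketch below shows how a three-level factor enters lm as two dummies and how the difference between the two enforcement levels could be tested. The data are simulated, the effect sizes are made up, and the t test on the difference of the two coefficients is only one possible way to compare them; with the real model, the controls and fixed effects of Part II would also be included.

```r
## Simulated stand-in data; NOT the course data.
set.seed(3)
n <- 200
sim <- data.frame(enforce = factor(sample(c("no", "primary", "secondary"),
                                          n, replace = TRUE)))
effect <- c(no = 0, primary = -0.5, secondary = -0.2)   # made-up effects
sim$fatalities <- 3 + unname(effect[as.character(sim$enforce)]) +
    rnorm(n, sd = 0.5)

## "no" is the first level, so it is the omitted (baseline) category.
reg <- lm(fatalities ~ enforce, data = sim)
names(coef(reg))   # "(Intercept)" "enforceprimary" "enforcesecondary"

## t test of H0: effect of primary = effect of secondary.
d <- c(enforceprimary = 1, enforcesecondary = -1)
diff <- sum(d * coef(reg)[names(d)])
se <- sqrt(drop(t(d) %*% vcov(reg)[names(d), names(d)] %*% d))
tstat <- diff / se
pval <- 2 * pt(abs(tstat), df.residual(reg), lower.tail = FALSE)
c(diff = diff, se = se, t = tstat, p = pval)
```

Each dummy coefficient is measured relative to states with no law, so the difference between the two coefficients is what compares primary with secondary enforcement.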