E471: Econometric Theory and Practice I, Spring 2020
Assignment 2
Due: Thursday, February 27, 2020, 1:00 pm
Instructions:
• Please upload an electronic copy of your answers to Canvas combining all results in one pdf/word file. If there is any handwritten part, please scan it and include it with the rest of the answers.
• Please also upload your code to Canvas before the due time. The code accounts for 50% of the points of the empirical questions.
• You are allowed to collaborate in groups, but you are required to write up answers and code independently. Direct copying will be treated as cheating. Please write the names of your collaborators, if any, at the beginning of your work.
Questions:
1. This question is a sequel to Question 1 in Assignment 1. Recall that you downloaded the file Assignment1Data.csv, which contains three series: CITCRP, MARKET, and RKFREE with data from January 1978 to December 1987.
Recall the regression
$r_{c,i} - r_{f,i} = \beta_0 + \beta_1 (r_{m,i} - r_{f,i}) + \varepsilon_i.$ (1)
The capital asset pricing model (CAPM) suggests that $\beta_0$ should be zero. Using the R chunks from our lectures, extend your program to:
(a) compute a p-value for the null hypothesis $H_0: \beta_0 = 0$;
(b) construct 90%, 95%, and 99% confidence intervals for $\beta_0$ and $\beta_1$.
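The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the excess returns are simulated stand-ins (not the values in Assignment1Data.csv), the variable names are hypothetical, and the intervals use the t-distribution via `confint()`, which may differ slightly from a normal-approximation version used in the lecture chunks.

```r
# Simulated stand-ins for the excess returns (hypothetical values, NOT the course data).
set.seed(1)
n <- 120
excess_market <- rnorm(n, mean = 0.005, sd = 0.05)
excess_citcrp <- 0.001 + 0.7 * excess_market + rnorm(n, sd = 0.06)

capm  <- lm(excess_citcrp ~ excess_market)
coefs <- summary(capm)$coefficients

# (a) Two-sided p-value for H0: beta0 = 0, rebuilt from the t-statistic.
t_b0 <- coefs["(Intercept)", "Estimate"] / coefs["(Intercept)", "Std. Error"]
p_b0 <- 2 * pt(-abs(t_b0), df = capm$df.residual)

# (b) Confidence intervals for beta0 and beta1 at the three requested levels.
conf_levels <- c(0.90, 0.95, 0.99)
cis <- lapply(conf_levels, function(lv) confint(capm, level = lv))
names(cis) <- paste0(100 * conf_levels, "%")

print(p_b0)
print(cis)
```

The hand-computed p-value should agree with the `Pr(>|t|)` column of `summary(capm)`, which gives a quick check on the formula.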
2. Suppose that $E[U|X] = 0$, $E[X^2] = \mu_{XX}$, and $V[U|X] = \sigma^2$. We used the following two results in our derivations for the OLS estimator:
(a) Show that $E[UX] = 0$.
(b) Show that $V[UX] = E[U^2 X^2] = \sigma^2 \mu_{XX}$.
3. Consider the following regression model
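$Y_i = \beta X_i + U_i, \quad \beta = 1, \quad E[X_i^2] = 14, \quad V(U_i | X_i) = 1$
and the sample size is $n = 100$. We are interested in testing the null hypothesis that $\beta = \beta_H$ versus the alternative that $\beta \neq \beta_H$ using a t-statistic of the form
$t = \frac{\hat{\beta} - \beta_H}{\hat{\sigma}_{\hat{\beta}}}.$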
(a) What is the correct critical value to guarantee that the hypothesis test has a type-I error of 5%?
(b) Define the type-II error of a hypothesis test.
(c) Suppose an econometrician tests the “false” hypothesis that $\beta = \beta_H = 1.2$ (the “truth” is that $\beta = 1$). What is the type-II error associated with this test?
(d) Repeat the calculation for $\beta_H = 1.6$, $\beta_H = 1.8$, and $\beta_H = 2$.
(e) What is the power of a test?
(f) True or False: The t-test has less power against alternative hypotheses that are far away from the null hypothesis. Hint: use your previous results to answer this question.
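A minimal sketch of how such a type-II error could be computed, assuming the t-statistic is approximately normal with unit variance and mean (β − β_H)/se, where se = sqrt(σ²/(n E[X²])) is the large-sample standard error of the OLS slope in a regression without intercept. The function name `type2_error` and all numerical values below are hypothetical illustrations, not the values from the question.

```r
# Probability of NOT rejecting H0: beta = beta_null when the truth is beta_true,
# for a two-sided test of size alpha (normal approximation).
type2_error <- function(beta_true, beta_null, se, alpha = 0.05) {
  crit  <- qnorm(1 - alpha / 2)            # two-sided critical value
  delta <- (beta_true - beta_null) / se    # mean of the t-statistic under the truth
  pnorm(crit - delta) - pnorm(-crit - delta)
}

# Hypothetical illustration values (assumptions, not taken from the assignment).
sigma2 <- 4
mu_xx  <- 1
n      <- 400
se     <- sqrt(sigma2 / (n * mu_xx))

null_values <- c(1.05, 1.10, 1.20, 1.30)
type2 <- type2_error(beta_true = 1, beta_null = null_values, se = se)
power <- 1 - type2

print(cbind(null_values, type2, power))
```

Setting `beta_null` equal to `beta_true` returns 1 − alpha, which is a useful sanity check: a true null hypothesis is retained with probability one minus the type-I error.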
4. You are working for a life insurance company and are preparing for a briefing of the board. Your economic intuition tells you that the best predictor of life insurance holdings is income. You gather the relevant data (family life insurance and family income, both in thousands of dollars) and want to analyze it, running a regression
$\text{lifeins}_i = \beta_0 + \beta_1 \text{income}_i + u_i$
The data set is provided in the file Assignment2Data.csv.
(a) Estimate the above regression model. What is your point estimate for $\beta_1$? What is the interpretation of that estimate?
(b) Provide a 95% confidence interval for the coefficient on income.
(c) One of the managers suggests that the industry rule of thumb is that people buy five dollars of life insurance for each additional dollar of income. Another manager disagrees and says it could be more or less. You want to examine this difference of opinion.
What null and alternative hypotheses would you use here to discriminate between these hypotheses? Test the hypothesis, using a 5% type I error and interpret the results.
(d) Calculate a p-value for the test in (c).
(e) The company wants to offer life insurance to low-income households. The chairman asks you how much life insurance a household with an income of 20,000 dollars would buy. Calculate a point prediction.
(f) Explain why it is better to consider an interval prediction rather than a point prediction.
(g) Calculate 90%, 95%, and 99% prediction intervals for the life insurance holding of a family that earns 20,000 dollars.
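One way to obtain a point prediction and prediction intervals in R is `predict()` with `interval = "prediction"`. The sketch below uses simulated data with hypothetical coefficients, not Assignment2Data.csv, and assumes income is measured in thousands of dollars as stated in the question, so a 20,000-dollar income enters as 20.

```r
# Simulated stand-in data (hypothetical values, NOT the assignment data).
set.seed(2)
income  <- runif(50, min = 10, max = 100)            # thousands of dollars
lifeins <- 5 + 3.5 * income + rnorm(50, sd = 40)     # thousands of dollars

insurance_fit <- lm(lifeins ~ income)

# Household earning 20,000 dollars, expressed in thousands.
new_household <- data.frame(income = 20)

# Point prediction.
point_pred <- predict(insurance_fit, newdata = new_household)

# Prediction intervals at three levels; each call returns (fit, lwr, upr).
pred_levels <- c(0.90, 0.95, 0.99)
pred_ints <- t(sapply(pred_levels, function(lv) {
  predict(insurance_fit, newdata = new_household,
          interval = "prediction", level = lv)
}))
colnames(pred_ints) <- c("fit", "lwr", "upr")
rownames(pred_ints) <- paste0(100 * pred_levels, "%")

print(point_pred)
print(pred_ints)
```

Note that `interval = "prediction"` accounts for both estimation uncertainty and the variance of the new error term, whereas `interval = "confidence"` covers only the uncertainty about the conditional mean.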
5. The goal is to replicate Table 2 of Acemoglu, Johnson, and Robinson (AER, 2001). You can find the article at
http://www.aeaweb.org/articles.php?doi=10.1257/aer.91.5.1369
and the data at
http://economics.mit.edu/faculty/acemoglu/data/ajr2001
Download the zip file for the replication of Table 2, and extract the file maketable2.dta into your working directory. The data set is provided in Stata format, which can be read by R (see below).
(a) Load the data using the following R chunk.
wholeworld = foreign::read.dta("maketable2.dta")
head(wholeworld)
##   shortnam africa lat_abst avexpr logpgp95 other asia
## 1      AFG      0   0.3667     NA       NA     0    1
## 2      AGO      1   0.1367  5.364    7.771     0    0
## 3      ARE      0   0.2667  7.182    9.804     0    1
## 4      ARG      0   0.3778  6.386    9.133     0    0
## 5      ARM      0   0.4444     NA    7.682     0    1
## 6      AUS      0   0.3000  9.318    9.898     1    0
##   loghjypl baseco
## 1       NA     NA
## 2  -3.4112      1
## 3       NA     NA
## 4  -0.8723      1
## 5       NA     NA
## 6  -0.1708      1
(b) How many observations are in the “whole world” sample? What does “NA” mean?
(c) What is the “base sample” considered in the paper? (you have to read the text, e.g., section II.A., to answer this question!) The following R chunk generates the base sample:
basesample = wholeworld[is.na(wholeworld[,9])==FALSE,] # delete NA
head(basesample)
##    shortnam africa lat_abst avexpr logpgp95 other asia
## 2       AGO      1   0.1367  5.364    7.771     0    0
## 4       ARG      0   0.3778  6.386    9.133     0    0
## 6       AUS      0   0.3000  9.318    9.898     1    0
## 12      BFA      1   0.1444  4.455    6.846     0    0
## 13      BGD      0   0.2667  5.136    6.877     0    1
## 16      BHS      0   0.2683  7.500    9.285     0    0
##    loghjypl baseco
## 2   -3.4112      1
## 4   -0.8723      1
## 6   -0.1708      1
## 12  -3.5405      1
## 13  -2.0636      1
## 16       NA      1
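The chunk above selects rows by column position (column 9 is baseco). As a hedged alternative, the same filter can be written by column name, which keeps working if the column order changes. The sketch below runs on a four-row toy data frame whose values are copied from the head(wholeworld) output, so it does not need the .dta file; the names `toy_world` and `toy_base` are hypothetical.

```r
# Toy stand-in for the first rows of wholeworld (subset of columns only).
toy_world <- data.frame(
  shortnam = c("AFG", "AGO", "ARE", "ARG"),
  avexpr   = c(NA, 5.364, 7.182, 6.386),
  logpgp95 = c(NA, 7.771, 9.804, 9.133),
  baseco   = c(NA, 1, NA, 1),
  stringsAsFactors = FALSE
)

# Keep only rows whose baseco entry is not missing (selection by name).
toy_base <- toy_world[!is.na(toy_world$baseco), ]

# Row counts before and after, and the number of NAs per column.
n_world   <- nrow(toy_world)
n_base    <- nrow(toy_base)
na_counts <- colSums(is.na(toy_world))

print(c(whole = n_world, base = n_base))
print(na_counts)
```

Applied to the full data, `nrow()` and `colSums(is.na(...))` give the observation count and the number of missing entries per variable.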
(d) For the whole world sample and the base sample generate a scatter plot of log per capita GDP (y-axis) versus expropriation risk (x-axis).
(e) Here is an R chunk for the estimation of specification (1) in Table 2.
olsspec1 <- lm(wholeworld$logpgp95 ~ wholeworld$avexpr)
summary(olsspec1)
##
## Call:
## lm(formula = wholeworld$logpgp95 ~ wholeworld$avexpr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.902 -0.316 0.138 0.422 1.441
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)         4.6261     0.3006    15.4   <2e-16 ***
## wholeworld$avexpr   0.5319     0.0406    13.1   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.718 on 109 degrees of freedom
## (52 observations deleted due to missingness)
## Multiple R-squared: 0.611, Adjusted R-squared: 0.608
## F-statistic: 171 on 1 and 109 DF, p-value: <2e-16
How many observations are used in this regression? Is this fewer than in the whole world sample, and if so, why? Does the number of observations used in your estimation match the number of observations reported in the paper?
(f) Now write a chunk of R code that estimates specification (2) in Table 2. Do you get the same point estimate? Do you get the same standard error estimate?
(g) Following the lecture slides, compute White's heteroskedasticity-consistent standard errors. Are they larger or smaller than those computed under the homoskedasticity assumption?
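As a rough sketch of the computation, White's estimator can be built by hand as a sandwich, (X'X)^{-1} (Σ û_i² x_i x_i') (X'X)^{-1}. The version below is the plain HC0 form without a degrees-of-freedom correction, which is an assumption; the lecture slides may use a rescaled variant. The data are simulated with error variance increasing in x and are not the AJR data.

```r
# Simulated heteroskedastic data (hypothetical values, NOT the AJR data).
set.seed(3)
n <- 200
x <- runif(n, min = 1, max = 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)   # error s.d. grows with x

fit <- lm(y ~ x)
X <- model.matrix(fit)    # n x 2 design matrix (intercept, x)
u <- residuals(fit)       # OLS residuals

bread <- solve(crossprod(X))     # (X'X)^{-1}
meat  <- crossprod(X * u)        # sum over i of u_i^2 * x_i x_i'
vcov_white <- bread %*% meat %*% bread

se_white <- sqrt(diag(vcov_white))   # heteroskedasticity-consistent (HC0)
se_homo  <- sqrt(diag(vcov(fit)))    # usual homoskedastic standard errors

print(cbind(homoskedastic = se_homo, white = se_white))
```

Here `X * u` multiplies row i of the design matrix by the residual û_i, so `crossprod(X * u)` is exactly the middle term of the sandwich. The same lines apply to a fitted model from the assignment by replacing `fit`.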
(h) Now replicate the estimation results for specifications (3) - (8). Report point estimates, standard error estimates, $R^2$, and number of coefficients up to four significant digits (the paper only reports two significant digits).