Zhenhao Gong Homework 1 Econ 3313 Spring 2021
Question 1 (Hypothesis Testing) (1 point)
As shift manager at a local fast food place, you are responsible for ensuring quality control. You do not want to weigh all the frozen hamburger patties delivered by your supplier to make sure they weigh three ounces on average, so you choose 100 patties at random. You calculate the sample mean weight of the patties to be 2.6 ounces and the sample standard deviation of their weight to be 0.4 ounces.
1. Construct a hypothesis test of whether the population mean weight of patties equals 3 ounces, at the 5% significance level. (0.4 points)
The hypotheses for this test are
\[ H_0: \mu = 3 \quad \text{vs.} \quad H_1: \mu \neq 3. \]
The test statistic is
\[ t = \frac{\bar{Y} - \mu_0}{s_Y/\sqrt{n}} = \frac{2.6 - 3}{0.4/\sqrt{100}} = -10. \]
The critical value at the 5% significance level is 1.96. Since \(|-10| > 1.96\), we reject the null hypothesis at the 5% significance level.
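The same computation can be written as a short Python sketch. The helper name `t_stat` is illustrative. It uses the large-sample normal critical value 1.96, which is reasonable with n = 100.

```python
import math

def t_stat(ybar: float, mu0: float, s: float, n: int) -> float:
    """t-statistic for H0: mu = mu0, using the sample standard deviation s."""
    return (ybar - mu0) / (s / math.sqrt(n))

# Numbers from Question 1: n = 100 patties, sample mean 2.6 oz, s = 0.4 oz
t = t_stat(ybar=2.6, mu0=3.0, s=0.4, n=100)
critical = 1.96  # two-sided 5% critical value (standard normal approximation)
reject = abs(t) > critical
print(f"t = {t:.2f}, reject H0: {reject}")
```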
2. Generate a 95% confidence interval for the average weight of all the frozen hamburger patties. (0.3 points)
A 95% confidence interval for the population mean weight is
\[ CI = \bar{Y} \pm t_{\alpha/2}\,\frac{s_Y}{\sqrt{n}} = 2.6 \pm 1.96 \times \frac{0.4}{\sqrt{100}} = [2.522, 2.678]. \]
3. You are particularly concerned about the population mean weight being under 3 ounces. Is there any evidence for concern? Explain using the confidence interval you constructed above. (0.3 points)
Yes, there is cause for concern. We are 95% confident that the population mean weight lies between 2.522 and 2.678 ounces. The entire interval lies below 3 ounces, so the data do not support a mean weight of 3 ounces.
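A minimal Python sketch of the interval and the check in part 3 follows. It assumes the large-sample value 1.96 in place of \(t_{\alpha/2}\); with n = 100 the difference is small. Unrounded, the bounds are 2.5216 and 2.6784.

```python
import math

def mean_ci(ybar: float, s: float, n: int, z: float = 1.96):
    """Large-sample confidence interval ybar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width

lo, hi = mean_ci(ybar=2.6, s=0.4, n=100)
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")
print("3 ounces inside the CI:", lo <= 3.0 <= hi)
```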
Question 2 (Simple linear regression model) (1 point)
You have obtained a sub-sample of 1744 individuals from the Current Population Survey (CPS) and are interested in the relationship between weekly earnings and age. The regression, using heteroskedasticity-robust standard errors, yielded the following result:
\[ \widehat{Earn} = 232 + 4.20 \times Age, \quad R^2 = 0.15, \quad SER = 287.21, \]
where Earn and Age are measured in dollars and years, respectively.
1. What is the regression’s weekly earnings prediction for someone who is 18 years old? 32 years old? (0.2 points)
232 + 4.2 × 18 = $307.60; 232 + 4.2 × 32 = $366.40.
2. A man's current weekly earnings are $415.96. What is the regression's prediction for the man's age? What is the regression's prediction for the man's weekly earnings two years later? (0.3 points)
\[ \widehat{Age} = \frac{415.96 - 232}{4.2} = 43.8 \approx 44 \text{ years}; \]
\[ 415.96 + 4.2 \times 2 = \$424.36. \]
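Parts 1 and 2 reduce to evaluating and inverting the fitted line. Here is a small Python sketch; the function names are illustrative. Inverting the fitted line is what part 2 asks for. It is not the same as predicting from a regression of Age on Earn, which would generally give a different answer.

```python
B0, B1 = 232.0, 4.20  # reported intercept and Age coefficient

def predict_earn(age: float) -> float:
    """Predicted weekly earnings (dollars) at a given age (years)."""
    return B0 + B1 * age

def implied_age(earn: float) -> float:
    """Age at which the fitted line predicts `earn` (inverting the line)."""
    return (earn - B0) / B1

print(round(predict_earn(18), 2), round(predict_earn(32), 2))  # 307.6 366.4
age_now = implied_age(415.96)
print(round(age_now, 1))  # 43.8
earn_later = 415.96 + 2 * B1  # two more years along the fitted slope
print(round(earn_later, 2))  # 424.36
```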
3. Why should age matter in the determination of earnings? Do the results suggest that earnings are guaranteed to rise for everyone as they become older? (Use \(R^2\) to explain.) (0.3 points)
Age may be a proxy for “experience”, which in turn can approximate “on-the-job training”. The results do not suggest that earnings are guaranteed to rise for everyone as they grow older. With \(R^2 = 0.15\), age explains only 15% of the variation in weekly earnings, so individual earnings scatter widely around the regression line (SER = 287.21). The relationship holds only “on average”.
4. The average age in this sample is 34 years. What is the average weekly income in the sample? (Hint: use the formula \(\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}\).) (0.2 points)
\[ 232 + 4.2 \times 34 = \$374.8. \]
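The hint relies on a property of OLS: the fitted line always passes through the point of sample means, because \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). The minimal Python sketch below illustrates this on hypothetical toy data (not the CPS sample). It then applies the formula to the reported coefficients.

```python
def ols_fit(x, y):
    """Simple OLS estimates (b0, b1) of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar  # this is why the fitted line passes through the means
    return b0, b1

# Hypothetical toy data, for illustration only
ages = [22, 28, 35, 41, 50, 58]
earn = [310.0, 350.0, 380.0, 400.0, 430.0, 470.0]
b0, b1 = ols_fit(ages, earn)
xbar, ybar = sum(ages) / len(ages), sum(earn) / len(earn)
print(abs((b0 + b1 * xbar) - ybar) < 1e-9)  # True

# Applying the formula to the reported estimates and the mean age of 34
avg_earn = 232 + 4.20 * 34
print(round(avg_earn, 2))  # 374.8
```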
Question 3 (Multiple linear regression model) (1 point)
The cost of attending your college has once again gone up. Although you have been told that education is an investment in human capital, which carries a return of roughly 10% a year, you (and your parents) are not pleased. One of the administrators at your university/college does not make the situation better by telling you that you pay more because the reputation of your institution is better than that of others. To investigate this hypothesis, you randomly collect data on 100 national universities and liberal arts colleges from the 2000-2001 U.S. News and World Report annual rankings. Next, you perform the following regression:
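\[ \widehat{Cost} = \underset{(2{,}058.63)}{7{,}311.17} + \underset{(664.58)}{3{,}985.20} \times Reputation - \underset{(0.13)}{0.20} \times Size + \underset{(2{,}154.85)}{8{,}406.79} \times Dpriv - \underset{(1{,}121.92)}{416.38} \times Dlibart - \underset{(1{,}007.86)}{2{,}376.51} \times Dreligion, \]
\[ R^2 = 0.72, \quad SER = 3{,}773.35 \]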
where Cost is Tuition, Fees, Room and Board in dollars; Reputation is the index used in U.S. News and World Report (based on a survey of university presidents and chief academic officers), which ranges from 1 ("marginal") to 5 ("distinguished"); Size is the number of undergraduate students; and Dpriv, Dlibart, and Dreligion are binary variables indicating whether the institution is private, is a liberal arts college, and has a religious affiliation, respectively. The numbers in parentheses are heteroskedasticity-robust standard errors.
1. Interpret the results. Do the coefficients have the expected sign? (0.2 points)
An increase in reputation by one category increases the cost by roughly $3,985.20. The smaller the size of the college/university, the higher the cost: a decrease of 10,000 students is associated with a $2,000 higher cost.
Private schools charge roughly $8,407 more than public schools.
A school with a religious affiliation is approximately $2,377 cheaper, presumably due to subsidies, and a liberal arts college also charges roughly $416 less.
There are no observations close to the origin, so there is no direct interpretation of the intercept.
Other than perhaps the coefficient on liberal arts colleges, all coefficients have the expected sign.
2. What is the forecasted cost for a liberal arts college, which has no religious affiliation, a size of 1,500 students and a reputation level of 4.5? (All liberal arts colleges are private.) (0.2 points)
The forecasted cost for a liberal arts college with no religious affiliation, a size of 1,500 students, and a reputation level of 4.5 is
\[ 7{,}311.17 + 4.5 \times 3{,}985.20 - 0.20 \times 1{,}500 + 8{,}406.79 \times 1 - 416.38 \times 1 - 2{,}376.51 \times 0 = \$32{,}934.98. \]
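The forecast is a dot product of the coefficients with the college's characteristics. A small Python sketch follows; the dictionary keys are illustrative labels. Size enters with coefficient −0.20, so 1,500 students subtract $300 from the forecast.

```python
coef = {  # reported OLS estimates
    "const": 7311.17, "Reputation": 3985.20, "Size": -0.20,
    "Dpriv": 8406.79, "Dlibart": -416.38, "Dreligion": -2376.51,
}
college = {  # private liberal arts college, no religious affiliation
    "const": 1, "Reputation": 4.5, "Size": 1500,
    "Dpriv": 1, "Dlibart": 1, "Dreligion": 0,
}
forecast = sum(coef[k] * college[k] for k in coef)
print(f"${forecast:,.2f}")
```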
3. Indicate whether or not the coefficients of Reputation and Dpriv are significantly different from zero. (Use the 1% significance level.) (0.2 points)
Reputation:
\[ t = \frac{3{,}985.20 - 0}{664.58} \approx 6.00 > 2.58, \]
Dpriv:
\[ t = \frac{8{,}406.79 - 0}{2{,}154.85} \approx 3.90 > 2.58. \]
The coefficients of Reputation and Dpriv are statistically significantly different from zero at the 1% level.
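Both t-statistics can be checked with a few lines of Python. This is a sketch using the reported robust standard errors and the two-sided 1% normal critical value 2.58.

```python
CRIT_1PCT = 2.58  # two-sided 1% critical value (standard normal)

estimates = {  # name: (coefficient, robust standard error)
    "Reputation": (3985.20, 664.58),
    "Dpriv": (8406.79, 2154.85),
}

t_stats = {name: (b - 0) / se for name, (b, se) in estimates.items()}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, significant at 1%: {abs(t) > CRIT_1PCT}")
```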
4. Construct a 99% confidence interval for the coefficient of Size. (0.2 points)
A 99% confidence interval for the coefficient of Size is
\[ CI(\beta_{Size}) = \hat{\beta}_{Size} \pm 2.58 \times SE(\hat{\beta}_{Size}) = -0.20 \pm 2.58 \times 0.13 = [-0.535, 0.135]. \]
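A quick Python check of the interval follows. The interval contains zero. This is the confidence-interval counterpart of failing to reject \(\beta_{Size} = 0\) at the 1% level, since t = −0.20/0.13 ≈ −1.54.

```python
b, se, z = -0.20, 0.13, 2.58  # estimate, robust SE, 99% normal critical value
lo, hi = b - z * se, b + z * se
contains_zero = lo <= 0 <= hi
print(f"99% CI: [{lo:.3f}, {hi:.3f}], contains zero: {contains_zero}")
```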
5. You want to test simultaneously the hypotheses that \(\beta_{Size} = 0\) and \(\beta_{Dlibart} = 0\). Your regression package returns an F-statistic of 1.23. Can you reject the null hypothesis at the 5% significance level? Why? (0.2 points)
The 5% critical value of the \(F_{2,\infty}\) distribution (q = 2 restrictions) is 3.00. Since 1.23 < 3.00, you cannot reject the null hypothesis.
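The 3.00 critical value can be checked without tables. In large samples, an F-statistic with q restrictions follows F(q, ∞), which is a chi-squared variable with q degrees of freedom divided by q. For q = 2, the chi-squared distribution is exponential with mean 2, so its upper-α critical value is −2 ln α. Here is a minimal Python sketch of that shortcut; it applies only to q = 2.

```python
import math

def f_crit_q2_large_n(alpha: float) -> float:
    """Upper-alpha critical value of F(2, infinity) = chi2(2) / 2.

    chi2(2) is exponential with mean 2, so P(chi2 > c) = exp(-c / 2),
    which gives c = -2 ln(alpha)."""
    chi2_crit = -2.0 * math.log(alpha)
    return chi2_crit / 2.0

crit = f_crit_q2_large_n(0.05)
F = 1.23
print(f"critical value = {crit:.3f}, reject H0: {F > crit}")
```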
Question 4 (Nonlinear regression model) (1 point)
Earnings functions attempt to find the determinants of earnings, using both continuous and binary variables. One of the central questions analyzed in this relationship is the returns to education.
1. Using data on 253 individuals whose years of education range from 6 to 20, you estimate the following log-linear model:
\[ \widehat{\ln(Earnings)} = \underset{(0.14)}{0.54} + \underset{(0.011)}{0.083} \times Educ, \quad \bar{R}^2 = 0.234 \]
where Earnings is average hourly earnings and Educ is years of education.
What is the effect of an additional year of schooling? If you had a strong belief that years of high school education were different from college education, how would you modify the equation? (0.2 points)
One additional year of education is associated with an approximately 8.3 percent increase in earnings, i.e., an 8.3 percent return to schooling. To let the return differ between high school and college, you would need data on years of high school and years of college separately. Including both variables in the regression would then allow you to test whether their coefficients are equal.
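In a log-linear model, reading 100 × β̂₁ as a percentage change is an approximation. The exact change implied by the fitted model for one more year of Educ is 100 × (exp(β̂₁) − 1). For a coefficient this small, the two are close, as this Python sketch shows.

```python
import math

b_educ = 0.083  # coefficient on Educ in the log-linear model
approx_pct = 100 * b_educ                 # the usual 100 * beta reading
exact_pct = 100 * (math.exp(b_educ) - 1)  # exact proportional change implied by the model
print(f"approx {approx_pct:.1f}%, exact {exact_pct:.2f}%")
```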
2. You also estimate the following log-log model:
\[ \widehat{\ln(Earnings)} = \underset{(0.24)}{0.54} + \underset{(0.46)}{4.3} \times \ln(Educ), \quad \bar{R}^2 = 0.341 \]
What is the percentage change in earnings for an individual whose years of schooling change from 12 to 16? Compare the log-linear model in part 1 and the log-log model in part 2: which one is better? Why? (0.4 points)
The percentage change in years of schooling from 12 to 16 is
\[ \frac{16 - 12}{12} \times 100 = 33.33\%, \]
and in this log-log regression model a 1% increase in years of schooling is estimated to correspond to a 4.3% increase in earnings. Hence, a 33.33% increase in years of schooling is estimated to correspond to a \(4.3 \times 33.33\% = 143.33\%\) increase in earnings. The log-log model is better than the log-linear model because it fits better according to \(\bar{R}^2\) (0.341 versus 0.234). This comparison is valid because both models have the same dependent variable, ln(Earnings).
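Reading 4.3 as "a 1% rise in schooling gives a 4.3% rise in earnings" is a small-change approximation. For a 33% jump in schooling, it understates the change the fitted log-log model implies. Under the model, earnings scale by (16/12)^4.3, which is about a 245% increase. A short Python sketch compares the two.

```python
beta = 4.3  # estimated elasticity of earnings with respect to years of schooling

pct_schooling = (16 - 12) / 12 * 100      # 33.33% increase in schooling
pct_approx = beta * pct_schooling          # elasticity (small-change) approximation
pct_exact = ((16 / 12) ** beta - 1) * 100  # exact change implied by the log-log model
print(f"approx {pct_approx:.2f}%, exact {pct_exact:.2f}%")
```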
3. You read in the literature that there should also be returns to on-the-job training. To approximate on-the-job training, researchers often use the experience variable Exper, the years of employment. You incorporate the experience variable into your original regression:
\[ \widehat{\ln(Earnings)} = \underset{(0.16)}{-0.01} + \underset{(0.012)}{0.101} \times Educ + \underset{(0.006)}{0.033} \times Exper - \underset{(0.0001)}{0.0005} \times Exper^2, \]
\[ R^2 = 0.34, \quad SER = 0.405 \]
What is the effect of an additional year of experience for a person who has 12 years of education? Construct a hypothesis test of whether the population regression model above is linear or nonlinear. (Hint: we just need to test whether the coefficient of \(Exper^2\) is significant at the 5% significance level.) (0.4 points)
(i) Since
\[ \ln(Y + \Delta Y) - \ln(Y) \cong \frac{\Delta Y}{Y}, \]
let \(Y = Earnings\), so
\[ \frac{\Delta Earnings}{Earnings} \cong 0.033 \times (Exper + 1) - 0.0005 \times (Exper + 1)^2 - (0.033 \times Exper - 0.0005 \times Exper^2) = 0.0325 - 0.001 \times Exper. \]
The effect depends only on the person's years of experience. The model has no interaction between Educ and Exper, so knowing that the person has 12 years of education does not determine it. Since we don't know this person's years of experience, we can't compute a single number for the effect of an additional year of experience.
(ii) Let \(\beta_3\) denote the coefficient on \(Exper^2\). To test whether the population regression model above is linear or nonlinear, we only need to test whether \(\beta_3\) is statistically significantly different from zero.
Test \(H_0: \beta_3 = 0\) vs. \(H_1: \beta_3 \neq 0\) at the 5% level:
\[ t = \frac{\hat{\beta}_3 - 0}{SE(\hat{\beta}_3)} = \frac{-0.0005 - 0}{0.0001} = -5, \quad |t| > 1.96, \]
so we reject the null hypothesis \(\beta_3 = 0\) at the 5% level. Hence, the population regression model above is more likely to be nonlinear.
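The derivation in (i) and the test in (ii) can be sketched in Python. The experience levels 0, 10, and 20 are hypothetical, since the question does not give the person's experience. Educ does not appear because the model has no interaction term.

```python
B_EXPER, B_EXPER2, SE_EXPER2 = 0.033, -0.0005, 0.0001  # reported estimates

def exper_effect(exper: float) -> float:
    """Approximate proportional change in Earnings from one more year of experience."""
    def part(e: float) -> float:
        return B_EXPER * e + B_EXPER2 * e ** 2
    return part(exper + 1) - part(exper)  # equals 0.0325 - 0.001 * exper

# Hypothetical experience levels, for illustration only
for exper in (0, 10, 20):
    print(f"Exper = {exper}: about {100 * exper_effect(exper):.2f}% higher earnings")

t = (B_EXPER2 - 0) / SE_EXPER2
print(f"t = {t:.1f}, reject linearity at 5%: {abs(t) > 1.96}")
```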
Question 5 (Empirical Exercises) (1 point)
The data file insurance contains insurance data for full-time, full-year workers aged 18–64. In this exercise, you will investigate how control factors in the data, such as age and the number of children, affect a worker's insurance charges.
(Please submit the R code you wrote for this question.)
1. Run a regression of insurance charges (charges) on age (age) and the number of children (children). Write down the regression model with the estimated coefficients, the corresponding standard errors, and the adjusted \(R^2\). Use the model to predict the insurance charges of a worker who is 30 years old and has 2 children. What is the corresponding 95% confidence interval for her predicted insurance charges? (0.3 points)
2. Run a quadratic regression of insurance charges (charges) on age (age), squared age (age2), and the number of children (children). Write down the regression model with the estimated coefficients, the corresponding standard errors, and the adjusted \(R^2\). Use the model to predict the insurance charges of a worker who is 30 years old and has 2 children. What is the corresponding 95% confidence interval for her predicted insurance charges? (0.3 points)
3. Run a linear-log regression of insurance charges (charges) on age (age), log age (log age), and the number of children (children). Write down the regression model with the estimated coefficients, the corresponding standard errors, and the adjusted \(R^2\). Use the model to predict the insurance charges of a worker who is 30 years old and has 2 children. What is the corresponding 95% confidence interval for her predicted insurance charges? (0.3 points)
4. Which of the three models above is best for prediction? Why? (0.1 points)
(See the comments in the R script for the answers to this question.)
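The answers themselves live in the R script. As an illustration only, here is a Python/NumPy sketch of the same workflow. It fits each of the three specifications, reports heteroskedasticity-robust (HC1) standard errors and adjusted R², and forms a 95% confidence interval for the predicted charges of a 30-year-old with 2 children. The `insurance` file is not loaded here. The data below are synthetic and hypothetical, and the helper names `ols_hc1` and `predict_ci` are illustrative. Comparing adjusted R² across the three models mirrors the criterion used in Question 4.

```python
import numpy as np

def ols_hc1(X, y):
    """OLS coefficients, HC1 robust covariance matrix, and adjusted R^2."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)       # sum of e_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)  # HC1 degrees-of-freedom scaling
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return beta, cov, adj_r2

def predict_ci(beta, cov, x0, z=1.96):
    """Predicted mean charges at regressor row x0 and its 95% confidence interval."""
    yhat = x0 @ beta
    se = np.sqrt(x0 @ cov @ x0)
    return yhat, yhat - z * se, yhat + z * se

# Hypothetical synthetic data standing in for the `insurance` file
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(18, 64, n)
children = rng.integers(0, 5, n)
charges = 2000 + 250 * age + 400 * children + rng.normal(0, 3000, n)

designs = {  # regressor matrices for the three specifications
    "linear": lambda a, c: np.column_stack([np.ones_like(a), a, c]),
    "quadratic": lambda a, c: np.column_stack([np.ones_like(a), a, a ** 2, c]),
    "linear-log": lambda a, c: np.column_stack([np.ones_like(a), a, np.log(a), c]),
}

for name, make_X in designs.items():
    beta, cov, adj_r2 = ols_hc1(make_X(age, children), charges)
    x0 = make_X(np.array([30.0]), np.array([2.0]))[0]  # age 30, 2 children
    yhat, lo, hi = predict_ci(beta, cov, x0)
    se = np.sqrt(np.diag(cov))
    print(f"{name}: coef={np.round(beta, 1)}, se={np.round(se, 1)}, "
          f"adj R2={adj_r2:.3f}, prediction={yhat:.0f}, 95% CI=[{lo:.0f}, {hi:.0f}]")
```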