Final Assignment
Please do NOT forget to write your name here.
ECON 2220-A
Please submit your work to me at 4:05 p.m. sharp on Wednesday, April 10, 2019, at the beginning of class. Late submissions will NOT be accepted.
1 Instructions
1. Please type your assignment using a word processor. If you choose to write by hand, please write clearly in ink.
2. It is also very important that you hand in all your software outputs (e.g., tables and figures).
3. Please find the Stata installation instructions on your course outline.
4. Students are encouraged to use Stata to do this assignment. To learn Stata basics, please view the video clip on the course outline.
5. If you are not sure how to run a regression using software, please watch the videos posted on the course outline and cuLearn.
6. Also, please pay close attention to academic plagiarism. If I find two or more papers that look very similar, there will be a heavy penalty (a maximum of 50 points will be deducted from all the papers that look very similar).
2 Questions
1. The Stata file (equity.dta) contains data on the two financial variables: S&P 500 price index (Y ) and dividend (X). The sample size is 91 observations.
(a) Use Stata to generate a graph of X on Y .
(b) Regress Y on X, then clearly explain all the regression outputs. Why do you think this relationship is significant?
(c) Fit the data by using a semi-log model with the aid of Stata. Does this semi-log model perform better than the model in (b)? Why or why not?
(d) Given the estimated semi-log model, please calculate the slope and elasticity for X at X = 2.5.
(e) Given the estimated semi-log model, please test for serial correlation in the residuals by performing the Durbin-Watson test. Is there any heteroscedasticity problem? Please use both graphical and statistical methods to address this concern.
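[Hint for parts (d) and (e): the short Python sketch below only spells out the arithmetic behind the two calculations; it does not replace the Stata output you must hand in. It assumes the semi-log model is the log-lin form ln Y = b0 + b1X. If you fit Y = b0 + b1 ln X instead, the slope is b1/X and the elasticity is b1/Y. The function names and the numbers in the usage lines are made up for illustration and are not estimates from equity.dta.]

```python
import math


def loglin_slope_and_elasticity(b0, b1, x):
    """Slope dY/dX and elasticity of Y with respect to X for ln(Y) = b0 + b1*X.

    Assumption of this sketch: Y is evaluated at exp(b0 + b1*x), with no
    retransformation correction.
    """
    y_hat = math.exp(b0 + b1 * x)
    slope = b1 * y_hat        # dY/dX = b1 * Y
    elasticity = b1 * x       # (dY/dX) * (X / Y) = b1 * X
    return slope, elasticity


def durbin_watson(residuals):
    """Durbin-Watson d = sum of (e_t - e_{t-1})^2 divided by sum of e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical coefficients, NOT estimates from equity.dta.
slope, elasticity = loglin_slope_and_elasticity(b0=4.0, b1=0.2, x=2.5)
print(slope, elasticity)

# Made-up residuals that alternate in sign.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

A value of d near 2 suggests no first-order serial correlation; values toward 0 or toward 4 suggest positive or negative serial correlation, respectively.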
2. The Excel file (amazon.xls) contains data on the two variables, sales (Y ) and income (X) of a sample of 10 observations from 1995 to 2004.
(a) Use Stata to generate descriptive statistics of X and Y . Explain the statistical results that you have just obtained.
(b) Construct a scatter diagram.
(c) Find the least-squares regression line of Y on X by using Stata.
(d) Find the least-squares regression line of X on Y by using Stata.
(e) Test the null hypothesis at the 0.05 significance level that the regression coefficient of the population regression is 0.0 versus the alternative hypothesis that the regression coefficient exceeds 0.0. Perform the test without the aid of the computer software as well as with the aid of the computer software.
(f) Find the 95% confidence interval for the regression coefficient without the aid of the computer software as well as with the aid of the computer software.
(g) Construct the prediction interval for the conditional mean of Y for a given X = Xp = 5 with the aid of the software.
(h) Conduct diagnostic checks for the residuals by using the Durbin-Watson test, the Breusch-Pagan test, and the Jarque-Bera test. Please clearly explain your conclusions.
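[Hint for the parts of (e) and (f) done without the software: the Python sketch below lays out the hand calculation of the slope, its standard error, the t statistic, and the confidence interval, so that you can check each step of your own work. The function names and the four data points are made up for illustration; they are not the amazon.xls data. With 10 observations the degrees of freedom are 10 - 2 = 8, so the critical t values from the table are 1.860 for the one-tailed test at the 0.05 level and 2.306 for the 95% confidence interval.]

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns b0, b1, se(b1), and df."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_b1 = math.sqrt(sse / df / sxx)   # sqrt(s^2 / Sxx), with s^2 = SSE / df
    return b0, b1, se_b1, df


def t_stat_and_ci(b1, se_b1, t_crit_two_sided):
    """t statistic for H0: beta1 = 0 and the matching confidence interval."""
    t = b1 / se_b1
    half_width = t_crit_two_sided * se_b1
    return t, (b1 - half_width, b1 + half_width)


# Made-up data for illustration only (NOT the amazon.xls observations).
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [1.0, 3.0, 2.0, 4.0]
b0, b1, se_b1, df = simple_ols(x_demo, y_demo)

# With df = 2 in this toy example, the two-sided 5% critical value is 4.303.
t, ci = t_stat_and_ci(b1, se_b1, t_crit_two_sided=4.303)
print(b0, b1, se_b1, df, t, ci)
```

For the one-tailed test in (e), reject the null hypothesis only if the t statistic exceeds the one-tailed critical value.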
3. The Excel file (prices.xls) contains data on three variables, gas price (Y ), natural gas (X), and electricity (Z) of a sample of 10 observations from 1996 to 2005.
(a) Find the least-squares regression equation of Y on X and Z with the aid of Stata. [Hint: the regression equation is Y = β0 + β1X + β2Z + ε.]
(b) Find the coefficient of determination, R², with the aid of the computer software. What does the value of R² imply?
(c) Find the value of the F test. What does the F test say about the overall significance of this model?
(d) Test the null hypothesis at the 0.05 significance level that the regression coefficient of the population regression, β1, is 0.0 versus the alternative hypothesis that the regression coefficient exceeds 0.0.
(e) Test the null hypothesis at the 0.05 significance level that the regression coefficient of the population regression, β2, is 0.0 versus the alternative hypothesis that the regression coefficient exceeds 0.0.
(f) Based on the estimated equation Y = b0 + b1X + b2Z, determine the estimated value of Y from the given values of X and Z.
(g) Estimate the gas price when natural gas is 9 and electricity is 8.
(h) Calculate the VIF for each independent variable. Is there any multi-collinearity problem in this regression? Please explain.
(i) You can also build a regression model with one independent variable, say either X or Z. Use the four important specification criteria given in Chap. 6 to work out the best regression model. Please explain your answer in detail.
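[Hint for parts (h) and (i): the Python sketch below shows the arithmetic behind two numbers that Stata reports for you. With only two independent variables, the VIF of each one equals 1 / (1 - r²), where r is the sample correlation between X and Z. The adjusted R² is one way to compare models with different numbers of regressors. The function names and the data points are made up for illustration; they are not the prices.xls data.]

```python
import math


def correlation(x, z):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    return sxz / math.sqrt(sxx * szz)


def vif_two_regressors(x, z):
    """VIF shared by X and Z when they are the only two regressors."""
    r = correlation(x, z)
    return 1.0 / (1.0 - r ** 2)


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Made-up data for illustration only (NOT the prices.xls observations).
x_demo = [1.0, 2.0, 3.0, 4.0]
z_demo = [1.0, 3.0, 2.0, 4.0]
print(vif_two_regressors(x_demo, z_demo))

# Hypothetical R-squared value, with n = 10 and k = 2 as in this question.
print(adjusted_r2(r2=0.9, n=10, k=2))
```

Common rules of thumb treat a VIF above 5 or above 10 as a sign of severe multi-collinearity; use the threshold given in your textbook.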
4. The Stata data file (elemapi.dta) contains data on 21 school-related variables in 400 U.S. schools. Please refer to the text file (elemapi description.txt) for a description of these variables.
(a) Regress api00 on meals, acs_k3, avg_ed, and full with the aid of Stata. [Hint: the regression equation is Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε.]
(b) Find the coefficient of determination, R², with the aid of the computer software. What does the value of R² imply?
(c) Find the value of the F test. What does the F test say about the overall significance of this model?
(d) Test the null hypothesis at the 0.05 significance level that the regression coefficient of the population regression, β1, is 0.0 versus the alternative hypothesis that this regression coefficient is not 0.0.
(e) Test the null hypothesis at the 0.05 significance level that the regression coefficient of the population regression, β2, is 0.0 versus the alternative hypothesis that this regression coefficient is not 0.0.
(f) Test the null hypothesis at the 0.05 significance level that the regression coefficient of the population regression, β3, is 0.0 versus the alternative hypothesis that this regression coefficient is not 0.0.
(g) Test the null hypothesis at the 0.05 significance level that the regression coefficient of the population regression, β4, is 0.0 versus the alternative hypothesis that this regression coefficient is not 0.0.
(h) Use the variance inflation factor (VIF) to detect potential multi-collinearity.
(i) Regress api00 on meals, acs_k3, and full with the aid of Stata. [Hint: the regression equation is Y = β0 + β1X1 + β2X2 + β3X3 + ε.] Using all your knowledge about multiple regression, compare this regression model with the model used in part (a). Which regression model performs better? Why?
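[Hint for part (i): one possible way, among others, to compare the two models is a partial F test of the restriction that drops one variable. The Python sketch below shows only the arithmetic; the function name and the sums of squared residuals in the usage line are made up and are not results from elemapi.dta. Both regressions must be estimated on the same observations for the comparison to be valid.]

```python
def partial_f(sse_restricted, sse_unrestricted, q, n, k_unrestricted):
    """Partial F statistic for dropping q regressors from a model.

    sse_restricted   : sum of squared residuals of the smaller model
    sse_unrestricted : sum of squared residuals of the larger model
    q                : number of regressors dropped
    n                : number of observations used in both regressions
    k_unrestricted   : number of slope coefficients in the larger model
    """
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator


# Hypothetical sums of squared residuals, NOT results from elemapi.dta.
print(partial_f(sse_restricted=120.0, sse_unrestricted=100.0,
                q=1, n=400, k_unrestricted=4))
```

When only one variable is dropped (q = 1), this F statistic equals the square of the t statistic on that variable in the larger model.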