```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(tinytex.verbose = TRUE)
options(htmltools.dir.version = FALSE)
library(knitr)
opts_chunk$set(fig.align = "center", fig.height = 4, dpi = 300, cache = TRUE, echo = TRUE)
options("getSymbols.warning4.0" = FALSE)
```
Front Matter
This assignment is due SUNDAY MARCH 29TH AT 11:59PM ON D2L. The point total is 142 points.
This Rmarkdown document is the template for you to use to complete your homework. TASKS ARE SET OUT IN A SEPARATE DOCUMENT, but are to be done in chunks in this template as specified. Questions following each task will refer back to the results from the previous code chunk.
PART 1
Task 1.1 (8 points)
```{r Task11, out.width='60%', message=F}
```
Question 1.1 (5 points)
(a) (1 point) How many observations are in the data?
(b) (2 points) We are interested in test scores. Which variable(s) in would be our outcome of interest?
(c) (2 points) Using the output from the command, what county has the most observations in the data?
Task 1.2 ( points)
```{r Task12, out.width='90%'}
```
Question 1.2 (7 points)
(a) (2 points) Using your first two plots, does there appear to be a relationship between higher student:teacher ratios and math scores? What about reading scores?
(b) (2 points) Using your last two plots, does there appear to be a relationship between higher income and math scores? What about reading scores?
(c) (3 points) Using the final plot, does it appear that some counties have higher income or higher student:teacher ratios (or both)? Discuss.
Task 1.3 (10 points)
```{r Task13}
```
Question 1.3 (8 points)
(a) (3 points) What is the coefficient on and what does it mean? Note that is in percentage points (you can see the range using )
(b) (5 points) What potential omitted variables might bias this coefficient? That is, is there something unobserved correlated with that might also be correlated with ?
Task 1.4 (3 points)
```{r Task14}
```
Question 1.4 (5 points)
(a) (2 points) What is the new coefficient on ? Is it larger or smaller?
(b) (1 point) What is the base county level?
(c) (2 points) What is the expected reading score for an observation in Sonoma County, at a school with a value of 25%?

------------------------------------------------------------------------

PART 2
```{r Task20, echo=T}
# Set up the data generating process
NN = 1000
# All the true betas and variances for the first equation
beta0 = 15
betaD = 1.25
betaX1 = -.25
betaX2 = .15
betaUO = .66
sigma2u = 4
# All the true deltas and variances for the second equation
delta0 = .10
deltaUO = .35
deltaZ = .20
sigma2v = .20
# And let's set our random number generator
set.seed(4202020)
# Creating the data, P2.
# First, all of the exogenous variables:
P2 = data.frame(X1 = rnorm(n = NN, mean = 2, sd = 1),
                X2 = rpois(n = NN, lambda = 3),
                UO = rnorm(n = NN, mean = 0, sd = 2),
                Z = rnorm(n = NN, mean = 0, sd = 2),
                u = rnorm(n = NN, mean = 0, sd = sqrt(sigma2u)),
                v = rnorm(n = NN, mean = 0, sd = sqrt(sigma2v)))
# Add in the endogenous variables: D and Y
P2$D = delta0 + deltaZ * P2$Z + deltaUO * P2$UO + P2$v
P2$Y = beta0 + betaX1 * P2$X1 + betaX2 * P2$X2 + betaD * P2$D + betaUO * P2$UO + P2$u
# Note that Y includes the unobserved variable UO, and that D is a function of UO as well.
```
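For reference, the data generating process coded in the chunk above can be written as two equations. This is only a restatement of that code; the subscript $i$ indexing the `NN` observations is notation added here.

$$
\begin{aligned}
D_i &= \delta_0 + \delta_Z Z_i + \delta_{UO}\, UO_i + v_i, & v_i &\sim N(0, \sigma^2_v) \\
Y_i &= \beta_0 + \beta_{X1} X1_i + \beta_{X2} X2_i + \beta_D D_i + \beta_{UO}\, UO_i + u_i, & u_i &\sim N(0, \sigma^2_u)
\end{aligned}
$$

Here $X1_i \sim N(2, 1)$, $X2_i \sim \text{Poisson}(3)$, and $UO_i$ and $Z_i$ are each drawn from $N(0, 2^2)$, with $\sigma^2_u = 4$ and $\sigma^2_v = 0.20$.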
Task 2.1 (10 points)
```{r Task21, echo=T}
```
Question 2.1 (16 points)
(a) (5 points) What is the coefficient on D in our naive regression and what does it mean?
(b) (5 points) We can think of UO as being “in the error term”. Using what we learned in and the known values of δUO, βUO, what is the sign of the bias?
(c) (2 points) Is the true value of βD within the 95% Confidence Interval of our estimate for βD?
(d) (2 points) Are the true values of βx1, βx2 within the 95% Confidence Intervals of our naive estimates?
(e) (2 points) Why would we expect (or not expect) (c) to be true?
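Parts (c) and (d) ask whether a known true value falls inside a 95% confidence interval. As a minimal sketch of one way to check this in R, the block below uses made-up toy data; the names `toy`, `true_slope`, `fit`, `ci`, and `covered` are hypothetical and are not part of the assignment data. It is shown for illustration only and is not evaluated when you knit.

```r
# Toy data with a known slope (all names and values here are hypothetical)
set.seed(1)
n <- 500
true_slope <- 3
toy <- data.frame(x = rnorm(n))
toy$y <- 2 + true_slope * toy$x + rnorm(n)

# Fit the regression and pull the 95% confidence interval for the slope
fit <- lm(y ~ x, data = toy)
ci <- confint(fit, parm = "x", level = 0.95)

# TRUE if the known value lies between the lower and upper bounds
covered <- ci[1, 1] <= true_slope && true_slope <= ci[1, 2]
print(ci)
print(covered)
```

`confint()` returns a matrix with one row per coefficient and one column per bound, so the same comparison can be repeated for any coefficient whose true value you know.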
Task 2.2 (17 points)
```{r Task22, echo=T}
```
Question 2.2 (17 points)
(a) (3 points) Is Dhat exogenous or endogenous? Why?
(b) (3 points) Do you think Dhat is correlated with D given your scatterplot from Task 2.2?
(c) (3 points) What is the coefficient on D in the second stage and what is its interpretation?
(d) (3 points) How does it compare to the true value of βD that we created in Task 2.0? Is the true value within the 95% Confidence Interval?
(e) (3 points) Given the 2SLS (two stage least squares) method we used, why would βD be unbiased?
(f) (2 points) Are the true values of βx1, βx2 within a 95% Confidence Interval of our estimates?
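To make the two-stage idea behind these questions concrete, here is a minimal sketch of 2SLS run by hand with `lm()`. It uses separate made-up data, not `P2`: the variable names (`w`, `z`, `d`, `y`, `d_hat`) and all coefficient values are assumptions chosen for illustration. It is shown for illustration only and is not evaluated when you knit.

```r
# Toy data (hypothetical names and values): w is unobserved, z is the instrument
set.seed(2)
n <- 2000
w <- rnorm(n)                         # unobserved confounder
z <- rnorm(n)                         # instrument: shifts d, unrelated to w
d <- 0.5 * z + 0.8 * w + rnorm(n)     # endogenous regressor
y <- 1 + 2 * d + 1.5 * w + rnorm(n)   # outcome; true coefficient on d is 2
toy <- data.frame(y, d, z)

# Naive OLS: w is left in the error term and is correlated with d
naive <- lm(y ~ d, data = toy)

# Stage 1: regress the endogenous variable on the instrument, keep fitted values
stage1 <- lm(d ~ z, data = toy)
toy$d_hat <- fitted(stage1)

# Stage 2: regress the outcome on the stage-1 fitted values
stage2 <- lm(y ~ d_hat, data = toy)

# Compare the two estimates of the coefficient on d
print(coef(naive)["d"])
print(coef(stage2)["d_hat"])
```

Because `d_hat` is built only from `z`, it carries none of the variation in `d` that comes from `w`, which is the property the second stage relies on.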
Postword
Don’t forget to render this to .html, then open it with your browser and print it to .pdf. Do NOT turn in your Rmarkdown file. I’ll be able to see your code in the chunk outputs.