---
title: "EC420 - PS #3"
author: "Your Name Here"
date: "3/16/2020"
output:
  html_document:
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(tinytex.verbose = TRUE)
options(htmltools.dir.version = FALSE)
library(knitr)
opts_chunk$set(
  fig.align = "center", fig.height = 4,
  dpi = 300,
  cache = TRUE,
  echo = TRUE)
options("getSymbols.warning4.0" = FALSE)
```
### Front Matter
This assignment is due **Sunday, March 29th at 11:59pm on D2L**. The point total is 142 points.
This Rmarkdown document is the template for you to use to complete your homework. **Tasks are set out in a separate document**, but are to be done in chunks in this template as specified. Questions following each task will refer back to the results from the previous code chunk.
# Part 1
### Task 1.1 (8 points)
```{r Task11, out.width='60%', message=FALSE}
```
### Question 1.1 (5 points)
(a) (1 point) How many observations are in the data?
(b) (2 points) We are interested in test scores. Which variable(s) in `CASchools` would be our outcome of interest?
(c) (2 points) Using the output from the `table(…)` command, what county has the most observations in the data?
### Task 1.2 ( points)
```{r Task12, out.width='90%'}
```
### Question 1.2 (7 points)
(a) (2 points) Using your first two plots, does there appear to be a relationship between higher student:teacher ratios and math scores? What about reading scores?
(b) (2 points) Using your last two plots, does there appear to be a relationship between higher income and math scores? What about reading scores?
(c) (3 points) Using the final plot, does it appear that some counties have higher income or higher student:teacher ratios (or both)? Discuss.
### Task 1.3 (10 points)
```{r Task13}
```
### Question 1.3 (8 points)
(a) (3 points) What is the coefficient on `calworks` and what does it mean? Note that `calworks` is in percentage points (you can see the range using `range(CASchools$calworks)`).
(b) (5 points) What potential omitted variables might bias this coefficient? That is, is there something unobserved that is correlated with `calworks` and might also be correlated with `read`?
### Task 1.4 (3 points)
```{r Task14}
```
### Question 1.4 (5 points)
(a) (2 points) What is the new coefficient on `calworks`? Is it larger or smaller?
(b) (1 point) What is the base county level?
(c) (2 points) What is the expected reading score for an observation in Sonoma County, at a school with a `calworks` value of 25\%?
----
# Part 2
```{r Task20, echo=T}
## Set up the data generating process
NN = 1000
## All the true betas and variances for the first equation
beta0 = 15
betaD = 1.25
betaX1 = -.25
betaX2 = .15
betaUO = .66
sigma2u = 4
## All the true deltas and variances for the second equation
delta0 = .10
deltaUO = .35
deltaZ = .20
sigma2v = .20
## And let's set our random number generator
set.seed(4202020)
## Creating the data, P2.
### First, all of the exogenous variables:
P2 = data.frame(X1 = rnorm(n = NN, mean = 2, sd = 1),
                X2 = rpois(n = NN, lambda = 3),
                UO = rnorm(n = NN, mean = 0, sd = 2),
                Z = rnorm(n = NN, mean = 0, sd = 2),
                u = rnorm(n = NN, mean = 0, sd = sqrt(sigma2u)),
                v = rnorm(n = NN, mean = 0, sd = sqrt(sigma2v)))
### Add in the endogenous variables: D and Y
P2$D = delta0 + deltaZ*P2$Z + deltaUO*P2$UO + P2$v
P2$Y = beta0 + betaX1*P2$X1 + betaX2*P2$X2 + betaD*P2$D + betaUO*P2$UO + P2$u
### Note that Y includes the unobserved variable UO, and that D is a function of UO as well.
```
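For reference, the data generating process coded in the chunk above can be written as two equations. This is only a restatement of that code, not additional material: the observation subscript $i$ is notation assumed here for readability, and each symbol mirrors the variable of the same name in the chunk (for example, $\delta_Z$ is `deltaZ` and $\sigma^2_u$ is `sigma2u`).

$$
\begin{aligned}
D_i &= \delta_0 + \delta_Z Z_i + \delta_{UO}\, UO_i + v_i, & v_i &\sim N(0, \sigma^2_v) \\
Y_i &= \beta_0 + \beta_{x1} X_{1i} + \beta_{x2} X_{2i} + \beta_D D_i + \beta_{UO}\, UO_i + u_i, & u_i &\sim N(0, \sigma^2_u)
\end{aligned}
$$

In the simulation, $X_1$, $UO$, and $Z$ are drawn from normal distributions, $X_2$ is drawn from a Poisson distribution with $\lambda = 3$, and $UO$ enters both equations.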
### Task 2.1 (10 points)
```{r Task21, echo=T}
```
### Question 2.1 (16 points)
(a) (5 points) What is the coefficient on $D$ in our naive regression and what does it mean?
(b) (5 points) We can think of $UO$ as being “in the error term”. Using what we learned in *partialling out* and the known values of $\delta_{UO},\beta_{UO}$, what is the sign of the bias?
(c) (2 points) Is the true value of $\beta_D$ within the 95\% Confidence Interval of our estimate for $\beta_D$?
(d) (2 points) Are the true values of $\beta_{x1},\beta_{x2}$ within the 95\% Confidence Intervals of our naive estimates?
(e) (2 points) Why would you expect (or not expect) (c) to be true?
### Task 2.2 (17 points)
```{r Task22, echo=T}
```
### Question 2.2 (17 points)
(a) (3 points) Is $Dhat$ exogenous or endogenous? Why?
(b) (3 points) Do you think $Dhat$ is correlated with $D$ given your scatterplot from Task 2.2 (c)?
(c) (3 points) What is the coefficient on $D$ in the second stage and what is its interpretation?
(d) (3 points) How does it compare to the true value of $\beta_{D}$ that we created in Task 2.0? Is the true value within the 95\% Confidence Interval?
(e) (3 points) Given the 2SLS (two stage least squares) method we used, why would our estimate of $\beta_D$ be unbiased?
(f) (2 points) Are the true values of $\beta_{x1},\beta_{x2}$ within a 95\% Confidence Interval of our estimates?
## Postword
Don’t forget to render this to .html, then open it with your browser and print it to .pdf. Do **not** turn in your Rmarkdown file. I’ll be able to see your code in the chunk outputs.
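
If you would rather render from the R console than use the Knit button, the step above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the file name `EC420_PS3.Rmd` is hypothetical (use the name of your own copy of this template), and the `rmarkdown` package and pandoc are assumed to be installed. The block is fenced with plain `r` rather than `{r}`, so it is displayed but not executed when this document is knit.

```r
# Hypothetical file name: replace with the name of your own copy of this template.
rmd_file <- "EC420_PS3.Rmd"

# Render only if the file, the rmarkdown package, and pandoc are all available.
can_render <- file.exists(rmd_file) &&
  requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() returns the path of the .html file it wrote
  html_file <- rmarkdown::render(rmd_file, output_format = "html_document")
  browseURL(html_file)  # open the rendered page in the default browser
} else {
  message("Nothing rendered: check the file name, the rmarkdown package, and pandoc.")
}
```

Once the page is open in the browser, print it to .pdf from the browser's print dialog as described above.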