Assignment 10, Question 3
suppressMessages(library("AER"))
Part (a)
By the result of Question 2(a), the true value of \(\beta\) is given by \(\beta=\left(EX_{i}X_{i}^{\prime}\right)^{-1}EX_{i}g\left(X_{i}\right)\), where \(X_i=(1,X_{i,2})^{\prime}\) with \(X_{i,2}\sim N(0,1)\) and \(g(X_i)=X_{i,2}^3\). In this case, \(EX_{i}X_{i}^{\prime}=\begin{pmatrix} 1 & E X_{i,2} \\ E X_{i,2} & E X_{i,2}^2\end{pmatrix}=\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\). Next, \(EX_{i}g\left(X_{i}\right)=\begin{pmatrix} E X_{i,2}^3 \\ EX_{i,2}^4 \end{pmatrix}\). By the symmetry of the standard normal distribution around zero, \(E X_{i,2}^3=0\). To compute \(EX_{i,2}^4\), we can use the MGF of the \(N(0,1)\) distribution, \(M(t)=\exp(t^2/2)\): the fourth derivative of the MGF at \(t=0\) equals \(3\), hence \(EX_{i,2}^4=3\). We therefore have \(\beta=\begin{pmatrix} 0\\ 3\end{pmatrix}\).
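These moments can be double-checked with a quick Monte Carlo computation in R (a supplementary sketch, not part of the required answer; the simulation size is an arbitrary choice):
z=rnorm(10^6)
mean(z^3)   # sample analogue of E X_{i,2}^3, should be close to 0
mean(z^4)   # sample analogue of E X_{i,2}^4, should be close to 3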
Part (b)
Custom function to generate data:
data_sim <- function(n){
  x2 <- rnorm(n,0,1)     # regressor: standard normal
  v <- runif(n,-10,10)   # error term: uniform on (-10,10)
  y <- x2^3 + v          # dependent variable: g(x2) = x2^3 plus error
  return(list(Y=y,X=x2))
}
Generate data:
D=data_sim(2000)
y=D$Y
x2=D$X
Part (c)
Run the OLS regression:
m=lm(y~x2)
m$coefficients
## (Intercept) x2
## -0.08369941 3.05545370
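As a cross-check on part (a) (not required by the assignment), the sample analogue of \(\beta=\left(EX_{i}X_{i}^{\prime}\right)^{-1}EX_{i}g\left(X_{i}\right)\) can be computed from the simulated regressors, using \(g(X_i)=X_{i,2}^3\) directly instead of the noisy \(Y_i\):
X=cbind(1,x2)                           # design matrix
solve(crossprod(X),crossprod(X,x2^3))   # (X'X)^{-1} X'g(X), close to (0,3)'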
Part (d)
Homoskedastic standard errors:
coeftest(m)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.083699 0.138117 -0.606 0.5446
## x2 3.055454 0.138488 22.063 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Heteroskedastic standard errors:
coeftest(m,vcov=hccm(m,type="hc0"))   # hccm() is from the car package, attached by AER
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.083699 0.137884 -0.607 0.5439
## x2 3.055454 0.179058 17.064 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
• The heteroskedastic standard error for the slope parameter is larger (see the manual computation below).
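For intuition, the HC0 standard errors can also be computed by hand from the sandwich formula \(\widehat{V}=\left(X^{\prime}X\right)^{-1}\left(\sum_{i}\hat{U}_{i}^{2}X_{i}X_{i}^{\prime}\right)\left(X^{\prime}X\right)^{-1}\). A minimal sketch (the intermediate names are ours):
X=cbind(1,x2)                      # design matrix
u=m$residuals
bread=solve(crossprod(X))          # (X'X)^{-1}
meat=crossprod(X*u)                # sum_i u_i^2 x_i x_i'
sqrt(diag(bread%*%meat%*%bread))   # should match the HC0 column above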
Part (e)
A grid of values for the regressor \(X_{i,2}\):
grid=seq(-4,4,0.05)
The corresponding values of \(g(X_i)\):
g=grid^3
The true regression line \(\beta_1+\beta_2 X_{i,2}\) with \(\beta_1=0\) and \(\beta_2=3\):
reg_line=0+3*grid
The estimated regression line:
est_reg_line=summary(m)$coefficients[1,1]+summary(m)$coefficients[2,1]*grid
Plotting:
plot(grid,g,type="l",col="red",ylim=c(-70,70),xlab="regressor",ylab="dependent variable")
lines(grid,reg_line,col="blue")
lines(grid,est_reg_line,col="black")
legend(0,-20,legend=c("True function","True regression","Estimated regression"),col=c("red","blue","black"),lty=1)

• The linear approximation appears to work well for values of the regressor in the \((-2,2)\) range.
• Since the regressor has a standard normal distribution, most observations fall within that range (see the check below).
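The second bullet can be checked directly (a supplementary computation):
pnorm(2)-pnorm(-2)   # P(-2 < X_{i,2} < 2) under N(0,1), roughly 0.954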
Part (f)
Plotting the squared residuals against the regressor:
plot(x2,(m$residuals)^2)

• The residuals appear heteroskedastic: the second moment of the residuals as a function of the regressor is higher for larger positive or negative values of the regressor.
• The residuals \(U_i\) include the approximation error \(g(X_i)-X_i'\beta\). According to the results of Question 2(d), \(E(U_i^2\mid X_i)\) depends on \((g(X_i)-X_i'\beta)^2\). From the graph in part (e), the magnitude of the approximation error is larger for large positive/negative values of the regressor, which explains the larger \(\hat{U}_i^2\) there; the overlay below illustrates this.
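Assuming the conditional second moment from Question 2(d) takes the form \(E\left(U_{i}^{2}\mid X_{i}\right)=\left(g\left(X_{i}\right)-X_{i}^{\prime}\beta\right)^{2}+Var\left(V_{i}\right)\), with the true \(\beta=(0,3)^{\prime}\) and \(Var\left(V_{i}\right)=20^{2}/12\) for \(V_{i}\sim U(-10,10)\), it can be overlaid on the squared residuals (a sketch):
plot(x2,(m$residuals)^2,xlab="regressor",ylab="squared residuals")
curve((x^3-3*x)^2+400/12,col="red",add=TRUE)   # theoretical E(U_i^2 | X_i = x), assuming the form above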
Part (g)
Simulate the distribution of the \(t\)-statistic for \(H_0\colon\beta_2=3\) (the true slope from part (a)) with a small sample size:
R=10^4
n=20
T=rep(0,R)   # storage for the simulated t-statistics (note: T masks the TRUE shorthand)
for (r in 1:R){
  data=data_sim(n)
  m=lm(data$Y ~ data$X)
  ct=coeftest(m,vcov=hccm(m,type="hc0"))
  T[r]=(ct[2,1]-3)/ct[2,2]   # t-statistic for H0: beta_2 = 3, robust SE
}
Plot the distribution:
low=min(T)
high=max(T)
B=max(-low,high)+0.2
hist(T,breaks=seq(-B,B,0.2),xlab="T-statistic values",main="The simulated distribution of the T statistic",freq=FALSE,ylim=c(0,0.4))
x=seq(-6,6,0.01)
f=exp(-x^2/2)/sqrt(2*pi)   # standard normal density
lines(x,f,col="red")

• The simulated distribution of \(T\) has thicker tails than the standard normal distribution.
• Moreover, the distribution of \(T\) is also skewed to the left; the sample moments computed below quantify both features.
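These visual impressions can be quantified with the sample skewness and kurtosis of the simulated \(T\) values; a quick supplementary check (for reference, a standard normal has skewness \(0\) and kurtosis \(3\)):
mean((T-mean(T))^3)/sd(T)^3   # sample skewness: negative values indicate left skew
mean((T-mean(T))^4)/sd(T)^4   # sample kurtosis: values above 3 indicate thick tails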
Part (h)
alpha=c(0.01,0.05,0.10)
P_right=rep(0,3)
P_left=rep(0,3)
for (j in 1:3){
  P_right[j]=sum(T>qnorm(1-alpha[j]))/R   # simulated P(T > z_{1-alpha})
  P_left[j]=sum(T<qnorm(alpha[j]))/R      # simulated P(T < z_alpha)
}
cbind(alpha,P_right)
##      alpha P_right
## [1,]  0.01  0.0320
## [2,]  0.05  0.0771
## [3,]  0.10  0.1226
• For both tail events, \(\{T>z_{1-\alpha}\}\) and \(\{T<z_{\alpha}\}\), the simulated probabilities differ noticeably from the corresponding values of \(\alpha\): with \(n=20\), the standard normal approximation to the distribution of \(T\) is inaccurate.
Repeating the simulation in part (g) and the tail-probability computation above with a larger sample size \(n\) gives:
cbind(alpha,P_right)
##      alpha P_right
## [1,]  0.01  0.0040
## [2,]  0.05  0.0366
## [3,]  0.10  0.0796
• The simulated distribution of \(T\) is still somewhat skewed to the left, but to a much smaller extent.
• The simulated probabilities for the tail events are now much closer to the values of \(\alpha\).
• The normal approximation appears to be much more accurate; the Monte Carlo standard errors below give a sense of the remaining simulation noise.
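Since each simulated probability is an average of \(R\) Bernoulli indicators, its Monte Carlo standard error under the nominal level is \(\sqrt{\alpha(1-\alpha)/R}\). A supplementary calculation (not part of the assignment output):
sqrt(alpha*(1-alpha)/R)   # approx. 0.0010, 0.0022, 0.0030 for R = 10^4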