---
title: "Assignment 1: Simulations"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
In this first assignment, I want you to analyze the properties of a few estimators. Create an Rmd file, compile it, and upload both the Rmd and PDF files before Sunday, October 3 at 10:00pm. I usually do not answer emails during the weekend, so do not wait until the last minute in case you need help. You are allowed to discuss with each other, but I expect each copy to be unique.
If some simulations take too long when compiling, you can generate the results, save them into an .rda file, and load the file in your document to present them. This will make it easier for you to compile the PDF. If you choose to proceed this way, include the code for the simulation with the option `eval=FALSE`. Here is an example. You first show the simulation code without running it (you can run it once on your side to get the result):
```{r, eval=FALSE}
Xvec <- numeric(10000)   # preallocate the result vector
for (i in 1:10000) {
  x <- rnorm(5000)
  Xvec[i] <- mean(x)
}
save(Xvec, file="A1Res.rda")
```
Then, you print the result:
```{r}
load("A1Res.rda")
mu <- mean(Xvec)             # average of the simulated means
rmse <- sqrt(mean(Xvec^2))   # RMSE around the true mean of 0
rmse
```
If you use this method, upload all .rda files needed to compile the PDF.
*Important:* Before you start any simulation, set the seed to your student number. Also, set the number of iterations to 2,000 for all your simulations.
## Estimator of the covariance
We saw that the unbiased estimator of the variance, $S^2$, is less efficient than $\hat{\sigma}^2=(n-1)S^2/n$: in a simulation, the RMSE of $\hat{\sigma}^2$ was smaller. In this question, we want to see if we get the same result for the following estimators of the covariance:
\[
S_{xy} = \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})
\]
\[
\hat{\sigma}_{xy} = \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})
\]
Set the sample size to $n=25$ and try three different pairs of distributions for $X$ and $Y$. For each pair, report the RMSE of both estimators. Interpret your result.
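As a starting point, here is a minimal sketch for one pair of distributions (independent standard normals, so the true covariance is 0); the distributions, the true covariance, and the seed are placeholders you must adapt:

```r
# Sketch for one pair: X and Y independent N(0,1), true covariance 0.
# Adapt the distributions, true_cov, and the seed (your student number).
set.seed(123)                      # placeholder seed
n <- 25; nsim <- 2000
Sxy <- sig <- numeric(nsim)
for (i in 1:nsim) {
  x <- rnorm(n); y <- rnorm(n)
  Sxy[i] <- cov(x, y)              # divides by n-1
  sig[i] <- (n - 1) * Sxy[i] / n   # divides by n
}
true_cov <- 0
rmseS   <- sqrt(mean((Sxy - true_cov)^2))
rmseSig <- sqrt(mean((sig - true_cov)^2))
c(S_xy = rmseS, sigma_xy = rmseSig)
```

Repeat the same loop for your other two pairs of distributions, changing only the generation of `x` and `y` and the value of `true_cov`.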
## Estimator of the mean for non-identically distributed samples
We saw that $\bar{X}$ is the best linear unbiased estimator of the mean, which means that it has the smallest variance among all estimators that can be written as:
\[
\hat{\mu} = \frac{\sum_{i=1}^n w_ix_i}{\sum_{i=1}^n w_i}
\]
However, this is true only for iid samples. In your simulation, consider a sample of 100 observations with $x_i\sim iid(0,\sigma_1^2)$ for $i=1,...,50$ and $x_i\sim iid(0,\sigma_2^2)$ for $i=51,...,100$.
- In the first simulation, compare the bias, standard error and RMSE of $\bar{X}$ and $\hat{\mu}$ using $w_i=1/\sigma_1^2$ for $i=1,...,50$ and $w_i=1/\sigma_2^2$ for $i=51,...,100$. We assume here that the $\sigma_i^2$'s are known. You can try a few distributions and see if it affects the properties.
- In the second simulation, we use the same weights but we assume that the $\sigma_i^2$'s are unknown. Therefore, we replace them by estimators. Since it is a different estimator, we call it $\tilde{\mu}$. Using the same realized samples as in the previous simulation, compare the bias, standard error and RMSE of $\tilde{\mu}$ and $\bar{X}$.
Do we see a difference when the true variances are replaced by estimates? Explain.
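To fix ideas, the first simulation could be set up along the following lines; the normal populations and the values of $\sigma_1$ and $\sigma_2$ are illustrative choices, and the seed is a placeholder for your student number:

```r
# Sketch: weighted vs. unweighted mean with two iid blocks of different
# variances. sigma1, sigma2 and the normal populations are illustrative.
set.seed(123)                              # placeholder seed
nsim <- 2000; s1 <- 1; s2 <- 3
w <- c(rep(1/s1^2, 50), rep(1/s2^2, 50))   # known-variance weights
xbar <- muhat <- numeric(nsim)
for (i in 1:nsim) {
  x <- c(rnorm(50, 0, s1), rnorm(50, 0, s2))
  xbar[i]  <- mean(x)
  muhat[i] <- sum(w * x) / sum(w)
}
# bias, standard error and RMSE (the true mean is 0 here)
stats <- function(est) c(bias = mean(est),
                         se   = sd(est),
                         rmse = sqrt(mean(est^2)))
rbind(xbar = stats(xbar), muhat = stats(muhat))
```

For the second simulation, replace `s1` and `s2` in the weights by estimates computed from each half of the sample, keeping the same draws of `x`.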
## Asymptotic distribution
For this question, we want to analyze the large sample properties of the fourth moment estimator:
\[
\hat{m}_4 = \frac{1}{n}\sum_{i=1}^n X_i^4
\]
To analyze the properties, use two distributions: one whose first 8 moments are finite and one whose 8$^{th}$ moment is either infinite or undefined. Then:
- Compare the consistency of $\hat{m}_4$ for the two distributions when $n$ goes from 10 to 2000. Explain your result.
- Compare the distributions of $\sqrt{n}(\hat{m}_4-m_4)$ for $n=10,...,2000$. Do they seem to be asymptotically normal? Explain why.
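As a hint for the consistency part, here is one way to trace $\hat{m}_4$ as $n$ grows for a distribution with all moments finite (a standard normal, whose true fourth moment is 3); the heavy-tailed case, e.g. a $t$ distribution with few degrees of freedom, follows the same pattern:

```r
# Consistency sketch: one simulated path of m4-hat as n grows, using a
# standard normal population (all moments finite, true m4 = 3).
set.seed(123)                      # placeholder seed
nvals <- seq(10, 2000, by = 10)
m4hat <- sapply(nvals, function(n) mean(rnorm(n)^4))
plot(nvals, m4hat, type = "l", xlab = "n",
     ylab = expression(hat(m)[4]))
abline(h = 3, col = "red")         # true fourth moment of N(0,1)
```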
If you want the true $m_4$, you can get it from the web, or you can compute it numerically. Since the definition is $\int f(x)x^4\,dx$, where $f(x)$ is the density, we can get it in R using the `integrate` function. For example, the fourth moment of the $N(0,4)$ distribution is:
```{r}
f <- function(x) dnorm(x, 0, 2)*x^4
m4 <- integrate(f, -Inf, Inf)
m4
```
It is therefore `r m4$value`.
## The Delta Method
Suppose $X>0$ and we want to show that:
\[
\sqrt{n}(\bar{X}^3-\mu^3) \to N(0, \sigma^2(3\mu^2)^2)
\]
First, use the Delta method to show that it does converge to $N(0, \sigma^2(3\mu^2)^2)$. Then, use a simulation to see if the distribution gets closer and closer to this normal distribution when $n$ increases. Interpret your result (e.g. is it a good approximation in small samples?).
For your population, you can choose any distribution with $\mu\neq 0$ that satisfies the CLT for $\bar{X}$, but you cannot use the normal distribution. If you want, you can try more than one distribution and compare the results. Theoretically, what happens if $\mu=0$? Can you verify that numerically?
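For the simulation part, the setup could look like the following sketch, which uses an Exponential(1) population ($\mu=1$, $\sigma^2=1$) and a single value of $n$ purely for illustration; you should vary $n$ and may pick a different distribution:

```r
# Delta method sketch: simulated distribution of sqrt(n)(xbar^3 - mu^3)
# compared to the N(0, sigma^2 (3 mu^2)^2) limit. Exponential(1) is an
# illustrative choice with mu = 1 and sigma^2 = 1.
set.seed(123)                      # placeholder seed
nsim <- 2000; n <- 200
mu <- 1; sig2 <- 1
z <- replicate(nsim, {
  x <- rexp(n, rate = 1)
  sqrt(n) * (mean(x)^3 - mu^3)
})
hist(z, freq = FALSE, breaks = 40, main = "Delta method check")
curve(dnorm(x, 0, sqrt(sig2 * (3 * mu^2)^2)), add = TRUE, col = "red")
```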
## Hypothesis tests
We want to test the null hypothesis $H_0:\mu=c$ against the alternative $H_1:\mu\neq c$ at level $\alpha$. We know that when the population is normal, we have the following:
\[
\mbox{Test 1:}~~\frac{\sqrt{n}(\bar{X}-\mu)}{S} \sim t_{n-1}
\]
If the distribution of the population is unknown, we rely on the asymptotic distribution:
\[
\mbox{Test 2:}~~\frac{\sqrt{n}(\bar{X}-\mu)}{\hat{\sigma}} \to N(0,1)
\]
where $\hat{\sigma}^2=(n-1)S^2/n$. Answer the following questions:
- Suppose $x_i \sim N(\mu, \sigma^2)$. Using a simulation, compare the power curves of Test 1 and Test 2 for $\alpha=5\%$. You can choose any $\mu$ and $\sigma^2$. Set the sample size to 25.
- Answer the same question when $x_i$ is not normal. Choose any distribution.
What happens when $n=300$? Explain.
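One possible skeleton for the power-curve comparison, using a $N(\mu,1)$ population and $c=0$ as illustrative choices (adapt the grid and the seed):

```r
# Power sketch: reject H0: mu = 0 over a grid of true means, using
# t critical values (Test 1) and normal critical values (Test 2).
set.seed(123)                      # placeholder seed
n <- 25; nsim <- 2000; alpha <- 0.05; c0 <- 0
mugrid <- seq(-1, 1, by = 0.1)
power <- t(sapply(mugrid, function(mu) {
  rej1 <- rej2 <- logical(nsim)
  for (i in 1:nsim) {
    x <- rnorm(n, mu, 1)
    t1 <- sqrt(n) * (mean(x) - c0) / sd(x)
    sig <- sqrt((n - 1) / n) * sd(x)       # sigma-hat version
    t2 <- sqrt(n) * (mean(x) - c0) / sig
    rej1[i] <- abs(t1) > qt(1 - alpha/2, n - 1)
    rej2[i] <- abs(t2) > qnorm(1 - alpha/2)
  }
  c(test1 = mean(rej1), test2 = mean(rej2))
}))
matplot(mugrid, power, type = "l", lty = 1:2,
        xlab = expression(mu), ylab = "power")
```

The rejection frequency at $\mu=c$ estimates the size of each test; the rest of the curve is the power.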
## Bootstrap
Here, we want to see how good the Bootstrap method is at estimating the standard deviation of
\[
\hat{m}_4 = \frac{1}{n}\sum_{i=1}^n X_i^4
\]
For this simulation, assume that $x_i\sim N(0,1)$. Since we know all the moments of the normal distribution, we can show that $Var(\hat{m}_4)=96/n$ (prove it). Also, set the size of each sample to 20 and the number of Bootstrap samples to 500.
- Estimate the bias of the Bootstrap estimator of the standard deviation.
- Using the 2,000 $\hat{m}_4$ and the 2,000 Bootstrap estimates of the standard deviation, estimate the probability of rejecting $H_0:m_4=3$ against $H_1:m_4\neq 3$ using the asymptotic distribution when the size of the test is $\alpha=1\%$, 5\% and 10\%. The asymptotic distribution is:
\[
\frac{\sqrt{n}(\hat{m}_4-3)}{SD(\hat{m}_4)} \to N(0,1)
\]
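As a hint, a single Bootstrap estimate of the standard deviation can be computed as in the sketch below; the full exercise wraps this in a loop over 2,000 samples (the seed is a placeholder):

```r
# Bootstrap sketch for ONE sample: resample x with replacement B times,
# recompute m4-hat on each resample, and take the standard deviation.
set.seed(123)                    # placeholder seed
n <- 20; B <- 500
x <- rnorm(n)
m4hat <- mean(x^4)
m4boot <- replicate(B, mean(sample(x, n, replace = TRUE)^4))
sd_boot <- sd(m4boot)            # Bootstrap estimate of SD(m4-hat)
sd_true <- sqrt(96 / n)          # exact SD under N(0,1)
c(sd_boot = sd_boot, sd_true = sd_true)
```

Averaging `sd_boot - sd_true` over the 2,000 samples gives the bias asked for in the first bullet.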