---
title: "Numerical Simulations"
author: "Pierre Chausse"
date: "15/09/2021"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, out.width = "60%", fig.align = "center")
```
# Introduction
In this document, I want to introduce you to numerical simulations. In the first lecture, we covered many topics related to sampling theory. You do have to understand the theory, but I don't want you to focus too much on it; it is meant as a review to prepare you for the next topics. Instead, I want to show you how to use simulations to illustrate the different results I presented. It is a good way to understand the results and a very useful tool that we will use often in this course. In fact, it will be the topic of the next assignment.
# What is a numerical simulation?
Simulations are used to derive the properties of estimators when they cannot be derived analytically. For example, we saw in class that the distribution of $\bar{X}$ is unknown unless we assume that our sample comes from a normally distributed population. Since the distribution is a property in repeated samples, we can simulate what we would obtain if we had the opportunity of collecting multiple samples. In general, suppose we are interested in the properties of $T(X)$, a function of the sample $X=\{x_1,x_2,\ldots,x_n\}$. To obtain these properties, we collect many samples from a given population (to be determined) and save all realizations of $T(X)$. If we compute a large enough number of realizations, we can use them to approximate the properties of $T(X)$. The general algorithm is the following (a small concrete example is given after the list):
> ## Simulation algorithm
>
> 1. We determine a DGP (data generating process). This DGP is basically a specification for our population of interest. It can be as simple as a distribution for the sample X or as complex as a complete model for a causal study.
> 2. We generate N datasets (with a given sample size) from the selected DGP.
> 3. For each of the N datasets, we compute the statistic of interest T(X), where X is either a random vector or a random dataset.
> 4. We analyze the properties of T(X) by approximating the infinite population of $T(X)$ by its N realizations.
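To make these four steps concrete, here is a small hypothetical example: the DGP is an exponential distribution with rate 1 (step 1), we generate $N=1000$ samples of size 30 from it (step 2), we compute the sample median of each (step 3), and we summarize the realizations (step 4):

```r
N <- 1000; n <- 30
Tvec <- numeric(N)
for (i in 1:N)
{
  x <- rexp(n, rate = 1)              # step 2: one sample from the DGP
  Tvec[i] <- median(x)                # step 3: the statistic T(X)
}
c(mean = mean(Tvec), sd = sd(Tvec))   # step 4: simulated mean and sd of T(X)
```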
## How do we code simulations?
Since we are generating multiple samples and computing $T(X)$ each time, it is natural to construct a simulation using loops. $T(X)$ could be a scalar, a vector or even a matrix. Since we have to store $N$ realizations of $T(X)$, we need to create an appropriate object to store them. Here are the possible options (we assume that the function f(x) computes $T(X)$ and g(n) generates our sample):
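The skeletons below use `f`, `g`, `N`, `n` and `l` generically; to actually run them, you can plug in any concrete choice, for example (a purely hypothetical choice, using the sample mean of a $U(-1,1)$ sample):

```r
g <- function(n) runif(n, -1, 1)  # the DGP: a sample of size n from U(-1,1)
f <- function(x) mean(x)          # the statistic of interest T(X): the sample mean
N <- 1000                         # number of replications
n <- 40                           # sample size
```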
### $T(X)$ is a scalar:
```r
Tvec <- numeric(N) # We create a numerical vector of N elements
# or #
Tvec <- numeric() # We create an empty numerical vector.
for (i in 1:N)
{
x <- g(n)
Tvec[i] <- f(x)
}
```
### $T(X)$ is a vector:
Here we want to store $T(X)$ in a matrix. Each realization can be stored in a row or in a column of the matrix; the choice is yours. In the following, the realizations are stored in rows (we suppose that $T(X)$ has $l$ elements).
```r
Tmat <- matrix(nrow=N, ncol=l) # We create an N x l matrix (filled with NA)
for (i in 1:N)
{
x <- g(n)
Tmat[i,] <- f(x)
}
```
Another option is to stack each $T(X)$ on top of each other using `rbind`:
```r
Tmat <- numeric() # This command just creates Tmat so that it exists
for (i in 1:N)
{
x <- g(n)
Tmat <- rbind(Tmat, f(x)) # stack the new realization below the previous ones
}
```
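In both cases, the rows of `Tmat` are the $N$ realizations, so column-wise summaries give the simulated properties of each element of $T(X)$. Note that growing the matrix with `rbind` inside the loop forces R to copy the object at every iteration, so pre-allocating the matrix as in the first version is usually faster. For example:

```r
colMeans(Tmat)       # simulated mean of each element of T(X)
apply(Tmat, 2, sd)   # simulated standard deviation of each element
```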
### $T(X)$ is a matrix or a more complicated object:
This is not really something we will use in this course. But if you want to know, you have the option of storing the results in an array (3D matrix) or a list. In the following, I show how to do it with a list:
```r
Tmat <- list() # This command just creates an empty list
for (i in 1:N)
{
x <- g(n)
Tmat[[i]] <- f(x) # [[]] are used to select elements of a list
}
```
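Afterwards, you can recover a stored result with `[[ ]]` or apply a function to every element of the list with `sapply` (or `simplify2array`, used later in this document). For example:

```r
Tmat[[1]]                   # the first stored realization of T(X)
res <- sapply(Tmat, mean)   # e.g. the mean of each stored T(X)
```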
### Can we avoid loops? (optional)
Unless you program in C, C++ or Fortran, it is recommended to avoid loops when possible. However, it may require a little more thinking. Suppose we want to analyze the properties of $\bar{X}$ for $n=40$ and $x_i\sim U(-1,1)$, and suppose we want N=1,000. You can create a $40\times 1000$ matrix in which each column is a sample. This can be done using the following code:
```{r}
X <- matrix(runif(40000, -1, 1), 40, 1000)
```
Then we can simply compute the mean of each column:
```{r}
Tvec <- colMeans(X)
```
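If you prefer not to write the loop explicitly, the function `replicate` repeats an expression a given number of times and collects the results. It is mainly a convenience (internally it uses `sapply`), so do not expect it to be faster than the loop:

```{r}
Tvec <- replicate(1000, mean(runif(40, -1, 1)))
```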
Of course, the matrix approach requires more RAM because you need to store a $40\times 1000$ matrix. This is certainly not a problem, but if you want n=2000 and N=10000, your computer may not have enough RAM. In that case, you may have to rely on loops. Just to see the difference, you can compare the computational time of the two approaches (each written as a function for simplicity):
```{r}
f1 <- function()
{
Tvec <- numeric()
for (i in 1:1000)
Tvec[i] <- mean(runif(40,-1,1))
}
f2 <- function()
{
Tvec <- colMeans(matrix(runif(40000,-1,1),40,1000))
}
(t1 <- system.time(f1()))
(t2 <- system.time(f2()))
```
It is about `r round(t1[3]/t2[3],0)` times faster without the loop. But who cares if it takes `r t1[3]` seconds instead of `r t2[3]` seconds? In this particular case, it is not worth the effort of avoiding the loop. Note that the system time computed above changes each time I recompile the Rmd file. It depends on many factors (e.g. which other programs are running at the same time), and it will also be different for you because you have a different system. If you want a more precise measure of the relative time, the function `benchmark` from the `rbenchmark` package measures the relative time by running each function 100 times (by default):
```{r, eval=FALSE}
library(rbenchmark)
benchmark(loop=f1(), matrix=f2())
```
In the above function, I use `colMeans` to apply the function `mean` to each column. It behaves like a call to the more general function `apply`, which is very useful when you want to run a function on each column or row. The following lines are equivalent:
```{r}
A <- matrix(rnorm(1000),200,5)
colMeans(A)
apply(A,2,mean) # applies mean to the second dimension (the columns)
colSums(A)
apply(A,2,sum) # applies sum to the second dimension (the columns)
```
You can even write more complicated functions:
```{r}
apply(A,2,function(x) c(mean(x), sd(x), length(x), min(x),max(x)))
```
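The second argument of `apply` selects the dimension: using 1 instead of 2 applies the function to each row. For example:

```{r}
apply(A, 1, mean)[1:5]  # the means of the first 5 rows
rowMeans(A)[1:5]        # the same result using the specialized function
```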
### Can we use parallel programming?
If you have multiple cores on your computer or have access to a supercomputer, it may be beneficial to do some parallel programming: running more than one estimation simultaneously. There is more than one way to do it in R; I will show you the easiest one. Unfortunately, it does not work on Windows operating systems. Consider the following function:
```{r}
f <- function(n)
{
x <- runif(n, -1,1)
mean(x)
}
f(40)
f(40)
```
The function generates a random vector of size $n$ from a $U(-1,1)$ distribution and returns its sample mean. You can send that function to multiple cores simultaneously using the `mclapply` function from the `parallel` package:
```{r}
library(parallel)
res <- mclapply(1:1000, function(i) f(40), mc.cores=8) # my laptop has 8 cores
```
Since `res` is a list, we can convert it into a vector as follows:
```{r}
Tvec <- simplify2array(res)
```
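Rather than hard-coding the number of cores, you can let R detect how many are available; `detectCores` is also part of the `parallel` package (a small sketch, not evaluated here):

```{r, eval=FALSE}
nc <- detectCores()  # number of cores available on this machine
res <- mclapply(1:1000, function(i) f(40), mc.cores=nc)
```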
I use this function all the time in my research. When one estimation takes several seconds and I use my 32-core workstation, the simulation time is substantially reduced. Is it worth it in this case? Not really.
# Estimating the bias
## The sample standard deviation
For this exercise, we want to estimate the bias of $s$. We saw that $s^2$ is an unbiased estimator of the variance, but $s$ is biased since it is a nonlinear transformation of $s^2$. For that purpose, our DGP is a chi-square distribution with 8 degrees of freedom (the variance is 16, which implies $\sigma=4$). In the following, I use a function that returns the estimated bias for a given sample size using 2000 replications. You do not have to use a function, but it is easier this way since we will try different sample sizes.
```{r}
f <- function(n)
{
Tvec <- numeric()
for (i in 1:2000)
{
x <- rchisq(n, 8)
Tvec[i] <- sd(x)
}
mean(Tvec)-4 ## estimated bias
}
```
Then, we can try with n=10:
```{r}
set.seed(112233) # To make it reproducible
f(10)
```
Therefore, $s$ underestimates the standard deviation. Let's see what happens if we increase the sample size:
```{r}
n <- seq(10,200,by=5)
bias <- numeric()
for (i in 1:length(n))
bias[i] <- f(n[i])
plot(n, bias, type='l', main="Bias of the standard deviation S")
abline(h=0, lty=2, col="blue")
```
We can see that the bias seems to converge to 0; in other words, $s$ is asymptotically unbiased. The negative bias that we observe is a consequence of Jensen's inequality: when $f(x)$ is concave, $E[f(x)]\leq f(E[x])$. Since the square root is concave, $E(s)=E(\sqrt{s^2})\leq\sqrt{E(s^2)}=\sigma$, so the bias is negative.
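# Convergence in Probability

We now study how $\bar{X}$ behaves as the sample size grows. The chunks below work with a matrix `Xbar` whose row $n$ contains the sample means computed from samples of size $n$, with one column per replication. Here is one way to construct such a matrix (a minimal sketch, assuming $N=1000$ replications and the same $U(-1,1)$ DGP as before, so $\mu=0$); computing cumulative means gives every sample size from 1 to 2000 in one pass:

```{r}
set.seed(123)                               # assumed seed, for reproducibility
N <- 1000                                   # assumed number of replications
x <- matrix(runif(2000*N, -1, 1), 2000, N)  # each column is one sample of size 2000
Xbar <- apply(x, 2, cumsum)/(1:2000)        # row n = mean of the first n observations
```

We can first look at the variance of $\bar{X}$ as a function of $n$: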
```{r}
v <- apply(Xbar, 1, var)
plot(11:2000, v[-(1:10)], type='l', main="Variance of the sample mean", ylab="variance",
     xlab="n") # I remove n=1,...,10 to see the rest of the curve better
```
We can also estimate the rate of convergence of the variance. If the rate is $n^\alpha$ or $O(n^{\alpha})$, we should have the relationship $V=Cn^{\alpha}$, which implies $\log(V)=\alpha_0+\alpha\log(n)$ with $\alpha_0=\log(C)$. We can therefore estimate $\alpha$ by regression:
```{r}
lm(log(v)~log(1:2000))
```
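We can also visualize the fit on a log-log scale; the slope of the fitted line is the estimated rate of convergence:

```{r}
plot(log(1:2000), log(v), type='l', xlab="log(n)", ylab="log(V)",
     main="Log variance of the sample mean against log(n)")
abline(lm(log(v)~log(1:2000)), col="red", lty=2)  # fitted line; slope close to -1
```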
It shows that the variance converges to 0 at the same rate as $n^{-1}$. Some students were asking why we always write $\sqrt{n}(\bar{X}-\mu)$. We do that to prevent the variance from going to 0: the variance of this expression is $nVar(\bar{X})$, and since $Var(\bar{X})$ is $O(n^{-1})$, $nVar(\bar{X})$ is $O(1)$, which means that it does not converge to 0. We need this scaling when we want the asymptotic distribution, because the limiting distribution of $(\bar{X}-\mu)$ is "degenerate": it is equal to 0 with probability 1. We can see what happens using a graph:
```{r}
v <- apply(Xbar, 1, var)*(1:2000)
plot(11:2000, v[-(1:10)], type='l', main="Variance of the sample mean times n", ylab="variance",
     xlab="n") # I remove n=1,...,10 to see the rest of the curve better
```
As a last exercise, you can use the definition of convergence in probability and compute the probability of $|\bar{X}-\mu|>\epsilon$ for a few values of $\epsilon$ (here $\mu=0$):
```{r}
p1 <- rowMeans(abs(Xbar)>0.1)
p2 <- rowMeans(abs(Xbar)>0.08)
p3 <- rowMeans(abs(Xbar)>0.05)
p4 <- rowMeans(abs(Xbar)>0.03)
matplot(1:2000, cbind(p1,p2,p3,p4), type='l', col=1:4, xlab="n", ylab="Prob.",
        main="Probability of |Xbar| being greater than e", lty=1)
legend("topright", c("e=0.1","e=0.08","e=0.05","e=0.03"), col=1:4, lty=1, bty='n')
```
We see that the probability converges to 0 for the four values of $\epsilon$, which is a necessary condition for convergence in probability. In a simulation, if you set $\epsilon$ too small, you won't observe convergence because $n$ will not be large enough.
# Convergence in Distribution
Testing the CLT is a little harder, because we have to compare densities. We could use normality tests to get a sense of how close the distribution is to the normal, but we won't do it here. I just want to present a few visualization methods.
## Distribution of the sample mean
We know that $\bar{X}$ is $N(\mu, \sigma^2/n)$ when the sample comes from a normally distributed population with mean $\mu$ and variance $\sigma^2$. We can verify by comparing the densities. In the following, $\mu=5$, $\sigma^2=4$, $n=20$ and $N=5000$.
```{r}
X <- sapply(1:5000, function(i) rnorm(20, 5, 2))
Xbar <- apply(X,2,mean)
hist(Xbar, freq=FALSE) # We want the density, not the frequency
curve(dnorm(x, 5, 2/sqrt(20)), 3, 7, add=TRUE, col=2, lty=2, lwd=2)
```
The histogram does look normal. We can also use a smoothed density:
```{r}
plot(density(Xbar), main="Density of Xbar")
curve(dnorm(x, 5, 2/sqrt(20)), 3, 7, add=TRUE, col=2, lty=2, lwd=2)
```
Or we can use the QQ-plot, which compares the quantiles of $\bar{X}$ with the quantiles of the normal distribution. If it is normal, the points should follow the straight line:
```{r}
qqnorm(Xbar)
qqline(Xbar)
```
You can also compare some quantiles of the normal with the empirical quantiles of $\bar{X}$:
```{r}
q <- c(.01,.05,.1,.9,.95,.99)
rbind(quantile(Xbar, q), qnorm(q, 5, 2/sqrt(20)))
```
We can see how close they are to each other.

For non-normal observations, we need to rely on the asymptotic distribution. In that case, we use the CLT, which states that $\sqrt{n}(\bar{X}-\mu) \to N(0,\sigma^2)$ in distribution. It implies that $\bar{X}\approx N(\mu, \sigma^2/n)$. There is a more useful result that we can use. Since $\hat{\sigma}^2$ converges in probability to $\sigma^2$, then
\[
\frac{\sqrt{n}(\bar{X}-\mu)}{\hat{\sigma}} \to \frac{1}{\sigma}N(0,\sigma^2) = N(0,1)
\]
This follows from Slutsky's theorem: if $c_n$ converges in probability to $c$ and $x_n$ converges in distribution to $\xi$, then $c_nx_n$ converges in distribution to $c\xi$. We use this result to justify
\[
\frac{\sqrt{n}(\bar{X}-\mu)}{\hat{\sigma}} \approx N(0,1)
\]
How good is the approximation? Let's try with n=20 and $x_i=1$ with probability 0.5 and $x_i=0$ with probability 0.5:
```{r}
X <- sapply(1:5000, function(i) rbinom(20, size=1, prob=0.5))
z <- apply(X, 2, function(x) sqrt(20)*(mean(x)-0.5)/sd(x))
q <- c(.01,.05,.1,.9,.95,.99)
rbind(quantile(z, q), qnorm(q, 0, 1))
```
We can see that the quantiles are different when n=20. If we try with n=100 we get the following:
```{r}
X2 <- sapply(1:5000, function(i) rbinom(100, size=1, prob=0.5))
z2 <- apply(X2, 2, function(x) sqrt(100)*(mean(x)-0.5)/sd(x))
q <- c(.01,.05,.1,.9,.95,.99)
rbind(quantile(z2, q), qnorm(q, 0, 1))
```
The quantiles get a little closer to the $N(0,1)$ quantiles when the sample size increases. In fact, the approximation is not bad at all even with such a small sample. Let's look at the QQ-plot:
```{r}
qqnorm(z2)
qqline(z2)
```
In the graph, we see that the quantiles deviate from the normal distribution in the tails. This matters because, when we perform hypothesis tests, we are interested precisely in the tails of the distribution.
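For example, since a two-sided test at the 5% level rejects when $|z|>1.96$, we can compare the simulated probability of rejection (using the `z2` values from the last simulation) with the nominal 5%:

```{r}
mean(abs(z2) > 1.96)  # simulated P(|z| > 1.96); the nominal probability is 0.05
```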