Numerical Simulations
Pierre Chausse
15/09/2021
Introduction
In this document, I want to introduce you to numerical simulations. In the first lecture, we covered a lot of
topics related to sampling theory. You do have to understand the theory, but I don’t want you to focus too
much on it. This is meant to be a review to prepare you for the next topics. Instead, I want to show you how
to use simulations to illustrate the different results I presented. I think it is a good way to understand the
results and it is a very useful tool that we will often use in this course. In fact, it will be the topic of the next
assignment.
What is a numerical simulation?
Simulations are used to derive the properties of estimators when they cannot be derived analytically. For
example, we saw in class that the distribution of X̄ is unknown unless we assume that our sample comes from
a normally distributed population. Since the distribution is a property defined in repeated samples, we can simulate
what we would obtain if we had the opportunity to collect multiple samples. In general, suppose we are
interested in the properties of T(X), a function of the sample X = {x1, x2, …, xn}. To obtain these properties,
we draw many samples from a given population (to be determined) and we save all realizations of T(X). If
we compute a large enough number of realizations, we can use them to approximate the properties of T(X).
The general algorithm is:
Simulation algorithm
1. We determine a DGP (data generating process). The DGP is basically a specification for
our population of interest. It can be as simple as a distribution for the sample X or as
complex as a complete model for a causal study.
2. We generate N datasets (with a given sample size) from the selected DGP.
3. We compute the statistic of interest T(X) for each dataset, where X is either a random vector or a random
dataset.
4. We analyze the properties of T(X) by approximating the infinite population of T(X) by its
N realizations.
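To make these steps concrete, here is a minimal sketch (with an arbitrary DGP chosen for illustration): the DGP is a N(0,1) population, the statistic is T(X) = X̄, the sample size is n = 30 and the number of replications is N = 1000.
N <- 1000                            # Step 2: number of simulated datasets
n <- 30                              # sample size of each dataset
Tvec <- numeric(N)
for (i in 1:N)
{
x <- rnorm(n)                        # Steps 1-2: the DGP is N(0,1); draw one sample
Tvec[i] <- mean(x)                   # Step 3: compute T(X) = the sample mean
}
c(mean=mean(Tvec), sd=sd(Tvec))      # Step 4: approximate the mean and sd of T(X)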
How do we code simulations?
Since we are generating multiple samples and are computing T(X) each time, it is natural to construct a
simulation using loops. T(X) could be a scalar, a vector or even a matrix. Since we have to store the N realizations of T(X), we
need to create an appropriate object to store them. Here are the possible options (we assume that the function
f(x) computes T(X) and g(n) generates a sample of size n):
T(X) is a scalar:
Tvec <- numeric(N) # We create a numerical vector of N elements
# or #
Tvec <- numeric() # We create an empty numerical vector.
for (i in 1:N)
{
x <- g(n)
Tvec[i] <- f(x)
}
T(X) is a vector:
Here we want to store T(X) in a matrix. Each realization can be stored in a row of the matrix or in a
column; the choice is yours. In the following, they are stored in rows (we suppose that T(X) has l
elements).
Tmat <- matrix(nrow=N, ncol=l) # We create an N x l matrix (filled with NA)
for (i in 1:N)
{
x <- g(n)
Tmat[i,] <- f(x)
}
Another option is to stack the T(X)'s on top of each other using rbind:
Tmat <- numeric() # This command just creates Tmat so that it exists
for (i in 1:N)
{
x <- g(n)
Tmat <- rbind(Tmat, f(x))
}
T(X) is a matrix or a more complicated object:
This is not really something we will use in this course. But if you want to know, you have the option of
storing the results in an array (3D matrix) or a list. In the following, I show how to do it with a list:
Tmat <- list() # This command just creates an empty list
for (i in 1:N)
{
x <- g(n)
Tmat[[i]] <- f(x) # [[]] are used to select elements of a list
}
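Once the loop is done, the list can be combined into a single object. A small sketch (assuming every f(x) returns an object of the same dimensions):
Tarr <- simplify2array(Tmat)   # stacks the N results into one array
# or, if each f(x) returns a vector, bind them as rows of a matrix:
# Tmat2 <- do.call(rbind, Tmat)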
Can we avoid loops? (optional)
Unless you program in C, C++ or Fortran, it is recommended to avoid loops when possible. However,
it may require a little more thinking. Suppose we want to analyze the properties of X̄ for n = 40 and
xi ∼ U(−1, 1), and suppose we want N = 1,000. You can create a 40 × 1000 matrix in which each column is a
sample. This can be done using the following code:
X <- matrix(runif(40000, -1, 1), 40, 1000)
Then we can simply compute the mean of each column:
Tvec <- colMeans(X)
Of course, this approach requires more RAM because you need to store a 40 x 1000 matrix. This is certainly
not a problem here, but if you want n=2000 and N=10000, your computer may not have enough RAM. In that
case, you may have to rely on loops. Just to see the difference, you can compare the computational time of
the two approaches (each written in a function for simplicity):
f1 <- function()
{
Tvec <- numeric()
for (i in 1:1000)
Tvec[i] <- mean(runif(40, -1, 1))
}
f2 <- function()
{
Tvec <- colMeans(matrix(runif(40000, -1, 1), 40, 1000))
}
(t1 <- system.time(f1()))
## user system elapsed
## 0.037 0.000 0.036
(t2 <- system.time(f2()))
## user system elapsed
## 0.001 0.000 0.001
It is about 36 times faster without the loop. But who cares if it takes 0.036 seconds instead of 0.001 seconds?
In this particular case, it is not worth the effort of avoiding the loop. Note that the system time computed
above changes each time I recompile the Rmd file. It depends on many factors (e.g. which other programs
are running at the same time). It will also be different for you because you have a different system. If you
want a more precise measure of the relative time, the function benchmark from the rbenchmark package
measures the relative time by repeating each function 100 times (by default):
library(rbenchmark)
benchmark(loop=f1(), matrix=f2())
In the above function, I use colMeans to apply the function mean to each column. It works like the more
general apply function, which is very useful when you want to run a function on each column or each row of a matrix.
The following lines are equivalent:
A <- matrix(rnorm(1000),200,5)
colMeans(A)
## [1] -0.004863901 -0.054034170 -0.074240232 0.027931203 -0.064889274
apply(A,2,mean) # applies mean to the second dimension (the columns)
## [1] -0.004863901 -0.054034170 -0.074240232 0.027931203 -0.064889274
colSums(A)
## [1] -0.9727802 -10.8068340 -14.8480465 5.5862406 -12.9778547
apply(A,2,sum) # applies sum to the second dimension (the columns)
## [1] -0.9727802 -10.8068340 -14.8480465 5.5862406 -12.9778547
You can even write more complicated functions:
apply(A,2,function(x) c(mean(x), sd(x), length(x), min(x),max(x)))
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.004863901 -0.05403417 -0.07424023 0.0279312 -0.06488927
## [2,] 0.934366230 0.95020050 1.00383562 1.0327412 1.08296687
## [3,] 200.000000000 200.00000000 200.00000000 200.0000000 200.00000000
## [4,] -2.671902412 -2.98612968 -2.59447978 -2.4786914 -2.95567114
## [5,] 2.448701870 3.01068154 2.22828940 2.6380261 2.63241659
Can we use parallel programming?
If you have multiple cores on your computer or have access to a supercomputer, it may be beneficial to do
some parallel programming: running more than one estimation simultaneously. There is more than one way
to do it in R; I will show you the easiest one. Unfortunately, it does not work on Windows operating systems.
Consider the following function:
Consider the following function:
f <- function(n)
{
x <- runif(n, -1,1)
mean(x)
}
f(40)
## [1] -0.1271196
f(40)
## [1] 0.09172116
The function generates a random vector of size n from a U(−1, 1) distribution and returns its sample mean. You
can send that function to multiple cores simultaneously using the mclapply function from the parallel
package:
library(parallel)
res <- mclapply(1:1000, function(i) f(40), mc.cores=8) # my laptop has 8 cores
Since res is a list, we can convert it to a vector using the following:
Tvec <- simplify2array(res)
I use this function all the time in my research. When one estimation takes several seconds and I use my
32-core workstation, the simulation time is substantially reduced. Is it worth it in this case? Not really.
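If you are on Windows, a possible workaround (just a sketch; the number of workers is arbitrary) is to create a cluster of worker processes with the same parallel package and use parSapply:
library(parallel)
cl <- makeCluster(2)                   # start 2 worker processes (adjust to your machine)
clusterExport(cl, "f")                 # the workers need their own copy of f
Tvec <- parSapply(cl, 1:1000, function(i) f(40))
stopCluster(cl)                        # always stop the cluster when you are done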
Estimating the bias
The sample standard deviation
For this exercise, we want to estimate the bias of s. We saw that s² is an unbiased estimator of the variance,
but s is biased since it is a nonlinear transformation of s². For this purpose, our DGP is a chi-square
distribution with 8 degrees of freedom (the variance is 16, which implies σ = 4). In the following, I use a
function that returns the estimated bias for a given sample size using 2000 replications. You do not have to use a
function, but it is easier this way since we will try different sample sizes.
f <- function(n)
{
Tvec <- numeric()
for (i in 1:2000)
{
x <- rchisq(n, 8)
Tvec[i] <- sd(x)
}
mean(Tvec)-4 ## estimated bias
}
Then, we can try with n=10:
set.seed(112233) # To make it reproducible
f(10)
## [1] -0.1918991
Therefore, it underestimates the standard deviation. Let's see what happens if we increase the sample size:
n <- seq(10,200,by=5)
bias <- numeric()
for (i in 1:length(n))
bias[i] <- f(n[i])
plot(n, bias, type='l', main="Bias of the standard deviation S")
abline(h=0, lty=2, col="blue")
[Figure: Bias of the standard deviation S, plotted against the sample size n; the dashed blue line marks a bias of 0.]
We can see that the bias seems to converge to 0. In other words, s is asymptotically unbiased. The negative
bias that we observe is a consequence of Jensen's inequality: when f(x) is concave, E(f(x)) ≤ f(E(x)). In
this case, E(√S²) ≤ √E(S²) = σ, with equality only if S² is constant (not random).
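To see Jensen's inequality at work directly, here is a quick sketch (using the same χ² DGP with 8 degrees of freedom and n = 10): we compare the average of √S² across replications with the square root of the average of S².
set.seed(112233)
S2vec <- replicate(2000, var(rchisq(10, 8)))
c(meanOfSqrt = mean(sqrt(S2vec)), sqrtOfMean = sqrt(mean(S2vec)))
# The first number is smaller than the second, which is the source of the negative bias.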
The unbiasedness of S²
We can repeat the exercise for S² (the true variance is 16):
f2 <- function(n)
{
Tvec <- numeric()
for (i in 1:2000)
{
x <- rchisq(n, 8)
Tvec[i] <- var(x)
}
mean(Tvec)-16 ## estimated bias
}
n <- seq(10,200,by=5)
bias <- numeric()
for (i in 1:length(n))
bias[i] <- f2(n[i])
plot(n, bias, type='l', main=expression("Bias of"~S^2))
abline(h=0, lty=2, col="blue")
[Figure: Bias of S², plotted against the sample size n; the dashed blue line marks a bias of 0.]
We can see that we do not get exactly 0. Since we are estimating the bias, the measure is a random variable
with a different realization every time we re-run the simulation. For the simulation itself, the number of
replications N plays the role of the sample size, so the precision of the estimated bias increases with N.
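To see this, here is a small sketch (fN is a hypothetical variant of f2 that takes the number of replications N as a second argument): we re-run the bias estimation several times for a small and a large N and compare how much the estimates vary.
fN <- function(n, N)
{
Tvec <- numeric(N)
for (i in 1:N)
Tvec[i] <- var(rchisq(n, 8))
mean(Tvec)-16 ## estimated bias based on N replications
}
# Spread of the estimated bias across 20 independent simulations:
c(N200 = sd(replicate(20, fN(50, 200))),
N5000 = sd(replicate(20, fN(50, 5000))))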
Efficiency of estimators
To compare the efficiency of estimators, we use the same procedure, but we compute the standard deviation
of the simulated estimates instead. We report the standard deviation rather than the variance because it is
easier to interpret: it has the same units as the random variable. We saw that X̄ is BLUE. It means that the
variance of any other linear unbiased estimator of the form µ̃ = ∑ωᵢxᵢ, with weights summing to one (∑ωᵢ = 1),
is at least as large as the variance of X̄. Let's test that result.
For this simulation, the sample size is set to 30, X ∼ t₅ and N = 5000. We also set ωᵢ to 1/20 for i = 1, ..., 15
and to 1/60 for i = 16, ..., 30 (verify that the weights do sum to 1). In the following, I use the apply function:
set.seed(112233)
w <- c(rep(1/20,15), rep(1/60,15))
X <- matrix(rt(30*5000,5), 30, 5000)
Xbar <- apply(X, 2, mean)
muTilde <- apply(X, 2, function(x) sum(x*w))
c(Xbar=sd(Xbar), muTilde=sd(muTilde))
## Xbar muTilde
## 0.2371467 0.2636798
We can see that the standard deviation of X̄ is smaller than that of µ̃; therefore, X̄ is more efficient. Of
course, this only demonstrates the superiority of X̄ for this particular vector of weights and this particular distribution.
Mean Squared Error (MSE)
When we compare estimators, especially if one or both of them are biased, we have to compare the MSE. In
fact, we usually compute the root mean squared error (RMSE) for the same reason we report the standard
error: it is simply the square root of the MSE. In a simulation, it is computed as
RMSE = √( (1/N) ∑ᵢ (θ̂ᵢ − θ)² ),
where θ̂ᵢ is the i-th simulated realization of the estimator and θ is the population value. We want to compare the following estimators of the variance:
S² = (1/(n − 1)) ∑ᵢ (xᵢ − x̄)²
σ̂² = (1/n) ∑ᵢ (xᵢ − x̄)²
For this simulation, the sample size is set to 20, the DGP is a chi-square distribution with 5 degrees of freedom and N = 5000. The population variance is therefore
10. We first compute the two estimators:
set.seed(112233)
X <- matrix(rchisq(20*5000, 5), 20, 5000)
S2 <- apply(X, 2, var)
sigma2 <- apply(X, 2, function(x) mean((x-mean(x))^2))
Then, we compare the RMSE:
rmse1 <- sqrt(mean((S2-10)^2))
rmse2 <- sqrt(mean((sigma2-10)^2))
c(S2=rmse1, Sigma2=rmse2)
## S2 Sigma2
## 8.752529 8.462327
We can see that σ̂² is better than S² because its RMSE is smaller. This is an example of a biased estimator
that dominates an unbiased one. This is hard to prove analytically because the variance of the estimators
depends on the distribution of X. We know that the variance of σ̂² is smaller, but we do not know how
much smaller it is.
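To see the variance part of the trade-off, we can look at the simulated standard deviations of the two estimators (a quick check using the S2 and sigma2 vectors computed above):
c(S2 = sd(S2), sigma2 = sd(sigma2))
# sigma2 equals (n-1)/n times S2 here, so its standard deviation is smaller by construction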
Consistency of estimators
Proving consistency is harder, because it is a property when n goes to infinity. It is impossible to have such
large sample size in practice. However, we can observe some trends as n increases.
Consistency of X̄
Let's analyze the properties for n = 1, ..., 2000 and N = 2000. We can do it using the cumulative sum function
cumsum:
set.seed(112233)
X <- sapply(1:2000, function(i) rnorm(2000))
Xbar <- apply(X,2,cumsum)
Xbar <- Xbar/(1:2000)
Here, I use two new concepts. The sapply function is like a loop: it goes from 1 to 2000, computes rnorm(2000) each time and
puts the results side by side. It is similar to the previous approach of using matrix, but I wanted to teach you
another way. I also use an element-by-element operation between a matrix and a vector, which is allowed in R.
Since the length of the vector is equal to the number of rows of the matrix, the element-by-element operation
is applied to each column. Therefore, each column contains the 2000 X̄'s for n = 1, ..., 2000, and we have 2000
such columns.
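Here is a tiny sketch of that recycling rule (with a made-up 3 × 2 matrix), just to see which dimension the vector is applied to:
M <- matrix(1:6, 3, 2)   # a 3 x 2 matrix
M/(1:3)                  # each column is divided element by element by c(1, 2, 3)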
First, let’s compare the sample means for a few n’s:
par(mar=c(5.1, 4.1, 4.1, 8.1), xpd=TRUE)
plot(Xbar[20,], main="Illustrating the consistency of the sample mean",
ylab="", xlab="Iterations")
points(Xbar[200,], col=2, pch=21, bg=2)
points(Xbar[1000,], col=3, pch=21, bg=3)
points(Xbar[2000,], col=4, pch=21, bg=4)
legend("topright", c("n=20", "n=200", "n=1000", "n=2000"), pt.bg = 1:4,
col=1:4, pch=c(1,21,21,21), inset=c(-0.22,0), bty='n')
[Figure: Illustrating the consistency of the sample mean; X̄ across the 2000 replications for n = 20, 200, 1000 and 2000.]
The graph shows that the values of X̄ get more and more concentrated around the true mean, which is what we
mean by convergence in probability: the probability of |X̄ − µ| > ε gets smaller and smaller as n increases.
We saw that it is sufficient to prove that the variance goes to zero, so another way to show convergence is to
plot the variance as a function of the sample size:
v <- apply(Xbar, 1, var)
plot(v[-(1:10)], type='l', main="Variance of the sample mean", ylab="variance",
xlab="n") # I remove n=1,...,10 to better see
[Figure: Variance of the sample mean as a function of n.]
We can also estimate the rate of convergence of the variance. If the rate is n^α, or O(n^α), we should have the
relationship V = Cn^α, which implies log(V) = α₀ + α log(n). This can be done using a regression:
lm(log(v)~log(1:2000))
##
## Call:
## lm(formula = log(v) ~ log(1:2000))
##
## Coefficients:
## (Intercept) log(1:2000)
## -0.003359 -0.996205
It shows that the variance converges to 0 at the same rate as n⁻¹. Some students were asking why we always
write √n(X̄ − µ). We do that when we want to prevent the variance from going to 0: the variance of
this expression is nVar(X̄), and since Var(X̄) is O(n⁻¹), nVar(X̄) is O(1), which means that it does not
converge to 0. We do it when we want the asymptotic distribution. The asymptotic distribution of (X̄ − µ) alone
is a "degenerate" distribution because it is equal to 0 with probability 1. We can see what happens using a graph:
v <- apply(Xbar, 1, var)*(1:2000)
plot(v[-(1:10)], type='l', main="Variance of the sample mean times n", ylab="variance",
xlab="n") # I remove n=1,...,10 to better see
[Figure: Variance of the sample mean times n, as a function of n.]
As a last exercise, you can use the definition of convergence in probability and compute the probability of
|X̄ − µ| > ε for a few values of ε:
p1 <- rowMeans(abs(Xbar)>0.1)
p2 <- rowMeans(abs(Xbar)>0.08)
p3 <- rowMeans(abs(Xbar)>0.05)
p4 <- rowMeans(abs(Xbar)>0.03)
matplot(1:2000, cbind(p1,p2,p3,p4), type='l', col=1:4, xlab="n", ylab="Prob.",
main="Probability of the mean being greater than e", lty=1)
legend("topright", c("e=0.1","e=0.08","e=0.05","e=0.03"), col=1:4, lty=1, bty='n')
[Figure: Probability of the mean being greater than e, for e = 0.1, 0.08, 0.05 and 0.03.]
We see that the probability converges to 0 for all four values of ε, which is a necessary condition for convergence
in probability. In a simulation, if you set ε to too small a value, you won't observe the convergence because n will
not be high enough.
Convergence in Distribution
Testing the CLT is a little harder, because we have to compare densities. We could use normality tests to
get a sense of how close the distribution is to the normal, but we won't do it here. I just want to present a
few visualization methods.
Distribution of the sample mean
We know that X̄ is N(µ, σ²/n) when the sample comes from a normally distributed population with mean
µ and variance σ². We can verify this by comparing the densities. In the following, µ = 5, σ² = 4, n = 20 and
N = 5000.
X <- sapply(1:5000, function(i) rnorm(20, 5, 2))
Xbar <- apply(X,2,mean)
hist(Xbar, freq=FALSE) # We want the density, not the frequency
curve(dnorm(x, 5, 2/sqrt(20)), 3, 7, add=TRUE, col=2, lty=2, lwd=2)
[Figure: Histogram of Xbar with the N(5, 4/20) density overlaid.]
The histogram does look normal. We can also use a smoothed density:
plot(density(Xbar), main="Density of Xbar")
curve(dnorm(x, 5, 2/sqrt(20)), 3, 7, add=TRUE, col=2, lty=2, lwd=2)
[Figure: Smoothed density of Xbar (N = 5000, bandwidth = 0.07412) with the N(5, 4/20) density overlaid.]
Or we can use a QQ-plot, which compares the quantiles of X̄ with the quantiles of the normal distribution.
If it is normal, the points follow the straight line:
qqnorm(Xbar)
qqline(Xbar)
[Figure: Normal Q-Q plot of Xbar.]
You can also compare some quantiles of the normal with the empirical quantiles of X̄:
q <- c(.01,.05,.1,.99,.95,.99)
rbind(quantile(Xbar, q), qnorm(q, 5, 2/sqrt(20)))
## 1% 5% 10% 99% 95% 99%
## [1,] 3.962516 4.262441 4.428501 6.057918 5.746840 6.057918
## [2,] 3.959626 4.264399 4.426873 6.040374 5.735601 6.040374
We can see how close they are to each other.
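If you prefer a formal check to a visual one (the normality tests mentioned at the beginning of this section), here is a quick sketch using the Shapiro-Wilk test on the simulated X̄'s:
shapiro.test(Xbar)   # a large p-value is consistent with normality
# Note: shapiro.test accepts at most 5000 observations, which is exactly our N.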
For non-normal observations, we need to rely on the asymptotic distribution. In that case, we use the CLT,
which states that √n(X̄ − µ) → N(0, σ²). It implies that X̄ ≈ N(µ, σ²/n). There is a more useful version of
this result. Since σ̂² converges in probability to σ²,
√n(X̄ − µ)/σ̂ → (1/σ) N(0, σ²) = N(0, 1).
This follows from Slutsky's theorem: if cₙ converges in probability to c and xₙ converges in distribution to ξ, then cₙxₙ converges in
distribution to cξ. We use this result to justify the approximation
√n(X̄ − µ)/σ̂ ≈ N(0, 1).
How good is the approximation? Let’s try with n=20 and xi = 1 with probability 0.5 and xi = 0 with
probability 0.5:
X <- sapply(1:5000, function(i) rbinom(20, size=1, prob=0.5))
z <- apply(X, 2, function(x) sqrt(20)*(mean(x)-0.5)/sd(x))
q <- c(.01,.05,.1,.99,.95,.99)
rbind(quantile(z, q), qnorm(q, 0, 1))
## 1% 5% 10% 99% 95% 99%
## [1,] -2.516611 -1.902379 -1.370810 2.516611 1.902379 2.516611
## [2,] -2.326348 -1.644854 -1.281552 2.326348 1.644854 2.326348
We can see that the quantiles are different when n=20. If we try with n=100 we get the following:
X2 <- sapply(1:5000, function(i) rbinom(100, size=1, prob=0.5))
z2 <- apply(X2, 2, function(x) sqrt(100)*(mean(x)-0.5)/sd(x))
q <- c(.01,.05,.1,.99,.95,.99)
rbind(quantile(z2, q), qnorm(q, 0, 1))
## 1% 5% 10% 99% 95% 99%
## [1,] -2.459865 -1.612757 -1.202676 2.459865 1.612757 2.459865
## [2,] -2.326348 -1.644854 -1.281552 2.326348 1.644854 2.326348
The quantiles get a little closer to those of the N(0,1) when the sample size increases. In fact, the approximation is not bad at all even
with such a small sample. Let's look at the QQ-plot:
qqnorm(z2)
qqline(z2)
[Figure: Normal Q-Q plot of the standardized sample mean z2 (n = 100).]
In the graph, we see that the quantiles deviate from the normal distribution in the tails. This is important,
because when we perform hypothesis tests, we are mostly interested in the tails.