Computer Lab Week 11: solutions
STAT221
Bootstrap
Bootstrapping Quantiles
One of the major benefits of bootstrapping is that it does not require assumptions about the form of the population distribution, or any statistical theory for the distribution of the statistic. This makes it useful when the statistic has an unknown or complicated sampling distribution. Here we will explore the sampling distributions of various quantiles, for which there is no simple general result for the finite-sample sampling distribution.
1. Simulate n = 200 observations from a standard normal distribution. Use the quantile function to estimate the 10%, 25%, 50%, 75% and 90% quantiles of the distribution from this sample. These are often referred to as the empirical quantiles. Also calculate the theoretical quantiles of the standard normal (for example, using qnorm).
2. Use bootstrapping to estimate a 90% confidence interval for each of these quantiles.
3. Produce a density histogram of each set of bootstrapped quantiles. Notice that the bootstrap distribution of the sample median is close to symmetric, while those of the other quantiles are less so; in fact, the further you go into the tails, the less symmetric the sampling distribution becomes. You may want to try some really extreme quantiles, e.g. 5% and 95%. Add vertical lines to represent the empirical and theoretical quantiles, along with the 90% confidence interval. Think about why the median might be more symmetric than the quantiles further out in the tails.
4. Repeat the above for n = 200 observations from a standard exponential distribution, again focussing on the behaviour of the sampling distributions of the 10% and 90% quantiles.
## 1)
n = 200                          # the exercise specifies n = 200
x = rnorm(n)                     # random draw: results differ from run to run
p = c(0.1, 0.25, 0.5, 0.75, 0.9)
(equant = quantile(x, probs=p))
##        10%        25%        50%        75%        90%
## -1.4463648 -0.7612589 -0.0930005  0.7521282  1.4182357
(tquant = qnorm(p))
## [1] -1.2815516 -0.6744898 0.0000000 0.6744898 1.2815516
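The solution prints the empirical and theoretical quantiles separately. As a small sketch (not part of the original solution), they can be laid side by side together with their differences. The seed 221 is an arbitrary assumption added only to make the draw reproducible, so the numbers will not match the output shown above.

```r
# Sketch: compare empirical and theoretical N(0,1) quantiles side by side.
# set.seed(221) is an arbitrary choice, used only for reproducibility.
set.seed(221)
n = 200
x = rnorm(n)
p = c(0.1, 0.25, 0.5, 0.75, 0.9)

equant = quantile(x, probs = p)   # empirical (sample) quantiles
tquant = qnorm(p)                 # theoretical standard normal quantiles

# One row per source; column names ("10%", ...) come from equant
comparison = rbind(empirical   = equant,
                   theoretical = tquant,
                   difference  = equant - tquant)
round(comparison, 3)
```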
## 2)
N = 10000
P10 = P25 = P50 = P75 = P90 = numeric(N)   # storage for each bootstrap quantile
for (i in 1:N){
  xstar = sample(x, replace = TRUE)        # resample the data with replacement
  quants = quantile(xstar, probs = p)      # five quantiles of the resample
  P10[i] = quants[1]
  P25[i] = quants[2]
  P50[i] = quants[3]
  P75[i] = quants[4]
  P90[i] = quants[5]
}
# 90% percentile intervals: the 5% and 95% points of each bootstrap distribution
P10.ci = quantile(P10, probs = c(0.05, 0.95))
P25.ci = quantile(P25, probs = c(0.05, 0.95))
P50.ci = quantile(P50, probs = c(0.05, 0.95))
P75.ci = quantile(P75, probs = c(0.05, 0.95))
P90.ci = quantile(P90, probs = c(0.05, 0.95))
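The loop above can also be written without explicit storage vectors. The following is a sketch of one vectorised alternative, assuming the same set-up as part 1 (n = 200 standard normal draws, N = 10000 resamples); the seed and the object names boot and ci are illustrative choices, not from the original solution.

```r
# Sketch: vectorised version of the bootstrap in part 2.
set.seed(221)   # arbitrary seed, for reproducibility only
n = 200
x = rnorm(n)
p = c(0.1, 0.25, 0.5, 0.75, 0.9)
N = 10000

# replicate() simplifies to a 5 x N matrix: one row per quantile level
# (row names "10%", ..., "90%"), one column per bootstrap resample.
boot = replicate(N, quantile(sample(x, replace = TRUE), probs = p))

# 90% percentile interval for each quantile: the 5% and 95% points of
# each row's bootstrap distribution, returned as a 5 x 2 matrix.
ci = t(apply(boot, 1, quantile, probs = c(0.05, 0.95)))
ci
```

Row 1 of boot plays the role of P10 in the loop version, row 2 of P25, and so on, so boot[1, ] can be passed straight to hist().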
## 3)
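# Red line: empirical quantile; blue line: theoretical quantile;
# black lines: 90% bootstrap percentile interval
hist(P10, freq=FALSE)
abline(v = c(equant[1], tquant[1]), col=c("red","blue"))
abline(v = P10.ci)
[Figure: Histogram of P10 (x-axis: P10; y-axis: Density)]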
hist(P25, freq=FALSE)
abline(v = c(equant[2], tquant[2]), col=c("red","blue"))
abline(v = P25.ci)
[Figure: Histogram of P25 (x-axis: P25; y-axis: Density)]
hist(P50, freq=FALSE)
abline(v = c(equant[3], tquant[3]), col=c("red","blue"))
abline(v = P50.ci)
[Figure: Histogram of P50 (x-axis: P50; y-axis: Density)]
hist(P75, freq=FALSE)
abline(v = c(equant[4], tquant[4]), col=c("red","blue"))
abline(v = P75.ci)
[Figure: Histogram of P75 (x-axis: P75; y-axis: Density)]
hist(P90, freq=FALSE)
abline(v = c(equant[5], tquant[5]), col=c("red","blue"))
abline(v = P90.ci)
[Figure: Histogram of P90 (x-axis: P90; y-axis: Density)]
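The exercise asks you to notice how symmetric each histogram is and to try more extreme quantiles. One way to put a number on this is a skewness coefficient for each bootstrap distribution. The sketch below is an illustration, not part of the original solution: skew() is a hypothetical helper computing the simple moment coefficient of skewness (values near 0 suggest rough symmetry), and the seed is arbitrary. Because bootstrap quantiles can only take a limited set of values determined by the sample, treat the coefficients as a rough guide.

```r
# Sketch: quantify the (a)symmetry of each bootstrap distribution,
# adding the more extreme 5% and 95% quantiles.
set.seed(221)   # arbitrary seed, for reproducibility only
n = 200
x = rnorm(n)
p = c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)
N = 10000

# Hypothetical helper: moment coefficient of skewness
skew = function(y) mean((y - mean(y))^3) / mean((y - mean(y))^2)^(3/2)

# One row per quantile level, one column per bootstrap resample
boot = replicate(N, quantile(sample(x, replace = TRUE), probs = p))

# Skewness of each bootstrap distribution (near 0 = roughly symmetric)
round(apply(boot, 1, skew), 2)

# Density histograms of the two most extreme quantiles, with the
# theoretical N(0,1) quantile marked in blue
op = par(mfrow = c(1, 2))
hist(boot["5%", ], freq = FALSE, main = "Bootstrap 5% quantile", xlab = "P5")
abline(v = qnorm(0.05), col = "blue")
hist(boot["95%", ], freq = FALSE, main = "Bootstrap 95% quantile", xlab = "P95")
abline(v = qnorm(0.95), col = "blue")
par(op)
```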
## 4)
## same as parts 1 to 3, with xe = rexp(n) in place of x and qexp in place of qnorm
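The solution for part 4 is given only as a one-line note. Written out, it might look like the sketch below. It assumes "standard exponential" means rate 1, so the theoretical quantiles are qexp(p) = -log(1 - p), and it uses a vectorised resampling step in place of the part 2 loop; the seed and object names are illustrative choices.

```r
# Sketch of part 4: bootstrap the 10% and 90% quantiles of n = 200
# draws from a standard exponential distribution (rate = 1).
set.seed(221)   # arbitrary seed, for reproducibility only
n = 200
xe = rexp(n)
p = c(0.1, 0.9)
N = 10000

equant = quantile(xe, probs = p)   # empirical quantiles
tquant = qexp(p)                   # theoretical quantiles: -log(1 - p)

# 2 x N matrix of bootstrap quantiles, then 90% percentile intervals
boot = replicate(N, quantile(sample(xe, replace = TRUE), probs = p))
ci = t(apply(boot, 1, quantile, probs = c(0.05, 0.95)))

op = par(mfrow = c(1, 2))
for (j in 1:2) {
  hist(boot[j, ], freq = FALSE,
       main = paste("Bootstrap", names(equant)[j], "quantile"),
       xlab = names(equant)[j])
  abline(v = c(equant[j], tquant[j]), col = c("red", "blue"))  # empirical, theoretical
  abline(v = ci[j, ], lty = 2)                                 # 90% percentile interval
}
par(op)
```

Comparing the two panels with the normal case is the point of the exercise: the exponential is itself skewed, so the 10% and 90% quantiles sit in regions of very different density.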