Resampling Methods
Chapter 5
March 11, 2020
5.1 Validation set approach
5.2 Cross-validation
5.3 Bootstrap
About Resampling
• An important statistical tool.
• Treat the data as if it were the population and repeatedly draw samples from it.
• Main task: assess the validity/accuracy of statistical methods and models.
• Cross-validation: estimate the test error of models.
• Bootstrap: quantify the uncertainty of estimators.
Validation and Cross validation
• Validation set approach.
• LOOCV (leave-one-out cross-validation).
• K-fold cross-validation.
Bootstrap
• Sampling with replacement, typically n times, where n is the sample size of the data.
• Especially useful in statistical inference to quantify the uncertainty of estimates (it can be even more accurate than the normal approximation).
• An all-purpose resampling procedure.
• Used in ensemble methods of machine learning, for example,
• bagging,
• random forests.
Training error is not sufficient
• The training error is easily computed from the training data.
• Because of the possibility of over-fitting, it cannot be used to properly assess the test error.
• It is possible to “estimate” the test error, for example, by making adjustments to the training error.
• The adjusted R-squared, AIC, BIC, etc. serve this purpose.
• These methods rely on certain assumptions and are not general purpose.
Test error: cross-validation
• The test error would also be easy to compute if test data were well designated.
• Normally we are just given … data.
• We have to create “test data” for the purpose of computing the test error.
• Artificially separating the data into “training data” and “test data” for validation purposes is called cross-validation.
• The “test data” here should more accurately be called validation data or hold-out data, meaning that they are not used in training.
• Model fitting only uses the training data.
Ideal scenario for performance assessment
• In a “data-rich” scenario, we can afford to separate the data into three parts:
• training data: used to train various models.
• validation data: used to assess the models and identify the best one.
• test data: used to test the results of the best model.
• Usually, people also refer to the validation data (hold-out data) as test data.
Validation set approach
Figure: 5.1. A schematic display of the validation set approach. A set of n observations are randomly split into a training set (shown in blue, containing observations 7, 22, and 13, among others) and a validation set (shown in beige, and containing observation 91, among others). The statistical learning method is fit on the training set, and its performance is evaluated on the validation set.
Example: Auto Data
• There is a non-linear relationship between mpg and horsepower.
• mpg ∼ horsepower + horsepower^2 fits better than mpg ∼ horsepower.
• Should we add higher-order terms to the model, such as cubic or even higher?
• One can check the p-values of the regression coefficients to answer this question (see the sketch below).
• This is in fact a model selection problem, and we can use the validation set approach.
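A minimal sketch of the p-value check, assuming the ISLR package and its Auto data set are available:

library(ISLR)
fit3 <- lm(mpg ~ poly(horsepower, 3), data = Auto)   # cubic polynomial fit
summary(fit3)$coefficients                           # p-values for the linear, quadratic, cubic terms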
Example: Auto Data
• Randomly split the 392 observations into two sets:
• a training set containing 196 of the data points, and a validation set containing the remaining 196 observations.
• Fit various regression models on the training sample.
• The validation set error rates result from evaluating their performance on the validation sample.
• Here we use MSE as the measure of validation set error; the results are shown in the left-hand panel of Figure 5.2 (a code sketch of the split follows).
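A minimal sketch of one such random split and the resulting validation MSEs (assuming the ISLR Auto data; variable names are illustrative):

library(ISLR)
set.seed(1)
train <- sample(1:nrow(Auto), 196)                    # indices of the training half
val_mse <- sapply(1:10, function(d) {
  fit <- lm(mpg ~ poly(horsepower, d), data = Auto, subset = train)
  mean((Auto$mpg - predict(fit, Auto))[-train]^2)     # MSE on the validation half
})
val_mse                                               # validation MSE for degrees 1 to 10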
Validation set approach
Figure: 5.2. The validation set approach was used on the Auto data set in order to estimate the test error that results from predicting mpg using polynomial functions of horsepower. Left: Validation error estimates for a single split into training and validation data sets. Right: The validation method was repeated ten times, each time using a different random split of the observations into a training set and a validation set. This illustrates the variability in the estimated test MSE that results from this approach.
Example: Auto Data
• The validation set MSE for the quadratic fit is considerably smaller than for the linear fit.
• The validation set MSE for the cubic fit is actually slightly larger than for the quadratic fit.
• This implies that including a cubic term in the regression does NOT lead to better prediction than simply using a quadratic term.
• If we repeat the process of randomly splitting the sample into two parts, we will get a somewhat different estimate of the test MSE.
Example: Auto Data
• The model with a quadratic term has a dramatically smaller validation set MSE than the model with only a linear term.
• There is not much benefit in including cubic or higher-order polynomial terms in the model.
• Each of the ten curves results in a different test MSE estimate for each of the ten regression models considered.
• No consensus among the curves as to which model results in the smallest validation set MSE.
• Based on the variability among these curves, all that we can conclude with any confidence is that the linear fit is not adequate for this data.
• The validation set approach is conceptually simple and is easy to implement.
A summary
• The validation estimate of the test error rate can be highly variable, depending on the random split.
• Only a subset of the observations (the training set) is used to fit the model.
• Statistical methods tend to perform worse when trained on fewer observations.
• The validation set error rate may therefore tend to overestimate the test error rate for the model fit on the entire data set.
Cross validation: overcome the drawback of validation set approach
• Our ultimate goal is to produce the model with the best prediction accuracy.
• The validation set approach has the drawback of using ONLY the training data to fit the model.
• The validation data do not participate in model building, only in model assessment.
• This is a “waste” of data.
• We want more data to participate in model building.
Another drawback of validation set approach
• It may over-estimate the test error of the model fit with all the data.
• Statistical methods tend to perform worse when trained on fewer observations.
• The validation set error rate may therefore tend to overestimate the test error rate for the model fit on the entire data set.
• Cross-validation overcomes these drawbacks by effectively using EVERY data point in model building!
The leave-one-out cross-validation
• Suppose the data contain n data points.
• First, pick data point 1 as the validation set and the rest as the training set. Fit the model on the training set and evaluate the test error on the validation set; denote it $\mathrm{MSE}_1$.
• Second, pick data point 2 as the validation set and the rest as the training set. Fit the model on the training set and evaluate the test error on the validation set; denote it $\mathrm{MSE}_2$.
• ….. (repeat the procedure for every data point.)
• Obtain an estimate of the test error by averaging the $\mathrm{MSE}_i$, $i = 1, \dots, n$ (a code sketch follows):
$$\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{MSE}_i.$$
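A minimal sketch of this LOOCV loop for the quadratic fit on the Auto data (ISLR package assumed; names are illustrative):

library(ISLR)
n <- nrow(Auto)
mse_i <- numeric(n)
for (i in 1:n) {
  fit <- lm(mpg ~ poly(horsepower, 2), data = Auto[-i, ])    # fit without observation i
  pred <- predict(fit, newdata = Auto[i, , drop = FALSE])    # predict the held-out point
  mse_i[i] <- (Auto$mpg[i] - pred)^2
}
cv_n <- mean(mse_i)                                          # CV(n): average of the n squared errors
cv_n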
LOOCV
Figure: 5.3. A schematic display of LOOCV. A set of n data points is repeatedly split into a training set (shown in blue) containing all but one observation, and a validation set that contains only that observation (shown in beige). The test error is then estimated by averaging the n resulting MSE’s. The first training set contains all but observation 1, the second training set contains all but observation 2, and so forth.
Advantages of LOOCV
• Far less bias, since the training data size (n − 1) is close to the entire data size (n).
• A single test error estimate (thanks to the averaging), without the variability of the validation set approach.
• A disadvantage: it could be computationally expensive, since the model needs to be fit n times.
• The $\mathrm{MSE}_i$ may be highly correlated with one another.
LOOCV applied to Auto data (left panel: LOOCV; right panel: 10-fold CV):
Figure: 5.4. Cross-validation was used on the Auto data set in order to estimate the test error that results from predicting mpg using polynomial functions of horsepower. Left: The LOOCV error curve. Right: 10-fold CV was run nine separate times, each with a different random split of the data into ten parts. The figure shows the nine slightly different CV error curves.
Complexity of LOOCV in the linear model?
• Consider the linear model
$$y_i = x_i^T\beta + \varepsilon_i, \quad i = 1, \dots, n,$$
with fitted values $\hat{y}_i = x_i^T\hat{\beta}$, where $\hat{\beta}$ is the least squares estimate of $\beta$ based on all the data $(x_i, y_i)$, $i = 1, \dots, n$.
• Using LOOCV,
$$\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(i)}\bigr)^2,$$
where $\hat{y}_{(i)} = x_i^T\hat{\beta}_{(i)}$ is the prediction of $y_i$ from the linear model fitted on all the data except $(x_i, y_i)$ (delete one), i.e., $\hat{\beta}_{(i)}$ is the least squares estimate of $\beta$ based on all the data but $(x_i, y_i)$.
Simple formula for LOOCV in the linear model
• It looks complicated to compute the least squares estimate n times.
• But there is an easy formula (verified numerically below):
$$\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2,$$
where $\hat{y}_i$ is the fitted value from the least squares fit based on all the data, and $h_i$ is the leverage.
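A minimal numerical check of this shortcut (same quadratic Auto fit; it should reproduce the value obtained by refitting n times in the loop above):

library(ISLR)
fit <- lm(mpg ~ poly(horsepower, 2), data = Auto)
h <- hatvalues(fit)                                   # leverages h_i from the single full fit
cv_shortcut <- mean((residuals(fit) / (1 - h))^2)
cv_shortcut                                           # matches the LOOCV estimate computed above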
Recall: Leverage
• Recall the hat matrix
$$H = X(X^TX)^{-1}X^T, \quad \text{so that} \quad \hat{y} = Hy.$$
Let $h_{ij} = x_i^T(X^TX)^{-1}x_j$ be the $(i, j)$ element of $H$.
• The leverage of the $i$-th observation is just the $i$-th diagonal element of $H$, denoted $h_{ii}$ (the $h_i$ above).
• A high leverage implies that the observation is quite influential. Note that the average of the $h_{ii}$ is $(p + 1)/n$.
• E.g., an $h_{ii}$ greater than $2(p + 1)/n$, twice the average, is generally considered large (a small check follows).
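A small sketch of this rule of thumb, continuing the quadratic Auto fit above (the threshold 2(p+1)/n is the one stated in the slide):

p <- 2                                   # two polynomial terms (plus the intercept)
mean(h)                                  # equals (p + 1)/n
which(h > 2 * (p + 1) / length(h))       # observations flagged as high leverage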
Simplicity of LOOCV in the linear model
• One fit (with all the data) does it all!
• The prediction error (in terms of MSE) is just a weighted average of the least squares residuals.
• High leverage points get more weight in the prediction error estimate.
Fast computation of cross-validation I
• The leave-one-out cross-validation statistic is given by
$$\mathrm{CV} = \frac{1}{N}\sum_{i=1}^{N} e_{[i]}^2,$$
where $e_{[i]} = y_i - \hat{y}_{[i]}$, the observations are given by $y_1, \dots, y_N$, and $\hat{y}_{[i]}$ is the predicted value obtained when the model is estimated with the $i$-th case deleted.
• Suppose we have a linear regression model $Y = X\beta + e$. Then $\hat{\beta} = (X^TX)^{-1}X^TY$ and $H = X(X^TX)^{-1}X^T$ is the hat matrix. It has this name because it is used to compute $\hat{Y} = X\hat{\beta} = HY$. If the diagonal values of $H$ are denoted by $h_1, \dots, h_N$, then the leave-one-out cross-validation statistic can be computed using
$$\mathrm{CV} = \frac{1}{N}\sum_{i=1}^{N}\bigl[e_i/(1 - h_i)\bigr]^2,$$
where $e_i = y_i - \hat{y}_i$ and $\hat{y}_i$ is the predicted value obtained when the model is estimated with all the data included.
Fast computation of cross-validation II
Proof
• Let $X_{[i]}$ and $Y_{[i]}$ be similar to $X$ and $Y$ but with the $i$-th row deleted in each case. Let $x_i^T$ be the $i$-th row of $X$ and let
$$\hat{\beta}_{[i]} = (X_{[i]}^T X_{[i]})^{-1} X_{[i]}^T Y_{[i]}$$
be the estimate of $\beta$ without the $i$-th case. Then $e_{[i]} = y_i - x_i^T\hat{\beta}_{[i]}$.
• Now $X_{[i]}^T X_{[i]} = X^TX - x_i x_i^T$ and $x_i^T(X^TX)^{-1}x_i = h_i$. So by the Sherman–Morrison–Woodbury formula,
$$(X_{[i]}^T X_{[i]})^{-1} = (X^TX)^{-1} + \frac{(X^TX)^{-1}x_i x_i^T (X^TX)^{-1}}{1 - h_i}.$$
Fast computation of cross-validation III
Proof
• Also note that $X_{[i]}^T Y_{[i]} = X^TY - x_i y_i$. Therefore
$$\hat{\beta}_{[i]} = \left[(X^TX)^{-1} + \frac{(X^TX)^{-1}x_i x_i^T (X^TX)^{-1}}{1 - h_i}\right](X^TY - x_i y_i)$$
$$= \hat{\beta} - \frac{(X^TX)^{-1}x_i}{1 - h_i}\bigl[y_i(1 - h_i) - x_i^T\hat{\beta} + h_i y_i\bigr] = \hat{\beta} - (X^TX)^{-1}x_i\, e_i/(1 - h_i).$$
• Thus (a numerical check follows)
$$e_{[i]} = y_i - x_i^T\hat{\beta}_{[i]} = y_i - x_i^T\hat{\beta} + h_i e_i/(1 - h_i) = e_i + h_i e_i/(1 - h_i) = e_i/(1 - h_i).$$
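A minimal numerical check of the identity e_[i] = e_i/(1 − h_i) for a single observation, using R's built-in mtcars data purely as an illustration:

fit_full <- lm(mpg ~ wt + hp, data = mtcars)                       # full-data fit
i <- 5
fit_del <- lm(mpg ~ wt + hp, data = mtcars[-i, ])                  # fit with observation i deleted
e_loo <- mtcars$mpg[i] - predict(fit_del, mtcars[i, ])             # e_[i]
e_short <- residuals(fit_full)[i] / (1 - hatvalues(fit_full)[i])   # e_i / (1 - h_i)
c(e_loo, e_short)                                                  # the two numbers agree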
K-fold cross validation
• Divide the data into K subsets, usually of equal or similar sizes (about n/K each).
• Treat one subset as the validation set and the rest together as the training set. Fit the model on the training set and compute the test error estimate on the validation set, denoted $\mathrm{MSE}_i$.
• Repeat the procedure over every subset.
• Average the above K estimates of the test error to obtain (a code sketch follows)
$$\mathrm{CV}_{(K)} = \frac{1}{K}\sum_{i=1}^{K}\mathrm{MSE}_i.$$
• LOOCV is a special case of K-fold cross validation, namely n-fold cross validation.
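A minimal sketch of 10-fold CV over polynomial degrees on the Auto data, using cv.glm() from the boot package (ISLR and boot packages assumed):

library(ISLR)
library(boot)
set.seed(2)
cv_err <- sapply(1:10, function(d) {
  fit <- glm(mpg ~ poly(horsepower, d), data = Auto)   # gaussian glm = least squares fit
  cv.glm(Auto, fit, K = 10)$delta[1]                   # 10-fold CV estimate of the test MSE
})
cv_err                                                 # CV error for degrees 1 to 10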
K-fold cross validation
Figure: 5.5. A schematic display of 5-fold CV. A set of n observations is randomly split into five non-overlapping groups. Each of these fifths acts as a validation set (shown in beige), and the remainder as a training set (shown in blue). The test error is estimated by averaging the five resulting MSE estimates.
K-fold cross validation
• Common choices of K: K = 5 or K = 10.
• Advantages over LOOCV: 1. computationally lighter, especially for complex models with large data; 2. likely less variance (to be addressed later).
• Advantage over the validation set approach: less variability resulting from the data split, thanks to the averaging.
Figure: 5.6. True and estimated test MSE for the simulated data sets in Figures 2.9 (left), 2.10 ( center), and 2.11 (right). The true test MSE is shown in blue, the LOOCV estimate is shown as a black dashed line, and the 10-fold CV estimate is shown in orange. The crosses indicate the minimum of each of the MSE curves.
Figure 2.9
Figure: 2.9. Left: Data simulated from f, shown in black. Three estimates of f are shown: the linear regression line (orange curve), and two smoothing spline fits (blue and green curves). Right: Training MSE (grey curve), test MSE (red curve), and minimum possible test MSE over all methods (dashed line). Squares represent the training and test MSEs for the three fits shown in the left-hand panel.
Figure 2.10
Figure: 2.10. Details are as in Figure 2.9, using a different true f that is much closer to linear. In this setting, linear regression provides a very good fit to the data.
Figure 2.11
Figure: 2.11. Details are as in Figure 2.9, using a different f that is far from linear. In this setting, linear regression provides a very poor fit to the data.
Special interest in the complexity parameter at the minimum test error
• Consider a family of models indexed by a parameter, usually representing the flexibility or complexity of the models.
• Such a parameter is often called a tuning parameter; it could even be the number of variables.
• Example: the order of the polynomial in horsepower in the Auto data example (a CV-based selection sketch follows this list).
• Example: the penalization parameter in ridge, lasso, etc. (to be addressed in the next chapter).
• We intend to find the best model within this family, i.e., to find the value of this tuning parameter.
• We care less about the actual value of the test error.
• In the above simulated data, all of the CV curves come close to identifying the correct level of flexibility.
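Continuing the Auto example, a minimal sketch of tuning the polynomial degree by 10-fold CV (reusing the cv_err vector from the K-fold sketch above):

best_degree <- which.min(cv_err)        # degree with the smallest 10-fold CV error
best_degree
final_fit <- lm(mpg ~ poly(horsepower, best_degree), data = Auto)   # refit on the full data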
Bias variance trade-off
• In terms of bias in estimating the test error: the validation set approach has more bias, due to the smaller training set; LOOCV is nearly unbiased; K-fold CV (e.g., K = 5 or 10) has intermediate bias.
• In view of bias, LOOCV is most preferred, with K-fold cross validation next.
• But K-fold cross validation has smaller variance than LOOCV.
• The n training sets in LOOCV are too similar to each other. As a result, the trained models are too positively correlated.
• The K training sets of K-fold cross validation are much less similar to each other.
• As a result, K-fold cross validation generally has less variance than LOOCV.
Figure: ESL Fig. 7.14. Conditional prediction error Err_T, 10-fold cross-validation, and leave-one-out cross-validation curves.
Cross validation for classification
• MSE is a popular criterion for measuring prediction/estimation accuracy in regression.
• There are other criteria.
• For classification with a qualitative response, a natural choice is: 1 for an incorrect classification and 0 for a correct one.
• For LOOCV, this leads to $\mathrm{Err}_i = I(y_i \neq \hat{y}_{(i)})$, where $\hat{y}_{(i)}$ is the predicted class of the $i$-th observation based on the model fitted without the $i$-th observation.
• Then (a code sketch follows)
$$\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{Err}_i,$$
which is just the proportion of incorrect classifications.
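A minimal sketch of this LOOCV error rate for a logistic regression classifier; dat is an illustrative data frame with a binary response y (0/1) and predictors x1, x2 (all names are assumptions):

n <- nrow(dat)
err <- numeric(n)
for (i in 1:n) {
  fit <- glm(y ~ x1 + x2, data = dat[-i, ], family = binomial)   # fit without observation i
  p_i <- predict(fit, newdata = dat[i, ], type = "response")     # predicted probability
  err[i] <- as.numeric((p_i > 0.5) != dat$y[i])                  # Err_i: 1 if misclassified
}
mean(err)                                                        # CV(n): LOOCV error rate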
Example
[Scatter plot of the simulated two-class data over predictors X1 and X2; see the Figure 2.13 caption below.]
Figure 2.13. A simulated data set consisting of 100 observations in each of two groups, indicated in blue and in orange. The purple dashed line represents the Bayes decision boundary. The orange background grid indicates the region in which a test observation will be assigned to the orange class, and the blue background grid indicates the region in which a test observation will be assigned to the blue class.
[Four panels, Degree=1, Degree=2, Degree=3, and Degree=4, showing the same two-class data with the fitted logistic regression decision boundaries; see the Figure 5.7 caption below.]
FIGURE 5.7. Logistic regression fits on the two-dimensional classification data displayed in Figure 2.13. The Bayes decision boundary is represented using a purple dashed line. Estimated decision boundaries from linear, quadratic, cubic and quartic (degrees 1 to 4) logistic regressions are displayed in black. The (TRUE) test error rates for the four logistic regression fits are respectively 0.201, 0.197, 0.160, and 0.162, while the Bayes error rate is 0.133.
Remark about the simulated example.
• The previous example is simulated.
• The true population distribution is known.
• The figures 0.201, 0.197, 0.160, 0.162, and 0.133 (the Bayes error rate) are true test errors, computed from the true population distribution.
• In practice the true population distribution is unknown, so the true test error cannot be computed.
• We use cross validation to solve this problem.
Figure: 5.8. Test error (brown), training error (blue), and 10-fold CV error (black) on the two-dimensional classification data displayed in Figure 5.7. Left: Logistic regression using polynomial functions of the predictors. The order of the polynomials used is displayed on the x-axis. Right: The KNN classifier with different values of K, the number of neighbors used in the KNN classifier.
• The training error generally declines as model complexity increases.
• Sometimes it even reaches 0.
• The test error generally declines first and then increases.
• 10-fold cross validation provides a reasonable estimate of the test error, with slight under-estimation here.
Bootstrap as a resampling procedure
• Suppose we have data $x_1, \dots, x_n$, representing the ages of n randomly selected people in HK.
• Use the sample mean $\bar{x}$ to estimate the population mean $\mu$, the average age of all residents of HK.
• How do we assess the estimation error $\bar{x} - \mu$? Usually by a t-confidence interval or a test of hypothesis.
• These rely on the normality assumption or the central limit theorem.
• Is there another reliable way?
• Just bootstrap:
Bootstrap as a resampling procedure
• Take a random sample of size n (with replacement) from $x_1, \dots, x_n$.
• Calculate the sample mean of this “re-sample”, denoted $\bar{x}^*$.
• Repeat the above a large number M of times to obtain $\bar{x}^*_1, \bar{x}^*_2, \dots, \bar{x}^*_M$.
• Use the distribution of $\bar{x}^*_1 - \bar{x}, \dots, \bar{x}^*_M - \bar{x}$ to approximate that of $\bar{x} - \mu$ (a code sketch follows).
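A minimal sketch of this bootstrap for the sample mean (x is an illustrative vector of ages; the numbers are made up for the example):

set.seed(1)
x <- c(23, 35, 41, 52, 28, 67, 45, 31, 59, 38)       # illustrative ages
M <- 1000
xbar <- mean(x)
xbar_star <- replicate(M, mean(sample(x, length(x), replace = TRUE)))   # M resample means
quantile(xbar_star - xbar, c(0.025, 0.975))           # spread of xbar* - xbar, approximating xbar - mu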
• Essential idea: treat the data distribution (more formally called the empirical distribution) as a proxy for the population distribution.
• Mimic the data generation from the true population by resampling from the empirical distribution.
• Mimic your statistical procedure (such as computing an estimate $\bar{x}$) on the data by doing the same on the resampled data.
• Evaluate your statistical procedure (which may be difficult because it involves randomness and the unknown population distribution) by evaluating the analogous procedure on the re-samples.
Example
• X and Y are two random variables. The minimizer of $\mathrm{var}(\alpha X + (1 - \alpha)Y)$ over $\alpha$ is
$$\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}.$$
• Data: $(X_1, Y_1), \dots, (X_n, Y_n)$.
• We can compute the sample variances and covariance.
• Estimate $\alpha$ by
$$\hat{\alpha} = \frac{\hat{\sigma}_Y^2 - \hat{\sigma}_{XY}}{\hat{\sigma}_X^2 + \hat{\sigma}_Y^2 - 2\hat{\sigma}_{XY}}.$$
• How do we evaluate $\hat{\alpha} - \alpha$ (remember $\hat{\alpha}$ is random and $\alpha$ is unknown)?
• Use the bootstrap.
Example
• Draw a resample of size n (with replacement) from $(X_1, Y_1), \dots, (X_n, Y_n)$, and compute the sample variances and covariance for this resample. Then compute
$$\hat{\alpha}^* = \frac{(\hat{\sigma}_Y^*)^2 - \hat{\sigma}_{XY}^*}{(\hat{\sigma}_X^*)^2 + (\hat{\sigma}_Y^*)^2 - 2\hat{\sigma}_{XY}^*}.$$
• Repeat this procedure to obtain $\hat{\alpha}_1^*, \dots, \hat{\alpha}_M^*$ for a large M.
• Use the distribution of $\hat{\alpha}_1^* - \hat{\alpha}, \dots, \hat{\alpha}_M^* - \hat{\alpha}$ to approximate the distribution of $\hat{\alpha} - \alpha$.
• For example, we can use
$$\frac{1}{M}\sum_{j=1}^{M}(\hat{\alpha}_j^* - \hat{\alpha})^2$$
to estimate $E(\hat{\alpha} - \alpha)^2$.
Figure: 5.9. Each panel displays 100 simulated returns for investments X and Y . From left to right and top to bottom, the resulting estimates for α are 0.576, 0.532, 0.657, and 0.651.
Figure: 5.10. Left: A histogram of the estimates of α obtained by generating 1,000 simulated data sets from the true population. Center: A histogram of the estimates of α obtained from 1,000 bootstrap samples from a single data set. Right: The estimates of α displayed in the left and center panels are shown as boxplots. In each panel, the pink line indicates the true value of α.
Original data (Z)      Z*1                  Z*2                  Z*B
Obs   X    Y           Obs   X    Y         Obs   X    Y         Obs   X    Y
1     4.3  2.4         3     5.3  2.8       2     2.1  1.1       2     2.1  1.1
2     2.1  1.1         1     4.3  2.4       3     5.3  2.8       2     2.1  1.1
3     5.3  2.8         3     5.3  2.8       1     4.3  2.4       1     4.3  2.4
Each bootstrap data set Z*b yields an estimate α̂*b, b = 1, …, B.
Figure 5.11. A graphical illustration of the bootstrap approach on a small sample containing n = 3 observations. Each bootstrap data set contains n observations, sampled with replacement from the original data set. Each bootstrap data set is used to obtain an estimate of α.
The R code (simulating repeatedly from the true population)
library(mvtnorm)
set.seed(1)
n <- 100          # sample size of each simulated data set
m <- 1000         # number of simulated data sets
sigmaX <- 1       # var(X)
sigmaY <- 1.25    # var(Y)
sigmaXY <- 0.5    # cov(X, Y); the true alpha is 0.6
sigMat <- matrix(c(sigmaX,sigmaXY,sigmaXY,sigmaY),2,2)
alpha <- rep(0,m)
for(i in 1:m){
  returns <- rmvnorm(n,rep(0,2),sigMat)   # a fresh sample from the true population
  X <- returns[,1]
  Y <- returns[,2]
  alpha[i] <- (var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y))
}
mean(alpha)
sd(alpha)
hist(alpha,10)
abline(v=0.6,col="red",lwd=2)   # true value of alpha
[Output: histogram of the m = 1,000 simulated estimates of alpha, with the true value 0.6 marked by the red vertical line.]
The R code (bootstrapping a single data set)
set.seed(1)
n <- 100
B <- 1000          # number of bootstrap resamples
returns <- rmvnorm(n,rep(0,2),sigMat)   # the data set is fixed once sampled
alpha_bootstrap <- rep(0,B)
for(i in 1:B){
  # sample n rows with replacement from the fixed data set
  returns_i <- returns[sample(1:nrow(returns),n,replace = TRUE),]
  X <- returns_i[,1]
  Y <- returns_i[,2]
  alpha_bootstrap[i] <- (var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y))
}
library(repr)
options(repr.plot.width=8, repr.plot.height=4)
par(mfrow=c(1,2))
hist(alpha,10,main = "sampling from population")   # alpha from the previous slide
abline(v=0.6,col="red",lwd=2)
mean(alpha_bootstrap)
sd(alpha_bootstrap)
hist(alpha_bootstrap,10,main="Bootstrap")
abline(v=mean(alpha_bootstrap),col="red",lwd=2)
[Output: side-by-side histograms of alpha ("sampling from population") and alpha_bootstrap ("Bootstrap"), each with a red vertical reference line.]