2.0 – Resampling

Motivation
Resampling techniques facilitate inference about a population, and
allow one to quantify bias, variance and other statistical
characteristics of interest (of an estimator, a prediction error, etc.), e.g. by:
simulating data from a theoretic model;
simulating copies of a real dataset;
randomizing training and validation steps in predictive
modelling.
Main methodologies:
Monte Carlo
Bootstrap and jackknife
Cross-validation
Permutation tests (not covered here)
Simulating data
Monte-Carlo methods are generally used to simulate data
from a statistical model (rather than to analyse observed data).
Repetitions of the randomized statistical experiment are
performed; the experiment outputs are collected and then
analysed as outcomes of a random variable. Their statistical
distribution (and therefore statistical performance of the
technique) can then be characterized.
In situations where assuming a theoretic statistical model is
not appropriate or desired (e.g. when the Central Limit
Theorem does not apply), bootstrapping is a way of
simulating new samples from just one sample of observations.
Statistical properties of interest can be assessed from this set
of bootstrap resamples.
Validating predictive models
When evaluating the performance of a predictive model, we
are interested in measuring prediction accuracy (and related
metrics) on unseen data, which are usually not available.
Cross-validation frameworks allow one to generate a number of
training subsets (for model fitting) and distinct validation
subsets (treated as new data). Prediction performance is then
assessed by averaging over the validation samples.
2.1 – Monte-Carlo sampling
Monte-Carlo methods
Monte Carlo methods use resampling techniques to estimate the
distribution of a population.
They rely on the law of large numbers, which allows us to
approximate an expectation by the sample mean of a random
variable (or a function thereof).
For example, risk analysis may be performed by MC repetitions of
a simulated model outcome. Each outcome is generated for a
different set of random values from the model distribution. The
(sample) distribution of outcome values can then be used as a
basis for assessing (i.e. approximating) the theoretic risk and/or
functionals of it.
Monte-Carlo integration
Let our objective be to compute
$$\theta = \int_a^b g(x)\,dx$$
for a given function g, assuming this integral exists. Recall that if
X is a r.v. with density f(x), then
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx$$
Then if a random sample is available from the distribution F of X, an unbiased
estimator of E[g(X)] is the sample mean
$$\bar{g}_M(X) = \frac{1}{M} \sum_{i=1}^{M} g(X_i)$$
This provides an algorithm for the evaluation of integrals.
Example: we wish to evaluate
$$\theta = \int_0^1 g(x)\,dx$$
If $X_1, \ldots, X_M$ is a random U(0, 1) sample, then by the strong
LLN,
$$\hat{\theta} = \bar{g}_M(X) = \frac{1}{M} \sum_{i=1}^{M} g(X_i)$$
converges to $\theta = E[g(X)]$ (since $X \sim U(0, 1)$) with probability 1.
Generalisation to any interval: in order to evaluate
$$\theta = \int_a^b g(x)\,dx$$
we can perform a change of variables to fall back to the U(0, 1)
case, or else simply sample from a U(a, b):
$$\hat{\theta} = (b-a)\,\bar{g}_M(X) = \frac{b-a}{M} \sum_{i=1}^{M} g(X_i)$$
Exercise: evaluate $\int_2^4 e^{-x}\,dx$ by Monte Carlo and check against the
theoretic value.
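A sketch of the exercise in R (the sample size M is an arbitrary choice), comparing the Monte-Carlo estimate with the exact value $e^{-2} - e^{-4}$:

```r
# Monte-Carlo integration sketch for the exercise above:
# estimate the integral of exp(-x) over (2, 4) by sampling from U(2, 4).
set.seed(1)
M <- 1e5
x <- runif(M, min = 2, max = 4)        # random U(2, 4) sample
theta_hat <- (4 - 2) * mean(exp(-x))   # (b - a) * sample mean of g(X)
theta <- exp(-2) - exp(-4)             # theoretic value, approx 0.117
c(theta_hat, theta)
```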
Estimation variance and computation calibration
Variance: we have
$$\mathrm{Var}\big(\bar{g}_M(X)\big) = \frac{1}{M}\,\mathrm{Var}\big(g(X)\big)$$
and thus
$$\mathrm{Var}\big(\hat{\theta}\big) = (b-a)^2\,\mathrm{Var}\big(\bar{g}_M(X)\big) = \frac{(b-a)^2}{M}\,\mathrm{Var}\big(g(X)\big)$$
For M large, the Central Limit Theorem (generally) applies and
one may assume that $\hat{\theta}$ is approximately Normally distributed. This
is useful for deriving confidence intervals and for the selection of M.
Efficiency: given two estimators, $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if
$$\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$$
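A sketch of how this CLT approximation can be used in practice, reusing the illustrative integral $\int_2^4 e^{-x}\,dx$ (the 95% level and the value of M are arbitrary choices):

```r
# Sketch: estimated standard error and approximate 95% CI for a
# Monte-Carlo estimate of the integral of exp(-x) over (2, 4).
set.seed(1)
a <- 2; b <- 4; M <- 1e4
g <- function(x) exp(-x)
x <- runif(M, a, b)
theta_hat <- (b - a) * mean(g(x))
se_hat <- (b - a) * sd(g(x)) / sqrt(M)        # plug-in SE of theta_hat
ci <- theta_hat + c(-1, 1) * qnorm(0.975) * se_hat
```

Since the SE shrinks like $1/\sqrt{M}$, this also gives a way to calibrate M for a target precision.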
Generalization of the principle
Now suppose we want to evaluate the expectation of a functional g of X,
a r.v. with any density f(x),
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx$$
The same approach is applicable, using a random sample from the
distribution of X:
$$\hat{\theta} = \frac{1}{M} \sum_{i=1}^{M} g(X_i)$$
Then, with probability 1,
$$\hat{\theta} \xrightarrow[M \to \infty]{} E[g(X)] = \theta$$
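For instance (an illustrative choice of g and f): with $X \sim \mathcal{N}(0,1)$ and $g(x) = x^2$, we have $\theta = E[X^2] = 1$, which a plain MC sample recovers:

```r
# Sketch: Monte-Carlo estimate of E[g(X)] for a non-uniform density.
# Here X ~ N(0, 1) and g(x) = x^2, so theta = E[X^2] = 1.
set.seed(1)
M <- 1e5
x <- rnorm(M)              # random sample from the distribution of X
theta_hat <- mean(x^2)     # approaches E[X^2] = 1 as M grows
```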
Note: Importance sampling is an extension of the principle of
Monte Carlo simulations. It consists in choosing an adequate
sampling distribution (from which random realizations are
generated) in order to approximate the expected value with
“good” control on the estimation variance.
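A minimal sketch of the idea (the target probability and the shifted proposal are illustrative choices): to estimate the rare-event probability $P(X > 4)$ for $X \sim \mathcal{N}(0,1)$, plain MC almost never hits the event, whereas sampling from a $\mathcal{N}(4,1)$ proposal and reweighting gives a low-variance estimate:

```r
# Importance sampling sketch: estimate P(X > 4) for X ~ N(0, 1)
# by sampling from the proposal N(4, 1) and reweighting.
set.seed(1)
M <- 1e5
y <- rnorm(M, mean = 4)                  # draws from the proposal q
w <- dnorm(y) / dnorm(y, mean = 4)       # importance weights f(y)/q(y)
p_hat <- mean((y > 4) * w)               # importance-sampling estimate
p_true <- pnorm(4, lower.tail = FALSE)   # exact value, approx 3.2e-5
```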
Monte-Carlo repetitions of a statistical experiment
Suppose we need to evaluate a functional from observations
$$Y_i = \eta(\beta, X_i) + \varepsilon_i, \quad i = 1, \ldots, N$$
where:
$\eta$ and $X_i \in \mathcal{X}$ are known
$\beta$ describes a finite number of parameters to be estimated
$\varepsilon_i \overset{iid}{\sim} f_\varepsilon(0, \sigma^2)$
The Monte Carlo principle can be applied to estimate
$$E[Y] = \eta(\beta, X)$$
Example: consider estimating β in a linear regression model
$$Y_i = \beta X_i + \varepsilon_i, \quad i = 1, \ldots, N$$
with deterministic $X_i \in \mathcal{X}$ and $\varepsilon_i \overset{iid}{\sim} \mathcal{N}(0, 1)$. Then
$$f_{Y|\beta,X} = \mathcal{N}(\beta X, 1)$$
1. Analyse a synthetic statistical problem with Monte-Carlo
repetitions of the experiment
2. This implies generating M random samples of size N from the
same model
3. The statistical solution is derived for each repetition and
stored in an adequate R object
4. This yields an empirical distribution for the estimator $\hat{\beta}$
1. Assume a true value for β, e.g. β = 3:
$$Y_i = 3 X_i + \varepsilon_i, \quad i = 1, \ldots, N$$
and regressors $X_i$, e.g. $X = (1, 2, 3, 4, 5)$
2. Set experiment parameters, e.g. N = 50 and M = 100
3. Implement repetitions of the set experiment using a loop:
N = 50; M = 100          # experiment parameters
x = rep(1:5, 10)         # regressors (length N = 50)
ests = matrix(0, M, 1)   # initialisation
for(i in 1:M){
y = 3*x + rnorm(N)       # simulate a sample from the model
ests[i] = lm(y ~ x + 0)$coef[1]
}
4 Analyse the distribution of estimates (stored in ests):
summary(ests)
hist(ests)
boxplot(ests)
mean(ests)
sd(ests)
...
→ This allows comparing several estimators in terms of their
distributions
→ One could for example compare the distributions of the LS
estimator and of a robust alternative in a linear regression
problem with non-Gaussian noise
(hint: ?MASS::rlm and ?rexp)
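A sketch of the suggested comparison (the centred exponential noise and the number of repetitions are illustrative choices; MASS::rlm provides a Huber-type robust fit):

```r
# Sketch: Monte-Carlo comparison of the LS estimator (lm) and a robust
# M-estimator (MASS::rlm) under centred exponential (non-Gaussian) noise.
library(MASS)
set.seed(1)
M <- 200; N <- 50
x <- rep(1:5, 10)
est_ls <- est_rob <- numeric(M)
for (i in 1:M) {
  y <- 3 * x + (rexp(N) - 1)            # skewed noise with mean 0
  est_ls[i]  <- coef(lm(y ~ x + 0))[1]
  est_rob[i] <- coef(rlm(y ~ x + 0))[1]
}
# Compare the two empirical distributions, e.g. their spreads:
c(sd_ls = sd(est_ls), sd_robust = sd(est_rob))
```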
2.2 - Bootstrapping
Bootstrapping
One often refers to Monte Carlo repetitions as parametric
bootstrapping, since in that case the distribution is known and
parametrized.
The bootstrap was introduced by Bradley Efron in the late
1970s.
Bootstrap methods are often used when the target population is
not specified and the sample is the only information available to
us. The distribution of the finite population represented by the
sample can be regarded as a pseudo-population with similar
characteristics to the true population.
Bootstrap estimates of a sampling distribution are analogous to
the idea of density estimation. We construct a histogram of a
sample to obtain an estimate of the shape of the density function.
The histogram is not the density but can be viewed as a reasonable
estimate of it.
By repeatedly generating random samples from the
pseudo-population (resampling), the sampling distribution of a
statistic can be estimated. Properties of an estimator, such as its
bias and standard error, can then be estimated.
General idea of the Bootstrap procedure
1. Create lots of new samples (bootstrap samples) by sampling
from the original random sample with replacement. All
bootstrap samples have the same size N as the original.
2. Calculate the statistic in question for each of the
bootstrap samples.
→ The distribution of these statistics is called a bootstrap
distribution.
→ It gives information about the shape, centre, and spread of
the sampling distribution of the statistic.
General algorithm
Given an initial sample $X_1, \ldots, X_N$, assuming we are interested in
estimating a parameter θ,
1. Repeat the following steps for $b = 1, \ldots, B$:
1. draw a sample $X^{*(b)}$ from X with replacement
2. evaluate the estimate $\hat{\theta}^{(b)}$ from $X^{*(b)}$
2. Deduce the bootstrap estimate of $F_{\hat{\theta}}$ as the empirical
distribution of the replicates $\hat{\theta}^{(1)}, \ldots, \hat{\theta}^{(B)}$
Example: Bootstrap standard error estimate
Bootstrap SE estimate for estimator $\hat{\theta}$ (e.g., $\hat{\theta}_b = \bar{x}_b$):
$$SE_B = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \Big( \hat{\theta}_b - \frac{1}{B} \sum_{b'=1}^{B} \hat{\theta}_{b'} \Big)^2}$$
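A sketch of the algorithm with $\hat{\theta}$ the sample mean (the simulated data, sample size and B are illustrative choices); the classical SE formula $s/\sqrt{N}$ serves as a sanity check:

```r
# Bootstrap SE of the sample mean, following the algorithm above.
set.seed(1)
x <- rnorm(30, mean = 5, sd = 2)      # the single observed sample
B <- 2000
theta_b <- replicate(B, mean(sample(x, replace = TRUE)))  # B replicates
se_boot <- sd(theta_b)                        # bootstrap SE (1/(B-1) form)
se_classic <- sd(x) / sqrt(length(x))         # classical SE, for comparison
```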
Common bootstrap statistics
Given:
- an initial sample $\{X_1, \ldots, X_N\}$ with initial statistic $\hat{\theta}$,
- B bootstrap estimates $\{\hat{\theta}_b\}_{b=1}^{B}$ of the parameter of interest θ
(e.g. $\hat{\theta} = \bar{X}$ and $\hat{\theta}_b = \bar{X}^*_b$):
Bias: $\widehat{\mathrm{Bias}}(\hat{\theta}) = \frac{1}{B} \sum_{b=1}^{B} (\hat{\theta}_b - \hat{\theta}) = \bar{\hat{\theta}} - \hat{\theta}$
Variance: $\widehat{\mathrm{Var}}(\hat{\theta}) = \frac{1}{B} \sum_{b=1}^{B} (\hat{\theta}_b - \bar{\hat{\theta}})^2$
MSE: $\widehat{\mathrm{MSE}}(\hat{\theta}) = \widehat{\mathrm{Bias}}(\hat{\theta})^2 + \widehat{\mathrm{Var}}(\hat{\theta})$
Standard error:
$$SE_B = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \Big( \hat{\theta}_b - \frac{1}{B} \sum_{b'=1}^{B} \hat{\theta}_{b'} \Big)^2}$$
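These quantities are direct to compute from the replicates; a sketch with $\hat{\theta}$ the sample median (the data and B are illustrative choices):

```r
# Sketch: bootstrap bias, variance, MSE and SE for the sample median.
set.seed(1)
x <- rexp(40)                                  # observed sample
theta_hat <- median(x)                         # initial statistic
B <- 2000
theta_b <- replicate(B, median(sample(x, replace = TRUE)))
bias_hat <- mean(theta_b) - theta_hat          # bootstrap bias estimate
var_hat  <- mean((theta_b - mean(theta_b))^2)  # 1/B variance, as above
mse_hat  <- bias_hat^2 + var_hat
se_B     <- sd(theta_b)                        # 1/(B-1) standard error
```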
Accolade: International prize for the Bootstrap
“With the bootstrap [...], scientists are able to learn from limited data in a simple way
that enables them to assess the uncertainty of their findings. In essence, it is possible
to simulate a potentially infinite number of data sets from an original data set and–in
looking at the differences–measure the uncertainty of the result from the original data
analysis.”
“While statistics offers no magic pill for quantitative scientific investigations, the
bootstrap is the best statistical pain reliever ever produced”, says Xiao-Li Meng,
Whipple V. N. Jones Professor of Statistics at Harvard University. “It has saved
countless scientists and researchers the headache of finding a way to assess
uncertainty in complex problems by providing a simple and practical way to do so in
many seemingly hopeless situations.”
“Because the bootstrap is easy for a computer to calculate and is applicable in an
exceptionally wide range of situations, the method has found use in many fields of
science, technology, medicine and public affairs”, says Sir David Cox [...].
https://www.biometricsociety.org/2018/11/professor-bradley-efron-of-stanford-university-wins-international-prize-in-statistics/
2.3 - Cross-validation
Cross-validation (CV)
A single error/performance measurement is not reliable
A model fitted (trained) to a sample will describe that
sample’s pattern very well, maybe even too well (overfitting)
Cross-validation is used specifically to evaluate model
performance on ’new’ data
Main CV frameworks:
Leave-one-out CV (LOO CV)
Leave-p-out CV (LpO CV)
k-fold CV
Monte Carlo CV
Nested k-fold CV
LOO CV (source: ISLR)
$$CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} MSE_i$$
[Figure from ISLR: schematic of the n leave-one-out training/validation splits]
http://www-bcf.usc.edu/~gareth/ISL/
LOO CV and LpO CV
Cross-validated error is the average of all LOO CV sample
MSE's (or any other performance criterion):
$$CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} MSE_i$$
LOO CV uses all possible combinations of n − 1 data points
as training sets
Leave-p-out CV generalizes this principle by using all possible
combinations of n − p data points as training sets
LpO CV thus requires calculating $\binom{n}{p}$ model fits and
corresponding error estimates
Even for moderately large n, this can quickly become
infeasible when p > 1
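A sketch of LOO CV for a simple linear model (the simulated data are an illustrative choice):

```r
# LOO CV sketch: each observation serves once as the validation set.
set.seed(1)
n <- 40
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.3)
mse <- numeric(n)
for (i in 1:n) {
  fit <- lm(y ~ x, subset = -i)                         # train without obs i
  pred <- predict(fit, newdata = data.frame(x = x[i]))  # predict held-out obs
  mse[i] <- (y[i] - pred)^2
}
cv_n <- mean(mse)   # CV(n): average of the n held-out squared errors
```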
k-fold CV (source: ISLR)
$$CV_{(k)} = \frac{1}{k} \sum_{i=1}^{k} MSE_i$$
[Figure from ISLR: schematic of the k-fold CV splits]
http://www-bcf.usc.edu/~gareth/ISL/
Monte Carlo CV
Instead of splitting the dataset into distinct folds (as in k-fold CV),
split it randomly at every iteration (using the same training and
test sample sizes each time)
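A sketch of the idea (the split sizes and repetition count are illustrative choices):

```r
# Monte Carlo CV sketch: repeated random train/test splits of fixed sizes.
set.seed(1)
n <- 100
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.5)
K <- 50; n_test <- 20
mse <- numeric(K)
for (k in 1:K) {
  test <- sample(n, n_test)           # a fresh random split each iteration
  fit <- lm(y ~ x, subset = -test)    # fit on the remaining training points
  pred <- predict(fit, newdata = data.frame(x = x[test]))
  mse[k] <- mean((y[test] - pred)^2)
}
cv_mc <- mean(mse)
```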
Nested k-fold CV
When training elaborate models, tuning of some model
hyper-parameters may be required
Ex: the degree p of a polynomial in polynomial regression
$$Y = \theta_0 + \theta_1 X + \cdots + \theta_p X^p + \varepsilon$$
Nested CV consists in applying an inner CV loop on each CV
training sample for such model tuning
Usually k-fold CV is used for both outer and inner CV
This will be covered more extensively in the follow-on course