
2.0 – Resampling

Motivation

Resampling techniques facilitate inference about a population, and
make it possible to quantify the bias, variance and other statistical
characteristics of interest (of an estimator, of a prediction error,
etc.), e.g. by:

simulating data from a theoretical model;

simulating copies of a real dataset;

randomizing training and validation steps in predictive
modelling.

Main methodologies:

Monte Carlo

Bootstrap and jackknife

Cross-validation

Permutation tests (not covered here)


Simulating data

Monte-Carlo methods are generally used to simulate data from a
statistical model (rather than working with observed data).
Repetitions of the randomized statistical experiment are performed;
the experiment outputs are collected and then analysed as outcomes
of a random variable. Their statistical distribution (and therefore
the statistical performance of the technique) can then be
characterized.

In situations where assuming a theoretical statistical model is not
appropriate or desired (e.g. when the Central Limit Theorem does not
apply), Bootstrapping is a way of simulating new samples from just
one sample of observations. Statistical properties of interest can
be assessed from this set of bootstrap resamples.


Validating predictive models

When evaluating the performance of a predictive model, we are
interested in measuring prediction accuracy (and related metrics) on
unseen data, which are usually not available. Cross-validation
frameworks make it possible to generate a number of training subsets
(for model fitting) and distinct validation subsets (treated as new
data). Prediction performance is then assessed as an average over
the validation samples.


2.1 – Monte-Carlo sampling


Monte-Carlo methods

Monte Carlo methods use resampling techniques to estimate the
distribution of a population.

They rely on the law of large numbers, which allows us to
approximate an expectation by the sample mean of a random variable
(or of a function thereof).

For example, risk analysis may be performed via MC repetitions of a
simulated model outcome. Each outcome is generated from a different
set of random values drawn from the model distribution. The (sample)
distribution of outcome values can then be used as a basis for
assessing (i.e. approximating) the theoretical risk and/or
functionals of it.
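As a toy illustration of this idea (not from the slides), the sketch below approximates an expected excess loss E[max(X − K, 0)] for X ∼ N(0, 1) by averaging simulated outcomes; the threshold K and the number of repetitions M are arbitrary choices for the example:

```r
# Hypothetical risk-analysis example: approximate E[max(X - K, 0)],
# X ~ N(0, 1), by averaging M simulated outcomes (K and M are illustrative)
set.seed(42)
M <- 1e5
K <- 1
outcomes <- pmax(rnorm(M) - K, 0)  # one simulated excess loss per repetition
mean(outcomes)                     # MC approximation of the expected loss
```

Here the expectation is also available in closed form, dnorm(K) − K*(1 − pnorm(K)) ≈ 0.083 for K = 1, so the approximation is easy to check.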


Monte-Carlo integration

Let our objective be to compute

    θ = ∫_a^b g(x) dx

for a given function g, assuming this integral exists. Recall that if
X is a r.v. with density f(x), then

    E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx

Then, if a random sample is available from F, the distribution of X,
an unbiased estimator of E[g(X)] is the sample mean

    g_M(X) = (1/M) ∑_{i=1}^{M} g(X_i)


This provides an algorithm for the evaluation of integrals.

Example: we wish to evaluate

    θ = ∫_0^1 g(x) dx

If X_1, . . . , X_M is a random U(0, 1) sample, then by the strong
LLN,

    θ̂ = g_M(X) = (1/M) ∑_{i=1}^{M} g(X_i)

converges to θ = E[g(X)] (since X ∼ U(0, 1)) with probability 1.
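A classic sanity check of this recipe (the choice of g is only illustrative): since ∫_0^1 4/(1 + x²) dx = π, the U(0, 1) sample mean of g(x) = 4/(1 + x²) should approach π:

```r
# MC integration over (0, 1): g(x) = 4 / (1 + x^2), whose integral is pi
set.seed(1)
M <- 1e5
x <- runif(M)                    # random U(0, 1) sample
theta_hat <- mean(4 / (1 + x^2)) # sample mean of g(X_i)
theta_hat                        # close to pi for large M
```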


Generalisation to any interval: in order to evaluate

    θ = ∫_a^b g(x) dx

we can perform a change of variables to fall back to the U(0, 1)
case, or else simply sample from a U(a, b):

    θ̂ = (b − a) g_M(X) = ((b − a)/M) ∑_{i=1}^{M} g(X_i)

Exercise: evaluate θ = ∫_2^4 e^(−x) dx by Monte-Carlo and check the
estimate against the theoretical value.
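One possible solution sketch for the exercise, sampling from U(2, 4) as described above (the value of M is an arbitrary choice):

```r
# MC estimate of the integral of exp(-x) over (2, 4)
set.seed(1)
M <- 1e5
a <- 2; b <- 4
x <- runif(M, a, b)                    # sample from U(a, b)
theta_hat <- (b - a) * mean(exp(-x))   # (b - a) times the sample mean of g
theta_true <- exp(-a) - exp(-b)        # theoretical value, about 0.117
c(estimate = theta_hat, truth = theta_true)
```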


Estimation variance and calibration of the computational effort

Variance: we have

    Var(g_M(X)) = (1/M) Var(g(X))

and thus

    Var(θ̂) = (b − a)² Var(g_M(X)) = ((b − a)²/M) Var(g(X))

For M large, the Central Limit Theorem (generally) applies and one
may assume that θ̂ is approximately Normally distributed. This is
useful for deriving confidence intervals and for the selection of M.
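A sketch of how this is used in practice, reusing the U(a, b) estimator from the previous slide (the integrand and M are illustrative): the estimated variance yields an approximate CLT-based confidence interval, and M can be increased until the interval is as tight as required.

```r
# Estimated SE and approximate 95% CI for the MC integral estimate
set.seed(2)
a <- 2; b <- 4; M <- 1e4
x <- runif(M, a, b)
gx <- exp(-x)                               # illustrative integrand g(X_i)
theta_hat <- (b - a) * mean(gx)
se_hat <- (b - a) * sd(gx) / sqrt(M)        # sqrt of (b-a)^2 Var(g(X)) / M
ci <- theta_hat + c(-1, 1) * 1.96 * se_hat  # CLT-based 95% interval
```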

Efficiency: given two estimators, θ̂_1 is more efficient than θ̂_2
if Var(θ̂_1) < Var(θ̂_2).

Generalization of the principle

Now suppose we want to evaluate E[g(X)] for a function g of X, a
r.v. with an arbitrary density f(x):

    E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx

The same approach is applicable, using a random sample from the
distribution of X:

    θ̂ = (1/M) ∑_{i=1}^{M} g(X_i)

Then, with probability 1,

    θ̂ → E[θ̂] = θ   as M → ∞

Note: Importance sampling is an extension of the principle of Monte
Carlo simulations. It consists of choosing an adequate sampling
distribution (from which the random realizations are generated) in
order to approximate the expected value with "good" control of the
estimation variance.

Monte-Carlo repetitions of a statistical experiment

Suppose we need to evaluate a functional from observations

    Y_i = η(β, X_i) + ε_i,   i = 1, . . . , N

where:

η and the X_i ∈ X are known

β describes a finite number of parameters to be estimated

ε_i iid∼ f_ε(0, σ²)

The Monte Carlo principle can be applied to estimate E[Y] = η(β, X)

Example: consider estimating β in a linear regression model

    Y_i = βX_i + ε_i,   i = 1, . . . , N

with deterministic X_i ∈ X and ε_i iid∼ N(0, 1). Then
f_{Y|β,X} = N(βX, 1).

1 Analyse a synthetic statistical problem with Monte-Carlo
repetitions of the experiment

2 This implies generating M random samples of size N from the same
model

3 The statistical solution is derived for each repetition and stored
in an adequate R object

4 This yields an empirical distribution for the estimator β̂

In practice:

1 Assume a true value for β, e.g. β = 3:

    Y_i = 3X_i + ε_i,   i = 1, . . . , N

and regressors X_i, e.g. X = (1, 2, 3, 4, 5)

2 Set experiment parameters, e.g. N = 50 and M = 100

3 Implement repetitions of the set experiment using a loop:

    x = rep(c(1:5), 10)    # regressors (N = 50)
    ests = matrix(0, M, 1) # initialisation
    for(i in 1:M){
      y = 3*x + rnorm(N)
      ests[i] = lm(y ~ x + 0)$coef[1]
    }

4 Analyse the distribution of estimates (stored in ests):

    summary(ests)
    hist(ests)
    boxplot(ests)
    mean(ests)
    sd(ests)

→ This allows comparing several estimators in terms of their
distributions

→ One could, for example, compare the distributions of the LS
estimator and of a robust alternative in a linear regression problem
with non-Gaussian noise (hint: ?MASS::rlm and ?rexp)

2.2 – Bootstrapping

Bootstrapping

One often refers to Monte Carlo repetitions as parametric
bootstrapping, since in that case the distribution is known and
parametrized.

The Bootstrap was introduced by Bradley Efron in the late 1970s.

Bootstrap methods are often used when the target population is not
specified and the sample is the only information available to us.

The distribution of the finite population represented by the sample
can be regarded as a pseudo-population with similar characteristics
to the true population.

Bootstrap estimates of a sampling distribution are analogous to the
idea of density estimation. We construct a histogram of a sample to
obtain an estimate of the shape of the density function. The
histogram is not the density but can be viewed as a reasonable
estimate of it.

By repeatedly generating random samples from the pseudo-population
(resampling), the sampling distribution of a statistic can be
estimated. Properties of an estimator such as bias, standard error,
etc. can then be estimated.

General idea of the Bootstrap procedure

1 Create lots of new samples (Bootstrap samples) by sampling from
the original random sample with replacement.
All samples have the same size N as the original.

2 Calculate the statistic in question for each one of the Bootstrap
samples.

→ The distribution of these statistics is called a Bootstrap
distribution.

→ It gives information about the shape, centre, and spread of the
sampling distribution of the statistic.

General algorithm

Given an initial sample X_1, . . . , X_N, and assuming we are
interested in estimating a parameter θ,

1 Repeat the following steps for b = 1, . . . , B:

    1 draw a sample X*(b) from X with replacement

    2 evaluate the estimate θ̂(b) from X*(b)

2 Deduce the bootstrap estimate of F_θ̂ as the empirical
distribution of the replicates θ̂(1), . . . , θ̂(B)

Example: Bootstrap standard error estimate

Bootstrap SE estimate for estimator θ̂ (e.g., θ̂_b = x̄_b):

    SE_B = √[ (1/(B − 1)) ∑_{b=1}^{B} (θ̂_b − θ̄*)² ],
    where θ̄* = (1/B) ∑_{b=1}^{B} θ̂_b

Common bootstrap statistics

Given:
- an initial sample {X_1, . . . , X_N} with initial statistic θ̂,
- B bootstrap estimates {θ̂_b}_{b=1}^{B} of the parameter of
interest θ (e.g. θ̂ = X̄ and θ̂_b = X̄*_b):

Bias:  Bias_B(θ̂) = (1/B) ∑_{b=1}^{B} (θ̂_b − θ̂) = θ̄* − θ̂

Variance:  Var_B(θ̂) = (1/B) ∑_{b=1}^{B} (θ̂_b − θ̄*)²

MSE:  MSE_B(θ̂) = Bias_B(θ̂)² + Var_B(θ̂)

Standard error:  SE_B = √[ (1/(B − 1)) ∑_{b=1}^{B} (θ̂_b − θ̄*)² ]

Accolade: International prize for the Bootstrap

"With the bootstrap [...], scientists are able to learn from limited
data in a simple way that enables them to assess the uncertainty of
their findings. In essence, it is possible to simulate a potentially
infinite number of data sets from an original data set and, in
looking at the differences, measure the uncertainty of the result
from the original data analysis."

"While statistics offers no magic pill for quantitative scientific
investigations, the bootstrap is the best statistical pain reliever
ever produced", says Xiao-Li Meng, Whipple V. N. Jones Professor of
Statistics at Harvard University.
"It has saved countless scientists and researchers the headache of
finding a way to assess uncertainty in complex problems by providing
a simple and practical way to do so in many seemingly hopeless
situations."

"Because the bootstrap is easy for a computer to calculate and is
applicable in an exceptionally wide range of situations, the method
has found use in many fields of science, technology, medicine and
public affairs", says Sir David Cox [...].

https://www.biometricsociety.org/2018/11/professor-bradley-efron-of-stanford-university-wins-international-prize-in-statistics/

2.3 – Cross-validation

Cross-validation (CV)

A single error/performance measurement is not reliable

A model fitted (trained) to a sample will describe that sample's
pattern very well, maybe even too well (overfitting)

Cross-validation is used specifically to evaluate model performance
on 'new' data

Main CV frameworks:

Leave-one-out CV (LOO CV)

Leave-p-out CV (LpO CV)

k-fold CV

Monte Carlo CV

Nested k-fold CV

LOO CV (source: ISLR)

    CV(n) = (1/n) ∑_{i=1}^{n} MSE_i

(figure from ISLR)

http://www-bcf.usc.edu/~gareth/ISL/

LOO CV and LpO CV

Cross-validated error is the average of all n LOO CV validation
MSEs (or any other performance criterion):

    CV(n) = (1/n) ∑_{i=1}^{n} MSE_i

LOO CV uses all possible combinations of n − 1 data points as
training sets

Leave-p-out CV generalizes this principle by using all possible
combinations of n − p data points as training sets

LpO CV thus requires calculating (n choose p) model fits and
corresponding error estimates

Even for moderately large n, this can quickly become infeasible when
p > 1
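The combinatorial blow-up is easy to see numerically (n = 50 chosen arbitrarily): the number of model fits required by LpO CV is the binomial coefficient (n choose p).

```r
# Number of training sets (and model fits) required by LpO CV for n = 50
n <- 50
fits <- sapply(1:5, function(p) choose(n, p))
names(fits) <- paste0("p=", 1:5)
fits   # grows from 50 (LOO) to over 2 million fits at p = 5
```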


k-fold CV (source: ISLR)

    CV(k) = (1/k) ∑_{i=1}^{k} MSE_i

where MSE_i is the validation MSE computed on the i-th fold.

(figure from ISLR)

http://www-bcf.usc.edu/~gareth/ISL/
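A minimal k-fold CV loop for a simple linear model could look as follows; the simulated data, the value of k and the model are all illustrative choices:

```r
# k-fold CV: average validation MSE over k folds
set.seed(5)
n <- 100; k <- 5
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.3)             # toy data
folds <- sample(rep(1:k, length.out = n))   # random fold assignment
mse <- numeric(k)
for (j in 1:k) {
  train <- folds != j                       # fit on the other k - 1 folds
  fit <- lm(y ~ x, subset = train)
  pred <- predict(fit, newdata = data.frame(x = x[!train]))
  mse[j] <- mean((y[!train] - pred)^2)      # validate on the held-out fold
}
mean(mse)                                   # CV(k) estimate
```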


Monte Carlo CV

Instead of splitting the dataset into distinct folds (as in k-fold
CV), split it randomly at every iteration (using the same training
and test sample sizes every time)
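A sketch of the same idea in code (the split sizes and the number of repeats are arbitrary choices):

```r
# Monte Carlo CV: a fresh random train/test split at every iteration
set.seed(6)
n <- 100; n_train <- 80; reps <- 50
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.3)             # toy data
mse <- numeric(reps)
for (r in 1:reps) {
  idx <- sample(n, n_train)                 # random split, fixed sizes
  fit <- lm(y ~ x, subset = idx)
  pred <- predict(fit, newdata = data.frame(x = x[-idx]))
  mse[r] <- mean((y[-idx] - pred)^2)
}
mean(mse)                                   # Monte Carlo CV error estimate
```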


Nested k-fold CV

When training elaborate models, tuning of some model
hyper-parameters may be required

Ex: the degree p of a polynomial in polynomial regression

    Y = θ_0 + θ_1 X + · · · + θ_p X^p + ε

Nested CV consists of applying an inner CV loop to each outer CV
training sample for such model tuning

Usually k-fold CV is used for both outer and inner CV

This will be covered more extensively in the follow-on course
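As a preview, a compact (purely illustrative) nested CV sketch, where an inner 5-fold loop selects the polynomial degree p on each outer training sample:

```r
# Nested 5-fold CV: the inner loop tunes the degree p, the outer loop
# measures performance of the tuned model (data and grids are illustrative)
set.seed(8)
n <- 120
x <- runif(n, -1, 1)
y <- x - x^3 + rnorm(n, sd = 0.2)
outer_folds <- sample(rep(1:5, length.out = n))
degrees <- 1:4
outer_mse <- numeric(5)
for (j in 1:5) {
  tr <- which(outer_folds != j)
  te <- which(outer_folds == j)
  inner_folds <- sample(rep(1:5, length.out = length(tr)))
  inner_mse <- sapply(degrees, function(p) {
    mean(sapply(1:5, function(i) {
      itr <- tr[inner_folds != i]; ite <- tr[inner_folds == i]
      fit <- lm(y ~ poly(x, p), data = data.frame(x = x[itr], y = y[itr]))
      mean((y[ite] - predict(fit, data.frame(x = x[ite])))^2)
    }))
  })
  p_best <- degrees[which.min(inner_mse)]   # tuned using inner CV only
  fit <- lm(y ~ poly(x, p_best), data = data.frame(x = x[tr], y = y[tr]))
  outer_mse[j] <- mean((y[te] - predict(fit, data.frame(x = x[te])))^2)
}
mean(outer_mse)   # performance estimate for the tuned model
```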

Resampling
Monte-Carlo sampling
Bootstrapping
Cross-validation