Bootstrap Methods
MAST90083 Computational Statistics and Data Mining
Karim Seghouane
School of Mathematics & Statistics The University of Melbourne
Outline
§6.1 Introduction
§6.2 Bootstrap principle
§6.3 Parametric vs Nonparametric
§6.4 Bias correction
§6.5 Confidence Interval
Introduction
Bootstrap methods use computer simulation to reveal aspects of the sampling distribution for an estimator θˆ of interest.
With the power of modern computers, the approach has broad applicability and is now a practical and useful tool for applied statisticians and data scientists.
Introduction
The bootstrap is a general tool for assessing statistical accuracy.
It is based on a resampling strategy.
Given an estimate of some feature of the data, computed from the sample at hand, we want to understand how the estimate would change for a different sample.
Examples of features: prediction accuracy, the mean value, etc.
Unfortunately, we cannot draw more than one sample.
Solution: the bootstrap.
Introduction
The idea behind the bootstrap is an old one.
Assume we wish to estimate a functional of a population distribution function F, such as the population mean
\[ \theta = \int x \, dF(x) \]
Consider employing the same functional of the sample (or empirical) distribution function Fˆ, which in this case leads to the sample mean
\[ \hat\theta = \int x \, d\hat F(x) = \bar x \]
Introduction
One can use θˆ = x̄ to estimate θ.
Evaluating the variability of this estimate requires the sampling distribution of x̄.
Empirical distribution
The empirical distribution is the probability measure that assigns to a set a measure equal to the proportion of sample points that lie in that set:
\[ \hat F(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - x_i) \]
for a set x1, …, xn of i.i.d. observations from F, where δ(x − xi) represents a "point mass" at xi (assigning full probability to the point xi and zero to all other points).
Fˆ is the discrete distribution that assigns mass 1/n to each point xi, 1 ≤ i ≤ n.
By the L.L.N., Fˆ →p F as n → ∞.
Sample and resample
A sample X = {x1, …, xn} is a collection of n numbers (or vectors), without regard to order, drawn at random from the population F.
The xi's are therefore i.i.d. random variables, each having the population distribution function F.
A resample X∗ = {x1∗, …, xn∗} is an unordered collection of n items drawn randomly, with replacement, from X.
It is known as a bootstrap sample, and drawing it is the central step of the nonparametric bootstrap method.
Resample
Each xi∗ has probability 1/n of being equal to any given xj:
\[ P(x_i^* = x_j \mid X) = \frac{1}{n}, \qquad 1 \le i, j \le n \]
The xi∗'s are i.i.d. conditional on X.
X∗ is likely to contain repeats, all of which must be listed in X∗.
Example: X∗ = {1.5, 1.7, 1.7, 1.8} is different from {1.5, 1.7, 1.8}, but X∗ is the same as {1.5, 1.7, 1.8, 1.7} and {1.7, 1.5, 1.8, 1.7}.
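As a quick illustration, the following Python/NumPy sketch draws one resample; the sample values and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)                     # arbitrary seed
x = np.array([1.5, 1.7, 1.8, 2.1])                 # an illustrative sample X
# A resample X*: n draws from X with replacement, each point chosen w.p. 1/n.
x_star = rng.choice(x, size=x.size, replace=True)
print(x_star)                                      # repeats are possible and are kept
```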
Population and sample distribution
F is the population distribution of X, whereas Fˆ is its sample distribution.
Fˆ, on the other hand, is the distribution function of the population from which X∗ is drawn.
The pair (F, Fˆ) is generally written (F0, F1) in bootstrap iteration.
For i ≥ 1, Fi denotes the distribution function of a sample drawn from Fi−1, conditional on Fi−1.
The ith application of the bootstrap is termed the ith iteration, not the (i − 1)th iteration.
Estimation as functional
An estimate θˆ is a function of the data and a functional of the sample distribution function Fˆ.
Example: the sample mean
\[ \hat\theta = \hat\theta[X] = \theta(\hat F) = \frac{1}{n} \sum_{i=1}^{n} x_i \]
whereas the population mean is
\[ \theta = \theta(F) = \int x \, dF(x). \]
Bootstrap principle
Bootstrap principle
Assume we cannot observe "doll 0" → it represents the population in a sampling scheme.
We wish to estimate the number n0 of freckles on its face.
Let ni denote the number of freckles on the face of "doll i".
Assuming the ratio n1/n2 is close to the ratio n0/n1, we have nˆ0 ≃ n1²/n2.
The key feature of this argument is the hypothesis that the relationship between n2 and n1 closely resembles that between n1 and the unknown n0.
Bootstrap principle
Statistical inference amounts to describing the relationship between a sample and the population from which the sample is drawn.
Formally: given a functional ft from a class {ft : t ∈ τ}, we aim to find t0 such that
\[ E\{f_t(F_0, F_1) \mid F_0\} = E_{F_0}\{f_t(F_0, F_1)\} = 0 \]
where F0 = F (the population distribution) and F1 = Fˆ (the sample distribution).
We want to find t0, the solution of this population equation, so called because properties of the population are needed to solve it exactly.
Bootstrap principle
Example:
Let θ0 = θ(F) = θ(F0) be the true parameter value, such as the rth power of a mean,
\[ \theta_0 = \left( \int x \, dF_0(x) \right)^{\!r} \]
Let θˆ = θ(F1) be the bootstrap estimate of θ0,
\[ \hat\theta = \left( \int x \, dF_1(x) \right)^{\!r} = \bar x^{\,r} \]
where F1 is the empirical distribution function.
Example: Bias correction
Correcting θˆ for bias is equivalent to finding the t0 that solves EF0{ft(F0, F1)} = 0, where
\[ f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t \]
and the bias-corrected estimate is θˆ + t0.
Example: Confidence interval
Constructing a symmetric 100(1 − α)% confidence interval for θ0 is equivalent to using
\[ f_t(F_0, F_1) = I\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t\} - (1 - \alpha) \]
where I(·) denotes the indicator of the event that the true parameter value θ(F0) lies in the interval
\[ [\theta(F_1) - t, \; \theta(F_1) + t] = [\hat\theta - t, \; \hat\theta + t], \]
minus the nominal coverage 1 − α of the interval. Asking that
\[ E\{f_t(F_0, F_1) \mid F_0\} = 0 \]
is equivalent to insisting that t be chosen so that the interval has zero coverage error.
Bootstrap principle
The equation
\[ E\{f_t(F_0, F_1) \mid F_0\} = 0 \]
provides an explicit description of the relationship between F0 and F1 that we are trying to determine.
The analogue in the number-of-freckles problem is
\[ n_0 - t n_1 = 0 \]
where ni is the number of freckles on doll i.
Bootstrap principle
If we had the solution t = t0 of this equation, then n0 = t0 n1.
An estimate of t0 is obtained from the pair (n1, n2), which we do know:
\[ n_1 - t n_2 = 0 \]
We obtain the solution ˆt0 of this equation, and thereby
\[ \hat n_0 = \hat t_0 \, n_1 = \frac{n_1^2}{n_2} \]
is the estimate of n0.
Bootstrap principle
Similarly, the population equation
\[ E\{f_t(F_0, F_1) \mid F_0\} = 0 \]
is solved via the sample equation
\[ E\{f_t(F_1, F_2) \mid F_1\} = 0 \]
where F2, the distribution function of a sample drawn from F1, is the analogue of n2.
Bootstrap principle
The solution ˆt0 is a function of the sample values.
The idea is that the solution of the sample equation should be a good approximation to the solution of the population equation.
The population equation is not obtainable in practice; replacing it by the sample equation is the bootstrap principle.
Bootstrap principle
We call ˆt0 and E{ft(F1, F2) | F1} the bootstrap estimates of t0 and E{ft(F0, F1) | F0}.
They are obtained by replacing (F0, F1) with (F1, F2) in the formulae for t0 and the population equation.
The bootstrap version of the bias-corrected estimate is θˆ + ˆt0.
The bootstrap confidence interval is [θˆ − ˆt0, θˆ + ˆt0], called the symmetric percentile method confidence interval for θ0.
Parametric vs Nonparametric
In both parametric and nonparametric problems, inference is based on a sample X of size n (n i.i.d. observations from the population).
In the nonparametric case, F1 is the empirical distribution function of X.
Similarly, F2 is the empirical distribution function of a sample drawn at random from the population F1.
Nonparametric
F2 is the empirical distribution of a sample X∗ drawn randomly, with replacement, from X.
If we denote the population by X0, then we have a nest of sampling operations:
X is drawn at random from X0; X∗ is drawn at random from X.
Parametric
In this case F0 is assumed completely known up to a finite vector λ0 of unknown parameters.
F0 = F(λ0) is an element of a class {F(λ) : λ ∈ Λ} of possible distributions.
Then F1 = F(λˆ), the distribution function obtained using the sample estimate λˆ computed from X, often (but not necessarily) by maximum likelihood.
Let X∗ denote the sample drawn at random from F(λˆ); then F2 = F(λˆ∗), where λˆ∗ is the estimate computed from X∗.
In both cases, X∗ is obtained by resampling from a distribution determined by the original sample X.
Example
Estimate of the MSE:
\[ \tau^2 = E(\hat\theta - \theta_0)^2 = E\{[\theta(F_1) - \theta(F_0)]^2 \mid F_0\} \]
has bootstrap estimate
\[ \hat\tau^2 = E\{(\hat\theta^* - \hat\theta)^2 \mid X\} = E\{[\theta(F_2) - \theta(F_1)]^2 \mid F_1\} \]
where θˆ∗ = θ[X∗] is the version of θˆ obtained using X∗ instead of X.
Bias correction
Here we have
\[ f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t \]
and the sample equation
\[ E\{f_t(F_1, F_2) \mid F_1\} = E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0 \]
whose solution is
\[ t = \hat t_0 = \theta(F_1) - E\{\theta(F_2) \mid F_1\} \]
Bias correction
The bootstrap bias-reduced estimate is thus
\[ \hat\theta_1 = \hat\theta + \hat t_0 = \theta(F_1) + \hat t_0 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\} \]
Note that the estimate θˆ = θ(F1) is itself a bootstrap functional, since it is obtained by substituting F1 for F0 in the functional formula θ0 = θ(F0).
The expectation E{θ(F2) | F1} is computed (or approximated) by Monte Carlo simulation.
Bias correction
Draw B resamples {Xb∗, 1 ≤ b ≤ B} independently from the distribution function F1.
In the nonparametric case, F1 is the empirical distribution of the sample X.
Let F2b denote the empirical distribution function of Xb∗.
In the parametric case, λˆ∗b = λ(Xb∗) is the estimate of λ0 obtained from Xb∗, and F2b = F(λˆ∗b).
Bias correction
Define θˆ∗b = θ(F2b); then
\[ \hat u_B = \frac{1}{B} \sum_{b=1}^{B} \theta(F_{2b}) = B^{-1} \sum_{b=1}^{B} \hat\theta_b^* \]
converges (as B → ∞) to
\[ \hat u = E\{\theta(F_2) \mid F_1\} = E(\hat\theta^* \mid X) \]
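For concreteness, here is a minimal nonparametric sketch in Python/NumPy of this Monte Carlo approximation and the resulting bias-reduced estimate, using the cubed-mean functional of the example that follows; the data, seed and B are illustrative choices:

```python
import numpy as np

def theta(sample):
    """The functional theta evaluated at an empirical distribution: here the cubed mean."""
    return np.mean(sample) ** 3

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=50)            # hypothetical observed sample X
theta_hat = theta(x)                               # theta(F1)

B = 2000
theta_star = np.array([theta(rng.choice(x, size=x.size, replace=True))
                       for _ in range(B)])         # theta(F_2b), b = 1, ..., B
u_hat_B = theta_star.mean()                        # approximates E{theta(F2) | F1}
theta_1 = 2 * theta_hat - u_hat_B                  # bias-reduced estimate
print(theta_hat, theta_1)
```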
Example
Let X = {x1, …, xn},
\[ \mu = \int x \, dF_0(x), \qquad \bar x = \frac{1}{n} \sum_{i=1}^{n} x_i \]
and assume θ0 = θ(F0) = μ³.
In the nonparametric approach,
\[ \hat\theta = \theta(F_1) = \bar x^{\,3} \]
Example
In the nonparametric approach,
\[ E\{\theta(F_1) \mid F_0\} = E_{F_0}\!\left\{ \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)^{\!3} \right\} = E\!\left\{ \left( \mu + \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu) \right)^{\!3} \right\} = \mu^3 + n^{-1} 3\mu\sigma^2 + n^{-2}\gamma \]
where σ² = E(x − μ)² and γ = E(x − μ)³ denote the population variance and skewness.
Example
In the nonparametric case,
\[ E\{\theta(F_2) \mid F_1\} = \bar x^3 + n^{-1} 3\bar x \hat\sigma^2 + n^{-2}\hat\gamma \]
where σˆ² = n⁻¹ Σᵢ (xi − x̄)² and γˆ = n⁻¹ Σᵢ (xi − x̄)³ denote the sample variance and skewness.
Therefore the bootstrap bias-reduced estimate is
\[ \hat\theta_1 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\} = 2\bar x^3 - \left( \bar x^3 + n^{-1} 3\bar x \hat\sigma^2 + n^{-2}\hat\gamma \right) = \bar x^3 - n^{-1} 3\bar x \hat\sigma^2 - n^{-2}\hat\gamma \]
Example
If the population is normal N(μ, σ²), then γ = 0 and
\[ E\{\theta(F_1) \mid F_0\} = \mu^3 + n^{-1} 3\mu\sigma^2 \]
Maximum likelihood can be used to estimate λˆ = (x̄, σˆ²).
θ(F2) is the statistic θˆ computed for a sample from a N(x̄, σˆ²) distribution, and in direct analogy we have
\[ E\{\theta(F_2) \mid F_1\} = \bar x^3 + n^{-1} 3\bar x \hat\sigma^2 \]
Therefore
\[ \hat\theta_1 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\} = \bar x^3 - n^{-1} 3\bar x \hat\sigma^2 \]
Example
If the population is exponential with mean μ,
\[ f_\mu(x) = \mu^{-1} \exp\left( -\frac{x}{\mu} \right) \quad \text{for } x > 0, \]
then σ² = μ² and γ = 2μ³, so
\[ E\{\theta(F_1) \mid F_0\} = \mu^3 \left( 1 + 3n^{-1} + 2n^{-2} \right) \]
Taking the maximum likelihood estimate x̄ for μ,
\[ E\{\theta(F_2) \mid F_1\} = \bar x^3 \left( 1 + 3n^{-1} + 2n^{-2} \right) \]
Therefore
\[ \hat\theta_1 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\} = \bar x^3 \left( 1 - 3n^{-1} - 2n^{-2} \right) \]
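A small parametric-bootstrap sketch (Python/NumPy) can check this formula numerically: resamples are drawn from the fitted exponential distribution, and the Monte Carlo average of θ(F2) should approach x̄³(1 + 3n⁻¹ + 2n⁻²) as B grows. The sample size, scale and B below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.exponential(scale=3.0, size=n)             # hypothetical exponential sample
xbar = x.mean()                                    # MLE of mu

B = 5000
# Parametric resamples come from F(lambda_hat): Exponential with mean xbar.
theta_star = np.array([rng.exponential(scale=xbar, size=n).mean() ** 3
                       for _ in range(B)])
mc = theta_star.mean()                             # Monte Carlo E{theta(F2) | F1}
exact = xbar ** 3 * (1 + 3 / n + 2 / n ** 2)       # closed form above
print(mc, exact)                                   # should be close for large B
```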
Example
The estimate θˆ1 improves, in the sense of bias reduction, on the basic bootstrap estimate θˆ = θ(F1).
To check the bias reduction, observe that for general distributions with finite third moments,
\[ E(\bar x^3) = \mu^3 + n^{-1} 3\mu\sigma^2 + n^{-2}\gamma \]
\[ E(\bar x \hat\sigma^2) = \mu\sigma^2 + n^{-1}(\gamma - \mu\sigma^2) - n^{-2}\gamma \]
\[ E(\hat\gamma) = \gamma \left( 1 - 3n^{-1} + 2n^{-2} \right) \]
Example
For a general population,
\[ E(\hat\theta_1) - \theta_0 = n^{-2} 3(\mu\sigma^2 - \gamma) + n^{-3} 6\gamma - n^{-4} 2\gamma \]
For a normal population,
\[ E(\hat\theta_1) - \theta_0 = n^{-2} 3\mu\sigma^2 \]
For an exponential population,
\[ E(\hat\theta_1) - \theta_0 = -\mu^3 \left( 9n^{-2} + 12n^{-3} + 4n^{-4} \right) \]
Remarks
Bootstrap bias reduction has therefore diminished the bias to at most O(n⁻²) in each case.
This compares with the bias of θˆ, which is of size n⁻¹ unless μ = 0.
Bootstrap bias correction reduces the order of magnitude of the bias by a factor of n⁻¹.
Nonparametric bootstrap
Finding the ideal bootstrap estimate of E{ft(F1, F2) | F1} requires complete enumeration of the possible realisations of F2, which is not practical even for moderate sample sizes n. Here
\[ F_2(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i^* \le x) \]
where X∗ = {x1∗, …, xn∗} is obtained by sampling randomly, with replacement, from the original X = {x1, …, xn}.
Instead, B i.i.d. samples, each of size n, are drawn from Fˆ = F1, producing B nonparametric bootstrap samples.
Denote them as Xi∗ = {xi1∗, ···, xin∗} ~iid F1 for i = 1, ···, B.
Nonparametric bootstrap
The empirical distribution of {θˆ(F2i), i = 1, ···, B} is used to approximate the ideal bootstrap equation E{ft(F1, F2) | F1}, which in turn approximates the population equation E{ft(F0, F1) | F0}, allowing inference.
The simulation error in approximating the ideal bootstrap quantity E{ft(F1, F2) | F1} can be made arbitrarily small by increasing B.
A key requirement of bootstrapping is that the data to be resampled must be an i.i.d. sample.
Parametric bootstrap
When a parametric model is assumed for the data, namely x1, ···, xn ~iid F(x|θ), the cdf F(x|θ) can be estimated parametrically by F(x|θˆ) instead of by the empirical cdf Fˆ.
To estimate the distribution in E{ft(F0, F1) | F0}, one can draw B i.i.d. samples, each of size n, from F(x|θˆ), producing B parametric bootstrap samples. Denote them as Xi∗ = {xi1∗, ···, xin∗} ~iid F(x|θˆ) for i = 1, ···, B.
Parametric bootstrap
The empirical distribution of {ft(Xi∗, F(x|θˆ)), i = 1, ···, B} is then used to approximate the ideal bootstrap equation E{ft(F1, F2) | F1}, which further approximates the population equation E{ft(F0, F1) | F0}.
If the parametric model is not good, the parametric bootstrap can give misleading inference.
Bootstrapping samples in regression
Consider a multiple regression model Yi = xiᵀβ + εi for i = 1, ···, n, where ε1, ···, εn ~iid F with EF(εi) = 0 and VarF(εi) = σ².
The observed data are {z1 = (x1, y1), ···, zn = (xn, yn)}.
It is wrong to generate bootstrap samples from {y1, ···, yn} and from {x1, ···, xn} independently, because {y1, ···, yn} are not i.i.d. samples.
Two appropriate ways to construct bootstrap samples from the observed data are to bootstrap the residuals and to bootstrap the cases.
Bootstrapping samples in regression
Bootstrap the residuals (a sketch implementing these steps is given below):
1. Fit the regression model to the observed data. Obtain the fitted responses yˆi = xiᵀθˆ and residuals εˆi = yi − yˆi.
2. Bootstrap residuals from {εˆ1, ···, εˆn} to get {εˆ1∗, ···, εˆn∗}. Note that {εˆ1, ···, εˆn} are not i.i.d., but roughly so if the regression model is correct.
3. Create a bootstrap sample of responses: Yi∗ = yˆi + εˆi∗ for i = 1, ···, n.
4. Fit the regression model to {(x1, Y1∗), ···, (xn, Yn∗)} to get a bootstrap estimate (θˆ∗, σˆ∗) of (θ, σ).
5. Repeat this process B times to obtain {(θˆ1∗, σˆ1∗), ···, (θˆB∗, σˆB∗)}, from which an empirical cdf (F2) can be built for inference.
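A minimal sketch of steps 1–5 in Python/NumPy, assuming ordinary least squares for the fit; `ols` is a hypothetical helper, and the simulated design and responses are purely illustrative:

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficient vector (hypothetical helper)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n, B = 60, 1999
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)      # illustrative data

theta_hat = ols(X, y)                                  # step 1: fit the model
resid = y - X @ theta_hat                              #         residuals eps_hat_i
boot = np.empty((B, X.shape[1]))
for b in range(B):
    eps_star = rng.choice(resid, size=n, replace=True) # step 2: bootstrap residuals
    y_star = X @ theta_hat + eps_star                  # step 3: bootstrap responses
    boot[b] = ols(X, y_star)                           # step 4: refit
# step 5: `boot` holds B bootstrap replicates for inference
```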
Bootstrapping samples in regression
Bootstrap the cases (also called the paired bootstrap):
1. Treat the observed data {z1 = (x1, y1), ···, zn = (xn, yn)} as i.i.d. from a cdf F(x, y).
2. Create a bootstrap sample {Z1∗, ···, Zn∗} by sampling with replacement from {z1, ···, zn}.
3. Fit the regression model to {Z1∗, ···, Zn∗} to get a bootstrap estimate (θˆ∗, σˆ∗) of (θ, σ).
4. Repeat this process B times to obtain {(θˆ1∗, σˆ1∗), ···, (θˆB∗, σˆB∗)}, from which an empirical cdf can be built for inference.
Bootstrapping the cases is less sensitive to violations of the regression model assumptions (i.e. adequacy of the model and constancy of σ²) than bootstrapping the residuals. A matching sketch follows.
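A matching Python/NumPy sketch of the paired bootstrap, with the same illustrative setup as above:

```python
import numpy as np

rng = np.random.default_rng(4)
n, B = 60, 1999
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)          # illustrative data

boot = np.empty((B, X.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, size=n)                       # step 2: resample cases (rows)
    boot[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]  # step 3: refit
# step 4: the empirical cdf of `boot` is used for inference
```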
Bootstrap bias correction: summary
The population and sample equations are given by
\[ E\{\theta(F_1) - \theta(F_0) + t \mid F_0\} = 0 \]
\[ E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0 \]
The solution of the latter is
\[ t = \hat t_0 = \theta(F_1) - E\{\theta(F_2) \mid F_1\} = \hat\theta - E(\hat\theta^* \mid F_1) \]
Bootstrap bias correction: summary
The bootstrap bias-corrected estimator is thus
\[ \hat\theta_1 = \hat\theta_{bc} = \hat\theta + \hat t_0 = 2\hat\theta - E(\hat\theta^* \mid \hat F) \]
Estimation of E(θˆ∗ | F1) is obtained through numerical approximation.
Conditional on X, we compute independent values θˆ1∗, …, θˆB∗ and take
\[ \frac{1}{B} \sum_{b=1}^{B} \hat\theta_b^* \]
to be the numerical approximation to E(θˆ∗ | F1).
Bootstrap estimation of bias(θˆ) and se(θˆ)
bias(θˆ) = EF(θˆ) − θ and se(θˆ) = √VarF(θˆ) are the two basic attributes of the estimator θˆ that we can use bootstrap analysis to estimate.
Suppose θ = T(F) = θ(F0) and θˆ = T(Fˆ) = θ(F1), or θˆ = T(F(·|θˆ)), for some functional T.
Let R(X, F) = T(Fˆ) − T(F) or T(F(·|θˆ)) − T(F); in either case R(X, F) = θˆ − θ.
Then bias(θˆ) = EF[R(X, F)] and Var(θˆ) = VarF[R(X, F)] are population moments of R(X, F), which, per the bootstrap principle, can be estimated by the corresponding moments of the ideal bootstrap distribution of R(X∗, Fˆ) or R(X∗, F(·|θˆ)).
They can be further estimated by the sample moments of R(X∗, Fˆ) or R(X∗, F(·|θˆ)), calculated from the bootstrap samples.
Nonparametric bootstrap estimation of bias(θˆ) and se(θˆ)
Computing steps for obtaining nonparametric bootstrap estimates of bias(θˆ) and se(θˆ) are as follows (a sketch appears after the list):
1◦ Compute θˆ from the observed sample xn = (x1, ···, xn).
2◦ Generate B (typically B ≥ 999) nonparametric bootstrap samples of size n from the observed sample.
3◦ For each bootstrap sample, compute an estimate of θ in the same way that θˆ estimates θ. The new estimates are called the bootstrap replicates of θˆ and are denoted θˆ1∗, ···, θˆB∗.
4◦ Compute θ̄∗ = B⁻¹ Σ_{r=1}^{B} θˆr∗ and estimate bias(θˆ) by b_B(θˆ) = θ̄∗ − θˆ; compute
\[ se_B(\hat\theta) = \sqrt{ \frac{1}{B-1} \sum_{r=1}^{B} \left( \hat\theta_r^* - \bar\theta^* \right)^2 } \]
and estimate se(θˆ) by se_B(θˆ).
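A compact Python/NumPy sketch of steps 1◦–4◦; `stat` stands for whatever functional defines θˆ, and the median below is only an illustration:

```python
import numpy as np

def boot_bias_se(x, stat, B=1999, seed=0):
    """Nonparametric bootstrap estimates of bias(theta_hat) and se(theta_hat)."""
    rng = np.random.default_rng(seed)
    theta_hat = stat(x)                                    # step 1
    reps = np.array([stat(rng.choice(x, size=x.size, replace=True))
                     for _ in range(B)])                   # steps 2-3
    bias = reps.mean() - theta_hat                         # step 4: b_B(theta_hat)
    se = reps.std(ddof=1)                                  # step 4: se_B(theta_hat)
    return bias, se

x = np.random.default_rng(5).normal(size=100)              # illustrative data
print(boot_bias_se(x, np.median))                          # bias and se of the median
```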
Parametric bootstrap estimation of bias(θˆ) and se(θˆ)
Parametric bootstrap estimation proceeds the same way as nonparametric bootstrap estimation, except that in step 2◦ the bootstrap samples of size n are generated from F(x|θˆ).
Remark:
A bootstrap estimate of MSE(θˆ) = EF[(θˆ − θ)²] may be obtained as
\[ MSE_B(\hat\theta) = \frac{1}{B} \sum_{r=1}^{B} \left( \hat\theta_r^* - \hat\theta \right)^2. \]
Confidence interval
A symmetric confidence interval for θ0 = θ(F0) may be constructed by applying the resampling principle with
\[ f_t(F_0, F_1) = I\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t\} - (1 - \alpha) \]
This yields a 100(1 − α)% confidence interval for θ0.
Example: for α = 0.05 we obtain a 95% confidence interval.
The sample equation is
\[ E\{f_t(F_1, F_2) \mid F_1\} = 0 \]
Confidence interval
This leads to the equation
\[ P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - (1 - \alpha) = 0 \]
and
\[ \hat t_0 = \inf\left\{ t : P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - (1 - \alpha) \ge 0 \right\} \]
is a solution.
[θˆ − ˆt0, θˆ + ˆt0] is a bootstrap confidence interval for θ0 = θ(F0), called a two-sided symmetric percentile interval.
Confidence interval
Other nominal 100(1 − α)% percentile intervals include the two-sided equal-tailed interval [θˆ − ˆt01, θˆ + ˆt02], where ˆt01 and ˆt02 solve
\[ P\{\theta(F_1) \le \theta(F_2) - t \mid F_1\} - \frac{\alpha}{2} = 0 \]
\[ P\{\theta(F_1) \le \theta(F_2) + t \mid F_1\} - \left( 1 - \frac{\alpha}{2} \right) = 0 \]
It is called equal-tailed because it attempts to place equal probability in each tail:
\[ P\left( \theta_0 \le \hat\theta - \hat t_{01} \right) \approx P\left( \theta_0 \ge \hat\theta + \hat t_{02} \right) \approx \frac{\alpha}{2} \]
Confidence interval
The ideal form of this interval, obtained by solving the population equation rather than the sample equation, does place equal probability in each tail.
The one-sided interval (−∞, θˆ + ˆt03], where ˆt03 solves
\[ P\{\theta(F_1) \le \theta(F_2) + t \mid F_1\} - (1 - \alpha) = 0, \]
is also a nominal 100(1 − α)% percentile interval.
Confidence interval
Other 100(1 − α)% percentile intervals are ˆI2 = [θˆ − ˆt02, θˆ + ˆt01] and ˆI1 = (−∞, θˆ + ˆt04], where ˆt04 solves
\[ P\{\theta(F_1) \le \theta(F_2) - t \mid F_1\} - \alpha = 0 \]
Define θˆ∗ = θ(F2), Hˆ(x) = P(θˆ∗ ≤ x | X) and Hˆ⁻¹(α) = inf{x : Hˆ(x) ≥ α}.
Then ˆI2 = [Hˆ⁻¹(α/2), Hˆ⁻¹(1 − α/2)] and ˆI1 = (−∞, Hˆ⁻¹(1 − α)].
Normal approximation
In many cases, (θˆ − θ)/se(θˆ) ≈d N(0, 1), e.g., when θˆ is the MLE.
Then an approximate 100(1 − α)% CI for θ would be θˆ ± z_{1−α/2} × se(θˆ), where z_{1−α/2} = Φ⁻¹(1 − α/2).
If bootstrap replicates are available, we use se_B(θˆ) to estimate se(θˆ) (if it is otherwise difficult to estimate), and estimate θ by θˆ − b_B(θˆ) = 2θˆ − θ̄∗ (θˆ − bias(θˆ) is an "unbiased estimator" of θ). This suggests the 100(1 − α)% normal-approximation-based bootstrap CI for θ:
\[ \left[ (2\hat\theta - \bar\theta^*) - z_{1-\alpha/2} \cdot se_B(\hat\theta), \; (2\hat\theta - \bar\theta^*) + z_{1-\alpha/2} \cdot se_B(\hat\theta) \right] \]
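A sketch of this interval in Python (NumPy and SciPy), with the bias-corrected centre 2θˆ − θ̄∗ and z_{1−α/2} from Φ⁻¹; the statistic and B are illustrative:

```python
import numpy as np
from scipy.stats import norm

def normal_boot_ci(x, stat, alpha=0.05, B=1999, seed=0):
    """Normal-approximation bootstrap CI, centred at the bias-corrected estimate."""
    rng = np.random.default_rng(seed)
    theta_hat = stat(x)
    reps = np.array([stat(rng.choice(x, size=x.size, replace=True))
                     for _ in range(B)])
    centre = 2 * theta_hat - reps.mean()   # 2*theta_hat minus mean of replicates
    z = norm.ppf(1 - alpha / 2)            # z_{1-alpha/2} = Phi^{-1}(1 - alpha/2)
    se_b = reps.std(ddof=1)
    return centre - z * se_b, centre + z * se_b
```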
Percentile bootstrap confidence intervals
\[ \left[ \hat\theta^*_{\left( \frac{\alpha}{2}[B+1] \right)}, \; \hat\theta^*_{\left( \left( 1-\frac{\alpha}{2} \right)[B+1] \right)} \right] \]
Uses the distribution of the bootstrap statistics directly for the percentile approximation.
Tends to be highly asymmetric.
It is prone to bias and inaccurate coverage probabilities.
Works better if θ is a location parameter.
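A sketch of the percentile interval from an array of replicates; approximating the ([B+1]α/2)-th order statistic by flooring/ceiling and clamping the index is one reasonable convention among several:

```python
import numpy as np

def percentile_ci(reps, alpha=0.05):
    """Percentile bootstrap CI from B bootstrap replicates of theta_hat."""
    reps = np.sort(reps)
    B = reps.size
    lo = int(np.floor((B + 1) * alpha / 2)) - 1        # ([B+1]alpha/2)-th order stat
    hi = int(np.ceil((B + 1) * (1 - alpha / 2))) - 1   # ([B+1](1-alpha/2))-th
    return reps[max(lo, 0)], reps[min(hi, B - 1)]
```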
Percentile bootstrap confidence intervals
A justification of the percentile method bootstrap CI for θ:
Assume the existence of a continuous and strictly increasing transformation ψ, and a continuous cdf H with symmetric pdf (implying H(z) = 1 − H(−z)), such that ψ(θˆ) − ψ(θ) ~d H.
This assumption is likely to be reasonable, although it may be difficult to find such ψ and H. However, it turns out that we do not need an explicit specification of ψ and H.
Now we know
\[ P\left[ h_{\alpha/2} \le \psi(\hat\theta) - \psi(\theta) \le h_{1-\alpha/2} \right] = 1 - \alpha \qquad (1) \]
where h_α is the α quantile of H.
Percentile bootstrap confidence intervals
A justification of the percentile method CI for θ (continued):
Applying the bootstrap principle to (1), we have
\[ 1 - \alpha \approx P^*\left[ h_{\alpha/2} \le \psi(\hat\theta^*) - \psi(\hat\theta) \le h_{1-\alpha/2} \right] = P^*\left[ h_{\alpha/2} + \psi(\hat\theta) \le \psi(\hat\theta^*) \le h_{1-\alpha/2} + \psi(\hat\theta) \right] = P^*\left[ \psi^{-1}\!\left( h_{\alpha/2} + \psi(\hat\theta) \right) \le \hat\theta^* \le \psi^{-1}\!\left( h_{1-\alpha/2} + \psi(\hat\theta) \right) \right]. \qquad (2) \]
Hence ψ⁻¹(h_{α/2} + ψ(θˆ)) ≈ ξ_{α/2} and ψ⁻¹(h_{1−α/2} + ψ(θˆ)) ≈ ξ_{1−α/2}, with ξ_α the α quantile of the ideal bootstrap distribution P∗(·) of θˆ∗ = θ(F2), which can be estimated by θˆ∗_{([B+1]α)}, the sample quantile (order statistic) from B bootstrap replicates of θˆ∗.
Percentile bootstrap confidence intervals
A justification of the percentile method CI for θ (continued):
On the other hand, (1) can be rewritten as
\[ P\left[ \psi^{-1}\!\left( h_{\alpha/2} + \psi(\hat\theta) \right) \le \theta \le \psi^{-1}\!\left( h_{1-\alpha/2} + \psi(\hat\theta) \right) \right] = 1 - \alpha \qquad (3) \]
noting that H has a symmetric pdf, so that h_{α/2} = −h_{1−α/2}.
Therefore, by comparing (2) and (3), we know that
\[ \left[ \hat\theta^*_{\alpha/2}, \; \hat\theta^*_{1-\alpha/2} \right] \approx \left[ \hat\theta^*_{([B+1]\alpha/2)}, \; \hat\theta^*_{([B+1](1-\alpha/2))} \right] \]
can serve as an approximate 100(1 − α)% C.I. for θ, which is called the (basic) percentile bootstrap CI.
Basic (or residual) bootstrap confidence intervals
Taking ψ to be the identity transformation, eq. (1) becomes
\[ P\left[ h_{\alpha/2} \le \hat\theta - \theta \le h_{1-\alpha/2} \right] = 1 - \alpha \qquad (4) \]
We call θˆ − θ the residual of the estimator θˆ.
By the bootstrap principle, h_α ≈ (θˆ∗ − θˆ)_α = θˆ∗_α − θˆ, where θˆ∗_α is the α sample quantile of θˆ∗. Using this approximation, (4) becomes P[θˆ∗_{α/2} − θˆ ≤ θˆ − θ ≤ θˆ∗_{1−α/2} − θˆ] ≈ 1 − α, which is
\[ P\left[ 2\hat\theta - \hat\theta^*_{1-\alpha/2} \le \theta \le 2\hat\theta - \hat\theta^*_{\alpha/2} \right] \approx 1 - \alpha. \]
This suggests the following approximate 100(1 − α)% basic (or residual) bootstrap CI for θ:
\[ \left[ 2\hat\theta - \hat\theta^*_{1-\alpha/2}, \; 2\hat\theta - \hat\theta^*_{\alpha/2} \right] \approx \left[ 2\hat\theta - \hat\theta^*_{([B+1](1-\alpha/2))}, \; 2\hat\theta - \hat\theta^*_{([B+1]\alpha/2)} \right] \]
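A sketch of the basic interval, reflecting the replicate quantiles around θˆ (here via np.quantile rather than the exact order-statistic rule):

```python
import numpy as np

def basic_ci(theta_hat, reps, alpha=0.05):
    """Basic (residual) bootstrap CI: [2*theta_hat - q_{1-a/2}, 2*theta_hat - q_{a/2}]."""
    q_lo, q_hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return 2 * theta_hat - q_hi, 2 * theta_hat - q_lo
```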
BCa bootstrap confidence intervals
The basic (residual) bootstrap CI tends to suffer from the same defects as the basic percentile bootstrap CI; namely, it is prone to bias and inaccurate coverage probabilities.
For these two CIs to work well, the cdf H must be free of θ. This implies that a stronger transformation ψ is needed to obtain a pivotal quantity for θˆ, and to find the CI based on that pivot.
The bias-corrected and accelerated percentile method, or BCa, is motivated by this observation, and has been used to derive CIs for θ with substantial improvement over the previous two percentile methods.
BCa bootstrap confidence intervals
Suppose there is a strictly increasing transformation ψ such that ψ(θˆ) has a normal distribution with
\[ E\,\psi(\hat\theta) = \psi(\theta) - c_0\,[1 + a\psi(\theta)] \quad \text{and} \quad \mathrm{Var}\,\psi(\hat\theta) = [1 + a\psi(\theta)]^2. \]
Namely,
\[ \frac{\psi(\hat\theta) - \psi(\theta)}{1 + a\psi(\theta)} + c_0 \overset{d}{=} N(0, 1). \qquad (5) \]
If z_p is the 100p-th percentile of N(0, 1), with p = 1 − α/2, then
\[ P\left[ -z_p \le \frac{\psi(\hat\theta) - \psi(\theta)}{1 + a\psi(\theta)} + c_0 \le z_p \right] = 1 - \alpha \]
\[ \Rightarrow \; P\left[ \frac{\psi(\hat\theta) + c_0 - z_p}{1 - a(c_0 - z_p)} \le \psi(\theta) \le \frac{\psi(\hat\theta) + c_0 + z_p}{1 - a(c_0 + z_p)} \right] = 1 - \alpha \]
BCa bootstrap confidence intervals
which suggests a 100(1 − α)% CI for ψ(θ) as
\[ L = \frac{\psi(\hat\theta) + c_0 - z_p}{1 - a(c_0 - z_p)}, \qquad U = \frac{\psi(\hat\theta) + c_0 + z_p}{1 - a(c_0 + z_p)}, \]
not computable, as ψ is unknown. A 100(1 − α)% CI for θ would then be [ψ⁻¹(L), ψ⁻¹(U)].
By the bootstrap principle and (5),
\[ \frac{\psi(\hat\theta^*) - \psi(\hat\theta)}{1 + a\psi(\hat\theta)} + c_0 \overset{d}{\approx} N(0, 1). \]
Thus it can be verified that (note p = 1 − α/2)
\[ P^*\left[ \psi(\hat\theta^*) \le U \right] = P^*\left[ \frac{\psi(\hat\theta^*) - \psi(\hat\theta)}{1 + a\psi(\hat\theta)} + c_0 \le \frac{c_0 + z_p}{1 - a(c_0 + z_p)} + c_0 \right] \approx \Phi\left( \frac{c_0 + z_p}{1 - a(c_0 + z_p)} + c_0 \right) \overset{\text{denoted}}{=} p_U. \]
BCa bootstrap confidence intervals
Hence U is approximately the p_U quantile of the cdf of ψ(θˆ∗).
ψ⁻¹(U) is the UCL of the 100(1 − α)% CI for θ, as ψ is strictly increasing. It is also approximately the p_U quantile of the cdf of ψ⁻¹(ψ(θˆ∗)) = θˆ∗, which can be estimated by θˆ∗_{p_U}, the p_U sample quantile of the bootstrap replicates of θˆ∗.
Similarly (note p = 1 − α/2),
\[ P^*\left[ \psi(\hat\theta^*) \le L \right] = P^*\left[ \frac{\psi(\hat\theta^*) - \psi(\hat\theta)}{1 + a\psi(\hat\theta)} + c_0 \le \frac{c_0 - z_p}{1 - a(c_0 - z_p)} + c_0 \right] \approx \Phi\left( \frac{c_0 - z_p}{1 - a(c_0 - z_p)} + c_0 \right) \overset{\text{denoted}}{=} p_L. \]
So the LCL of the 100(1 − α)% CI for θ can be estimated by θˆ∗_{p_L}, the p_L sample quantile of the bootstrap replicates of θˆ∗.
BCa bootstrap confidence intervals
Given values of c0 and a, we can compute p_U and p_L, and accordingly a 100(1 − α)% BCa bootstrap CI of θ as
\[ \left[ \hat\theta^*_{p_L}, \; \hat\theta^*_{p_U} \right] \approx \left[ \hat\theta^*_{([B+1]p_L)}, \; \hat\theta^*_{([B+1]p_U)} \right] \qquad (6) \]
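The slides take c0 and a as given. The sketch below fills them in with Efron's standard estimates, which are an assumption here: c0 from the fraction of replicates falling below θˆ, and a from a jackknife skewness measure:

```python
import numpy as np
from scipy.stats import norm

def bca_ci(x, stat, alpha=0.05, B=1999, seed=0):
    """BCa bootstrap CI; c0 and a use Efron's standard estimates (an assumption here)."""
    rng = np.random.default_rng(seed)
    n = x.size
    theta_hat = stat(x)
    reps = np.array([stat(rng.choice(x, size=n, replace=True)) for _ in range(B)])
    c0 = norm.ppf((reps < theta_hat).mean())   # bias-correction constant (assumes 0 < fraction < 1)
    jack = np.array([stat(np.delete(x, i)) for i in range(n)])  # jackknife values
    d = jack.mean() - jack
    a = (d ** 3).sum() / (6 * ((d ** 2).sum()) ** 1.5)          # acceleration constant
    zp = norm.ppf(1 - alpha / 2)
    p_L = norm.cdf(c0 + (c0 - zp) / (1 - a * (c0 - zp)))        # as in the derivation above
    p_U = norm.cdf(c0 + (c0 + zp) / (1 - a * (c0 + zp)))
    return np.quantile(reps, p_L), np.quantile(reps, p_U)
```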
Studentized bootstrap confidence intervals
A more intuitive way to construct an appropriate pivot for the bootstrap is the studentized bootstrap, or bootstrap-t, method.
Suppose θ = T(F) is to be estimated using θˆ = T(Fˆ), with V(Fˆ) estimating the variance of θˆ.
Then it is reasonable to expect that
\[ R(X, F) = \frac{T(\hat F) - T(F)}{\sqrt{V(\hat F)}} \]
will be roughly pivotal. Bootstrapping R(X, F) yields a collection of values R(X∗, Fˆ).
Denote by Gˆ and Gˆ∗ the distributions of R(X, F) and R(X∗, Fˆ), respectively.
Studentized bootstrap confidence intervals
Theoretically, a 100(1 − α)% CI for θ can be obtained using
\[ P\left[ \xi_{\alpha/2}(\hat G) \le R(X, F) \le \xi_{1-\alpha/2}(\hat G) \right] = P\left[ \hat\theta - \xi_{1-\alpha/2}(\hat G)\sqrt{V(\hat F)} \le \theta \le \hat\theta - \xi_{\alpha/2}(\hat G)\sqrt{V(\hat F)} \right] = 1 - \alpha \]
where ξ_α(Gˆ) is the α quantile of Gˆ. These quantiles are unknown but can be estimated under the bootstrap principle, so ξ_α(Gˆ) ≈ ξ_α(Gˆ∗).
This gives the 100(1 − α)% studentized bootstrap CI of θ:
\[ \left[ T(\hat F) - \xi_{1-\alpha/2}(\hat G^*)\sqrt{V(\hat F)}, \; T(\hat F) - \xi_{\alpha/2}(\hat G^*)\sqrt{V(\hat F)} \right] = \left[ \hat\theta - \xi_{1-\alpha/2}(\hat G^*)\sqrt{\widehat{\mathrm{Var}}(\hat\theta)}, \; \hat\theta - \xi_{\alpha/2}(\hat G^*)\sqrt{\widehat{\mathrm{Var}}(\hat\theta)} \right] \]
where ξ_α(Gˆ∗) is the α quantile of Gˆ∗.
Studentized bootstrap confidence intervals
To calculate the studentized bootstrap CI for θ, we need the estimated variance V(Fˆ), which can be approximated by the bootstrap estimate sd²_B(θˆ) or by using a delta method.
A more difficult problem in calculating the studentized bootstrap CI for θ is finding the values ξ_α(Gˆ∗). Note that ξ_α(Gˆ∗) is the α quantile of
\[ R(X^*, \hat F) = \frac{T(\hat F^*) - T(\hat F)}{\sqrt{V(\hat F^*)}} \]
with respect to the cdf Gˆ∗.
The bootstrap replicates of R(X∗, Fˆ) for B given bootstrap samples are
\[ \frac{\hat\theta_1^* - \hat\theta}{\sqrt{\widehat{\mathrm{Var}}(\hat\theta_1^*)}}, \; \cdots, \; \frac{\hat\theta_B^* - \hat\theta}{\sqrt{\widehat{\mathrm{Var}}(\hat\theta_B^*)}}. \]
Using sd²_B(θˆ) to replace all the Var(θˆj∗)'s ignores their variation, which reduces to the basic (or residual) bootstrap CI method. A nested-bootstrap sketch is given below.
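A sketch of the studentized interval with a nested (inner) bootstrap estimating each Var(θˆj∗); B_inner is deliberately small to keep the cost down, and all tuning values are illustrative:

```python
import numpy as np

def boot_se(x, stat, B, rng):
    """Bootstrap standard-error estimate of stat on sample x."""
    return np.array([stat(rng.choice(x, size=x.size, replace=True))
                     for _ in range(B)]).std(ddof=1)

def student_ci(x, stat, alpha=0.05, B=999, B_inner=50, seed=0):
    """Studentized (bootstrap-t) CI with an inner bootstrap for each Var(theta*_j)."""
    rng = np.random.default_rng(seed)
    theta_hat = stat(x)
    se_hat = boot_se(x, stat, B, rng)          # estimates sqrt(V(F_hat))
    t_star = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=x.size, replace=True)
        t_star[b] = (stat(xb) - theta_hat) / boot_se(xb, stat, B_inner, rng)
    q_lo, q_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    return theta_hat - q_hi * se_hat, theta_hat - q_lo * se_hat
```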
Studentized bootstrap confidence intervals
The coverage probability of the studentized bootstrap CI closely approximates the nominal confidence level in general.
The approximation is most reliable when T(Fˆ) is a location statistic, in the sense that a constant shift in all the data values induces the same shift in T(Fˆ).
It is, however, sensitive to the presence of outliers in the dataset, so use the studentized bootstrap CI with caution in such cases.