§6.1 Introduction §6.2 Bootstrap principle §6.3 Parametric vs Nonparametric §6.4 Bias correction
Bootstrap Methods
MAST90083 Computational Statistics and Data Mining
School of Mathematics & Statistics The University of Melbourne
Bootstrap Methods 1/38
Outline
§6.1 Introduction
§6.2 Bootstrap principle
§6.3 Parametric vs Nonparametric
§6.4 Bias correction
Introduction
Bootstrap methods use computer simulation to reveal aspects of the sampling distribution of an estimator θ̂ of interest.
With the power of modern computers, the approach has broad applicability and is now a practical and useful tool for applied statisticians and data scientists.
Introduction
The bootstrap is a general tool for assessing statistical accuracy
It is based on a resampling strategy
Having computed some feature of the data from the sample at hand, we are interested in understanding how the estimate would change for a different sample
Examples of features: prediction accuracy, the mean value, etc.
But, unfortunately, we cannot use more than one sample
Solution: bootstrap
Introduction
The idea behind the bootstrap is an old one.
Assume we wish to estimate a functional of a population distribution function F, such as the population mean

$$\theta = \int x \, dF(x)$$

Consider employing the same functional of the sample (or empirical) distribution function F̂, which in this case leads to the sample mean

$$\hat{\theta} = \int x \, d\hat{F}(x) = \bar{x}$$
Introduction
One can use θ̂ = x̄ to estimate θ.
Evaluating the variability in this estimation would require the sampling distribution of x̄.
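As a concrete sketch of how simulation reveals this variability, the standard error of x̄ can be approximated by resampling; the data set and the number of resamples B below are hypothetical choices made purely for illustration:

```python
import random
import statistics

random.seed(0)

# Hypothetical i.i.d. sample (for illustration only)
x = [2.1, 3.4, 1.9, 4.0, 2.8, 3.3, 2.5, 3.9, 2.2, 3.1]
n = len(x)

B = 2000  # number of bootstrap resamples
boot_means = []
for _ in range(B):
    resample = [random.choice(x) for _ in range(n)]  # n draws with replacement
    boot_means.append(sum(resample) / n)

# Bootstrap approximation to the standard error of the sample mean
se_hat = statistics.stdev(boot_means)
```

The spread of `boot_means` stands in for the unobtainable sampling distribution of x̄.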
Empirical distribution
The empirical distribution is the probability measure that assigns to a set a measure equal to the proportion of sample points that lie in that set:

$$\hat{F}(x) = \frac{1}{n}\sum_{i=1}^{n} \delta(x - x_i)$$

for a set x1, …, xn of i.i.d. observations from F, where δ(x − xi) represents a "point mass" at xi (assigning full probability to the point xi and zero to all other points).
F̂ is the discrete distribution that assigns mass 1/n to each point xi, 1 ≤ i ≤ n.
By the L.L.N., F̂ →p F as n → ∞.
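A minimal numerical illustration of F̂ →p F; the Uniform(0, 1) population (for which F(t) = t) and the sample sizes are assumptions made only for this demo:

```python
import random

random.seed(1)

def ecdf(sample, t):
    """Empirical distribution function: the proportion of sample points <= t."""
    return sum(1 for xi in sample if xi <= t) / len(sample)

# For a Uniform(0, 1) population, F(t) = t on [0, 1]
small = [random.random() for _ in range(50)]
large = [random.random() for _ in range(50000)]

err_small = abs(ecdf(small, 0.3) - 0.3)
err_large = abs(ecdf(large, 0.3) - 0.3)
```

As n grows, the pointwise error of the empirical distribution function shrinks, as the L.L.N. predicts.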
Sample and resample
A sample X = {x1, …, xn} is a collection of n numbers (or vectors), without regard to order, drawn at random from the population F.
The xi's are therefore i.i.d. random variables, each having the population distribution function F
A resample X* = {x1*, …, xn*} is an unordered collection of n items randomly drawn from X with replacement
It is known as a bootstrap sample and is a central step of the nonparametric bootstrap method
Resample
Each xi* has probability 1/n of being equal to any given xj:

$$P(x_i^* = x_j \mid \mathcal{X}) = \frac{1}{n}, \quad 1 \le i, j \le n$$

The xi*'s are i.i.d. conditional on X.
X* is likely to contain repeats, all of which must be listed in X*.
Example: X* = {1.5, 1.7, 1.7, 1.8} is different from {1.5, 1.7, 1.8}, but X* is the same as {1.5, 1.7, 1.8, 1.7} and {1.7, 1.5, 1.8, 1.7}.
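The multiset behaviour of a resample can be checked directly; the toy values below follow the slide's example, and the base sample X is a hypothetical choice:

```python
import random
from collections import Counter

random.seed(2)

X = [1.5, 1.7, 1.8, 2.0]  # hypothetical sample
n = len(X)

# One bootstrap resample: n draws from X with replacement
X_star = [random.choice(X) for _ in range(n)]

# Resamples are unordered multisets: repeats matter, order does not
same = Counter([1.5, 1.7, 1.7, 1.8]) == Counter([1.7, 1.5, 1.8, 1.7])
different = Counter([1.5, 1.7, 1.7, 1.8]) != Counter([1.5, 1.7, 1.8])
```

`Counter` compares the values with their multiplicities, which matches the "unordered collection with repeats" definition of a bootstrap sample.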
Population and sample distribution
F is the population distribution of X, whereas F̂ is its sample distribution.
F̂, on the other hand, is the distribution function of the population from which X* is drawn.
The pair (F, F̂) is generally written (F0, F1); in bootstrap iteration, Fi for i ≥ 1 denotes the distribution function of a sample drawn from Fi−1, conditional on Fi−1.
The ith application of the bootstrap is termed the ith iteration, not the (i − 1)th iteration
Estimation as functional
An estimate θ̂ is a function of the data and a functional of the sample distribution function F̂
Example: the sample mean

$$\hat{\theta} = \hat{\theta}[\mathcal{X}] = \theta(\hat{F}) = \frac{1}{n}\sum_{i=1}^{n} x_i$$

whereas the population mean is

$$\theta = \theta(F) = \int x \, dF(x)$$
Bootstrap principle

[Figure: nested "Russian doll" picture; the dolls (doll 0, doll 1, doll 2) illustrate the resampling analogy developed on the next slide]
Bootstrap principle
Assume we can't observe "doll 0" → it represents the population in a sampling scheme
We wish to estimate the number n0 of freckles on its face.
Let ni denote the number of freckles on the face of "doll i"
Assuming the ratio n1/n2 is close to the ratio n0/n1, we have n̂0 ≈ n1²/n2
The key feature of this argument is our hypothesis that the relationship between n2 and n1 should closely resemble that between n1 and the unknown n0
Bootstrap principle
Statistical inference amounts to describing the relationship between a sample and the population from which the sample is drawn
Formally: given a functional ft from a class {ft : t ∈ τ}, we aim to find t0 such that

$$E\{f_t(F_0, F_1) \mid F_0\} = E_{F_0}\{f_t(F_0, F_1)\} = 0$$

where F0 = F (population distribution) and F1 = F̂ (sample distribution)
We want to find t0, the solution of this population equation; solving it exactly requires properties of the population
Bootstrap principle
Example:
Let θ0 = θ(F) = θ(F0) be the true parameter value, such as the rth power of a mean

$$\theta_0 = \left( \int x \, dF_0(x) \right)^r$$

Let θ̂ = θ(F1) be the bootstrap estimate of θ0

$$\hat{\theta} = \left( \int x \, dF_1(x) \right)^r = \bar{x}^r$$

where F1 is the empirical distribution function
Example: Bias correction
Correcting θ̂ for bias is equivalent to finding the t0 that solves E_{F0}{ft(F0, F1)} = 0, where

$$f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t$$

and the bias-corrected estimate is θ̂ + t0
Example: Confidence interval
To construct a symmetric (1 − α) confidence interval for θ0 is equivalent to using

$$f_t(F_0, F_1) = I\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t\} - (1 - \alpha)$$

where I(·) denotes the indicator of the event that the true parameter value θ(F0) lies in the interval

$$[\theta(F_1) - t, \, \theta(F_1) + t] = [\hat{\theta} - t, \, \hat{\theta} + t]$$

minus the nominal coverage 1 − α of the interval. Asking that

$$E\{f_t(F_0, F_1) \mid F_0\} = 0$$

is equivalent to insisting that t be chosen so that the interval has zero coverage error.
Bootstrap principle
The equation

$$E\{f_t(F_0, F_1) \mid F_0\} = 0$$

provides an explicit description of the relationship between F0 and F1 we are trying to determine.
The analogue in the case of the number-of-freckles problem is

$$n_0 - t n_1 = 0$$

where ni is the number of freckles on doll i
Bootstrap principle
If we had the solution t = t0 of that equation, then n0 = t0 n1.
An estimate of t0 is obtained from the pair (n1, n2), which we know:

$$n_1 - t n_2 = 0$$

We obtain the solution t̂0 = n1/n2 of this equation, and thereby

$$\hat{n}_0 = \hat{t}_0 n_1 = \frac{n_1^2}{n_2}$$

is the estimate of n0
Bootstrap principle
Similarly, the population equation

$$E\{f_t(F_0, F_1) \mid F_0\} = 0$$

is solved via the sample equation

$$E\{f_t(F_1, F_2) \mid F_1\} = 0$$

where F2, the distribution function of a sample drawn from F1, is the analogue of n2.
Bootstrap principle
The solution t̂0 is a function of the sample values
The idea is that the solution of the sample equation should be a good approximation to the solution of the population equation
The population equation is not obtainable in practice, so we solve the sample equation in its place → this is the bootstrap principle.
Bootstrap principle
We call t̂0 and E{ft(F1, F2) | F1} "the bootstrap estimates" of t0 and E{ft(F0, F1) | F0}.
They are obtained by replacing (F0, F1) with (F1, F2) in the formulae for t0
The bootstrap version of the bias-corrected estimate is θ̂ + t̂0
The bootstrap confidence interval is [θ̂ − t̂0, θ̂ + t̂0], called the symmetric percentile-method confidence interval for θ0
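A sketch of a symmetric percentile-type interval for a mean, where t̂0 is taken as the empirical (1 − α) quantile of |θ(F2) − θ(F1)| over resamples; the data, B, and α are hypothetical choices for illustration:

```python
import random

random.seed(3)

# Hypothetical i.i.d. sample (for illustration only)
x = [2.1, 3.4, 1.9, 4.0, 2.8, 3.3, 2.5, 3.9, 2.2, 3.1]
n = len(x)
theta_hat = sum(x) / n  # theta(F1): here the sample mean

B, alpha = 4000, 0.05
abs_dev = []
for _ in range(B):
    xs = [random.choice(x) for _ in range(n)]
    abs_dev.append(abs(sum(xs) / n - theta_hat))  # |theta(F2) - theta(F1)|

abs_dev.sort()
t0_hat = abs_dev[int((1 - alpha) * B) - 1]  # empirical (1 - alpha) quantile
ci = (theta_hat - t0_hat, theta_hat + t0_hat)  # [theta_hat - t0, theta_hat + t0]
```

The interval is symmetric about θ̂ by construction, mirroring [θ̂ − t̂0, θ̂ + t̂0] above.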
Parametric vs Nonparametric
In both parametric and nonparametric problems, inference is based on a sample X of size n (n i.i.d. observations from the population)
In the nonparametric case, F1 is the empirical distribution function of X
Similarly, F2 is the empirical distribution function of a sample drawn at random from the population F1
Nonparametric
It is the empirical distribution of a sample X∗ drawn randomly with replacement from X
If we denote the population by X0, then we have a nest of sampling operations:
X is drawn at random from X0
X* is drawn at random from X
Parametric
In this case F0 is assumed completely known up to a finite vector λ0 of unknown parameters.
F0 = F(λ0) is an element of a class {F(λ) : λ ∈ Λ} of possible distributions
Then F1 = F(λ̂), the distribution function obtained using the sample estimate λ̂ computed from X, often (but not necessarily) the maximum likelihood estimate
Let X* denote the sample drawn at random from F(λ̂); then

$$F_2 = F(\hat{\lambda}^*)$$

where λ̂* is the estimate recomputed from X*
In both cases, X∗ is obtained by resampling from a distribution determined by the original sample X
Example
Estimate of the MSE
$$\tau^2 = E\left\{(\hat{\theta} - \theta_0)^2\right\} = E\left\{[\theta(F_1) - \theta(F_0)]^2 \mid F_0\right\}$$

has bootstrap estimate

$$\hat{\tau}^2 = E\left\{(\hat{\theta}^* - \hat{\theta})^2 \mid \mathcal{X}\right\} = E\left\{[\theta(F_2) - \theta(F_1)]^2 \mid F_1\right\}$$

where θ̂* = θ[X*] is the version of θ̂ obtained using X* instead of X
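A Monte Carlo sketch of τ̂² for the sample mean (data and B hypothetical); for the mean, the conditional expectation E{[θ(F2) − θ(F1)]² | F1} equals σ̂²/n exactly, which gives a built-in check:

```python
import random

random.seed(4)

# Hypothetical i.i.d. sample (for illustration only)
x = [2.1, 3.4, 1.9, 4.0, 2.8, 3.3, 2.5, 3.9, 2.2, 3.1]
n = len(x)
theta_hat = sum(x) / n  # theta(F1)

B = 4000
sq_dev = []
for _ in range(B):
    xs = [random.choice(x) for _ in range(n)]
    sq_dev.append((sum(xs) / n - theta_hat) ** 2)  # [theta(F2) - theta(F1)]^2

tau2_hat = sum(sq_dev) / B  # Monte Carlo estimate of the bootstrap MSE

# For the mean, this expectation is exactly sigma_hat^2 / n (1/n divisor)
sigma2_hat = sum((xi - theta_hat) ** 2 for xi in x) / n
exact = sigma2_hat / n
```

The Monte Carlo average should land close to the closed-form value σ̂²/n.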
Bias correction
Here we have

$$f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t$$

and the sample equation

$$E\{f_t(F_1, F_2) \mid F_1\} = E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0$$

whose solution is

$$t = \hat{t}_0 = \theta(F_1) - E\{\theta(F_2) \mid F_1\}$$
Bias correction
The bootstrap bias-reduced estimate is thus

$$\hat{\theta}_1 = \hat{\theta} + \hat{t}_0 = \theta(F_1) + \hat{t}_0 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\}$$

Note that the estimate θ̂ = θ(F1) is itself a bootstrap functional, since it is obtained by substituting F1 for F0 in the functional formula θ0 = θ(F0).
The expectation E{θ(F2) | F1} is computed (or approximated) by Monte Carlo simulation
Bias correction
Draw B resamples {Xb*, 1 ≤ b ≤ B} independently from the distribution function F1
In the nonparametric case, F1 is the empirical distribution of the sample X
Let F2b denote the empirical distribution function of Xb*
In the parametric case, λ̂b* = λ(Xb*) is the estimate of λ0 obtained from Xb*, and F2b = F(λ̂b*)
Bias correction
Define θ̂b* = θ(F2b); then

$$\hat{u}_B = \frac{1}{B}\sum_{b=1}^{B} \theta(F_{2b}) = \frac{1}{B}\sum_{b=1}^{B} \hat{\theta}_b^*$$

converges (as B → ∞) to

$$\hat{u} = E\{\theta(F_2) \mid F_1\} = E\{\hat{\theta}^* \mid \mathcal{X}\}$$
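The whole bias-correction recipe can be sketched in a few lines (nonparametric case). The data, B, and the functional θ(F) = (mean of F)³ are illustrative choices here, not part of the algorithm itself:

```python
import random

random.seed(5)

def theta(sample):
    """Example functional: the cubed mean, theta(F) = (int x dF)^3."""
    m = sum(sample) / len(sample)
    return m ** 3

# Hypothetical i.i.d. sample (for illustration only)
x = [2.1, 3.4, 1.9, 4.0, 2.8, 3.3, 2.5, 3.9, 2.2, 3.1]
n = len(x)
theta_hat = theta(x)  # theta(F1)

# Monte Carlo approximation u_B of E{theta(F2) | F1} from B resamples
B = 5000
u_B = 0.0
for _ in range(B):
    xb = [random.choice(x) for _ in range(n)]  # X*_b, with empirical d.f. F_2b
    u_B += theta(xb)
u_B /= B

t0_hat = theta_hat - u_B          # solution of the sample equation
theta1_hat = theta_hat + t0_hat   # = 2*theta(F1) - u_B, bias-reduced estimate
```

Nothing about the population is needed: every quantity is computed from X and its resamples.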
Example (1)
Let X = {x1, …, xn},

$$\mu = \int x \, dF_0(x), \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and assume θ0 = θ(F0) = μ³
In the nonparametric approach,

$$\hat{\theta} = \theta(F_1) = \bar{x}^3$$
Example (2)
In the nonparametric approach,

$$E\{\theta(F_1) \mid F_0\} = E_{F_0}\left\{\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^3\right\} = E\left\{\left(\mu + \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)\right)^3\right\} = \mu^3 + n^{-1}3\mu\sigma^2 + n^{-2}\gamma$$

where σ² = E(x − μ)² and γ = E(x − μ)³ denote the population variance and skewness
Example
In the nonparametric case,

$$E\{\theta(F_2) \mid F_1\} = \bar{x}^3 + n^{-1}3\bar{x}\hat{\sigma}^2 + n^{-2}\hat{\gamma}$$

where σ̂² = n⁻¹ Σᵢ (xᵢ − x̄)² and γ̂ = n⁻¹ Σᵢ (xᵢ − x̄)³ denote the sample variance and skewness
Therefore the bootstrap bias-reduced estimate is

$$\hat{\theta}_1 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\} = 2\bar{x}^3 - \left(\bar{x}^3 + n^{-1}3\bar{x}\hat{\sigma}^2 + n^{-2}\hat{\gamma}\right) = \bar{x}^3 - n^{-1}3\bar{x}\hat{\sigma}^2 - n^{-2}\hat{\gamma}$$
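Because each resample point takes each value xi with probability 1/n, the expansion of E{θ(F2) | F1} is exact and can be checked by enumerating all nⁿ equally likely ordered resamples of a tiny sample (toy values chosen arbitrarily):

```python
from itertools import product

x = [1.0, 2.0, 4.0]  # toy sample, small enough to enumerate all resamples
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / n   # sigma-hat^2 (1/n divisor)
g = sum((xi - xbar) ** 3 for xi in x) / n    # gamma-hat

# Exact E{theta(F2) | F1}: average of (resample mean)^3 over all n^n
# equally likely ordered resamples drawn with replacement from x
exact = sum((sum(r) / n) ** 3 for r in product(x, repeat=n)) / n ** n

formula = xbar ** 3 + 3 * xbar * s2 / n + g / n ** 2
```

The brute-force average and the closed-form expression agree to floating-point accuracy, confirming the expansion term by term.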
Example
If the population is normal N(μ, σ²), then γ = 0 and

$$E\{\theta(F_1) \mid F_0\} = \mu^3 + n^{-1}3\mu\sigma^2$$

Maximum likelihood can be used to estimate λ̂ = (x̄, σ̂²)
θ(F2) is the statistic θ̂ computed for a sample from a N(x̄, σ̂²) distribution, and in direct analogy we have

$$E\{\theta(F_2) \mid F_1\} = \bar{x}^3 + n^{-1}3\bar{x}\hat{\sigma}^2$$

Therefore

$$\hat{\theta}_1 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\} = \bar{x}^3 - n^{-1}3\bar{x}\hat{\sigma}^2$$
Example
If the population is exponential with mean μ and density

$$f_\mu(x) = \mu^{-1}\exp\left(-\frac{x}{\mu}\right) \quad \text{for } x > 0,$$

then σ² = μ² and γ = 2μ³, so

$$E\{\theta(F_1) \mid F_0\} = \mu^3\left(1 + 3n^{-1} + 2n^{-2}\right)$$

Taking the maximum likelihood estimate x̄ for μ,

$$E\{\theta(F_2) \mid F_1\} = \bar{x}^3\left(1 + 3n^{-1} + 2n^{-2}\right)$$

Therefore

$$\hat{\theta}_1 = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\} = \bar{x}^3\left(1 - 3n^{-1} - 2n^{-2}\right)$$
Example
The estimate θ̂1 represents an improvement, in the sense of bias reduction, on the basic bootstrap estimate θ̂ = θ(F1)
To check the bias reduction, observe that for general distributions with finite third moments

$$E(\bar{x}^3) = \mu^3 + n^{-1}3\mu\sigma^2 + n^{-2}\gamma$$

$$E(\bar{x}\hat{\sigma}^2) = \mu\sigma^2 + n^{-1}(\gamma - \mu\sigma^2) - n^{-2}\gamma$$

$$E(\hat{\gamma}) = \gamma\left(1 - 3n^{-1} + 2n^{-2}\right)$$
Example
In the case of a general population,

$$E(\hat{\theta}_1) - \theta_0 = n^{-2}3(\mu\sigma^2 - \gamma) + n^{-3}6\gamma - n^{-4}2\gamma$$

In the case of a normal population,

$$E(\hat{\theta}_1) - \theta_0 = n^{-2}3\mu\sigma^2$$

In the case of an exponential population,

$$E(\hat{\theta}_1) - \theta_0 = -\mu^3\left(9n^{-2} + 12n^{-3} + 4n^{-4}\right)$$
Remarks
Therefore bootstrap bias reduction has diminished the bias to at most O(n⁻²) in each case.
This compares with the bias of θ̂, which is of size n⁻¹ unless μ = 0.
Bootstrap bias correction thus reduces the order of magnitude of the bias by a factor of n⁻¹.