
§6.1 Introduction §6.2 Bootstrap principle §6.3 Parametric vs Nonparametric §6.4 Bias correction
Bootstrap Methods
MAST90083 Computational Statistics and Data Mining

School of Mathematics & Statistics The University of Melbourne
Bootstrap Methods 1/38

Outline

§6.1 Introduction
§6.2 Bootstrap principle
§6.3 Parametric vs Nonparametric
§6.4 Bias correction

Introduction

- Bootstrap methods use computer simulation to reveal aspects of the sampling distribution of an estimator θ̂ of interest.
- With the power of modern computers, the approach has broad applicability and is now a practical and useful tool for applied statisticians and data scientists.

Introduction

- The bootstrap is a general tool for assessing statistical accuracy.
- It is based on a resampling strategy.
- Given an estimated feature of the data, computed from the sample at hand, we want to understand how the estimate would change for a different sample.
- Examples of features: prediction accuracy, the mean value, etc.
- Unfortunately, we typically cannot draw more than one sample.
- Solution: the bootstrap.

Introduction

- The idea behind the bootstrap is an old one.
- Assume we wish to estimate a functional of a population distribution function F, such as the population mean

  θ = ∫ x dF(x)

- Consider employing the same functional of the sample (or empirical) distribution function F̂, which in this case leads to the sample mean

  θ̂ = ∫ x dF̂(x) = x̄

Introduction

- One can use θ̂ = x̄ to estimate θ.
- Evaluating the variability of this estimate would require the sampling distribution of x̄.

Empirical distribution

- The empirical distribution is the probability measure that assigns to a set a measure equal to the proportion of sample points that lie in that set:

  F̂(x) = (1/n) Σ_{i=1}^n δ(x − xi)

- for a set x1, …, xn of i.i.d. observations from F, where δ(x − xi) represents a "point mass" at xi (assigning full probability to the point xi and zero to all other points).
- F̂ is the discrete distribution that assigns mass 1/n to each point xi, 1 ≤ i ≤ n.
- By the L.L.N., F̂ →p F as n → ∞.
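The definition above translates into a few lines of code. A minimal sketch (NumPy assumed; the name `ecdf` is illustrative): since F̂ puts mass 1/n on each observation, evaluating F̂ at t just counts the proportion of sample points ≤ t.

```python
import numpy as np

def ecdf(sample):
    """Return the empirical distribution function F_hat of a sample.

    F_hat assigns mass 1/n to each observation x_i, so
    F_hat(t) = (1/n) * #{i : x_i <= t}.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)

    def F_hat(t):
        # proportion of sample points lying in (-inf, t]
        return np.searchsorted(x, t, side="right") / n

    return F_hat

F = ecdf([1.0, 2.0, 2.0, 5.0])
print(F(0.5), F(2.0), F(10.0))  # 0.0 0.75 1.0
```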

Sample and resample

- A sample X = {x1, …, xn} is a collection of n numbers (or vectors), without regard to order, drawn at random from the population F.
- The xi's are therefore i.i.d. random variables, each having the population distribution function F.
- A resample X* = {x1*, …, xn*} is an unordered collection of n items drawn at random from X with replacement.
- It is known as a bootstrap sample, and drawing it is the central step of the nonparametric bootstrap method.
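Drawing such a resample is a one-liner. A minimal NumPy sketch (the sample values are made up for illustration); note that repeats in X* are expected:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([1.5, 1.7, 1.8, 2.1])                 # original sample, n = 4
X_star = rng.choice(X, size=len(X), replace=True)  # bootstrap resample

# Every resampled point equals one of the original x_i, repeats included
print(X_star)
```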

Resample

- Each xi* has probability 1/n of being equal to any given xj:

  P(xi* = xj | X) = 1/n,  1 ≤ i, j ≤ n

- The xi*'s are i.i.d. conditional on X.
- X* is likely to contain repeats, all of which must be listed in X*.
- Example: X* = {1.5, 1.7, 1.7, 1.8} is different from {1.5, 1.7, 1.8}, but X* is the same as {1.5, 1.7, 1.8, 1.7} and {1.7, 1.5, 1.8, 1.7}.

Population and sample distribution

- F is the population distribution of X, whereas F̂ is its sample distribution.
- F̂, in turn, is the distribution function of the population from which X* is drawn.
- The pair (F, F̂) is generally written (F0, F1) in bootstrap iteration.
- For i ≥ 1, Fi denotes the distribution function of a sample drawn from Fi−1, conditional on Fi−1.
- The ith application of the bootstrap is termed the ith iteration, not the (i − 1)th iteration.

Estimation as functional

- An estimate θ̂ is a function of the data and a functional of the sample distribution function F̂.
- Example: the sample mean

  θ̂ = θ[X] = (1/n) Σ_{i=1}^n xi,   θ̂ = θ(F̂) = ∫ x dF̂(x)

- whereas the population mean is

  θ = θ(F) = ∫ x dF(x).

Bootstrap principle

Bootstrap principle

- Assume we cannot observe "doll 0" → it represents the population in a sampling scheme.
- We wish to estimate the number n0 of freckles on its face.
- Let ni denote the number of freckles on the face of "doll i".
- Assuming the ratio n1/n2 is close to the ratio n0/n1, we have n̂0 ≈ n1²/n2.
- The key feature of this argument is our hypothesis that the relationship between n2 and n1 should closely resemble that between n1 and the unknown n0.

Bootstrap principle

- Statistical inference amounts to describing the relationship between a sample and the population from which the sample is drawn.
- Formally: given a functional ft from a class {ft : t ∈ τ}, we aim to find t0 such that

  E{ft(F0, F1) | F0} = E_F0{ft(F0, F1)} = 0

- where F0 = F (population distribution) and F1 = F̂ (sample distribution).
- We want to find t0, the solution of the population equation (so called because solving it exactly requires properties of the population).

Bootstrap principle

Example:
- Let θ0 = θ(F) = θ(F0) be the true parameter value, such as the rth power of a mean:

  θ0 = (∫ x dF0(x))^r

- Let θ̂ = θ(F1) be the bootstrap estimate of θ0:

  θ̂ = (∫ x dF1(x))^r = x̄^r

- where F1 is the empirical distribution function.

Example: Bias correction

- Correcting θ̂ for bias is equivalent to finding the t0 that solves E_F0{ft(F0, F1)} = 0
- where

  ft(F0, F1) = θ(F1) − θ(F0) + t

- and the bias-corrected estimate is θ̂ + t0.

Example: Confidence interval

- Constructing a symmetric, (1 − α)-level confidence interval for θ0 is equivalent to using

  ft(F0, F1) = I{θ(F1) − t ≤ θ(F0) ≤ θ(F1) + t} − (1 − α)

- where I(·) denotes the indicator of the event that the true parameter value θ(F0) lies in the interval

  [θ(F1) − t, θ(F1) + t] = [θ̂ − t, θ̂ + t]

- minus the nominal coverage 1 − α of the interval. Asking that

  E{ft(F0, F1) | F0} = 0

- is equivalent to insisting that t be chosen so that the interval has zero coverage error.
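In the nonparametric case, the half-width t can be approximated by the (1 − α) quantile of |θ(F2b) − θ(F1)| over B resamples, giving the symmetric percentile interval [θ̂ − t̂, θ̂ + t̂]. A hedged sketch (function name and B = 2000 are illustrative choices):

```python
import numpy as np

def symmetric_percentile_ci(theta, X, alpha=0.05, B=2000, rng=None):
    """Symmetric percentile bootstrap interval [theta_hat - t, theta_hat + t]:
    t is chosen so that a fraction (1 - alpha) of the resampled statistics
    theta(F2b) fall within distance t of theta(F1)."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(X)
    n = len(X)
    theta_hat = theta(X)
    dev = [abs(theta(rng.choice(X, size=n, replace=True)) - theta_hat)
           for _ in range(B)]
    t = np.quantile(dev, 1 - alpha)
    return theta_hat - t, theta_hat + t

rng = np.random.default_rng(2)
X = rng.normal(loc=3.0, scale=1.0, size=100)
lo, hi = symmetric_percentile_ci(np.mean, X)
print(lo, hi)
```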

Bootstrap principle

- The equation

  E{ft(F0, F1) | F0} = 0

- provides an explicit description of the relationship between F0 and F1 that we are trying to determine.
- The analogue in the case of the number-of-freckles problem is

  n0 − t n1 = 0

- where ni is the number of freckles on doll "i".

Bootstrap principle

- If we had the solution t = t0 of this equation, then n0 = t0 n1.
- The estimate of t0 is obtained from the pair (n1, n2), which we know:

  n1 − t n2 = 0

- We obtain the solution t̂0 = n1/n2 of this equation, and thereby

  n̂0 = t̂0 n1 = n1²/n2

- is the estimate of n0.

Bootstrap principle

- Similarly, the population equation

  E{ft(F0, F1) | F0} = 0

- is solved via the sample equation

  E{ft(F1, F2) | F1} = 0

- where F2, the distribution function of a sample drawn from F1, is the analogue of n2.

Bootstrap principle

- The solution t̂0 is a function of the sample values.
- The idea is that the solution of the sample equation should be a good approximation to the solution of the population equation.
- The population equation is not obtainable in practice.
- → This is the bootstrap principle.

Bootstrap principle

- We call t̂0 and E{ft(F1, F2) | F1} "the bootstrap estimates" of t0 and E{ft(F0, F1) | F0}.
- They are obtained by replacing F0 and F1 in the corresponding formulae by F1 and F2.
- The bootstrap version of the bias-corrected estimate is θ̂ + t̂0.
- The bootstrap confidence interval is [θ̂ − t̂0, θ̂ + t̂0], called the symmetric percentile method confidence interval for θ0.

Parametric vs Nonparametric

- In both parametric and nonparametric problems, inference is based on a sample X of size n (n i.i.d. observations from the population).
- In the nonparametric case, F1 is the empirical distribution function of X.
- Similarly, F2 is the empirical distribution function of a sample drawn at random from the "population" F1.

Nonparametric

- That is, F2 is the empirical distribution of a sample X* drawn randomly, with replacement, from X.
- If we denote the population by X0, then we have a nest of sampling operations:
- X is drawn at random from X0
- X* is drawn at random from X

Parametric

- In this case F0 is assumed completely known up to a finite vector λ0 of unknown parameters.
- F0 = F(λ0) is an element of a class {F(λ), λ ∈ Λ} of possible distributions.
- Then F1 = F(λ̂), the distribution function obtained using the sample estimate λ̂ computed from X, often (but not necessarily) by maximum likelihood.
- Let X* denote the sample drawn at random from F(λ̂); then F2 = F(λ̂*).
- In both cases, X* is obtained by resampling from a distribution determined by the original sample X.
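The two schemes differ only in the distribution from which X* is drawn. A minimal sketch assuming a normal parametric model (the data and model choice are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=2.0, scale=1.5, size=80)   # observed sample
n = len(X)

# Nonparametric: X* is drawn with replacement from X itself (F1 = empirical d.f.)
X_star_np = rng.choice(X, size=n, replace=True)

# Parametric: fit lambda_hat = (mu_hat, sigma_hat) by ML for a normal model,
# then draw X* from F(lambda_hat)
mu_hat, sigma_hat = X.mean(), X.std()          # ML estimates under normality
X_star_p = rng.normal(loc=mu_hat, scale=sigma_hat, size=n)

print(X_star_np[:3], X_star_p[:3])
```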

Example

- The MSE of θ̂,

  τ² = E(θ̂ − θ0)² = E{[θ(F1) − θ(F0)]² | F0}

- has bootstrap estimate

  τ̂² = E{(θ̂* − θ̂)² | X} = E{[θ(F2) − θ(F1)]² | F1}

- where θ̂* = θ[X*] is the version of θ̂ obtained using X* instead of X.
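The bootstrap MSE estimate can be approximated by Monte Carlo in the nonparametric case. A hedged sketch (function name and B are illustrative); for θ = the sample mean, τ̂² should land near σ̂²/n:

```python
import numpy as np

def bootstrap_mse(theta, X, B=2000, rng=None):
    """Monte Carlo approximation of tau_hat^2 = E{(theta* - theta_hat)^2 | X},
    using B nonparametric resamples of X."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(X)
    n = len(X)
    theta_hat = theta(X)
    sq = [(theta(rng.choice(X, size=n, replace=True)) - theta_hat) ** 2
          for _ in range(B)]
    return np.mean(sq)

rng = np.random.default_rng(4)
X = rng.normal(size=200)
# For theta = sample mean, tau^2 should be close to sigma^2 / n = 1/200 = 0.005
mse_hat = bootstrap_mse(np.mean, X)
print(mse_hat)
```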

Bias correction

- Here we have

  ft(F0, F1) = θ(F1) − θ(F0) + t

- and the sample equation

  E{ft(F1, F2) | F1} = E{θ(F2) − θ(F1) + t | F1} = 0

- whose solution is

  t = t̂0 = θ(F1) − E{θ(F2) | F1}

Bias correction

- The bootstrap bias-reduced estimate is thus

  θ̂1 = θ̂ + t̂0 = θ(F1) + t̂0 = 2θ(F1) − E{θ(F2) | F1}

- Note that the estimate θ̂ = θ(F1) is also a bootstrap functional, since it is obtained by substituting F1 for F0 in the functional formula θ0 = θ(F0).
- The expectation E{θ(F2) | F1} is computed (or approximated) by Monte Carlo simulation.
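The formula θ̂1 = 2θ(F1) − E{θ(F2) | F1} can be coded directly, with the conditional expectation replaced by a Monte Carlo average over nonparametric resamples. A hedged sketch (function name and B = 2000 are illustrative); for an exponential sample and θ = x̄³, which overestimates μ³, the corrected value should come out below the raw estimate:

```python
import numpy as np

def bias_corrected(theta, X, B=2000, rng=None):
    """Bootstrap bias correction: t0_hat = theta(F1) - E{theta(F2) | F1},
    approximated by a Monte Carlo average over B nonparametric resamples.
    Returns theta_hat + t0_hat = 2*theta(F1) - mean_b theta(F2b)."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(X)
    n = len(X)
    theta_hat = theta(X)
    boot = [theta(rng.choice(X, size=n, replace=True)) for _ in range(B)]
    return 2 * theta_hat - np.mean(boot)

# Example: theta = (sample mean)^3, which is biased upward for mu^3
rng = np.random.default_rng(1)
X = rng.exponential(scale=1.0, size=50)
theta = lambda s: np.mean(s) ** 3
print(theta(X), bias_corrected(theta, X))
```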

Bias correction

- Draw B resamples {Xb*, 1 ≤ b ≤ B} independently from the distribution function F1.
- In the nonparametric case, F1 is the empirical distribution of the sample X.
- Let F2b denote the empirical distribution function of Xb*.
- In the parametric case, λ̂b* = λ(Xb*) is the estimate of λ0 obtained from Xb*, and F2b = F(λ̂b*).

Bias correction

- Define θ̂b* = θ(F2b); then

  ûB = (1/B) Σ_{b=1}^B θ(F2b) = (1/B) Σ_{b=1}^B θ̂b*

- converges (as B → ∞) to

  û = E{θ(F2) | F1} = E{θ̂* | X}
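A quick illustration of this convergence: for θ = the sample mean, the exact bootstrap expectation E{θ(F2) | F1} equals x̄ itself, so ûB should settle near x̄ for large B. A sketch with illustrative n and B:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=60)
n, B = len(X), 4000

# u_hat_B = (1/B) * sum_b theta(F2b), here with theta = sample mean,
# for which the exact bootstrap expectation E{theta(F2)|F1} is x_bar
u_hat_B = np.mean([np.mean(rng.choice(X, size=n, replace=True))
                   for _ in range(B)])
print(u_hat_B, X.mean())
```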

Example (1)

- Let

  μ = ∫ x dF0(x)   and assume   θ0 = θ(F0) = μ³

- X = {x1, …, xn} and

  x̄ = (1/n) Σ_{i=1}^n xi

- In the nonparametric approach,

  θ̂ = θ(F1) = x̄³

Example (2)

- In the nonparametric approach,

  E{θ(F1) | F0} = E_F0 {((1/n) Σ_{i=1}^n xi)³}
                = E {(μ + (1/n) Σ_{i=1}^n (xi − μ))³}
                = μ³ + n⁻¹ 3μσ² + n⁻² γ

- where σ² = E(x − μ)² and γ = E(x − μ)³ denote the population variance and skewness.

Example

- In the nonparametric case,

  E{θ(F2) | F1} = x̄³ + n⁻¹ 3x̄σ̂² + n⁻² γ̂

- where σ̂² = n⁻¹ Σ (xi − x̄)² and γ̂ = n⁻¹ Σ (xi − x̄)³ denote the sample variance and skewness.
- Therefore the bootstrap bias-reduced estimate is

  θ̂1 = 2θ(F1) − E{θ(F2) | F1} = 2x̄³ − [x̄³ + n⁻¹ 3x̄σ̂² + n⁻² γ̂]
      = x̄³ − n⁻¹ 3x̄σ̂² − n⁻² γ̂
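The closed-form bias-reduced estimate can be coded directly, with no resampling needed. A small sketch (the function name is illustrative):

```python
import numpy as np

def theta1_nonparametric(X):
    """Closed-form bias-reduced estimate of mu^3:
    theta1 = x_bar^3 - 3*x_bar*sigma_hat^2/n - gamma_hat/n^2,
    with sigma_hat^2 = (1/n) sum (x_i - x_bar)^2 and
    gamma_hat = (1/n) sum (x_i - x_bar)^3."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    xbar = X.mean()
    sigma2 = np.mean((X - xbar) ** 2)
    gamma = np.mean((X - xbar) ** 3)
    return xbar ** 3 - 3 * xbar * sigma2 / n - gamma / n ** 2

X = np.array([1.0, 2.0, 3.0, 4.0])
print(theta1_nonparametric(X))  # → 13.28125 (x_bar = 2.5, sigma2 = 1.25, gamma = 0)
```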

Example

- If the population is normal N(μ, σ²), then γ = 0 and

  E{θ(F1) | F0} = μ³ + n⁻¹ 3μσ²

- Maximum likelihood could be used to estimate λ̂ = (x̄, σ̂²).
- θ(F2) is the statistic θ̂ computed for a sample from a normal N(x̄, σ̂²) distribution, and in direct analogy we have

  E{θ(F2) | F1} = x̄³ + n⁻¹ 3x̄σ̂²

- Therefore

  θ̂1 = 2θ(F1) − E{θ(F2) | F1} = x̄³ − n⁻¹ 3x̄σ̂²

Example

- If the population is exponential with mean μ and density

  fμ(x) = μ⁻¹ exp(−x/μ)   for x > 0

- here σ² = μ² and γ = 2μ³, so

  E{θ(F1) | F0} = μ³ (1 + 3n⁻¹ + 2n⁻²)

- Taking the maximum likelihood estimate x̄ for μ,

  E{θ(F2) | F1} = x̄³ (1 + 3n⁻¹ + 2n⁻²)

- Therefore

  θ̂1 = 2θ(F1) − E{θ(F2) | F1} = x̄³ (1 − 3n⁻¹ − 2n⁻²)
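A small simulation can check the exponential result: for μ = 1 and n = 20, E(x̄³) = μ³(1 + 3/n + 2/n²) ≈ 1.155, while θ̂1's bias is only O(n⁻²). A sketch with illustrative sample size and replication count:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, n, reps = 1.0, 20, 5000
raw, corrected = [], []
for _ in range(reps):
    xbar = rng.exponential(scale=mu, size=n).mean()
    raw.append(xbar ** 3)                                   # theta_hat = x_bar^3
    corrected.append(xbar ** 3 * (1 - 3 / n - 2 / n ** 2))  # theta_1

# raw overshoots theta_0 = mu^3 = 1 by roughly 16% at n = 20,
# while the corrected estimate's bias is only O(n^-2)
print(np.mean(raw), np.mean(corrected))
```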

Example

- The estimate θ̂1 represents an improvement, in the sense of bias reduction, on the basic bootstrap estimate θ̂ = θ(F1).
- To check the bias reduction, observe that for general distributions with finite third moments

  E(x̄³) = μ³ + n⁻¹ 3μσ² + n⁻² γ
  E(x̄σ̂²) = μσ² + n⁻¹(γ − μσ²) − n⁻² γ
  E(γ̂) = γ (1 − 3n⁻¹ + 2n⁻²)

Example

- For a general population,

  E(θ̂1) − θ0 = n⁻² 3(μσ² − γ) + n⁻³ 6γ − n⁻⁴ 2γ

- For a normal population,

  E(θ̂1) − θ0 = n⁻² 3μσ²

- For an exponential population,

  E(θ̂1) − θ0 = −μ³ (9n⁻² + 12n⁻³ + 4n⁻⁴)

Remarks

- Therefore bootstrap bias reduction has diminished the bias to at most O(n⁻²) in each case.
- This compares with the bias of θ̂, which is of size n⁻¹ unless μ = 0.
- Bootstrap bias correction reduces the order of magnitude of the bias by a factor of n⁻¹.