Order statistics, quantiles & resampling
(Module 10)
Statistics (MAST20005) & Elements of Statistics (MAST90058)
School of Mathematics and Statistics, University of Melbourne
Semester 2, 2022
• Order statistics
◦ Introduction
◦ Sampling distribution
• Quantiles
◦ Definitions
◦ Asymptotic distribution
◦ Confidence intervals for quantiles
• Resampling methods
Aims of this module
• Go back to order statistics and sample quantiles
• More detailed definitions
• Derive sampling distributions and construct confidence intervals
• See examples of CIs that are not of the form θˆ ± se(θˆ)
• Learn some more distribution-free methods
• See how to use computation to avoid mathematical derivations
Unifying theme
• Use the data ‘directly’ rather than via assumed distributions
• Use the sample cdf and related summaries (such as order statistics)
Definition (recap)
• Sample: X1,…,Xn
• Arrange them in increasing order:
X(1) = Smallest of the Xi
X(2) = 2nd smallest of the Xi
⋮
X(n) = Largest of the Xi
• These are called the order statistics
X(1) ≤ X(2) ≤ · · · ≤ X(n)
• X(k) is called the kth order statistic of the sample
• X(1) is the minimum or sample minimum
• X(n) is the maximum or sample maximum
Motivating example
• Take iid samples X ∼ N(0, 1) of size n = 9
• What can we say about the order statistics, X(k)?
• Simulated values (each column is one sorted sample):
[,1] [,2] [,3] [,4] [,5]
[1,] -0.76 -1.94 -1.32 -0.85 -1.96 <-- Minimum
[2,] -0.32 -0.17 -0.53 -0.30 -0.98
[3,] -0.23 0.06 -0.44 0.14 -0.83
[4,] 0.05 0.18 -0.10 0.25 -0.63
[5,] 0.08 0.76 0.17 0.35 -0.47 <-- Median
[6,] 0.18 0.96 0.26 0.68 0.05
[7,] 0.27 1.07 0.60 0.69 0.34
[8,] 0.73 1.42 0.66 1.13 1.26
[9,] 0.91 1.77 1.93 1.98 1.26 <-- Maximum
[Figure: Standard normal distribution, n = 9. Probability densities of the order statistics for k = 1 (min), k = 5 (median) and k = 9 (max).]
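The simulated values above can be generated, and the sampling distributions explored, with a few lines of R. This is only a sketch: the seed and the number of replications `B` are arbitrary choices, not taken from the slides.

```r
# Sketch: simulate the sampling distributions of order statistics
# for iid N(0, 1) samples of size n = 9.
set.seed(1)      # arbitrary seed, for reproducibility only
n <- 9           # sample size, as in the example
B <- 10000       # number of simulated samples (an assumption)

# Each column is one sorted sample, so row k holds B draws of X(k).
sims <- replicate(B, sort(rnorm(n)))

# Average value of the minimum, median and maximum across simulations.
order_means <- rowMeans(sims)[c(1, 5, 9)]
names(order_means) <- c("min", "median", "max")
print(round(order_means, 2))

# Histograms of rows 1, 5 and 9 approximate the three densities,
# e.g. hist(sims[5, ], freq = FALSE) for the median.
```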
Example (triangular distribution)
• Random sample: X1,...,X5 with pdf f(x) = 2x, 0 < x < 1
• Calculate Pr(X(4) ≤ 0.5)
• Occurs if at least four of the Xi are less than 0.5,
$$\Pr(X_{(4)} \le 0.5) = \Pr(\text{at least 4 } X_i\text{'s less than } 0.5) = \Pr(\text{exactly 4 } X_i\text{'s less than } 0.5) + \Pr(\text{exactly 5 } X_i\text{'s less than } 0.5)$$
• This is a binomial with 5 trials and probability of success given by
$$\Pr(X_i \le 0.5) = \int_0^{0.5} 2x \, dx = \left[x^2\right]_0^{0.5} = 0.5^2 = 0.25$$
• So we have,
$$\Pr(X_{(4)} \le 0.5) = \binom{5}{4} 0.25^4 \, 0.75 + 0.25^5 = 0.0156$$
• More generally we have,
$$F(x) = \Pr(X_i \le x) = \int_0^x 2t \, dt = \left[t^2\right]_0^x = x^2$$
$$G(x) = \Pr(X_{(4)} \le x) = \binom{5}{4} (x^2)^4 (1 - x^2) + (x^2)^5$$
• Taking derivatives gives the pdf,
$$g(x) = G'(x) = \binom{5}{4} 4 (x^2)^3 (1 - x^2)(2x) = 4 \binom{5}{4} F(x)^3 (1 - F(x)) f(x)$$
since we know that F(x) = x².
[Figure: Triangular distribution, n = 5. Probability density of the order statistic for k = 4.]
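The probability Pr(X(4) ≤ 0.5) from the triangular example can be checked numerically. The sketch below computes it from the binomial argument and compares it with a simulation; the seed, the number of replications and the inverse-cdf sampler `sqrt(runif(...))` are choices made here, not part of the slides.

```r
# Exact value via the binomial argument: W = number of Xi below 0.5.
p <- 0.5^2                                   # Pr(Xi <= 0.5) = F(0.5)
exact <- 1 - pbinom(3, size = 5, prob = p)   # Pr(W >= 4)

# Simulation check. If U ~ Unif(0, 1) then sqrt(U) has cdf x^2 on (0, 1),
# which is the triangular distribution with pdf f(x) = 2x.
set.seed(1)      # arbitrary seed
B <- 50000       # number of simulated samples (an assumption)
x4 <- replicate(B, sort(sqrt(runif(5)))[4])
simulated <- mean(x4 <= 0.5)

print(c(exact = exact, simulated = simulated))
```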
Distribution of X(k)
• Sample from a continuous distribution with cdf F(x) and pdf f(x) = F′(x).
• The cdf of X(k) is,
$$G_k(x) = \Pr(X_{(k)} \le x) = \sum_{i=k}^{n} \binom{n}{i} F(x)^i (1 - F(x))^{n-i}$$
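• Thus the pdf of X(k) is,
$$\begin{aligned}
g_k(x) = G_k'(x) &= \sum_{i=k}^{n} i \binom{n}{i} F(x)^{i-1} (1 - F(x))^{n-i} f(x) - \sum_{i=k}^{n-1} (n - i) \binom{n}{i} F(x)^{i} (1 - F(x))^{n-i-1} f(x) \\
&= k \binom{n}{k} F(x)^{k-1} (1 - F(x))^{n-k} f(x)
\end{aligned}$$
• To see this, separate the i = k term from the first summation,
$$g_k(x) = k \binom{n}{k} F(x)^{k-1} (1 - F(x))^{n-k} f(x) + \sum_{i=k+1}^{n} i \binom{n}{i} F(x)^{i-1} (1 - F(x))^{n-i} f(x) - \sum_{i=k}^{n-1} (n - i) \binom{n}{i} F(x)^{i} (1 - F(x))^{n-i-1} f(x)$$
• Note that
$$i \binom{n}{i} = \frac{n!}{(i-1)!\,(n-i)!} = n \binom{n-1}{i-1}$$
and similarly
$$(n - i) \binom{n}{i} = \frac{n!}{i!\,(n-i-1)!} = n \binom{n-1}{i}$$
which allows some cancelling of terms.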
• For example, the first term of the first summation (i = k + 1) is,
$$(k + 1) \binom{n}{k+1} F(x)^{k} (1 - F(x))^{n-k-1} f(x) = n \binom{n-1}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x)$$
• The first term of the second summation (i = k) is,
$$(n - k) \binom{n}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x) = n \binom{n-1}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x)$$
• These cancel, and similarly the other terms do as well.
• Hence, the pdf simplifies to,
$$g_k(x) = k \binom{n}{k} F(x)^{k-1} (1 - F(x))^{n-k} f(x)$$
• Special cases: minimum and maximum,
$$g_1(x) = n (1 - F(x))^{n-1} f(x), \qquad g_n(x) = n F(x)^{n-1} f(x)$$
$$\Pr(X_{(1)} > x) = (1 - F(x))^{n}, \qquad \Pr(X_{(n)} \le x) = F(x)^{n}$$
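The general pdf of X(k) translates directly into a short R function. The sketch below uses a hypothetical helper name, `order_stat_pdf`, and runs two sanity checks: the density of the median of nine standard normals should integrate to one, and for Unif(0, 1) data the result should agree with the Beta(k, n − k + 1) density, a standard fact that is not derived in these slides.

```r
# pdf of the k-th order statistic:
#   g_k(x) = k * choose(n, k) * F(x)^(k-1) * (1 - F(x))^(n-k) * f(x)
# `cdf` and `pdf` are functions for the parent distribution.
order_stat_pdf <- function(x, k, n, cdf, pdf) {
  Fx <- cdf(x)
  k * choose(n, k) * Fx^(k - 1) * (1 - Fx)^(n - k) * pdf(x)
}

# Check 1: density of the median (k = 5) of n = 9 standard normals.
total <- integrate(order_stat_pdf, -Inf, Inf,
                   k = 5, n = 9, cdf = pnorm, pdf = dnorm)$value

# Check 2: for Unif(0, 1), X(k) follows a Beta(k, n - k + 1) distribution.
x <- seq(0.05, 0.95, by = 0.05)
max_diff <- max(abs(order_stat_pdf(x, k = 4, n = 5, cdf = punif, pdf = dunif) -
                      dbeta(x, 4, 2)))

print(c(total = total, max_diff = max_diff))
```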
Alternative derivation of the pdf of X(k)
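• Heuristically,
$$\Pr(X_{(k)} \approx x) = \Pr\left(x - \tfrac{1}{2} dy < X_{(k)} < x + \tfrac{1}{2} dy\right) \approx g_k(x) \, dy$$
• This requires k − 1 of the Xi to fall below the interval (each with probability ≈ F(x)), one to fall inside it (probability ≈ f(x) dy) and n − k to fall above it (probability ≈ 1 − F(x))
• Putting these together,
$$g_k(x) \, dy \approx \frac{n!}{(k-1)! \, 1! \, (n-k)!} F(x)^{k-1} (1 - F(x))^{n-k} f(x) \, dy$$
• Dividing both sides by dy gives the pdf of X(k)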
Example (boundary estimate)
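• X1,…,X4 ∼ Unif(0,θ)
• Likelihood is
$$L(\theta) = \begin{cases} \left(\dfrac{1}{\theta}\right)^4 & 0 \le x_i \le \theta \text{ for all } i \\ 0 & \text{otherwise (i.e. if } \theta < x_i \text{ for some } i\text{)} \end{cases}$$
• Maximised when θ is as small as possible, so θˆ = max(Xi) = X(4)
• The pdf of X(4) is
$$g_4(x) = 4 \left(\frac{x}{\theta}\right)^3 \frac{1}{\theta} = \frac{4x^3}{\theta^4}, \quad 0 \le x \le \theta$$
• Its expectation is
$$E(X_{(4)}) = \int_0^\theta x \, \frac{4x^3}{\theta^4} \, dx = \left[\frac{4x^5}{5\theta^4}\right]_0^\theta = \frac{4}{5}\theta$$
• So the MLE X(4) is biased
• (But (5/4) X(4) is unbiased)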
[Figure: Uniform distribution, n = 4. Probability density of the order statistic for k = 4 (max).]
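The bias of the MLE in the boundary example is easy to see by simulation. In this sketch the true value `theta = 2`, the seed and the number of replications are all hypothetical choices made for illustration.

```r
# Simulate the MLE X(4) = max(Xi) for samples of size 4 from Unif(0, theta).
set.seed(1)      # arbitrary seed
theta <- 2       # hypothetical true value of the parameter
B <- 50000       # number of simulated samples (an assumption)

x_max <- replicate(B, max(runif(4, min = 0, max = theta)))

mean_mle <- mean(x_max)               # should be near (4/5) * theta
mean_corrected <- mean(5 / 4 * x_max) # should be near theta

print(c(mle = mean_mle, corrected = mean_corrected, target = theta))
```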
• Deriving a one-sided CI for θ based on X(4):
1. For a given 0
> pbinom(7, size = 9, prob = 0.5) -
+   pbinom(1, size = 9, prob = 0.5)
[1] 0.9609375
• So a 96.1% confidence interval for m is (19.0, 30.1)
Confidence intervals for arbitrary quantiles
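• Argument can be extended to any quantile and any order statistics
• For example, the ith and jth,
$$1 - \alpha = \Pr(X_{(i)} < \pi_p < X_{(j)}) = \Pr(i \le W \le j - 1) = \sum_{k=i}^{j-1} \binom{n}{k} p^k (1 - p)^{n-k}$$
where W ∼ Bi(n, p) is the number of the X’s below πp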
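The confidence level of an interval (X(i), X(j)) for the quantile πp is a difference of two binomial cdf values, so it can be wrapped in a small function. The name `quantile_ci_level` is hypothetical. The example call uses i = 2, j = 8, n = 9 and p = 0.5 because it gives the same value as the `[1] 0.9609375` output shown earlier for the median; that pairing of order statistics is an inference, not something stated in the slides.

```r
# Confidence level of (X(i), X(j)) as an interval for the p-th quantile:
#   Pr(i <= W <= j - 1), where W ~ Bi(n, p).
quantile_ci_level <- function(i, j, n, p) {
  pbinom(j - 1, size = n, prob = p) - pbinom(i - 1, size = n, prob = p)
}

# Median (p = 0.5) with n = 9, using the 2nd and 8th order statistics.
level <- quantile_ci_level(i = 2, j = 8, n = 9, p = 0.5)
print(level)
```

Widening the interval (smaller i, larger j) raises the level, but only a discrete set of levels is available for a given n.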
Example (income distribution)
• Incomes (in $100’s) for a sample of 27 people, in ascending order:
161, 169, 171, 174, 179, 180, 183, 184, 186,
187, 192, 193, 196, 200, 204, 205, 213, 221,
222, 229, 241, 243, 256, 264, 291, 317, 376
• Want to estimate the first quartile, π0.25
• W is the number of the X’s below π0.25
• W ∼ Bi(27, 0.25) ≈ N(μ = 27/4 = 6.75, σ² = 81/16)
• This gives
$$\begin{aligned}
\Pr(X_{(4)} < \pi_{0.25} < X_{(10)}) &= \Pr(4 \le W \le 9) \\
&= \Pr(3.5 < W < 9.5) \quad \text{(continuity correction)} \\
&= \Phi\left(\frac{9.5 - 6.75}{9/4}\right) - \Phi\left(\frac{3.5 - 6.75}{9/4}\right) = 0.815
\end{aligned}$$
• So ($17 400, $18 700) is an 81.5% CI for the first quartile
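The income example can be reproduced in R, which also makes it easy to compare the normal approximation with the exact binomial probability. This is a sketch; the variable names are chosen here for illustration.

```r
# Incomes (in $100's) from the example, already in ascending order.
income <- c(161, 169, 171, 174, 179, 180, 183, 184, 186,
            187, 192, 193, 196, 200, 204, 205, 213, 221,
            222, 229, 241, 243, 256, 264, 291, 317, 376)
n <- length(income)
p <- 0.25

# Interval endpoints: the 4th and 10th order statistics.
ci <- sort(income)[c(4, 10)]

# Normal approximation with continuity correction, as in the example.
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
approx_level <- pnorm(9.5, mean = mu, sd = sigma) -
  pnorm(3.5, mean = mu, sd = sigma)

# Exact binomial probability Pr(4 <= W <= 9), for comparison.
exact_level <- pbinom(9, size = n, prob = p) - pbinom(3, size = n, prob = p)

print(ci)
print(round(c(approx = approx_level, exact = exact_level), 3))
```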
Resampling
• What if maths is too hard?
• Try a resampling method
• Replaces mathematical derivation with brute force computation
• Used for approximating sampling distributions, standard errors, bias, etc.
• Sometimes works brilliantly, sometimes not at all
• Most popular resampling method: the bootstrap
• Basic idea:
◦ Use the sample cdf as an approximation to the true cdf
◦ Simulate new data from the sample cdf
◦ Equivalent to sampling with replacement from the actual data
• Use these bootstrap samples to infer sampling distributions of statistics of interest
• This is an advanced topic
• Only a ‘taster’ is presented. . .
• ...in the lab (week 11)
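As a small illustration of the basic idea, the sketch below bootstraps the sample first quartile of the income data from the earlier example. The number of bootstrap samples `B`, the seed and the use of R's default `quantile()` definition are assumptions made here, and this is not necessarily the approach taken in the lab.

```r
# Incomes (in $100's) from the earlier example.
income <- c(161, 169, 171, 174, 179, 180, 183, 184, 186,
            187, 192, 193, 196, 200, 204, 205, 213, 221,
            222, 229, 241, 243, 256, 264, 291, 317, 376)

set.seed(1)      # arbitrary seed
B <- 2000        # number of bootstrap samples (an assumption)

# Sampling with replacement from the data is equivalent to
# simulating new data from the sample cdf.
boot_q1 <- replicate(B, {
  resample <- sample(income, replace = TRUE)
  quantile(resample, probs = 0.25, names = FALSE)
})

# Bootstrap estimate of the standard error of the sample first quartile.
se_boot <- sd(boot_q1)
print(se_boot)
```

A histogram of `boot_q1` gives a rough picture of the sampling distribution of the sample quartile without any mathematical derivation.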