Order statistics, quantiles & resampling
(Module 10)
Statistics (MAST20005) & Elements of Statistics (MAST90058)
Semester 2, 2022
1 Order statistics
1.1 Introduction
1.2 Sampling distribution
2 Quantiles
2.1 Definitions
2.2 Asymptotic distribution
2.3 Confidence intervals for quantiles
3 Resampling methods
Aims of this module
• Go back to order statistics and sample quantiles
• More detailed definitions
• Derive sampling distributions and construct confidence intervals
• See examples of CIs that are not of the form θ̂ ± se(θ̂)
• Learn some more distribution-free methods
• See how to use computation to avoid mathematical derivations
Unifying theme
• Use the data ‘directly’ rather than via assumed distributions
• Use the sample cdf and related summaries (such as order statistics)
1 Order statistics
1.1 Introduction
Definition (recap)
• Sample: X1, …, Xn
• Arrange them in increasing order:
X(1) = smallest of the Xi
X(2) = 2nd smallest of the Xi
⋮
X(n) = largest of the Xi
• These are called the order statistics
• X(k) is called the kth order statistic of the sample
• X(1) is the minimum or sample minimum
• X(n) is the maximum or sample maximum
Motivating example
• Take iid samples X ∼ N(0, 1) of size n = 9
• What can we say about the order statistics, X(k)?
• Simulated values (each column is one sample, sorted in increasing order, so row k holds X(k)):
[,1] [,2] [,3] [,4] [,5]
[1,] -0.76 -1.94 -1.32 -0.85 -1.96 <-- Minimum
[2,] -0.32 -0.17 -0.53 -0.30 -0.98
[3,] -0.23 0.06 -0.44 0.14 -0.83
[4,] 0.05 0.18 -0.10 0.25 -0.63
[5,] 0.08 0.76 0.17 0.35 -0.47 <-- Median
[6,] 0.18 0.96 0.26 0.68 0.05
[7,] 0.27 1.07 0.60 0.69 0.34
[8,] 0.73 1.42 0.66 1.13 1.26
[9,] 0.91 1.77 1.93 1.98 1.26 <-- Maximum
[Figure: Standard normal distribution, n = 9. Probability densities of the order statistics X(1), X(2), …, X(n), highlighting k = 1 (min), k = 5 (median) and k = 9 (max).]
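A short R sketch of how a table like the one above can be produced: simulate several samples, sort each column, and read off the rows. The seed and the names `n_samples` and `order_stats` are our own illustrative choices, so the printed values will not match the table.

```r
# Simulate n_samples iid samples of size n from N(0, 1), sort each one,
# and read off the order statistics: row k holds X(k) for every sample.
set.seed(20005)                 # arbitrary seed: values will differ from the table above
n <- 9
n_samples <- 5
x <- matrix(rnorm(n * n_samples), nrow = n)   # one sample per column
order_stats <- apply(x, 2, sort)              # sort within each column
print(round(order_stats, 2))

# Rows 1, 5 and 9: sample minimum, median and maximum of each sample
print(round(order_stats[c(1, 5, 9), ], 2))
```

Repeating this with many more columns and plotting a histogram of each row gives an empirical version of the densities in the figure.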
1.2 Sampling distribution
Example (triangular distribution)
• Random sample: X1, …, X5 with pdf f(x) = 2x, 0 < x < 1
• Calculate Pr(X(4) ≤ 0.5)
• This occurs if at least four of the Xi are less than 0.5,
$$\Pr(X_{(4)} \le 0.5) = \Pr(\text{at least 4 } X_i\text{'s less than } 0.5) = \Pr(\text{exactly 4 } X_i\text{'s less than } 0.5) + \Pr(\text{exactly 5 } X_i\text{'s less than } 0.5)$$
• This is a binomial with 5 trials and probability of success given by
$$\Pr(X_i \le 0.5) = \int_0^{0.5} 2x\,dx = \Big[x^2\Big]_0^{0.5} = 0.5^2 = 0.25$$
• So we have,
$$\Pr(X_{(4)} \le 0.5) = \binom{5}{4} 0.25^4 \, 0.75 + 0.25^5 = 0.0156$$
• More generally we have,
$$F(x) = \Pr(X_i \le x) = \int_0^x 2t\,dt = \Big[t^2\Big]_0^x = x^2$$
$$G(x) = \Pr(X_{(4)} \le x) = \binom{5}{4} (x^2)^4 (1 - x^2) + (x^2)^5$$
• Taking derivatives gives the pdf,
$$g(x) = G'(x) = 4\binom{5}{4} (x^2)^3 (1 - x^2)(2x) = 4\binom{5}{4} F(x)^3 (1 - F(x)) f(x)$$
since we know that F(x) = x².
[Figure: Triangular distribution, n = 5. Probability density of the order statistic X(4) (k = 4).]
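As a check on the triangular example, this R sketch computes Pr(X(4) ≤ 0.5) three ways: from the binomial argument, by integrating the pdf g(x) just derived, and by simulation. The simulation uses inverse-cdf sampling, X = √U with U ∼ Unif(0, 1), which has cdf x²; the seed and number of replicates are arbitrary choices.

```r
# Triangular example: n = 5, f(x) = 2x on (0, 1), so F(x) = x^2
n <- 5; k <- 4
F_tri <- function(x) x^2
f_tri <- function(x) 2 * x

# 1. Binomial route: X(4) <= 0.5 iff at least 4 of the 5 draws are <= 0.5
p_binom <- 1 - pbinom(k - 1, size = n, prob = F_tri(0.5))

# 2. Integrate g(x) = 4 * choose(5, 4) * F(x)^3 * (1 - F(x)) * f(x) over (0, 0.5)
g4 <- function(x) k * choose(n, k) * F_tri(x)^(k - 1) * (1 - F_tri(x))^(n - k) * f_tri(x)
p_integral <- integrate(g4, 0, 0.5)$value

# 3. Monte Carlo: X = sqrt(U) has cdf x^2; keep the 4th smallest of each sample
set.seed(5)   # arbitrary seed
x4 <- replicate(50000, sort(sqrt(runif(n)))[k])
p_sim <- mean(x4 <= 0.5)

c(binomial = p_binom, integral = p_integral, simulation = p_sim)
```

The first two agree exactly (1/64 = 0.015625); the simulated value should be close.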
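Distribution of X(k)
• Sample from a continuous distribution with cdf F(x) and pdf f(x) = F′(x).
• The cdf of X(k) is,
$$G_k(x) = \Pr(X_{(k)} \le x) = \sum_{i=k}^{n} \binom{n}{i} F(x)^i (1 - F(x))^{n-i}$$
• Thus the pdf of X(k) is,
$$\begin{aligned} g_k(x) = G_k'(x) &= \sum_{i=k}^{n} i\binom{n}{i} F(x)^{i-1} (1 - F(x))^{n-i} f(x) - \sum_{i=k}^{n-1} (n-i)\binom{n}{i} F(x)^{i} (1 - F(x))^{n-i-1} f(x) \\ &= k\binom{n}{k} F(x)^{k-1} (1 - F(x))^{n-k} f(x) + \sum_{i=k+1}^{n} i\binom{n}{i} F(x)^{i-1} (1 - F(x))^{n-i} f(x) - \sum_{i=k}^{n-1} (n-i)\binom{n}{i} F(x)^{i} (1 - F(x))^{n-i-1} f(x) \end{aligned}$$
(the second summation stops at i = n − 1 because its i = n term contains the factor n − i = 0).
• Now
$$i\binom{n}{i} = \frac{n!}{(i-1)!\,(n-i)!} = n\binom{n-1}{i-1}$$
and similarly
$$(n-i)\binom{n}{i} = \frac{n!}{i!\,(n-i-1)!} = n\binom{n-1}{i},$$
which allows some cancelling of terms.
• For example, the first term of the first summation (i = k + 1) is,
$$(k+1)\binom{n}{k+1} F(x)^{k} (1 - F(x))^{n-k-1} f(x) = n\binom{n-1}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x)$$
• The first term of the second summation (i = k) is,
$$(n-k)\binom{n}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x) = n\binom{n-1}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x)$$
• These cancel, and similarly the other terms do as well (term i + 1 of the first summation cancels term i of the second).
• Hence, the pdf simplifies to,
$$g_k(x) = k\binom{n}{k} F(x)^{k-1} (1 - F(x))^{n-k} f(x)$$
• Special cases: minimum and maximum,
$$g_1(x) = n(1 - F(x))^{n-1} f(x), \qquad g_n(x) = nF(x)^{n-1} f(x)$$
$$\Pr(X_{(1)} > x) = (1 - F(x))^n, \qquad \Pr(X_{(n)} \le x) = F(x)^n$$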
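The general result can be checked numerically. In this R sketch, `dorder` and `porder` are hypothetical helper names for g_k and G_k; the check uses the N(0, 1), n = 9 setting from the motivating example. It compares g_k with a numerical derivative of G_k, confirms it integrates to 1, and also uses the standard identity that g_k(x) equals f(x) times the Beta(k, n − k + 1) density evaluated at F(x).

```r
# pdf of the k-th order statistic of an iid sample of size n, using the
# simplified formula g_k(x) = k * choose(n, k) * F^(k-1) * (1 - F)^(n-k) * f
dorder <- function(x, k, n, pdf, cdf) {
  Fx <- cdf(x)
  k * choose(n, k) * Fx^(k - 1) * (1 - Fx)^(n - k) * pdf(x)
}

# cdf of X(k): at least k of the n observations are <= x
porder <- function(x, k, n, cdf) {
  1 - pbinom(k - 1, size = n, prob = cdf(x))
}

# Check against N(0, 1), n = 9, median (k = 5), at x0 = 0.3
x0 <- 0.3; h <- 1e-5
num_deriv <- (porder(x0 + h, 5, 9, pnorm) - porder(x0 - h, 5, 9, pnorm)) / (2 * h)
exact <- dorder(x0, 5, 9, dnorm, pnorm)
via_beta <- dbeta(pnorm(x0), 5, 9 - 5 + 1) * dnorm(x0)   # same density in beta form
total <- integrate(dorder, -Inf, Inf, k = 5, n = 9, pdf = dnorm, cdf = pnorm)$value

# Special cases: minimum (k = 1) and maximum (k = n)
g_min <- dorder(x0, 1, 9, dnorm, pnorm)
g_max <- dorder(x0, 9, 9, dnorm, pnorm)

c(numeric = num_deriv, formula = exact, beta = via_beta, integral = total)
```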
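Alternative derivation of the pdf of X(k)
• Heuristically, for a small interval of width dy around x,
$$\Pr(X_{(k)} \approx x) = \Pr\left(x - \tfrac{1}{2}dy < X_{(k)} \le x + \tfrac{1}{2}dy\right)$$
• This requires k − 1 of the Xi to fall below x, one to fall in the interval (probability ≈ f(x) dy) and the remaining n − k to fall above x, so by the multinomial distribution
$$g_k(x)\,dy \approx \frac{n!}{(k-1)!\,1!\,(n-k)!} F(x)^{k-1} (1 - F(x))^{n-k} f(x)\,dy$$
• Dividing both sides by dy gives the pdf of X(k)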
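The heuristic can be made concrete with R's `dmultinom`: the probability of the configuration (k − 1 below, 1 inside, n − k above) for a small interval, divided by dy, should be close to g_k(x). The setting (N(0, 1), n = 9, k = 5, x = 0.3, dy = 10⁻⁵) is our own illustrative choice.

```r
# Multinomial probability of (k - 1 below x, 1 in a small interval around x,
# n - k above x), divided by the interval width dy.
n <- 9; k <- 5; x <- 0.3; dy <- 1e-5
lower <- pnorm(x - dy / 2)
upper <- pnorm(x + dy / 2)
cells <- c(lower, upper - lower, 1 - upper)      # below / inside / above the interval
p_config <- dmultinom(c(k - 1, 1, n - k), prob = cells)
heuristic <- p_config / dy                      # the "divide by dy" step

# Closed-form pdf for comparison
exact <- k * choose(n, k) * pnorm(x)^(k - 1) * (1 - pnorm(x))^(n - k) * dnorm(x)
c(heuristic = heuristic, exact = exact)
```

The two agree up to a relative error of order dy, which is the sense of "≈" in the heuristic.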
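Example (boundary estimate)
• X1, …, X4 ∼ Unif(0, θ)
• Likelihood is
$$L(\theta) = \begin{cases} \dfrac{1}{\theta^4}, & 0 \le x_i \le \theta \text{ for all } i \\ 0, & \text{otherwise (i.e. if } \theta < x_i \text{ for some } i) \end{cases}$$
• Maximised when θ is as small as possible, so θ̂ = max(Xi) = X(4)
• Now,
$$g_4(x) = 4\left(\frac{x}{\theta}\right)^3 \frac{1}{\theta} = \frac{4x^3}{\theta^4}, \quad 0 \le x \le \theta$$
$$E(X_{(4)}) = \int_0^\theta x \, \frac{4x^3}{\theta^4}\,dx = \frac{4\theta^5}{5\theta^4} = \frac{4}{5}\theta$$
• So the MLE X(4) is biased
• (But (5/4) X(4) is unbiased)
• Deriving a one-sided CI for θ based on X(4):
1. For a given 0 < …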
+ pbinom(1, size = 9, prob = 0.5)
[1] 0.9609375
• So a 96.1% confidence interval for m is (19.0, 30.1)
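Confidence intervals for arbitrary quantiles
• The argument can be extended to any quantile and any pair of order statistics
• For example, the ith and jth, with W ∼ Bi(n, p) the number of observations below πp:
$$1 - \alpha = \Pr(X_{(i)} < \pi_p < X_{(j)}) = \Pr(i \le W \le j - 1) = \sum_{k=i}^{j-1} \binom{n}{k} p^k (1 - p)^{n-k}$$
• (For instance, for the median with n = 5: Pr(X(1) < m < X(5)) = 1 − 0.5⁵ − 0.5⁵ = 15/16 ≈ 0.94)
• Need to use computed binomial probabilities (e.g. R) to determine i and j
• Or use the normal approximation to the binomial
• Note that these confidence intervals do not arise from pivots and, because W is discrete, generally cannot achieve exactly 95% confidence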
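Finding i and j "using computed binomial probabilities" can be sketched in R as follows. The helper names `quantile_ci_coverage` and `find_quantile_ci` are hypothetical; the coverage is Pr(W ≤ j − 1) − Pr(W ≤ i − 1), and the search lists the pairs reaching a target level, narrowest (smallest j − i) first.

```r
# Exact coverage of (X(i), X(j)) as a CI for the p-quantile:
# Pr(i <= W <= j - 1) with W ~ Bi(n, p)
quantile_ci_coverage <- function(i, j, n, p) {
  pbinom(j - 1, size = n, prob = p) - pbinom(i - 1, size = n, prob = p)
}

# All pairs i < j reaching the target level, narrowest first
find_quantile_ci <- function(n, p, level = 0.95) {
  pairs <- expand.grid(i = 1:n, j = 1:n)
  pairs <- pairs[pairs$i < pairs$j, ]
  pairs$coverage <- quantile_ci_coverage(pairs$i, pairs$j, n, p)
  ok <- pairs[pairs$coverage >= level, ]
  ok[order(ok$j - ok$i, -ok$coverage), ]
}

quantile_ci_coverage(1, 5, n = 5, p = 0.5)    # median, n = 5: 15/16
quantile_ci_coverage(2, 8, n = 9, p = 0.5)    # median, n = 9: 0.9609375
head(find_quantile_ci(9, 0.5), 3)
```

For n = 9 and the median, the narrowest pair with at least 95% coverage is (X(2), X(8)), whose coverage 0.9609375 matches the R output printed above.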
Example (income distribution)
Incomes (in units of $100) for a sample of 27 people, in ascending order:
161, 169, 171, 174, 179, 180, 183, 184, 186, 187, 192, 193, 196, 200, 204, 205, 213, 221, 222, 229, 241, 243, 256, 264, 291, 317, 376
Want to estimate the first quartile, π0.25
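W is the number of the Xi below π0.25
W ∼ Bi(27, 0.25) ≈ N(μ = 27/4 = 6.75, σ² = 81/16). This gives
$$\begin{aligned} \Pr(X_{(4)} < \pi_{0.25} < X_{(10)}) &= \Pr(4 \le W \le 9) \\ &= \Pr(3.5 < W < 9.5) \quad \text{(continuity correction)} \\ &\approx \Phi\left(\frac{9.5 - 6.75}{9/4}\right) - \Phi\left(\frac{3.5 - 6.75}{9/4}\right) = 0.815 \end{aligned}$$
So ($17 400, $18 700) is an approximate 81.5% CI for the first quartile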
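The income calculation can be reproduced in R as below. The normal-approximation step follows the text; the exact binomial coverage Pr(4 ≤ W ≤ 9) is added only for comparison (it comes out slightly higher, about 0.82).

```r
# Income data (in $100s)
incomes <- c(161, 169, 171, 174, 179, 180, 183, 184, 186, 187, 192, 193, 196, 200,
             204, 205, 213, 221, 222, 229, 241, 243, 256, 264, 291, 317, 376)
n <- length(incomes); p <- 0.25; i <- 4; j <- 10

x_sorted <- sort(incomes)
ci <- c(x_sorted[i], x_sorted[j]) * 100        # endpoints in dollars

# Normal approximation with continuity correction, as in the text
mu <- n * p                                    # 6.75
sigma <- sqrt(n * p * (1 - p))                 # 9/4
approx_cov <- pnorm((j - 0.5 - mu) / sigma) - pnorm((i - 0.5 - mu) / sigma)

# Exact binomial coverage Pr(i <= W <= j - 1), for comparison
exact_cov <- pbinom(j - 1, n, p) - pbinom(i - 1, n, p)
c(approx = approx_cov, exact = exact_cov)
```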
3 Resampling methods
Resampling
• What if maths is too hard?
• Try a resampling method
• Replaces mathematical derivation with brute force computation
• Used for approximating sampling distributions, standard errors, bias, etc.
• Sometimes works brilliantly, sometimes not at all
• Most popular resampling method: the bootstrap
• Basic idea:
– Use the sample cdf as an approximation to the true cdf
– Simulate new data from the sample cdf
– Equivalent to sampling with replacement from the actual data
• Use these bootstrap samples to infer sampling distributions of statistics of interest
• This is an advanced topic
• Only a ‘taster’ is presented. . .
• ...in the lab (week 11)
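As a small taste of the basic idea, here is a minimal percentile-bootstrap sketch in R for the first quartile of the income data. Assumptions, all our own: `quantile()` uses R's default (type 7) definition, which may differ from the sample quantile definition used elsewhere in the module; B = 2000 and the seed are arbitrary; and the percentile interval is only one of several bootstrap CI methods.

```r
# Percentile bootstrap for the first quartile of the income data (in $100s).
incomes <- c(161, 169, 171, 174, 179, 180, 183, 184, 186, 187, 192, 193, 196, 200,
             204, 205, 213, 221, 222, 229, 241, 243, 256, 264, 291, 317, 376)
set.seed(90058)                    # arbitrary seed
B <- 2000                          # number of bootstrap samples (our choice)

# Each bootstrap sample: n draws with replacement from the data,
# i.e. a simulated sample from the sample cdf
boot_q1 <- replicate(B, quantile(sample(incomes, replace = TRUE),
                                 probs = 0.25, names = FALSE))

q1_hat <- quantile(incomes, probs = 0.25, names = FALSE)    # R type 7 estimate
se_boot <- sd(boot_q1)                                      # bootstrap standard error
bias_boot <- mean(boot_q1) - q1_hat                         # bootstrap bias estimate
ci_boot <- quantile(boot_q1, probs = c(0.025, 0.975), names = FALSE)
c(estimate = q1_hat, se = se_boot, bias = bias_boot,
  lower = ci_boot[1], upper = ci_boot[2])
```

No distributional derivation is needed: the spread of `boot_q1` stands in for the sampling distribution of the sample quartile.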