
Order statistics, quantiles & resampling
(Module 10)
Statistics (MAST20005) & Elements of Statistics (MAST90058) Semester 2, 2022
Contents
1 Order statistics
  1.1 Introduction
  1.2 Sampling distribution
2 Quantiles
  2.1 Definitions
  2.2 Asymptotic distribution
  2.3 Confidence intervals for quantiles
3 Resampling methods
Aims of this module
• Go back to order statistics and sample quantiles
• More detailed definitions
• Derive sampling distributions and construct confidence intervals
• See examples of CIs that are not of the form θ̂ ± se(θ̂)
• Learn some more distribution-free methods
• See how to use computation to avoid mathematical derivations
Unifying theme
• Use the data ‘directly’ rather than via assumed distributions
• Use the sample cdf and related summaries (such as order statistics)
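
For concreteness, a small sketch (mine, not from the slides) of the sample cdf in R, via the built-in ecdf() function:

    x <- c(3.1, 1.4, 2.7, 5.0)   # illustrative data, not from the notes
    Fhat <- ecdf(x)              # the sample (empirical) cdf
    Fhat(2.7)                    # proportion of observations <= 2.7 (here 0.5)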
1 Order statistics

1.1 Introduction

Definition (recap)
• Sample: X1, …, Xn
• Arrange them in increasing order:
      X(1) = Smallest of the Xi
      X(2) = 2nd smallest of the Xi
      ⋮
      X(n) = Largest of the Xi
  so that X(1) ≤ X(2) ≤ ··· ≤ X(n)
• These are called the order statistics
• X(k) is called the kth order statistic of the sample
• X(1) is the minimum or sample minimum
• X(n) is the maximum or sample maximum
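
As a quick illustration (a sketch of mine, not from the slides), order statistics are obtained in R simply by sorting the sample:

    x <- c(0.8, -1.2, 0.3, 2.1, -0.4)   # illustrative data
    sort(x)       # x(1), ..., x(5) in increasing order
    sort(x)[2]    # the 2nd order statistic
    range(x)      # the sample minimum and maximum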
Motivating example
• Take iid samples X ∼ N(0, 1) of size n = 9
• What can we say about the order statistics, X(k)?
• Simulated values:
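
(A sketch of mine, not necessarily the commands used for the values shown here: draw 5 samples of size 9 and sort each one. The output below then shows five such sorted samples, one per column.)

    set.seed(1)                                            # arbitrary seed, for reproducibility
    samples <- matrix(rnorm(9 * 5), nrow = 9, ncol = 5)    # 5 samples of size 9
    round(apply(samples, 2, sort), 2)                      # sort each column: row k holds X(k)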
[,1] [,2] [,3] [,4] [,5]
[1,] -0.76 -1.94 -1.32 -0.85 -1.96   <-- Minimum
[2,] -0.32 -0.17 -0.53 -0.30 -0.98
[3,] -0.23  0.06 -0.44  0.14 -0.83
[4,]  0.05  0.18 -0.10  0.25 -0.63
[5,]  0.08  0.76  0.17  0.35 -0.47   <-- Median
[6,]  0.18  0.96  0.26  0.68  0.05
[7,]  0.27  1.07  0.60  0.69  0.34
[8,]  0.73  1.42  0.66  1.13  1.26
[9,]  0.91  1.77  1.93  1.98  1.26   <-- Maximum

[Figure: "Standard normal distribution, n = 9": probability densities of the order statistics X(k) for k = 1 (min), k = 5 (median) and k = 9 (max).]

1.2 Sampling distribution

Example (triangular distribution)
• Random sample: X1, …, X5 with pdf f(x) = 2x, 0 < x < 1
• Calculate Pr(X(4) ≤ 0.5)
• This occurs if at least four of the Xi are less than 0.5,
      Pr(X(4) ≤ 0.5) = Pr(at least 4 Xi's less than 0.5)
                     = Pr(exactly 4 Xi's less than 0.5) + Pr(exactly 5 Xi's less than 0.5)
• This is a binomial with 5 trials and probability of success given by
      Pr(Xi ≤ 0.5) = \int_0^{0.5} 2x \, dx = [x^2]_0^{0.5} = 0.5^2 = 0.25
• So we have,
      Pr(X(4) ≤ 0.5) = \binom{5}{4} 0.25^4 (1 − 0.25) + 0.25^5 = 0.0156
• More generally we have,
      F(x) = Pr(Xi ≤ x) = \int_0^x 2t \, dt = [t^2]_0^x = x^2
      G(x) = Pr(X(4) ≤ x) = \binom{5}{4} (x^2)^4 (1 − x^2) + (x^2)^5
• Taking derivatives gives the pdf, since we know that F(x) = x^2:
      g(x) = G'(x) = \binom{5}{4} \, 4 (x^2)^3 (1 − x^2)(2x) = \binom{5}{4} \, 4 \, F(x)^3 (1 − F(x)) f(x)

[Figure: "Triangular distribution, n = 5": probability density of the order statistic X(4).]

Distribution of X(k)
• Sample from a continuous distribution with cdf F(x) and pdf f(x) = F'(x).
• The cdf of X(k) is,
      G_k(x) = Pr(X(k) ≤ x) = \sum_{i=k}^{n} \binom{n}{i} F(x)^i (1 − F(x))^{n−i}
• Thus the pdf of X(k) is,
      g_k(x) = G_k'(x) = \sum_{i=k}^{n} i \binom{n}{i} F(x)^{i−1} (1 − F(x))^{n−i} f(x)
                         − \sum_{i=k}^{n} (n − i) \binom{n}{i} F(x)^i (1 − F(x))^{n−i−1} f(x)
• Note that
      i \binom{n}{i} = \frac{n!}{(i−1)!\,(n−i)!} = n \binom{n−1}{i−1}
  and similarly
      (n − i) \binom{n}{i} = \frac{n!}{i!\,(n−i−1)!} = n \binom{n−1}{i}
  which allows some cancelling of terms.
• For example, the second term (i = k + 1) of the first summation is,
      (k + 1) \binom{n}{k+1} F(x)^k (1 − F(x))^{n−k−1} f(x) = n \binom{n−1}{k} F(x)^k (1 − F(x))^{n−k−1} f(x)
• The first term (i = k) of the second summation is,
      (n − k) \binom{n}{k} F(x)^k (1 − F(x))^{n−k−1} f(x) = n \binom{n−1}{k} F(x)^k (1 − F(x))^{n−k−1} f(x)
• These cancel, and similarly the other terms do as well.
• Hence, the pdf simplifies to,
      g_k(x) = k \binom{n}{k} F(x)^{k−1} (1 − F(x))^{n−k} f(x) = \frac{n!}{(k−1)!\,(n−k)!} F(x)^{k−1} (1 − F(x))^{n−k} f(x)
• Special cases: minimum and maximum,
      g_1(x) = n (1 − F(x))^{n−1} f(x),   Pr(X(1) > x) = (1 − F(x))^n
      g_n(x) = n F(x)^{n−1} f(x),         Pr(X(n) ≤ x) = F(x)^n
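
As an illustrative check (a sketch of mine, not part of the slides), the triangular-distribution result Pr(X(4) ≤ 0.5) ≈ 0.0156 can be confirmed from the binomial form of the cdf above, and by simulation using the inverse transform X = √U to sample from f(x) = 2x:

    # exact: Pr(X(4) <= 0.5) = Pr(W >= 4), where W ~ Bi(5, F(0.5)) and F(0.5) = 0.25
    1 - pbinom(3, size = 5, prob = 0.25)       # 0.015625

    # simulation: 4th order statistic of samples of size 5 from f(x) = 2x on (0, 1)
    set.seed(2)                                # arbitrary seed
    x4 <- replicate(1e5, sort(sqrt(runif(5)))[4])
    mean(x4 <= 0.5)                            # should be close to 0.0156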

Alternative derivation of the pdf of X(k)
• Heuristically, consider Pr(X(k) ≈ x) = Pr(x − ½dy < X(k) ≤ x + ½dy): this requires k − 1 of the Xi to fall below x (each with probability F(x)), one to fall in the small interval around x (with probability ≈ f(x) dy), and the remaining n − k to fall above x (each with probability 1 − F(x)). Counting the ways of allocating the Xi to these three groups gives,
      g_k(x) dy ≈ \frac{n!}{(k − 1)! \, 1! \, (n − k)!} F(x)^{k−1} (1 − F(x))^{n−k} f(x) dy
• Dividing both sides by dy gives the pdf of X(k)
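
A further sketch (mine, not in the notes): the simplified density g_k can be checked numerically against simulation, for example for the median of n = 9 standard normal observations (k = 5):

    n <- 9; k <- 5
    g <- function(x) {
      factorial(n) / (factorial(k - 1) * factorial(n - k)) *
        pnorm(x)^(k - 1) * (1 - pnorm(x))^(n - k) * dnorm(x)
    }
    integrate(g, -Inf, Inf)                    # integrates to 1, as a density should

    set.seed(3)                                # arbitrary seed
    med <- replicate(1e4, sort(rnorm(n))[k])   # simulated 5th order statistics (sample medians)
    mean(med <= 0.5)                           # empirical Pr(X(5) <= 0.5)
    integrate(g, -Inf, 0.5)$value              # theoretical Pr(X(5) <= 0.5)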
Example (boundary estimate)
• X1, …, X4 ∼ Unif(0, θ)
• Likelihood is
      L(θ) = \prod_{i=1}^{4} (1/θ) = 1/θ^4,  if 0 ≤ xi ≤ θ for all i,
      L(θ) = 0,  otherwise (i.e. if θ < xi for some i)
• Maximised when θ is as small as possible, so θ̂ = max(Xi) = X(4)
• Now,
      g_4(x) = 4 (x/θ)^3 (1/θ) = 4x^3/θ^4,  0 ≤ x ≤ θ
  and
      E(X(4)) = \int_0^θ x \cdot 4x^3/θ^4 \, dx = [4x^5/(5θ^4)]_0^θ = (4/5)θ
• So the MLE X(4) is biased
• (But (5/4) X(4) is unbiased)
• Deriving a one-sided CI for θ based on X(4):
  1. For a given 0 < α < 1, …
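
A sketch of how this derivation can be completed using g4(x) above (this completion is a sketch of mine, not necessarily how the slides proceed):

      Pr(X(4) ≤ cθ) = \int_0^{cθ} \frac{4x^3}{θ^4} \, dx = c^4,   for 0 < c < 1.

Taking c = α^{1/4} gives Pr(α^{1/4} θ < X(4) ≤ θ) = 1 − α, and rearranging the event,

      Pr(X(4) ≤ θ < α^{−1/4} X(4)) = 1 − α,

so (X(4), α^{−1/4} X(4)) is a one-sided 100(1 − α)% confidence interval for θ.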
2 Quantiles

2.3 Confidence intervals for quantiles
• For example, for the median m of a sample of size n = 9, Pr(X(2) < m < X(8)) = Pr(2 ≤ W ≤ 7), where W ∼ Bi(9, 0.5) is the number of observations below m. In R:
> pbinom(7, size = 9, prob = 0.5) -
+   pbinom(1, size = 9, prob = 0.5)
[1] 0.9609375
• So a 96.1% confidence interval for m is (19.0, 30.1)
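
A small sketch (mine, not from the slides) of how the coverage attached to other pairs of order statistics could be tabulated in R for n = 9, using the same binomial argument:

    n <- 9
    coverage <- function(i, j, p = 0.5) pbinom(j - 1, n, p) - pbinom(i - 1, n, p)
    coverage(2, 8)   # 0.9609, the interval (X(2), X(8)) used above
    coverage(3, 7)   # 0.8203, a narrower interval with lower confidence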
Confidence intervals for arbitrary quantiles
• Argument can be extended to any quantile and any order statistics
• For example, for the ith and jth order statistics,
      1 − α = Pr(X(i) < π_p < X(j)) = Pr(i ≤ W ≤ j − 1) = \sum_{k=i}^{j−1} \binom{n}{k} p^k (1 − p)^{n−k}
  where W ∼ Bi(n, p) is the number of observations below π_p
• For example, with n = 5, i = 1 and j = 5 for the median (p = 0.5),
      1 − α = Pr(X(1) < m < X(5)) = 1 − 0.5^5 − 0.5^5 = 15/16 ≈ 0.94
• Need to use computed binomial probabilities (e.g. R) to determine i and j
• Or use the normal approximation to the binomial
• Note that these confidence intervals do not arise from pivots and cannot achieve 95% confidence exactly

Example (income distribution)
• Incomes (in $100's) for a sample of 27 people, in ascending order:
      161, 169, 171, 174, 179, 180, 183, 184, 186, 187, 192, 193, 196, 200,
      204, 205, 213, 221, 222, 229, 241, 243, 256, 264, 291, 317, 376
• Want to estimate the first quartile, π_{0.25}
• W is the number of the X's below π_{0.25}
• W ∼ Bi(27, 0.25) ≈ N(μ = 27/4 = 6.75, σ^2 = 81/16)
• This gives
      Pr(X(4) < π_{0.25} < X(10)) = Pr(4 ≤ W ≤ 9)
                                  = Pr(3.5 < W < 9.5)   (continuity correction)
                                  ≈ Φ((9.5 − 6.75) / (9/4)) − Φ((3.5 − 6.75) / (9/4))
                                  = 0.815
• So ($17 400, $18 700) is an 81.5% CI for the first quartile

3 Resampling methods

Resampling
• What if the maths is too hard?
• Try a resampling method
• Replaces mathematical derivation with brute-force computation
• Used for approximating sampling distributions, standard errors, bias, etc.
• Sometimes they work brilliantly, sometimes not at all
• Most popular resampling method: the bootstrap
• Basic idea:
    – Use the sample cdf as an approximation to the true cdf
    – Simulate new data from the sample cdf
    – Equivalent to sampling with replacement from the actual data
• Use these bootstrap samples to infer sampling distributions of statistics of interest
• This is an advanced topic
• Only a 'taster' is presented…
• …in the lab (week 11)
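
To give a flavour of the idea, a minimal sketch (mine, not the week 11 lab material): bootstrapping the standard error of a sample median by resampling the data with replacement.

    set.seed(4)             # arbitrary seed
    x <- rnorm(30)          # stand-in for some observed data
    B <- 2000               # number of bootstrap resamples
    boot_med <- replicate(B, median(sample(x, replace = TRUE)))
    sd(boot_med)            # bootstrap estimate of se(median)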