
Order statistics, quantiles & resampling
(Module 10)
Statistics (MAST20005) & Elements of Statistics (MAST90058)
School of Mathematics and Statistics, University of Melbourne


Semester 2, 2022

Order statistics
    Introduction
    Sampling distribution
Quantiles
    Definitions
    Asymptotic distribution
    Confidence intervals for quantiles
Resampling methods

Aims of this module
• Go back to order statistics and sample quantiles
• More detailed definitions
• Derive sampling distributions and construct confidence intervals
• See examples of CIs that are not of the form $\hat\theta \pm \mathrm{se}(\hat\theta)$
• Learn some more distribution-free methods
• See how to use computation to avoid mathematical derivations

Unifying theme
• Use the data ‘directly’ rather than via assumed distributions
• Use the sample cdf and related summaries (such as order statistics)


Definition (recap)
• Sample: X1,…,Xn
• Arrange them in increasing order:
    $X_{(1)}$ = smallest of the $X_i$
    $X_{(2)}$ = 2nd smallest of the $X_i$
    ⋮
    $X_{(n)}$ = largest of the $X_i$
• These are called the order statistics:
    $$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$$
• $X_{(k)}$ is called the kth order statistic of the sample
• $X_{(1)}$ is the minimum or sample minimum
• $X_{(n)}$ is the maximum or sample maximum
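As a minimal sketch, computing the order statistics of a sample amounts to sorting it. Python is used here for illustration (the notes themselves use R); the function names are hypothetical, and k is 1-based to match the notation $X_{(k)}$.

```python
import random

def order_statistics(sample):
    """Return the order statistics X(1) <= X(2) <= ... <= X(n) of a sample."""
    return sorted(sample)

def kth_order_statistic(sample, k):
    """Return X(k), the kth smallest value (k is 1-based, as in the notes)."""
    if not 1 <= k <= len(sample):
        raise ValueError("k must satisfy 1 <= k <= n")
    return sorted(sample)[k - 1]

random.seed(1)
x = [random.gauss(0, 1) for _ in range(9)]  # one iid N(0,1) sample, n = 9
xs = order_statistics(x)
print("min =", xs[0], "median =", kth_order_statistic(x, 5), "max =", xs[-1])
```

With n = 9, $X_{(5)}$ is the sample median, and $X_{(1)}$ and $X_{(9)}$ are the sample minimum and maximum.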

Motivating example
• Take iid samples $X \sim N(0,1)$ of size $n = 9$
• What can we say about the order statistics, $X_{(k)}$?
• Simulated values:
          [,1]  [,2]  [,3]  [,4]  [,5]
     [1,] -0.76 -1.94 -1.32 -0.85 -1.96   <-- Minimum
     [2,] -0.32 -0.17 -0.53 -0.30 -0.98
     [3,] -0.23  0.06 -0.44  0.14 -0.83
     [4,]  0.05  0.18 -0.10  0.25 -0.63
     [5,]  0.08  0.76  0.17  0.35 -0.47   <-- Median
     [6,]  0.18  0.96  0.26  0.68  0.05
     [7,]  0.27  1.07  0.60  0.69  0.34
     [8,]  0.73  1.42  0.66  1.13  1.26
     [9,]  0.91  1.77  1.93  1.98  1.26   <-- Maximum

[Figure: Standard normal distribution, n = 9. Probability densities of the order statistics k = 1 (min), k = 5 (median) and k = 9 (max).]

Example (triangular distribution)
• Random sample: $X_1, \dots, X_5$ with pdf $f(x) = 2x$, $0 < x < 1$
• Calculate $\Pr(X_{(4)} \le 0.5)$
• This occurs if at least four of the $X_i$ are less than 0.5:
    $$\Pr(X_{(4)} \le 0.5) = \Pr(\text{at least 4 } X_i\text{'s less than } 0.5) = \Pr(\text{exactly 4 } X_i\text{'s less than } 0.5) + \Pr(\text{exactly 5 } X_i\text{'s less than } 0.5)$$
• This is a binomial with 5 trials and probability of success given by
    $$\Pr(X_i \le 0.5) = \int_0^{0.5} 2x\,dx = \left[x^2\right]_0^{0.5} = 0.5^2 = 0.25$$
• So we have
    $$\Pr(X_{(4)} \le 0.5) = \binom{5}{4} 0.25^4\, 0.75 + 0.25^5 = 0.0156$$
• More generally we have
    $$F(x) = \Pr(X_i \le x) = \int_0^x 2t\,dt = \left[t^2\right]_0^x = x^2$$
    $$G(x) = \Pr(X_{(4)} \le x) = \binom{5}{4} (x^2)^4 (1 - x^2) + (x^2)^5$$
• Taking derivatives gives the pdf
    $$g(x) = G'(x) = 4\binom{5}{4} (x^2)^3 (1 - x^2)(2x) = 4\binom{5}{4} F(x)^3 (1 - F(x)) f(x)$$
  since we know that $F(x) = x^2$.

[Figure: Triangular distribution, n = 5. Probability density of the order statistic k = 4.]

Distribution of X(k)
• Sample from a continuous distribution with cdf $F(x)$ and pdf $f(x) = F'(x)$.
• The cdf of $X_{(k)}$ is
    $$G_k(x) = \Pr(X_{(k)} \le x) = \sum_{i=k}^{n} \binom{n}{i} F(x)^i (1 - F(x))^{n-i}$$
• Thus the pdf of $X_{(k)}$ is
    $$\begin{aligned}
    g_k(x) = G_k'(x) &= \sum_{i=k}^{n} i\binom{n}{i} F(x)^{i-1} (1 - F(x))^{n-i} f(x) - \sum_{i=k}^{n-1} (n-i)\binom{n}{i} F(x)^{i} (1 - F(x))^{n-i-1} f(x) \\
    &= k\binom{n}{k} F(x)^{k-1} (1 - F(x))^{n-k} f(x) + \sum_{i=k+1}^{n} i\binom{n}{i} F(x)^{i-1} (1 - F(x))^{n-i} f(x) \\
    &\quad - \sum_{i=k}^{n-1} (n-i)\binom{n}{i} F(x)^{i} (1 - F(x))^{n-i-1} f(x)
    \end{aligned}$$
• Note that
    $$i\binom{n}{i} = \frac{n!}{(i-1)!\,(n-i)!} = n\binom{n-1}{i-1}$$
  and similarly
    $$(n-i)\binom{n}{i} = \frac{n!}{i!\,(n-i-1)!} = n\binom{n-1}{i}$$
  which allows some cancelling of terms.
• For example, the first term of the first summation ($i = k + 1$) is
    $$(k+1)\binom{n}{k+1} F(x)^{k} (1 - F(x))^{n-k-1} f(x) = n\binom{n-1}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x)$$
• The first term of the second summation ($i = k$) is
    $$(n-k)\binom{n}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x) = n\binom{n-1}{k} F(x)^{k} (1 - F(x))^{n-k-1} f(x)$$
• These cancel, and similarly the other terms do as well.
• Hence, the pdf simplifies to
    $$g_k(x) = k\binom{n}{k} F(x)^{k-1} (1 - F(x))^{n-k} f(x)$$
• Special cases: minimum and maximum,
    $$g_1(x) = n(1 - F(x))^{n-1} f(x), \qquad \Pr(X_{(1)} > x) = (1 - F(x))^n$$
    $$g_n(x) = nF(x)^{n-1} f(x), \qquad \Pr(X_{(n)} \le x) = F(x)^n$$
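The two formulas for $X_{(k)}$ can be checked numerically. The sketch below (hypothetical helper names, Python stdlib only) evaluates the binomial-sum cdf $G_k$ and the simplified pdf $g_k$, reproduces the triangular-distribution result $\Pr(X_{(4)} \le 0.5) = 0.0156$, and confirms that $g_k$ matches a finite-difference derivative of $G_k$ at one illustrative point, $x = 0.7$.

```python
from math import comb

def order_stat_cdf(Fx, k, n):
    """G_k(x) = sum_{i=k}^{n} C(n,i) F(x)^i (1 - F(x))^(n-i), given Fx = F(x)."""
    return sum(comb(n, i) * Fx**i * (1 - Fx)**(n - i) for i in range(k, n + 1))

def order_stat_pdf(Fx, fx, k, n):
    """g_k(x) = k C(n,k) F(x)^(k-1) (1 - F(x))^(n-k) f(x)."""
    return k * comb(n, k) * Fx**(k - 1) * (1 - Fx)**(n - k) * fx

# Triangular example: f(x) = 2x and F(x) = x^2 on (0, 1), with n = 5, k = 4
F = lambda x: x**2
f = lambda x: 2 * x

p = order_stat_cdf(F(0.5), k=4, n=5)
print(p)  # 0.015625, i.e. about 0.0156

# Central finite difference of G_4 versus the closed-form pdf g_4 at x0
x0, h = 0.7, 1e-6
numeric = (order_stat_cdf(F(x0 + h), 4, 5) - order_stat_cdf(F(x0 - h), 4, 5)) / (2 * h)
exact = order_stat_pdf(F(x0), f(x0), 4, 5)
print(numeric, exact)
```

Setting k = n or k = 1 in `order_stat_cdf` recovers the special cases $F(x)^n$ and $1 - (1 - F(x))^n$.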

Alternative derivation of the pdf of X(k)
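• Heuristically,
    $$\Pr(X_{(k)} \approx x) = \Pr\left(x - \tfrac{1}{2}dy < X_{(k)} < x + \tfrac{1}{2}dy\right) \approx g_k(x)\,dy$$
• This requires $k - 1$ of the $X_i$ to fall below $x - \tfrac12 dy$, one to fall in $(x - \tfrac12 dy,\ x + \tfrac12 dy)$, and $n - k$ to fall above $x + \tfrac12 dy$, where
    $$\Pr(X_i < x - \tfrac12 dy) \approx F(x), \quad \Pr(x - \tfrac12 dy < X_i < x + \tfrac12 dy) \approx f(x)\,dy, \quad \Pr(X_i > x + \tfrac12 dy) \approx 1 - F(x)$$
• Putting these together (a multinomial probability),
    $$g_k(x)\,dy \approx \frac{n!}{(k-1)!\,1!\,(n-k)!} F(x)^{k-1} (1 - F(x))^{n-k} f(x)\,dy$$
• Dividing both sides by $dy$ gives the pdf of $X_{(k)}$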
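The heuristic $\Pr(x - \tfrac12 dy < X_{(k)} < x + \tfrac12 dy) \approx g_k(x)\,dy$ can also be checked by simulation. This is a sketch under illustrative assumptions: it reuses the triangular distribution ($f(x) = 2x$, drawn by the inverse cdf $X = \sqrt{U}$), with hypothetical choices $n = 5$, $k = 4$, $x = 0.7$ and $dy = 0.02$; the helper names are also hypothetical.

```python
import math
import random

def g_k_multinomial(Fx, fx, k, n):
    """Pdf from the multinomial argument:
    n! / ((k-1)! 1! (n-k)!) * F(x)^(k-1) * (1 - F(x))^(n-k) * f(x)."""
    coef = math.factorial(n) // (math.factorial(k - 1) * math.factorial(n - k))
    return coef * Fx**(k - 1) * (1 - Fx)**(n - k) * fx

def mc_density(k, n, x, dy, reps, rng):
    """Monte Carlo estimate of Pr(x - dy/2 < X(k) < x + dy/2) / dy for samples
    from f(x) = 2x on (0, 1), generated by the inverse cdf X = sqrt(U)."""
    hits = 0
    for _ in range(reps):
        sample = sorted(math.sqrt(rng.random()) for _ in range(n))
        if x - dy / 2 < sample[k - 1] < x + dy / 2:
            hits += 1
    return hits / (reps * dy)

rng = random.Random(2022)
k, n, x = 4, 5, 0.7
est = mc_density(k, n, x, dy=0.02, reps=200_000, rng=rng)
theory = g_k_multinomial(x**2, 2 * x, k, n)
print(f"simulated {est:.3f} vs formula {theory:.3f}")
```

The simulated value carries Monte Carlo error (roughly ±0.02 here), so only approximate agreement should be expected.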

Example (boundary estimate)
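• $X_1, \dots, X_4 \sim \mathrm{Unif}(0, \theta)$
• Likelihood is
    $$L(\theta) = \begin{cases} \left(\dfrac{1}{\theta}\right)^4 & \text{if } 0 \le x_i \le \theta \text{ for all } i \\ 0 & \text{otherwise (i.e. if } \theta < x_i \text{ for some } i\text{)} \end{cases}$$
• Maximised when $\theta$ is as small as possible, so $\hat\theta = \max(X_i) = X_{(4)}$
• The pdf of $X_{(4)}$ is
    $$g_4(x) = 4\left(\frac{x}{\theta}\right)^3 \frac{1}{\theta} = \frac{4x^3}{\theta^4}, \quad 0 \le x \le \theta$$
  so
    $$E(X_{(4)}) = \int_0^\theta x\,\frac{4x^3}{\theta^4}\,dx = \left[\frac{4x^5}{5\theta^4}\right]_0^\theta = \frac{4}{5}\theta$$
• So the MLE $X_{(4)}$ is biased
• (But $\frac{5}{4}X_{(4)}$ is unbiased)

[Figure: Uniform distribution, n = 4. Probability density of the order statistic k = 4 (max).]

• Deriving a one-sided CI for $\theta$ based on $X_{(4)}$:
    1. For a given 0 …

    > pbinom(7, size = 9, prob = 0.5) -
    + pbinom(1, size = 9, prob = 0.5)
    [1] 0.9609375

• So a 96.1% confidence interval for m is (19.0, 30.1)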
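The R call above can be reproduced exactly. This sketch assumes, as `size = 9` and the bounds 1 and 7 suggest, that n = 9 and that W ~ Bi(9, 0.5) counts the observations below the median m, so the interval is $(X_{(2)}, X_{(8)})$ with level $\Pr(2 \le W \le 7)$. It uses `fractions.Fraction` to keep the arithmetic exact; `binom_cdf` is a hypothetical stand-in for R's `pbinom`.

```python
from fractions import Fraction
from math import comb

def quantile_ci_level(n, i, j, p):
    """Exact Pr(X(i) < pi_p < X(j)) = Pr(i <= W <= j - 1) with W ~ Bi(n, p).
    Passing p as a Fraction keeps the result exact."""
    return sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(i, j))

def binom_cdf(w, n, p):
    """Pr(W <= w) for W ~ Bi(n, p), playing the role of R's pbinom(w, n, p)."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range(w + 1))

half = Fraction(1, 2)
level = quantile_ci_level(9, 2, 8, half)
print(level, float(level))  # 123/128 0.9609375
print(float(binom_cdf(7, 9, half) - binom_cdf(1, 9, half)))  # 0.9609375
```

Both routes give $492/512 = 123/128$, matching the R output.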

Confidence intervals for arbitrary quantiles
• The argument can be extended to any quantile and any pair of order statistics
• For example, using the ith and jth order statistics, with $W$ the number of observations below $\pi_p$ (so $W \sim \mathrm{Bi}(n, p)$),
    $$1 - \alpha = \Pr(X_{(i)} < \pi_p < X_{(j)}) = \Pr(i \le W \le j - 1) = \sum_{k=i}^{j-1} \binom{n}{k} p^k (1 - p)^{n-k}$$

Example (income distribution)
• Incomes (in $100s) for a sample of 27 people, in ascending order:
    161, 169, 171, 174, 179, 180, 183, 184, 186, 187, 192, 193, 196, 200,
    204, 205, 213, 221, 222, 229, 241, 243, 256, 264, 291, 317, 376
• Want to estimate the first quartile, $\pi_{0.25}$
• $W$ is the number of the $X$'s below $\pi_{0.25}$
• $W \sim \mathrm{Bi}(27, 0.25) \approx N(\mu = 27/4 = 6.75,\ \sigma^2 = 81/16)$
• This gives
    $$\begin{aligned}
    \Pr(X_{(4)} < \pi_{0.25} < X_{(10)}) &= \Pr(4 \le W \le 9) \\
    &= \Pr(3.5 < W < 9.5) \quad \text{(continuity correction)} \\
    &\approx \Phi\left(\frac{9.5 - 6.75}{9/4}\right) - \Phi\left(\frac{3.5 - 6.75}{9/4}\right) = 0.815
    \end{aligned}$$
• So ($17 400, $18 700) is an approximate 81.5% CI for the first quartile

Resampling
• What if the maths is too hard?
• Try a resampling method
    ◦ Replaces mathematical derivation with brute-force computation
    ◦ Used for approximating sampling distributions, standard errors, bias, etc.
    ◦ Sometimes works brilliantly, sometimes not at all
• Most popular resampling method: the bootstrap
• Basic idea:
    ◦ Use the sample cdf as an approximation to the true cdf
    ◦ Simulate new data from the sample cdf
    ◦ Equivalent to sampling with replacement from the actual data
• Use these bootstrap samples to infer sampling distributions of statistics of interest
• This is an advanced topic
• Only a ‘taster’ is presented...
• ...in the lab (week 11)
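For the income example, the sketch below computes the exact binomial level of $(X_{(4)}, X_{(10)})$ alongside the continuity-corrected normal approximation used above, then, as a taster of resampling, a basic percentile bootstrap interval for the first quartile. Assumptions to flag: the sample quartile is taken as $X_{(\lceil np \rceil)}$, a hypothetical choice (one of several quantile definitions), and the percentile bootstrap with B = 2000 resamples is one simple variant, not necessarily the version used in the lab. All helper names are hypothetical.

```python
import math
import random

incomes = [161, 169, 171, 174, 179, 180, 183, 184, 186, 187, 192, 193, 196, 200,
           204, 205, 213, 221, 222, 229, 241, 243, 256, 264, 291, 317, 376]  # $100s

def binom_pmf(w, n, p):
    return math.comb(n, w) * p**w * (1 - p)**(n - w)

def exact_level(n, i, j, p):
    """Pr(X(i) < pi_p < X(j)) = Pr(i <= W <= j - 1), W ~ Bi(n, p)."""
    return sum(binom_pmf(w, n, p) for w in range(i, j))

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_level(n, i, j, p):
    """Normal approximation to Pr(i <= W <= j - 1) with continuity correction."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    return Phi((j - 0.5 - mu) / sd) - Phi((i - 0.5 - mu) / sd)

def sample_quantile(data, p):
    """Hypothetical choice for illustration: X(ceil(n p))."""
    s = sorted(data)
    return s[math.ceil(len(s) * p) - 1]

def bootstrap_percentile_ci(data, stat, B=2000, alpha=0.05, seed=1):
    """Resample with replacement B times (i.e. sample from the sample cdf),
    then take the alpha/2 and 1 - alpha/2 percentiles of the statistic."""
    rng = random.Random(seed)
    boots = sorted(stat(rng.choices(data, k=len(data))) for _ in range(B))
    return boots[int(alpha / 2 * B)], boots[int((1 - alpha / 2) * B) - 1]

n, p = len(incomes), 0.25
xs = sorted(incomes)
ci = (xs[4 - 1], xs[10 - 1])          # (X(4), X(10)) = (174, 187)
exact = exact_level(n, 4, 10, p)      # exact binomial level, about 0.820
approx = normal_level(n, 4, 10, p)    # normal approximation, about 0.815
boot_ci = bootstrap_percentile_ci(incomes, lambda d: sample_quantile(d, p))
print(ci, round(exact, 3), round(approx, 3), boot_ci)
```

The exact level (about 0.820) is close to the normal approximation of 0.815. The bootstrap interval targets a nominal 95% level and depends on the random seed, so it is not directly comparable to the 81.5% interval.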