
The University of Sydney, School of Mathematics and Statistics
Solutions to Tutorial Week 11
STAT3023: Statistical Inference
Lecturers: and


Semester 2, 2021
1. Suppose $X \sim B(n,\theta)$ and that $\tilde d(X)$ is the Bayes procedure based on a $U[\theta_0,\theta_1]$ prior under squared-error loss. Suppose also that for all $\theta_0 < \theta < \theta_1$,
$$\lim_{n\to\infty} n E_\theta\big[\big(\tilde d(X)-\theta\big)^2\big] = \theta(1-\theta).$$
Use the Asymptotic Minimax Lower Bound Theorem to show that the maximum likelihood estimator of $\theta$ is asymptotically minimax (over any interval $[a,b]$ for $0 < a < b < 1$).
squared-error loss. Suppose also that for all θ0 < θ < θ1, 􏰎􏰔 ̃ 􏰕2􏰏 lim nEθ d(X)−θ →θ(1−θ). n→∞ Use the Asymptotic Minimax Lower Bound Theorem to show that the maximum likelihood esti- mator of θ is asymptotically minimax (over any interval [a, b] for 0 < a < b < 1). Solution: Using the information above, according to the AMLB theorem, for any procedure (sequence) {dn(·)} and 0 < a < b < 1, lim max nE 􏰂[d (X)−θ]2􏰃≥ max θ(1−θ). n→∞ a≤θ≤b θ n a≤θ≤b The maximum likelihood estimator θˆML = X/n is unbiased so the risk is just the variance: 􏰎􏰔ˆ 􏰕2􏰏 􏰊ˆ 􏰋 􏰌X􏰍 1 nθ(1−θ) θ(1−θ) Eθ θML − θ = Varθ θML = Varθ n = n2 Varθ(X) = n2 = n Thus the maximum (rescaled) risk over [a, b] ⊂ [0, 1] is 􏰎􏰔ˆ 􏰕2􏰏 􏰌θ(1−θ)􏰍 maxnEθ θML−θ = maxn = maxθ(1−θ) a≤θ≤b a≤θ≤b n a≤θ≤b which attains the lower bound (1) above. Thus θˆML is asymptotically minimax. 2. Suppose X = (X1, . . . , Xn) consists of iid random variables with a gamma distribution with known shape α0 but unknown scale parameter θ = Θ = (0, ∞). Consider the decision problem where the decision space is D = Θ and loss is L(d|θ) = (d − θ)2. Write T = 􏰑ni=1 Xi. (a) Define the family of estimators {dkl(·): k,l ∈ R} according to dkl(X)= T+k . R(θ|d )=E 􏰂[d (X)−θ]2􏰃. Solution: Firstly, T has a gamma distribution with shape parameter nα0 and scale parameter θ. Thus Determine the risk Eθ (T) = nα0θ; Varθ (T) = nα0θ2 . Eθ [dkl(X)] = Eθ(T) + k = nα0θ + k ; nα0 + l nα0 + l Biasθ[dkl(X)]=Eθ[dkl(X)]−θ=nα0θ+k−θ(nα0+l)= k−lθ ; nα0 + l nα0 + l Varθ(T) nα0θ2 Varθ [dkl(X)] = (nα0 + l)2 = (nα0 + l)2 . Copyright © 2021 The University of Sydney 1 Thus the risk is 􏰂 2􏰃 2 nα0θ2 + (k − lθ)2 R(θ|dkl) = Eθ [dkl(X) − θ] = Varθ [dkl(X)] + {Biasθ [dkl(X)]} = (nα0 + l)2 . (b) Determine dflat(X), the Bayes procedure using the “flat prior” w(θ) ≡ 1. Solution: The likelihood is 􏰝n 􏰖Xα0−1e−Xi/θ 􏰗 i=1 θα0 Γ(α0) e−T/θ θnα0 Tnα0−1e−T/θ θ(nα0−1)+1Γ(nα0 − 1) so the posterior is the Inverse Gamma(nα0 − 1, T ) distribution (this is the distribution of 1/Y where Y is gamma with shape nα0 − 1 and rate T ). The Bayes procedure is thus the posterior mean (since we are using squared-error loss), which is dflat(X) = T nα0 − 2 (c) Show that for any k,l ∈ R, dkl(X) is asymptotically minimax. You may assume that for 􏰓 any 0 ≤ θ0 < θ1 < ∞, the Bayes procedure d(X) based on the U [θ0, θ1] prior has the same limiting (rescaled) risk as dflat(X): for all θ0 < θ < θ1, lim nR(θ|d)= lim nR(θ|dflat). (2) n→∞ 􏰓 n→∞ Solution: First we need to determine the RHS of (2) above. Note that dflat(X) is a special case of dkl(X) examined in part (a) above, corresponding to k = 0 and l = −2. Therefore we can read the exact risk from that part as so as n → ∞, R(θ|dkl) = nα0θ2 + (2θ)2 (nα0 − 2)2 n2α0θ2 4nθ2 nR(θ|dflat) = (nα0 − 2)2 + (nα0 − 2)2 n2α0θ2 4nθ2 = 􏰊 􏰋2+ 􏰊 􏰋2 n2α2 1− 2 0 nα0 n2α2 1− 2 0 nα0 4θ2 α0 1− 2 nα02 1− 2 → θ2 →0 α0 =􏰊 􏰋2+􏰊 􏰋2 􏰦 􏰥􏰤 􏰧􏰦 􏰥􏰤 􏰧 So for any procedure (sequence) {dn(·)}, by the AMLB Theorem (and the assumption (2) given above), lim max nR(θ|dn) ≥ max S(θ) = . (3) n→∞ a≤θ≤b a≤θ≤b α0 Finally, we need to derive the limiting maximum (rescaled) risk of dkl: max nR(θ|dkl) = max n 􏰎nα0θ2 +(k−lθ)2􏰏 2 a≤θ≤b ≤ max n2α0θ2 n(k − lθ)2 2 + max 2 α0 􏰄α0 + l􏰅2 + 1 max (k − lθ)2 n􏰄α0 + l􏰅2 a≤θ≤b a≤θ≤b (nα0 + l) a≤θ≤b (nα0 + l) 􏰦 􏰥􏰤 􏰧 􏰦 􏰥􏰤 􏰧 which attains the lower bound (3) above. Thus for each fixed k,l, dkl is asymptotically minimax. (d) Show that (i) the maximum likelihood estimator; (ii) dflat (X); (iii) any Bayes procedure based on an Inverse Gamma (conjugate) prior are all asymptotically minimax. 
(d) Show that (i) the maximum likelihood estimator; (ii) $d_{flat}(X)$; (iii) any Bayes procedure based on an Inverse Gamma (conjugate) prior are all asymptotically minimax.

Solution: The derivative of the log-likelihood with respect to $\theta$ is
$$l'(\theta; X) = -\frac{n\alpha_0}{\theta} + \frac{T}{\theta^2};$$
setting this equal to zero and solving gives
$$\hat\theta_{ML} = \frac{T}{n\alpha_0} = \frac{\bar X}{\alpha_0}.$$
For any fixed $\gamma_0, \lambda_0 > 0$, taking as prior the Inverse Gamma$(\gamma_0,\lambda_0)$ density
$$w(\theta) = \frac{\lambda_0^{\gamma_0}\, e^{-\lambda_0/\theta}}{\theta^{\gamma_0+1}\,\Gamma(\gamma_0)},$$
the product of the prior and the likelihood is
$$w(\theta) f_\theta(X) = \mathrm{const.} \times \frac{e^{-(T+\lambda_0)/\theta}}{\theta^{n\alpha_0+\gamma_0+1}} = \mathrm{const.} \times \frac{(T+\lambda_0)^{n\alpha_0+\gamma_0}\, e^{-(T+\lambda_0)/\theta}}{\theta^{n\alpha_0+\gamma_0+1}\,\Gamma(n\alpha_0+\gamma_0)},$$
so the posterior density is the Inverse Gamma$(n\alpha_0+\gamma_0,\, T+\lambda_0)$ density. The corresponding Bayes procedure is the posterior mean, which is
$$\frac{T+\lambda_0}{n\alpha_0+\gamma_0-1}.$$
Note that this, $d_{flat}(X)$ and $\hat\theta_{ML}$ are all special cases of $d_{kl}(X)$ and thus by the previous part are all asymptotically minimax.
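To make this concrete, each estimator can be identified with its $(k,l)$ pair: the MLE is $(0,0)$, $d_{flat}$ is $(0,-2)$, and the Inverse Gamma$(\gamma_0,\lambda_0)$ Bayes procedure is $(\lambda_0,\,\gamma_0-1)$. The following Python sketch tabulates their exact maximum rescaled risks over $[a,b]$ against the asymptotic bound $b^2/\alpha_0$; all numerical values are arbitrary illustrative choices.

```python
import numpy as np

def rescaled_max_risk(n, alpha0, a, b, k, l, grid=1001):
    """max over theta in [a, b] of n * R(theta | d_kl), using the exact
    risk formula from part (a)."""
    theta = np.linspace(a, b, grid)
    risk = (n * alpha0 * theta**2 + (k - l * theta) ** 2) / (n * alpha0 + l) ** 2
    return n * risk.max()

n, alpha0, a, b = 500, 2.0, 0.5, 2.0
gamma0, lam0 = 3.0, 1.0  # arbitrary Inverse Gamma prior parameters
for name, (k, l) in {"MLE": (0.0, 0.0),
                     "flat-prior Bayes": (0.0, -2.0),
                     "Inverse Gamma Bayes": (lam0, gamma0 - 1.0)}.items():
    print(name, rescaled_max_risk(n, alpha0, a, b, k, l))
print("asymptotic lower bound b^2/alpha0 =", b**2 / alpha0)
```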
3. The beta function is given by
$$\mathrm{beta}(\alpha,\beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
(where $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$ is the gamma function, satisfying $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$ for all $\alpha > 0$), and is the normalising constant in the beta$(\alpha,\beta)$ density
$$f_X(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{beta}(\alpha,\beta)} \quad \text{for } 0 < x < 1.$$
Suppose $X$ has the density $f_X(\cdot)$ above, and then define $Y = 1/X$.

(a) For $\alpha > 1$, determine $E(Y)$.
Solution:
$$E(Y) = E(X^{-1}) = \int_0^1 x^{-1}\,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{beta}(\alpha,\beta)}\,dx = \frac{\int_0^1 x^{(\alpha-1)-1}(1-x)^{\beta-1}\,dx}{\mathrm{beta}(\alpha,\beta)} = \frac{\mathrm{beta}(\alpha-1,\beta)}{\mathrm{beta}(\alpha,\beta)}$$
$$= \frac{\Gamma(\alpha-1)\Gamma(\beta)}{\Gamma(\alpha+\beta-1)}\cdot\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} = \frac{\Gamma(\alpha-1)}{\Gamma(\alpha)}\cdot\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta-1)} = \frac{\alpha+\beta-1}{\alpha-1},$$
using the property $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$ twice.
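The identity $E(1/X) = (\alpha+\beta-1)/(\alpha-1)$ can be verified by numerical integration; here is a small sketch assuming SciPy is available (the shape values are arbitrary, subject to $\alpha > 1$).

```python
import numpy as np
from scipy import integrate, special

def mean_inverse_beta(alpha, beta_):
    """E(1/X) for X ~ Beta(alpha, beta_), by numerical integration."""
    f = lambda x: (1 / x) * x**(alpha - 1) * (1 - x)**(beta_ - 1) / special.beta(alpha, beta_)
    val, _ = integrate.quad(f, 0, 1)
    return val

alpha, beta_ = 3.5, 2.0
print(mean_inverse_beta(alpha, beta_))    # numerical integral
print((alpha + beta_ - 1) / (alpha - 1))  # closed form (alpha+beta-1)/(alpha-1)
```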
(b) Determine the density of $Y$.
Solution: Note that since $0 < X < 1$, we have $1 < Y < \infty$. Using the "CDF method", the CDF of $Y$ is, for $y > 1$,
$$F_Y(y) = P(Y \le y) = P(X^{-1} \le y) = P(X \ge y^{-1}) = 1 - P(X < y^{-1}) = 1 - F_X(y^{-1}).$$
Differentiating with respect to $y$ gives the density
$$f_Y(y) = f_X(y^{-1})\,y^{-2} = \frac{(y^{-1})^{\alpha-1}(1-y^{-1})^{\beta-1}}{\mathrm{beta}(\alpha,\beta)}\,y^{-2} = \frac{(y-1)^{\beta-1}}{\mathrm{beta}(\alpha,\beta)\,y^{\alpha+\beta}} \quad \text{for } y > 1.$$

In the remaining parts, $X = (X_1,\dots,X_n)$ consists of iid random variables with $P_\theta(X_i = x) = (\theta-1)^{x-1}/\theta^x$ for $x = 1,2,\dots$ (a geometric distribution with mean $\theta$ and variance $\theta(\theta-1)$), where $\theta \in \Theta = (1,\infty)$; the loss is again $L(d|\theta) = (d-\theta)^2$ and $T = \sum_{i=1}^n X_i$, so that $E_\theta(T) = n\theta$ and $\mathrm{Var}_\theta(T) = n\theta(\theta-1)$. The family $d_{kl}(X) = (T+k)/(n+l)$ then has risk
$$R(\theta|d_{kl}) = \frac{n\theta(\theta-1) + (k-l\theta)^2}{(n+l)^2},$$
by the same bias-variance decomposition as in question 2(a). Under the flat prior $w(\theta) \equiv 1$ the posterior is proportional to $(\theta-1)^{T-n}/\theta^T$, which is of the form $f_Y(\cdot)$ above with $\alpha = n-1$ and $\beta = T-n+1$, so by part (a) the Bayes procedure is the posterior mean
$$d_{flat}(X) = \frac{T-1}{n-2};$$
for $n > 2$ this is finite (this corresponds to $\alpha > 1$ in the previous question).
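Before turning to the final part, here is a quick simulation check of the CDF $F_Y(y) = 1 - F_X(y^{-1})$ derived in part (b); it is a sketch assuming NumPy and SciPy, with arbitrary shape parameters and evaluation points.

```python
import numpy as np
from scipy import special

rng = np.random.default_rng(1)
alpha, beta_ = 3.0, 2.0
Y = 1.0 / rng.beta(alpha, beta_, size=500_000)  # Y = 1/X with X ~ Beta(alpha, beta_)

# Compare the empirical CDF of Y at a few points with the derived formula
# F_Y(y) = 1 - F_X(1/y), where F_X is the regularized incomplete beta function.
for y in [1.2, 1.5, 2.0, 4.0]:
    emp = np.mean(Y <= y)
    exact = 1.0 - special.betainc(alpha, beta_, 1.0 / y)
    print(y, emp, exact)
```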
(f) Show that
(i) the maximum likelihood estimator;
(ii) $d_{flat}(X)$;
(iii) any Bayes procedure based on a (conjugate) prior of the form
$$w(\theta) = \frac{(\theta-1)^{\beta_0-1}}{\mathrm{beta}(\alpha_0,\beta_0)\,\theta^{\alpha_0+\beta_0}}, \quad \text{for } \theta > 1 \tag{4}$$
are all asymptotically minimax. You may assume that for any $1 < \theta_0 < \theta_1 < \infty$, the Bayes procedure $\tilde d(X)$ based on the $U[\theta_0,\theta_1]$ prior has the same limiting (rescaled) risk as $d_{flat}(X)$: for all $\theta_0 < \theta < \theta_1$,
$$\lim_{n\to\infty} n R(\theta|\tilde d) = \lim_{n\to\infty} n R(\theta|d_{flat}).$$
Hint: determine the forms of all the estimators first.

Solution: The form of the MLE may be obtained by differentiating the log-likelihood $l(\theta; X) = (T-n)\log(\theta-1) - T\log\theta$, setting it to zero and solving:
$$l'(\theta; X) = \frac{T-n}{\theta-1} - \frac{T}{\theta} = T\left(\frac{1}{\theta-1} - \frac{1}{\theta}\right) - \frac{n}{\theta-1} = \frac{T\big[\theta-(\theta-1)\big]}{\theta(\theta-1)} - \frac{n}{\theta-1} = \frac{T-n\theta}{\theta(\theta-1)} = \frac{n}{\theta(\theta-1)}\left(\frac{T}{n} - \theta\right),$$
yielding $\hat\theta_{ML} = T/n = \bar X$, the sample mean; this also shows that the score function is in the "nice form" indicating that $\bar X$ is MVU in this case.

Using the conjugate prior $w(\theta)$ given at (4) above, the product of the likelihood and the prior is of the form
$$f_\theta(X)\,w(\theta) = \mathrm{const.} \times \frac{(\theta-1)^{T-n+\beta_0-1}}{\theta^{T+\alpha_0+\beta_0}} = \mathrm{const.} \times \frac{(\theta-1)^{(T-n+\beta_0)-1}}{\mathrm{beta}(n+\alpha_0,\,T-n+\beta_0)\,\theta^{(T-n+\beta_0)+(n+\alpha_0)}},$$
so the posterior density is that of $1/Y$ where $Y$ has a beta$(n+\alpha_0,\,T-n+\beta_0)$ distribution. According to question 3 part (a), the mean of this distribution (which is also the posterior mean, i.e. the Bayes procedure) is
$$\frac{T+\alpha_0+\beta_0-1}{n+\alpha_0-1}.$$

Note that all estimators of this form, the MLE $\hat\theta_{ML}$ and $d_{flat}(X)$ are all special cases of $d_{kl}(X)$ defined above. It thus suffices to show that $d_{kl}(X)$ is asymptotically minimax for each $k,l$. Firstly, for any $1 < \theta_0 < \theta_1 < \infty$, the Bayes procedure $\tilde d(X)$ based on the $U[\theta_0,\theta_1]$ prior has limiting risk equal to
$$\lim_{n\to\infty} n R(\theta|\tilde d) = \lim_{n\to\infty} n R(\theta|d_{flat}).$$
But since $d_{flat}(X)$ is a special case of $d_{kl}(X)$ (with $k=-1$, $l=-2$), we can use the risk formula above to determine the limiting (rescaled) risk:
$$\lim_{n\to\infty} n R(\theta|d_{flat}) = \lim_{n\to\infty} n\,\frac{n\theta(\theta-1)+(2\theta-1)^2}{(n-2)^2} = \theta(\theta-1)\lim_{n\to\infty}\left(\frac{n}{n-2}\right)^2 + (2\theta-1)^2\lim_{n\to\infty}\frac{n}{(n-2)^2} = \theta(\theta-1).$$
Note also that the contribution from the bias (the $(2\theta-1)^2/(n-2)^2$ term) is "asymptotically negligible" compared to the variance contribution. Thus we have
$$\lim_{n\to\infty} n R(\theta|\tilde d) = \theta(\theta-1) = S(\theta)$$
for all $\theta_0 < \theta < \theta_1$. Thus according to the Asymptotic Minimax Lower Bound Theorem, for any other procedure (sequence) $\{d_n(\cdot)\}$, for any $1 < a < b$,
$$\lim_{n\to\infty} \max_{a\le\theta\le b} n R(\theta|d_n) \ge \max_{a\le\theta\le b} S(\theta) = \max_{a\le\theta\le b} \theta(\theta-1) = b(b-1),$$
since $S(\theta)$ is an increasing function for $\theta > 1$. Now, the maximum risk of $d_{kl}(X)$ satisfies
$$\max_{a\le\theta\le b} R(\theta|d_{kl}) \le \frac{n}{(n+l)^2}\max_{a\le\theta\le b}\theta(\theta-1) + \frac{1}{(n+l)^2}\max_{a\le\theta\le b}(k-l\theta)^2.$$
The term $(k-l\theta)^2$ is a parabola in $\theta$ with a positive coefficient of $\theta^2$ (for $l \neq 0$; when $l = 0$ it is the constant $k^2$), so it takes its maximum value over $a \le \theta \le b$ at one of the endpoints. So
$$\lim_{n\to\infty} \max_{a\le\theta\le b} n R(\theta|d_{kl}) \le \max_{a\le\theta\le b}\theta(\theta-1)\underbrace{\lim_{n\to\infty}\left(\frac{n}{n+l}\right)^2}_{=\,1} + \max\big\{(k-al)^2,\,(k-bl)^2\big\}\underbrace{\lim_{n\to\infty}\frac{n}{(n+l)^2}}_{=\,0} = \max_{a\le\theta\le b}\theta(\theta-1) = \max_{a\le\theta\le b} S(\theta).$$
However, this upper bound coincides with the lower bound above, which holds for any estimator. Therefore for each $k,l$, $d_{kl}(X)$ is asymptotically minimax, and thus each of the estimators above is too.
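As with question 2, this conclusion can be illustrated numerically: the MLE corresponds to $(k,l) = (0,0)$, $d_{flat}$ to $(-1,-2)$, and the conjugate Bayes procedure to $(\alpha_0+\beta_0-1,\,\alpha_0-1)$, and each maximum rescaled risk over $[a,b]$ approaches $b(b-1)$. A minimal Python sketch follows (all numerical values are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(2)

def exact_risk(n, theta, k, l):
    """Risk of d_kl(X) = (T+k)/(n+l), with T the sum of n iid geometric
    (mean-theta) variables: R = (n*th*(th-1) + (k - l*th)^2) / (n + l)^2."""
    return (n * theta * (theta - 1) + (k - l * theta) ** 2) / (n + l) ** 2

def mc_risk(n, theta, k, l, reps=20_000):
    # Generator.geometric counts trials up to the first success, so it has
    # support {1, 2, ...} with P(X = x) = (1-p)^(x-1) p, matching p = 1/theta.
    X = rng.geometric(p=1.0 / theta, size=(reps, n))
    d = (X.sum(axis=1) + k) / (n + l)
    return np.mean((d - theta) ** 2)

n, a, b = 400, 1.5, 3.0
alpha0, beta0 = 2.0, 3.0  # arbitrary conjugate-prior parameters
cases = {"MLE": (0.0, 0.0),
         "flat-prior Bayes": (-1.0, -2.0),
         "conjugate Bayes": (alpha0 + beta0 - 1.0, alpha0 - 1.0)}
theta_grid = np.linspace(a, b, 50)
for name, (k, l) in cases.items():
    max_rescaled = max(n * exact_risk(n, t, k, l) for t in theta_grid)
    print(name, max_rescaled, "MC check at b:", n * mc_risk(n, b, k, l))
print("lower bound b*(b-1) =", b * (b - 1))
```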