Chapter 6 Maximum Likelihood Methods
6.2 Rao–Cramér Lower Bound and Efficiency
Boxiang Wang
STAT 4101, Spring 2021
Outline
1 Fisher information
2 Rao–Cramér inequality
3 Explanation of Fisher information
4 Proof of the Rao–Cramér inequality
5 Asymptotic properties of the MLE
Fisher Information
Definition of Fisher Information
Suppose X = {X1, . . . , Xn} are iid random variables drawn from f(x; θ), and the observed data are x1, . . . , xn. The Fisher information contained in the n samples is
\[ I_n(\theta) = E\left[-\frac{d^2}{d\theta^2}\log f(\mathbf{X};\theta)\right]. \]
If we have only one sample, the Fisher information is
\[ I_1(\theta) = E\left[-\frac{d^2}{d\theta^2}\log f(X_1;\theta)\right], \]
and under the iid assumption
\[ I_n(\theta) = nI_1(\theta). \]
Remark: This is not the original definition of the Fisher information, but it provides a fast way to compute it. The original definition is discussed later, in the "Explanation of the Fisher Information" part of this section.
Example (6.2.1): Information for Bernoulli R.V.s
Let X1, . . . , Xn ∼ Bern(θ). Compute the Fisher information In(θ).
Solution: Step 1: compute the second-order derivative:
\[ \log f(x;\theta) = x\log\theta + (1-x)\log(1-\theta), \]
\[ \frac{\partial\log f(x;\theta)}{\partial\theta} = \frac{x}{\theta} - \frac{1-x}{1-\theta}, \qquad \frac{\partial^2\log f(x;\theta)}{\partial\theta^2} = -\frac{x}{\theta^2} - \frac{1-x}{(1-\theta)^2}. \]
Step 2: compute the expectation of the negative 2nd-order derivative:
\[ I_1(\theta) = -E\left[-\frac{X}{\theta^2} - \frac{1-X}{(1-\theta)^2}\right] = \frac{\theta}{\theta^2} + \frac{1-\theta}{(1-\theta)^2} = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}. \]
Step 3: \( I_n(\theta) = nI_1(\theta) = \dfrac{n}{\theta(1-\theta)}. \)
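As a quick numerical sanity check (a sketch, not part of the original example; the value of θ, the sample size, and the seed are arbitrary choices), we can approximate I1(θ) = E[−∂² log f(X;θ)/∂θ²] by a Monte Carlo average and compare it with the closed form 1/(θ(1−θ)):

```python
import numpy as np

# Monte Carlo check of the Bernoulli Fisher information I_1(theta) = 1/(theta*(1-theta)).
rng = np.random.default_rng(0)
theta = 0.3
x = rng.binomial(1, theta, size=1_000_000)

# Negative second derivative of log f(x; theta) = x*log(theta) + (1-x)*log(1-theta):
#   -d^2/dtheta^2 log f = x/theta^2 + (1-x)/(1-theta)^2
neg_hessian = x / theta**2 + (1 - x) / (1 - theta)**2

print("Monte Carlo estimate of I_1(theta):", neg_hessian.mean())
print("Closed form 1/(theta*(1-theta)):   ", 1 / (theta * (1 - theta)))
```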
Example (6.2.1) cont’d
If X1, . . . , X160 are iid Bern(0.2), what is the value of the Fisher information?
Answer: In(0.2) = 160/(0.2 × 0.8) = 1000.
What is the mle of the Fisher information?
Answer: The mle of θ is X̄, so the mle of In(θ) is n/(X̄(1 − X̄)).
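A small simulation of the plug-in estimate (illustrative only; the seed and the simulated data are not from the slides). By the invariance of the mle, n/(X̄(1 − X̄)) estimates In(θ), and for n = 160 and θ = 0.2 it should be near the true value 1000:

```python
import numpy as np

# Plug-in (mle) estimate of the Fisher information for Bern(theta), n = 160, theta = 0.2.
rng = np.random.default_rng(1)
n, theta = 160, 0.2
x = rng.binomial(1, theta, size=n)

xbar = x.mean()                      # mle of theta
info_hat = n / (xbar * (1 - xbar))   # mle of I_n(theta) by the invariance property

print("xbar =", xbar)
print("estimated I_n =", info_hat)   # should be near 1000
print("true I_n      =", n / (theta * (1 - theta)))
```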
Example: Information for Poisson R.V.s
Let X1, . . . , Xn ∼ Pois(λ). Compute the Fisher information In(λ).
Solution: Step 1: compute the second-order derivative:
\[ f(x;\lambda) = e^{-\lambda}\frac{\lambda^x}{x!}, \]
\[ \log f(x;\lambda) = -\lambda + x\log\lambda - \log(x!), \]
\[ \frac{d}{d\lambda}\log f(x;\lambda) = -1 + \frac{x}{\lambda}, \qquad -\frac{d^2}{d\lambda^2}\log f(x;\lambda) = \frac{x}{\lambda^2}. \]
Step 2: compute the expectation of the negative 2nd-order derivative:
\[ I_1(\lambda) = E\left[-\frac{d^2}{d\lambda^2}\log f(X;\lambda)\right] = \frac{E[X]}{\lambda^2} = \frac{1}{\lambda}. \]
Step 3: \( I_n(\lambda) = nI_1(\lambda) = \dfrac{n}{\lambda}. \)
See the efficiency example for the Poisson distribution later in this section.
Example: Information for Normal R.V.s
Let X1, . . . , Xn ∼ N(μ, σ2). Suppose σ is known. Compute the Fisher information In(μ).
Solution:
Step 1: compute the second-order derivative:
\[ f(x;\mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \]
\[ \log f(x;\mu) = \log\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{(x-\mu)^2}{2\sigma^2}, \]
\[ \frac{d}{d\mu}\log f(x;\mu) = \frac{x-\mu}{\sigma^2}, \qquad -\frac{d^2}{d\mu^2}\log f(x;\mu) = \frac{1}{\sigma^2}. \]
Example: Information for Normal R.V.s (cont’d)
Step 2: compute the expectation of the negative 2nd-order derivative:
\[ I_1(\mu) = E\left[-\frac{d^2}{d\mu^2}\log f(X;\mu)\right] = \frac{1}{\sigma^2}. \]
Step 3: \( I_n(\mu) = nI_1(\mu) = \dfrac{n}{\sigma^2}. \)
See the efficiency example for the normal mean later in this section.
Example: Information for Normal R.V.s
Let X1, . . . , Xn ∼ N(μ, σ2). Suppose μ is known. Compute the Fisher information In(σ2).
Solution: Step 1: For convenience, let θ = σ²:
\[ f(x;\theta) = \frac{1}{\sqrt{2\pi\theta}}\, e^{-\frac{(x-\mu)^2}{2\theta}}, \]
\[ \log f(x;\theta) = \log\frac{1}{\sqrt{2\pi}} - \frac{1}{2}\log\theta - \frac{(x-\mu)^2}{2\theta}, \]
\[ \frac{d}{d\theta}\log f(x;\theta) = -\frac{1}{2\theta} + \frac{(x-\mu)^2}{2\theta^2}, \qquad -\frac{d^2}{d\theta^2}\log f(x;\theta) = -\frac{1}{2\theta^2} + \frac{(x-\mu)^2}{\theta^3}. \]
Step 2: \( \displaystyle E\left[-\frac{d^2}{d\theta^2}\log f(X;\theta)\right] = -\frac{1}{2\theta^2} + \frac{\theta}{\theta^3} = \frac{1}{2\theta^2}. \)
Step 3: \( I_n(\sigma^2) = nI_1(\sigma^2) = \dfrac{n}{2\sigma^4}. \)
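As a sketch (not in the original slides; the values of μ, σ², the sample size, and the seed are arbitrary), the result I1(σ²) = 1/(2σ⁴) can be checked by averaging the negative second derivative of log f with respect to θ = σ² over simulated data:

```python
import numpy as np

# Monte Carlo check of I_1(sigma^2) = 1/(2*sigma^4) for N(mu, sigma^2) with mu known.
rng = np.random.default_rng(2)
mu, sigma2 = 1.0, 2.0
x = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)

# With theta = sigma^2:  -d^2/dtheta^2 log f = -1/(2*theta^2) + (x - mu)^2 / theta^3
neg_hessian = -1 / (2 * sigma2**2) + (x - mu)**2 / sigma2**3

print("Monte Carlo estimate of I_1(sigma^2):", neg_hessian.mean())
print("Closed form 1/(2*sigma^4):           ", 1 / (2 * sigma2**2))
```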
Rao–Cramér Lower Bound
Regularity Conditions for the Rao–Cramér Lower Bound
(R0) The pdfs are distinct; i.e., θ ≠ θ′ ⇒ f(xᵢ; θ) ≠ f(xᵢ; θ′).
(R1) The pdfs have common support for all θ.
(R2) The true parameter θ0 is an interior point in Ω.
(R3) The pdf f (x; θ) is twice differentiable as a function of θ.
(R4) The integral ∫ f(x; θ) dx can be differentiated twice under the integral sign as a function of θ.
Remark: with those conditions, we can interchange integration and differentiation with respect to θ.
Theorem (6.2.1): Rao–Cramér Lower Bound
Let X1, · · · , Xn be i.i.d. with common pdf f(x; θ) for θ ∈ Ω. Assume the regularity conditions (R0) – (R4) hold. Let Y = y(X1, · · · , Xn) be a statistic with mean E(Y) = k(θ). Then
\[ \mathrm{Var}(Y) \ge \frac{[k'(\theta)]^2}{nI_1(\theta)}. \]
Corollary (6.2.1)
Under the assumptions of Theorem 6.2.1, if Y = y(X1, · · · , Xn) is an unbiased estimator of θ, then the Rao–Cramér inequality becomes
\[ \mathrm{Var}(Y) \ge \frac{1}{I_n(\theta)} = \frac{1}{nI_1(\theta)}. \]
The variance of an unbiased estimator of θ is no less than the inverse of the Fisher information.
Application of the Rao–Cramér Inequality
Consider the Bernoulli model with probability of success θ, which was treated in Example 6.2.1.
For any θ̂ such that E(θ̂) = θ,
\[ \mathrm{Var}(\hat\theta) \ge \frac{\theta(1-\theta)}{n}. \]
This is because I1(θ) = [θ(1 − θ)]⁻¹, and the Rao–Cramér inequality gives
\[ \mathrm{Var}(\hat\theta) \ge \frac{1}{I_n(\theta)} = \frac{\theta(1-\theta)}{n}. \]
Consider θ̂_MLE = X̄; then E(θ̂_MLE) = θ, so it is unbiased.
We also have Var(θ̂_MLE) = θ(1 − θ)/n, so in this case the variance of the mle attains the Rao–Cramér lower bound.
We say X̄ has the minimum variance among all unbiased estimators of θ.
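A short simulation (illustrative settings; the values of n, θ, the number of replications, and the seed are not from the slides) comparing the Monte Carlo variance of X̄ with the Rao–Cramér bound θ(1 − θ)/n:

```python
import numpy as np

# Variance of the unbiased estimator xbar for Bern(theta) vs. the Rao-Cramer bound.
rng = np.random.default_rng(3)
n, theta, reps = 50, 0.2, 200_000

x = rng.binomial(1, theta, size=(reps, n))
xbar = x.mean(axis=1)                 # one estimate of theta per replication

print("Var(xbar) by simulation:          ", xbar.var())
print("Rao-Cramer bound theta(1-theta)/n:", theta * (1 - theta) / n)  # the bound is attained
```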
Definition of Efficient Estimator
Let Y be an unbiased estimator of a parameter θ in a point estimation problem. The statistic Y is called an efficient estimator of θ if and only if the variance of Y attains the Rao–Cramér lower bound.
Remark: Y is efficient if Y is unbiased and it has the minimum variance among all unbiased estimators of θ.
Example (6.2.3): Efficient Estimator for Poisson(λ)
Let X1, . . . , Xn ∼ Pois(λ). Find an efficient estimator of λ.
Solution:
The Fisher information is In(λ) = n/λ (see the Poisson example above).
Rao–Cramér lower bound: λ/n.
Consider X̄. It is unbiased, and its variance is λ/n.
Thus X ̄ is an efficient estimator of λ.
Example (6.2.3): Efficient Estimator for N(μ, σ2)
Let X1, . . . , Xn ∼ N(μ, σ2). Suppose σ is known. Find an efficient estimator of μ.
Solution:
The Fisher information is In(μ) = n/σ² (see the normal-mean example above).
Rao–Cramér lower bound: σ²/n.
Consider X̄. It is unbiased, and its variance is σ²/n.
Thus X ̄ is an efficient estimator of μ.
Explanation of the Fisher Information
Consider X ∼ f(x; θ). Let us derive the following fast-computation formula for the Fisher information:
\[ I_1(\theta) = E\left[-\frac{d^2}{d\theta^2}\log f(X;\theta)\right]. \]
We begin with
\[ 1 = \int_{-\infty}^{\infty} f(x;\theta)\,dx, \]
and differentiating with respect to θ gives
\[ 0 = \int_{-\infty}^{\infty} \frac{\partial f(x;\theta)}{\partial\theta}\,dx. \]
The last expression can be rewritten as
\[ 0 = \int_{-\infty}^{\infty} \frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\, f(x;\theta)\,dx = \int_{-\infty}^{\infty} \frac{\partial\log f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx. \]
The important function
\[ \frac{\partial \log f(X;\theta)}{\partial\theta} \]
is called the score function. We have established that
\[ E\left[\frac{\partial\log f(X;\theta)}{\partial\theta}\right] = 0, \]
from
\[ 0 = \int_{-\infty}^{\infty} \frac{\partial\log f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx. \]
Let us differentiate the last equation again.
It follows that
\[ 0 = \int_{-\infty}^{\infty} \frac{\partial^2\log f(x;\theta)}{\partial\theta^2}\, f(x;\theta)\,dx + \int_{-\infty}^{\infty} \frac{\partial\log f(x;\theta)}{\partial\theta}\,\frac{\partial\log f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx. \]
We observe that
\[ 0 = E\left[\frac{\partial^2\log f(X;\theta)}{\partial\theta^2}\right] + E\left[\left(\frac{\partial\log f(X;\theta)}{\partial\theta}\right)^2\right]. \]
Define the Fisher information:
\[ I_1(\theta) = E\left[\left(\frac{\partial\log f(X;\theta)}{\partial\theta}\right)^2\right]. \]
This yields the fast-computation formula that we have used:
\[ I_1(\theta) = -E\left[\frac{\partial^2\log f(X;\theta)}{\partial\theta^2}\right]. \]
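As a numerical illustration (a sketch using the Poisson model from the earlier example; λ, the sample size, and the seed are arbitrary), the score-based definition E[(∂ log f/∂λ)²] and the fast-computation formula −E[∂² log f/∂λ²] give the same value, here 1/λ:

```python
import numpy as np

# Both definitions of the Fisher information agree; illustrated with Pois(lambda), I_1 = 1/lambda.
rng = np.random.default_rng(4)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)

score = -1 + x / lam                        # d/dlambda log f(x; lambda)
neg_hessian = x / lam**2                    # -d^2/dlambda^2 log f(x; lambda)

print("E[score] (should be ~0):", score.mean())
print("E[score^2]:             ", (score**2).mean())   # definition via the score
print("-E[second derivative]:  ", neg_hessian.mean())  # fast-computation formula
print("1/lambda:               ", 1 / lam)
```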
How to interpret
\[ I_1(\theta) = E\left[\left(\frac{\partial\log f(X;\theta)}{\partial\theta}\right)^2\right]? \]
Recall
\[ E\left[\frac{\partial\log f(X;\theta)}{\partial\theta}\right] = 0. \qquad (*) \]
Thus the Fisher information is the variance of the random variable ∂ log f(X; θ)/∂θ, i.e., the score function:
\[ I_1(\theta) = \mathrm{Var}\left[\frac{\partial\log f(X;\theta)}{\partial\theta}\right]. \]
Remark: Setting the score function to zero gives the estimating equation for the mle; i.e., θ̂_MLE solves
\[ \sum_{i=1}^{n}\frac{\partial\log f(x_i;\theta)}{\partial\theta} = 0. \]
The equation (*) will also be used in the proof of the Rao–Cramér lower bound below.
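A minimal sketch of solving the estimating equation numerically (assuming the Bernoulli model, where the root is known to be X̄, so the root-finder should reproduce it; the data, seed, and bracketing interval are arbitrary choices):

```python
import numpy as np
from scipy.optimize import brentq

# Solve the estimating equation sum_i d/dtheta log f(x_i; theta) = 0 numerically (Bernoulli case).
rng = np.random.default_rng(5)
x = rng.binomial(1, 0.3, size=100)

def total_score(theta):
    # sum of d/dtheta [x*log(theta) + (1-x)*log(1-theta)] over the sample
    return np.sum(x / theta - (1 - x) / (1 - theta))

theta_hat = brentq(total_score, 1e-6, 1 - 1e-6)   # root of the score equation
print("root of the score equation:", theta_hat)
print("xbar:                      ", x.mean())     # the two agree
```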
Let X1, . . . , Xn be a random sample from f(x; θ). The likelihood L(θ) is the joint pdf of the random sample, L(θ; X) = ∏ᵢ₌₁ⁿ f(Xᵢ; θ), so
\[ \frac{\partial\log L(\theta;\mathbf{X})}{\partial\theta} = \sum_{i=1}^{n}\frac{\partial\log f(X_i;\theta)}{\partial\theta}. \]
Thus the Fisher information contained in the n samples is
\[ I_n(\theta) = \mathrm{Var}\left[\frac{\partial\log L(\theta;\mathbf{X})}{\partial\theta}\right] = \mathrm{Var}\left[\sum_{i=1}^{n}\frac{\partial\log f(X_i;\theta)}{\partial\theta}\right] = \sum_{i=1}^{n}\mathrm{Var}\left[\frac{\partial\log f(X_i;\theta)}{\partial\theta}\right] = nI_1(\theta). \]
Remark: The iid assumption is the key.
Our derivation is for the continuous case, but the discrete case can be handled in a similar manner.
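As a sketch (the Bernoulli model and all numerical settings below are illustrative choices, not from the slides), the identity In(θ) = Var(∂ log L/∂θ) = nI1(θ) can be checked by Monte Carlo:

```python
import numpy as np

# Check I_n(theta) = Var(d/dtheta log L) = n * I_1(theta) for Bern(theta) by Monte Carlo.
rng = np.random.default_rng(6)
n, theta, reps = 20, 0.4, 200_000

x = rng.binomial(1, theta, size=(reps, n))
total_score = np.sum(x / theta - (1 - x) / (1 - theta), axis=1)  # d/dtheta log L, one value per sample

print("Var(total score):", total_score.var())
print("n * I_1(theta):  ", n / (theta * (1 - theta)))
```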
Proof of the Rao–Cramér Lower Bound
Theorem (6.2.1): Rao–Cramér Lower Bound
Let X1, · · · , Xn be i.i.d. with common pdf f(x; θ) for θ ∈ Ω. Assume the regularity conditions (R0) – (R4) hold. Let Y = y(X1, · · · , Xn) be a statistic with mean E(Y) = k(θ). Then
\[ \mathrm{Var}(Y) \ge \frac{[k'(\theta)]^2}{nI(\theta)}. \]
The proof is for the continuous case, but the proof for the discrete case is quite similar.
For Y = y(X1, · · · , Xn), its mean is
\[ E(Y) = k(\theta) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} y(x_1,\ldots,x_n)\, f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n. \]
Differentiating with respect to θ,
\[ k'(\theta) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} y(x_1,\ldots,x_n)\left[\sum_{i=1}^{n}\frac{1}{f(x_i;\theta)}\frac{\partial f(x_i;\theta)}{\partial\theta}\right] f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n \]
\[ \phantom{k'(\theta)} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} y(x_1,\ldots,x_n)\left[\sum_{i=1}^{n}\frac{\partial\log f(x_i;\theta)}{\partial\theta}\right] f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n. \]
Define
\[ Z = \sum_{i=1}^{n}\frac{\partial\log f(X_i;\theta)}{\partial\theta}. \]
Then the equation on the last slide can be expressed as k′(θ) = E(YZ).
Since Cov(Y, Z) = ρσYσZ and σZ = √Var(Z) = √(nI(θ)), we further have
\[ k'(\theta) = E(YZ) = E(Y)E(Z) + \rho\,\sigma_Y\sqrt{nI(\theta)}. \]
Recall we have shown that E(Z) = 0 (the score has mean zero), which gives
\[ \rho = \frac{k'(\theta)}{\sigma_Y\sqrt{nI(\theta)}}. \]
Because ρ² ≤ 1, we have
\[ \frac{[k'(\theta)]^2}{\sigma_Y^2\, nI(\theta)} \le 1, \quad\text{i.e.,}\quad \mathrm{Var}(Y) = \sigma_Y^2 \ge \frac{[k'(\theta)]^2}{nI(\theta)}. \]
Asymptotic Properties of MLE
Corollary (6.1.1): Consistency of MLE
Assume X1, . . . , Xn are iid with f(x; θ0) under some regularity conditions.
Assume the likelihood equation has a unique solution θ̂n, i.e.,
\[ \left.\frac{\partial}{\partial\theta}\, l(\theta)\right|_{\theta=\hat\theta_n} = 0. \]
Then
\[ \hat\theta_n \xrightarrow{P} \theta_0. \]
Remark: Under some regularity conditions, the mle is a consistent estimator of the population parameter.
Theorem (6.2.2): Asymptotic Normality of MLE
Assume X1, . . . , Xn are iid with f(x; θ0) under some regularity conditions. Assume the likelihood equation has a unique solution
θ̂n. Then
\[ \sqrt{n}\left(\hat\theta_n - \theta_0\right) \xrightarrow{D} N\!\left(0,\ \frac{1}{I(\theta_0)}\right). \]
Remark:
We have established a central limit theorem for the mle.
Roughly speaking, when n is large, the distribution of the mle θ̂n is approximately
\[ \hat\theta_n \overset{\cdot}{\sim} N\!\left(\theta_0,\ \frac{1}{nI(\theta_0)}\right). \]
Recall the Rao–Cramér inequality: for an unbiased estimator of θ,
\[ \mathrm{Var}(\hat\theta) \ge \frac{1}{nI(\theta_0)}. \]
Thus we say the mle θˆn is asymptotically unbiased and also asymptotically efficient.
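A small simulation of the asymptotic normality (a sketch; the Bernoulli model and the numerical settings are illustrative choices). For Bern(θ0), θ̂n = X̄ and 1/I1(θ0) = θ0(1 − θ0), so √n(θ̂n − θ0) should have variance close to θ0(1 − θ0):

```python
import numpy as np

# Asymptotic normality of the mle: sqrt(n)*(thetahat - theta0) ~ N(0, 1/I(theta0)).
# Illustrated with Bern(theta0), where thetahat = xbar and 1/I_1(theta0) = theta0*(1-theta0).
rng = np.random.default_rng(7)
n, theta0, reps = 400, 0.3, 100_000

x = rng.binomial(1, theta0, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - theta0)

print("simulated mean (should be ~0):", z.mean())
print("simulated variance:           ", z.var())
print("1/I_1(theta0):                ", theta0 * (1 - theta0))
```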
Large-sample Confidence Interval
The asymptotic standard deviation of the mle θ̂n is
\[ [nI(\theta_0)]^{-1/2}. \]
Because I(θ) is a continuous function of θ,
\[ I(\hat\theta_n) \xrightarrow{P} I(\theta_0). \]
An approximate (1 − α)100% confidence interval for θ is
\[ \left( \hat\theta_n - z_{\alpha/2}\,\frac{1}{\sqrt{nI(\hat\theta_n)}},\ \ \hat\theta_n + z_{\alpha/2}\,\frac{1}{\sqrt{nI(\hat\theta_n)}} \right). \]
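A minimal sketch computing this interval (assuming the Bernoulli model, where 1/(nI(θ̂n)) = θ̂n(1 − θ̂n)/n; the data, seed, and confidence level are illustrative):

```python
import numpy as np
from scipy.stats import norm

# Large-sample (Wald) interval: thetahat +/- z_{alpha/2} / sqrt(n * I(thetahat)), Bernoulli case.
rng = np.random.default_rng(8)
n, theta, alpha = 200, 0.3, 0.05

x = rng.binomial(1, theta, size=n)
theta_hat = x.mean()
se = np.sqrt(theta_hat * (1 - theta_hat) / n)   # 1 / sqrt(n * I(thetahat))
z = norm.ppf(1 - alpha / 2)

print("95% CI for theta:", (theta_hat - z * se, theta_hat + z * se))
```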
Corollary (6.2.2): Delta-method
Under the assumptions of Theorem 6.2.2, suppose g(x) is a continuous function of x which is differentiable at θ0 such that g′(θ0) ≠ 0. Then
\[ \sqrt{n}\left(g(\hat\theta_n) - g(\theta_0)\right) \xrightarrow{D} N\!\left(0,\ \frac{g'(\theta_0)^2}{I(\theta_0)}\right). \]
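A sketch applying the corollary (the Bernoulli model and g(θ) = θ/(1 − θ), the odds, are illustrative choices). Here g′(θ0) = 1/(1 − θ0)², so g′(θ0)²/I1(θ0) = θ0/(1 − θ0)³, which a simulation should reproduce:

```python
import numpy as np

# Delta method: sqrt(n)*(g(thetahat) - g(theta0)) ~ N(0, g'(theta0)^2 / I(theta0)).
# Illustrated with g(theta) = theta/(1-theta) (the odds) in the Bernoulli model,
# where g'(theta0)^2 / I_1(theta0) = theta0 / (1 - theta0)^3.
rng = np.random.default_rng(9)
n, theta0, reps = 500, 0.3, 100_000

x = rng.binomial(1, theta0, size=(reps, n))
theta_hat = x.mean(axis=1)
z = np.sqrt(n) * (theta_hat / (1 - theta_hat) - theta0 / (1 - theta0))

print("simulated variance:     ", z.var())
print("delta-method asymptotic:", theta0 / (1 - theta0)**3)
```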