4 Best unbiased estimators
1. Notation. In this section we suppose that we have a random sample of size n
with sample random variables X1, X2, …, Xn from a population with distribution function
F (x, θ) with associated p.d.f./p.f. f(x, θ), where θ ∈ Θ.
2. Definition. An unbiased estimator ĝ(X) of the real-valued function g(θ) is a minimum
variance unbiased estimator (MVUE) if, for any other unbiased estimator g̃(X),
var(ĝ(X)) ≤ var(g̃(X)) for all θ ∈ Θ.
3. Theorem (Rao-Blackwell). Suppose that g̃(X) is an unbiased estimator of the
real-valued function g(θ) and that T is a sufficient statistic for θ. Then the conditional
expectation, E(g̃(X)|T ) = ĝ(T ), is an unbiased estimator of g(θ) and var(ĝ) ≤ var(g̃) for
all θ ∈ Θ.
4. Example. Let X1, X2, …, Xn be i.i.d. Poisson(θ).
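The working of this example is left for lectures; as a sketch (the target function is chosen
here for illustration, it is not specified above), take g(θ) = P(X1 = 0) = e^{−θ}. The
indicator g̃(X) = 1{X1 = 0} is unbiased, and conditioning on the sufficient statistic
T = ∑_{j=1}^n Xj gives ĝ(T) = E(g̃(X) | T) = ((n − 1)/n)^T, since X1 | T = t ∼ Bin(t, 1/n).
A short Python simulation (illustrative only) comparing the two estimators:

    import numpy as np

    # Illustrative Rao-Blackwell sketch for i.i.d. Poisson(theta) data,
    # with the assumed target g(theta) = P(X1 = 0) = exp(-theta).
    rng = np.random.default_rng(0)
    n, theta, reps = 20, 1.5, 200_000

    x = rng.poisson(theta, size=(reps, n))
    T = x.sum(axis=1)                         # sufficient statistic T = sum of the X_i

    g_crude = (x[:, 0] == 0).astype(float)    # unbiased but noisy: 1{X1 = 0}
    g_rb = ((n - 1) / n) ** T                 # E(1{X1 = 0} | T), still unbiased

    print("true value       :", np.exp(-theta))
    print("crude: mean, var :", g_crude.mean(), g_crude.var())
    print("R-B  : mean, var :", g_rb.mean(), g_rb.var())
    # Both means are close to exp(-theta); the Rao-Blackwellised estimator
    # has much the smaller variance, as the theorem guarantees.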
5. Note. It follows that conditioning any unbiased estimator on a sufficient statistic
results in an estimator that is at least as good. So, in looking for a best unbiased
estimator of g(θ), we need to consider only estimators based on a sufficient statistic. If an
estimator is not a function of a sufficient statistic, then there is another estimator which
is a function of a sufficient statistic and which is at least as good, in the sense of having
variance no larger.
6. Complete statistics. Suppose that ĝ = ĝ(X) is an unbiased estimator which is based
on a sufficient statistic T . How do we know that ĝ is best unbiased? If the probability
distribution of T has a property called completeness, then ĝ is the unique MVUE of g(θ).
Definition: Let {ψ(t, θ) : θ ∈ Θ} be a family of p.d.f.s/p.f.s for a statistic T. T is called
a complete statistic if, for every function φ, E(φ(T)) = 0 for all θ ∈ Θ implies that
φ(T) = 0 (a.s.).
7. Theorem. (Completeness theorem for an exponential family) Let X1, X2, …, Xn be
i.i.d. observations from a p.d.f./p.f. f(x, θ) that belongs to the exponential family, that
is,

f(x, θ) = exp{ ∑_{i=1}^k wi(θ)Ki(x) + S(x) + q(θ) }

for all {x : f(x, θ) ≠ 0}, where θ = (θ1, …, θk) and the set {x : f(x, θ) ≠ 0} does
not depend on θ. Suppose also that the parameter space contains a k-dimensional open
rectangle. Then

T(X) = ( ∑_{j=1}^n K1(Xj), ∑_{j=1}^n K2(Xj), …, ∑_{j=1}^n Kk(Xj) )

is a complete sufficient statistic for θ.
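As an illustration of the theorem (not worked out above), the Poisson(θ) p.f. of Example 4
can be put in this form:

f(x, θ) = e^{−θ} θ^x / x! = exp{ x log θ − log x! − θ },   x = 0, 1, 2, …,

so k = 1, w1(θ) = log θ, K1(x) = x, S(x) = −log x! and q(θ) = −θ, and the parameter
space (0, ∞) contains an open interval. Hence T(X) = ∑_{j=1}^n Xj is a complete
sufficient statistic for θ, the statistic conditioned on in Example 4.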
8. Theorem. Let T be a complete sufficient statistic for a parameter θ, and let ĝ(T ) be
any estimator based only on T. Then ĝ(T ) is the unique MVUE of its expected value.
9. Remark. Given that there is a complete sufficient statistic T for θ, the search for
a MVUE for g(θ) requires us either to find an unbiased estimator ĝ(T) directly, or to find
any unbiased estimator g̃(X) and then calculate E(g̃(X)|T) = ĝ(T).
10. Example. Let X1, X2, …, Xn be i.i.d. (a) N(0, σ²), (b) Bin(N, θ).
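A sketch of where these examples lead (the details are left for lectures): in (a),
f(x, σ²) = exp{ −x²/(2σ²) − ½ log(2πσ²) }, so K1(x) = x² and, by Theorems 7 and 8,
T = ∑_{j=1}^n Xj² is complete sufficient and ĝ(T) = T/n, being unbiased for σ², is the
unique MVUE of σ². In (b), writing the Bin(N, θ) p.f. as
exp{ x log(θ/(1 − θ)) + N log(1 − θ) + log C(N, x) } shows that T = ∑_{j=1}^n Xj is
complete sufficient, so X̄/N is the unique MVUE of θ.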
11. Notes. (i) If a minimal sufficient statistic exists, any complete sufficient statistic
is also a minimal sufficient statistic. However, it is not true that any minimal sufficient
statistic is necessarily complete.
(ii) In general, lack of completeness of a minimal sufficient statistic prevents our using
Theorem 8 to establish the existence of an MVUE of a real parameter. In searching for
an optimal estimator, we might ask whether there is a lower bound for the variance of
an unbiased estimator. If such a lower bound existed, it would function as a benchmark
against which estimators could be compared. If an estimator achieved this lower bound,
we would know that it could not be improved upon. In the case of unbiased estimators,
the following theorem provides such a lower bound.
12. Theorem (Cramér-Rao). Subject to regularity conditions (see 13), for any unbiased
estimator ĝ(X) of the real-valued function g(θ),

var(ĝ(X)) ≥ (g′(θ))² / I(θ),

where

I(θ) = E([∂/∂θ log f(X, θ)]²).
13. Regularity conditions for Cramér-Rao. Written for a continuous distribution,
these are:
(i) {x : f(x, θ) ≠ 0} is independent of θ ∈ Θ, and Θ contains an open interval in ℝ;
(ii) ∂f(x, θ)/∂θ exists for all x, and
d/dθ ∫…∫_{ℝ^n} f(x, θ) dx = ∫…∫_{ℝ^n} [∂/∂θ f(x, θ)] dx = 0;
(iii) E([∂/∂θ log f(X, θ)]²) exists for all θ ∈ Θ; and
(iv) d/dθ ∫…∫_{ℝ^n} ĝ(x)f(x, θ) dx = ∫…∫_{ℝ^n} ĝ(x)[∂/∂θ f(x, θ)] dx, where ĝ(x) is an
unbiased estimator of the differentiable function g(θ).
For a discrete distribution, just replace the ∫ signs by ∑ signs.
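For a case where these conditions fail (an illustration not given in these notes), take
X1, …, Xn i.i.d. Uniform(0, θ): then f(x, θ) = 1/θ for 0 < x < θ, so the set
{x : f(x, θ) ≠ 0} = (0, θ) depends on θ and condition (i) is violated, and the bound of
Theorem 12 need not apply to such a family.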
14. Note. Suppose that the regularity conditions are satisfied.
(i) If V = ∂/∂θ log f(X, θ), then E(V) = 0.
(Indeed, E(V) = ∫…∫_{ℝ^n} [∂/∂θ log f(x, θ)] f(x, θ) dx = ∫…∫_{ℝ^n} [∂/∂θ f(x, θ)] dx = 0.)
(ii) Write I(θ) = var(V) = E(V²). Then I(θ) = n E([∂/∂θ log f(X1, θ)]²).
(Indeed, V = ∂/∂θ log f(X, θ) = ∑_{k=1}^n ∂/∂θ log f(Xk, θ) and, as in (i),
E(∂/∂θ log f(Xk, θ)) = 0 for k = 1, 2, …, n. Also, since X1, X2, …, Xn are i.i.d., the
random variables ∂/∂θ log f(X1, θ), ∂/∂θ log f(X2, θ), …, ∂/∂θ log f(Xn, θ) are also
i.i.d. Hence,
E([∂/∂θ log f(X, θ)]²) = var(∂/∂θ log f(X, θ)) = ∑_{k=1}^n var(∂/∂θ log f(Xk, θ))
= n E([∂/∂θ log f(X1, θ)]²).)
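A numerical illustration of (i) and (ii) (not part of the notes), using the Poisson(θ) model,
for which ∂/∂θ log f(x1, θ) = x1/θ − 1:

    import numpy as np

    # Illustrative check of 14(i)-(ii) for i.i.d. Poisson(theta) data,
    # where d/dtheta log f(x, theta) = x/theta - 1 for one observation.
    rng = np.random.default_rng(1)
    n, theta, reps = 10, 2.0, 200_000

    x = rng.poisson(theta, size=(reps, n))
    V = (x / theta - 1).sum(axis=1)   # V = d/dtheta log f(X, theta) for the whole sample

    print("E(V)   ~", V.mean())       # close to 0, as in (i)
    print("var(V) ~", V.var())        # close to n/theta = 5.0, as in (ii)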
15. The Fisher information. The quantity

I(θ) = E([∂/∂θ log f(X, θ)]²)

is called the (total) Fisher information. This terminology reflects the fact that the amount
of information gives a bound on the variance of the best unbiased estimator of g(θ). As
the information gets bigger and we have more information about θ, we have a smaller
bound on the variance of the best unbiased estimator. By 14-(ii),

I(θ) = n i(θ),

where

i(θ) = E([∂/∂θ log f(X1, θ)]²)

is called the one-step Fisher information. So, the information in n i.i.d. observations is
n times that in one of the observations.
16. Note. Subject to the regularity conditions,
i(θ) = −E( ∂²/∂θ² log f(X1, θ) ).
17. Examples. Let X1, X2, …, Xn be i.i.d. (a) N(µ, σ²), where σ is known, (b) Poisson(θ).
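A sketch of the computations (details left for lectures): in (a), with σ known and θ = µ,
∂/∂µ log f(x1, µ) = (x1 − µ)/σ², so i(µ) = E((X1 − µ)²)/σ⁴ = 1/σ² and I(µ) = n/σ²;
Note 16 gives the same answer, since ∂²/∂µ² log f(x1, µ) = −1/σ². In (b),
∂/∂θ log f(x1, θ) = x1/θ − 1, so i(θ) = var(X1)/θ² = 1/θ and I(θ) = n/θ.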
18. Note. The Cramér-Rao Theorem establishes the lower bound for the variance of an
unbiased estimator. If one finds an estimator which attains the bound then it is a MVUE.
But there is no guarantee that the bound is sharp. The value of the Cramér-Rao lower
bound may be strictly smaller than the variance of any unbiased estimator. The following
theorem shows that only in exceptional circumstances can the lower bound be attained.
19. Theorem (Attainment). Subject to the regularity conditions of the Cramér-Rao
Theorem, there exists an unbiased estimator ĝ(X) of g(θ) whose variance attains the
Cramér-Rao lower bound if and only if

∂/∂θ log f(X, θ) = a(θ)(ĝ(X) − g(θ))
for some function a(θ). Furthermore, in this case, I(θ) = a(θ)g′(θ).
20. Note. If X1, X2, …, Xn are i.i.d. with the common p.d.f./p.f. f(x, θ), then
∂/∂θ log f(X, θ) = ∂/∂θ log ∏_{i=1}^n f(Xi, θ) = ∑_{i=1}^n ∂/∂θ log f(Xi, θ).
21. Examples. Let X1, X2, …, Xn be i.i.d. (a) N(0, σ²), (b) Poisson(θ) and g(θ) = e^{−θ}.
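A sketch of how Theorem 19 applies here (the details are left for lectures): in (a), with
g(σ²) = σ²,

∂/∂σ² log f(X, σ²) = −n/(2σ²) + ∑_{i=1}^n Xi²/(2σ⁴) = [n/(2σ⁴)]( ∑_{i=1}^n Xi²/n − σ² ),

which has the form required by the theorem with a(σ²) = n/(2σ⁴); so ĝ(X) = ∑Xi²/n
attains the bound and I(σ²) = a(σ²)g′(σ²) = n/(2σ⁴). In (b), by Note 20 the derivative is
∑_{i=1}^n (Xi/θ − 1) = (n/θ)(X̄ − θ), which cannot be written as a(θ)(ĝ(X) − e^{−θ})
for any function a(θ) and statistic ĝ(X) free of θ, so no unbiased estimator of
g(θ) = e^{−θ} attains the Cramér-Rao bound.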
22. Definition. The efficiency (eff) of an unbiased estimator ĝ(X) of g(θ) is defined to
be the C-R lower bound divided by var(ĝ(X)), i.e.,

eff(ĝ(X)) = [g′(θ)]² / ( I(θ) var(ĝ(X)) ).

Naturally, we look for estimators with efficiency close to 1. We say that an unbiased
estimator ĝ(X) of g(θ) is efficient if eff(ĝ(X)) = 1 (that is, if the variance of ĝ(X) attains
the C-R lower bound). Clearly, if an estimator is efficient, it is a MVUE, but not vice
versa. In Example 21 (a), the MVUE of σ² is efficient, but in (b), the MVUE ĝ of
g(θ) = e^{−θ} is not efficient. Nevertheless, it can be shown that in Example 21 (b),
eff(ĝ) ≈ 1 for large n.
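A numerical illustration of this last claim (taking the MVUE of e^{−θ} to be
ĝ(T) = ((n − 1)/n)^T with T = ∑Xi, as in the sketch following Example 4, so that
var(ĝ) = e^{−2θ}(e^{θ/n} − 1) exactly):

    import numpy as np

    # Efficiency of the MVUE of exp(-theta) for a Poisson(theta) sample:
    # C-R bound = theta * exp(-2*theta) / n, and the exact variance of
    # ((n-1)/n)^T is exp(-2*theta) * (exp(theta/n) - 1).
    theta = 1.5
    for n in (5, 20, 100, 1000):
        cr_bound = theta * np.exp(-2 * theta) / n
        var_mvue = np.exp(-2 * theta) * (np.exp(theta / n) - 1)
        print(n, round(cr_bound / var_mvue, 4))   # efficiency; tends to 1 as n grows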
23. Remarks.
(i) When equality holds in (19).
We have ∂/∂θ log f(X, θ) = a(θ)(ĝ(X) − g(θ)). Replacing X by x and integrating w.r.t. θ,
we have

log f(x, θ) = ĝ(x) ∫ a(θ) dθ − ∫ a(θ)g(θ) dθ + S(x),

where S(x) is the constant of integration, or (provided that all the integrals exist and are
finite)

f(x, θ) = exp[ ĝ(x)p(θ) + S(x) + q(θ) ] (say),

with p(θ) = ∫ a(θ) dθ and q(θ) = −∫ a(θ)g(θ) dθ, so that the distribution must be in the
exponential class if the C-R lower bound is attained.
(ii) Cramér-Rao and MLE. If, under the conditions of the Cramér-Rao inequality, θ̂ is
efficient for θ, then θ̂ is the maximum likelihood estimator of θ.
Now, maximizing f(x, θ) is equivalent to maximizing log f(x, θ), and at the maximum we
have

0 = ∂/∂θ log f(x, θ) = a(θ)(θ̂(x) − θ),

so that the estimator attaining the Cramér-Rao lower bound is the MLE.
24. Asymptotic properties of MLE's. In general, MLE's are neither unbiased nor
efficient. The main justification of the method of maximum likelihood in terms of the
criterion of minimum variance unbiasedness is that it is possible to show that, for large
samples, subject to certain regularity conditions, maximum likelihood estimators are
nearly unbiased and have variances nearly equal to the C-R lower bound.
In particular, it can be shown that, for large values of n, the MLE θ̂ of θ is asymptotically
distributed as N(θ, 1/I(θ)).
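A simulation sketch of this asymptotic statement, using (purely for illustration, it is not
an example from these notes) an Exponential(θ) model with rate θ, where the MLE is
θ̂ = 1/X̄ and 1/I(θ) = θ²/n:

    import numpy as np

    # MLE asymptotics for an assumed Exponential(theta) model (rate theta):
    # theta_hat = 1/Xbar, and 1/I(theta) = theta**2/n.
    rng = np.random.default_rng(2)
    theta, reps = 2.0, 100_000
    for n in (10, 100, 1000):
        x = rng.exponential(scale=1 / theta, size=(reps, n))
        mle = 1.0 / x.mean(axis=1)
        print(n, mle.mean(), mle.var(), theta**2 / n)
        # the mean approaches theta (nearly unbiased) and the variance
        # approaches theta**2/n, the C-R lower bound, as n grows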