University of California, Los Angeles
Department of Statistics

Statistics 100B                                              Instructor: Nicolas Christou

Properties of estimators

Unbiased estimators:
Let $\hat{\theta}$ be an estimator of a parameter $\theta$. We say that $\hat{\theta}$ is an unbiased estimator of $\theta$ if
$$E(\hat{\theta}) = \theta.$$

Examples:
Let $X_1, X_2, \dots, X_n$ be an i.i.d. sample from a population with mean $\mu$ and standard deviation $\sigma$. Show that $\bar{X}$ and $S^2$ are unbiased estimators of $\mu$ and $\sigma^2$, respectively.
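One way to see the first claim (a brief sketch using only linearity of expectation; the corresponding argument for $S^2$ takes a few more steps):
$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu.$$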
Efficient estimators:

Cramér-Rao inequality:
Let $X_1, X_2, \dots, X_n$ be an i.i.d. sample from a distribution that has pdf $f(x)$ and let $\hat{\theta}$ be an unbiased estimator of a parameter $\theta$ of this distribution. We will show that the variance of $\hat{\theta}$ satisfies
$$\mathrm{var}(\hat{\theta}) \ge \frac{1}{n E\left(\frac{\partial \ln f(x)}{\partial \theta}\right)^2} = \frac{1}{n I(\theta)}$$
or
$$\mathrm{var}(\hat{\theta}) \ge \frac{1}{-n E\left(\frac{\partial^2 \ln f(x)}{\partial \theta^2}\right)} = \frac{1}{n I(\theta)},$$
where $I(\theta)$ is the information in one observation.

Theorem:
We say that $\hat{\theta}$ is an efficient estimator of $\theta$ if $\hat{\theta}$ is an unbiased estimator of $\theta$ and if
$$\mathrm{var}(\hat{\theta}) = \frac{1}{n E\left(\frac{\partial \ln f(x)}{\partial \theta}\right)^2} = \frac{1}{n I(\theta)}.$$
In other words, if the variance of $\hat{\theta}$ attains the minimum variance of the Cramér-Rao inequality, we say that $\hat{\theta}$ is an efficient estimator of $\theta$.
Example:
Let $X_1, X_2, \dots, X_n$ be an i.i.d. sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Show that $\bar{X}$ is a minimum variance unbiased estimator of $\mu$. Verify that the result can be obtained using
$$I(\theta) = E\left(\frac{\partial \ln f(x)}{\partial \theta}\right)^2 = -E\left(\frac{\partial^2 \ln f(x)}{\partial \theta^2}\right) = \mathrm{var}(S),$$
where $S$ is the score function (see the section on information and the Cramér-Rao inequality below).
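A brief sketch of the verification: for the normal density, $\ln f(x) = -\ln(\sigma\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma^2}$, so $\frac{\partial \ln f(x)}{\partial \mu} = \frac{x-\mu}{\sigma^2}$ and
$$I(\mu) = E\left(\frac{X-\mu}{\sigma^2}\right)^2 = \frac{1}{\sigma^2}.$$
The Cramér-Rao lower bound is therefore $\frac{1}{n I(\mu)} = \frac{\sigma^2}{n} = \mathrm{var}(\bar{X})$; since $\bar{X}$ is unbiased and attains the bound, it is a minimum variance unbiased estimator of $\mu$.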
Relative efficiency:
If $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators of a parameter $\theta$, we say that $\hat{\theta}_1$ is relatively more efficient if $\mathrm{var}(\hat{\theta}_1) < \mathrm{var}(\hat{\theta}_2)$. We use the ratio
$$\frac{\mathrm{var}(\hat{\theta}_1)}{\mathrm{var}(\hat{\theta}_2)}$$
as a measure of the relative efficiency of $\hat{\theta}_2$ w.r.t. $\hat{\theta}_1$.
Example:
Suppose $X_1, X_2, \dots, X_n$ is an i.i.d. random sample from a Poisson distribution with parameter $\lambda$. Let $\hat{\lambda}_1 = \bar{X}$ and $\hat{\lambda}_2 = \frac{X_1 + X_2}{2}$ be two unbiased estimators of $\lambda$. Find the relative efficiency of $\hat{\lambda}_2$ w.r.t. $\hat{\lambda}_1$.
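A brief sketch of the computation: since the $X_i$ are Poisson($\lambda$), $\mathrm{var}(\hat{\lambda}_1) = \mathrm{var}(\bar{X}) = \frac{\lambda}{n}$ and $\mathrm{var}(\hat{\lambda}_2) = \frac{\mathrm{var}(X_1) + \mathrm{var}(X_2)}{4} = \frac{\lambda}{2}$, so the relative efficiency is
$$\frac{\mathrm{var}(\hat{\lambda}_1)}{\mathrm{var}(\hat{\lambda}_2)} = \frac{\lambda/n}{\lambda/2} = \frac{2}{n}.$$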
Consistent estimators:
Definition:
The estimator $\hat{\theta}$ of a parameter $\theta$ is said to be a consistent estimator if for any positive $\epsilon$
$$\lim_{n \to \infty} P(|\hat{\theta} - \theta| \le \epsilon) = 1$$
or
$$\lim_{n \to \infty} P(|\hat{\theta} - \theta| > \epsilon) = 0.$$
We say that $\hat{\theta}$ converges in probability to $\theta$. When $\hat{\theta} = \bar{X}$ and $\theta = \mu$ this is the weak law of large numbers: the average of many independent random variables should be very close to the true mean $\mu$ with high probability.
Theorem:
An unbiased estimator $\hat{\theta}$ of a parameter $\theta$ is consistent if $\mathrm{var}(\hat{\theta}) \to 0$ as $n \to \infty$.
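For example (a brief sketch, using Chebyshev's inequality): $\bar{X}$ is a consistent estimator of $\mu$ because it is unbiased and
$$P(|\bar{X} - \mu| > \epsilon) \le \frac{\mathrm{var}(\bar{X})}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty.$$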
MSE and bias:
The bias $B$ of an estimator $\hat{\theta}$ is given by
$$B = E(\hat{\theta}) - \theta.$$
In general, given two unbiased estimators we would choose the estimator with the smaller variance. However, comparing variances alone is not always enough (there may exist biased estimators with smaller variance). We use the mean square error (MSE)
$$MSE = E(\hat{\theta} - \theta)^2$$
as a measure of the goodness of an estimator. We can show that
$$MSE = \mathrm{var}(\hat{\theta}) + B^2.$$
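A short sketch of this identity: adding and subtracting $E(\hat{\theta})$,
$$E(\hat{\theta} - \theta)^2 = E\left[(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\right]^2 = E(\hat{\theta} - E(\hat{\theta}))^2 + 2B\,E(\hat{\theta} - E(\hat{\theta})) + B^2 = \mathrm{var}(\hat{\theta}) + B^2,$$
since the middle term vanishes ($E(\hat{\theta} - E(\hat{\theta})) = 0$).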
Example: (From “Mathematical Statistics with Applications”, by Wackerly, Mendenhall, Scheaffer).
The reading on a voltage meter connected to a test circuit is uniformly distributed over the interval $(\theta, \theta + 1)$, where $\theta$ is the true but unknown voltage of the circuit. Suppose that $X_1, X_2, \dots, X_n$ denotes a random sample of such readings.
a. Show that $\bar{X}$ is a biased estimator of $\theta$, and compute the bias.
b. Find a function of $\bar{X}$ that is an unbiased estimator of $\theta$.
c. Find the MSE when $\bar{X}$ is used as an estimator of $\theta$.
d. Find the MSE when the bias-corrected estimator is used.
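A brief sketch of parts (a) and (b): since each $X_i$ is uniform on $(\theta, \theta+1)$, $E(X_i) = \theta + \frac{1}{2}$, so $E(\bar{X}) = \theta + \frac{1}{2}$ and the bias is $B = \frac{1}{2}$; consequently $\bar{X} - \frac{1}{2}$ is an unbiased estimator of $\theta$. Parts (c) and (d) then follow from $\mathrm{var}(\bar{X}) = \frac{1}{12n}$.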
Example: (From “Theoretical Statistics”, by Robert W. Keener).
Let $X \sim b(100, p)$. Consider the three estimators $\hat{p}_1 = \frac{X}{100}$, $\hat{p}_2 = \frac{X+3}{100}$, $\hat{p}_3 = \frac{X+3}{106}$. Find the MSE for each estimator and plot it against $p$.
[Figure: MSE of the three estimators plotted against $p$ for $0 \le p \le 1$; vertical axis: MSE, from 0 to about 0.0035.]
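A minimal sketch of how such a plot can be produced (assuming Python with numpy and matplotlib, which are not part of this handout; the MSE formulas follow from $MSE = \mathrm{var}(\hat{p}) + B^2$):

import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0, 1, 201)
n = 100

# MSE = variance + bias^2 for each estimator
mse1 = p * (1 - p) / n                               # p1 = X/100: unbiased, var = p(1-p)/100
mse2 = p * (1 - p) / n + 0.03**2                     # p2 = (X+3)/100: same variance, bias = 0.03
mse3 = (n * p * (1 - p) + (3 - 6 * p)**2) / 106**2   # p3 = (X+3)/106: var + bias^2

plt.plot(p, mse1, label="p1 = X/100")
plt.plot(p, mse2, label="p2 = (X+3)/100")
plt.plot(p, mse3, label="p3 = (X+3)/106")
plt.xlabel("p")
plt.ylabel("MSE")
plt.legend()
plt.show()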
Information and Cramér-Rao inequality

Let $X$ be a random variable with pdf $f(x;\theta)$. Then
$$\int_{-\infty}^{\infty} f(x;\theta)\,dx = 1.$$
Take derivatives w.r.t. $\theta$ on both sides:
$$\int_{-\infty}^{\infty} \frac{\partial f(x;\theta)}{\partial \theta}\,dx = 0,$$
this is the same as
$$\int_{-\infty}^{\infty} \frac{1}{f(x;\theta)} \frac{\partial f(x;\theta)}{\partial \theta}\, f(x;\theta)\,dx = 0 \quad \text{or} \quad \int_{-\infty}^{\infty} \frac{\partial \ln f(x;\theta)}{\partial \theta}\, f(x;\theta)\,dx = 0.$$
Differentiate again w.r.t. $\theta$:
$$\int_{-\infty}^{\infty} \left[\frac{\partial^2 \ln f(x;\theta)}{\partial \theta^2}\, f(x;\theta) + \frac{\partial \ln f(x;\theta)}{\partial \theta}\, \frac{\partial f(x;\theta)}{\partial \theta}\right] dx = 0 \quad \text{or}$$
$$\int_{-\infty}^{\infty} \left[\frac{\partial^2 \ln f(x;\theta)}{\partial \theta^2}\, f(x;\theta) + \frac{\partial \ln f(x;\theta)}{\partial \theta}\, \frac{1}{f(x;\theta)} \frac{\partial f(x;\theta)}{\partial \theta}\, f(x;\theta)\right] dx = 0 \quad \text{or}$$
$$\int_{-\infty}^{\infty} \left[\frac{\partial^2 \ln f(x;\theta)}{\partial \theta^2}\, f(x;\theta) + \left(\frac{\partial \ln f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\right] dx = 0 \quad \text{or}$$
$$\int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x;\theta)}{\partial \theta^2}\, f(x;\theta)\,dx + \int_{-\infty}^{\infty} \left(\frac{\partial \ln f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx = 0 \quad \text{or}$$
$$E\left(\frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2}\right) + E\left(\frac{\partial \ln f(X;\theta)}{\partial \theta}\right)^2 = 0 \quad \text{or}$$
$$E\left(\frac{\partial \ln f(X;\theta)}{\partial \theta}\right)^2 = -E\left(\frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2}\right).$$
The expression
$$E\left(\frac{\partial \ln f(X;\theta)}{\partial \theta}\right)^2 = I(\theta)$$
is the so-called information for one observation. The information can also be computed using the variance of the score function, $S = \frac{\partial \ln f(x;\theta)}{\partial \theta}$, i.e. $I(\theta) = \mathrm{var}\left(\frac{\partial \ln f(x;\theta)}{\partial \theta}\right)$. Why?
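One way to see this: the first display above says that $E\left(\frac{\partial \ln f(X;\theta)}{\partial \theta}\right) = 0$, i.e. $E(S) = 0$, and therefore
$$\mathrm{var}(S) = E(S^2) - \left[E(S)\right]^2 = E(S^2) = E\left(\frac{\partial \ln f(X;\theta)}{\partial \theta}\right)^2 = I(\theta).$$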
Let's find the information in a sample: Let $X_1, X_2, \dots, X_n$ be an i.i.d. random sample from a distribution with pdf $f(x;\theta)$. The joint pdf of $X_1, X_2, \dots, X_n$ is
$$L(\theta) = f(x_1;\theta) f(x_2;\theta) \cdots f(x_n;\theta).$$
Take logarithms on both sides:
$$\ln L(\theta) = \ln f(x_1;\theta) + \ln f(x_2;\theta) + \cdots + \ln f(x_n;\theta).$$
Take derivatives w.r.t. $\theta$ on both sides:
$$\frac{\partial \ln L(\theta)}{\partial \theta} = \frac{\partial \ln f(x_1;\theta)}{\partial \theta} + \frac{\partial \ln f(x_2;\theta)}{\partial \theta} + \cdots + \frac{\partial \ln f(x_n;\theta)}{\partial \theta}.$$
When one observation was involved (see above) the information was $E\left(\frac{\partial \ln f(X;\theta)}{\partial \theta}\right)^2$. Now we are dealing with a random sample $X_1, X_2, \dots, X_n$ and $f(x;\theta)$ is replaced by $L(\theta)$ (the joint pdf). Therefore, the information in the sample will be $E\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2$. Now,
$$\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2 = \left(\frac{\partial \ln f(x_1;\theta)}{\partial \theta} + \frac{\partial \ln f(x_2;\theta)}{\partial \theta} + \cdots + \frac{\partial \ln f(x_n;\theta)}{\partial \theta}\right)^2 \quad \text{or}$$
$$\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2 = \left(\frac{\partial \ln f(x_1;\theta)}{\partial \theta}\right)^2 + \left(\frac{\partial \ln f(x_2;\theta)}{\partial \theta}\right)^2 + \cdots + \left(\frac{\partial \ln f(x_n;\theta)}{\partial \theta}\right)^2 + 2\,\frac{\partial \ln f(x_1;\theta)}{\partial \theta}\,\frac{\partial \ln f(x_2;\theta)}{\partial \theta} + \cdots$$
Take expected values on both sides:
$$E\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2 = E\left(\frac{\partial \ln f(X_1;\theta)}{\partial \theta}\right)^2 + E\left(\frac{\partial \ln f(X_2;\theta)}{\partial \theta}\right)^2 + \cdots + E\left(\frac{\partial \ln f(X_n;\theta)}{\partial \theta}\right)^2.$$
The expected value of the cross-product terms is equal to zero. Why?
We conclude that the information in the sample is:
$$E\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2 = I(\theta) + I(\theta) + \cdots + I(\theta) \quad \text{or} \quad I_n(\theta) = n I(\theta).$$
The information in the sample is equal to $n$ times the information for one observation.
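One way to see why the cross-product terms vanish: for $i \ne j$, independence of $X_i$ and $X_j$ gives
$$E\left(\frac{\partial \ln f(X_i;\theta)}{\partial \theta}\,\frac{\partial \ln f(X_j;\theta)}{\partial \theta}\right) = E\left(\frac{\partial \ln f(X_i;\theta)}{\partial \theta}\right) E\left(\frac{\partial \ln f(X_j;\theta)}{\partial \theta}\right) = 0 \cdot 0 = 0,$$
using the fact, shown above, that the score of one observation has expectation zero.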
Cramér-Rao inequality:
$$\mathrm{var}(\hat{\theta}) \ge \frac{1}{n I(\theta)} \quad \text{or} \quad \mathrm{var}(\hat{\theta}) \ge \frac{1}{n E\left(\frac{\partial \ln f(X;\theta)}{\partial \theta}\right)^2} \quad \text{or} \quad \mathrm{var}(\hat{\theta}) \ge \frac{1}{-n E\left(\frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2}\right)}.$$
Let $X_1, X_2, \dots, X_n$ be an i.i.d. random sample from a distribution with pdf $f(x;\theta)$, and let $\hat{\theta} = g(X_1, X_2, \dots, X_n)$ be an unbiased estimator of the unknown parameter $\theta$. Since $\hat{\theta}$ is unbiased, it is true that $E(\hat{\theta}) = \theta$, or
$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n)\, f(x_1;\theta) f(x_2;\theta) \cdots f(x_n;\theta)\, dx_1 dx_2 \cdots dx_n = \theta.$$
Take derivatives w.r.t. $\theta$ on both sides:
$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n) \left[\sum_{i=1}^{n} \frac{1}{f(x_i;\theta)} \frac{\partial f(x_i;\theta)}{\partial \theta}\right] f(x_1;\theta) f(x_2;\theta) \cdots f(x_n;\theta)\, dx_1 dx_2 \cdots dx_n = 1.$$
Since $\frac{1}{f(x_i;\theta)} \frac{\partial f(x_i;\theta)}{\partial \theta} = \frac{\partial \ln f(x_i;\theta)}{\partial \theta}$, we can write the previous expression as
$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n) \left[\sum_{i=1}^{n} \frac{\partial \ln f(x_i;\theta)}{\partial \theta}\right] f(x_1;\theta) f(x_2;\theta) \cdots f(x_n;\theta)\, dx_1 dx_2 \cdots dx_n = 1$$
or
$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n)\, Q\, f(x_1;\theta) f(x_2;\theta) \cdots f(x_n;\theta)\, dx_1 dx_2 \cdots dx_n = 1,$$
where
$$Q = \sum_{i=1}^{n} \frac{\partial \ln f(x_i;\theta)}{\partial \theta}.$$
But also, $\hat{\theta} = g(X_1, X_2, \dots, X_n)$. So far we have $E(\hat{\theta} Q) = 1$. Now find the correlation between $\hat{\theta}$ and $Q$.
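One way the argument can be completed (a brief sketch): since the score of each observation has mean zero, $E(Q) = 0$ and $\mathrm{var}(Q) = n I(\theta)$, so $\mathrm{cov}(\hat{\theta}, Q) = E(\hat{\theta} Q) - E(\hat{\theta}) E(Q) = 1$. Because a correlation is at most 1 in absolute value,
$$1 = \left[\mathrm{cov}(\hat{\theta}, Q)\right]^2 \le \mathrm{var}(\hat{\theta})\,\mathrm{var}(Q) = \mathrm{var}(\hat{\theta})\, n I(\theta), \quad \text{i.e.} \quad \mathrm{var}(\hat{\theta}) \ge \frac{1}{n I(\theta)}.$$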