Chapter 7 Sufficiency
7.2 A Sufficient Statistic for a Parameter
Boxiang Wang
Chapter 7 STAT 4101 Spring 2021
Minimum Variance Unbiased Estimator (MVUE)
For a given sample size n, a statistic Y = u(X1, ..., Xn) is called a minimum variance unbiased estimator (MVUE) of the parameter θ if
1. E(Y) = θ;
2. Var(Y) ≤ Var(T) for every statistic T with E(T) = θ.
If Y is efficient, that is, Y is unbiased and Var(Y) attains the Rao-Cramér lower bound, then Y must be an MVUE; often, however, no efficient estimator exists.
Are there general methods to find an MVUE? The journey begins…
Motivation of Sufficiency
We have a coin such that P(head) = θ, where θ ∈ (0, 1) is unknown. I flipped the coin 1,000,000 times, and the results were 0, 0, 1, 1, 0, 1, 1, 1, 0, .... You want the results of my experiment in order to make inference about θ.
Do I need to send you all the data?
How about just one statistic: ∑i xi = 567,111?
Sufficient Statistic
Assume that X1, ..., Xn is a random sample from a distribution with pdf (pmf) f(x; θ), θ ∈ Ω. Let Y1 = u1(X1, ..., Xn) be a statistic.
We call Y1 a sufficient statistic for θ if the conditional joint distribution of X1, ..., Xn given Y1 = y1 does not depend on the unknown value of θ.
In a sense Y1 exhausts all the information about θ that is contained in the sample. Knowing Y1 is as good as knowing the entire data set.
Example (7.2.1): Bernoulli Distribution
Let X1, ..., Xn denote a random sample from the distribution with pmf
$$f(x;\theta) = \begin{cases} \theta^x (1-\theta)^{1-x} & x = 0, 1;\; 0 < \theta < 1, \\ 0 & \text{elsewhere.} \end{cases}$$
Find a sufficient statistic for θ.
The statistic Y1 = X1 + X2 + ... + Xn follows Bin(n, θ):
$$f_{Y_1}(y_1;\theta) = \begin{cases} \binom{n}{y_1}\theta^{y_1}(1-\theta)^{n-y_1} & y_1 = 0, 1, \ldots, n, \\ 0 & \text{elsewhere.} \end{cases}$$
Consider the conditional probability
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y_1 = y_1) = P(A \mid B).$$
We have P(A | B) = P(A)/P(B) since A ⊂ B (when ∑ xi = y1; otherwise the probability is 0). Thus
$$\frac{\theta^{x_1}(1-\theta)^{1-x_1}\,\theta^{x_2}(1-\theta)^{1-x_2}\cdots\theta^{x_n}(1-\theta)^{1-x_n}}{\binom{n}{y_1}\theta^{y_1}(1-\theta)^{n-y_1}} = \frac{\theta^{\sum_i x_i}(1-\theta)^{n-\sum_i x_i}}{\binom{n}{y_1}\theta^{y_1}(1-\theta)^{n-y_1}} = \frac{1}{\binom{n}{y_1}},$$
which does not depend upon θ, so Y1 is a sufficient statistic for θ.
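A small exact check of this result (a sketch, not part of the slides): enumerating all Bernoulli samples of size n = 5 confirms that the conditional probability of any sequence given ∑ Xi = y1 is 1/C(n, y1), whatever θ generated the data:

```python
from fractions import Fraction
from itertools import product
from math import comb

# Enumerate all Bernoulli samples of size n summing to y1, and verify that
# P(X = x | sum(X) = y1) equals 1/C(n, y1) exactly, for two different thetas.
n, y1 = 5, 2
for theta in [Fraction(1, 4), Fraction(2, 3)]:
    # joint probability of each sequence with sum y1
    probs = {x: theta**sum(x) * (1 - theta)**(n - sum(x))
             for x in product([0, 1], repeat=n) if sum(x) == y1}
    p_y1 = sum(probs.values())   # = C(n,y1) theta^y1 (1-theta)^(n-y1)
    for x, p in probs.items():
        assert p / p_y1 == Fraction(1, comb(n, y1))  # no theta dependence
```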
Theorem (7.2.1): Factorization Theorem
Let X1, ..., Xn denote a random sample from a distribution that has pdf or pmf f(x; θ), θ ∈ Ω. The statistic Y1 = y1(X1, ..., Xn) is a sufficient statistic for θ if and only if we can find two nonnegative functions, C1 and C2, such that
$$f(x_1;\theta)\cdots f(x_n;\theta) = C_1[y_1(x_1,\ldots,x_n);\theta]\, C_2(x_1,\ldots,x_n),$$
where C2(x1, ..., xn) does not depend on θ.
Remark:
The theorem can help us find sufficient statistics.
1. If X1, ..., Xn iid ∼ Bern(p), then Y1 = ∑i Xi is sufficient for p.
2. If X1, ..., Xn iid ∼ Pois(λ), then Y1 = ∑i Xi is sufficient for λ.
Example (7.2.5)
Let X1, ..., Xn denote a random sample from a distribution with pdf
$$f(x;\theta) = \theta x^{\theta-1}, \quad 0 < x < 1,\ \theta > 0,$$
and zero elsewhere. Find a sufficient statistic for θ.
Solution: The joint pdf is
$$\theta^n \Big(\prod_{i=1}^n x_i\Big)^{\theta-1} = \bigg[\theta^n \Big(\prod_{i=1}^n x_i\Big)^{\theta}\bigg] \cdot \frac{1}{\prod_{i=1}^n x_i} = C_1\Big(\prod_{i=1}^n x_i;\theta\Big)\, C_2(x_1,\ldots,x_n).$$
Thus ∏i Xi is a sufficient statistic for θ.
Example (7.2.6)
Let Y1 < Y2 < ... < Yn denote the order statistics of a random sample of size n from the distribution with pdf
$$f(x;\theta) = e^{-(x-\theta)} I_{(\theta,\infty)}(x).$$
Find a sufficient statistic for θ.
Solution: Taking n = 3, the joint pdf of X1, X2, X3 is
$$\prod_{i=1}^3 e^{-(x_i-\theta)} I_{(\theta,\infty)}(x_i) = \Big[e^{3\theta} I_{(\theta,\infty)}\big(\min_i x_i\big)\Big] \cdot \exp\Big(-\sum_{i=1}^3 x_i\Big) = C_1\big(\min_i x_i;\theta\big)\, C_2(x_1,x_2,x_3).$$
Thus min Xi is a sufficient statistic for θ.
Example
Let X1, . . . , Xn denote a random sample from Γ(α, β). Find a sufficient statistic for (α, β).
The joint pdf of X1, ..., Xn is
$$\bigg[\frac{1}{\Gamma(\alpha)\beta^\alpha}\bigg]^n \Big(\prod_{i=1}^n x_i\Big)^{\alpha-1} \exp\Big(-\frac{\sum_{i=1}^n x_i}{\beta}\Big) = C_1\Big(\prod_{i=1}^n x_i, \sum_{i=1}^n x_i;\alpha,\beta\Big) \cdot C_2(x_1,\ldots,x_n),$$
with C2 = 1. Thus (∏i Xi, ∑i Xi) is a sufficient statistic for (α, β).
Example (7.2.4)
Let X1, . . . , Xn denote a random sample from N(θ, σ2). Find a sufficient statistic for θ.
The joint pdf of X1, ..., Xn may be written
$$\Big(\frac{1}{\sqrt{2\pi}\,\sigma}\Big)^n \exp\Big(-\frac{\sum_{i=1}^n (x_i-\theta)^2}{2\sigma^2}\Big) = \Big(\frac{1}{\sqrt{2\pi}\,\sigma}\Big)^n \exp\Big(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2} + \frac{\theta\sum_{i=1}^n x_i}{\sigma^2} - \frac{n\theta^2}{2\sigma^2}\Big).$$
If σ is known, group the factors as
$$\bigg[\exp\Big(-\frac{n\theta^2}{2\sigma^2}\Big)\exp\Big(\frac{\theta\sum_{i=1}^n x_i}{\sigma^2}\Big)\bigg] \cdot \bigg[\Big(\frac{1}{\sqrt{2\pi}\,\sigma}\Big)^n \exp\Big(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2}\Big)\bigg] = C_1\Big(\sum_{i=1}^n x_i;\theta\Big)\, C_2(x_1,\ldots,x_n);$$
then ∑i Xi is a sufficient statistic for θ.
If σ is unknown, the same expression equals
$$C_1\Big(\sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2;\theta,\sigma^2\Big) \cdot 1;$$
then (∑i Xi, ∑i Xi²) is a sufficient statistic for (θ, σ²).
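A quick numerical illustration (hypothetical data, not from the slides): any two samples agreeing on (∑ xi, ∑ xi²) have identical normal likelihoods for every (θ, σ), which is exactly what sufficiency of the pair says:

```python
import math

# Two samples that agree on (sum, sum of squares) have the same normal
# log-likelihood for every (theta, sigma).
def normal_loglik(xs, theta, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - theta)**2 / (2 * sigma**2) for x in xs)

a, b = [1, 5, 6], [2, 3, 7]   # same sum (12) and sum of squares (62)
assert sum(a) == sum(b) and sum(x*x for x in a) == sum(x*x for x in b)

for theta in [-1.0, 0.0, 2.5]:
    for sigma in [0.5, 1.0, 3.0]:
        assert abs(normal_loglik(a, theta, sigma)
                   - normal_loglik(b, theta, sigma)) < 1e-9
```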
Chapter 7 Sufficiency
7.3 Properties of a Sufficient Statistic
Suppose our goal is to estimate a parameter θ or, more generally, a function of the parameter τ(θ).
Then the minimum variance unbiased estimator (MVUE) could be a good choice.
The following theorem gives a general hint on how to find the MVUE of τ(θ).
Theorem (7.3.1): Rao-Blackwell Theorem
Let Y1 = u1(X1, ..., Xn) be a sufficient statistic for θ, and let Y2 = u2(X1, ..., Xn) be an unbiased estimator of θ. Then
1. φ(Y1) = E(Y2 | Y1) is an unbiased estimator of θ;
2. MSE(φ(Y1)) = Var(φ(Y1)) ≤ Var(Y2).
Proof
Recall that
$$E X = E[E(X \mid Y)];$$
$$\operatorname{Var} X = E[\operatorname{Var}(X \mid Y)] + \operatorname{Var}[E(X \mid Y)]. \quad \text{(Eve's law)}$$
It follows that
$$E\,\varphi(Y_1) = E[E(Y_2 \mid Y_1)] = E Y_2 = \theta,$$
and
$$\operatorname{Var}(Y_2) = E[\operatorname{Var}(Y_2 \mid Y_1)] + \operatorname{Var}[E(Y_2 \mid Y_1)] \ge \operatorname{Var}[E(Y_2 \mid Y_1)] = \operatorname{Var}[\varphi(Y_1)].$$
Note that E(Y2 | Y1) is indeed a statistic, because Y1 is a sufficient statistic for θ: the distribution of Y2 | Y1 does not depend on θ.
Example
Let X1, ..., Xn be a random sample from Pois(λ). Find an unbiased estimator of e^{-λ}. Then improve this estimator using the Rao-Blackwell theorem.
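A simulation sketch of the standard answer (the closed forms here are well known but not stated on the slide): Y2 = I(X1 = 0) is unbiased for e^{-λ}, and conditioning on Y1 = ∑ Xi gives φ(Y1) = ((n-1)/n)^{Y1}, since X1 | Y1 = y ∼ Bin(y, 1/n):

```python
import math
import random

# Rao-Blackwellization demo: compare the crude unbiased estimator
# I(X1 = 0) with phi(Y1) = ((n-1)/n)^{sum(Xi)}.
random.seed(0)
lam, n, reps = 2.0, 10, 20000

def poisson(lam):
    # Knuth's method for one Poisson(lam) draw
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

crude, rb = [], []
for _ in range(reps):
    x = [poisson(lam) for _ in range(n)]
    crude.append(1.0 if x[0] == 0 else 0.0)   # Y2 = I(X1 = 0)
    rb.append(((n - 1) / n) ** sum(x))        # phi(Y1) = E(Y2 | Y1)

# Both are unbiased for e^{-lambda}; the Rao-Blackwellized one has far
# smaller variance.
assert abs(sum(rb) / reps - math.exp(-lam)) < 0.01
assert var(rb) < var(crude)
```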
Example of Conditioning on an Insufficient Statistic
Let X1 and X2 be a random sample from N(θ, 1). The statistic X̄ = (X1 + X2)/2 has
$$E\bar{X} = \theta, \qquad \operatorname{Var}(\bar{X}) = 1/2.$$
Consider conditioning on X1, which is not sufficient.
Define φ(X1) = E(X̄ | X1). It would follow that Eφ(X1) = θ and Var φ(X1) ≤ Var(X̄) by the Rao-Blackwell theorem.
It seems that φ(X1) is better than X̄. However,
$$\varphi(X_1) = E(\bar{X} \mid X_1) = \tfrac{1}{2}E(X_1 \mid X_1) + \tfrac{1}{2}E(X_2 \mid X_1) = \tfrac{1}{2}X_1 + \tfrac{1}{2}\theta,$$
which is not a statistic!
Theorem (7.3.2): Connect MLE with Sufficiency
Let X1, ..., Xn denote a random sample from a distribution with pdf f(x; θ). Suppose the maximum likelihood estimator θ̂ of θ is unique. Then for any sufficient statistic Y, θ̂ is a function of Y.
Sketch: by the factorization theorem,
$$L(\theta; x_1, \ldots, x_n) = f(x_1;\theta)\,f(x_2;\theta)\cdots f(x_n;\theta) = f_Y[y;\theta]\, H(x_1, x_2, \ldots, x_n),$$
so maximizing L over θ is equivalent to maximizing f_Y[y; θ], whose maximizer depends on the data only through y.
Non-uniqueness of Sufficient Statistic
A sufficient statistic is not unique.
The entire sample X is always a sufficient statistic.
Factorization theorem:
$$f(x_1;\theta)\, f(x_2;\theta)\cdots f(x_n;\theta) = C_1(x_1,\ldots,x_n;\theta) \cdot C_2,$$
where C2 = 1.
Thus Y1 = (X1, ..., Xn) is a sufficient statistic.
Non-uniqueness of Sufficient Statistic (cont’d)
It also follows that any one-to-one function of a sufficient statistic is itself a sufficient statistic.
That is, if Y1 = u1(X1, ..., Xn) is a sufficient statistic for θ and Y2 = g(Y1), where g is a one-to-one function, then Y2 is also sufficient.
Consider X1, X2, ..., Xn ∼ N(μ, σ²), where σ is known. We have shown that ∑i Xi is a sufficient statistic for μ.
From the factorization below, we also see that X̄ is a sufficient statistic for μ:
$$\Big(\frac{1}{\sigma\sqrt{2\pi}}\Big)^n \exp\Big(-\sum_{i=1}^n (x_i-\mu)^2/2\sigma^2\Big) = \exp\big(-n(\bar{x}-\mu)^2/2\sigma^2\big) \cdot \frac{\exp\big(-\sum_{i=1}^n (x_i-\bar{x})^2/2\sigma^2\big)}{(\sigma\sqrt{2\pi})^n}.$$
If σ is unknown, then (∑i Xi, ∑i Xi²) is a sufficient statistic for (μ, σ²), and so is (X̄, S²).
Chapter 7 Sufficiency
7.4 Completeness and Uniqueness
Complete Family and Complete Statistics
Let X1, ..., Xn denote a random sample from a distribution with a pdf or pmf from a family {f(x; θ) : θ ∈ Ω}.
Suppose that Y1 is a sufficient statistic for θ. We say that Y1 is complete if
$$E_\theta[g(Y_1)] = 0 \text{ for all } \theta \in \Omega \;\Longrightarrow\; P_\theta\big(g(Y_1) = 0\big) = 1 \text{ for all } \theta \in \Omega.$$
If Y1 is both complete and sufficient, we say Y1 is a complete sufficient statistic for θ.
When people say Y1 is complete, they really mean that the family of pdfs of the statistic Y1 is complete.
Example of Complete Family: Binomial Distribution
Suppose that T has a binomial(n, θ) distribution with θ ∈ (0, 1), and g is a function such that Eθ[g(T)] = 0 for all θ. Then
$$0 = E_\theta[g(T)] = \sum_{k=0}^n g(k)\binom{n}{k}\theta^k(1-\theta)^{n-k} = (1-\theta)^n \sum_{k=0}^n g(k)\binom{n}{k}\Big(\frac{\theta}{1-\theta}\Big)^k.$$
If we put r = θ/(1 - θ), we see that this equals
$$(1-\theta)^n \sum_{k=0}^n g(k)\binom{n}{k} r^k,$$
which is a polynomial in r of degree n. Since this polynomial is zero for all r > 0, it must be that g(k) = 0 for each k = 0, ..., n. Since, for each θ, T is supported on {0, ..., n}, it follows that Pθ(g(T) = 0) = 1 for all θ, so T is complete.
Example (7.4.1): Exponential Distribution
Consider the family of pdfs {h(z; θ) : 0 < θ < ∞}, where
$$h(z;\theta) = \frac{1}{\theta} e^{-z/\theta}, \quad 0 < z < \infty,$$
and zero elsewhere. Suppose Z has a pdf in this family and that Eθ[u(Z)] = 0 for all θ, i.e.,
$$\frac{1}{\theta}\int_0^\infty u(z)\, e^{-z/\theta}\, dz = 0, \quad \forall \theta.$$
The integral on the left-hand side is the Laplace transform of u(z) (evaluated at 1/θ). By the uniqueness of Laplace transforms, the only function whose transform is identically zero is u(z) = 0, except on a set of points with probability zero. Hence the family is complete.
Intuitive Understanding of Completeness
Suppose our goal is to find an “optimal” unbiased estimator of g(θ). (You know our ultimate goal is MVUE.)
If we find two unbiased estimators based on Y1, say φ(Y1) and ψ(Y1), then Eθ[φ(Y1) − ψ(Y1)] = 0 for all θ ∈ Ω.
Therefore, by completeness, φ(Y1) and ψ(Y1) coincide with probability one, regardless of θ.
In other words, once we find one unbiased estimator of g(θ) based on Y1, we have essentially found all of them. Mission complete!
Uniqueness?
Theorem (7.4.1): Lehmann-Scheffé
Assume that X1, ..., Xn is a random sample from a distribution with pdf (pmf) f(x; θ), θ ∈ Ω. Let Y1 = u1(X1, ..., Xn) be a complete sufficient statistic for θ. If φ(Y1) is an unbiased estimator of θ, then φ(Y1) is the unique (with probability one) MVUE of θ.
Proof:
1. By Rao-Blackwell, if Y2 is any unbiased estimator of θ, then E(Y2 | Y1) is an unbiased estimator of θ with Var[E(Y2 | Y1)] ≤ Var[Y2].
2. But E(Y2 | Y1) is an unbiased function of Y1, so by completeness it must coincide with φ(Y1) with probability one.
3. Thus, Varθ[φ(Y1)] ≤ Varθ[Y2] for all θ ∈ Ω.
Example (7.4.2): Uniform Distribution
Let X1, . . . , Xn be a random sample from the uniform distribution with pdf f(x;θ) = 1/θ, 0 < x < θ, θ > 0, and zero elsewhere. Find the MVUE of θ.
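A simulation sketch of the standard answer (not worked out on the slide): Yn = max Xi is complete and sufficient with E(Yn) = nθ/(n+1), so (n+1)Yn/n is the MVUE; here it is compared with the unbiased moment estimator 2X̄:

```python
import random

# Compare the MVUE (n+1)/n * max(Xi) with the unbiased moment estimator
# 2 * Xbar for Uniform(0, theta).
random.seed(1)
theta, n, reps = 3.0, 10, 20000

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

mvue, moment = [], []
for _ in range(reps):
    x = [random.uniform(0, theta) for _ in range(n)]
    mvue.append((n + 1) / n * max(x))
    moment.append(2 * sum(x) / n)

assert abs(sum(mvue) / reps - theta) < 0.05   # unbiased for theta
assert var(mvue) < var(moment)                # and lower variance
```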
Example (7.3.1)
Let X1, ..., Xn be a random sample from the exponential distribution with pdf
$$f(x;\theta) = \frac{1}{\theta}\, e^{-x/\theta}, \quad 0 < x < \infty,\ \theta > 0,$$
and zero elsewhere. Find the MVUE of 1/θ.
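A simulation sketch (the slide leaves the solution as an exercise; this assumes the mean-θ parametrization f(x; θ) = (1/θ)e^{-x/θ}): ∑ Xi ∼ Γ(n, θ) with E[1/∑ Xi] = 1/(θ(n-1)), so (n-1)/∑ Xi is unbiased for 1/θ and, being a function of the complete sufficient statistic ∑ Xi, is the MVUE:

```python
import random

# Check by simulation that (n-1)/sum(Xi) is unbiased for 1/theta when the
# Xi are exponential with mean theta.
random.seed(2)
theta, n, reps = 2.0, 10, 40000

est = []
for _ in range(reps):
    # expovariate takes the rate 1/theta, giving mean theta
    s = sum(random.expovariate(1 / theta) for _ in range(n))
    est.append((n - 1) / s)

assert abs(sum(est) / reps - 1 / theta) < 0.02
```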
In the next section, we will talk about an important class of distributions, including the Bernoulli, Poisson, normal, and exponential distributions. This class is called the exponential class.
We will show that the exponential class possesses complete sufficient statistics that are readily determined from the distribution.