Statistical Inference STAT 431
Lecture 7: Inferences for Single Samples (II)
Review: Large Sample Inferences for Mean
• Problem setup
– A random sample of size n from the population distribution F
• F is not necessarily normal!
– Parameter of interest: the mean μ of F
– We assume that the sample size n is large [typically n ≥ 30 is large enough]
– We also assume that the SD σ of F is known
• For a large sample, if σ is unknown, we can always replace σ by s and act as if σ = s
• In large sample inference for mean, the following pivotal random variable plays the key role:
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{d}{\approx} N(0,1) \]
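The large-sample approximation for Z can be checked by simulation. The sketch below is only an illustration under assumed settings that are not from the lecture: an Exponential(1) population (so μ = σ = 1, clearly non-normal), n = 50, 5000 replications, and a hypothetical helper named `z_pivot`.

```python
import random
from math import sqrt
from statistics import fmean

def z_pivot(sample, mu, sigma):
    """Z = (xbar - mu) / (sigma / sqrt(n)) for one sample."""
    n = len(sample)
    return (fmean(sample) - mu) / (sigma / sqrt(n))

rng = random.Random(431)   # fixed seed, for reproducibility only
n, reps = 50, 5000         # assumed sample size and number of replications
mu = sigma = 1.0           # mean and SD of the Exponential(1) population

z_values = [
    z_pivot([rng.expovariate(1.0) for _ in range(n)], mu, sigma)
    for _ in range(reps)
]

# If Z is approximately N(0, 1), roughly 95% of the draws lie in [-1.96, 1.96].
coverage = sum(abs(z) <= 1.96 for z in z_values) / reps
print(f"share of |Z| <= 1.96: {coverage:.3f}")
```

A share close to 0.95 is consistent with the N(0,1) approximation, although a single simulation of this kind does not prove it.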
Other Inference Problems
1. Inferences for μ with small sample, with σ known
2. Inferences for μ with small sample, with σ unknown
3. Inferences for σ²
• We are mainly interested in confidence interval and hypothesis testing problems
• For all the above problems, we assume that the population distribution is normal:
\[ X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2) \]
Small Sample Inferences for Mean (Known σ)
• As in the large sample case, the pivotal random variable is
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]
• 100(1 − α)% two-sided CI for μ (z-interval):
\[ \left[ \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right] \]
– Exercise: write down the one-sided CI’s for μ
• Hypothesis testing for μ (z-test): observed test statistic
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]

H0         H1         Rejection region (level α)                     P-value
μ ≤ μ0     μ > μ0     z > z_α  ⇔  x̄ > μ0 + z_α σ/√n                  1 − Φ(z)
μ ≥ μ0     μ < μ0     z < −z_α  ⇔  x̄ < μ0 − z_α σ/√n                 Φ(z)
μ = μ0     μ ≠ μ0     |z| > z_{α/2}  ⇔  |x̄ − μ0| > z_{α/2} σ/√n      2[1 − Φ(|z|)]
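The z-interval and the z-test P-values can be sketched in a few lines of Python using the standard library's `statistics.NormalDist`. The function names `z_interval` and `z_test`, the `alternative` labels, and the numbers in the usage lines are hypothetical choices made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% z-interval for mu when sigma is known."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z_crit * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for the z-test when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":        # H1: mu > mu0
        p_value = 1 - STD_NORMAL.cdf(z)
    elif alternative == "less":         # H1: mu < mu0
        p_value = STD_NORMAL.cdf(z)
    elif alternative == "two-sided":    # H1: mu != mu0
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value

# Made-up numbers: xbar = 10.4, sigma = 1.2, n = 16, mu0 = 10.
lo, hi = z_interval(xbar=10.4, sigma=1.2, n=16)
z, p = z_test(xbar=10.4, mu0=10.0, sigma=1.2, n=16)
print(f"95% CI: ({lo:.3f}, {hi:.3f}), z = {z:.3f}, two-sided P-value = {p:.4f}")
```

With these numbers μ0 = 10 lies inside the 95% interval and the two-sided P-value exceeds 0.05, which illustrates the usual agreement between a level-α two-sided test and the 100(1 − α)% interval.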
Small Sample Inferences for Mean (Unknown σ)
\[ X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2) \]
• In this case, we still have
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]
– However, it is no longer qualified as a pivotal r.v. for μ!
– When σ is unknown, Z depends not only on μ, but also on the nuisance parameter σ
• It is natural to replace the unknown σ by its sample counterpart S. This leads to a new statistic:
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
• Questions: What is its distribution? Is it a pivotal r.v. for μ?
• To answer these questions, we start with a closer look at the sample variance.
Sample Variance
• Setup:
\[ X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2) \]
• Suppose we know μ. Then we could use the following estimator for σ²:
\[ S_0^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 \]
• Notice that
\[ \frac{n S_0^2}{\sigma^2} = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \overset{d}{=} \sum_{i=1}^{n} Z_i^2, \]
where Z_i are i.i.d. N(0,1).
• Therefore, to understand the sampling distribution of S₀² is to understand the distribution of Σ_{i=1}^n Z_i² !
Chi-Square Distribution
• Definition
For n ≥ 1, let Z_1, …, Z_n be i.i.d. N(0,1). Then the distribution of the random variable
\[ X = \sum_{i=1}^{n} Z_i^2 \]
is called the Chi-square distribution with n degrees of freedom.
• In abbreviation, we write X ∼ χ²_n.
• Key properties
– Mean and variance: E(X) = n, Var(X) = 2n
– Moment generating function:
\[ M(t) = E(e^{tX}) = (1 - 2t)^{-n/2}, \quad t < \tfrac{1}{2} \]
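The definition can be turned directly into a simulation: draw n standard normals, square them, and add them up. The sketch below uses assumed settings (n = 5, 20000 draws, a fixed seed) and a hypothetical helper `chi_square_draw`, and compares the empirical mean and variance with E(X) = n and Var(X) = 2n.

```python
import random
from statistics import fmean, variance

def chi_square_draw(n, rng):
    """One draw of X = Z_1^2 + ... + Z_n^2 with Z_i i.i.d. N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(431)   # fixed seed, for reproducibility only
n = 5                      # degrees of freedom (assumed for the example)
draws = [chi_square_draw(n, rng) for _ in range(20000)]

sample_mean = fmean(draws)      # compare with E(X) = n
sample_var = variance(draws)    # compare with Var(X) = 2n
print(f"mean {sample_mean:.3f} vs {n}, variance {sample_var:.3f} vs {2 * n}")
```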
Chi-Square Distribution (Cont’d)
• Probability density function [no need to memorize]
\[ f(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}, \quad x \ge 0 \]
• Shape of the Chi-square distribution
– Always positively skewed
– The level of skewness decreases as the degrees of freedom (d.f.) increase
– The curves shift to the right as the d.f. increase
– For large d.f., the density curve has an approximate bell shape
[Figure: Chi-square densities for n = 5, 10, 15, 25; horizontal axis x from 0 to 50, vertical axis density from 0.00 to 0.15]
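Sample Variance (Cont’d)
• Recall that S₀² = (1/n) Σ_{i=1}^n (X_i − μ)², and so
\[ \frac{n S_0^2}{\sigma^2} = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \overset{d}{=} \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n \]
• Using properties of the Chi-square distribution, we obtain
– S₀² is an unbiased estimator for σ²: E(S₀²) = σ²
– In addition, we obtain its variance via the variance of the Chi-square distribution:
\[ \mathrm{Var}\!\left( \frac{n S_0^2}{\sigma^2} \right) = 2n \implies \mathrm{Var}(S_0^2) = \mathrm{Var}\!\left( \frac{n S_0^2}{\sigma^2} \right) \cdot \frac{\sigma^4}{n^2} = \frac{2\sigma^4}{n} \]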
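The two moment results for S₀² can be checked numerically. The following is a rough sketch under assumed values (μ = 3, σ = 2, n = 10, 20000 replications), none of which come from the lecture; `s0_squared` is a hypothetical helper name.

```python
import random
from statistics import fmean, variance

def s0_squared(sample, mu):
    """S_0^2 = (1/n) * sum of (X_i - mu)^2, usable only when mu is known."""
    return fmean([(x - mu) ** 2 for x in sample])

rng = random.Random(431)        # fixed seed, for reproducibility only
mu, sigma, n = 3.0, 2.0, 10     # assumed population parameters and sample size

estimates = [
    s0_squared([rng.gauss(mu, sigma) for _ in range(n)], mu)
    for _ in range(20000)
]

mean_est = fmean(estimates)     # theory: sigma^2 = 4
var_est = variance(estimates)   # theory: 2 * sigma^4 / n = 3.2
print(f"E(S0^2) ~ {mean_est:.3f} (theory {sigma ** 2:.1f})")
print(f"Var(S0^2) ~ {var_est:.3f} (theory {2 * sigma ** 4 / n:.1f})")
```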
Sample Variance: Unknown Mean
• Suppose X_1, …, X_n are i.i.d. N(μ, σ²), but this time, we do not know μ
• In this case, S₀² = (1/n) Σ_{i=1}^n (X_i − μ)² can no longer be used
• Instead, we could use the usual sample variance
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]
• Question: What is the sampling distribution of S²?
• To answer this question, we need a few preparations.
• First, observe that
\[ \frac{(n-1) S^2}{\sigma^2} = \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma} \right)^2 = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} - \frac{\bar{X} - \mu}{\sigma} \right)^2 = \sum_{i=1}^{n} (Z_i - \bar{Z})^2, \]
where Z_i = (X_i − μ)/σ are i.i.d. N(0,1) and Z̄ is their average.
Sample Variance: Unknown Mean (Cont’d)
Focusing on Σ_{i=1}^n (Z_i − Z̄)², we have the following facts:
1. Both Z̄ and Z_i − Z̄, i = 1, …, n, are normally distributed with mean 0
2. The covariance between Z̄ and each Z_i − Z̄ is
\[ \mathrm{Cov}(\bar{Z}, Z_i - \bar{Z}) = 0 \]
3. We have the identity
\[ n \bar{Z}^2 + \sum_{i=1}^{n} (Z_i - \bar{Z})^2 = \sum_{i=1}^{n} Z_i^2 . \]
4. We know that Σ_{i=1}^n Z_i² ∼ χ²_n and that √n Z̄ ∼ N(0, 1)
Claim: The above facts lead to the conclusion that Σ_{i=1}^n (Z_i − Z̄)² ∼ χ²_{n−1}. [Proof given in class.]
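The identity in fact 3 and the claim can be probed numerically. The sketch below, with an assumed n = 8 and 20000 replications, checks the identity on one simulated vector and then compares the empirical mean and variance of Σ(Z_i − Z̄)² with n − 1 and 2(n − 1), the moments of χ²_{n−1}. Matching two moments is consistent with the claim but is not a proof; the helper name `centered_sum_of_squares` is hypothetical.

```python
import random
from statistics import fmean, variance

def centered_sum_of_squares(z):
    """Sum of (Z_i - Zbar)^2 over the sample z."""
    zbar = fmean(z)
    return sum((zi - zbar) ** 2 for zi in z)

rng = random.Random(431)   # fixed seed, for reproducibility only
n = 8                      # assumed sample size

# Fact 3: n * Zbar^2 + sum (Z_i - Zbar)^2 = sum Z_i^2, up to rounding error.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
lhs = n * fmean(z) ** 2 + centered_sum_of_squares(z)
rhs = sum(zi ** 2 for zi in z)
identity_gap = abs(lhs - rhs)

# Claim: the centered sum of squares has the moments of chi-square with n - 1 d.f.
draws = [
    centered_sum_of_squares([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(20000)
]
mean_draws = fmean(draws)       # compare with n - 1 = 7
var_draws = variance(draws)     # compare with 2(n - 1) = 14
print(f"identity gap {identity_gap:.2e}, mean {mean_draws:.3f}, variance {var_draws:.3f}")
```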
Back to the T Statistic
• With proper transformations, we obtain that
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(1/\sqrt{n})}{\sqrt{S^2}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}} \overset{d}{=} \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \]
• Note that we have derived that X̄ is independent of S²
• Definition
Let Z ∼ N(0, 1) be independent of U ∼ χ²_n. The distribution of
\[ X = \frac{Z}{\sqrt{U/n}} \]
is called Student’s t distribution with n degrees of freedom.
• In abbreviation, we write X ∼ t_n.
Student’s t-Distribution
• Probability density function [no need to memorize]
\[ f(x) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\, \Gamma\!\left(\frac{n}{2}\right)} \left( 1 + \frac{x^2}{n} \right)^{-\frac{n+1}{2}}, \quad x \in \mathbb{R} \]
[Picture: W. S. Gosset (a.k.a. Student); picture from Wikipedia]
• Shape of the t distribution
– Symmetric around 0
– Has heavier tails than N(0,1)
– Larger d.f. ⇒ lighter tails
– As d.f. tends to infinity, the density curve converges to that of N(0, 1)
[Figure: t-distribution densities for n = 1, 2, 5 together with N(0,1); horizontal axis from −4 to 4, vertical axis density from 0.0 to 0.4]
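The shape properties can be read off from the density itself. The sketch below evaluates the t density with `math.lgamma` and compares it with the N(0,1) density at the center (x = 0) and in the tail (x = 3). The function name `t_pdf`, the evaluation points, and the choice d.f. = 1000 as a stand-in for "large" are illustrative assumptions.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

normal_pdf = NormalDist().pdf   # density of N(0, 1)

for df in (1, 2, 5, 1000):
    print(f"d.f. {df:>4}: f(0) = {t_pdf(0.0, df):.4f}, f(3) = {t_pdf(3.0, df):.4f}")
print(f"N(0,1)   : f(0) = {normal_pdf(0.0):.4f}, f(3) = {normal_pdf(3.0):.4f}")
```

The tail values at x = 3 shrink toward the normal value as the d.f. grow, while the values at 0 rise toward the normal peak.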
Rescaled Sample Mean (Cont’d)
• Recall that
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(1/\sqrt{n})}{\sqrt{S^2}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}} \overset{d}{=} \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1} \]
• Therefore,
– When we do not know σ, the rescaled sample mean
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
is a pivotal random variable for μ.
– When the sample size is large, the distribution of T, namely t_{n−1}, is approximately N(0, 1).
– For large sample size, S gives a very accurate estimate of σ ⇒ the difference between T and Z is small
– Question: If we know σ, T is still a pivotal random variable for μ. Shall we base our inference on T or Z?
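One way to see how much T and Z differ for a small sample is to compare their critical values. The standard library has no t quantile function, so the sketch below approximates it numerically (Simpson's rule for the CDF, then bisection). This is a rough illustration under assumed settings, not a production routine; the names `t_cdf`, `t_quantile`, and `t_interval` and the data values are hypothetical.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist, fmean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps_per_unit=400):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry around 0."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps_per_unit)
    m = 2 * max(1, int(steps_per_unit * x / 2))   # even number of subintervals
    h = x / m
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile of t_df for 0.5 <= p < 1, found by bisection on t_cdf."""
    if not 0.5 <= p < 1:
        raise ValueError("this sketch only handles 0.5 <= p < 1")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:      # expand the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(data, alpha=0.05):
    """Two-sided interval for mu built from the pivot T = (xbar - mu) / (S / sqrt(n))."""
    n = len(data)
    xbar, s = fmean(data), stdev(data)            # stdev uses the n - 1 divisor
    half_width = t_quantile(1 - alpha / 2, n - 1) * s / sqrt(n)
    return xbar - half_width, xbar + half_width

data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.6, 9.7, 10.3, 10.0, 10.5]   # made-up sample
t_crit = t_quantile(0.975, len(data) - 1)
z_crit = NormalDist().inv_cdf(0.975)
print(f"t critical value {t_crit:.3f} vs z critical value {z_crit:.3f}")
print("t-based 95% interval:", t_interval(data))
```

For n = 10 the t critical value is noticeably larger than the z critical value, so an interval based on T is wider than one that plugs S into the z formula.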
• Key points of this class
– Sample variance and Chi-square distribution
– Student’s t distribution: the T statistic
• Properties and connections to the normal and the chi-square distributions
• Reading: Sections 5.2, 5.3 and 7.2 of the textbook
• Next class: Inferences for Single Samples (III) (Sec. 7.2 & 7.3)