Statistical Inference STAT 431
Lecture 10: Inferences for Two Samples (II)
Inferences for Two Samples
• Two random samples (Independent samples design)
– Two populations F1 and F2
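• With population means $\mu_1$ and $\mu_2$
• With population standard deviations $\sigma_1$ and $\sigma_2$
– Independent samples with sample sizes $n_1$ and $n_2$ from $F_1$ and $F_2$, respectively
• $X_1,\dots,X_{n_1} \overset{\text{iid}}{\sim} F_1$ independent of $Y_1,\dots,Y_{n_2} \overset{\text{iid}}{\sim} F_2$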
– Primary problem: comparison of the two population means
– Another problem: comparison of the two population variances
• In addition, we consider a different type of two sample problem
– Matched pairs design
– Primary interest: comparison of the means
Comparison of Variances – Two normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$
• Basic statistical setup
– Independent samples design
$X_1,\dots,X_{n_1} \overset{\text{iid}}{\sim} N(\mu_1, \sigma_1^2)$ independent of $Y_1,\dots,Y_{n_2} \overset{\text{iid}}{\sim} N(\mu_2, \sigma_2^2)$
– Sample variances from the two samples: $S_1^2$ and $S_2^2$
• Goal: comparison of the two population variances
– Confidence intervals for the ratio $\sigma_1^2/\sigma_2^2$
– Tests of $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$ [or of $H_0: \sigma_1^2 \le \sigma_2^2$, or of $H_0: \sigma_1^2 \ge \sigma_2^2$]
• Note: all the hypotheses can be expressed in terms of the ratio $\sigma_1^2/\sigma_2^2$
F Distribution
• Definition
Let $U \sim \chi^2_{k_1}$ and $V \sim \chi^2_{k_2}$ be independent. Then the distribution of the ratio
$W = \dfrac{U/k_1}{V/k_2}$
is an F distribution with $k_1$ d.f. in the numerator and $k_2$ d.f. in the denominator.
• In abbreviation, we write $W \sim F_{k_1,k_2}$.
• Useful properties of the F distributions
– If $X \sim F_{k_1,k_2}$, then $1/X \sim F_{k_2,k_1}$.
– If $T \sim t_n$, then $T^2 \sim F_{1,n}$.
• R routines: {d,p,q,r}f
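Both properties of the F distribution follow directly from its definition as a ratio of scaled chi-square variables. A short sketch of the argument, where $Z$ (a standard normal variable) is a symbol introduced here only for illustration:

```latex
% Reciprocal property: W = (U/k_1)/(V/k_2) with U ~ chi^2_{k_1}, V ~ chi^2_{k_2} independent.
% Taking the reciprocal swaps the roles of numerator and denominator.
\[
\frac{1}{W} = \frac{V/k_2}{U/k_1} \sim F_{k_2,k_1}
\]
% Square of a t variable: T = Z / \sqrt{V/n}, with Z ~ N(0,1) independent of V ~ chi^2_n.
\[
T^2 = \frac{Z^2/1}{V/n} \sim F_{1,n},
\qquad \text{since } Z^2 \sim \chi^2_1 \text{ is independent of } V .
\]
```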
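Pivotal Random Variable for the Ratio $\sigma_1^2/\sigma_2^2$
• Since both populations are normal,
$\dfrac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1}$ independent of $\dfrac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1}$
• So,
$F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \dfrac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)} \sim F_{n_1-1,\,n_2-1}$
• Note that
$F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \dfrac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}$
– Definition depends only on the parameter of interest $\sigma_1^2/\sigma_2^2$
– Distribution free of unknown parameters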
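• Constructing a $100(1-\alpha)\%$ CI for $\sigma_1^2/\sigma_2^2$
– Formula:
$\left[ \dfrac{1}{f_{n_1-1,n_2-1,\alpha/2}} \dfrac{S_1^2}{S_2^2},\ \dfrac{1}{f_{n_1-1,n_2-1,1-\alpha/2}} \dfrac{S_1^2}{S_2^2} \right]$
where $f_{k_1,k_2,a}$ denotes the upper-$a$ critical point of $F_{k_1,k_2}$
– Comes from
$P\left( f_{n_1-1,n_2-1,1-\alpha/2} \le \dfrac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} \le f_{n_1-1,n_2-1,\alpha/2} \right) = 1-\alpha$
• Test for $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$
– Observed test statistic: $f = s_1^2/s_2^2$
– Decision rule for level $\alpha$ test: reject $H_0$ if
$f < f_{n_1-1,n_2-1,1-\alpha/2}$ or $f > f_{n_1-1,n_2-1,\alpha/2}$
[Confidence interval approach: equivalently, reject $H_0$ if the $100(1-\alpha)\%$ CI for $\sigma_1^2/\sigma_2^2$ excludes 1]
• Could also test one-sided hypotheses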
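The CI formula and the decision rule for the variance ratio can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the two critical points are passed in as arguments (for instance looked up from an F table or obtained from R's `qf`) rather than computed. The numbers in the usage lines are made-up placeholders chosen for easy arithmetic, not real F quantiles or data from the lecture.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper, f_lower):
    """CI for sigma1^2 / sigma2^2.

    f_upper: upper alpha/2 critical point, f_{n1-1, n2-1, alpha/2}
    f_lower: upper (1 - alpha/2) critical point, f_{n1-1, n2-1, 1-alpha/2}
    Both are assumed to be supplied by the caller.
    """
    ratio = s1_sq / s2_sq
    # Dividing by the larger critical point gives the lower endpoint.
    return ratio / f_upper, ratio / f_lower


def reject_equal_variances(s1_sq, s2_sq, f_upper, f_lower):
    """Two-sided level-alpha decision rule: reject H0 if f falls in either tail."""
    f = s1_sq / s2_sq
    return f < f_lower or f > f_upper


# Placeholder inputs (hypothetical, for illustration only).
ci = variance_ratio_ci(4.0, 2.0, f_upper=2.5, f_lower=0.4)
reject = reject_equal_variances(4.0, 2.0, f_upper=2.5, f_lower=0.4)
print(ci, reject)
```

With these placeholder inputs the interval contains 1 and the test does not reject, which illustrates that the two approaches agree.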
Example: Schizophrenia Data
• Question: Are there any physiological indicators associated with schizophrenia?
– Early postmortem studies suggest that certain areas of the brain may be different in persons afflicted with schizophrenia than in others
• Data collected by researchers in 1990 (Suddath et al., New England Journal of Medicine): volumes (in cm^3) of one sub-region of the left hippocampus in 15 pairs of monozygotic twins, where one twin in each pair was schizophrenic and the other was not
[Figure: brain image (from Wikipedia)]
[Figure: Volumes (cm^3) of the left hippocampus of the twins; axes labeled "unaffected" and "affected", each running from 1.0 to 2.5]
• Can the observed difference be attributed to chance?
Basic Statistical Setting – Two normal populations: $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$
• Matched pairs design
– Paired observations $(X_i, Y_i)$, $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_i \sim N(\mu_2, \sigma_2^2)$, with $\mathrm{Corr}(X_i, Y_i) = \rho$
– The pairs $(X_i, Y_i)$ are mutually independent
• Differences from the independent samples design
– The observations are correlated pairs
– Equal number of observations for both populations
• Primary goal: comparison of the two population means
– CIs and tests for the difference $\mu_1 - \mu_2$
– Usually work with the observed differences $D_i = X_i - Y_i$
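Estimation of the Difference
• Recall $D_i = X_i - Y_i$, then
$\bar{D} = \dfrac{1}{n}\sum_{i=1}^{n} D_i = \dfrac{1}{n}\sum_{i=1}^{n} (X_i - Y_i) = \bar{X} - \bar{Y}$
• Fact: $\bar{D}$ is a good estimator of the difference $\mu_1 - \mu_2$
– Bias: $E(\bar{D}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2$, so unbiased
– Variance:
$\mathrm{Var}(\bar{D}) = \dfrac{1}{n}\mathrm{Var}(D_i) = \dfrac{1}{n}\mathrm{Var}(X_i - Y_i)$
$= \dfrac{1}{n}\left[\mathrm{Var}(X_i) + \mathrm{Var}(Y_i) - 2\,\mathrm{Cov}(X_i, Y_i)\right]$
$= \dfrac{1}{n}\left[\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\right] = \dfrac{1}{n}\sigma_D^2$
• If $\rho > 0$, the variance is smaller than under the independent samples design
• Typically, we do not know $\sigma_D^2 = \mathrm{Var}(D_i)$, and we estimate it by
$S_D^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (D_i - \bar{D})^2$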
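To make the effect of the correlation concrete, the variance formula for $\bar{D}$ can be evaluated directly. The sketch below is illustrative only: the function name and the input values are hypothetical, and setting `rho = 0` is used as a stand-in for the independent samples design with equal sample sizes $n_1 = n_2 = n$.

```python
def var_of_mean_difference(sigma1, sigma2, rho, n):
    """Var(D-bar) = (sigma1^2 + sigma2^2 - 2*rho*sigma1*sigma2) / n."""
    return (sigma1 ** 2 + sigma2 ** 2 - 2 * rho * sigma1 * sigma2) / n


# Hypothetical values: unit standard deviations, n = 10 pairs.
paired = var_of_mean_difference(1.0, 1.0, rho=0.5, n=10)
# rho = 0 reproduces Var(X-bar - Y-bar) for independent samples with n1 = n2 = n.
independent = var_of_mean_difference(1.0, 1.0, rho=0.0, n=10)
print(paired, independent)
```

A positive correlation shrinks the variance of the estimator, which is the main statistical argument for pairing.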
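Pivotal Random Variable for $\mu_1 - \mu_2$
• Under our setup, the differences
$D_i \overset{\text{iid}}{\sim} N(\mu_1 - \mu_2, \sigma_D^2)$
• So, the pivotal r.v. for $\mu_1 - \mu_2$ is
$T = \dfrac{\bar{D} - (\mu_1 - \mu_2)}{S_D/\sqrt{n}} \sim t_{n-1}$
• $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$:
$\bar{D} \pm t_{n-1,\alpha/2}\,\dfrac{S_D}{\sqrt{n}}$
• Test for $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$
– Observed test statistic:
$t = \dfrac{\bar{d}}{s_D/\sqrt{n}}$
– Decision rule for level $\alpha$ test: reject $H_0$ if $|t| > t_{n-1,\alpha/2}$
– P-value: $2P(T_{n-1} \ge |t|)$, where $T_{n-1} \sim t_{n-1}$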
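A two-sided P-value for the paired t test can be approximated without a statistics library by integrating the t density numerically. The sketch below is one possible approach, not the method used in the lecture (which points to R's `t.test`): the function names are hypothetical, and Simpson's rule with a fixed, even number of steps is an assumption made for simplicity.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_obs, df, steps=2000):
    """Approximate 2 * P(T_df >= |t_obs|); steps must be even (Simpson's rule)."""
    b = abs(t_obs)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # approximates P(0 <= T <= |t_obs|)
    return 1.0 - 2.0 * central


# Values reported in the schizophrenia example of these slides: t = 3.229, d.f. = 14.
print(two_sided_p_value(3.229, 14))
```

For the example values the result should land near the P-value of 0.006 reported in the slides.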
Example: Schizophrenia Data (Cont’d)
Pair #   Unaffected   Affected   Difference
1        1.94         1.27        0.67
2        1.44         1.63       -0.19
3        1.56         1.47        0.09
4        1.58         1.39        0.19
5        2.06         1.93        0.13
6        1.66         1.26        0.40
7        1.75         1.71        0.04
8        1.77         1.67        0.10
9        1.78         1.28        0.50
10       1.92         1.85        0.07
11       1.25         1.02        0.23
12       1.93         1.34        0.59
13       2.04         2.02        0.02
14       1.62         1.59        0.03
15       2.08         1.97        0.11
• Summary statistics
$\bar{d} = 0.199$, $s_D = 0.238$
• 95% CI for the difference in hippocampus volume (cm^3)
[0.067, 0.331]
• Test for $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$
– Observed test statistic t = 3.229
– t-distribution d.f. = 14
– P-value = 0.006
• R routine: t.test
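The summary statistics, test statistic, and confidence interval for the twin data can be checked with a short Python sketch using only the standard library. The data are the 15 pairs listed in the slides; the critical value $t_{14,0.025} \approx 2.145$ is taken from a standard t table rather than computed, which is an assumption of this sketch, and the variable names are chosen here for illustration.

```python
import math
import statistics

# Hippocampus volumes (cm^3) for the 15 twin pairs, in pair order.
unaffected = [1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77,
              1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08]
affected = [1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67,
            1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97]

diffs = [x - y for x, y in zip(unaffected, affected)]
n = len(diffs)

d_bar = statistics.mean(diffs)   # sample mean of the differences
s_d = statistics.stdev(diffs)    # sample SD with n - 1 in the denominator
se = s_d / math.sqrt(n)          # estimated standard error of d_bar
t_obs = d_bar / se               # observed test statistic under H0: mu1 - mu2 = 0

T_CRIT = 2.145                   # t_{14, 0.025}, taken from a t table (assumption)
ci = (d_bar - T_CRIT * se, d_bar + T_CRIT * se)

print(round(d_bar, 3), round(s_d, 3), round(t_obs, 3))
print(tuple(round(v, 3) for v in ci))
```

The interval lies entirely above 0, in agreement with rejecting $H_0$ at the 5% level.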
Comparison of The Two Designs
• Independent samples design
– Pro: Data collection relatively easy
– Con: The two samples may differ in attributes other than the one of interest, which could invalidate the desired conclusions (Simpson's paradox)
• Matched pairs design
– Pro: The two samples are well matched on attributes other than the one of interest
• E.g., monozygotic twins are well matched in other physiological aspects
– Con: Matched pairs are hard to find, so the sample size is typically small; the sample may sometimes not be representative of the general population
Class Summary
• Key points for this class
– Independent samples design
• Comparison of two population variances
– Matched pairs design
• Comparison of two population means
– Pros and cons of the two different designs
• Reading: Sections 8.3-8.4 of the textbook