Comparison of two means
Most studies are comparative in that they compare outcomes from one group
with outcomes from another, for example the mean blood pressure in response to two
different treatments.
1 Paired t-test – Deveaux et al Chapter 25
• in the matched-pairs design each subject in one group is paired with a similar
subject in the other group
• one treatment is randomly assigned to one member of the pair; the other
treatment is given to the other member
• example: to compare two treatments for a disease, pair subjects who are
similarly affected, same sex, age, etc.
• in many cases, the two treatments are given to the same subject in random order
– with a ‘wash-out’ period between.
• e.g. a population of patients with high blood pressure currently receiving no
medication. Blood pressure is measured on a sample of patients at the beginning
of the study, then they are put on a statin. After one year on the treatment, the
blood pressure is measured again. Does the statin reduce blood pressure?
• the difference between the two measurements in a pair should only reflect the
different treatments or experimental conditions
• individuals act as their own controls, so that the between-individual source of
variability is removed. When comparing means, the usual approach is to take
the difference of means. The direction of the difference is important. For
example, if the difference is (after − before), the natural alternative hypothesis
in the statin example is that the mean difference is less than 0.
• recall that the mean of the differences equals the difference of the means
• if we can assume the differences are normally distributed, they can be analyzed
using the one-sample t test or confidence interval
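The two facts above can be written out as a short derivation. The subscripted notation ($y_{1i}$, $y_{2i}$ for the two measurements in pair $i$, $d_i$ for their difference, $s_d$ for the SD of the differences) is introduced here for illustration and is an assumption about notation, not taken from the text.

```latex
\begin{align*}
d_i &= y_{1i} - y_{2i}, \qquad i = 1, \dots, n \\
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} (y_{1i} - y_{2i})
         = \frac{1}{n}\sum_{i=1}^{n} y_{1i} - \frac{1}{n}\sum_{i=1}^{n} y_{2i}
         = \bar{y}_1 - \bar{y}_2 \\
t &= \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}} \quad \text{with } n - 1 \text{ d.f.}
\end{align*}
```

Note that only the mean carries over in this way; the standard deviation $s_d$ must be computed from the differences themselves.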
Example: Suppose we are comparing costs of auto repairs at two locations. We
get an estimate at both places for the same 6 cars that have recently been involved
in collisions:
Car   Cost at garage 1   Cost at garage 2   Difference
1           760                730               30
2          1020                910              110
3           950                840              110
4           130                150              -20
5           300                270               30
6           630                580               50
Is there evidence that mean costs are different at the two locations?
• repair costs vary considerably between cars
• for each car, the first location tends to be more expensive than the second
• Hypotheses:
H0 : µd = 0
Ha : µd ≠ 0
• for the column of differences,
n = 6, df = 5, ȳ = 51.667, s = 50.761
• so test statistic is
$$ t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{51.667 - 0}{50.761/\sqrt{6}} = 2.493. $$
• with 5 df, 2.493 is between the table values 2.015 and 2.571, so
P(T > 2.493) is between .025 and .05
• P-value is between 2(0.025) and 2(0.05), that is, between 0.05 and 0.10
(double because two-sided alternative)
• there is only weak evidence of a difference in costs
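The arithmetic in this example can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the two critical values are the 5 df table values quoted in the notes rather than computed.

```python
import math
import statistics

# Repair estimates from the table above, one pair of values per car.
garage1 = [760, 1020, 950, 130, 300, 630]
garage2 = [730, 910, 840, 150, 270, 580]

# Paired analysis: reduce each pair to a single difference.
diffs = [a - b for a, b in zip(garage1, garage2)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample SD, divisor n - 1

# One-sample t statistic for H0: mu_d = 0.
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

# Upper-tail table values for 5 df (areas .05 and .025), as quoted in the notes.
T_05, T_025 = 2.015, 2.571
between = T_05 < t_stat < T_025

print(f"n = {n}, df = {n - 1}, mean = {d_bar:.3f}, s = {s_d:.3f}")
print(f"t = {t_stat:.3f}")
if between:
    print("two-sided P-value is between 0.05 and 0.10")
```

Bracketing the statistic between table values mirrors the hand calculation; an exact P-value would need a t-distribution routine, which the standard library does not provide.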
Example: Ten patients were randomly selected to take part in a nutritional
program designed to lower blood cholesterol. Two months following the
commencement of the program, the pediatrician measured the blood cholesterol
levels of the 10 patients again. The results are as follows:
Patient   Before   After   Difference
1          210      212        -2
2          217      210         7
3          208      210        -2
4          215      213         2
5          202      200         2
6          209      208         1
7          207      203         4
8          200      199         1
9          221      218         3
10         218      214         4
Construct a 95% confidence interval for the mean improvement in serum
cholesterol.
• a plot of the data shows little apparent
difference between before and after
[Figure: cholesterol level y (axis from 200 to 220) plotted against before/after]
• once the paired points are joined, however, it becomes clear that most values
are lower after the nutritional program
• the mean and standard deviation of the
differences are ȳ = 2.0 and s = 2.749
• with 9 df, the table value is t∗ = 2.262
• the 95% confidence interval is
$2.0 \pm 2.262(2.749)/\sqrt{10}$
or
2.0 ± 1.966
or
(0.034, 3.966)
• because this interval does not include 0, the difference is significantly different
from 0 at the α = .05 level
• this may seem surprising considering the
individual 95% confidence intervals,
for before: (205.74, 215.66),
for after: (204.24, 213.16)
• the paired analysis removes the variation due to the subject, so the difference
has a small standard error
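The interval above can be reproduced from the raw table. This is a sketch using only the standard library; the variable names are illustrative, and the multiplier 2.262 is the 9 df table value given in the notes, not computed.

```python
import math
import statistics

# Cholesterol readings from the table above, in patient order.
before = [210, 217, 208, 215, 202, 209, 207, 200, 221, 218]
after = [212, 210, 210, 213, 200, 208, 203, 199, 218, 214]

# Improvement is defined as before - after, matching the Difference column.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample SD of the differences

T_STAR = 2.262  # 95% table value for n - 1 = 9 df, as quoted in the notes
margin = T_STAR * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"mean difference = {d_bar:.1f}, s = {s_d:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```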
2 Comparison of means of two independent samples – pooled t procedure
(DVB Ch. 24, p 668-671)
• matching is not always possible
• however, can divide individuals at random into the two groups to be compared
– give one group one treatment and the other group the other
• or can take random sample from each of two populations
• because of randomization, groups should be similar in all respects apart from
treatment
• any differences are attributable to the
treatment
• unlike in the matched pairs
experiment, the two samples are independent
Notation
         Population        Sample
Group    Mean    SD        Size    Mean    SD
1        µ1      σ         n1      ȳ1      s1
2        µ2      σ         n2      ȳ2      s2
• important assumption: the population standard deviations are the same in
the two groups. (more on this later)
• call this common SD σ
• want to make confidence intervals for µ1 − µ2 or test hypotheses about µ1 − µ2
• idea: base inferences on ȳ1 − ȳ2
– center for confidence interval
– numerator of test statistic
Theory
• mean of ȳ1 − ȳ2 is µ1 − µ2
• SD of ȳ1 − ȳ2 is $\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
• standardized difference is
$$ z = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
• problem: as before, don’t know σ
• use pooled sample variance
$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$
• a weighted average of the two sample
variances
• larger sample has larger weight
• now follow usual steps (but with slight changes)
• replace σ by sp
• replace normal distribution by t distribution with n1 + n2 − 2 d.f.
• confidence interval for µ1 − µ2 is
$$ (\bar{y}_1 - \bar{y}_2) \pm t^*_{n_1+n_2-2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$
• test of H0 : µ1 = µ2 uses
$$ t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
Example: Nine observations of surface-soil pH were made at each of two different
locations.
Location 1 8.53 8.52 8.01 7.99 7.93
7.89 7.85 7.82 7.80
Location 2 7.85 7.73 7.58 7.40 7.35
7.30 7.27 7.27 7.23
Construct a 99% confidence interval for the difference in mean surface-soil pH at
the two locations, using the following summaries.
n ȳ s
Location 1 9 8.038 .285
Location 2 9 7.442 .224
Minitab
• the pooled two-sample confidence interval and test can be done in Minitab
• output follows for the pH example
MTB > set c1
DATA> 8.53 8.52 8.01 7.99 7.93 7.89 7.85 7.82 7.80
DATA> set c2
DATA> 7.85 7.73 7.58 7.40 7.35 7.30 7.27 7.27 7.23
DATA> set c3
DATA> (1)9
DATA> set c4
DATA> (2)9
DATA> mplot c1 c3 c2 c4
    -
    -
8.50+   2
    -
    -
    -
    -
8.00+   2
    -   2
    -   3                                                 B
    -                                                     B
    -                                                     B
7.50+
    -                                                     B
    -                                                     4
    -                                                     B
    -
      --+---------+---------+---------+---------+---------+--
      1.00      1.20      1.40      1.60      1.80      2.00
A = C1 vs. C3    B = C2 vs. C4
MTB > twosample .99 c1 c2;
SUBC> pooled.
TWOSAMPLE T FOR C1 VS C2
N MEAN STDEV SE MEAN
C1 9 8.038 0.285 0.095
C2 9 7.442 0.224 0.075
99 PCT CI FOR MU C1 - MU C2: (0.242, 0.949)
TTEST MU C1 = MU C2 (VS NE): T= 4.92 P=0.0002 DF=16
POOLED STDEV = 0.257
• the plot shows
– the values are higher at location 1
– the spread of the values is nearly the same at the two locations
– there is no strong evidence that the
values are not from a normal population
• note the subcommand ‘pooled’ must be used
• the subcommand ‘alternative’ can be used to specify a one-sided alternative
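For readers without Minitab, the pooled calculation for the pH data can be sketched in Python with the standard library alone. The variable names are illustrative. The multiplier 2.921 is an assumption in the sense that it does not appear in the notes: it is the standard t-table value for 99% confidence with 16 df.

```python
import math
import statistics

# Surface-soil pH observations from the example above.
loc1 = [8.53, 8.52, 8.01, 7.99, 7.93, 7.89, 7.85, 7.82, 7.80]
loc2 = [7.85, 7.73, 7.58, 7.40, 7.35, 7.30, 7.27, 7.27, 7.23]

n1, n2 = len(loc1), len(loc2)
mean1, mean2 = statistics.mean(loc1), statistics.mean(loc2)
var1, var2 = statistics.variance(loc1), statistics.variance(loc2)  # divisor n - 1

# Pooled SD: weighted average of the two sample variances.
df = n1 + n2 - 2
sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)

# Standard error of the difference in sample means.
se = sp * math.sqrt(1 / n1 + 1 / n2)
t_stat = (mean1 - mean2) / se

T_STAR = 2.921  # assumed: standard table value, 99% confidence, 16 df
lower = (mean1 - mean2) - T_STAR * se
upper = (mean1 - mean2) + T_STAR * se

print(f"pooled SD = {sp:.3f}, df = {df}, t = {t_stat:.2f}")
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```

The results can be compared line by line with the POOLED STDEV, T and 99 PCT CI entries in the Minitab output.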
Example: To assess whether the level of iron in the blood is the same for children with cystic
fibrosis as for healthy children, a random sample is selected from each population. The n1 = 9 healthy
children have average serum iron level ȳ1 = 18.9 µmol/l and standard deviation s1 = 5.9 µmol/l. The
n2 = 13 children with cystic fibrosis have average iron level ȳ2 = 11.9 µmol/l with sample standard
deviation s2 = 6.3 µmol/l. Is there a true difference in population means?
• the hypotheses are H0 : µ1 = µ2 and
Ha : µ1 ≠ µ2
• pooling seems appropriate here
$$ s_p^2 = \frac{(9 - 1)5.9^2 + (13 - 1)6.3^2}{9 + 13 - 2} = \frac{754.76}{20} = 37.738 $$
so sp = 6.1431
• the test statistic is
$$ t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{18.9 - 11.9}{6.1431(.4336)} = \frac{7}{2.6636} = 2.6280 $$
• now, with 20 degrees of freedom,
P(t > 2.528) = .01 and
P(t > 2.845) = .005,
so .005 < P(t > 2.628) < .01
• doubling, because the alternative is two-sided, the P value is between .01 and .02
• there is strong evidence against the null hypothesis of no difference in iron level.
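When only summary statistics are available, as in this example, the pooled test can be wrapped in a small helper. The function name `pooled_t` and its argument order are hypothetical, chosen for this sketch; the critical values are the 20 df table values quoted in the notes.

```python
import math


def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Pooled two-sample t statistic from summary statistics.

    Returns (t, pooled SD, degrees of freedom). Assumes the two
    population SDs are equal, as the pooled procedure requires.
    """
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, sp, df


# Healthy children (group 1) versus children with cystic fibrosis (group 2).
t_stat, sp, df = pooled_t(9, 18.9, 5.9, 13, 11.9, 6.3)

# Upper-tail table values for 20 df (areas .01 and .005), as quoted in the notes.
T_01, T_005 = 2.528, 2.845
between = T_01 < t_stat < T_005

print(f"sp = {sp:.4f}, df = {df}, t = {t_stat:.3f}")
if between:
    print("two-sided P-value is between .01 and .02")
```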
Two independent samples vs. paired
• can be difficult to tell whether data should be treated as paired or not
• if the two samples are of different sizes, the data cannot be paired
• if the two samples are the same size, the data might be paired, but might not be
• to decide, read the description of the data
• key words for paired problem: paired, matched, before/after
• conclusions can be totally wrong, if wrong analysis is used
• typically, using a two-sample procedure when a paired procedure should be used leads to
– a larger P value
– a wider confidence interval, because the pooled variance estimate is much larger than the
variance of the differences
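The last point can be illustrated with the cholesterol data from the paired example earlier in these notes. The sketch below runs both analyses on the same numbers; treating the before and after readings as independent samples is deliberately the wrong analysis here, and is included only for comparison. The variable names are illustrative.

```python
import math
import statistics

# Cholesterol readings from the paired example (same 10 patients).
before = [210, 217, 208, 215, 202, 209, 207, 200, 221, 218]
after = [212, 210, 210, 213, 200, 208, 203, 199, 218, 214]
n = len(before)

# Correct analysis: paired, based on within-patient differences.
diffs = [b - a for b, a in zip(before, after)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = statistics.mean(diffs) / se_paired

# Wrong analysis for these data: pooled two-sample t, ignoring the pairing.
sp = math.sqrt(
    ((n - 1) * statistics.variance(before) + (n - 1) * statistics.variance(after))
    / (2 * n - 2)
)
se_pooled = sp * math.sqrt(1 / n + 1 / n)
t_pooled = (statistics.mean(before) - statistics.mean(after)) / se_pooled

print(f"paired:     SE = {se_paired:.3f}, t = {t_paired:.3f}")
print(f"two-sample: SE = {se_pooled:.3f}, t = {t_pooled:.3f}")
```

Both analyses estimate the same mean difference of 2.0, but the two-sample standard error is more than three times the paired one, because it still contains the patient-to-patient variation that pairing removes.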