7.3 Procedures for two independent random samples

In this section we look at procedures for the case where two independent random samples are available, each of which can be used to estimate the value of the same parameter in its own population. The aim is to construct a confidence interval for the difference between the values of this parameter in the two populations from which the samples were drawn.

7.3.1 Confidence interval for the difference between two unknown

normal means, variances known

Let $X_{11}, \dots, X_{1n_1}$ be a random sample from the $N(\mu_1, \sigma_1^2)$ distribution where $\mu_1$ is unknown but $\sigma_1^2$ is known. Also, let $X_{21}, \dots, X_{2n_2}$ be a second random sample, independent of the first one, from the $N(\mu_2, \sigma_2^2)$ distribution where $\mu_2$ is unknown but $\sigma_2^2$ is known.

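Let $\bar{X}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1i}$ and $\bar{X}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} X_{2j}$. An unbiased estimator of the difference $\mu_1 - \mu_2$ is given by $\bar{X}_1 - \bar{X}_2$, where $\bar{X}_1 \sim N(\mu_1, \sigma_1^2/n_1)$ independently of $\bar{X}_2 \sim N(\mu_2, \sigma_2^2/n_2)$. The distribution of the difference between these two independent normal random variables is

$$N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$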

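Thus, in the same manner as before, we may write

$$P\left(-z_{\alpha/2} \le \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \le z_{\alpha/2}\right) = 1 - \alpha .$$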

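Rewriting this probability statement in terms of an inequality for $\mu_1 - \mu_2$, we see that

$$\left((\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ \ (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$

is a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$.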

Example 7.6. Suppose that we have a random sample of size $n_1 = 40$ from the $N(\mu_1, 3^2)$ distribution and a second random sample, independent of the first, of size $n_2 = 20$ from the $N(\mu_2, 4^2)$ distribution. The sample means were found to be $\bar{x}_1 = 14.6$ for the first sample and $\bar{x}_2 = 15.9$ for the second sample. We want to use these data to construct a 95% CI for $\mu_1 - \mu_2$. The estimated value of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2 = 14.6 - 15.9 = -1.3$. The standard error of this difference is

$$\sqrt{\frac{9}{40} + \frac{16}{20}} = 1.012 .$$

The estimated CI is then

$$(-1.3 - 1.96 \times 1.012,\ -1.3 + 1.96 \times 1.012),$$

i.e. $(-3.28, 0.68)$. Note that this interval contains the value zero, meaning that zero is a plausible value for $\mu_1 - \mu_2$. Thus it is plausible that $\mu_1 = \mu_2$.
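The arithmetic in Example 7.6 can be checked with a short Python sketch. The function name `z_interval_known_variances` and its argument names are illustrative choices, not part of the notes; the critical value comes from the standard library's `statistics.NormalDist`, so it is 1.95996... rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_known_variances(xbar1, xbar2, var1, var2, n1, n2, level=0.95):
    """CI for mu1 - mu2 when both population variances are known."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    se = sqrt(var1 / n1 + var2 / n2)             # standard error of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se


# Summary statistics from Example 7.6.
lower, upper = z_interval_known_variances(14.6, 15.9, 3**2, 4**2, 40, 20)
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimal places, the end points agree with the interval $(-3.28, 0.68)$ obtained by hand.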

Suppose now that the two populations from which we are sampling are non-normal but the two variances are known. To construct an approximate $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ we use exactly the same procedure described above for the case when the populations are normal. However, this time we are appealing to the central limit theorem, which requires $n_1$ and $n_2$ to be large, so that the normal critical value and the confidence level are both approximate.
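To get a feel for how good the central limit theorem approximation can be, the following Python sketch estimates by simulation the coverage of the known-variance interval for two non-normal populations. Everything specific here is an assumption made for illustration: the exponential populations, the sample sizes, the number of replications and the seed are not taken from the notes.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_known_variances(n1=40, n2=50, reps=2000, level=0.95, seed=1):
    """Estimate the coverage of the z-interval for two exponential populations."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    # Exponential(rate) has mean 1/rate and variance 1/rate**2, so both
    # variances are known here: 1 for population 1 and 4 for population 2.
    rate1, rate2 = 1.0, 0.5
    true_diff = 1.0 / rate1 - 1.0 / rate2
    se = sqrt((1.0 / rate1**2) / n1 + (1.0 / rate2**2) / n2)
    hits = 0
    for _ in range(reps):
        xbar1 = fmean(rng.expovariate(rate1) for _ in range(n1))
        xbar2 = fmean(rng.expovariate(rate2) for _ in range(n2))
        diff = xbar1 - xbar2
        # Count the replication if the interval contains the true difference.
        if diff - z * se <= true_diff <= diff + z * se:
            hits += 1
    return hits / reps


coverage = coverage_known_variances()
print(coverage)
```

With samples of this size the estimated coverage should land close to the nominal 0.95, though not exactly on it, which is what "approximate" means in this context.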

7.3.2 Confidence interval for the difference between two unknown

normal means, variances unknown

Let $X_{11}, \dots, X_{1n_1}$ be a random sample from the $N(\mu_1, \sigma_1^2)$ distribution where both $\mu_1$ and $\sigma_1^2$ are unknown. Also, let $X_{21}, \dots, X_{2n_2}$ be a second random sample, independent of the first one, from the $N(\mu_2, \sigma_2^2)$ distribution where both $\mu_2$ and $\sigma_2^2$ are also unknown. Let $\bar{X}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1i}$ and $\bar{X}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} X_{2j}$. Then

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right),$$

and so

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1) .$$

However, as we do not know the values of $\sigma_1^2$ and $\sigma_2^2$, we plug in the unbiased estimators $S_1^2$ and $S_2^2$, respectively. Using more sophisticated asymptotic theory than is available to us in this course (for those who are interested, the relevant result is called Slutsky's theorem), it can be proved that this substitution leads to an approximate normal distribution,

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim N(0, 1) \quad \text{approximately},$$

provided $n_1 \ge 30$ and $n_2 \ge 30$. We can then write

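$$P\left(-z_{\alpha/2} \le \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \le z_{\alpha/2}\right) \approx 1 - \alpha ,$$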

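leading to

$$\left((\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},\ \ (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\right)$$

as an approximate $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$.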

Example 7.7. Suppose that we have a random sample of size $n_1 = 50$ from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, both of whose values are unknown. We also have a second random sample of size $n_2 = 45$, independent of the first, from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$, both of which are also unknown. The first sample of data gave summary statistics $\bar{x}_1 = 17.5$ and $s_1^2 = 8.7$ while the summary statistics calculated from the second sample were found to be $\bar{x}_2 = 15.6$ and $s_2^2 = 10.2$. We want to use these data to construct an approximate 95% CI for $\mu_1 - \mu_2$.

The estimated difference in means is $17.5 - 15.6 = 1.9$ while the estimated standard error of the difference in sample means is

$$\sqrt{\frac{8.7}{50} + \frac{10.2}{45}} = \sqrt{0.4007} = 0.6330 .$$

Our approximate 95% CI is then given by

$$(1.90 - 1.96 \times 0.6330,\ 1.90 + 1.96 \times 0.6330),$$

i.e. $(0.66, 3.14)$. Note that the interval does not contain zero, thus it is not plausible that $\mu_1 = \mu_2$.
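The calculation in Example 7.7 can be reproduced with the Python sketch below. The function name `z_interval_plug_in` is a hypothetical choice, and the check on the sample sizes simply encodes the rule of thumb $n_1, n_2 \ge 30$ quoted above; it is not a formal requirement of any library.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_plug_in(xbar1, xbar2, s1_sq, s2_sq, n1, n2, level=0.95):
    """Approximate CI for mu1 - mu2 with sample variances plugged in."""
    if n1 < 30 or n2 < 30:
        # Rule of thumb from the text: the normal approximation needs large samples.
        raise ValueError("both sample sizes should be at least 30")
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    se = sqrt(s1_sq / n1 + s2_sq / n2)  # estimated standard error
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se


# Summary statistics from Example 7.7.
lower, upper = z_interval_plug_in(17.5, 15.6, 8.7, 10.2, 50, 45)
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimal places this gives the same interval, $(0.66, 3.14)$, as the hand calculation.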

7.3.3 Confidence interval for the difference between two unknown

normal means, variances unknown but equal (i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$)

We can use the following sample estimator of the common variance $\sigma^2$:

$$\hat{\sigma}^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2},$$

which is easily shown to be an unbiased estimator of $\sigma^2$.

We do not simply pool all the data in the two samples together and calculate

the sample variance about an overall, common sample mean. This would not

work, as the means in the two populations may be different. Instead, by using

the above estimator we calculate the total sum of squares in the two samples

about the respective sample means and then divide by $n_1 + n_2 - 2$. This ensures that $E(\hat{\sigma}^2) = \sigma^2$, provided that $E(S_1^2) = \sigma^2$ and $E(S_2^2) = \sigma^2$, which is the case if they are based on divisors $n_1 - 1$ and $n_2 - 1$, respectively.
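The difference between the pooled estimator and the naive variance of the combined data can be seen on a small made-up data set. In the Python sketch below the two samples are hypothetical: they have identical spread but means that differ by 10. The function name `pooled_variance` is also just an illustrative choice.

```python
from statistics import variance


def pooled_variance(sample1, sample2):
    """Pooled estimator of a common variance from two samples."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance uses the divisor n - 1, i.e. it returns S^2.
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Hypothetical data: same spread, but the second sample is shifted up by 10.
sample1 = [1.0, 2.0, 3.0, 4.0, 5.0]
sample2 = [11.0, 12.0, 13.0, 14.0, 15.0]

pooled = pooled_variance(sample1, sample2)
naive = variance(sample1 + sample2)  # variance about one overall mean
print(pooled, naive)
```

Each sample has $S^2 = 2.5$, and the pooled estimator returns that value. The naive variance about the overall mean is much larger because it also absorbs the gap between the two sample means.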
It then follows that

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2) .$$

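We then have that

$$P\left(-t_{\alpha/2} \le \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \le t_{\alpha/2}\right) = 1 - \alpha ,$$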

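where $t_{\alpha/2}$ is the upper $\alpha/2$ point of a $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom. Hence,

$$\left[(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2}\,\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ \ (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2}\,\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right]$$

is a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$.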

Example 7.8. We will now use the data from Example 7.7 to calculate a 95% CI for $\mu_1 - \mu_2$, assuming that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

We have that

$$\hat{\sigma}^2 = \frac{49 \times 8.7 + 44 \times 10.2}{50 + 45 - 2} = 9.410,$$

so that $\hat{\sigma} = 3.068$. The upper 0.025 point of a $t(93)$ distribution is $t_{0.025} = 1.9858$, thus the 95% CI is given by end points

$$\bar{x}_1 - \bar{x}_2 \pm 1.9858 \times \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

which for the above data gives

$$\left(1.9 - 1.9858 \times 3.068 \times \sqrt{\frac{1}{50} + \frac{1}{45}},\ \ 1.9 + 1.9858 \times 3.068 \times \sqrt{\frac{1}{50} + \frac{1}{45}}\right)$$

$$= (1.90 - 1.9858 \times 0.6304,\ 1.90 + 1.9858 \times 0.6304) = (0.648, 3.152) .$$

Again, the interval does not contain zero and so it is not plausible that $\mu_1 = \mu_2$. This confidence interval is wider than the approximate (asymptotic) 95% confidence interval found assuming different variances, because it accounts for the additional uncertainty present when using an estimated value of $\sigma^2$.
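The pooled calculation of Example 7.8 can be sketched in Python as follows. The standard library has no $t$ quantile function, so the critical value 1.9858 is passed in directly as quoted in the example; the function name `pooled_t_interval` and the `t_crit` parameter are illustrative assumptions.

```python
from math import sqrt


def pooled_t_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2 assuming equal variances; t_crit is supplied by the caller."""
    # Pooled estimate of the common variance, with n1 + n2 - 2 degrees of freedom.
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = sqrt(pooled_var) * sqrt(1.0 / n1 + 1.0 / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se


# Summary statistics from Example 7.7; t_crit is the upper 0.025 point of t(93)
# as quoted in Example 7.8.
lower, upper = pooled_t_interval(17.5, 15.6, 8.7, 10.2, 50, 45, t_crit=1.9858)
print(round(lower, 3), round(upper, 3))
```

Rounded to three decimal places, the end points match the interval $(0.648, 3.152)$ found by hand.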

When we have data from two non-normal distributions, we can also use the asymptotic procedure discussed in Section 7.3.2 to construct a CI for $\mu_1 - \mu_2$ provided $n_1, n_2 \ge 30$. If the variances in the two populations are unknown, and assumed not to be equal, then we plug in the estimates $s_1^2$ and $s_2^2$ for the unknown $\sigma_1^2$ and $\sigma_2^2$, and the critical value $z_{\alpha/2}$ for an approximate $100(1-\alpha)\%$ confidence interval. If the variances can be assumed to be equal in the two populations then we still use the same standard normal critical value but estimate the common value of $\sigma^2$ by $\hat{\sigma}^2$ as described above.

7.3.4 Confidence interval for the difference between two unknown

population proportions

Let X11, . . . , X1n1 be an independent random sample of size n1 from Bi(1, p1),

with p1 unknown. Suppose that we also have a second independent random sam-

ple, X21, . . . , X2n2 , independent of the first one, from Bi(1, p2) with p2 unknown.

Let p̂1 = X̄1 and p̂2 = X̄2.

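We know that, by the central limit theorem, for large $n_1$ and $n_2$ we have $\hat{p}_1 \sim N(p_1, p_1(1 - p_1)/n_1)$ and $\hat{p}_2 \sim N(p_2, p_2(1 - p_2)/n_2)$ approximately. Moreover, $\hat{p}_1$ and $\hat{p}_2$ are independent. Thus

$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right) \quad \text{approximately},$$

or equivalently

$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \sim N(0, 1) \quad \text{approximately}.$$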

However, the random variable above cannot be used to construct a confidence interval as the denominator is unknown. More advanced asymptotic theory outside the scope of this course can be used to prove that the approximate distribution is unaffected if the denominator is replaced by a sample estimate, i.e.

$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} \sim N(0, 1) \quad \text{approximately}.$$

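This leads to

$$\left[(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},\ \ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\right]$$

as an approximate $100(1-\alpha)\%$ confidence interval for $p_1 - p_2$.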

Example 7.9. Two different processes for manufacturing components are under

consideration. Components are randomly sampled from the production lines of

both processes, with the aim of identifying which process is best. Suppose that

75 of 1500 items sampled from Process 1 are defective, and 80 out of 2000 items

sampled from Process 2 are defective. Let p1 and p2 be the probability that

a randomly selected component is defective, from Process 1 and 2 respectively.

Find a 90% confidence interval for the difference $p_1 - p_2$. The point estimate is $\hat{p}_1 - \hat{p}_2 = 75/1500 - 80/2000 = 0.05 - 0.04 = 0.01$. The estimated standard error of $\hat{p}_1 - \hat{p}_2$ is

$$\sqrt{\frac{0.05 \times 0.95}{1500} + \frac{0.04 \times 0.96}{2000}} = 0.00713 .$$

Moreover $z_{0.05} = 1.645$, and so the 90% CI has end-points $0.01 \pm 1.645 \times 0.00713$. Hence the 90% CI is $(-0.0017, 0.0217)$. This interval contains 0, thus it is plausible that $p_1 = p_2$. Hence there is no reason to believe that Process 2 is better than Process 1.
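Example 7.9 can be checked with the Python sketch below. The function name `two_proportion_interval` and its arguments (defective counts and sample sizes) are illustrative assumptions; the critical value is computed with `statistics.NormalDist` rather than taken as the rounded 1.645.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, level=0.90):
    """Approximate CI for p1 - p2 from the numbers of successes x1 and x2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    # Estimated standard error with the sample proportions plugged in.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Defective counts from Example 7.9: 75 of 1500 and 80 of 2000.
lower, upper = two_proportion_interval(75, 1500, 80, 2000)
print(round(lower, 4), round(upper, 4))
```

Rounded to four decimal places, this reproduces the interval $(-0.0017, 0.0217)$, which contains zero.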
