10 Hypothesis testing (Part 3)

Procedures for two independent samples

10.1 Introduction

In this chapter we will extend hypothesis testing to the scenario in which there are two independent samples of data, and the aim is to make an inference about the difference in the means of the two populations from which the data have been sampled.

To this end, let $X_{11}, \ldots, X_{1n_1}$ be a random sample of size $n_1$ from a distribution with mean $\mu_1$ and variance $\sigma_1^2$. Also, let $X_{21}, \ldots, X_{2n_2}$ be a second random sample, independent of the first, from a distribution with mean $\mu_2$ and variance $\sigma_2^2$. Suppose that we wish to test

$$H_0 : \mu_1 - \mu_2 = \phi,$$

where $\phi$ is a constant (often $\phi = 0$), versus one of the following alternative hypotheses at the $100\alpha\%$ significance level:

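(i) $H_1 : \mu_1 - \mu_2 > \phi$ (one-sided)

(ii) $H_1 : \mu_1 - \mu_2 < \phi$ (one-sided)

(iii) $H_1 : \mu_1 - \mu_2 \neq \phi$ (two-sided)

10.2 Both underlying distributions normal with known variances $\sigma_1^2$ and $\sigma_2^2$

An unbiased estimator of $\mu_1 - \mu_2$ (which equals $\phi$ under $H_0$) is given by $\bar X_1 - \bar X_2$, where

$$\bar X_k = \frac{1}{n_k} \sum_{i=1}^{n_k} X_{ki}, \quad k = 1, 2.$$

Because the two samples are independent, this estimator satisfies

$$\mathrm{Var}(\bar X_1 - \bar X_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

We have seen in Chapter 4 that both $\bar X_1$ and $\bar X_2$ are normally distributed, so their difference, being a difference of independent normal random variables, is also normal. In fact

$$\bar X_1 - \bar X_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right),$$

and, when $H_0$ is true, $\mu_1 - \mu_2 = \phi$. For a test statistic we will use the standardized distance between the sample estimate $\bar X_1 - \bar X_2$ and its hypothesized value $\phi$, i.e.

$$Z = \frac{\bar X_1 - \bar X_2 - \phi}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$

Under $H_0$, $Z \sim N(0, 1)$. We again find the critical value of our test by fixing the probability of a type I error to be $\alpha$, i.e. $P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha$. This idea was described in detail for single-sample inference in Chapter 9. Below we list the rejection regions corresponding to the three possible alternative hypotheses introduced in Section 10.1.

(i) For $H_1 : \mu_1 - \mu_2 > \phi$, we reject $H_0$ at the $100\alpha\%$ significance level if $Z > z_\alpha$, where $z_\alpha$ satisfies $\Phi(z_\alpha) = 1 - \alpha$. Equivalently, we reject $H_0$ if

$$\bar X_1 - \bar X_2 > \phi + z_\alpha \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

E.g. if $\alpha = 0.05$ then $z_{0.05} = 1.645$.

(ii) For $H_1 : \mu_1 - \mu_2 < \phi$, we reject $H_0$ at the $100\alpha\%$ significance level if $Z < -z_\alpha$. Equivalently, we reject $H_0$ if

$$\bar X_1 - \bar X_2 < \phi - z_\alpha \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

E.g. if $\alpha = 0.05$ then $-z_{0.05} = -1.645$.

(iii) For $H_1 : \mu_1 - \mu_2 \neq \phi$, we reject $H_0$ at the $100\alpha\%$ significance level if $|Z| > z_{\alpha/2}$. Equivalently, we reject $H_0$ if

$$\left|(\bar X_1 - \bar X_2) - \phi\right| > z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

E.g. if $\alpha = 0.05$ then $z_{0.025} = 1.96$.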
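As a sketch of how the three rejection regions of this section fit together, the Python function below computes $Z$ from summary statistics and applies the chosen rejection rule. The function name, its arguments and the `alternative` labels are hypothetical conventions chosen for illustration; critical values come from the standard library's `statistics.NormalDist` rather than from tables.

```python
from statistics import NormalDist


def two_sample_z_test(xbar1, xbar2, var1, var2, n1, n2, phi=0.0,
                      alpha=0.05, alternative="two-sided"):
    """Test H0: mu1 - mu2 = phi when sigma1^2 = var1 and sigma2^2 = var2 are known.

    alternative: "greater" (H1: mu1 - mu2 > phi), "less" (H1: mu1 - mu2 < phi)
    or "two-sided" (H1: mu1 - mu2 != phi).
    Returns (z, critical value, whether H0 is rejected).
    """
    se = (var1 / n1 + var2 / n2) ** 0.5        # standard deviation of Xbar1 - Xbar2
    z = (xbar1 - xbar2 - phi) / se             # standardized distance from phi
    std_normal = NormalDist()                  # N(0, 1)
    if alternative == "greater":
        crit = std_normal.inv_cdf(1 - alpha)   # z_alpha: Phi(z_alpha) = 1 - alpha
        return z, crit, z > crit
    if alternative == "less":
        crit = std_normal.inv_cdf(1 - alpha)
        return z, crit, z < -crit
    if alternative == "two-sided":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        return z, crit, abs(z) > crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

For instance, with made-up summaries $\bar x_1 = 12$, $\bar x_2 = 10$, $\sigma_1^2 = 9$, $\sigma_2^2 = 16$ and $n_1 = n_2 = 25$, the standard error is 1, so $Z = 2$ and the two-sided test at the 5% level rejects $H_0$ because $2 > 1.96$.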

10.3 Both distributions normal with unknown variances

10.3.1 Unequal variances (i.e. $\sigma_1^2 \neq \sigma_2^2$)

As the true values of $\sigma_1^2$ and $\sigma_2^2$ are unknown, we estimate them using the sample variances given by

$$S_k^2 = \frac{1}{n_k - 1} \sum_{i=1}^{n_k} (X_{ki} - \bar X_k)^2, \quad k = 1, 2.$$

Considering the estimated standardized difference between $\bar X_1 - \bar X_2$ and $\phi$, we have that, under $H_0$,

$$Y = \frac{\bar X_1 - \bar X_2 - \phi}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \sim N(0, 1) \text{ approximately}$$

when $n_1$ and $n_2$ are large, e.g. $n_1 > 30$ and $n_2 > 30$. To achieve an approximate significance level of $100\alpha\%$, the rejection regions for the three alternative hypotheses introduced in Section 10.1 are:

(i) For $H_1 : \mu_1 - \mu_2 > \phi$, reject $H_0$ if $Y > z_\alpha$.

(ii) For $H_1 : \mu_1 - \mu_2 < \phi$, reject $H_0$ if $Y < -z_\alpha$.

(iii) For $H_1 : \mu_1 - \mu_2 \neq \phi$, reject $H_0$ if $|Y| > z_{\alpha/2}$.
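When the raw observations are available, $Y$ can be computed directly from the data. The sketch below assumes the two samples are given as Python lists; `statistics.variance` uses the $n_k - 1$ divisor, matching $S_k^2$ above. The function name and return convention are illustrative only, and the normal approximation is only trustworthy for large samples, as stated in the text.

```python
from statistics import NormalDist, fmean, variance


def large_sample_y_test(x1, x2, phi=0.0, alpha=0.05, alternative="two-sided"):
    """Approximate test of H0: mu1 - mu2 = phi with unknown, unequal variances.

    Intended for large samples (e.g. n1 > 30 and n2 > 30).
    Returns (y, critical value, whether H0 is rejected).
    """
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)   # S_k^2 with divisor n_k - 1
    y = (fmean(x1) - fmean(x2) - phi) / (s1_sq / n1 + s2_sq / n2) ** 0.5
    one_sided = NormalDist().inv_cdf(1 - alpha)       # z_alpha
    two_sided = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    rules = {
        "greater": (one_sided, y > one_sided),
        "less": (one_sided, y < -one_sided),
        "two-sided": (two_sided, abs(y) > two_sided),
    }
    crit, reject = rules[alternative]   # KeyError for an unknown alternative
    return y, crit, reject
```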

10.3.2 Equal variances (i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$)

If we are prepared to assume that the unknown variances of the two normal distributions are equal, i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then the common variance $\sigma^2$ may be estimated using the estimator described in Chapter 7, i.e.

$$\hat\sigma^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

The test statistic is then

$$T = \frac{\bar X_1 - \bar X_2 - \phi}{\hat\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$

which can be shown to have a Student t-distribution with $n_1 + n_2 - 2$ degrees of freedom when $H_0$ is true.

The rejection regions for the three alternative hypotheses in Section 10.1 are:

(i) For $H_1 : \mu_1 - \mu_2 > \phi$, we reject $H_0$ if $T > t_\alpha$, where $t_\alpha$ is the upper $\alpha$ point of a t-distribution on $n_1 + n_2 - 2$ degrees of freedom.

(ii) For $H_1 : \mu_1 - \mu_2 < \phi$, we reject $H_0$ if $T < -t_\alpha$.

(iii) For $H_1 : \mu_1 - \mu_2 \neq \phi$, we reject $H_0$ if $|T| > t_{\alpha/2}$.

Each rejection region above defines a test with an exact significance level of $100\alpha\%$.
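Python's standard library has no t-distribution quantile function, so the sketch below approximates the upper $\alpha$ point $t_\alpha$ by integrating the t density numerically (composite Simpson's rule) and solving by bisection. This numerical route is an assumption made to keep the example dependency-free; in practice $t_\alpha$ would be read from tables or taken from a statistics library. Both function names are hypothetical.

```python
import math


def t_upper_point(alpha, df, steps=2000):
    """Upper alpha point t_alpha of the t-distribution on df degrees of freedom.

    Numerical sketch: P(0 < T < x) is computed with composite Simpson's rule
    on the t density, then bisection solves P(0 < T < x) = 0.5 - alpha.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    def mass_from_zero(x):  # P(0 < T < x); steps must be even for Simpson's rule
        h = x / steps
        total = density(0.0) + density(x)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * density(i * h)
        return total * h / 3

    target = 0.5 - alpha                  # the t density is symmetric about 0
    lo, hi = 0.0, 1.0
    while mass_from_zero(hi) < target:    # widen the bracket until it contains t_alpha
        lo, hi = hi, 2 * hi
    for _ in range(50):
        mid = (lo + hi) / 2
        if mass_from_zero(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pooled_t_test(xbar1, xbar2, s1_sq, s2_sq, n1, n2, phi=0.0,
                  alpha=0.05, alternative="two-sided"):
    """Test of H0: mu1 - mu2 = phi assuming sigma1^2 = sigma2^2 (normal data).

    Returns (T, degrees of freedom, critical value, whether H0 is rejected).
    """
    df = n1 + n2 - 2
    sigma_hat = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)
    t_stat = (xbar1 - xbar2 - phi) / (sigma_hat * math.sqrt(1 / n1 + 1 / n2))
    if alternative == "two-sided":
        crit = t_upper_point(alpha / 2, df)          # t_{alpha/2}
        return t_stat, df, crit, abs(t_stat) > crit
    crit = t_upper_point(alpha, df)                  # t_alpha
    if alternative == "greater":
        return t_stat, df, crit, t_stat > crit
    if alternative == "less":
        return t_stat, df, crit, t_stat < -crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

For 1 and 2 degrees of freedom the t distribution has closed-form quantiles, which makes them convenient sanity checks for the numerical routine.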

Example 10.1. An investigation was carried out comparing a new drug with a placebo. A random sample of $n_1 = 40$ patients was treated with the new drug, while an independent sample of $n_2 = 36$ patients was given the placebo. A response was measured for each patient. Under the new drug, the response had sample mean $\bar x_1 = 10.13$ and sample variance $s_1^2 = 4.721$. Under placebo, the response had sample mean $\bar x_2 = 12.16$ and sample variance $s_2^2 = 3.368$.

Supposing that the responses in both groups are normally distributed, test at the 5% significance level whether the population mean response under the new drug is the same as that under placebo. Conduct your analysis assuming that (i) $\sigma_1^2 \neq \sigma_2^2$ and (ii) $\sigma_1^2 = \sigma_2^2$.
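The arithmetic for Example 10.1 can be checked with a few lines of Python using only the summary statistics given above. This is a sketch of the calculation rather than a model answer: it computes $Y$ for case (i) and the pooled $\hat\sigma^2$ and $T$ for case (ii). The decision still requires comparing $|Y|$ with $z_{0.025} = 1.96$ and $|T|$ with $t_{0.025}$ on 74 degrees of freedom, which is just under 2 (tables give $t_{0.025} = 2.000$ on 60 degrees of freedom, and $t_{0.025}$ decreases as the degrees of freedom increase).

```python
from math import sqrt

# Summary statistics from Example 10.1
n1, xbar1, s1_sq = 40, 10.13, 4.721   # new drug
n2, xbar2, s2_sq = 36, 12.16, 3.368   # placebo
phi = 0.0                             # H0: mu1 - mu2 = 0

# (i) sigma1^2 != sigma2^2: large-sample statistic Y (n1, n2 > 30)
y = (xbar1 - xbar2 - phi) / sqrt(s1_sq / n1 + s2_sq / n2)

# (ii) sigma1^2 = sigma2^2: pooled variance estimate and T on n1 + n2 - 2 df
df = n1 + n2 - 2
sigma_hat_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
t_stat = (xbar1 - xbar2 - phi) / sqrt(sigma_hat_sq * (1 / n1 + 1 / n2))

print(f"Y = {y:.3f}")
print(f"pooled variance = {sigma_hat_sq:.4f}, T = {t_stat:.3f} on {df} df")
```

This gives $Y \approx -4.41$, $\hat\sigma^2 \approx 4.081$ and $T \approx -4.37$, so under either assumption the observed difference lies well inside the two-sided rejection region and $H_0$ would be rejected at the 5% level.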

10.4 Both distributions non-normal with variances $\sigma_1^2$ and $\sigma_2^2$

If both distributions are non-normal, then we can appeal to the central limit theorem. Provided $n_1 > 30$ and $n_2 > 30$, under $H_0$

$$Y = \frac{\bar X_1 - \bar X_2 - \phi}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1) \text{ approximately}.$$

Below we give a rejection region resulting in an approximate significance level of $100\alpha\%$ for each of the three alternative hypotheses listed in Section 10.1:

(i) For $H_1 : \mu_1 - \mu_2 > \phi$, we reject $H_0$ at the approximate $100\alpha\%$ significance level if $Y > z_\alpha$.

(ii) For $H_1 : \mu_1 - \mu_2 < \phi$, we reject $H_0$ at the approximate $100\alpha\%$ significance level if $Y < -z_\alpha$.

(iii) For $H_1 : \mu_1 - \mu_2 \neq \phi$, we reject $H_0$ at the approximate $100\alpha\%$ significance level if $|Y| > z_{\alpha/2}$.

If the variances of the two distributions are unknown, then we substitute the sample estimators $S_1^2$ and $S_2^2$ and proceed as just described for the case of known variances.
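The central limit theorem argument can be checked empirically. The sketch below simulates many pairs of samples from two non-normal distributions with equal means, so that $H_0 : \mu_1 - \mu_2 = 0$ is true, and records how often the two-sided test based on $Y$ (with known variances) rejects. The exponential distribution with mean 1 (variance 1), the uniform distribution on $(0, 2)$ (mean 1, variance $1/3$), the sample sizes, the seed and the number of replications are all illustrative assumptions. If the normal approximation is adequate, the estimated rejection rate should be close to the nominal $\alpha = 0.05$.

```python
import random
from statistics import NormalDist, fmean


def simulated_size(n1=40, n2=35, reps=4000, alpha=0.05, seed=1):
    """Monte Carlo estimate of the type I error rate of the two-sided test
    based on Y when both samples are non-normal and H0: mu1 - mu2 = 0 holds."""
    rng = random.Random(seed)                     # reproducible random stream
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    var1, var2 = 1.0, 1.0 / 3.0                   # Exp(1): var 1; Uniform(0, 2): var 1/3
    se = (var1 / n1 + var2 / n2) ** 0.5           # known-variance standard error
    rejections = 0
    for _ in range(reps):
        x1 = [rng.expovariate(1.0) for _ in range(n1)]   # skewed, mean 1
        x2 = [rng.uniform(0.0, 2.0) for _ in range(n2)]  # flat, mean 1
        y = (fmean(x1) - fmean(x2)) / se                 # phi = 0 under H0
        if abs(y) > z_crit:
            rejections += 1
    return rejections / reps


print(f"estimated type I error rate: {simulated_size():.3f}")
```

Increasing `reps` reduces the Monte Carlo noise in the estimate; shrinking `n1` and `n2` well below 30 is one way to watch the approximation degrade, although how quickly it does so depends on the distributions chosen.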

10.5 Bernoulli distributions Bi(1, p1) and Bi(1, p2)

This time we have two independent samples of binary data with $E(X_{1i}) = p_1$, $i = 1, \ldots, n_1$, and $E(X_{2i}) = p_2$, $i = 1, \ldots, n_2$. We want to test the null hypothesis

$$H_0 : p_1 - p_2 = \phi,$$

where $\phi$ is a constant (often set equal to zero), against one of the three alternative hypotheses given by

(i) $H_1 : p_1 - p_2 > \phi$ (one-sided)

(ii) $H_1 : p_1 - p_2 < \phi$ (one-sided)

(iii) $H_1 : p_1 - p_2 \neq \phi$ (two-sided)

at the approximate $100\alpha\%$ significance level. Here we are making an inference about the difference in the proportions of ‘successes’ in the two underlying populations. When $n_1$ and $n_2$ are both large we have that

$$\hat p_1 - \hat p_2 \sim N\left(p_1 - p_2, \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right) \text{ approximately},$$

and an appropriate test statistic is

$$Y = \frac{\hat p_1 - \hat p_2 - \phi}{\sqrt{\dfrac{\hat p_1(1 - \hat p_1)}{n_1} + \dfrac{\hat p_2(1 - \hat p_2)}{n_2}}},$$

where the denominator is the following sample estimate of the standard error of $\hat p_1 - \hat p_2$:

$$\widehat{\mathrm{s.e.}}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}.$$

Provided $n_1$ and $n_2$ are both reasonably large, under $H_0$ the test statistic $Y \sim N(0, 1)$ approximately, by asymptotic results. Note that

$$\hat p_k = \frac{1}{n_k} \sum_{i=1}^{n_k} X_{ki} = \bar X_k, \quad k = 1, 2,$$

which can be expressed as $\hat p_k = r_k / n_k$, $k = 1, 2$, where $r_k = \sum_{i=1}^{n_k} X_{ki}$ denotes the number of successes observed in sample $k$. The rejection regions for the three alternative hypotheses given above, using an approximate significance level of $100\alpha\%$, are:

(i) For $H_1 : p_1 - p_2 > \phi$, we reject $H_0$ at the approximate $100\alpha\%$ significance level if $Y > z_\alpha$.

(ii) For $H_1 : p_1 - p_2 < \phi$, we reject $H_0$ at the approximate $100\alpha\%$ significance level if $Y < -z_\alpha$.

(iii) For $H_1 : p_1 - p_2 \neq \phi$, we reject $H_0$ at the approximate $100\alpha\%$ significance level if $|Y| > z_{\alpha/2}$.
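A minimal Python sketch of this large-sample test for a general $\phi$, working from the success counts $r_k$ and the sample sizes $n_k$. The function name and its `alternative` labels are hypothetical; it uses the unpooled standard error $\widehat{\mathrm{s.e.}}(\hat p_1 - \hat p_2)$ defined above and normal critical values from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(r1, n1, r2, n2, phi=0.0, alpha=0.05,
                        alternative="two-sided"):
    """Approximate test of H0: p1 - p2 = phi using the unpooled standard error.

    Returns (y, critical value, whether H0 is rejected).
    """
    p1_hat, p2_hat = r1 / n1, r2 / n2                  # p_hat_k = r_k / n_k
    se_hat = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    y = (p1_hat - p2_hat - phi) / se_hat
    if alternative == "two-sided":
        crit = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}
        return y, crit, abs(y) > crit
    crit = NormalDist().inv_cdf(1 - alpha)             # z_alpha
    if alternative == "greater":
        return y, crit, y > crit
    if alternative == "less":
        return y, crit, y < -crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

If both sample proportions are 0 or 1, the estimated standard error is zero, the statistic is undefined, and the division raises `ZeroDivisionError`.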

The case $H_0 : p_1 = p_2$

If $\phi = 0$, then under $H_0$ we have $p_1 = p_2 = p$, say. An estimate of the common probability $p$ is given by the ‘pooled estimate’

$$\bar p = \frac{r_1 + r_2}{n_1 + n_2}.$$

In this case it makes sense to use the estimate $\bar p$ when forming the estimated standard error of $\hat p_1 - \hat p_2$ that appears in the denominator of $Y$. The revised test statistic for the case $H_0 : p_1 = p_2$ is thus

$$Y = \frac{\hat p_1 - \hat p_2}{\sqrt{\dfrac{\bar p(1 - \bar p)}{n_1} + \dfrac{\bar p(1 - \bar p)}{n_2}}}.$$

The rejection regions are otherwise unchanged.
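For $H_0 : p_1 = p_2$ the only change is the standard error, which now uses $\bar p$; the two terms under the square root factorise as $\bar p(1 - \bar p)(1/n_1 + 1/n_2)$. The sketch below (function name hypothetical) implements only the two-sided version; the one-sided versions would compare the same $Y$ with $\pm z_\alpha$.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_test(r1, n1, r2, n2, alpha=0.05):
    """Two-sided approximate test of H0: p1 = p2 using the pooled estimate p_bar.

    Returns (y, p_bar, whether H0 is rejected at the approximate 100*alpha% level).
    """
    p1_hat, p2_hat = r1 / n1, r2 / n2
    p_bar = (r1 + r2) / (n1 + n2)                  # pooled estimate of the common p
    se0 = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    y = (p1_hat - p2_hat) / se0
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return y, p_bar, abs(y) > z_half_alpha
```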

Example 10.2. In a random sample of $n_1 = 120$ voters from Town I, $r_1 = 56$ indicated that they would support Labour in a general election. In a second independent random sample of size $n_2 = 110$ from Town II, taken on the same day as the sample from Town I, $r_2 = 63$ indicated that they would support Labour in a general election. Carry out an appropriate test at the approximate 5% significance level to examine whether the proportions of voters supporting Labour are the same in the two towns.
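Since Example 10.2 asks whether the two proportions are the same ($\phi = 0$), the pooled statistic from the case $H_0 : p_1 = p_2$ applies. A sketch of the arithmetic, with the two-sided 5% critical value $z_{0.025} = 1.96$ hard-coded:

```python
from math import sqrt

# Example 10.2 data
n1, r1 = 120, 56   # Town I: sample size, Labour supporters
n2, r2 = 110, 63   # Town II: sample size, Labour supporters

p1_hat, p2_hat = r1 / n1, r2 / n2
p_bar = (r1 + r2) / (n1 + n2)        # pooled estimate under H0: p1 = p2
y = (p1_hat - p2_hat) / sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))

print(f"p1_hat = {p1_hat:.4f}, p2_hat = {p2_hat:.4f}, p_bar = {p_bar:.4f}")
print(f"Y = {y:.3f}; reject H0 at approx. 5% level: {abs(y) > 1.96}")
```

This gives $\hat p_1 \approx 0.467$, $\hat p_2 \approx 0.573$, $\bar p \approx 0.517$ and $Y \approx -1.61$. Since $|Y| < 1.96$, these data would not lead to rejecting $H_0$ at the approximate 5% level.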

END OF COURSE NOTES
