CHAPTER 3. Theory of Hypothesis Testing
1. Introduction
1. Remark. Recall the notation used for the test procedure in MT130/MT230. We
have:
a null hypothesis H0 and an alternative hypothesis H1 (two ‘conjectures’ about the dis-
tribution of the sample variables);
a test statistic TS (a function of the sample random variables);
a chosen level of significance (l.o.s.) α with 0 < α < 1;
a critical region R, a subset of R1, such that Pr(TS ∈ R |H0) = α;
a decision method which rejects the null hypothesis H0 at the 100α% l.o.s. if the observed
value of the test statistic TS lies in the critical region R.
Associated with this test method we have:
Type I error = rejecting H0 when, in fact, H0 is true - with the probability of committing
a Type I error being simply Pr(TS ∈ R | H0) = α = the chosen l.o.s.; and,
Type II error = accepting H0 when, in fact, H1 is true - with the probability of committing
a Type II error being Pr(TS ∉ R | H1).
Recall, also, that the only role played by the alternative hypothesis was in determining
the ‘shape’ of the critical region - a left-, right- or two-tail region/test.
In the earlier courses in statistics, the approach to a test of hypothesis has been to
conjure up a test statistic and critical region, and to rely on intuition as the motivation
and justification of the method. The student was not expected to inquire whether an
alternative choice of test statistic and critical region might, in some sense, have been
better, or whether there existed a best test procedure. We now provide a response to the
question in terms of the Neyman-Pearson theory of hypothesis testing. The criterion used
in this theory for distinguishing between two test statistics (and related critical regions) is
the size of the probability of Type II error - the test statistic having the smaller probability
of Type II error is preferred.
In the following introductory example, note that the calculation of the probabilities is
effected by integrating the joint p.d.f. of the sample variables over appropriate regions in
two dimensions.
Change of notation: f(x | θ) = f(x, θ).
2. Example. Let X1, X2 be a sample of size 2 from a population whose distribution has
the p.d.f. f(x | θ) = θx^{θ−1} for 0 ≤ x ≤ 1. We compare two tests of the null hypothesis H0
(θ = 2) against the alternative hypothesis H1(θ = 1), using the level of significance α:
(i) test statistic S1 = X1 + X2 and critical region R1 = {s : 0 ≤ s ≤ a}; and
(ii) test statistic S2 = max(X1, X2) and critical region R2 = {s : 0 ≤ s ≤ b}.
By the independence of the sample random variables, the joint p.d.f. of X1 and X2 is
f(x, y | θ) = θ²x^{θ−1}y^{θ−1} for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, reducing to f(x, y | 2) = 4xy under
H0 and f(x, y | 1) = 1 under H1.
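In outline (a sketch of the calculation, assuming 0 < a ≤ 1 and 0 < b ≤ 1, as holds for the usual small values of α):
Pr(S1 ∈ R1 | H0) = ∫∫_{x+y≤a} 4xy dx dy = ∫_0^a 2x(a − x)² dx = a⁴/6, and
Pr(S2 ∈ R2 | H0) = Pr(X1 ≤ b | H0) Pr(X2 ≤ b | H0) = (b²)² = b⁴,
so a size-α test requires a = (6α)^{1/4} and b = α^{1/4}. Under H1 the joint p.d.f. equals 1, so
Pr(S1 ∈ R1 | H1) = a²/2 = √(6α)/2 and Pr(S2 ∈ R2 | H1) = b² = √α,
giving probabilities of Type II error 1 − √(6α)/2 and 1 − √α respectively. Since √6/2 > 1, the first test has the smaller probability of Type II error and is, in this sense, the better of the two.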
3. Remark. The above method of calculating the probabilities anticipates the change of
emphasis in hypothesis testing as described in the Neyman-Pearson theory. This change
involves moving away from the test statistic and one-dimensional critical region to a
critical region as a subset of n-dimensional space (n = 2 in the example).
Writing C1 = {(x1, x2) : 0 ≤ x1 + x2 ≤ a} and C2 = {(x1, x2) : 0 ≤ max(x1, x2) ≤ b}, we
have:
S1 ∈ R1 is equivalent to (X1, X2) ∈ C1; and S2 ∈ R2 is equivalent to (X1, X2) ∈ C2.
It follows that the two tests in the Example could be defined purely in terms of the critical
regions C1 and C2, with the null hypothesis rejected if the observed value of (X1, X2) lies
in the respective critical region. The terms a test and a critical region can, in this sense,
be used interchangeably. In this case, a critical region is a subset of R2. In general, when
a sample X1, X2, . . . , Xn of size n is given, a critical region (test) is a subset of Rn.
This provides a more general view of a statistical test than that provided by specifying
a test statistic and a one-dimensional critical region - however, in practice, the general
procedure usually reduces to the one-dimensional model.
4. Notation. We write x = (x1, x2, ..., xn) for a general point in n-dimensional space Rn
and, for the sample random variables X1, X2, ..., Xn, let X = (X1, X2, ..., Xn) denote the
random point defined by these variables.
If, for example, the set C = {x : ∑_{i=1}^n xi ≤ a}, then x ∈ C simply means ∑_{i=1}^n xi ≤ a, and
X ∈ C signifies that ∑_{i=1}^n Xi ≤ a, or that X̄ ≤ a/n.
Further, when X1, X2, ..., Xn have the p.d.f. f(x) we have
Pr(X ∈ C) = ∫ ⋯ ∫_C f(x) dx,
where dx = dx1 dx2 ... dxn and, when X1, X2, ..., Xn have the p.f. f(x), we have
Pr(X ∈ C) = ∑ ⋯ ∑_{x ∈ C} f(x).
Although we need the general formulae for our theoretical considerations we do not need
to calculate such integrals or sums directly - fortunately, all our examples reduce to one-
dimensional integrals or sums!
2. The Neyman-Pearson theory
1. Notation. Throughout this section, let X1, X2, ..., Xn denote the sample random vari-
ables, with their joint p.f./p.d.f. f(x | θ) showing the unknown parameter θ. For most of
the applications we consider, the random variables will be independent with a common
p.f./p.d.f. f(x | θ) and, hence, f(x | θ) = f(x1 | θ)f(x2 | θ) · · · f(xn | θ). The parameter θ
will usually be identified with a point in some k-dimensional space, and we will be con-
cerned with statistical hypotheses which relate to the numerical value of the coordinates
of θ.
In each test problem, specified by the null and alternative hypotheses H0 and H1, we write
Θ as the set of values of θ under discussion - Θ is termed the set of admissible values
for the problem. The null and alternative hypotheses, H0 and H1, define a decomposition
of Θ = Θ0 ∪ Θ1 into two disjoint sets, where Θ0 is the set of values of θ specified
in H0 and Θ1 the values corresponding to H1 - this is shown by writing H0(θ ∈ Θ0) and
H1(θ ∈ Θ1) = H1(θ ∈ Θ\Θ0). If Θ0 = {θ0} consists of a single element we write H0(θ = θ0)
for H0(θ ∈ Θ0), and, similarly, if Θ1 = {θ1} we write H1(θ = θ1) for H1(θ ∈ Θ1).
2. Definition. A hypothesis is termed simple if it is specified by a single element of Θ,
and composite otherwise.
3. Examples. (i) Let X1, X2, ..., Xn have the binomial distribution B(1, θ).
(a) For the simple null hypothesis H0(θ = 1/2) against the simple alternative hypothesis
H1(θ = 2/3), the above notation sets Θ = {1/2, 2/3}, Θ0 = {1/2} and Θ1 = {2/3}.
(b) For the simple null hypothesis H0(θ = 1/3) against the composite hypothesis H1(θ ≥ 2/3),
we write Θ = {θ : θ = 1/3 or θ ≥ 2/3} with Θ0 = {1/3} and Θ1 = {θ : θ ≥ 2/3}.
(ii) Let X1, X2, ..., Xn have the normal distribution N(µ, σ²) with µ and σ unknown. To
test the composite null hypothesis H0(µ = 0) against the composite alternative H1(µ ≠ 0),
we set
Θ = {θ = (µ, σ) : −∞ < µ < ∞, σ > 0}, Θ0 = {θ = (0, σ) : σ > 0},
Θ1 = {θ = (µ, σ) : µ ≠ 0, σ > 0}.
4. Structure. (i) In the Neyman-Pearson structure, a test of a null hypothesis H0
against an alternative hypothesis H1 consists of:
sample random variables X1, X2, …, Xn with a functional form for their joint p.d.f./p.f.
f(x, θ) for θ ∈ Θ and observed sample data values x1, x2, …, xn;
a null hypothesis H0(θ ∈ Θ0) and an alternative hypothesis H1(θ ∈ Θ1) = H1(θ ∈ Θ\Θ0);
a test, that is, an n-dimensional critical region C (we identify a test by its critical region
C and speak of a test C);
the decision procedure that the null hypothesis is rejected if, and only if, the observed
value (x1, x2, …, xn) ∈ C.
(ii) We say that C is a test of size α (0 ≤ α ≤ 1) if
sup{Pr(X ∈ C | θ) : θ ∈ Θ0} = α.
We say that C is a test of significance level α (0 ≤ α ≤ 1) if
sup{Pr(X ∈ C | θ) : θ ∈ Θ0} ≤ α.
(iii) Note that some authors use the terms size and level defined in (ii) interchangeably.
The distinction between the two becomes important in models where no test of size
exactly α exists (see Note 7(i) and Example 8(ii) below).
Note also that the set of level α tests contains the set of size α tests.
(iv) The power function of the test C is the function of θ defined by
β(θ) = Pr(X ∈ C|θ)
for θ ∈ Θ.
Thus the probability of Type I error is β(θ) for θ ∈ Θ0 (bounded above by α) and the
probability of Type II error is 1− β(θ) for θ ∈ Θ1.
(v) Given two tests C1 and C2 of size α, we say C1 is uniformly more powerful than
C2 if
Pr(X ∉ C1 | θ) ≤ Pr(X ∉ C2 | θ) for all θ ∈ Θ1.
Note that, if C1 is uniformly more powerful than C2, by complementation we have
β_{C1}(θ) ≥ β_{C2}(θ) for all θ ∈ Θ1.
(vi) If the test C is uniformly more powerful than any other test of size α then we say
that C is uniformly most powerful (UMP).
If H0 and H1 are both simple hypotheses we omit the word uniformly and describe a UMP
test simply as MP (most powerful).
(vii) We require one further piece of terminology to describe a desirable property for a
test C. It seems reasonable to suggest that the test should be less likely to reject H0 when
it is true than when it is false.
A test C with power function β(θ) is unbiased if β(θ′) ≤ β(θ′′) for every θ′ ∈ Θ0 and
θ′′ ∈ Θ1.
Clearly, if C is a test of size α, that is, sup{β(θ) : θ ∈ Θ0} = α, then unbiasedness
(precisely the condition sup{β(θ) : θ ∈ Θ0} ≤ inf{β(θ) : θ ∈ Θ1}) implies that
β(θ) ≥ α for every θ ∈ Θ1.
5. Introducing the Neyman-Pearson fundamental lemma.
Suppose Θ = {θ0,θ1} with H0(θ = θ0) and H1(θ = θ1) as the simple null and alternative
hypotheses respectively. In the Theorem below we consider a critical region C of the form
(*) C = {x : f(x | θ1) ≥ kf(x | θ0)} where k > 0 is a constant.
Example. Suppose the sample variables X1, X2, …, Xn are independent and have the
distribution N(θ, 1); the form taken by the region (*) in this case is worked out in Example 8(i) below.
For convenience of notation, we write the following discussion for the case when the sample
variables have a continuous distribution with joint p.d.f. f(x | θ) (all calculations given
in terms of integration translate, somewhat cumbersomely, into summation notation for
a discrete joint distribution).
6. Theorem. The Neyman-Pearson fundamental lemma
If there exists a positive constant k such that the region
(%) C = {x : f(x | θ1) ≥ kf(x | θ0)} satisfies Pr(X ∈ C | θ0) = α,
then C is a most powerful (MP) test of size α for the simple null hypothesis H0(θ = θ0)
against the simple alternative hypothesis H1(θ = θ1). Further, the test C is unbiased.
7. Remarks. (i) Note the conditional ‘if’ in the statement of the fundamental lemma.
There is no guarantee that we can find C and k as required. This is particularly obvious
in the case of discrete random variables, when there may be no region D of any shape with
Pr(X ∈ D | θ0) = α, let alone one of the required shape specified in the lemma.
(ii) Note that the proof makes no use of the independence of the sample variables
X1, X2, …, Xn – all that is used is knowledge of the joint p.d.f./p.f.
(iii) The MP region (if it exists) is essentially unique – we can modify it only by ‘including
or deleting’ a set N with Pr(X ∈ N | θi) = 0 for i = 0, 1.
(iv) One further observation follows from an examination of the proof. The competing
region D need not have size exactly α – the proof carries through
provided D is a test at significance level α (i.e., with size α1 ≤ α). Hence, the test C may
be described as being more powerful than any test of significance level α (i.e., than any
test of size ≤ α).
(v) The result that the N-P region is a MP test of significance level α of θ0 against θ1
has a considerable intuitive appeal. Consider the ratio f(x|θ1)/f(x|θ0). Any x for which
this ratio is large provides evidence that θ1 rather than θ0 is true (this is obvious in the
discrete case). If we must choose a subset of possible observations which indicate that θ1
is the true value of the parameter, then it seems sensible to put into this subset those x’s
for which the ratio f(x|θ1)/f(x|θ0) is large – in other words to choose a subset of the form
{x : f(x|θ1) ≥ kf(x|θ0)}. The N-P analysis, based on probabilities of error, now gives us
a basis to choose k so that Pr{f(X|θ1) ≥ kf(X|θ0) | θ = θ0} = α.
8. Examples. (i) Suppose the sample variables X1, X2, …, Xn are independent with
the normal distribution N(θ, 1). We apply the Neyman-Pearson lemma to test the null
hypothesis H0(θ = θ0) against the alternative hypothesis H1(θ = θ1), where θ1 > θ0.
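In outline (a sketch of the calculation): the joint p.d.f. is f(x | θ) = (2π)^{−n/2} exp(−(1/2) ∑_{i=1}^n (xi − θ)²), so
f(x | θ1)/f(x | θ0) = exp((θ1 − θ0) ∑_{i=1}^n xi − n(θ1² − θ0²)/2),
and the inequality (%) is equivalent, since θ1 − θ0 > 0, to
∑_{i=1}^n xi ≥ k1, where k1 = [log k + n(θ1² − θ0²)/2]/(θ1 − θ0).
Under θ0 the statistic ∑_{i=1}^n Xi has the N(nθ0, n) distribution, so the size condition Pr(∑_{i=1}^n Xi ≥ k1 | θ0) = α determines k1 through 1 − Φ(√n[k1/n − θ0]) = α.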
Notes. (a) The critical region provided by the Neyman-Pearson lemma is equivalent to
the test statistic X̄ and one-dimensional critical region {x̄ : x̄ ≥ k1/n} used in M130.
(b) For the null hypothesis H0(θ = θ0) against the alternative hypothesis H1(θ = θ1),
where, now, θ1 < θ0, the inequality (%) is equivalent to ∑_{i=1}^n xi ≤ k1 (since θ1 − θ0 < 0).
(ii) Suppose that the sample random variables X1, X2, ..., Xn are independent with the
binomial distribution B(1, θ), and consider the hypotheses H0(θ = 1/2) and H1(θ = 3/4).
Now, the common p.f. of the sample variables is f(x | θ) = θ^x(1 − θ)^{1−x} for x = 0 or 1,
and the joint p.f. of X1, X2, ..., Xn is therefore f(x | θ) = θ^t(1 − θ)^{n−t}, where t = ∑_{i=1}^n xi.
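In outline (a sketch of the omitted steps): the inequality (%) becomes
f(x | 3/4)/f(x | 1/2) = (3/4)^t (1/4)^{n−t} / (1/2)^n = 3^t/2^n ≥ k,
and, since this ratio increases with t, the MP region has the shape C = {x : ∑_{i=1}^n xi ≥ c}. Under H0 the statistic T = ∑_{i=1}^n Xi has the discrete distribution B(n, 1/2), so Pr(T ≥ c | H0) takes only finitely many values and, for most choices of α, no c achieves size exactly α - the situation anticipated in 4(iii) and Note 7(i). A short numerical illustration (a sketch in Python, using only the standard library, with an illustrative n):

    from math import comb

    # Attainable sizes Pr(T >= c | theta = 1/2) for T ~ B(n, 1/2)
    n = 10
    for c in range(n + 1):
        size = sum(comb(n, t) for t in range(c, n + 1)) / 2 ** n
        print(c, round(size, 4))
    # No cut-off c gives size exactly 0.05: the nearest attainable
    # sizes here are about 0.0547 (c = 8) and 0.0107 (c = 9).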
(iii) Suppose that the sample random variables X1, X2, ..., Xn are independent with the
common exponential p.d.f. f(x | θ) = θe^{−θx}, where x ≥ 0 and θ > 0. We seek a test of the
null hypothesis H0(θ = 1) against the alternative hypothesis H1(θ = 2).
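In outline (a sketch): with t = ∑_{i=1}^n xi, the joint p.d.f. is f(x | θ) = θ^n e^{−θt}, so the inequality (%) reads
f(x | 2) = 2^n e^{−2t} ≥ k e^{−t} = k f(x | 1),
which is equivalent to e^{−t} ≥ k/2^n, that is, to ∑_{i=1}^n xi ≤ k1. For the size condition, note that under θ = 1 the statistic W = 2 ∑_{i=1}^n Xi has the chi-squared distribution χ²(2n), so k1 is determined by Pr(W ≤ 2k1) = α.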
9. Simple null hypothesis, composite alternative hypothesis. Suppose the set Θ of
admissible values of the parameter is written as Θ = Θ0 ∪Θ1 where Θ0 = {θ0} consists of a
single element, and Θ1 consists of more than one element. We extend the Neyman-Pearson
lemma idea to construct (where possible) a UMP test with an n-dimensional critical region
C for the simple null hypothesis H0(θ = θ0) against the composite alternative H1(θ ∈ Θ1).
A UMP test C of size α, if such a region exists, must, by definition, provide a MP test
for the simple hypothesis H0(θ = θ0) against each alternative hypothesis H1(θ = θ1) for
every choice of θ1 ∈ Θ1. This suggests that, in the search for a UMP region, we apply the
fundamental lemma to these simple hypotheses, obtaining a critical region C(θ1) which,
in principle, depends on the choice of θ1 ∈ Θ1. Should it happen that C = C(θ1) is the
same for all θ1 ∈ Θ1, then C must be a UMP test for the null hypothesis H0(θ = θ0)
against the alternative hypothesis H1(θ ∈ Θ1).
10. Example. Suppose that the sample variables X1, X2, …, Xn are independent with
the normal distribution N(θ, 1). Let Θ = {θ : θ ≥ θ0}, Θ0 = {θ0} and Θ1 = {θ : θ > θ0};
then we require a test of the null hypothesis H0(θ = θ0) against the alternative hypothesis
H1(θ > θ0).
Referring to Example 8(i) for a test of H0(θ = θ0) against H1(θ = θ1), where θ1 > θ0, the
critical region C(θ1) is given by C(θ1) = {x : ∑_{i=1}^n xi ≥ k1}, where
k1 = [log k + n(θ1² − θ0²)/2]/(θ1 − θ0)
is a function of θ1 (θ0 is known!). However, using Pr(X ∈ C(θ1) | θ0) = α, the value of k1
is obtained from
1 − Φ(√n[k1/n − θ0]) = α,
which shows that k1, and hence the critical region C(θ1), is independent of θ1.
The critical region C = {x : ∑_{i=1}^n xi ≥ k1} therefore provides a UMP test of size α for
the simple null hypothesis H0(θ = θ0) against the composite alternative hypothesis
H1(θ > θ0).
We may also read off, from Example 8(i), the power function
β(θ) = Pr(X ∈ C | θ) = 1 − Φ(√n(k1/n − θ)) = 1 − Φ(z − √n(θ − θ0)),
where z = √n(k1/n − θ0) is fixed by Φ(z) = 1 − α.
Since Φ(x) increases as x increases, we see that Φ(z − √n(θ − θ0)) ≤ Φ(z) for θ > θ0,
and, therefore, β(θ) ≥ α for all θ ∈ Θ. Further, since Φ(x) → 0 as x decreases to −∞,
we note that:
β(θ1) → 1 for each θ1 > θ0 as n → +∞; and β(θ1) → 1 as θ1 increases to +∞,
confirming that the test is unbiased, and showing that it has large power if either the
sample size is large, or θ1 is much greater than θ0.
Similarly, the critical region {x : ∑_{i=1}^n xi ≤ k1} will provide a UMP test for the null
hypothesis H0(θ = θ0) against the alternative hypothesis H1(θ < θ0).
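The power function is easily evaluated numerically; the following sketch (in Python, using only the standard library, with illustrative values of n, α and θ0) confirms that β(θ0) = α and that β increases with θ:

    from math import sqrt
    from statistics import NormalDist

    Phi = NormalDist().cdf
    n, alpha, theta0 = 25, 0.05, 0.0
    z = NormalDist().inv_cdf(1 - alpha)          # Phi(z) = 1 - alpha

    def beta(theta):
        # beta(theta) = 1 - Phi(z - sqrt(n)(theta - theta0)), as derived above
        return 1 - Phi(z - sqrt(n) * (theta - theta0))

    print(beta(theta0))                          # equals alpha at theta = theta0
    print(beta(0.2), beta(0.5))                  # power rises towards 1 as theta increases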
11. Remarks. (i) The procedure that is successful in Example 10 has, however, a
limited range of application. For example, suppose the sample variables X1, X2, ..., Xn are
independent with the normal distribution N(θ, 1), and we wish to test the null hypothesis
H0(θ = θ0) against the alternative hypothesis H1(θ ≠ θ0).
Following the method of Example 10, we have for all θ1 > θ0 the common critical region
C1 with shape {x : ∑_{i=1}^n xi ≥ b}. Similarly, for all θ1 < θ0, we obtain the common critical
region C2 with shape {x : ∑_{i=1}^n xi ≤ a}. It follows that there is no UMP critical region
for these hypotheses H0 and H1. For, if C were a UMP critical region, we would have
Pr(X ∈ C | θ1) ≥ Pr(X ∈ C1 | θ1) for θ1 > θ0, and,
Pr(X ∈ C | θ1) ≥ Pr(X ∈ C2 | θ1) for θ1 < θ0.
However, we have shown that C1 and C2 are each UMP for the alternatives θ > θ0 and
θ < θ0 respectively, so the reverse inequalities also hold, and C is MP against every θ1 ≠ θ0.
By the essential uniqueness of the MP region (Remark 7(iii)), C would then have to coincide
with both C1 and C2, which is impossible. Hence a UMP critical region cannot exist for
H0(θ = θ0) against H1(θ ≠ θ0).
(ii) In M130 the test statistic X̄ and the one-dimensional critical region R = {x̄ : |x̄ − θ0| ≥
c} were used for the two-tail test of H0(θ = θ0) against H1(θ ≠ θ0). Translating R into
the n-dimensional framework we obtain the critical region C = {x : |x̄ − θ0| ≥ c}, which
is the symmetric combination of C1 = {x : x̄ − θ0 ≥ b} and C2 = {x : x̄ − θ0 ≤ a} with
b = −a = c. This symmetric choice is necessary if we wish to retain the property that
the critical region is unbiased.
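A sketch of the unbiasedness claim: with c fixed by the size condition (c = z_{α/2}/√n, where Φ(z_{α/2}) = 1 − α/2) and writing δ = √n(θ − θ0), the power function of C is
β(θ) = Pr(|X̄ − θ0| ≥ c | θ) = Φ(−z_{α/2} − δ) + 1 − Φ(z_{α/2} − δ),
so that β(θ0) = α, and differentiation with respect to δ gives
β′(δ) = φ(z_{α/2} − δ) − φ(z_{α/2} + δ),
where φ is the standard normal p.d.f. This derivative is negative for δ < 0 and positive for δ > 0, so β attains its minimum value α at θ = θ0; that is, the equi-tail test is unbiased.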
12. Remark. The examples in 11 provide contrasting views on the theory of hypothesis
testing. Example 11(i) shows that the criterion of seeking a uniformly most powerful test
is too demanding, while Example 11(ii) indicates that the property of unbiasedness holds
for the equi-tail region. It can be shown that the equi-tail region above is a uniformly
most powerful unbiased (UMPU) test - uniformly more powerful than any other
unbiased test at the same level of significance.
13. Composite hypotheses. We close this section by showing how the Neyman-
Pearson lemma may be applied to obtain a UMP test for certain types of composite null
and alternative hypotheses.
Suppose the decomposition Θ = Θ0 ∪ Θ1 of the set of admissible values of the parameter
produces composite hypotheses H0(θ ∈ Θ0) and H1(θ ∈ Θ1).
For sets Θ0 of the type {θ : θ ≤ θ0} or {θ : θ ≥ θ0}, a UMP test C of H0(θ ∈ Θ0) against
H1(θ ∈ Θ1) with Pr(X ∈ C | θ0) = α may sometimes be found by the method described
in 10.
Example. Suppose the sample random variables X1, X2, ..., Xn are independent with the
common p.d.f. f(x | θ) = θ exp(−θx) for x ≥ 0 and θ > 0, and that the null and alternative
hypotheses are H0(θ ≤ 1) and H1(θ ≥ 2) respectively. Choosing any values θ0 ≤ 1 and θ1
≥ 2, the inequality of the fundamental lemma applied to the null hypothesis H0(θ = θ0)
against the alternative hypothesis H1(θ = θ1) gives, with t = ∑_{i=1}^n xi – see Example 8(iii)
–
f(x | θ1) = θ1^n e^{−θ1 t} ≥ k θ0^n e^{−θ0 t} = k f(x | θ0),
producing, since θ1 − θ0 > 0, a critical region C = {x : ∑_{i=1}^n xi ≤ k1}.
Now we require, for the test to be at level α, sup{Pr(∑_{i=1}^n Xi ≤ k1 | θ) : θ ≤ 1} = α.
Writing W = 2θ ∑_{i=1}^n Xi, we observe that W has the chi-squared distribution χ²(2n)
[a simple rescaling of the variables in 8(iii)] and that we require sup{Pr(W ≤ 2k1θ) : θ ≤
1} = α. Now, since
Pr(W ≤ 2k1θ) ≤ Pr(W ≤ 2k1) for θ ≤ 1,
we obtain the required bound by choosing k1 to satisfy Pr(W ≤ 2k1) = α, that is, by
taking 2k1 to be the lower α-quantile of χ²(2n).
Now, it is not difficult to show that C = {x : ∑_{i=1}^n xi ≤ k1} is a UMP test for H0(θ ≤ 1)
against H1(θ ≥ 2).
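In outline (a sketch of the argument): by the choice of k1, C has level α for the composite H0; moreover, taking θ0 = 1, C is exactly the region supplied by the fundamental lemma for H0(θ = 1) against H1(θ = θ1), whatever the value of θ1 ≥ 2. By Remark 7(iv), C is then more powerful at θ1 than any test of significance level α for H0(θ = 1); and any level-α test D of the composite H0 has Pr(X ∈ D | θ = 1) ≤ α, so it is, in particular, such a test. Since θ1 ≥ 2 was arbitrary, C is UMP.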
14. Remark. Similarly, it may be shown that:
(i) when the sample variables X1, X2, …, Xn are independent with the normal distribution
N(θ, 1), the region {x : x̄ ≥ c} is UMP for the hypotheses H0(θ ≤ θ0) and H1(θ ≥ θ1), where
θ1 ≥ θ0, and the region {x : x̄ ≤ c} is UMP for the hypotheses H0(θ ≥ θ0) and H1(θ ≤ θ1),
where θ1 ≤ θ0;
(ii) when the sample variables X1, X2, …, Xn are independent with the binomial distribu-
tion B(1, θ), the region {x : x̄ ≥ c} is UMP for the hypotheses H0(θ ≤ θ0) and H1(θ ≥ θ1),
where 0 < θ0 ≤ θ1 ≤ 1.
3. The Likelihood Ratio Test
1. Notation. The notation of the previous section continues to apply: the sample
random variables X1, X2, ..., Xn have the joint p.f./p.d.f. f(x|θ); the set Θ of admissible
values of the parameter θ is decomposed as a disjoint union Θ0 ∪ Θ1; and, the null and
alternative hypotheses are H0(θ ∈ Θ0) and H1(θ ∈ Θ1) respectively.
2. Remarks. When both the null and alternative hypotheses are simple, the search
for a best test has been dealt with, in an unequivocal way, by the fundamental lemma.
Further, the examples in section 2 show how the idea of the lemma may be extended,
in suitable cases, to provide best tests for some composite hypothesis situations. The
likelihood ratio test (LRT) is a general method for defining a critical region should the
hypotheses be such that the considerations of the last section fail to provide a best test.
Although intuitively appealing, as a combination of the idea of the fundamental lemma
and the principle of maximum likelihood, the critical region defined by the LRT is not
supported by any general theorem outlining desirable properties of a test. However, as
our examples will show, the LRT does produce what are regarded as natural tests in the
familiar situations.
3. The likelihood ratio test (LRT). To introduce the formal definition of the LRT
recall that:
(a) the fundamental lemma rejects θ = θ0 in favour of θ = θ1 if the ratio f(x|θ0) / f(x|θ1)
is too small; and
(b) regarding the values x1, x2, ..., xn as fixed, the principle of maximum likelihood esti-
mates θ by maximising f(x|θ).
Combining these notions, the LRT procedure is to compute first, regarding the values
x1, x2, ..., xn as fixed, the maxima:
(i) M = M(x) = sup{f(x|θ) : θ ∈ Θ} - this being the general model maximum;
(ii) M0 = M0(x) = sup{f(x|θ) : θ ∈ Θ0} - this being the restricted model maximum,
restricted in the sense that H0(θ ∈ Θ0) is assumed to be true.
Then, define Λ(x) = M0(x)/M(x), and the critical region C as having the shape
C = {x : Λ(x) ≤ λ}.
The value λ (necessarily ≤ 1, since Θ0 ⊆ Θ gives M0(x) ≤ M(x)) is determined via
sup{Pr(Λ(X) ≤ λ | θ) : θ ∈ Θ0} = α.
Note that the intuitive base for the LRT is that, when H0 is false, the ratio Λ(x) is likely
to be small.
4. Remarks. (i) It is not difficult to see that the LRT procedure is equivalent to the
fundamental lemma when Θ = {θ0, θ1} and Θ0 = {θ0} - since, in this case,
sup{f(x|θ) : θ ∈ Θ} = max{f(x|θ0), f(x|θ1)} and sup{f(x|θ) : θ ∈ Θ0} = f(x|θ0),
so that Λ(x) ≤ λ < 1 implies that f(x|θ0) ≤ λf(x|θ1).
(ii) Alternative descriptions of the LRT are:
(a) to express the critical region in terms of M(x)/M0(x) being ‘large’ - a trivial inversion;
(b) to use the ratio M1(x)/M0(x), where M1(x) = sup{f(x|θ) : θ ∈ Θ1}, with ‘large’
values suggesting that H0 is false.
(iii) Assuming f(x|θ) is continuous as a function of θ, the maximum of f(x|θ) for
θ ∈ Θ is just the value f(x|θ̂), where θ̂ is the maximum likelihood estimate of θ. Similarly
maximizing f(x|θ) over θ ∈ Θ0 will give the value f(x|θ̂0) where θ̂0 is the restricted (by
θ ∈ Θ0 ) maximum likelihood estimate of θ.
5. Examples. Testing the normal mean. These examples illustrate the application of
the LRT to the parameters of a normal distribution. Suppose X1, X2, ..., Xn is a random
sample from the distribution N(µ, σ²), so that the joint p.d.f. is
f(x | µ, σ) = (σ√(2π))^{−n} exp(−(1/2σ²) ∑_{i=1}^n (xi − µ)²).
We test the null hypothesis H0(µ = µ0) against the alternative hypothesis H1(µ ≠ µ0) in
the two cases, σ known and σ unknown.
(i) σ = σ0 known. Here, Θ = {θ = (µ, σ0) : −∞ < µ < ∞} and Θ0 = {(µ0, σ0)}. The
numerator in Λ(x) requires no maximizing and is M0(x) = f(x | µ0, σ0). Since µ̂ = x̄ is
the MLE of µ (see Section 2.2), the denominator is f(x | µ̂, σ0) – that is,
M(x) = sup{f(x | µ, σ0) : −∞ < µ < ∞} = (σ0√(2π))^{−n} exp(−(1/2σ0²) ∑_{i=1}^n (xi − x̄)²).
Thus,
Λ(x) = f(x | µ0, σ0)/M(x)
= exp(−(1/2σ0²) ∑_{i=1}^n (xi − µ0)²) / exp(−(1/2σ0²) ∑_{i=1}^n (xi − x̄)²)
= exp[−(1/2σ0²){∑_{i=1}^n (xi − µ0)² − ∑_{i=1}^n (xi − x̄)²}].
Hence, Λ(x) ≤ λ is equivalent to
n(x̄ − µ0)² = ∑_{i=1}^n (xi − µ0)² − ∑_{i=1}^n (xi − x̄)² ≥ λ1, or √n|x̄ − µ0|/σ0 ≥ λ2,
which reduces to the one-dimensional critical region R = {x̄ : |x̄ − µ0| ≥ c} and test
statistic X̄, with √n(X̄ − µ0)/σ0 distributed N(0, 1) under H0.
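For completeness, the size condition fixes c: since √n(X̄ − µ0)/σ0 has the N(0, 1) distribution under H0,
Pr(|X̄ − µ0| ≥ c | H0) = α gives √n c/σ0 = z_{α/2}, that is, c = z_{α/2} σ0/√n, where Φ(z_{α/2}) = 1 − α/2.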
(ii) σ unknown. Now we must write θ = (µ, σ), showing both µ and σ, and put
Θ = {(µ, σ) : −∞ < µ < ∞, σ > 0} and Θ0 = {(µ0, σ) : σ > 0},
thereby expressing the null hypothesis in the form H0(θ ∈ Θ0).
From Section 2.2, the MLE of (µ, σ) is (µ̂, σ̂), where µ̂ = x̄ and σ̂² = (1/n) ∑_{i=1}^n (xi − x̄)². So,
the denominator in Λ(x) is M(x) = f(x | µ̂, σ̂) = (σ̂√(2π))^{−n} exp(−n/2).
Similarly, with µ = µ0 fixed, we maximize f(x | µ0, σ) over σ to obtain M0(x) = f(x | µ0,
σ̂0), where σ̂0² = (1/n) ∑_{i=1}^n (xi − µ0)². Thus, M0(x) = (σ̂0√(2π))^{−n} exp(−n/2) and
Λ(x) = (σ̂²/σ̂0²)^{n/2} = [∑_{i=1}^n (xi − x̄)² / ∑_{i=1}^n (xi − µ0)²]^{n/2} ≤ λ
is equivalent to √n|x̄ − µ0|/s ≥ λ1, where s² = (1/(n−1)) ∑_{i=1}^n (xi − x̄)².
(Note that (a²/b²)^{n/2} ≤ λ is equivalent to √((b² − a²)/a²) ≥ √(λ^{−2/n} − 1); now make the obvious
substitutions, using n(x̄ − µ0)² = ∑_{i=1}^n (xi − µ0)² − ∑_{i=1}^n (xi − x̄)².)
Thus the LRT procedure, reduced to the one-dimensional critical region and test statistic,
recovers the test structure used in the basic t-test, since √n(X̄ − µ0)/S has the t(n − 1)
distribution under H0.
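The reduction may also be checked numerically. The identity ∑(xi − µ0)² = ∑(xi − x̄)² + n(x̄ − µ0)² gives Λ(x) = (1 + t²/(n − 1))^{−n/2}, where t = √n(x̄ − µ0)/s, so Λ is a decreasing function of |t|. A sketch of the check (in Python with numpy, on illustrative data):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.5, scale=1.0, size=10)     # illustrative sample
    mu0 = 0.0                                       # hypothesised mean
    n, xbar = len(x), x.mean()

    sig2_hat = ((x - xbar) ** 2).mean()             # unrestricted MLE of sigma^2
    sig2_0 = ((x - mu0) ** 2).mean()                # MLE of sigma^2 under H0
    Lam = (sig2_hat / sig2_0) ** (n / 2)            # likelihood ratio Lambda(x)

    s = x.std(ddof=1)
    t = np.sqrt(n) * (xbar - mu0) / s               # the usual t-statistic
    print(Lam, (1 + t ** 2 / (n - 1)) ** (-n / 2))  # the two values agree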