8 Hypothesis testing (Part I)
8.1 Introduction
As we have discussed earlier in the module, one of the main aims of a statistical
analysis is to make inferences about the unknown values of population parame-
ters based on a sample of data from the population. We previously considered
both point and interval estimation of such parameters. Here we instead explore
how to test hypotheses about the values of parameters.
A statistical hypothesis is a conjecture or proposition regarding the dis-
tribution of one or more random variables. In order to specify a statistical
hypothesis we need to specify the family of the underlying distribution (e.g.
normal, Poisson, or binomial) as well as the set of possible values of any param-
eters. A simple hypothesis specifies the distribution and the parameter values
uniquely. In contrast, a composite hypothesis specifies several different possibil-
ities for the distribution, most commonly corresponding to different possibilities
for the parameter values.
An example of a simple hypothesis is ‘the data arise from N(5, 1²)’. An
example of a composite hypothesis is ‘the data arise from N(µ, 1²), with µ > 5’.
The elements of a statistical test:
(i) The null hypothesis, denoted by H0, is the hypothesis to be tested.
This is usually a ‘conservative’ or ‘skeptical’ hypothesis that we believe by
default unless there is significant evidence to the contrary.
(ii) The alternative hypothesis, denoted by H1, is a hypothesis about the
population parameters which we will accept if there is evidence that H0
should be rejected.
For example, when assessing a new medical treatment it is common for the
null hypothesis to correspond to the statement that the new treatment is
no better (or worse) than the old one. The alternative hypothesis would
be that the new treatment is better.
In this module the null hypothesis will always be simple, while the alterna-
tive hypothesis may either be simple or composite. For example, consider
the following hypotheses about the value of the mean µ of a normal distri-
bution with known variance σ²:
• H0: µ = µ0, where µ0 is a specific numerical value, is a simple null
hypothesis.
• H1: µ = µ1 (with µ1 ≠ µ0) is a simple alternative hypothesis.
• H1: µ > µ0 is a one-sided composite alternative hypothesis.
• H1: µ < µ0 is a one-sided composite alternative hypothesis.
• H1: µ ≠ µ0 is a two-sided composite alternative hypothesis.

How do we use the sample data to decide between H0 and H1?

(iii) Test statistic. This is a function of the sample data whose value we
will use to decide whether or not to reject H0 in favour of H1. Clearly,
the test statistic will be a random variable.

(iv) Acceptance and rejection regions. We consider the set of all possible
values that the test statistic may take, i.e. the range space of the statistic,
and we examine the distribution of the test statistic under the assumption
that H0 is true. The range space is then divided into two disjoint subsets
called the acceptance region and the rejection region. On observing data,
if the calculated value of the test statistic falls into the rejection region
then we reject H0 in favour of H1. If the value of the test statistic falls in
the acceptance region then we do not reject H0.

The rejection region is usually defined to be a set of extreme values of the
test statistic which together have low probability of occurring if H0 is true.
Thus, if we observe such a value then this is taken as evidence that H0 is
in fact false.

(v) Type I and type II errors. The procedure described in (iv) above can
lead to two types of possible errors:

(a) Type I error - this occurs if we reject H0 when it is in fact true.
(b) Type II error - this occurs if we fail to reject H0 when it is in fact
false.

The probability of making a type I error is denoted by α and is also called
the significance level or size of the test. The value of α is usually specified
in advance; the rejection region is chosen in order to achieve this value. A
common choice is α = 0.05. Note that α = P(reject H0 | H0).

The probability of making a type II error is β = P(do not reject H0 | H1).
For a good testing procedure, β should be small for all values of the pa-
rameter included in H1.

Example 8.1. Is a die biased or not?
It is claimed that a particular die used in a game is biased in favour of the six.
To test this claim the die is rolled 60 times, and each time it is recorded whether
or not a six is obtained. At the end of the experiment the total number of sixes
is counted, and this information is used to decide whether or not the die is
biased.

The null hypothesis to be tested is that the die is fair, i.e. P(rolling a six) = 1/6.
The alternative hypothesis is that the die is biased in favour of the six, so that
P(rolling a six) > 1/6. Let the probability of rolling a six be denoted by p.
We can write the above hypotheses as:
H0 : p = 1/6
H1 : p > 1/6 .
Let X denote the number of sixes thrown in 60 attempts. If H0 is true then
X ∼ Bi(60, 1/6), whereas if H1 is true then X ∼ Bi(60, p), with p > 1/6. H0 is
a simple hypothesis, whereas H1 is a composite hypothesis.
If H0 were true, we would expect to see 10 sixes, since E(X) = 10 under H0.
However, the actual number observed will vary randomly around this value. If
we observe a large number of sixes, then this will constitute evidence against H0
in favour of H1. The question is, how large does the number of sixes need to be
so that we should reject H0 in favour of H1?
The test statistic here is X and the rejection region is
{x : x > k} ,
for some k ∈ N. Here we choose the smallest value of k that ensures a signifi-
cance level α < 0.05, i.e. the smallest k such that
α = P(X > k |H0) < 0.05 .
Note that for k = 14, P(X > k | H0) = 0.0648, while for k = 15, P(X > k | H0) =
0.0338. Thus we select k = 15. In this case, the actual significance level of the
test is 0.0338.
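These tail probabilities can be checked directly. The following Python sketch (the helper name `binom_tail` is my own; the notes' values were computed in R) evaluates P(X > k | H0) for X ∼ Bi(60, 1/6) using only the standard library:

```python
from math import comb

def binom_tail(n, p, k):
    """P(X > k) for X ~ Binomial(n, p), i.e. the upper-tail probability beyond k."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

n, p0 = 60, 1/6  # 60 rolls, probability of a six under H0

for k in (14, 15):
    # Should reproduce the values 0.0648 and 0.0338 quoted above.
    print(f"k = {k}: P(X > {k} | H0) = {binom_tail(n, p0, k):.4f}")
```

Since only k = 15 gives a tail probability below 0.05, it is the smallest admissible choice.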
When, as in this case, the test statistic is a discrete random variable, for many
choices of significance level there is no corresponding rejection region achieving
that significance level exactly (e.g. α = 0.05 above).
In summary, under H0 the probability of observing more than 15 sixes in 60
rolls is 0.0338. This event is sufficiently unlikely under H0 that if it occurs then
we reject H0 in favour of H1. It is possible that by rejecting H0 we may make
a type I error, with probability 0.0338 if H0 is true. If 15 or fewer sixes are
obtained, then this is within the acceptable bounds of random variation under
H0. Thus, in this case we would not reject the null hypothesis that the die is
unbiased. However in making this decision we may be making a type II error, if
H1 is in fact true.
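The type I error rate of this test can also be illustrated by simulation. The sketch below (a rough Monte Carlo check, not part of the notes' development) repeatedly simulates 60 rolls of a fair die and records how often the test wrongly rejects H0:

```python
import random

def count_sixes(n_rolls, rng):
    """Simulate n_rolls of a fair die and return the number of sixes observed."""
    return sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)

rng = random.Random(0)  # fixed seed for reproducibility
n_sims = 20_000

# Reject H0 whenever more than 15 sixes are observed (the rejection region above).
rejections = sum(1 for _ in range(n_sims) if count_sixes(60, rng) > 15)

# The estimated rate should be close to the exact significance level 0.0338.
print(f"Estimated type I error rate: {rejections / n_sims:.4f}")
```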
Probability of correctly rejecting H0 when it is false
The probability of correctly rejecting H0 when it is false satisfies
P(reject H0 | p) = 1 − P(type II error) .
Ideally we would like the probability on the left to be high. It is straightfor-
ward to evaluate this probability for particular values of p > 1/6. Specifically,
P(reject H0 | p) = P(X > 15 | p), where X ∼ Bi(60, p). For example, the follow-
ing values have been computed using R:
p P(reject H0 | p)
0.2 0.1306
0.25 0.4312
0.3 0.7562
Clearly, the larger the true value of p, the more likely we are to correctly
reject H0.
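The power values in the table can be reproduced with the same binomial tail calculation; the sketch below (helper name `power` is illustrative) evaluates P(X > 15 | p) for the three values of p considered:

```python
from math import comb

def power(n, p, k):
    """P(X > k) for X ~ Binomial(n, p): the probability of rejecting H0."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

# Probability of (correctly) rejecting H0 for several true values of p.
# Should reproduce the tabulated values 0.1306, 0.4312, 0.7562.
for p in (0.2, 0.25, 0.3):
    print(f"p = {p}: P(reject H0 | p) = {power(60, p, 15):.4f}")
```

As the table shows, the power is increasing in p: the further the true p lies from 1/6, the more likely the test is to detect the bias.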