6 Likelihood for discrete data
6.1 The likelihood function
The parameter estimators we have considered so far have mostly been motivated
by intuition. For example, the sample mean X̄ is an intuitive estimator of the
population mean. However, in many situations it is not obvious how to define
an appropriate estimator for the parameter(s) of interest.
One method for deriving an estimator, which works for almost any parameter
of interest, is the method of maximum likelihood. The estimators derived
in this way typically have good properties. The method revolves around the
likelihood function, which is of great importance throughout Statistics. The
likelihood function is used extensively in estimation and also hypothesis testing,
which we discuss in a later chapter.
Let X1, . . . , Xn be an i.i.d. random sample from the discrete distribution with
p.m.f. p(x | θ), where θ is a parameter whose value is unknown. Given observed
data values x1, . . . , xn from this model, the likelihood function is defined as
L(θ) = P(X1 = x1, X2 = x2, . . . , Xn = xn | θ) .
In other words,
the likelihood is the joint probability of the observed data
considered as a function of the unknown parameter θ.
By independence, we can rewrite the likelihood as follows:
L(θ) = p(x1 | θ)× · · · × p(xn | θ) .
Example 6.1. Let x1, . . . , xn be a sample obtained from the Poisson(λ) distribution with p.m.f.
\[
p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}\,, \qquad x = 0, 1, 2, \ldots
\]
The likelihood function for this sample is given by:
\[
L(\lambda) = \prod_{i=1}^{n} p(x_i \mid \lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}\,, \qquad \text{for } \lambda > 0\,.
\]
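As a quick numerical sanity check (an illustrative aside, not part of the original notes), the closed form above can be compared against the product of the individual p.m.f. values computed with R's dpois; the sample below is hypothetical.

# Illustrative check (hypothetical sample): the closed-form Poisson
# likelihood agrees with the product of individual p.m.f. values.
x <- c(2, 0, 3, 1, 4)
lambda <- 1.5
n <- length(x)
exp(-n * lambda) * lambda^sum(x) / prod(factorial(x))   # closed form
prod(dpois(x, lambda = lambda))                         # product of p.m.f.s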
6.2 Maximum likelihood estimation
In the discrete case, given sample data x1, . . . , xn, the maximum likelihood estimate for θ is the value θ̂ that maximizes the joint probability of the observed data, i.e. that maximizes the value of the likelihood function L(θ).
Maximization of $L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$ leads to a numerical value θ̂ for the
estimate of θ. The value of θ̂ depends on the observed sample values x1, . . . , xn,
i.e. θ̂ is a function of the data,
θ̂ = h(x1, . . . , xn) .
We can also consider θ̂ as a function of the random sample, X1, . . . , Xn,
θ̂ = h(X1, . . . , Xn) ,
in which case θ̂ is a random variable called the maximum likelihood estima-
tor. The maximum likelihood estimator possesses its own sampling distribution,
which will be studied in later Statistics modules.
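Although the sampling distribution of the maximum likelihood estimator is left to later modules, a small simulation can preview the idea. The sketch below (an illustrative addition; the sample size, seed and number of replications are arbitrary) repeatedly simulates Poisson samples and computes the estimator λ̂ = X̄ derived later in Example 6.2:

# Illustrative sketch: the maximum likelihood estimator is a random
# variable, so repeated samples give a spread of estimates.
set.seed(2)                                         # arbitrary seed
lambda.hats <- replicate(5000, mean(rpois(n = 30, lambda = 10)))
hist(lambda.hats, breaks = 40,
     main = "Sampling distribution of the Poisson MLE")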
In simple cases, the maximum likelihood estimate can be found by standard
calculus techniques, i.e. by solving
\[
\frac{dL(\theta)}{d\theta} = 0\,. \tag{1}
\]
However, it is usually much easier algebraically to find the maximum of the
log-likelihood l(θ) = log L(θ) because, for i.i.d. data,
\[
\log L(\theta) = \log\left[\prod_{i=1}^{n} p(x_i \mid \theta)\right] = \sum_{i=1}^{n} \log p(x_i \mid \theta)\,.
\]
Hence, the log-likelihood is additive, as opposed to the likelihood, which is multiplicative. This is advantageous because it is far easier to differentiate a sum of functions than a product of functions.
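There is also a practical reason to prefer the log-likelihood: for large samples the likelihood, being a product of many probabilities, can underflow to zero in double-precision arithmetic, whereas the log-likelihood remains finite. A brief illustration in R (an added aside; the sample and seed are arbitrary):

# The raw likelihood underflows for large n; the log-likelihood does not.
set.seed(1)                               # arbitrary seed
x <- rpois(n = 2000, lambda = 10)
prod(dpois(x, lambda = 10))               # underflows to exactly 0
sum(dpois(x, lambda = 10, log = TRUE))    # finite log-likelihood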
To find the value of θ that maximizes l(θ) we instead find θ̂ that solves:
\[
\frac{dl(\theta)}{d\theta} = \sum_{i=1}^{n} \frac{d \log p(x_i \mid \theta)}{d\theta} = 0\,. \tag{2}
\]
The solution is a maximum if
\[
\frac{d^2 l(\theta)}{d\theta^2} < 0 \quad \text{at } \theta = \hat{\theta}\,.
\]
The estimate found by this method, i.e. by maximizing the log-likelihood, is identical to the one found by maximizing the likelihood directly, because the logarithm is a monotonically increasing function.

Example 6.2. Let X1, . . . , Xn be a random sample from the Poisson(λ) distribution. Find the maximum likelihood estimator of λ.
We have seen that
\[
L(\lambda) = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} X_i}}{\prod_{i=1}^{n} X_i!}\,,
\]
so that
\[
l(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} X_i\right)\log\lambda - \log\left(\prod_{i=1}^{n} X_i!\right)\,.
\]
Solving dl(λ)/dλ = 0, we obtain
\[
\left.\frac{dl}{d\lambda}\right|_{\lambda=\hat{\lambda}} = -n + \frac{\sum_{i=1}^{n} X_i}{\hat{\lambda}} = 0\,,
\]
which implies that λ̂ = X̄. Checking the second derivative, we see that
\[
\left.\frac{d^2 l}{d\lambda^2}\right|_{\lambda=\hat{\lambda}} = -\frac{\sum_{i=1}^{n} X_i}{\hat{\lambda}^2} = -\frac{n}{\bar{X}} < 0\,.
\]
Therefore, λ̂ = X̄ is indeed the maximum likelihood estimator of λ. If we have a set of data x1, . . . , xn then the maximum likelihood estimate of λ is λ̂ = x̄, the sample mean. This is an intuitively sensible estimate, as the mean of the Poisson(λ) distribution is equal to λ.

Example 6.3. Let X1, . . . , Xn be a random sample from a Bi(1, p) distribution. Find the maximum likelihood estimator of p.
In this example the likelihood function is
\[
L(p) = \prod_{i=1}^{n} p^{X_i}(1-p)^{1-X_i} = p^{\sum_{i=1}^{n} X_i}(1-p)^{\,n-\sum_{i=1}^{n} X_i}\,,
\]
so that the log-likelihood is given by
\[
l(p) = \sum_{i=1}^{n} X_i \log p + \left(n - \sum_{i=1}^{n} X_i\right)\log(1-p)\,.
\]
Solving dl/dp = 0 at p = p̂, we obtain
\[
\left.\frac{dl}{dp}\right|_{p=\hat{p}} = \frac{\sum_{i=1}^{n} X_i}{\hat{p}} - \frac{n - \sum_{i=1}^{n} X_i}{1-\hat{p}} = 0\,.
\]
Hence, multiplying both sides by p̂(1 − p̂),
\[
\sum_{i=1}^{n} X_i - \hat{p}\sum_{i=1}^{n} X_i - \hat{p}\,n + \hat{p}\sum_{i=1}^{n} X_i = 0\,,
\]
and so
\[
\sum_{i=1}^{n} X_i = n\hat{p}\,.
\]
Thus, the maximum likelihood estimator of p is
\[
\hat{p} = \frac{\sum_{i=1}^{n} X_i}{n} = \bar{X}\,,
\]
i.e. the sample proportion. We have previously seen that this is unbiased for p. Note that it is worth checking the second derivative at p = p̂:
\[
\left.\frac{d^2 l}{dp^2}\right|_{p=\hat{p}} = -\frac{\sum_{i=1}^{n} X_i}{\hat{p}^2} - \frac{n - \sum_{i=1}^{n} X_i}{(1-\hat{p})^2} = -\frac{n}{\hat{p}} - \frac{n}{1-\hat{p}} = -\frac{n}{\hat{p}(1-\hat{p})}\,,
\]
which is negative, and so p̂ = X̄ does indeed maximize the likelihood.

6.3 Poisson likelihood examples
In this section we will look at two examples of the Poisson likelihood function. The first example is based on some simulated Poisson data, while the second uses data on the numbers of hourly births over a 24-hour period in an Australian hospital. The R function written and used to compute the Poisson likelihood and log-likelihood functions is as follows:

pois.lik <- function(x, lmin, lmax){
  # Evaluate the Poisson likelihood and log-likelihood for the data x
  # over a grid of nl lambda values between lmin and lmax.
  nl <- 1000
  lval <- seq(from=lmin, to=lmax, length.out=nl)
  pl <- numeric(nl)    # likelihood values
  lpl <- numeric(nl)   # log-likelihood values
  for(k in 1:nl){
    pl[k] <- prod(dpois(x, lambda=lval[k]))
    lpl[k] <- sum(log(dpois(x, lambda=lval[k])))
  }
  pl.res <- data.frame(lval, pl, lpl)
  return(pl.res)
}

The data are in the argument x, while the minimum and maximum λ values to be considered are passed to the function in the arguments lmin and lmax. The function returns a data frame called pl.res comprising three columns. The first contains the sequence of λ values used, the second contains the corresponding likelihood values and the third the corresponding log-likelihood values.

Example 6.4. (Simulated data). The data in this example are a random sample of n = 30 simulated from the Po(λ = 10) distribution. The data are simulated via:
> xp <- rpois(n=30, lambda=10)
The following code produces the likelihood and log-likelihood functions for these data:
> pl.res4 <- pois.lik(xp, lmin=7, lmax=13)
> names(pl.res4)
[1] "lval" "pl"  "lpl"
This can be plotted as follows:
> plot(pl.res4$lval, pl.res4$pl, type="l",
       xlab="lambda", ylab="L(lambda)",
       main="Poisson likelihood, simulated data")
> plot(pl.res4$lval, pl.res4$lpl, type="l",
       xlab="lambda", ylab="l(lambda)",
       main="Poisson log-likelihood, simulated data")
This gives the following plots.
Figure: likelihood (left) and log-likelihood (right) functions for the simulated Poisson data (n = 30, λ = 10).
The maximum likelihood estimate can be computed approximately by locating the maximum of the likelihood or log-likelihood values over the grid:
> lopt1 <- pl.res4$lval[which.max(pl.res4$pl)]
> lopt1
[1] 10.23123
> lopt2 <- pl.res4$lval[which.max(pl.res4$lpl)]
> lopt2
[1] 10.23123
The maximum likelihood estimate of λ from the two plots is calculated to be 10.23123. We know from Example 6.2 that the maximum likelihood estimate can also be determined analytically as the sample mean:
> mean(xp)
[1] 10.23333
The reason for the slight discrepancy between the two results is the discretization error arising from the use of a discrete grid of λ values in the first method: with 1000 grid points the spacing is (13 − 7)/999 ≈ 0.006, so the grid-based estimate can miss x̄ by up to about half that amount.
Please note that if you run the above code yourself, you will get slightly different results, because rpois will have sampled a different set of data (unless you first fix the random seed with set.seed).
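As an alternative to the grid search performed by pois.lik, the log-likelihood can be maximized numerically with R's one-dimensional optimizer optimize. The sketch below (an addition to the notes) uses a hypothetical helper pois.loglik and the simulated data xp of Example 6.4:

# Sketch: direct numerical maximization of the Poisson log-likelihood,
# avoiding the discretization error of the grid search.
pois.loglik <- function(lambda, x) sum(dpois(x, lambda = lambda, log = TRUE))
opt <- optimize(pois.loglik, interval = c(7, 13), x = xp, maximum = TRUE)
opt$maximum   # agrees with mean(xp) up to the optimizer's tolerance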
Example 6.5. (Australian birth data). The data give the number of births per
hour over a 24-hour period on 18 December 1997 at the Mater Mother's
Hospital in Brisbane, Australia. There were a total of n = 44 births. At the
time, this was a record number of births in one 24-hour period in this hospital.
We denote the number of births in the ith hour by Xi and fit the model
Xi ∼ Po(λ) , i = 1, . . . , 24 ,
with the Xi assumed to be independent. The data can be read into R as follows:
> birth <- read.table(file="birth_freq.txt", header=T)
> names(birth)
[1] "hour"   "number"
> birth
hour number
1 1 1
2 2 3
3 3 1
4 4 0
5 5 4
6 6 0
7 7 0
8 8 2
9 9 2
10 10 1
11 11 3
12 12 1
13 13 2
14 14 1
15 15 4
16 16 1
17 17 2
18 18 1
19 19 3
20 20 4
21 21 3
22 22 2
23 23 1
24 24 2
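If the file birth_freq.txt is not available, the same data frame can be constructed directly from the counts listed above (a convenience sketch, equivalent to reading the file):

# Build the birth data directly from the 24 hourly counts shown above.
birth <- data.frame(
  hour   = 1:24,
  number = c(1, 3, 1, 0, 4, 0, 0, 2, 2, 1, 3, 1,
             2, 1, 4, 1, 2, 1, 3, 4, 3, 2, 1, 2)
)
sum(birth$number)   # 44 births in total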
The code to produce the likelihood plots is as follows:
> pl.res.birth <- pois.lik(birth$number, lmin=0, lmax=4)
> plot(pl.res.birth$lval, pl.res.birth$pl, type="l",
       xlab="lambda", ylab="L(lambda)",
       main="Poisson likelihood function for Australian birth data")
> plot(pl.res.birth$lval, pl.res.birth$lpl, type="l",
       xlab="lambda", ylab="l(lambda)",
       main="Poisson log-likelihood function for Australian birth data")
This results in the following figures:
Figure: the likelihood (left) and log-likelihood (right) functions for the Australian births data (n = 44).
The maximum likelihood estimate is 1.83, which can be found by locating the maximum of the likelihood or log-likelihood values over the grid:
> lopt1 <- pl.res.birth$lval[which.max(pl.res.birth$pl)]
> lopt1
[1] 1.833834
> lopt2 <- pl.res.birth$lval[which.max(pl.res.birth$lpl)]
> lopt2
[1] 1.833834
The result can be compared back to the sample mean, x̄ = 44/24 ≈ 1.833, which gives the same value up to discretization error.
> mean(birth$number)
[1] 1.833333
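The same numerical cross-check sketched after Example 6.4 applies here, reusing the hypothetical pois.loglik helper:

# Sketch: the search interval is kept away from 0, where the
# log-likelihood is -Inf for these data.
optimize(pois.loglik, interval = c(0.5, 4), x = birth$number,
         maximum = TRUE)$maximum   # approximately 1.8333 = 44/24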