
6 Likelihood for discrete data

6.1 The likelihood function

The parameter estimators we have considered so far have mostly been motivated

by intuition. For example, the sample mean X̄ is an intuitive estimator of the

population mean. However in many situations, it is not obvious how to define

an appropriate estimator for the parameter(s) of interest.

One method for deriving an estimator, which works for almost any parameter

of interest, is the method of maximum likelihood. The estimators derived

in this way typically have good properties. The method revolves around the

likelihood function, which is of great importance throughout Statistics. The

likelihood function is used extensively in estimation and also hypothesis testing,

which we discuss in a later chapter.

Let X1, . . . , Xn be an i.i.d. random sample from the discrete distribution with

p.m.f. p(x | θ), where θ is a parameter whose value is unknown. Given observed
data values x1, . . . , xn from this model, the likelihood function is defined as

L(θ) = P(X1 = x1, X2 = x2, . . . , Xn = xn | θ) .

In other words,

the likelihood is the joint probability of the observed data

considered as a function of the unknown parameter θ.

By independence, we can rewrite the likelihood as follows:

L(θ) = p(x1 | θ)× · · · × p(xn | θ) .
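For instance, if two independent Bernoulli(p) trials both yield the value 1, the likelihood is L(p) = p × p = p², which is largest at p = 1; observing one success and one failure instead gives L(p) = p(1 − p), which is largest at p = 1/2.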

Example 6.1. Let x1, . . . , xn be a sample obtained from the Poisson(λ) distribution with p.m.f.

p(x | λ) = λ^x e^{−λ} / x! , x = 0, 1, 2, . . .

The likelihood function for this sample is given by:

L(λ) = ∏_{i=1}^{n} p(xi | λ) = ∏_{i=1}^{n} λ^{xi} e^{−λ} / xi! = e^{−nλ} λ^{∑_{i=1}^{n} xi} / ∏_{i=1}^{n} xi! , for λ > 0 .
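As a quick illustration of evaluating this likelihood in R (a minimal sketch; the sample x and the candidate values of λ below are made up for illustration), dpois() gives the Poisson p.m.f., so the likelihood is just a product of p.m.f. values:

# Hypothetical observed counts
x <- c(2, 4, 3, 7, 5)
prod(dpois(x, lambda = 3))     # likelihood L(3)
prod(dpois(x, lambda = 4.2))   # L(4.2); 4.2 = mean(x), the maximizer (see Section 6.2)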


6.2 Maximum likelihood estimation

In the discrete case, given sample data x1, . . . , xn the maximum likelihood

estimate for θ is the value θ̂ that maximizes the joint probability of the observed

data, i.e. that maximizes the value of the likelihood function L(θ).

Maximization of L(θ) = ∏_{i=1}^{n} p(xi | θ) leads to a numerical value θ̂ for the estimate of θ. The value of θ̂ depends on the observed sample values x1, . . . , xn, i.e. θ̂ is a function of the data,

θ̂ = h(x1, . . . , xn) .

We can also consider θ̂ as a function of the random sample, X1, . . . , Xn,

θ̂ = h(X1, . . . , Xn) ,

in which case θ̂ is a random variable called the maximum likelihood estima-

tor. The maximum likelihood estimator possesses its own sampling distribution,

which will be studied in later Statistics modules.

In simple cases, the maximum likelihood estimate can be found by standard calculus techniques, i.e. by solving

dL(θ)/dθ = 0 . (1)

However, it is usually much easier algebraically to find the maximum of the log-likelihood l(θ) = logL(θ) because for i.i.d. data,

logL(θ) = log [ ∏_{i=1}^{n} p(xi | θ) ] = ∑_{i=1}^{n} log p(xi | θ) .

Hence the log-likelihood is additive, whereas the likelihood is multiplicative. This is advantageous because it is far easier to differentiate a sum of functions than a product of functions.
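There is also a practical advantage (an illustrative aside, using a made-up simulated sample): a product of many small probabilities can underflow to zero in floating-point arithmetic, while the corresponding sum of log-probabilities remains representable:

# Sketch with a hypothetical large sample: the raw likelihood underflows,
# but the log-likelihood is still finite
set.seed(1)
x <- rpois(5000, lambda = 10)
prod(dpois(x, lambda = 10))              # underflows to 0
sum(dpois(x, lambda = 10, log = TRUE))   # finite log-likelihood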

To find the value of θ that maximizes l(θ) we instead find θ̂ that solves:

dl(θ)/dθ = ∑_{i=1}^{n} d log p(xi | θ)/dθ = 0 . (2)

The solution is a maximum if d²l(θ)/dθ² < 0 at θ = θ̂. The estimate found by this method, i.e. by maximizing the log-likelihood, is identical to the one found by maximizing the likelihood directly, because the logarithm is a monotonically increasing function.

Example 6.2. Let X1, . . . , Xn be a random sample from the Poisson(λ) distribution. Find the maximum likelihood estimator of λ.

We have seen that

L(λ) = e^{−nλ} λ^{∑_{i=1}^{n} Xi} / ∏_{i=1}^{n} Xi! ,

so that

l(λ) = −nλ + (∑_{i=1}^{n} Xi) log λ − log(∏_{i=1}^{n} Xi!) .

Solving dl(λ)/dλ = 0, we obtain

dl/dλ |_{λ=λ̂} = −n + (∑_{i=1}^{n} Xi)/λ̂ = 0 ,

which implies that λ̂ = X̄. Checking the second derivative, and using ∑_{i=1}^{n} Xi = nX̄, we see that

d²l/dλ² |_{λ=λ̂} = −(∑_{i=1}^{n} Xi)/λ̂² = −n/X̄ < 0 .

Therefore, λ̂ = X̄ is indeed the maximum likelihood estimator of λ. If we have a set of data x1, . . . , xn then the maximum likelihood estimate of λ is λ̂ = x̄, the sample mean. This is an intuitively sensible estimate, as the mean of the Poisson(λ) distribution is equal to λ.

Example 6.3. Let X1, . . . , Xn be a random sample from a Bi(1, p) distribution. Find the maximum likelihood estimator of p.

Each Xi has p.m.f. p(x | p) = p^x (1 − p)^{1−x} for x ∈ {0, 1}, so the likelihood function is

L(p) = ∏_{i=1}^{n} p^{Xi} (1 − p)^{1−Xi} = p^{∑_{i=1}^{n} Xi} (1 − p)^{n − ∑_{i=1}^{n} Xi} ,

and the log-likelihood is given by

l(p) = (∑_{i=1}^{n} Xi) log p + (n − ∑_{i=1}^{n} Xi) log(1 − p) .

Solving dl/dp |_{p=p̂} = 0, we obtain

dl/dp |_{p=p̂} = (∑_{i=1}^{n} Xi)/p̂ − (n − ∑_{i=1}^{n} Xi)/(1 − p̂) = 0 .

Hence, multiplying through by p̂(1 − p̂),

∑_{i=1}^{n} Xi − p̂ ∑_{i=1}^{n} Xi − p̂n + p̂ ∑_{i=1}^{n} Xi = 0 ,

and so

∑_{i=1}^{n} Xi = np̂ .

Thus, the maximum likelihood estimator of p is

p̂ = (∑_{i=1}^{n} Xi)/n = X̄ ,

i.e. the sample proportion. We have previously seen that this is unbiased for p. Note that it is worth checking the second derivative at p = p̂; using ∑_{i=1}^{n} Xi = np̂,

d²l/dp² |_{p=p̂} = −(∑_{i=1}^{n} Xi)/p̂² − (n − ∑_{i=1}^{n} Xi)/(1 − p̂)² = −n/p̂ − n/(1 − p̂) = −n/(p̂(1 − p̂)) ,

which is negative, and so p̂ = X̄ does indeed maximize the likelihood.

6.3 Poisson likelihood examples

In this section we will look at two examples of the Poisson likelihood function. The first example is based on some simulated Poisson data, while the second uses data on the number of births per hour over a 24-hour period in an Australian hospital. The R function written and used to compute the Poisson likelihood and log-likelihood functions is as follows:

pois.lik <- function(x, lmin, lmax){
  nl <- 1000                                    # size of the lambda grid
  lval <- seq(from=lmin, to=lmax, length.out=nl)
  pl  <- numeric(nl)                            # likelihood values
  lpl <- numeric(nl)                            # log-likelihood values
  for(k in 1:nl){
    pl[k]  <- prod(dpois(x, lambda=lval[k]))
    lpl[k] <- sum(log(dpois(x, lambda=lval[k])))
  }
  pl.res <- data.frame(lval, pl, lpl)
  return(pl.res)
}

The data are in the argument x, while the minimum and maximum λ values to be considered are passed to the function in the arguments lmin and lmax. The function returns a data frame called pl.res comprising three columns. The first contains the sequence of λ values used, the second contains the corresponding likelihood values, and the third the corresponding log-likelihood values.

Example 6.4. (Simulated data). The data in this example are a random sample of size n = 30 simulated from the Po(λ = 10) distribution. The data are simulated via:

> xp <- rpois(n=30, lambda=10)

The following code produces the likelihood and log-likelihood functions for these data:

> pl.res4 <- pois.lik(xp, lmin=7, lmax=13)
> names(pl.res4)

[1] "lval" "pl"   "lpl"

This can be plotted as follows:

> plot(pl.res4$lval, pl.res4$pl, type="l",
       xlab="lambda", ylab="L(lambda)",
       main="Poisson likelihood, simulated data")
> plot(pl.res4$lval, pl.res4$lpl, type="l",
       xlab="lambda", ylab="l(lambda)",
       main="Poisson log-likelihood, simulated data")

This gives the following plots.

Figure: likelihood (left) and log-likelihood (right) functions for the simulated Poisson data (n = 30, λ = 10).

The maximum likelihood estimate can be computed approximately via direct

numerical maximization of the likelihood or log-likelihood:

> lopt1 <- pl.res4$lval[which.max(pl.res4$pl)]
> lopt1

[1] 10.23123

> lopt2 <- pl.res4$lval[which.max(pl.res4$lpl)]
> lopt2

[1] 10.23123

The maximum likelihood estimate of λ from the grid search is thus 10.23. We know from Example 6.2 that the maximum likelihood estimate can also be determined analytically, as the sample mean:

> mean(xp)

[1] 10.23333

The reason for the slight discrepancy between the two results is the discretization

error arising from the use of a discrete set of λ values in the first method.
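If a more accurate numerical answer is wanted, the grid search could be replaced by a continuous optimizer. The following sketch (not part of the original analysis, reusing xp from above) maximizes the log-likelihood with R's built-in optimize():

# Maximize the log-likelihood over a continuous interval rather than a grid
loglik <- function(l) sum(dpois(xp, lambda = l, log = TRUE))
optimize(loglik, interval = c(7, 13), maximum = TRUE)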

Please note that if you run the above code yourself, you will get slightly

different results because you will have sampled a different set of data using the

function rpois.

Example 6.5. (Australian birth data). The data give the number of births per hour over a 24-hour period on 18 December 1997 at the Mater Mother's Hospital in Brisbane, Australia. There were a total of 44 births. At the time, this was a record number of births in one 24-hour period in this hospital.

We denote the number of births in the ith hour by Xi and fit the model

Xi ∼ Po(λ) , i = 1, . . . , n ,

with the Xi assumed to be independent; here n = 24, the number of hourly counts. The data can be read in to R as follows:

> birth <- read.table(file="birth_freq.txt", header=T)
> names(birth)

[1] "hour"   "number"

> birth
   hour number
1     1      1
2     2      3
3     3      1
4     4      0
5     5      4
6     6      0
7     7      0
8     8      2
9     9      2
10   10      1
11   11      3
12   12      1
13   13      2
14   14      1
15   15      4
16   16      1
17   17      2
18   18      1
19   19      3
20   20      4
21   21      3
22   22      2
23   23      1
24   24      2

The code to produce the likelihood plots is as follows:

> pl.res.birth <- pois.lik(birth$number, lmin=0, lmax=4)
> plot(pl.res.birth$lval, pl.res.birth$pl, type="l",
       xlab="lambda", ylab="L(lambda)",
       main="Poisson likelihood function for Australian birth data")
> plot(pl.res.birth$lval, pl.res.birth$lpl, type="l",
       xlab="lambda", ylab="l(lambda)",
       main="Poisson log-likelihood function for Australian birth data")

This results in the following figures:

Figure: the likelihood (left) and log-likelihood (right) functions for the Australian births data (44 births over n = 24 hours).

The maximum likelihood estimate is 1.83, which can be found by direct

numerical maximization of the likelihood or log-likelihood function.

> lopt1 <- pl.res.birth$lval[which.max(pl.res.birth$pl)]
> lopt1

[1] 1.833834

> lopt2 <- pl.res.birth$lval[which.max(pl.res.birth$lpl)]
> lopt2

[1] 1.833834

The result can be compared with the sample mean, x̄, which agrees up to discretization error.


> mean(birth$number)

[1] 1.833333
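Equivalently, the analytic maximum likelihood estimate is the total number of births divided by the number of hours, λ̂ = 44/24 ≈ 1.8333.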

