University of California, Los Angeles Department of Statistics
Statistics 100B Instructor: Nicolas Christou
Method of maximum likelihood
Suppose x1, x2, · · · , xn is a random sample of size n from a distribution that has parameter θ. The joint probability density of these n random variables is
f(x1,x2,···,xn;θ)
We also refer to this function as the likelihood function, denoted by L. In this function the parameter θ is unknown, and it will be estimated with the method of maximum likelihood. In principle, the method of maximum likelihood consists of selecting the value of θ that maximizes the likelihood function (the value of θ that makes the observed data most likely).
Since x1, x2, · · · , xn are independent the likelihood function can be expressed as the product of the marginal densities:
L = f(x1, x2, · · · , xn; θ) = f(x1; θ) × f(x2; θ) × · · · × f(xn; θ)
We will maximize this function w.r.t. θ. It is often easier to maximize the log likelihood function w.r.t. θ, which attains its maximum at the same value of θ because the logarithm is an increasing function. Therefore, we will take the derivative of the log likelihood function w.r.t. θ, set it equal to zero and solve for θ. The result will be denoted by θ̂ and we refer to it as the mle of the parameter θ.
Example:
Let X1, X2, · · · , Xn be a random sample of size n from an exponential distribution with parameter λ. Find the mle of λ.
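One possible way to work this example is sketched below. It assumes the exponential density is parameterized as f(x; λ) = λe^{−λx} for x > 0, which is consistent with the log likelihood used in the empirical investigation later in this handout; with a different parameterization the result would change.

```latex
\begin{align*}
L(\lambda) &= \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
            = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i} \\
\ln L(\lambda) &= n \ln(\lambda) - \lambda \sum_{i=1}^{n} x_i \\
\frac{d \ln L}{d\lambda} &= \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}} \\
\frac{d^{2} \ln L}{d\lambda^{2}} &= -\frac{n}{\lambda^{2}} < 0
  \quad \text{(so the stationary point is a maximum)}
\end{align*}
```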
Example:
Let X1, X2, · · · , Xn be a random sample of size n from a normal distribution with mean μ and variance σ². Find the mle of μ and σ².
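The well-known closed-form answer to this example is μ̂ = x̄ and σ̂² = (1/n) Σ (xi − x̄)². The following is a minimal numerical sanity check of that result, not a derivation. The function names and the simulated data (mean 5, standard deviation 2, n = 50) are hypothetical choices made for illustration only.

```python
import math
import random

def normal_log_likelihood(data, mu, sigma2):
    """Log likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def normal_mle(data):
    """Closed-form maximum likelihood estimates (mu_hat, sigma2_hat)."""
    n = len(data)
    mu_hat = sum(data) / n
    # The mle of the variance divides by n, not by n - 1.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat

random.seed(1)  # hypothetical simulated data, not taken from the handout
sample = [random.gauss(5.0, 2.0) for _ in range(50)]

mu_hat, sigma2_hat = normal_mle(sample)
best = normal_log_likelihood(sample, mu_hat, sigma2_hat)

# Moving either estimate away from the mle should lower the log likelihood.
print(best > normal_log_likelihood(sample, mu_hat + 0.1, sigma2_hat))
print(best > normal_log_likelihood(sample, mu_hat, sigma2_hat * 1.1))
```

Both comparisons should print True, since the closed-form estimates are the unique maximizer of the log likelihood.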
Method of maximum likelihood – An empirical investigation
We will estimate the parameter λ of the exponential distribution with the method of maximum likelihood. Let X ∼ exp(2) (see figure below).
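[Figure: density f(x) of X ∼ exp(2), with x from 0 to 10 on the horizontal axis and f(x) from 0.0 to 2.0 on the vertical axis.]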
Let’s pretend that λ is unknown. From this distribution we will select a random sample of size n = 100 (see the observations listed below). This sample gave
\[ \sum_{i=1}^{100} x_i = 49.86463 \]
and sample mean x̄ = 0.4986463. Therefore, the maximum likelihood estimate of λ is:
\[ \hat{\lambda} = \frac{1}{\bar{x}} = \frac{1}{0.4986463} = 2.005429. \]
For different values of the parameter λ we compute the log-likelihood function as follows:
\[ \ln(L) = n \ln(\lambda) - \lambda \sum_{i=1}^{100} x_i \]
These calculations are shown below. We then plot the values of the log likelihood function against λ and we observe that the maximum occurs at the value λ̂ = 2.005429 that was computed above.
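The calculation can be sketched in a few lines of Python. This is only an illustration, not the code that produced the handout's output: it uses the summary values n = 100 and Σxi = 49.86463 reported above rather than the raw observations, and the names `log_likelihood`, `grid` and `best_on_grid` are hypothetical.

```python
import math

n = 100            # sample size reported in the text
sum_x = 49.86463   # sum of the observations reported in the text

def log_likelihood(lam):
    """Exponential log likelihood: ln(L) = n ln(lam) - lam * sum(x_i)."""
    return n * math.log(lam) - lam * sum_x

# Closed-form mle: 1 / x-bar = n / sum(x_i)
lam_hat = n / sum_x

# Evaluate ln(L) on the grid 0.1, 0.2, ..., 4.1
grid = [k / 10 for k in range(1, 42)]
values = [log_likelihood(lam) for lam in grid]
best_on_grid = grid[values.index(max(values))]

print("mle:", lam_hat, "ln(L) at mle:", log_likelihood(lam_hat))
print("grid point with the largest ln(L):", best_on_grid)
```

The grid point with the largest log likelihood is the one closest to λ̂, and the log likelihood at λ̂ itself is slightly larger than at any grid point.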
Observations of a random sample of size n = 100 from exponential distribution with λ = 2:
  [1] 1.695824351 0.066702402 0.674994950 0.736106579 1.161993229
  [6] 0.296223724 0.043937990 0.508988160 0.294233621 0.024084168
 [11] 0.150176375 0.396972182 0.095883055 0.387135421 0.248432954
 [16] 0.661809923 0.142542189 0.171455182 1.212420122 0.180640216
 [21] 0.009212488 0.160395423 0.188922063 0.884223028 0.240872947
 [26] 0.033885428 0.080997465 0.318024634 0.410324188 0.502538879
 [31] 0.422821270 0.329996007 0.446404769 0.522652992 0.154471200
 [36] 0.064116746 0.268321347 0.263458486 0.581443048 1.031375370
 [41] 0.203961618 2.562959307 0.073292671 1.025867874 0.173630370
 [46] 0.263878938 0.171617840 0.028656404 1.961520632 0.242559879
 [51] 0.491987590 0.410541936 0.500918018 0.322782228 1.497851781
 [56] 0.157720428 0.629583415 0.652147642 0.135310800 1.936474929
 [61] 0.181363227 0.227498170 1.490756486 0.334677184 0.368089615
 [66] 0.272378459 0.525470783 0.476837360 0.224213297 0.171204443
 [71] 0.119797853 0.716180556 0.111337474 0.376437023 0.588020059
 [76] 0.156395280 0.135622347 0.067554610 1.745086826 1.661906995
 [81] 0.023611775 0.080141754 0.089054515 0.004390821 1.183269692
 [86] 0.199572674 1.043889988 1.136122111 0.545845778 0.234890293
 [91] 0.558763671 0.196966494 0.692430989 0.342892071 0.369322342
 [96] 0.671608332 0.254633346 0.076204614 0.157962865 2.543944322
Values of the log likelihood function for different λ:
        lambda        lnL
 [1,]  2.00543  -30.41417
 [2,]  0.10000 -235.24497
 [3,]  0.20000 -170.91672
 [4,]  0.30000 -135.35667
 [5,]  0.40000 -111.57492
 [6,]  0.50000  -94.24703
 [7,]  0.60000  -81.00134
 [8,]  0.70000  -70.57273
 [9,]  0.80000  -62.20606
[10,]  0.90000  -55.41421
[11,]  1.00000  -49.86463
[12,]  1.10000  -45.32007
[13,]  1.20000  -41.60539
[14,]  1.30000  -38.58759
[15,]  1.40000  -36.16325
[16,]  1.50000  -34.25043
[17,]  1.60000  -32.78304
[18,]  1.70000  -31.70704
[19,]  1.80000  -30.97766
[20,]  1.90000  -30.55740
[21,]  2.00000  -30.41453
[22,]  2.10000  -30.52198
[23,]  2.20000  -30.85644
[24,]  2.30000  -31.39773
[25,]  2.40000  -32.12823
[26,]  2.50000  -33.03249
[27,]  2.60000  -34.09688
[28,]  2.70000  -35.30931
[29,]  2.80000  -36.65901
[30,]  2.90000  -38.13634
[31,]  3.00000  -39.73265
[32,]  3.10000  -41.44013
[33,]  3.20000  -43.25172
[34,]  3.30000  -45.16102
[35,]  3.40000  -47.16218
[36,]  3.50000  -49.24989
[37,]  3.60000  -51.41927
[38,]  3.70000  -53.66583
[39,]  3.80000  -55.98547
[40,]  3.90000  -58.37438
[41,]  4.00000  -60.82907
[42,]  4.10000  -63.34627
Plot of the log likelihood function against λ:
[Figure: log likelihood (vertical axis, ticks from −200 to −50) plotted against λ (horizontal axis, 0 to 4).]
Properties of estimators, method of maximum likelihood – Examples
Example 1:
Let X follow the uniform distribution on the interval (0, θ). Find the mle of θ.
Example 2:
Let f(x; θ) = θx^(θ−1), 0
Example 6:
In a random sample of 100 men 25 are Democrats, and in a random sample of 100 women 30 are Democrats. The two samples are independent. Assume that pM is the true proportion of Democrats among all men, and pW is the true proportion of Democrats among all women. Suppose that pM = pW = p. Find the mle of the common proportion p.
Problem 7
Let X1, X2, · · · , Xn be an i.i.d. random sample from N(μ, σ).
a. Which of the following estimates is unbiased? Show all your work.
\[ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}, \qquad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \]
b. Which of the estimates of part (a) has the smaller MSE?
Problem 8
Let X1, X2, · · · , Xn be an i.i.d. random sample from a normal population with mean zero and unknown variance σ².
a. Find the maximum likelihood estimate of σ².
b. Show that the estimate of part (a) is an unbiased estimator of σ².
c. Find the variance of the estimate of part (a). Is it consistent?
d. Show that the variance of the estimate of part (a) is equal to the Cramer-Rao lower bound.
Problem 9
Let X1, X2, · · · , Xn denote an i.i.d. random sample from the exponential distribution with mean 1/λ.
a. Derive the maximum likelihood estimate of λ.
b. Find the Cramer-Rao lower bound of the estimator of λ.
c. What is the asymptotic distribution of λ̂?
Problem 10
Let X1, X2, · · · , Xn be independent and identically distributed random variables from a Poisson distribution with parameter λ. We know that the maximum likelihood estimate of λ is λ̂ = x̄.
a. Find the variance of λ̂.
b. Is λ̂ an MVUE?
c. Is λ̂ a consistent estimator of λ?
Problem 11
Suppose that two independent random samples of n1 and n2 observations are selected from two normal populations. Further, assume that the populations possess a common variance σ² which is unknown. Let the sample variances be S1² and S2², for which E(S1²) = σ² and E(S2²) = σ².
a. Show that the pooled estimator of σ² that we derived in class (below) is unbiased.
\[ S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]
b. Find the variance of S².
Problem 12
In a basket there are green and white marbles. You randomly select marbles with replacement until you see a green marble. You found the first green marble on the 10th trial. Then, your friend does the same. He randomly selects marbles until he obtains a green marble. His green marble was seen on the 15th trial. Use the method of maximum likelihood to find an estimate of p, the proportion of green marbles in the basket.
Theorem
Asymptotic efficiency of maximum likelihood estimates.
Why do maximum likelihood estimates have an asymptotic normal distribution? Let X1, X2, . . . , Xn be i.i.d. random variables from a probability density function f(x|θ). Then if θ̂ is the MLE of θ, the theorem states that
\[ \hat{\theta} \sim N\left(\theta, \frac{1}{nI(\theta)}\right), \]
where I(θ) denotes the Fisher information of a single observation.
Proof
We will use Taylor series. This says that for a function h,
\[ h(y) \approx h(y_0) + h'(y_0)(y - y_0). \]
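Start with the likelihood function
\[ L = \prod_{i=1}^{n} f(x_i|\theta). \]
Then the log-likelihood is
\[ \ln(L) = \sum_{i=1}^{n} \ln f(x_i|\theta). \]
Now obtain the derivative w.r.t. θ:
\[ \frac{\partial \ln(L)}{\partial\theta} = \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln f(x_i|\theta). \]
Now letting θ̂ be the MLE of θ we write this as a Taylor series about that θ̂:
\[ \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln f(x_i|\theta) \approx \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln f(x_i|\hat{\theta}) + \left[ \sum_{i=1}^{n} \frac{\partial^2}{\partial\theta^2} \ln f(x_i|\hat{\theta}) \right] (\theta - \hat{\theta}) \]
Now divide left and right by √n to get:
\[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln f(x_i|\theta) \approx \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln f(x_i|\hat{\theta}) + \frac{1}{\sqrt{n}} \left[ \sum_{i=1}^{n} \frac{\partial^2}{\partial\theta^2} \ln f(x_i|\hat{\theta}) \right] (\theta - \hat{\theta}) \]
Note: The first term on the right hand side is zero (because this is what we do to find θ̂).
Therefore, we have reduced the relationship to the following:
\[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln f(x_i|\theta) \approx \frac{1}{\sqrt{n}} \left[ \sum_{i=1}^{n} \frac{\partial^2}{\partial\theta^2} \ln f(x_i|\hat{\theta}) \right] (\theta - \hat{\theta}) \]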
Examine the left hand side: This involves the sum of n independent, identically distributed things (Central Limit theorem). Each one of these “things” has mean zero and variance I(θ). Therefore the left hand side follows approximately N(0,I(θ)). Why?
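One way to answer the "Why?" is sketched below. It assumes the usual regularity conditions, in particular that differentiation with respect to θ can be interchanged with integration over x; these conditions are not stated in the handout.

```latex
\begin{align*}
E\left[\frac{\partial}{\partial\theta} \ln f(X|\theta)\right]
  &= \int \frac{\partial f(x|\theta)/\partial\theta}{f(x|\theta)}\, f(x|\theta)\, dx
   = \frac{\partial}{\partial\theta} \int f(x|\theta)\, dx
   = \frac{\partial}{\partial\theta}\, 1 = 0, \\
\mathrm{Var}\left[\frac{\partial}{\partial\theta} \ln f(X|\theta)\right]
  &= E\left[\left(\frac{\partial}{\partial\theta} \ln f(X|\theta)\right)^{2}\right]
   = I(\theta), \\
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln f(X_i|\theta)
  &\;\overset{\text{approx.}}{\sim}\; N\left(0, I(\theta)\right)
  \quad \text{(central limit theorem)}.
\end{align*}
```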
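Therefore, the limiting distribution of the right hand side must also be N(0, I(θ)), i.e.
\[ \frac{1}{\sqrt{n}} \left[ \sum_{i=1}^{n} \frac{\partial^2}{\partial\theta^2} \ln f(x_i|\hat{\theta}) \right] (\theta - \hat{\theta}) \sim N(0, I(\theta)). \]
Or write it as (watch the n's and the minus sign!):
\[ \left[ -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2}{\partial\theta^2} \ln f(x_i|\hat{\theta}) \right] \sqrt{n}\, (\hat{\theta} - \theta) \sim N(0, I(\theta)). \]
The expression in the bracket converges to I(θ) (law of large numbers) and therefore we can express the previous expression as
\[ I(\theta) \sqrt{n}\, (\hat{\theta} - \theta) \sim N(0, I(\theta)). \]
Dividing by I(θ) divides the variance by I(θ)², so that
\[ \sqrt{n}\, (\hat{\theta} - \theta) \sim N\left(0, \frac{1}{I(\theta)}\right), \]
or
\[ \hat{\theta} \sim N\left(\theta, \frac{1}{nI(\theta)}\right). \]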
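The theorem can be checked empirically with a small simulation. The sketch below uses the exponential distribution from the earlier sections, for which ln f(x|λ) = ln(λ) − λx and hence I(λ) = 1/λ². The values λ = 2, n = 200, 2000 replications and the seed are illustrative assumptions, and the function names are hypothetical.

```python
import random
import statistics

def exponential_mle(sample):
    """mle of the rate lambda for an exponential sample: 1 / sample mean."""
    return len(sample) / sum(sample)

def simulate_mles(lam, n, reps, seed=0):
    """Draw `reps` samples of size n from exp(lam) and return their mles."""
    rng = random.Random(seed)
    return [exponential_mle([rng.expovariate(lam) for _ in range(n)])
            for _ in range(reps)]

lam, n, reps = 2.0, 200, 2000   # illustrative choices, not from the handout
estimates = simulate_mles(lam, n, reps)

# For the exponential density I(lam) = 1 / lam**2,
# so the asymptotic variance 1 / (n I(lam)) equals lam**2 / n.
asymptotic_variance = lam ** 2 / n

print("mean of the mles:     ", statistics.mean(estimates))
print("variance of the mles: ", statistics.variance(estimates))
print("1 / (n I(lambda)):    ", asymptotic_variance)
```

With a sample size this large, the mean of the simulated estimates should be close to λ and their variance close to λ²/n; a histogram of `estimates` would look approximately normal.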