Deriving Binomial MLE and Beta-Binomial Posterior.
17 Aug 2021
MLE
\[
f(y \mid p) = \binom{n}{y} p^{y} (1-p)^{n-y}
\]
\[
L = \log f(y \mid p) = \log \binom{n}{y} + y \log p + (n-y)\log(1-p)
\]
Differentiating with respect to p:
\[
\frac{dL}{dp} = \frac{y}{p} - \frac{n-y}{1-p} = 0
\]
Multiplying both sides by \(p(1-p)\) (provided \(0 < p < 1\)):
\[
(1-p)y - (n-y)p = y - py - np + py = y - np = 0.
\]
Therefore
\[
\hat{p} = \frac{y}{n} = \frac{\sum x_i}{n} = \bar{x},
\]
where the \(x_i\) are the individual binary outcomes. Differentiating a second time:
\[
\frac{d^2L}{dp^2} = -\frac{y}{p^2} - \frac{n-y}{(1-p)^2} = -\left(\frac{y}{p^2} + \frac{n-y}{(1-p)^2}\right) < 0 \quad \text{if } 0 < y < n.
\]
**Another way is to look at the values of the likelihood at the endpoints and the candidate point:**
\[
f(y \mid p = 0) = \binom{n}{y} 0^{y}(1-0)^{n-y} = 0 \quad \text{for } y > 0,
\]
\[
f(y \mid p = 1) = \binom{n}{y} 1^{y}(1-1)^{n-y} = 0 \quad \text{for } y < n,
\]
\[
f(y \mid p = \hat{p}) = \binom{n}{y} \hat{p}^{y}(1-\hat{p})^{n-y} > 0 \quad \text{for } 0 < y < n.
\]
Thus, the maximum is reached at p̂.
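The closed-form result is easy to check numerically: the likelihood evaluated on a fine grid of p values should peak at y/n. A minimal sketch, with illustrative values of n and y chosen here for the example:

```python
import numpy as np
from scipy.stats import binom

# Illustrative data: n trials, y successes (hypothetical values).
n, y = 20, 7
p_hat = y / n  # closed-form MLE derived above

# Evaluate the binomial likelihood on a fine grid of p in (0, 1).
grid = np.linspace(0.001, 0.999, 9999)
lik = binom.pmf(y, n, grid)

# The grid maximizer agrees with y/n up to the grid resolution.
p_grid = grid[np.argmax(lik)]
print(p_hat, p_grid)  # both close to 0.35
```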
Note that the classical maximum likelihood estimate breaks down when y = 0 or y = n. A number
of “corrections” exist in the literature. Most of them add small numbers to both the numerator and the
denominator of the y/n ratio to ensure that the estimated proportion \(\hat{p}\) satisfies \(0 < \hat{p} < 1\).
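One such correction can be sketched as follows. The function name and the default constants are illustrative choices (a = b = 0.5 happens to coincide with the Jeffreys-prior posterior mean), not the only correction in the literature:

```python
def corrected_mle(y, n, a=0.5, b=0.5):
    """Shrunken estimate (y + a) / (n + a + b).

    Adds small constants to the numerator and denominator of y/n so the
    result is strictly between 0 and 1 even when y = 0 or y = n.
    The defaults a = b = 0.5 are one illustrative choice.
    """
    return (y + a) / (n + a + b)

print(corrected_mle(0, 10))   # strictly above 0
print(corrected_mle(10, 10))  # strictly below 1
```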
Bayesian way
Let y|p ∼ Bin(n, p). Then the likelihood is
\[
f(y \mid p) = \binom{n}{y} p^{y}(1-p)^{n-y} \propto p^{y}(1-p)^{n-y}.
\]
(Recall that, to make use of proportionality, we drop multiplicative terms that do not contain the
parameter of interest.)
Let p ∼ B(a, b) with the p.d.f.:
\[
f(p) = \frac{1}{B(a, b)} p^{a-1}(1-p)^{b-1} \propto p^{a-1}(1-p)^{b-1} \quad \text{for } p \in (0, 1).
\]
Note that when a = b = 1, this becomes
\[
f(p) = \frac{1}{B(1, 1)} p^{1-1}(1-p)^{1-1} = 1 \quad \text{for } p \in (0, 1).
\]
In other words, a uniform distribution.
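This is easy to confirm with scipy: the Beta(1, 1) density is constant at 1 on (0, 1), matching the standard uniform.

```python
import numpy as np
from scipy.stats import beta, uniform

# Beta(1, 1) has constant density 1 on (0, 1), i.e. it is uniform.
x = np.linspace(0.01, 0.99, 99)
print(np.allclose(beta(1, 1).pdf(x), 1.0))            # True
print(np.allclose(beta(1, 1).pdf(x), uniform(0, 1).pdf(x)))  # True
```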
We can then use Bayes’ formula to derive the posterior distribution of the parameter p given data y:
\[
f(p \mid y) \propto f(y \mid p)\, f(p) \propto p^{y}(1-p)^{n-y}\, p^{a-1}(1-p)^{b-1} = p^{a+y-1}(1-p)^{b+n-y-1}.
\]
Notice that the derived expression is a product of powers of \(p\) and \((1-p)\) respectively, which is a hallmark of
the Beta density. Thus, we deduce that
\[
p \mid y \sim B(a + y,\; b + n - y).
\]
So, if we observe the outcomes of our binomial trials one at a time, we can just keep adding ‘successes’ to the
first parameter and ‘failures’ to the second one.
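The sequential update can be sketched as below: starting from a Beta(a, b) prior, each success increments the first parameter and each failure the second, and the result matches the batch formula B(a + y, b + n − y). The true p and seed are illustrative choices:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
true_p = 0.3
outcomes = rng.random(50) < true_p  # 50 hypothetical Bernoulli trials

# Start from a Beta(a, b) prior and update one observation at a time:
# a success increments a, a failure increments b.
a, b = 1.0, 1.0
for x in outcomes:
    if x:
        a += 1
    else:
        b += 1

# The sequential result matches the batch posterior B(a + y, b + n - y).
y, n = outcomes.sum(), outcomes.size
print((a, b) == (1 + y, 1 + n - y))  # True
print(beta(a, b).mean())             # posterior mean (1 + y) / (2 + n)
```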