ANLY-601 Spring 2018
Assignment 5 — REVISED
Due Thursday, April 5, 2018, in class
You may use your class notes, the text, or any calculus books — please use no other references
(including internet or other statistics texts). If you use Mathematica to derive results, you must
include the notebook with your solution so I can see what you did.
1. Estimating the parameter of a binomial distribution
I give you a coin and ask you to estimate the probability of a toss resulting in heads. You
decide to construct the estimate by tossing the coin N times and counting the number of
heads n.
(a) The probability of recording n heads from N tosses is given by the binomial distribution
$$p(n \mid N, \alpha) = \binom{N}{n}\, \alpha^{n} (1-\alpha)^{N-n} = \frac{N!}{n!\,(N-n)!}\, \alpha^{n} (1-\alpha)^{N-n} \tag{1}$$
where α is the probability of heads coming up on any particular toss – it is the quantity
we want to estimate.
Derive the maximum likelihood estimate of α, i.e. the value of α that maximizes the log
probability (the log of Eqn. (1)) of throwing the coin N times and observing n heads.
(Notice that the data for this experiment is the single number n.)
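Once you have a closed-form answer for part (a), a brute-force numerical check can catch algebra slips. The sketch below evaluates the log of Eqn. (1) on a grid of α values and returns the grid point with the largest value. The function names, the grid resolution, and the example counts (7 heads in 20 tosses) are illustrative assumptions, not part of the assignment.

```python
import math

def log_binomial_likelihood(n, N, alpha):
    """Log of Eqn. (1): log p(n | N, alpha), valid for 0 < alpha < 1."""
    # log of the binomial coefficient N! / (n! (N - n)!) via log-gamma
    log_choose = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return log_choose + n * math.log(alpha) + (N - n) * math.log(1.0 - alpha)

def grid_argmax(n, N, steps=1000):
    """Grid value of alpha in (0, 1) with the largest log likelihood."""
    grid = [k / steps for k in range(1, steps)]  # open interval avoids log(0)
    return max(grid, key=lambda alpha: log_binomial_likelihood(n, N, alpha))

# Hypothetical experiment: 7 heads observed in 20 tosses.
alpha_grid = grid_argmax(n=7, N=20)
print("grid maximizer of the log likelihood:", alpha_grid)
```

The grid maximizer should agree with your derived estimate to within the grid spacing (here 0.001).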
(b) Suppose now you decide to construct a MAP estimate of α. Show that the beta distribution
$$p(\alpha) = \frac{1}{B(a, b)}\, \alpha^{a-1} (1-\alpha)^{b-1} \quad \text{with } 1 \le a, b < \infty \tag{2}$$
is a conjugate prior (see footnote 1) for the likelihood function in (1), and derive the MAP estimate
of α. (The normalization factor B(a, b) is the Euler beta function, but you don’t need
knowledge of this function to complete the problem.)
(c) Since there’s no particular reason to believe that the coin I hand you is grossly unfair,
you opt to pick a and b so that the prior p(α) is maximum at 1/2 and symmetric about
that value. To achieve this, you set a = b. For this choice, it’s clear that with a = b = 1
the prior is flat. For larger values of a = b, the prior distribution becomes progressively more
sharply peaked about α = 1/2; the variance of the prior is var(α) = 1/(4 (2 a + 1)). The
prior distribution is plotted below for several choices of a = b.
Footnote 1: Recall from your notes that for a conjugate prior p(α), the posterior density p(α|D) ∝ p(D|α) p(α) is of the same
algebraic form (here a beta distribution) as the prior density p(α).
[Figure: the prior density p(alpha) plotted against alpha (horizontal axis from 0 to 1, vertical axis from 0 to 8) for several choices of a = b.]
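The shape of the symmetric prior and the quoted variance var(α) = 1/(4 (2 a + 1)) can be checked numerically. The following is a minimal sketch that evaluates Eqn. (2) with a = b and estimates the variance by a midpoint-rule integral. The helper names and the values of a are assumptions chosen for illustration; they are not necessarily the values used in the figure.

```python
import math

def beta_pdf(alpha, a, b):
    """Beta density of Eqn. (2); B(a, b) is computed from log-gamma functions."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(alpha) + (b - 1) * math.log(1 - alpha) - log_B)

def prior_variance(a, steps=5000):
    """Midpoint-rule estimate of var(alpha) under the symmetric prior a = b."""
    h = 1.0 / steps
    mids = [(k + 0.5) * h for k in range(steps)]  # midpoints never hit 0 or 1
    mean = sum(x * beta_pdf(x, a, a) for x in mids) * h
    return sum((x - mean) ** 2 * beta_pdf(x, a, a) for x in mids) * h

for a in (1, 2, 5, 20):  # illustrative values only
    print(a, beta_pdf(0.5, a, a), prior_variance(a), 1 / (4 * (2 * a + 1)))
```

Each printed row lists a, the height of the prior at α = 1/2, the numerically integrated variance, and the value of the closed-form expression, so the last two columns can be compared directly.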
Use the formula for the MAP estimate of α that you derived in part (b), together with
the above information about the beta distribution to discuss how the value of a (with
b = a) in the prior distribution affects the MAP estimate of α relative to the maximum
likelihood estimate of α. That is, discuss how the change in the shape of the prior
distribution with increasing a is reflected in the MAP estimate.
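A numerical experiment can support the discussion asked for in part (c). The sketch below maximizes the unnormalized log posterior, log p(n|N,α) + log p(α), on a grid for several symmetric priors. It is only a check against your analytic result, not a substitute for it. The function names, the grid resolution, and the example counts (7 heads in 20 tosses) are assumptions made for illustration.

```python
import math

def log_posterior_unnormalized(alpha, n, N, a, b):
    """log p(n | N, alpha) + log p(alpha), omitting terms independent of alpha."""
    log_likelihood = n * math.log(alpha) + (N - n) * math.log(1 - alpha)
    log_prior = (a - 1) * math.log(alpha) + (b - 1) * math.log(1 - alpha)
    return log_likelihood + log_prior

def grid_map(n, N, a, b, steps=2000):
    """Grid value of alpha in (0, 1) that maximizes the unnormalized log posterior."""
    grid = [k / steps for k in range(1, steps)]  # open interval avoids log(0)
    return max(grid, key=lambda alpha: log_posterior_unnormalized(alpha, n, N, a, b))

# Hypothetical experiment: 7 heads in 20 tosses, symmetric priors with a = b.
n, N = 7, 20
estimates = {a: grid_map(n, N, a, a) for a in (1, 2, 5, 20, 100)}
for a, alpha_map in estimates.items():
    print("a = b =", a, " grid MAP estimate:", alpha_map)
```

Comparing the printed values with the flat-prior case a = b = 1 shows how the estimate moves as the prior narrows.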
2. Fitting Constrained Gaussian Models
If you’re asked to fit a Gaussian distribution to a set of m data points, each n-dimensional,
$D = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})$, you know that the maximum likelihood estimates of the mean and
covariance for the model are
$$\hat{\mu} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$
and
$$\hat{\Sigma} = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \hat{\mu}\right) \left(x^{(i)} - \hat{\mu}\right)^{T}$$
respectively.
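As a concrete reference point, the two unconstrained estimates above can be computed directly. This is a small sketch in plain Python, chosen to avoid dependencies; the function names and the four two-dimensional data points are made up for illustration.

```python
def ml_mean(points):
    """Maximum likelihood mean: the average of the m data vectors."""
    m = len(points)
    n = len(points[0])
    return [sum(x[j] for x in points) / m for j in range(n)]

def ml_covariance(points):
    """Maximum likelihood covariance: averaged outer products, normalized by 1/m."""
    m = len(points)
    mu = ml_mean(points)
    n = len(mu)
    return [[sum((x[j] - mu[j]) * (x[k] - mu[k]) for x in points) / m
             for k in range(n)]
            for j in range(n)]

# Hypothetical data set: m = 4 points in n = 2 dimensions.
D = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [3.0, 2.0]]
mu_hat = ml_mean(D)
Sigma_hat = ml_covariance(D)
print(mu_hat)     # [3.0, 2.0]
print(Sigma_hat)  # [[2.0, 1.0], [1.0, 2.0]]
```

Note the 1/m normalization, which matches the formula above. Library routines such as `numpy.cov` use 1/(m − 1) by default, so their output differs by a factor of m/(m − 1) unless configured otherwise.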
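Suppose now that you wish to construct a model in which the covariance matrix is constrained to be a
(positive) constant times the identity matrix
$$\Sigma = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots \\ 0 & \sigma^2 & 0 & \cdots \\ 0 & 0 & \sigma^2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \tag{3}$$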
(a) Write down the log-likelihood L of the set of n−dimensional data vectors D under this
model.
(b) Derive the maximum likelihood estimate of Σ (that is, of σ²).
(c) Suppose instead that you wish to construct a model in which the covariance matrix is
diagonal, but with different (but positive) values along the main diagonal
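$$\Sigma = \begin{pmatrix} \lambda_1 & 0 & 0 & \cdots \\ 0 & \lambda_2 & 0 & \cdots \\ 0 & 0 & \lambda_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \tag{4}$$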
Derive the maximum likelihood estimate for Σ in this case (that is, for λ1, λ2, . . . , λn).
3. Error, Bias, Variance and Paintball
You’re making final selections for members for your paintball team in preparation for the
national championship. You have your final two contestants fire six shots into a target with
the results below:
[Figure: two targets, labeled (A) and (B), showing each contestant's six shots.]
Which contestant do you choose for your team? If the guns have adjustable sights,
does that change your decision? Discuss in statistical terms.
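One way to put the discussion in statistical terms is to split each shooter's mean squared distance from the bullseye into a squared bias (how far the center of the shot group sits from the target) and a variance (how widely the shots scatter about their own center). The sketch below computes the three quantities. The shot coordinates are invented for illustration and are not read from targets (A) and (B); the function name and the target location at the origin are likewise assumptions.

```python
def shot_statistics(shots, target=(0.0, 0.0)):
    """Return (squared bias, variance, mean squared error) for 2-D shot positions."""
    m = len(shots)
    # Center of the shot group
    mean = tuple(sum(s[j] for s in shots) / m for j in range(2))
    # Squared distance from the group center to the target
    bias_sq = sum((mean[j] - target[j]) ** 2 for j in range(2))
    # Average squared distance of the shots from their own center
    variance = sum(sum((s[j] - mean[j]) ** 2 for j in range(2)) for s in shots) / m
    # Average squared distance of the shots from the target
    mse = sum(sum((s[j] - target[j]) ** 2 for j in range(2)) for s in shots) / m
    return bias_sq, variance, mse

# Hypothetical shot patterns (arbitrary units), NOT the data in the figure.
shots_1 = [(2.0, 2.1), (2.1, 1.9), (1.9, 2.0), (2.0, 2.0), (2.1, 2.1), (1.9, 1.9)]
shots_2 = [(1.5, 0.0), (-1.5, 0.0), (0.0, 1.5), (0.0, -1.5), (1.0, 1.0), (-1.0, -1.0)]

for name, shots in (("pattern 1", shots_1), ("pattern 2", shots_2)):
    bias_sq, variance, mse = shot_statistics(shots)
    print(name, "bias^2 =", bias_sq, "variance =", variance, "mse =", mse)
```

For any set of shots the three numbers satisfy mse = bias² + variance, so the printout shows which of the two terms dominates each shooter's error.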