ANLY-601 Spring 2018
Assignment 5
Due Thursday, April 5, 2018, in class
You may use your class notes, the text, or any calculus books — please use no other references (including internet or other statistics texts). If you use Mathematica to derive results, you must include the notebook with your solution so I can see what you did.
1. Estimating the parameter of a binomial distribution
I give you a coin and ask you to estimate the probability of a toss resulting in heads. You decide to construct the estimate by tossing the coin N times and counting the number of heads n.
(a) The probability of recording n heads from N tosses is given by the binomial distribution
$$p(n|N,\alpha) = \binom{N}{n}\,\alpha^{n}(1-\alpha)^{N-n} = \frac{N!}{n!\,(N-n)!}\,\alpha^{n}(1-\alpha)^{N-n} \qquad (1)$$
where α is the probability of heads coming up on any particular toss – it is the quantity we want to estimate.
Derive the maximum likelihood estimate of α, i.e. the value of α that maximizes the log probability (the log of Eqn. (1)) of throwing the coin N times and observing n heads. (Notice that the data for this experiment is the single number n.)
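A numerical sanity check can be useful alongside the analytic derivation. The sketch below evaluates the log of Eqn. (1) on a grid of α values and reports the grid point with the largest value. The function names, the grid resolution, and the counts N = 20, n = 7 are illustrative assumptions and are not part of the assignment.

```python
import math

def binomial_log_likelihood(n, N, alpha):
    """Log of Eqn. (1): log C(N, n) + n log(alpha) + (N - n) log(1 - alpha)."""
    log_coeff = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return log_coeff + n * math.log(alpha) + (N - n) * math.log(1.0 - alpha)

def grid_argmax(n, N, steps=1000):
    """Return the interior grid point k/steps with the largest log-likelihood."""
    grid = [k / steps for k in range(1, steps)]  # skip 0 and 1, where the log diverges
    return max(grid, key=lambda a: binomial_log_likelihood(n, N, a))

# Hypothetical data: 7 heads in 20 tosses.
alpha_hat = grid_argmax(n=7, N=20)
print(alpha_hat)
```

The printed value should agree, to within the grid spacing, with whatever closed-form estimate you derive.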
(b) Suppose now you decide to construct a MAP estimate of α. Show that the beta distribution
$$p(\alpha) = \frac{1}{B(a,b)}\,\alpha^{a-1}(1-\alpha)^{b-1} \quad \text{with } 1 \le a, b \le \infty \qquad (2)$$
is a conjugate prior¹ for the likelihood function in (1) and derive the MAP estimate of α. (The normalization factor B(a, b) is the Euler beta function, but you don’t need knowledge of this function to complete the problem.)
(c) Since there’s no particular reason to believe that the coin I hand you is grossly unfair, you opt to pick a and b so that the prior p(α) is maximum at 1/2 and symmetric about that value. To achieve this, you set a = b. With a = b = 1 the prior is flat. For larger values of a = b, the prior becomes progressively more sharply peaked about α = 1/2; the variance of the prior is var(α) = 1/(4 (2 a + 1)). The prior distribution is plotted below for several choices of a = b.
¹ Recall from your notes that for a conjugate prior p(α), the posterior density p(α|D) ∝ p(D|α) p(α) is of the same algebraic form (here a beta distribution) as the prior density p(α).
[Figure: prior density p(alpha) (vertical axis, 0 to 8) versus alpha (horizontal axis, 0 to 1) for several choices of a = b.]
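The behavior of the symmetric prior can also be checked numerically. The following sketch evaluates the density of Eqn. (2) with a = b and compares a midpoint-rule estimate of its variance against the stated formula 1/(4 (2 a + 1)). The helper names, the integration step count, and the particular values of a are assumptions chosen for illustration; they are not necessarily the values used in the plot.

```python
import math

def beta_pdf(alpha, a, b):
    """Beta density of Eqn. (2), with B(a, b) computed from log-gamma functions."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(alpha) + (b - 1) * math.log(1 - alpha) - log_B)

def prior_variance_numeric(a, steps=20000):
    """Midpoint-rule estimate of var(alpha) under the symmetric prior a = b."""
    h = 1.0 / steps
    mids = [(k + 0.5) * h for k in range(steps)]  # midpoints avoid alpha = 0 and 1
    mean = sum(x * beta_pdf(x, a, a) for x in mids) * h
    second_moment = sum(x * x * beta_pdf(x, a, a) for x in mids) * h
    return second_moment - mean ** 2

# Columns: a, density at alpha = 1/2, numerical variance, formula 1/(4(2a+1)).
for a in (1, 2, 5, 20):
    print(a, beta_pdf(0.5, a, a), prior_variance_numeric(a), 1 / (4 * (2 * a + 1)))
```

The density at α = 1/2 grows and the variance shrinks as a increases, which is the sharpening described in part (c).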
Use the formula for the MAP estimate of α that you derived in part (b), together with the above information about the beta distribution, to discuss how the value of a (with b = a) in the prior distribution affects the MAP estimate of α relative to the maximum likelihood estimate of α. That is, discuss how the change in the shape of the prior distribution with increasing a is reflected in the MAP estimate.
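Before writing the discussion, it may help to look at the trend numerically. The sketch below maximizes the unnormalized log posterior, log p(n|N,α) + log p(α), on a grid for several values of a = b. It is a brute-force check under assumed data (7 heads in 20 tosses) and assumed function names, and it does not replace the closed-form result asked for in part (b).

```python
import math

def log_likelihood(alpha, n, N):
    """Log of Eqn. (1), dropping the alpha-independent binomial coefficient."""
    return n * math.log(alpha) + (N - n) * math.log(1.0 - alpha)

def log_prior(alpha, a, b):
    """Log of Eqn. (2), dropping the alpha-independent factor 1/B(a, b)."""
    return (a - 1) * math.log(alpha) + (b - 1) * math.log(1.0 - alpha)

def grid_map_estimate(n, N, a, b, steps=2000):
    """Return the interior grid point that maximizes log-likelihood + log-prior."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda x: log_likelihood(x, n, N) + log_prior(x, a, b))

# Hypothetical data: 7 heads in 20 tosses, with increasingly peaked priors.
n, N = 7, 20
for a in (1, 2, 5, 20, 100):
    print(a, grid_map_estimate(n, N, a, a))
```

Comparing the printed values for a = 1 with those for larger a shows how the estimate moves as the prior tightens around 1/2.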
2. Fitting Constrained Gaussian Models
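If you’re asked to fit a Gaussian distribution to a set of m, n−dimensional data points D = (x^{(1)}, x^{(2)}, . . . , x^{(m)}), you know that the maximum likelihood estimates of the mean and covariance for the model are
$$\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$$
and
$$\hat{\Sigma} = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \hat{\mu}\right)\left(x^{(i)} - \hat{\mu}\right)^{T}$$
respectively.
Suppose now that you are told that the model covariance matrix is constrained to be a (positive) constant times the identity matrix
$$\Sigma = \begin{pmatrix} \sigma^{2} & 0 & 0 & \cdots \\ 0 & \sigma^{2} & 0 & \cdots \\ 0 & 0 & \sigma^{2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \qquad (3)$$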
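For reference, the unconstrained maximum likelihood estimates of the mean and covariance quoted above can be computed directly. This is a minimal plain-Python sketch; the function names and the four 2-dimensional data points are hypothetical and serve only to make the formulas concrete.

```python
def ml_mean(data):
    """mu_hat = (1/m) * sum_i x^(i), for data given as a list of equal-length lists."""
    m, dim = len(data), len(data[0])
    return [sum(x[j] for x in data) / m for j in range(dim)]

def ml_covariance(data):
    """Sigma_hat = (1/m) * sum_i (x^(i) - mu_hat)(x^(i) - mu_hat)^T.

    Note the 1/m normalization of the ML estimate, not the 1/(m-1) of the
    unbiased sample covariance.
    """
    m, dim = len(data), len(data[0])
    mu = ml_mean(data)
    centered = [[x[j] - mu[j] for j in range(dim)] for x in data]
    return [[sum(c[j] * c[k] for c in centered) / m for k in range(dim)]
            for j in range(dim)]

# Hypothetical data set: m = 4 points in n = 2 dimensions.
points = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [3.0, 2.0]]
mu_hat = ml_mean(points)
sigma_hat = ml_covariance(points)
print(mu_hat)     # [3.0, 2.0]
print(sigma_hat)  # [[2.0, 1.0], [1.0, 2.0]]
```

The full estimate generally has nonzero off-diagonal entries and unequal diagonal entries, which is exactly what the constraint in (3) rules out.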
(a) Write down the log-likelihood L of the set of n−dimensional data vectors D under this model.
(b) Derive the maximum likelihood estimate of Σ.
3. Error, Bias, Variance and Paintball
You’re making final selections for members for your paintball team in preparation for the national championship. You have your final two contestants fire six shots into a target with the results below:
[Figure: two targets showing the six shots of contestant (A) and contestant (B).]
Which contestant do you choose for your team? Suppose the guns have adjustable sights. Does that change your decision? Discuss in statistical terms.
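One way to frame the discussion quantitatively is the decomposition of mean squared error into squared bias plus variance. The sketch below computes these quantities for two sets of hit coordinates. The coordinates are hypothetical and are NOT read from targets (A) and (B); the function name and the bullseye location at the origin are likewise assumptions.

```python
def shot_statistics(shots, bullseye=(0.0, 0.0)):
    """Return (bias vector, total variance, mean squared error) for (x, y) hits."""
    m = len(shots)
    mean_x = sum(x for x, _ in shots) / m
    mean_y = sum(y for _, y in shots) / m
    # Bias: offset of the average hit from the bullseye.
    bias = (mean_x - bullseye[0], mean_y - bullseye[1])
    # Variance: mean squared distance of hits from their own average.
    variance = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in shots) / m
    # Mean squared error: mean squared distance of hits from the bullseye.
    mse = sum((x - bullseye[0]) ** 2 + (y - bullseye[1]) ** 2 for x, y in shots) / m
    return bias, variance, mse

# Hypothetical hit coordinates for illustration only.
tight_but_offset = [(2.0, 2.1), (2.1, 1.9), (1.9, 2.0), (2.0, 2.0), (2.1, 2.1), (1.9, 1.9)]
centered_but_scattered = [(1.5, 0.0), (-1.5, 0.0), (0.0, 1.5), (0.0, -1.5), (1.0, 1.0), (-1.0, -1.0)]

for name, shots in (("offset", tight_but_offset), ("scattered", centered_but_scattered)):
    bias, variance, mse = shot_statistics(shots)
    bias_sq = bias[0] ** 2 + bias[1] ** 2
    # The last two printed values agree: mse = |bias|^2 + variance.
    print(name, bias, variance, mse, bias_sq + variance)
```

An adjustable sight corresponds to subtracting a constant offset from every shot, which changes the bias term but leaves the variance term untouched.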