Machine Learning and Pattern Recognition Assignment Help: ANLY-601 Assignment 5

ANLY-601 Spring 2018

Assignment 5
Due Thursday, April 5, 2018, in class

You may use your class notes, the text, or any calculus books — please use no other references (including internet or other statistics texts). If you use Mathematica to derive results, you must include the notebook with your solution so I can see what you did.

1. Estimating the parameter of a binomial distribution

I give you a coin and ask you to estimate the probability of a toss resulting in heads. You decide to construct the estimate by tossing the coin N times and counting the number of heads n.

(a) The probability of recording n heads from N tosses is given by the binomial distribution

$$p(n \mid N, \alpha) = \binom{N}{n}\, \alpha^{n} (1-\alpha)^{N-n} = \frac{N!}{n!\,(N-n)!}\, \alpha^{n} (1-\alpha)^{N-n} \tag{1}$$

where α is the probability of heads coming up on any particular toss – it is the quantity we want to estimate.
Derive the maximum likelihood estimate of α, i.e. the value of α that maximizes the log probability (the log of Eqn. (1)) of throwing the coin N times and observing n heads. (Notice that the data for this experiment is the single number n.)
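
A derived answer to part (a) can be sanity-checked numerically. The sketch below is only an illustration, not part of the required derivation: it evaluates the log of Eqn. (1) on a grid of α values and reports the grid point with the largest value. The function names and the example counts (7 heads in 20 tosses) are assumptions made up for this check.

```python
import math

def binomial_log_likelihood(alpha, n, N):
    """Log of Eqn. (1): log C(N, n) + n log(alpha) + (N - n) log(1 - alpha)."""
    # lgamma(k + 1) equals log(k!), so this is the log of the binomial coefficient.
    log_coeff = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return log_coeff + n * math.log(alpha) + (N - n) * math.log(1.0 - alpha)

def grid_argmax(n, N, steps=9999):
    """Return the grid point in (0, 1) with the largest log-likelihood."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(grid, key=lambda alpha: binomial_log_likelihood(alpha, n, N))

# Hypothetical experiment (not from the assignment): 7 heads in 20 tosses.
alpha_numeric = grid_argmax(n=7, N=20)
print(alpha_numeric)
```

Comparing the printed value with the closed-form estimate for the same n and N is a quick way to catch algebra slips.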

(b) Suppose now you decide to construct a MAP estimate of α. Show that the beta distribution

$$p(\alpha) = \frac{1}{B(a,b)}\, \alpha^{a-1} (1-\alpha)^{b-1} \quad \text{with } 1 \le a, b \le \infty \tag{2}$$

is a conjugate prior¹ for the likelihood function in (1), and derive the MAP estimate of α. (The normalization factor B(a, b) is the Euler beta function – but you don’t need knowledge of this function to complete the problem.)

(c) Since there’s no particular reason to believe that the coin I hand you is grossly unfair, you opt to pick a and b so that the prior p(α) is maximum at 1/2 and symmetric about that value. To achieve this, you set a = b. For this choice, it’s clear that with a = b = 1 the prior is flat. For larger values of a = b, the prior distribution becomes progressively more sharply peaked about α = 1/2; the variance of the prior is var(α) = 1/(4 (2 a + 1)). The prior distribution is plotted below for several choices of a = b.

¹ Recall from your notes that for a conjugate prior p(α), the posterior density p(α|D) ∝ p(D|α) p(α) is of the same algebraic form (here a beta distribution) as the prior density p(α).

[Figure: prior density p(alpha), vertical axis from 0 to 8, plotted against alpha from 0 to 1 for several choices of a = b.]

Use the formula for the MAP estimate of α that you derived in part (b), together with the above information about the beta distribution, to discuss how the value of a (with b = a) in the prior distribution affects the MAP estimate of α relative to the maximum likelihood estimate of α. That is, discuss how the change in the shape of the prior distribution with increasing a is reflected in the MAP estimate.
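
The effect of the prior in part (c) can also be explored numerically before writing the discussion. The following sketch is an illustration under stated assumptions: it adds the log of Eqn. (1) and the log of Eqn. (2), drops the terms that do not depend on α, and locates the maximum on a grid for several symmetric priors. The function names, the example counts (7 heads in 20 tosses), and the chosen values of a = b are all made up for this check.

```python
import math

def log_posterior_unnormalized(alpha, n, N, a, b):
    """log p(n|N,alpha) + log p(alpha), omitting terms independent of alpha."""
    log_likelihood = n * math.log(alpha) + (N - n) * math.log(1.0 - alpha)
    log_prior = (a - 1) * math.log(alpha) + (b - 1) * math.log(1.0 - alpha)
    return log_likelihood + log_prior

def map_on_grid(n, N, a, b, steps=9999):
    """Return the grid point in (0, 1) that maximizes the posterior."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(grid, key=lambda x: log_posterior_unnormalized(x, n, N, a, b))

# Hypothetical experiment (not from the assignment): 7 heads in 20 tosses.
n, N = 7, 20
map_estimates = {a: map_on_grid(n, N, a, a) for a in (1, 2, 5, 20, 100)}
for a, estimate in map_estimates.items():
    print(f"a = b = {a:3d}: MAP estimate on grid = {estimate:.4f}")
```

Running it for a few different values of n and N shows how the grid maximum moves as the prior sharpens, which can then be compared against the closed-form result from part (b).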

2. Fitting Constrained Gaussian Models

If you’re asked to fit a Gaussian distribution to a set of m n-dimensional data points $D = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})$, you know that the maximum likelihood estimates of the mean and covariance for the model are

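$$\hat{\mu} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \quad \text{and} \quad \hat{\Sigma} = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \hat{\mu}\right)\left(x^{(i)} - \hat{\mu}\right)^{T}$$

respectively.

Suppose now that you are told that the model covariance matrix is constrained to be a (positive) constant times the identity matrix

$$\Sigma = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots \\ 0 & \sigma^2 & 0 & \cdots \\ 0 & 0 & \sigma^2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \tag{3}$$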

(a) Write down the log-likelihood L of the set of n−dimensional data vectors D under this model.

(b) Derive the maximum likelihood estimate of Σ.
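
A result for part (b) can be checked against a brute-force scan. The sketch below is an illustration only: it evaluates the Gaussian log-likelihood for a general diagonal covariance, then restricts it to the form in Eqn. (3) by passing the same variance for every dimension. The data set, the grid of σ² values, and the function name are made up; the mean is fixed at the sample mean quoted above, which is an assumption of this check.

```python
import math

def log_likelihood_diag(data, mean, variances):
    """Gaussian log-likelihood of `data` for a diagonal covariance matrix.

    For a diagonal covariance the determinant is the product of the diagonal
    entries and the inverse holds their reciprocals on the diagonal.
    """
    n = len(mean)
    log_det = sum(math.log(v) for v in variances)
    total = 0.0
    for x in data:
        mahalanobis = sum((x[j] - mean[j]) ** 2 / variances[j] for j in range(n))
        total += -0.5 * (n * math.log(2.0 * math.pi) + log_det + mahalanobis)
    return total

# Hypothetical data (not from the assignment): m = 4 points in n = 2 dimensions.
data = [(1.0, 2.0), (2.0, 0.5), (0.0, 1.5), (1.0, 0.0)]
m, n = len(data), len(data[0])
mean = [sum(x[j] for x in data) / m for j in range(n)]

# Scan sigma^2 on a grid with Sigma = sigma^2 * I, as in Eqn. (3).
grid = [0.01 * k for k in range(1, 501)]
best_sigma2 = max(grid, key=lambda s2: log_likelihood_diag(data, mean, [s2] * n))
print(best_sigma2)
```

The grid spacing of 0.01 limits the precision of the scan, so agreement with a closed-form answer should only be expected to within that spacing.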



3. Error, Bias, Variance and Paintball

You’re making final selections for members for your paintball team in preparation for the national championship. You have your final two contestants fire six shots into a target with the results below:

[Figure: two targets, (A) and (B), showing each contestant's six shots.]

Which contestant do you choose for your team? Suppose the guns have adjustable sights, does that change your decision? Discuss in statistical terms.
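
The statistical terms in this question can be made concrete with a small computation. The sketch below is a hedged illustration: the shot coordinates are invented (the patterns from the figure are not reproduced here), the bullseye is assumed to sit at the origin, and the function name is hypothetical. It computes the squared bias (squared distance of the mean shot from the bullseye), the variance (mean squared scatter about the mean shot), and the mean squared error about the bullseye.

```python
def shot_statistics(shots, target=(0.0, 0.0)):
    """Return (squared bias, variance, mean squared error) for 2-D shots."""
    k = len(shots)
    mean_x = sum(x for x, _ in shots) / k
    mean_y = sum(y for _, y in shots) / k
    # Squared distance between the average shot and the bullseye.
    bias_sq = (mean_x - target[0]) ** 2 + (mean_y - target[1]) ** 2
    # Average squared distance of the shots from their own mean.
    variance = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in shots) / k
    # Average squared distance of the shots from the bullseye.
    mse = sum((x - target[0]) ** 2 + (y - target[1]) ** 2 for x, y in shots) / k
    return bias_sq, variance, mse

# Made-up patterns, six shots each; they are NOT the patterns in the figure.
tight_offset = [(2.0, 2.1), (2.1, 1.9), (1.9, 2.0), (2.0, 2.0), (2.1, 2.1), (1.9, 1.9)]
wide_centered = [(1.5, 0.0), (-1.5, 0.0), (0.0, 1.5), (0.0, -1.5), (1.0, 1.0), (-1.0, -1.0)]

for name, shots in (("tight_offset", tight_offset), ("wide_centered", wide_centered)):
    bias_sq, variance, mse = shot_statistics(shots)
    print(f"{name}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}, mse = {mse:.3f}")
```

For any pattern the mean squared error equals the squared bias plus the variance, which gives a vocabulary for comparing the two contestants.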
