Machine Learning Pattern Recognition Assignment Help: ANLY-601 Assignment 5

ANLY-601 Spring 2018

Assignment 5
Due Thursday, April 5, 2018, in class

You may use your class notes, the text, or any calculus books; please use no other references (including the internet or other statistics texts). If you use Mathematica to derive results, you must include the notebook with your solution so I can see what you did.

1. Estimating the parameter of a binomial distribution

I give you a coin and ask you to estimate the probability of a toss resulting in heads. You decide to construct the estimate by tossing the coin N times and counting the number of heads n.

(a) The probability of recording n heads from N tosses is given by the binomial distribution

$$ p(n \mid N, \alpha) = \binom{N}{n} \alpha^{n} (1-\alpha)^{N-n} = \frac{N!}{n!\,(N-n)!}\, \alpha^{n} (1-\alpha)^{N-n} \qquad (1) $$

where α is the probability of heads coming up on any particular toss; it is the quantity we want to estimate.
Derive the maximum likelihood estimate of α, i.e. the value of α that maximizes the log probability (the log of Eqn. (1)) of throwing the coin N times and observing n heads. (Notice that the data for this experiment is the single number n.)
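As an optional numerical check on part (a), the minimal Python sketch below evaluates the log of Eq. (1) on a fine grid of α values and reports the grid point where it is largest. It is not part of the assignment: the helper names `binomial_log_likelihood` and `grid_argmax`, the grid spacing of 1/1000, and the example counts are all illustrative assumptions.

```python
import math


def binomial_log_likelihood(alpha, n, N):
    """Log of Eq. (1): log C(N, n) + n log(alpha) + (N - n) log(1 - alpha)."""
    # math.lgamma(k + 1) equals log(k!), so this is log N! - log n! - log (N - n)!
    log_binom = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return log_binom + n * math.log(alpha) + (N - n) * math.log(1.0 - alpha)


def grid_argmax(f, steps=1000):
    """Return the grid point alpha = i / steps, 0 < i < steps, at which f is largest."""
    # The endpoints 0 and 1 are skipped so that log(alpha) and log(1 - alpha) stay finite.
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=f)


N, n = 10, 3  # hypothetical experiment: 3 heads in 10 tosses
alpha_ml = grid_argmax(lambda alpha: binomial_log_likelihood(alpha, n, N))
print(f"grid maximizer of the log-likelihood: {alpha_ml:.3f}")
```

Comparing the printed value with your closed-form answer for a few (n, N) pairs is a quick consistency check. The binomial coefficient does not depend on α, so dropping it would not move the maximizer.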

(b) Suppose now you decide to construct a MAP estimate of α. Show that the beta distribution

$$ p(\alpha) = \frac{1}{B(a,b)}\, \alpha^{a-1} (1-\alpha)^{b-1} \qquad \text{with } 1 \le a, b \le \infty \qquad (2) $$

is a conjugate prior¹ for the likelihood function in (1), and derive the MAP estimate of α. (The normalization factor B(a, b) is the Euler beta function, but you don’t need knowledge of this function to complete the problem.)
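A MAP estimate maximizes the log-likelihood plus the log-prior, so it can be checked the same way numerically. The Python sketch below is a minimal illustration under stated assumptions: the helper names `log_posterior_unnormalized` and `map_estimate_on_grid`, the grid spacing, and the example data and prior (3 heads in 10 tosses, a = b = 2) are hypothetical. Terms that do not depend on α (the binomial coefficient and the 1/B(a, b) factor) are dropped because they cannot change where the maximum lies.

```python
import math


def log_posterior_unnormalized(alpha, n, N, a, b):
    """log p(n | N, alpha) + log p(alpha), omitting every term that is constant in alpha."""
    log_lik = n * math.log(alpha) + (N - n) * math.log(1.0 - alpha)          # from Eq. (1)
    log_prior = (a - 1) * math.log(alpha) + (b - 1) * math.log(1.0 - alpha)  # from Eq. (2)
    return log_lik + log_prior


def map_estimate_on_grid(n, N, a, b, steps=1000):
    """Grid maximizer of the unnormalized log-posterior over the open interval (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda alpha: log_posterior_unnormalized(alpha, n, N, a, b))


alpha_map = map_estimate_on_grid(n=3, N=10, a=2, b=2)  # hypothetical data and prior
print(f"grid MAP estimate: {alpha_map:.3f}")
```

With a = b = 1 the log-prior term vanishes, so the grid MAP estimate coincides with the grid maximum likelihood estimate; that is a useful sanity check on whatever formula you derive.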

(c) Since there’s no particular reason to believe that the coin I hand you is grossly unfair, you opt to pick a and b so that the prior p(α) has its maximum at α = 1/2 and is symmetric about that value. To achieve this, you set a = b. With a = b = 1 the prior is flat. For larger values of a = b, the prior becomes progressively more sharply peaked about α = 1/2; the variance of the prior is var(α) = 1/(4(2a + 1)). The prior distribution is plotted below for several choices of a = b.

¹ Recall from your notes that for a conjugate prior p(α), the posterior density $p(\alpha \mid D) \propto p(D \mid \alpha)\, p(\alpha)$ is of the same algebraic form (here, a beta distribution) as the prior density p(α).

[Figure: the prior p(alpha) plotted against alpha from 0 to 1 for several choices of a = b; the vertical axis runs from 0 to 8.]
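Since the plotted curves are not reproduced here, the Python sketch below tabulates the symmetric beta prior at its peak and checks the stated variance formula var(α) = 1/(4(2a + 1)) by numerical integration. It assumes the standard identity B(a, b) = Γ(a)Γ(b)/Γ(a + b); the helper names `beta_pdf` and `numerical_variance`, the midpoint-rule resolution, and the list of a values are illustrative choices, not necessarily the curves shown in the figure.

```python
import math


def beta_pdf(alpha, a, b):
    """Beta density of Eq. (2), with log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(alpha) + (b - 1) * math.log(1.0 - alpha) - log_B)


def numerical_variance(a, steps=20000):
    """Midpoint-rule estimate of var(alpha) under the symmetric prior b = a (mean 1/2)."""
    h = 1.0 / steps
    midpoints = [(i + 0.5) * h for i in range(steps)]
    return sum((x - 0.5) ** 2 * beta_pdf(x, a, a) * h for x in midpoints)


for a in (1, 2, 5, 10, 20):  # illustrative values of a = b
    print(f"a = b = {a:2d}: p(1/2) = {beta_pdf(0.5, a, a):5.2f}, "
          f"numerical var = {numerical_variance(a):.5f}, "
          f"1/(4(2a+1)) = {1 / (4 * (2 * a + 1)):.5f}")
```

The growing peak height and shrinking variance in the printed table are the numerical counterpart of the "progressively more sharply peaked" description in part (c).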

Use the formula for the MAP estimate of α that you derived in part (b), together with the above information about the beta distribution, to discuss how the value of a (with b = a) in the prior distribution affects the MAP estimate of α relative to the maximum likelihood estimate of α. That is, discuss how the change in the shape of the prior distribution with increasing a is reflected in the MAP estimate.
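To see the effect numerically before writing the discussion, the Python sketch below sweeps a = b for one fixed, hypothetical data set (14 heads in 20 tosses) and prints the grid MAP estimate for each prior. The helper name `map_on_grid`, the grid spacing, and the data are assumptions made for illustration; a = 1 is the flat prior, for which the MAP and maximum likelihood estimates coincide.

```python
import math


def map_on_grid(n, N, a, steps=2000):
    """Grid MAP estimate of alpha over (0, 1) under the symmetric beta prior b = a."""
    def log_posterior(alpha):
        # Log of Eq. (1) with the binomial coefficient dropped ...
        log_lik = n * math.log(alpha) + (N - n) * math.log(1.0 - alpha)
        # ... plus the log of Eq. (2) with b = a and the 1/B(a, a) factor dropped.
        log_prior = (a - 1) * (math.log(alpha) + math.log(1.0 - alpha))
        return log_lik + log_prior

    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=log_posterior)


N, n = 20, 14  # hypothetical data: 14 heads in 20 tosses
for a in (1, 2, 5, 10, 50):  # a = 1 is the flat prior
    print(f"a = b = {a:2d}: MAP estimate on the grid = {map_on_grid(n, N, a):.4f}")
```

Rerunning the loop with other (n, N) pairs lets you compare the printed numbers against the formula you derive in part (b).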

2. Fitting Constrained Gaussian Models

If you’re asked to fit a Gaussian distribution to a set of m n-dimensional data points $D = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})$, you know that the maximum likelihood estimates of the mean and covariance for the model are

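$$ \hat{\mu} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \qquad \text{and} \qquad \hat{\Sigma} = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \hat{\mu}\right)\left(x^{(i)} - \hat{\mu}\right)^{T}, $$

respectively.

Suppose now that you are told that the model covariance matrix is constrained to be a (positive) constant times the identity matrix,

$$ \Sigma = \begin{pmatrix} \sigma^{2} & 0 & 0 & \cdots \\ 0 & \sigma^{2} & 0 & \cdots \\ 0 & 0 & \sigma^{2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \qquad (3) $$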
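For reference, the unconstrained estimates quoted at the start of this problem can be computed directly. The pure-Python sketch below (the helper name `ml_mean_and_covariance` and the four 2-D points are hypothetical) implements the sample mean and the 1/m-normalized average of outer products, which gives you something to compare the constrained estimate against.

```python
def ml_mean_and_covariance(data):
    """Unconstrained ML estimates of a Gaussian's mean and covariance.

    data: list of m points, each a list of n floats. The covariance uses a 1/m
    normalization (the ML estimate), not the unbiased 1/(m - 1).
    """
    m = len(data)
    n = len(data[0])
    mu = [sum(x[j] for x in data) / m for j in range(n)]
    # Entry (j, k) is the average over points of (x_j - mu_j)(x_k - mu_k),
    # i.e. the average outer product (x - mu)(x - mu)^T.
    sigma = [[sum((x[j] - mu[j]) * (x[k] - mu[k]) for x in data) / m
              for k in range(n)] for j in range(n)]
    return mu, sigma


data = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # hypothetical 2-D points
mu_hat, sigma_hat = ml_mean_and_covariance(data)
print("mu_hat =", mu_hat)
print("Sigma_hat =", sigma_hat)
```

If you cross-check with NumPy, note that `numpy.cov` treats rows as variables unless `rowvar=False` and normalizes by m - 1 unless `bias=True`.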

(a) Write down the log-likelihood L of the set of n-dimensional data vectors D under this model.

(b) Derive the maximum likelihood estimate of Σ.
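One way to check parts (a) and (b) numerically is to evaluate the log-likelihood under Σ = σ²I for many trial values of σ² and keep the best one. The Python sketch below does this under a few assumptions: μ is fixed at the sample mean μ̂ (which maximizes the likelihood over μ for any fixed covariance), the data set is hypothetical, and the helper name `spherical_log_likelihood` and the trial grid are illustrative. It uses the fact that with Σ = σ²I the Gaussian density factorizes into n independent univariate normals with common variance σ².

```python
import math


def spherical_log_likelihood(data, mu, sigma2):
    """Log-likelihood of data (a list of n-dimensional points) under N(mu, sigma2 * I).

    The density factorizes over coordinates, so the log-likelihood is a sum of
    univariate normal log-densities over every point and every coordinate.
    """
    total = 0.0
    for x in data:
        for x_j, mu_j in zip(x, mu):
            total += -0.5 * math.log(2.0 * math.pi * sigma2) - (x_j - mu_j) ** 2 / (2.0 * sigma2)
    return total


data = [[0.0, 1.0, 2.0], [2.0, -1.0, 0.0], [1.0, 0.0, 4.0], [1.0, 0.0, -2.0]]  # hypothetical
m = len(data)
mu_hat = [sum(column) / m for column in zip(*data)]  # sample mean

candidates = [k / 100 for k in range(1, 1001)]  # trial values of sigma^2 from 0.01 to 10.00
sigma2_best = max(candidates, key=lambda s2: spherical_log_likelihood(data, mu_hat, s2))
print(f"grid maximizer of the log-likelihood over sigma^2: {sigma2_best:.2f}")
```

Comparing the grid maximizer with your closed-form estimate on a few data sets of your own is a quick way to catch algebra slips.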


3. Error, Bias, Variance and Paintball

You’re making final selections of members for your paintball team in preparation for the national championship. Your final two contestants each fire six shots into a target, with the results below:

[Figure: the six shots fired at the target by contestant (A) and by contestant (B).]

Which contestant do you choose for your team? Suppose the guns have adjustable sights; does that change your decision? Discuss in statistical terms.
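A standard way to phrase this question statistically is to split the mean squared distance of the shots from the bull's-eye into a squared-bias term (how far the center of the group is from the target) plus a variance term (how spread out the group is). Because the actual shots appear only in the figure, the Python sketch below uses two made-up six-shot groups that are not contestant (A) or (B); the helper name `error_decomposition` and all coordinates are hypothetical.

```python
def error_decomposition(shots, center=(0.0, 0.0)):
    """Split mean squared distance from the target center into bias^2 + variance.

    shots: list of (x, y) impact points; center: bull's-eye coordinates.
    With 1/m averages, mse == bias_sq + variance holds exactly (up to rounding).
    """
    m = len(shots)
    mean_x = sum(x for x, _ in shots) / m
    mean_y = sum(y for _, y in shots) / m
    bias_sq = (mean_x - center[0]) ** 2 + (mean_y - center[1]) ** 2
    variance = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in shots) / m
    mse = sum((x - center[0]) ** 2 + (y - center[1]) ** 2 for x, y in shots) / m
    return mse, bias_sq, variance


# Hypothetical six-shot groups, NOT the data from the figure.
tight_but_offset = [(2.0, 2.1), (2.1, 1.9), (1.9, 2.0), (2.0, 1.9), (2.1, 2.1), (1.9, 2.0)]
centered_but_scattered = [(1.5, 0.0), (-1.5, 0.0), (0.0, 1.5), (0.0, -1.5), (1.0, 1.0), (-1.0, -1.0)]

for name, shots in [("tight but offset", tight_but_offset),
                    ("centered but scattered", centered_but_scattered)]:
    mse, b2, var = error_decomposition(shots)
    print(f"{name:>22}: mse = {mse:.3f}, bias^2 = {b2:.3f}, variance = {var:.3f}")
```

If you idealize a sight adjustment as adding the same offset vector to every shot, you can apply that shift to a group and rerun the decomposition to see which of the two terms it changes.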
