ASSIGNMENT 4 (STAT3500/7500)

(a) [15 marks]

Consider an observed random sample of size $n$, $w_1, \ldots, w_n$, from a normal distribution
$N(\mu, \sigma^2)$.

To the 75 observations in the dataset Data-Ass4a.csv apply the EM algorithm to fit via
maximum likelihood the two-component normal mixture density with common variances,

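$$f(w; \Psi) = \sum_{i=1}^{2} \pi_i \, \phi(w; \mu_i, \sigma^2),$$

where

$$\phi(w; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\{-\tfrac{1}{2}(w - \mu)^2/\sigma^2\}$$

and

$$\Psi = (\pi_1, \mu_1, \mu_2, \sigma^2)^T.$$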

To this end,

(i) [1/2 mark]

Specify the EM framework.

(ii) [1/2 mark]

Write down the expressions for the E- and M-steps on the $(k+1)$th iteration of the EM
algorithm.
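As a point of reference, the standard EM updates for this common-variance model can be sketched as follows. The symbol $\tau_{ij}^{(k)}$, the posterior probability that $w_j$ belongs to component $i$ given the current parameter values, is notation introduced here and does not appear in the assignment text.

```latex
% E-step: posterior probabilities of component membership (i = 1, 2; j = 1, ..., n)
\tau_{ij}^{(k)} =
  \frac{\pi_i^{(k)} \, \phi(w_j; \mu_i^{(k)}, \sigma^{2(k)})}
       {\sum_{h=1}^{2} \pi_h^{(k)} \, \phi(w_j; \mu_h^{(k)}, \sigma^{2(k)})}

% M-step: weighted proportions, weighted means, and pooled variance
\pi_i^{(k+1)} = \frac{1}{n} \sum_{j=1}^{n} \tau_{ij}^{(k)}, \qquad
\mu_i^{(k+1)} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(k)} w_j}{\sum_{j=1}^{n} \tau_{ij}^{(k)}}, \qquad
\sigma^{2(k+1)} = \frac{1}{n} \sum_{i=1}^{2} \sum_{j=1}^{n}
  \tau_{ij}^{(k)} \, (w_j - \mu_i^{(k+1)})^2
```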

(iii) [3 marks]

Use an available program, such as MClust, FlexMix, or EMMIX (which may be found on CRAN),
to fit this mixture model via the EM algorithm. Explicitly give the starting point or
starting points tried in your fitting of the EM algorithm and the stopping criterion
adopted.
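To make the roles of the starting values and the stopping criterion concrete, here is a minimal hand-written sketch in Python. It is not the interface of any of the named packages. The function names, the tolerance `tol`, and the toy data are all assumptions made for illustration; the toy values are not the observations in Data-Ass4a.csv.

```python
import math

def normal_pdf(w, mu, var):
    """Normal density phi(w; mu, var)."""
    return math.exp(-0.5 * (w - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, pis, mus, var):
    """Log likelihood of the two-component common-variance mixture."""
    return sum(
        math.log(sum(p * normal_pdf(w, m, var) for p, m in zip(pis, mus)))
        for w in data
    )

def em_common_variance(data, pis, mus, var, tol=1e-8, max_iter=1000):
    """Run EM from the given starting values until the log likelihood
    increases by less than `tol` (or `max_iter` iterations are reached)."""
    n = len(data)
    loglik = log_likelihood(data, pis, mus, var)
    for iteration in range(1, max_iter + 1):
        # E-step: posterior probabilities of component membership.
        taus = []
        for w in data:
            joint = [p * normal_pdf(w, m, var) for p, m in zip(pis, mus)]
            total = sum(joint)
            taus.append([j / total for j in joint])
        # M-step: weighted proportions, weighted means, pooled variance.
        weights = [sum(t[i] for t in taus) for i in range(2)]
        pis = [s / n for s in weights]
        mus = [sum(t[i] * w for t, w in zip(taus, data)) / weights[i]
               for i in range(2)]
        var = sum(t[i] * (w - mus[i]) ** 2
                  for t, w in zip(taus, data) for i in range(2)) / n
        new_loglik = log_likelihood(data, pis, mus, var)
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    return pis, mus, var, loglik, iteration

# Toy values for illustration only; NOT the Data-Ass4a.csv observations.
toy = [-2.1, -1.8, -2.4, -1.5, -2.0, 1.9, 2.2, 1.7, 2.5, 2.0]
start_pis, start_mus = [0.5, 0.5], [min(toy), max(toy)]
start_var = sum((w - sum(toy) / len(toy)) ** 2 for w in toy) / len(toy)
pis, mus, var, loglik, n_iter = em_common_variance(toy, start_pis, start_mus, start_var)
print(pis, mus, var, loglik, n_iter)
```

Running the function from several starting points and keeping the solution with the largest log likelihood is one way to guard against local maxima. This sketch makes no attempt to handle numerical underflow for observations far from both means.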

(iv) [3 marks]

Let $\hat{\Psi}$ be the ML estimate of $\Psi$ obtained in (a) above. Plot the fitted two-component
normal mixture density $f(w; \hat{\Psi})$ on top of a histogram of the $n = 75$ data points.

Choose the number of bins $N$ for the histogram by consideration of

$$n \approx 2^{N-1}$$

and/or by using the formula

$$\text{bin width} \approx \frac{2 \times \text{Sample IQR}}{n^{1/3}}$$

to guide the choice of the number of bins $N$.
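The two guides above can be evaluated with a few lines of Python. This is a sketch only: the function names are made up here, and the sample IQR depends on the quantile convention used (the code relies on the default "exclusive" method of `statistics.quantiles`, which may differ slightly from other software).

```python
import math
import statistics

def bins_from_doubling_rule(n):
    """Solve n = 2**(N - 1) for N; round the result to a nearby integer."""
    return 1.0 + math.log2(n)

def bin_width_from_iqr(data):
    """Bin width suggested by 2 * (sample IQR) / n**(1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2.0 * (q3 - q1) / len(data) ** (1.0 / 3.0)

def bins_from_iqr_rule(data):
    """Number of bins needed to cover the data range at the suggested width."""
    return math.ceil((max(data) - min(data)) / bin_width_from_iqr(data))

print(bins_from_doubling_rule(75))  # a little over 7, so N = 7 or N = 8
```

For $n = 75$ the first rule gives a value between 7 and 8, since $2^6 = 64 < 75 < 128 = 2^7$. The second rule needs the actual data to compute the IQR and the range.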

(v) [2 marks]

Carry out a chi-squared goodness-of-fit test to assess the adequacy of the fit of the two-
component normal mixture model with common variances to the n = 75 data points.
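One possible way to compute the test statistic is sketched below in Python. The choice of cells is left to the caller, the function names are hypothetical, and the degrees of freedom use the common rule "cells minus 1 minus number of estimated parameters", which is an assumption that should be checked against the course notes. The default `n_params=4` counts $\pi_1, \mu_1, \mu_2, \sigma^2$.

```python
import math

def normal_cdf(x, mu, var):
    """Normal distribution function with mean mu and variance var."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def mixture_cdf(x, pis, mus, var):
    """Distribution function of the common-variance normal mixture."""
    return sum(p * normal_cdf(x, m, var) for p, m in zip(pis, mus))

def chi_squared_gof(data, cut_points, pis, mus, var, n_params=4):
    """Pearson statistic for the cells (-inf, c1], (c1, c2], ..., (cK, inf),
    where cut_points = [c1, ..., cK] is increasing."""
    n = len(data)
    bounds = [-math.inf] + list(cut_points) + [math.inf]
    stat = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for w in data if lo < w <= hi)
        p_lo = 0.0 if lo == -math.inf else mixture_cdf(lo, pis, mus, var)
        p_hi = 1.0 if hi == math.inf else mixture_cdf(hi, pis, mus, var)
        expected = n * (p_hi - p_lo)
        stat += (observed - expected) ** 2 / expected
    n_cells = len(bounds) - 1
    df = n_cells - 1 - n_params
    return stat, df
```

The statistic is then compared with the upper tail of the chi-squared distribution on `df` degrees of freedom, using tables or a routine such as `scipy.stats.chi2.sf`. Cells are usually chosen so that no expected count is very small.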

(vi) [2 marks]

Fit to this dataset by maximum likelihood via the EM algorithm a two-component normal
mixture model with now unequal component variances. Take the component variances to
be arbitrary (that is, do not constrain them to be equal now) so that this mixture density
is given by

$$f(w; \Psi) = \sum_{i=1}^{2} \pi_i \, \phi(w; \mu_i, \sigma_i^2),$$

where

$$\Psi = (\pi_1, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)^T.$$
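Relative to a common-variance fit, only the variance update in the M-step and the densities used in the E-step change. A sketch of the standard updates follows, where $\tau_{ij}^{(k)}$ (notation introduced here, not in the assignment) denotes the posterior probability that $w_j$ belongs to component $i$ at iteration $k$.

```latex
% E-step with component-specific variances
\tau_{ij}^{(k)} =
  \frac{\pi_i^{(k)} \, \phi(w_j; \mu_i^{(k)}, \sigma_i^{2(k)})}
       {\sum_{h=1}^{2} \pi_h^{(k)} \, \phi(w_j; \mu_h^{(k)}, \sigma_h^{2(k)})}

% M-step: the proportions and means are updated as weighted averages,
% and each component now has its own weighted variance
\sigma_i^{2(k+1)} =
  \frac{\sum_{j=1}^{n} \tau_{ij}^{(k)} \, (w_j - \mu_i^{(k+1)})^2}
       {\sum_{j=1}^{n} \tau_{ij}^{(k)}}, \qquad i = 1, 2
```

With unrestricted variances the likelihood is unbounded (a component variance can shrink towards zero around a single observation), so the starting values and any spurious solutions deserve extra scrutiny.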

(vii) [2 marks]

Use the nonparametric bootstrap to obtain standard errors of the estimates so obtained
for the parameters $\pi_1$, $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$.
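The resampling scheme itself can be sketched independently of the fitting routine. In the Python sketch below, `estimator` is a hypothetical callable that takes a sample and returns a tuple of parameter estimates; in practice it would wrap the EM fit of the unequal-variance mixture. A simple (mean, variance) estimator and toy data stand in for it here purely for illustration.

```python
import random
import statistics

def nonparametric_bootstrap_se(data, estimator, n_boot=200, seed=0):
    """Resample the data with replacement n_boot times, re-estimate each
    time, and return the standard deviation of each estimate."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in range(n)]
        replicates.append(estimator(resample))
    # One column of replicates per parameter.
    return [statistics.stdev(column) for column in zip(*replicates)]

# Placeholder estimator and toy data, for illustration only.
placeholder = lambda d: (statistics.mean(d), statistics.pvariance(d))
toy = [-2.1, -1.8, -2.4, -1.5, -2.0, 1.9, 2.2, 1.7, 2.5, 2.0]
print(nonparametric_bootstrap_se(toy, placeholder))
```

For mixtures, the component labels can switch between bootstrap fits, so the estimator should order the components consistently (for example by increasing mean) before the replicates are combined.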

(viii) [2 marks]

Use the parametric bootstrap to obtain standard errors of the estimates so obtained for
the parameters $\pi_1$, $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$.
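The parametric version differs only in where the bootstrap samples come from: they are generated from the fitted mixture rather than drawn from the observed data. A minimal Python sketch follows. The fitted values shown are hypothetical numbers, not estimates from the dataset, and `estimator` is again a hypothetical callable standing in for the EM fit.

```python
import math
import random
import statistics

def sample_mixture(n, pis, mus, variances, rng):
    """Draw n observations from a two-component normal mixture."""
    sample = []
    for _ in range(n):
        i = 0 if rng.random() < pis[0] else 1  # choose the component
        sample.append(rng.gauss(mus[i], math.sqrt(variances[i])))
    return sample

def parametric_bootstrap_se(n, pis, mus, variances, estimator, n_boot=200, seed=0):
    """Simulate n_boot samples of size n from the fitted model, re-estimate
    each time, and return the standard deviation of each estimate."""
    rng = random.Random(seed)
    replicates = [
        estimator(sample_mixture(n, pis, mus, variances, rng))
        for _ in range(n_boot)
    ]
    return [statistics.stdev(column) for column in zip(*replicates)]

# Hypothetical fitted values, for illustration only.
fitted = dict(pis=[0.4, 0.6], mus=[-1.0, 2.0], variances=[0.5, 1.5])
placeholder = lambda d: (statistics.mean(d), statistics.pvariance(d))
print(parametric_bootstrap_se(75, estimator=placeholder, **fitted))
```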

(b) [10 marks]

Consider the dataset Data-Ass4b.csv with n = 100 four-dimensional observations.

(i) [4 marks]

Fit a g-component normal mixture model with a common covariance matrix for its four-
dimensional components for g = 1, g = 2, and g = 3.

(ii) [2 marks]

Carry out a test of exact size 0.05 of the null hypothesis $H_0: g = 1$ versus $H_1: g = 2$
using a resampling approach.

(iii) [2 marks]

Use the bootstrap with $B = 99$ bootstrap replications to test the null hypothesis
$H_0: g = 2$ versus $H_1: g = 3$.
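Once the likelihood ratio statistic has been computed for the observed data and for each of the $B$ samples simulated from the fitted null model, the resampling P-value is a one-line calculation. The Python sketch below assumes those statistics are already available; the function name and the numbers in the usage lines are hypothetical.

```python
def bootstrap_p_value(observed_stat, bootstrap_stats):
    """P-value (1 + #{T*_b >= T_obs}) / (B + 1) for a test that rejects
    for large values of the statistic."""
    b = len(bootstrap_stats)
    exceed = sum(1 for t in bootstrap_stats if t >= observed_stat)
    return (1 + exceed) / (b + 1)

# Hypothetical values: the observed statistic exceeds all B = 19 replicates.
print(bootstrap_p_value(10.0, [1.0] * 19))
```

With $B = 99$, rejecting when this P-value is at most 0.05 corresponds to the observed statistic being among the five largest of the 100 values. With $B = 19$, the same rule rejects only when the observed statistic exceeds all 19 replicates, which gives nominal size $1/20 = 0.05$. That size is exact when the null distribution is fully specified and otherwise holds approximately.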

(iv) [2 marks]

Use the Bayesian information criterion (BIC) to decide on the choice between g = 2 and
g = 3 components.
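A short Python sketch of the BIC comparison is given below. It uses the convention $\mathrm{BIC} = -2 \log L + d \log n$ (smaller is better), where $d$ is the number of free parameters; some software, including mclust, reports the negative of this quantity, so the sign convention should be checked. The function names are made up here, and the log likelihood values must come from the fits in (b)(i).

```python
import math

def n_free_parameters(g, p):
    """Free parameters in a g-component, p-dimensional normal mixture
    with a common covariance matrix."""
    mixing = g - 1               # mixing proportions sum to one
    means = g * p                # one mean vector per component
    covariance = p * (p + 1) // 2  # one symmetric p x p matrix
    return mixing + means + covariance

def bic(loglik, n_params, n):
    """BIC = -2 log L + d log n; the model with the smaller value is preferred."""
    return -2.0 * loglik + n_params * math.log(n)

print(n_free_parameters(2, 4), n_free_parameters(3, 4))  # 19 24
```

For the four-dimensional data, $d = 19$ for $g = 2$ and $d = 24$ for $g = 3$, so the extra component must improve $2 \log L$ by more than $5 \log 100$ to be preferred under this criterion.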