THE UNIVERSITY OF HONG KONG DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE
STAT6011/7611/8305 COMPUTATIONAL STATISTICS (2019 Fall)
Assignment 3, due on November 28
All numerical computation MUST be conducted in Python; attach your Python code.
1. Consider the integral
       ∫ from −2 to 5 of (sin(x) + 1) / √(x² + 1) dx.
(a) Plot the integrand over the range (−2, 5).
(b) Use the Gauss-Legendre, Gauss-Chebyshev (first kind), Gauss-Chebyshev (second kind), and Gauss-Jacobi quadratures, each with 10 nodes and weights, to approximate the integral. Present the nodes, weights, and the approximation results.
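As a sketch of part (b) for one of the four rules, the Gauss-Legendre case can be handled with NumPy's `leggauss`, whose nodes live on [−1, 1] and must be mapped affinely to [−2, 5]; the variable names below are illustrative, and the other three rules would follow the same pattern with their own node/weight generators.

```python
import numpy as np

# Integrand of problem 1: f(x) = (sin(x) + 1) / sqrt(x^2 + 1)
def f(x):
    return (np.sin(x) + 1.0) / np.sqrt(x**2 + 1.0)

# 10-node Gauss-Legendre rule on [-1, 1], mapped affinely to [a, b] = [-2, 5]:
# int_a^b f(x) dx  ≈  (b-a)/2 * sum_k w_k f( (b-a)/2 * t_k + (a+b)/2 )
a, b = -2.0, 5.0
t, w = np.polynomial.legendre.leggauss(10)
x = 0.5 * (b - a) * t + 0.5 * (b + a)
approx = 0.5 * (b - a) * np.sum(w * f(x))

print("mapped nodes:", x)
print("mapped weights:", 0.5 * (b - a) * w)
print("approximation:", approx)
```

The same mapping applies to the Chebyshev and Jacobi rules, but those weight functions must be divided out of the integrand before applying the rule.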
2. Use the dataset q2.csv. The observed data y = (y1, . . . , yn) are from a mixture of normal distributions, i.e., Yi ∼ Σ_{j=1}^{k} ωj fj(y), i = 1, . . . , n = 1000, where each fj is a normal density function N(μj, σj²), ωj is the mixing probability, and Σ_{j=1}^{k} ωj = 1. Consider the complete data (yi, ui), where the missing datum ui indicates which component distribution yi is from.
(a) Write out the complete-data likelihood.
(b) Derive the marginal distribution of yi.
(c) Suppose that we know k = 2, σ1² = σ2² = 1 (j = 1, 2), and ω1 = ω2 = 0.5, but the μj's are unknown. Derive the Q(μ | μ^(0)) function in the E step, and derive the estimators {μj^(1)} given the previous-step values {μj^(0)} in the M step. Use the (sample-based) EM algorithm to estimate the μj's.
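A minimal numerical sketch of the EM loop in (c): since q2.csv is not reproduced here, the code simulates a stand-in sample from the mixture 0.5 N(−1, 1) + 0.5 N(1, 1) (the seed, starting values, and sample are assumptions for illustration, not the assignment's data).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Stand-in for q2.csv: n = 1000 draws from 0.5 N(-1,1) + 0.5 N(1,1)
n = 1000
z = rng.random(n) < 0.5
y = np.where(z, rng.normal(-1.0, 1.0, n), rng.normal(1.0, 1.0, n))

mu = np.array([-0.5, 0.5])          # starting values mu^(0) (arbitrary)
for it in range(200):
    # E step: responsibilities gamma_ij = P(u_i = j | y_i, mu),
    # with known weights 0.5 and known variances 1
    dens = 0.5 * norm.pdf(y[:, None], loc=mu[None, :], scale=1.0)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M step: mu_j^(1) = sum_i gamma_ij * y_i / sum_i gamma_ij
    mu_new = (gamma * y[:, None]).sum(axis=0) / gamma.sum(axis=0)
    if np.max(np.abs(mu_new - mu)) < 1e-8:
        mu = mu_new
        break
    mu = mu_new

print("estimated means:", mu)   # near the true (-1, 1) for this simulated data
```

For the actual assignment, `y` would instead be read from q2.csv (e.g., with `np.loadtxt` or pandas).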
(d) Repeat (c) using population EM, i.e., taking the expectation of Yi based on its true mixture density function f(y) = 0.5 N(−1, 1) + 0.5 N(1, 1), where the expectation can be computed by Monte Carlo (MCEM). Comment on the results in (c) and (d).
(e) Suppose that we know k = 2 and σ1² = σ2² = 1 (j = 1, 2), but the μj's and ωj's are unknown. If we treat the ui's as missing data, derive the Q(ω, μ | ω^(0), μ^(0)) function in the E step, and derive the estimators in closed form, i.e., the iterative equations between {ωj^(1), μj^(1)} and {ωj^(0), μj^(0)}, in the M step. Use the (sample-based) EM algorithm to estimate the μj's and ωj's.
(f) Repeat (e) using population EM, i.e., taking the expectation of Yi based on its true mixture density function f(y) = ω1 N(−1, 1) + ω2 N(1, 1), where the expectation can be computed using Gauss-Hermite quadrature.
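The Gauss-Hermite step in (f) rests on the identity E_{Y∼N(m,1)}[g(Y)] ≈ π^(−1/2) Σ_k w_k g(√2 t_k + m), where (t_k, w_k) are Hermite nodes and weights; an expectation under the two-component mixture is just the ω-weighted sum of the two component expectations. The helper `mix_expect` below is a hypothetical name for illustration.

```python
import numpy as np

# Hermite nodes/weights for the weight function exp(-t^2)
t, w = np.polynomial.hermite.hermgauss(20)

def mix_expect(g, w1=0.5, w2=0.5, m1=-1.0, m2=1.0):
    """E_f[g(Y)] under f(y) = w1 N(m1, 1) + w2 N(m2, 1) via Gauss-Hermite.

    Change of variables y = sqrt(2) t + m turns E_{N(m,1)}[g] into
    (1/sqrt(pi)) sum_k w_k g(sqrt(2) t_k + m)."""
    e1 = np.sum(w * g(np.sqrt(2.0) * t + m1)) / np.sqrt(np.pi)
    e2 = np.sum(w * g(np.sqrt(2.0) * t + m2)) / np.sqrt(np.pi)
    return w1 * e1 + w2 * e2

print(mix_expect(lambda y: y))      # mixture mean: 0 for equal weights
print(mix_expect(lambda y: y**2))   # E[Y^2] = 1 + 0.5*1 + 0.5*1 = 2
```

In population EM, the sample averages of the E and M steps are replaced by such quadrature expectations of the responsibilities and of y times the responsibilities.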
Comment on the results in (c)–(f), i.e., whether knowing the true weights helps convergence (e.g., how many iterations are needed for convergence).
3. Use the EM algorithm to estimate the parameters in the following random effects linear model (note the model as written below is linear, with normal errors), for i = 1, . . . , I and j = 1, . . . , J,
Yij = β0 + β1 xij + ui + εij,    ui ∼ N(0, σu²),  εij ∼ N(0, σε²).
The unknown parameter vector is θ = (β0, β1, σu², σε²)^T.
(a) Write out the complete-data likelihood.
(b) Derive the Q-function and the M-step of the EM algorithm.
(c) Conduct simulations as follows. Set the parameters β0 = 0.5, β1 = 1, σu = 1, σε = 1, I = 100, and J = 2. For each dataset, simulate xij from Uniform(0, 1), simulate εij and ui from the corresponding normal distributions, and then obtain yij. Use the EM algorithm to obtain the parameter estimates based on each simulated dataset. Repeat the simulation process 1000 times and present the bias (averaged over the 1000 simulations) and standard deviation for θ. Comment on your findings.
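A sketch of one replication of the simulation in (c), using the standard EM updates for a random-intercept model: the E step uses the fact that ui | yi is normal with a shrinkage mean, and the M step does OLS on the centered responses plus closed-form variance updates. Starting values, the seed, and the convergence rule are assumptions; the full exercise would wrap this in a loop over 1000 replications and summarize bias and standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)

# One simulated dataset under the stated design
I, J = 100, 2
beta0, beta1, su2, se2 = 0.5, 1.0, 1.0, 1.0
x = rng.uniform(0.0, 1.0, size=(I, J))
u = rng.normal(0.0, np.sqrt(su2), size=I)
y = beta0 + beta1 * x + u[:, None] + rng.normal(0.0, np.sqrt(se2), size=(I, J))

# EM for the random-intercept linear model (arbitrary starting values)
b0, b1, s_u2, s_e2 = 0.0, 0.0, 1.0, 1.0
for it in range(500):
    # E step: u_i | y_i is normal with
    #   E[u_i | y] = J*s_u2 / (J*s_u2 + s_e2) * mean_j(y_ij - b0 - b1*x_ij)
    #   Var[u_i | y] = s_u2 * s_e2 / (J*s_u2 + s_e2)
    resid = y - b0 - b1 * x
    Eu = (J * s_u2 / (J * s_u2 + s_e2)) * resid.mean(axis=1)
    Vu = s_u2 * s_e2 / (J * s_u2 + s_e2)
    # M step: OLS of (y - E[u]) on x for the betas, then variance updates
    yc = (y - Eu[:, None]).ravel()
    xf = x.ravel()
    b1 = np.cov(xf, yc, bias=True)[0, 1] / np.var(xf)
    b0 = yc.mean() - b1 * xf.mean()
    s_u2_new = np.mean(Eu**2 + Vu)                       # E[u_i^2 | y] averaged
    s_e2_new = np.mean((y - b0 - b1 * x - Eu[:, None])**2 + Vu)
    if max(abs(s_u2_new - s_u2), abs(s_e2_new - s_e2)) < 1e-9:
        s_u2, s_e2 = s_u2_new, s_e2_new
        break
    s_u2, s_e2 = s_u2_new, s_e2_new

print("estimates:", b0, b1, s_u2, s_e2)
```

With J = 2 observations per group, the variance components are weakly identified in any single replication, which is exactly what the bias/standard-deviation summary over 1000 replications is meant to reveal.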