Workshop 2: Solutions
1. (i) Let y denote the dmft scores. A first sensible check is to look at the histogram of the data:
As can be observed, the value 0 has a high frequency compared to the remaining values. A further check is to compute the mean and the variance of the data (for the Poisson distribution the mean equals the variance). We obtain the values listed below, clearly showing that the data are overdispersed.
> mean(y)
[1] 2.17
> var(y)
[1] 8.102121
As the authors state in the book: “While the Poisson distribution is usually the first choice to describe the distribution of counts, in medical applications it is often not the best choice. For the Poisson distribution, the counts represent the sum of independent events that happen with a constant average. The dmft-index is the sum of binary responses expressing the caries experience in each of the 20 primary teeth. However, cavities in the same mouth are correlated. This leads to (Poisson-) overdispersion, which means that the variance is larger than the mean.” Therefore, the Poisson distribution is possibly not the best option to model this dataset. Possibly, a zero inflated Poisson distribution or a negative binomial distribution would be better options. For the sake of simplicity, we proceed with the Poisson distribution.
(ii) The likelihood is Poisson and θ is assigned a Gamma(a, b) prior distribution. By conjugacy, we have

θ | y ~ Gamma(a + Σᵢ yᵢ, b + n).
Using the function qgamma in R to compute the 2.5% and 97.5% quantiles of the posterior distribution, we obtain that a 95% credible interval for θ is (1.900, 2.475).
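As a sketch, the exact interval can be computed as follows, assuming y holds the dmft counts and the hyperparameters a = 3, b = 1 (the values used later in the JAGS call):

```r
# Exact 95% credible interval from the conjugate Gamma posterior
a <- 3; b <- 1
n <- length(y)
post_shape <- a + sum(y)   # posterior shape: a + sum of counts
post_rate  <- b + n        # posterior rate:  b + sample size
qgamma(c(0.025, 0.975), shape = post_shape, rate = post_rate)
```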
(iii) In order to use the rejection sampling algorithm to simulate from the (unnormalised) posterior distribution, we need to choose a proposal distribution. The suggestion is to use an exponential distribution with mean equal to the mean of the data, i.e., q(θ) = λ exp(−λθ) with λ = 1/ȳ.
From the figure above we can see that there is a ‘large rejection area’. This is confirmed by the acceptance rate of the algorithm, about 6% (see the R script for details on the implementation). The histogram of the sampled θ’s is plotted below.
For this ensemble of θ values we obtain that the posterior mean of θ is 2.181, its posterior standard deviation is 0.148, and a 95% credible interval is (1.898, 2.477). These values are very similar to the ones obtained with exact calculations. Note that due to the stochastic nature of the algorithm, you might obtain slightly different values from the ones presented here.
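A minimal sketch of such a rejection sampler is given below. It assumes y holds the dmft counts and the hyperparameters a = 3, b = 1 used later; the envelope constant is found numerically, with the search interval (0.01, 10) chosen to comfortably cover the posterior's support.

```r
set.seed(1)
a <- 3; b <- 1
n <- length(y); s <- sum(y)

# Unnormalised log-posterior: Poisson likelihood x Gamma(a, b) prior
log_target <- function(theta) (a + s - 1) * log(theta) - (b + n) * theta

# Exponential proposal with mean equal to the sample mean
lambda <- 1 / mean(y)
log_q <- function(theta) dexp(theta, rate = lambda, log = TRUE)

# Envelope constant M: maximise the log ratio target/proposal
logM <- optimize(function(t) log_target(t) - log_q(t),
                 c(0.01, 10), maximum = TRUE)$objective

N <- 100000
prop <- rexp(N, rate = lambda)
accept <- log(runif(N)) < log_target(prop) - log_q(prop) - logM
theta_acc <- prop[accept]

mean(accept)                            # acceptance rate
c(mean(theta_acc), sd(theta_acc))       # posterior mean and sd
quantile(theta_acc, c(0.025, 0.975))    # 95% credible interval
```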
(iv) We need to answer the same question but now using the sampling importance resampling (SIR) algorithm. We use the same proposal distribution as in (iii) and we follow the algorithm described on slides 15/16 of the second lecture (file BDA Lecture2.pdf). Note that the SIR algorithm keeps all the sampled values, but they are weighted according to their ‘importance’. Applying the algorithm (see the R script for details on the implementation) we obtain
for which the posterior mean of θ is 2.173, its standard deviation 0.145, and a 95% credible interval is (1.902, 2.472).
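The SIR step can be sketched as follows, under the same assumptions as above (y holds the counts, a = 3, b = 1, and the Exp(1/ȳ) proposal); the log-weights are shifted by their maximum before exponentiating for numerical stability.

```r
set.seed(2)
a <- 3; b <- 1
n <- length(y); s <- sum(y)
lambda <- 1 / mean(y)

N <- 100000
prop <- rexp(N, rate = lambda)

# Importance log-weights: unnormalised posterior / proposal density
logw <- (a + s - 1) * log(prop) - (b + n) * prop -
        dexp(prop, rate = lambda, log = TRUE)
w <- exp(logw - max(logw))
w <- w / sum(w)

# Resample according to the normalised weights
theta_sir <- sample(prop, size = 10000, replace = TRUE, prob = w)

c(mean(theta_sir), sd(theta_sir))
quantile(theta_sir, c(0.025, 0.975))
```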
(v) We will now use the rjags package to answer the same question. We need to write the model (i.e., likelihood and prior) in the BUGS language. The bulk of the code is
model_string <- "model{
  # Likelihood
  for(i in 1:n){
    y[i] ~ dpois(theta)
  }
  # Prior
  theta ~ dgamma(a, b)
}"

We then pass the model to JAGS, along with the data and hyperparameter values. Here we are not providing initial values, so rjags will generate them.

model = jags.model(textConnection(model_string), n.chains = 1,
                   data = list(y = y, n = n, a = 3, b = 1))

We start with a burn-in of 1000 iterations

update(model, 1000, progress.bar = "none")
and we then sample a further 10000 iterations
res=coda.samples(model, variable.names=c("theta"),n.iter=10000,progress.bar="none")
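The diagnostics discussed below can be produced with the standard functions of the coda package (loaded with rjags):

```r
plot(res)           # traceplot and posterior density
autocorr.plot(res)  # autocorrelation function
```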
As can be seen above, the traceplot and autocorrelation function look OK (this is a very simple model). The summary statistics of the posterior are
> summary(res)
(vi) We first need to write the model with the mixture prior in the BUGS language, which we do as follows