Project
The project consists of 3 exercises: 2 on McMC methods and 1 on bootstrap techniques for constructing confidence intervals. You should produce a report (.doc or .pdf) that summarises your work and your findings. The report should be no more than 10 sides of A4 (single-spaced, including figures). In addition, you should include your R code in a separate .R script file. For additional guidelines regarding presentation and writing of R code see: https://google.github.io/styleguide/Rguide.xml
Submission files:
• Report file with findings (.pdf or .doc)
• R script file (.R)
Exercises:
1. Consider a set of data relating two test scores, LSAT and GPA, for a sample of 15 American law schools. Of interest are the correlation θ = cor(lsat, gpa) between these measurements and the variance ratio ψ = var(lsat)/var(gpa).
list(lsat = c(576, 635, 558, 578, 666, 580, 555,
661, 651, 605, 653, 575, 545, 572, 594),
gpa = c(3.39, 3.30, 2.81, 3.03, 3.55, 3.07, 3.00,
3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96),
n = 15)
(a) Construct 95% bootstrap confidence intervals for the correlation parameter θ using the basic bootstrap interval method and the percentile interval method.
(b) Discuss in detail how you would construct a studentized bootstrap confidence interval for ψ under the assumption that lsat and gpa are normally distributed. Develop a function in R that calculates 95% studentized bootstrap confidence intervals for ψ.
[Marks: 7+8=15]
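As an illustration of the two interval types named in 1(a), the following is a minimal sketch assuming a nonparametric bootstrap that resamples the 15 (lsat, gpa) pairs with replacement. The function name BootCorIntervals, the number of replicates B and the seed are hypothetical choices, not part of the exercise. The percentile interval uses the quantiles of the bootstrap replicates directly; the basic interval reflects them about the sample estimate.

```r
# Law school data as given in Exercise 1.
law <- data.frame(
  lsat = c(576, 635, 558, 578, 666, 580, 555,
           661, 651, 605, 653, 575, 545, 572, 594),
  gpa  = c(3.39, 3.30, 2.81, 3.03, 3.55, 3.07, 3.00,
           3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96)
)

# Hypothetical helper (an assumption, not prescribed by the exercise):
# resamples rows with replacement B times and returns both intervals.
BootCorIntervals <- function(data, B = 2000, level = 0.95) {
  n <- nrow(data)
  theta.hat <- cor(data$lsat, data$gpa)
  theta.star <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    cor(data$lsat[idx], data$gpa[idx])
  })
  alpha <- 1 - level
  # Lower and upper quantiles of the bootstrap replicates.
  q <- unname(quantile(theta.star, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE))
  list(estimate   = theta.hat,
       percentile = q,
       # Basic interval: reflect the quantiles about the estimate.
       basic      = c(2 * theta.hat - q[2], 2 * theta.hat - q[1]))
}

set.seed(1)
ci <- BootCorIntervals(law)
```

Because the bootstrap distribution of a correlation is typically skewed, the two intervals in ci$percentile and ci$basic will generally differ, which is worth discussing in the report.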
2. Consider the following dataset
list(t = c(94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5),
x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22), n = 10)
on observed failures xi of n = 10 power plant pumps. Here ti denotes the length of operation time of pump i (in 1000s of hours). The number of failures Xi is assumed to follow a Poisson distribution, Xi | θi ∼ Poisson(θi ti). Consider the hierarchical model θi | α, β ∼ Gamma(α, β), where α ∼ Exp(λ), β ∼ Gamma(γ, δ), λ = 1 and γ = δ = 0.01.
(a) What are the conditional posterior distributions of
– θi | α, β, x
– β | θi, α, x
– α | θi, β, x
(b) Develop a general McMC algorithm in R with components θ,β,α where θ and β are updated from the posterior conditional distribution and log α is updated via a random walk Metropolis step with normal increments.
(c) Let x∗ denote a future observable. Develop an R function that calculates the predictive distribution of x∗ | x using Monte Carlo integration.
[Marks: 3+7+5=15]
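The update scheme described in 2(b) can be sketched as follows. This is a rough illustration under stated assumptions, not a definitive implementation: Gamma(a, b) is taken to mean shape a and rate b, under which θi | α, β, x is Gamma(α + xi, β + ti) and β | θ, α is Gamma(γ + nα, δ + Σθi). The names PumpSampler and LogCondAlpha, the starting values, the step size and the seed are all hypothetical. Because the random walk is on log α, the acceptance ratio includes the Jacobian term log α.

```r
# Pump data as given in Exercise 2.
pumps <- list(t = c(94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5),
              x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22), n = 10)

# Log conditional density of alpha given theta and beta, up to a constant.
# Assumes the shape/rate parameterisation of the Gamma distribution.
LogCondAlpha <- function(alpha, theta, beta, lambda) {
  n <- length(theta)
  n * alpha * log(beta) - n * lgamma(alpha) +
    (alpha - 1) * sum(log(theta)) - lambda * alpha
}

# Hypothetical sampler: Gibbs updates for theta and beta, and a random walk
# Metropolis update on log(alpha) with normal increments of sd `step`.
PumpSampler <- function(data, n.iter = 5000, step = 0.5,
                        lambda = 1, gam = 0.01, delta = 0.01) {
  n <- data$n
  alpha <- 1
  beta <- 1
  out <- matrix(NA_real_, n.iter, n + 2,
                dimnames = list(NULL,
                                c(paste0("theta", 1:n), "alpha", "beta")))
  accepted <- 0
  for (k in seq_len(n.iter)) {
    # theta_i | alpha, beta, x ~ Gamma(alpha + x_i, beta + t_i)
    theta <- rgamma(n, shape = alpha + data$x, rate = beta + data$t)
    # beta | theta, alpha ~ Gamma(gam + n * alpha, delta + sum(theta))
    beta <- rgamma(1, shape = gam + n * alpha, rate = delta + sum(theta))
    # Random walk on log(alpha); log(alpha) terms are the Jacobian.
    alpha.prop <- exp(log(alpha) + rnorm(1, 0, step))
    log.ratio <- LogCondAlpha(alpha.prop, theta, beta, lambda) +
      log(alpha.prop) -
      LogCondAlpha(alpha, theta, beta, lambda) - log(alpha)
    if (log(runif(1)) < log.ratio) {
      alpha <- alpha.prop
      accepted <- accepted + 1
    }
    out[k, ] <- c(theta, alpha, beta)
  }
  list(draws = out, accept.rate = accepted / n.iter)
}

set.seed(2)
fit <- PumpSampler(pumps, n.iter = 2000)
```

The returned accept.rate can be used to tune step, and the stored draws of θ could feed a Monte Carlo estimate of the predictive distribution in 2(c) once an operation time for the future observation is chosen.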
3. Let x1, . . . , xn be a sample of independent and identically distributed observations assumed to have been generated from a t-distribution with ν degrees of freedom located at μ, i.e.,
f(x | μ) ∝ [1 + (x − μ)^2 / ν]^(−(ν+1)/2), x ∈ R. (1)
Assume ν = 10 and suppose μ ∼ N(0, 1).
(a) Write down, up to proportionality constant, the posterior μ | x. Is this a recognisable density? Describe in detail and develop in R an McMC algorithm for evaluating the posterior distribution of μ where the updating is done using random walk Metropolis with a normal candidate generator.
(b) Using the fact that the t-distribution is a scale mixture of normals, the sampling model in (1) can alternatively be represented as
xi | μ, zi ∼ N(μ, 1/zi), where the zi are a priori independent of μ and
f(zi) ∝ zi^((ν/2)−1) exp{−(ν/2) zi}.
Using this representation together with the prior distribution in (a), write down, up to proportionality constant, the conditional posterior densities of μ and of each zi, i = 1, . . . , n and identify the distributions corresponding to each of these densities. Describe in detail and develop in R a Gibbs sampling algorithm for computing the posterior distribution of μ.
(c) Let x∗ denote a future observable. Develop an R function that calculates the predictive distribution of x∗ | x using Monte Carlo integration.
(d) Write a function in R that calculates the pth quantile xp of the predictive distribution, defined by Pr(x∗ ≤ xp | x) = p, and use it to construct 95% predictive intervals.
[Marks: 3+7+5+5=20]
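The random walk Metropolis update in 3(a) and the predictive quantile in 3(d) can be sketched together as below. This is a minimal sketch under stated assumptions: the exercise supplies no data, so x.sim is simulated purely for illustration, and the function names, step size, burn-in length and seed are hypothetical. The log posterior combines the N(0, 1) prior with the t likelihood in (1) for ν = 10, up to an additive constant. Predictive draws are obtained by adding t-distributed noise with ν degrees of freedom to each posterior draw of μ.

```r
# Log posterior of mu up to an additive constant:
# N(0, 1) prior plus the t likelihood in (1).
LogPosteriorMu <- function(mu, x, nu = 10) {
  -0.5 * mu^2 - (nu + 1) / 2 * sum(log1p((x - mu)^2 / nu))
}

# Hypothetical random walk Metropolis sampler with normal increments.
RwMetropolisMu <- function(x, n.iter = 10000, step = 0.5, nu = 10, mu0 = 0) {
  mu <- numeric(n.iter)
  current <- mu0
  current.lp <- LogPosteriorMu(current, x, nu)
  for (k in seq_len(n.iter)) {
    prop <- current + rnorm(1, 0, step)
    prop.lp <- LogPosteriorMu(prop, x, nu)
    # Symmetric proposal, so the ratio involves the target only.
    if (log(runif(1)) < prop.lp - current.lp) {
      current <- prop
      current.lp <- prop.lp
    }
    mu[k] <- current
  }
  mu
}

# Monte Carlo estimate of predictive quantiles: one draw of x* per
# posterior draw of mu, then empirical quantiles of the x* sample.
PredictiveQuantile <- function(p, mu.draws, nu = 10) {
  x.star <- mu.draws + rt(length(mu.draws), df = nu)
  unname(quantile(x.star, p))
}

set.seed(3)
x.sim <- 1 + rt(30, df = 10)                  # simulated data (assumption)
mu.draws <- RwMetropolisMu(x.sim)[-(1:1000)]  # drop an arbitrary burn-in
pred.int <- PredictiveQuantile(c(0.025, 0.975), mu.draws)
```

Here pred.int holds the endpoints of an equal-tailed 95% predictive interval; other constructions, such as a highest-density interval, would need a different function.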