Bayesian Methods – Coursework

Dr. Simon Taylor

Deadline: 12 noon, Friday 21 April 2017 (uploaded to Moodle by 5pm)

The coursework for Bayesian Methods is in two parts and comprises two short reports. The first report should be about six sides of single-spaced A4 including figures, and the second about four sides. Use OpenBUGS for your analysis. The data and additional information are available via Moodle.

In addition to your report, your final model for each part should be submitted via Moodle in separate files titled exam model.odc and birth weight model.odc. The model can be in either code or Doodle format, but do not include the data or initial values.

The allocation of marks is as follows:

               Part 1 – Exam Performance   Part 2 – Low Birth Weight   Total
Introduction   5%                          5%                          10%
Methods        25%                         15%                         40%
Results        15%                         10%                         25%
Conclusion     5%                          10%                         15%
Total          50%                         45%                         95%

The remaining 5% is awarded based on the presentation of the report.

1 Exam Performance

The file exam data.odc contains the (normalised) exam performance of 4,059 students from 65 schools

in Inner London. The data are taken from:

Goldstein, H., Rasbash, J., et al (1993). A multilevel analysis of school examination results. Oxford

Review of Education, 19, 425–433.

The variables in the data are:

school         School ID (1–65) to which the pupil belongs.
examscore      The normalised examscore for the pupil.
lrtest         The pupil’s score in the LR test.
gender         Gender of the pupil (0=boy, 1=girl).
schooltype     School gender (1=mixed, 2=all boys school, 3=all girls school).
intakescore    The school’s mean intake score.
VR             Pupil’s Verbal Reasoning (VR) score band at intake (1=bottom 25%, 2=middle 50%, 3=top 25%).
studentintake  Pupil’s band intake score (1=bottom 25%, 2=middle 50%, 3=top 25%).

An initial model has been proposed for the data in exam initial model.odc. The model is a hierarchical (multi-level) model with the examscore y_i of pupil i depending upon the school s_i and the LR test score x_i. Specifically:

y_i ∼ N(α_{s_i} + β x_i, 1/τ) (i = 1, 2, . . . , 4059)

α_j ∼ N(λ, 1/θ) (j = 1, 2, . . . , 65)

According to their ranges of valid values, the parameters λ and β are assigned normal prior distributions, whilst the precisions τ and θ are given gamma priors. The school-specific intercepts α_j are defined by the hierarchical structure, but they are unknown variables and will need to be initialised when using OpenBUGS.
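As an informal illustration of the structure above, the following Python sketch simulates examscores from the two-level model. It is not part of the coursework and is no substitute for the OpenBUGS analysis. The function name and all numerical values for λ, θ, β and τ are made up for illustration. The point to note is that the model is written in terms of precisions, so each standard deviation is one over the square root of the precision.

```python
import math
import random


def simulate_exam_scores(school_ids, lr_scores, lam, theta, beta, tau, seed=0):
    """Simulate y_i ~ N(alpha_{s_i} + beta * x_i, 1/tau) with alpha_j ~ N(lam, 1/theta).

    theta and tau are precisions, so the standard deviations are 1/sqrt(precision).
    """
    rng = random.Random(seed)
    sd_alpha = 1.0 / math.sqrt(theta)  # between-school standard deviation
    sd_y = 1.0 / math.sqrt(tau)        # within-school (residual) standard deviation

    # One intercept per school, drawn from the common population distribution.
    alpha = {j: rng.gauss(lam, sd_alpha) for j in sorted(set(school_ids))}

    # Each pupil's score depends on their school's intercept and their LR test score.
    y = [rng.gauss(alpha[s] + beta * x, sd_y) for s, x in zip(school_ids, lr_scores)]
    return alpha, y


# Hypothetical toy inputs: five pupils in three schools (not the real data).
school_ids = [1, 1, 2, 2, 3]
lr_scores = [0.5, -0.2, 0.1, 1.3, -0.8]
alpha, y = simulate_exam_scores(
    school_ids, lr_scores, lam=0.0, theta=10.0, beta=0.5, tau=2.0
)
print(alpha)
print(y)
```

Pupils from the same school share an intercept, which is what induces the within-school correlation that the hierarchical model is designed to capture.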

Note: Both code and Doodle versions of this model are given in a file separate from the data. Since the data set is large, it is recommended that you keep the two files separate. The model and the data are then easier to load: click on the file, choose ‘Edit–Select All’ and then click the relevant button on the ‘Specification Tool’.

Use these data and OpenBUGS to answer the following:

1. If we are interested in predicting examscores:

(a) Fit the initial hierarchical model to the data. Justify your choice of prior distribution and discuss the posterior estimates.

(b) Develop a sequence of models in a stepwise variable selection procedure for describing examscores using the provided covariates. There are many possibilities and you are not expected to consider all of these. However, the hierarchical model given above should feature in your analysis, although you should feel free to change the prior distributions as appropriate. Choose an optimal model that best describes the examscore against these covariates.

2. Use a node on the “best” model to find calibrated values for the predicted examscore for:

(a) A female pupil from school 30 with an LR test score of 0.5, a mid-band VR score and a mid-band student intake score. She is attending an all girls school that has an average intake score of 0.2687752.

(b) A male pupil from school 47 with an LR test score of -0.35, a low-band VR score and a low-band student intake score. He is attending a mixed gender school that has an average intake score of -0.139923.

These values should not be calculated manually from the “best” model, but directly evaluated

within OpenBUGS by use of a node for the unknown values.
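To show what such a prediction node represents, here is a minimal Python sketch of the posterior predictive calculation for the initial model with a single covariate. It is only an illustration of the idea and does not replace the node you must define in OpenBUGS. The function names are hypothetical, and the "posterior draws" are synthetic numbers generated for the example, not results from the exam data.

```python
import math
import random


def predictive_draws(alpha_draws, beta_draws, tau_draws, x_new, seed=1):
    """For each posterior draw (alpha, beta, tau), sample y_new ~ N(alpha + beta * x_new, 1/tau)."""
    rng = random.Random(seed)
    return [
        rng.gauss(a + b * x_new, 1.0 / math.sqrt(t))  # tau is a precision
        for a, b, t in zip(alpha_draws, beta_draws, tau_draws)
    ]


def summarise(draws):
    """Return the mean and an approximate central 95% interval of the draws."""
    s = sorted(draws)
    n = len(s)
    mean = sum(s) / n
    lower = s[int(0.025 * (n - 1))]
    upper = s[int(0.975 * (n - 1))]
    return mean, lower, upper


# Synthetic stand-ins for MCMC output (assumed values, NOT fitted to the exam data).
rng = random.Random(42)
n_draws = 5000
alpha_draws = [rng.gauss(0.2, 0.05) for _ in range(n_draws)]   # intercept of the pupil's school
beta_draws = [rng.gauss(0.55, 0.02) for _ in range(n_draws)]   # LR test coefficient
tau_draws = [rng.gammavariate(200.0, 0.009) for _ in range(n_draws)]  # residual precision

y_new = predictive_draws(alpha_draws, beta_draws, tau_draws, x_new=0.5)
print(summarise(y_new))
```

Because a new score is sampled for every posterior draw, the resulting interval reflects both parameter uncertainty and pupil-level variation, which is why the prediction should come from a node rather than from plugging posterior means into the regression equation.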

2 Low Birth Weight

The file birth weight data.odc contains data from a study to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies. The data are taken from:

Hosmer Jr, D.W. and Lemeshow, S. (2000). Applied logistic regression, 2nd ed. John Wiley & Sons.

Four variables that the doctor thought to be of importance in predicting whether a baby has a low birth

weight were the mother’s age, weight of the subject at her last menstrual period, race and the number

of physician visits during the first trimester of pregnancy.

The data file birth weight data.odc contains:

LOW   Low birth weight indicator (0=Birth Weight ≥ 2500 g, 1=Birth Weight < 2500 g).
AGE   Age of the mother in years.
LWT   Weight in pounds (lb) at the last menstrual period.
RACE  Race (1=White, 2=Black, 3=Other).
FTV   Number of physician visits during the first trimester.

The dependence of the low birth weight outcome y_i for subject i on the explanatory variables {x_{1,i}, . . . , x_{p,i}} is described by the binary regression model:

y_i ∼ Bernoulli(φ_i) (i = 1, 2, . . . , 189)

logit(φ_i) = β_0 + β_1 x_{1,i} + . . . + β_p x_{p,i}.

The file birth weight null.odc defines the null binary regression model. Note the logit link function used in the code.

1. The doctor suggests using normal prior distributions with mean 0 and precision 0.1 for all of the unknown coefficients. Comment on the doctor’s choice of prior distributions using the information provided above or any other appropriate resource.

2. Using the doctor’s suggested prior distributions, develop the null model to include all four covariates. You might need to consider transformations of the covariates to assist with the performance of the Gibbs sampler, in particular, using standardised mother’s age and weight (e.g. the age minus the mean age). Any changes in covariates should ideally be performed within OpenBUGS.

3. Considering each explanatory variable in turn, use the samples drawn from the posterior to calculate the Bayes Factor for the hypothesis test and evaluate the validity of the doctor’s statement. Clearly state the hypotheses that you are investigating.
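One possible way to turn posterior samples into a Bayes factor is sketched below in Python. It assumes one-sided hypotheses H1: β > 0 against H0: β ≤ 0, which is only one of several formulations and may not be the one intended here. Under a normal prior with mean 0 the prior probability of β > 0 is 0.5, so the prior odds are 1 and the Bayes factor equals the posterior odds. The function name is hypothetical and the sample values are synthetic, not output from the birth weight data.

```python
import random


def bayes_factor_positive(samples, prior_prob_positive=0.5):
    """Bayes factor for H1: beta > 0 against H0: beta <= 0.

    Computed as posterior odds divided by prior odds, with the posterior
    probability of H1 estimated by the proportion of samples above zero.
    """
    n = len(samples)
    n_pos = sum(1 for b in samples if b > 0)
    if n_pos == 0 or n_pos == n:
        # All samples fall on one side, so the odds cannot be estimated from this run.
        raise ValueError("posterior odds are not estimable; draw more samples")
    posterior_odds = n_pos / (n - n_pos)
    prior_odds = prior_prob_positive / (1.0 - prior_prob_positive)
    return posterior_odds / prior_odds


# Synthetic stand-in for posterior draws of one coefficient (assumed values, not real output).
rng = random.Random(7)
beta_samples = [rng.gauss(-0.5, 0.4) for _ in range(10000)]
print(bayes_factor_positive(beta_samples))
```

A value well below 1 favours H0 and a value well above 1 favours H1. The estimate becomes unstable when very few samples fall on one side of zero, so a long chain is advisable.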