MAST20005/MAST90058: Week 11 Lab
Goals: Bayesian inference for: (i) proportions; (ii) mean of a normal distribution; (iii) simple
linear regression.
1 Bayesian estimation of proportions
Let Y be the number of successes in n independent Bernoulli trials. Then the data and prior distributions are, respectively,
Y | θ ~ Bi(n, θ),   θ ~ Beta(α, β).

The posterior pdf can be obtained as

f(θ | y) = [Γ(α + β + n) / (Γ(α + y) Γ(β + n − y))] θ^(α + y − 1) (1 − θ)^(β + n − y − 1),   0 < θ < 1,

i.e. θ | y ~ Beta(α + y, β + n − y).
1. Suppose we observe y = 8 successes out of n = 20 trials. Plot the prior and posterior densities assuming α = β = 1. What kind of information is given by this prior? Comment on the information about θ gained after observing the data.
a <- 1
b <- 1
y <- 8
n <- 20
theta <- seq(0, 1, 0.01)
prior <- dbeta(theta, a, b)
posterior <- dbeta(theta, a + y, b + n - y)
par(mar = c(4, 4, 1, 1)) # tighter margins
plot(theta, posterior, type = "l", lwd = 2, col = "blue",
xlab = expression(theta),
ylab = expression(f(theta)))
points(theta, prior, type = "l", lty = 2, lwd = 2, col = "red")
Note that Beta(1, 1) = Unif(0, 1), meaning that we consider all values of θ to be equally plausible before observing any data. After observing the data we obtain a posterior centered around 0.4 and the uncertainty about θ is much reduced.
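The reduction in uncertainty can be quantified by comparing prior and posterior standard deviations; a minimal sketch (the helper `beta.sd` is introduced here and is not part of the lab code):

```r
# standard deviation of a Beta(a, b) distribution
beta.sd <- function(a, b) sqrt(a * b / ((a + b)^2 * (a + b + 1)))

a <- 1; b <- 1; y <- 8; n <- 20
beta.sd(a, b)              # prior sd (Unif(0, 1)), about 0.289
beta.sd(a + y, b + n - y)  # posterior sd, about 0.103
```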
2. Find the posterior mean.
(a + y) / (a + b + n)
## [1] 0.4090909
3. Find the central 95% credible interval.

qbeta(c(0.025, 0.975), a + y, b + n - y)

## [1] 0.2181969 0.6156456

4. Estimate the posterior odds E(θ/(1 − θ) | y) by simulating from the posterior distribution.

sample.p <- rbeta(1000, a + y, b + n - y)
mean(sample.p / (1 - sample.p))

## [1] 0.7544855

In the above code, we generated 1000 observations from the posterior, θ1, . . . , θ1000 ~ Beta(α + y, β + n − y), and then obtained

E(θ/(1 − θ) | y) ≈ (1/1000) Σ_{i=1}^{1000} θi/(1 − θi).

5. The package Bolstad contains some useful routines for simple Bayesian models (if it is not installed, run install.packages("Bolstad") to install it). For example the following carries out a Bayesian analysis for proportions:

library(Bolstad)
par(mar = c(4, 4, 1, 1)) # tighter margins
binobp(8, 20, 5, 5, plot = TRUE)
## Posterior Mean : 0.4333333
## Posterior Variance : 0.0079211
## Posterior Std. Deviation : 0.0890008
##
## Prob. Quantile
## ---------------
## 0.005 0.2190745
## 0.010 0.2369340
## 0.025 0.2644553
## 0.050 0.2892715
## 0.500 0.4318325
## 0.950 0.5825361
## 0.975 0.6106372
## 0.990 0.6425740
## 0.995 0.6637734
Note that the third and fourth arguments in binobp() are α and β. Try different values for the prior parameters and report how the prior information affects the posterior distribution. In particular, recall that the posterior mean can be written as

E(θ | y) = (n/(α + β + n)) (y/n) + ((α + β)/(α + β + n)) (α/(α + β)),

i.e. a weighted average between the prior mean α/(α + β) and the MLE y/n.
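These quantities can be checked against the conjugate update directly; a sketch using the Beta(5, 5) prior passed to binobp() above:

```r
a <- 5; b <- 5; y <- 8; n <- 20

# posterior mean and sd of Beta(a + y, b + n - y)
post.mean <- (a + y) / (a + b + n)
post.sd <- sqrt((a + y) * (b + n - y) / ((a + b + n)^2 * (a + b + n + 1)))
c(post.mean, post.sd)  # matches binobp()'s 0.4333333 and 0.0890008

# weighted average of the MLE y/n and the prior mean a/(a + b)
w <- n / (a + b + n)
w * (y / n) + (1 - w) * (a / (a + b))  # same posterior mean
```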
6. The estimator θ̃ = (Y + 2)/(n + 4) is sometimes used instead of the sample proportion θ̂ = Y/n. Note that θ̃ is the posterior mean when using Beta(2, 2) as the prior. Using this prior, compare the central 95% credible interval with the confidence interval θ̃ ± 1.96 √(θ̃(1 − θ̃)/n), by simulation, for different values of the true proportion θ. The following code computes estimates of the coverage probability for the two types of intervals.
theta <- 0.1  # true parameter
n <- 20       # sample size
a <- 2        # prior parameter (alpha)
b <- 2        # prior parameter (beta)
nsim <- 1000  # number of simulations
# Simulate data.
y <- rbinom(nsim, size = n, prob = theta)
# Calculate limits for credible interval.
l1 <- qbeta(0.025, a + y, b + n - y)
u1 <- qbeta(0.975, a + y, b + n - y)
# Calculate limits for confidence interval.
p.tilde <- (y + 2) / (n + 4)
se <- sqrt(p.tilde * (1 - p.tilde) / n)
l2 <- p.tilde - 1.96 * se
u2 <- p.tilde + 1.96 * se
# Calculate coverage probabilities.
mean(l1 < theta & theta < u1)
## [1] 0.868
mean(l2 < theta & theta < u2)
## [1] 0.986
Comment on the accuracy of the interval estimates obtained by the two methods for different choices of θ and n.
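To explore different choices of θ and n systematically, the simulation above can be wrapped in a helper function (a sketch; the name `coverage` is introduced here):

```r
coverage <- function(theta, n, a = 2, b = 2, nsim = 1000) {
  y <- rbinom(nsim, size = n, prob = theta)
  # central 95% credible interval under the Beta(2, 2) prior
  l1 <- qbeta(0.025, a + y, b + n - y)
  u1 <- qbeta(0.975, a + y, b + n - y)
  # confidence interval around theta-tilde
  p.tilde <- (y + 2) / (n + 4)
  se <- sqrt(p.tilde * (1 - p.tilde) / n)
  c(credible   = mean(l1 < theta & theta < u1),
    confidence = mean(p.tilde - 1.96 * se < theta & theta < p.tilde + 1.96 * se))
}

coverage(0.5, 20)
coverage(0.1, 100)
```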
2 Normal model
To illustrate how one might use the posterior distribution for inference, consider a study analysing the effect of specific training programs before a standardised test (Gelman, 2006). Assume a random sample of 240 differences of scores. Let yi = z2i − z1i, where z1i is the score for individual i before being coached, and z2i is the score recorded after being coached. Suppose that the sample mean of these differences is ȳ = 7.87 and the standard deviation of the differences, σ = 64.5, is known. In particular, we assume the data model ȳ | θ ~ N(θ, σ²/n) and prior θ ~ N(θ0, σ0²). Then the posterior distribution is a normal distribution with mean

E(θ | ȳ) = (σ0² / (σ0² + σ²/n)) ȳ + ((σ²/n) / (σ0² + σ²/n)) θ0

and variance

var(θ | ȳ) = (σ²/n) σ0² / (σ²/n + σ0²).
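These formulas translate directly into a small helper function (a sketch; the name `normal.posterior` is introduced here):

```r
# posterior mean and variance for the normal-normal model
normal.posterior <- function(y.bar, sigma, n, theta0, sigma0) {
  w <- sigma0^2 / (sigma0^2 + sigma^2 / n)
  c(mean = w * y.bar + (1 - w) * theta0,
    var  = (sigma^2 / n) * sigma0^2 / (sigma^2 / n + sigma0^2))
}

# e.g. with an informative prior centred at 15.2 with sd 5.1
normal.posterior(7.87, 64.5, 240, theta0 = 15.2, sigma0 = 5.1)
```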
1. What are the implications of a large value of σ0²?
2. Find a 95% credible interval for θ when σ0² → ∞ and compare this interval with the confidence interval for θ. Note that the posterior in this case is θ | ȳ ~ N(ȳ, σ²/n). Thus a 95% credible interval is obtained as follows:
post.mean <- 7.87 # posterior mean
post.mean
## [1] 7.87
post.sd <- 64.5 / sqrt(240) # posterior standard deviation
post.sd
## [1] 4.163457
qnorm(c(0.025, 0.975), post.mean, post.sd)
## [1] -0.290226 16.030226
Note that this is the same as the confidence interval ȳ ± 1.96 σ/√n, which we usually calculate using:

7.87 + c(-1, 1) * 1.96 * 64.5 / sqrt(240)

## [1] -0.2903759 16.0303759

3. How does the Bayesian interval change if σ0 = 5.1 and the prior mean is θ0 = 15.2?

4. This example can be extended to illustrate how one can use a Bayesian approach to consider differences in means of multiple populations as in ANOVA. Suppose coaching programs were implemented in 4 schools, with school averages given in the following table.
School   i   Difference (ȳi·)   Squared standard error (si²/ni)
A        1   28                 15
B        2   8                  10
C        3   -3                 16
D        4   7                  11
Assume

yij | θi ~ N(θi, σi²)

where yij is the jth observation from the ith school, θi is the mean effect of school i and σi² is the known variance of school i. Further, assume normal priors for each θi so that

θi ~ N(7.87, 17.35²)   for all i = 1, . . . , 4.

Give a graphical illustration of the prior and posterior distributions using boxplots for each school. A graph is simply obtained by sampling from the priors:
A <- rnorm(100, 7.87, 17.35)
B <- rnorm(100, 7.87, 17.35)
C <- rnorm(100, 7.87, 17.35)
D <- rnorm(100, 7.87, 17.35)
par(mar = c(4, 4, 1, 1)) # tighter margins
boxplot(A, B, C, D, horizontal = TRUE, col = "lightblue")
To represent the posteriors, we assume σi²/ni = si²/ni and draw from the posteriors:
y.bar <- c(28, 8, -3, 7)
se2 <- c(15, 10, 16, 11)
theta.0 <- rep( 7.87, 4)
sigma.0 <- rep(17.35, 4)
post.means <- (sigma.0^2 / (sigma.0^2 + se2)) * y.bar +
( se2 / (sigma.0^2 + se2)) * theta.0
post.sd <- sqrt(sigma.0^2 * se2 / (sigma.0^2 + se2))
A <- rnorm(100, post.means[1], post.sd[1])
B <- rnorm(100, post.means[2], post.sd[2])
C <- rnorm(100, post.means[3], post.sd[3])
D <- rnorm(100, post.means[4], post.sd[4])
par(mar = c(4, 4, 1, 1)) # tighter margins
boxplot(A, B, C, D, horizontal = TRUE, col = "lightblue")
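With posterior draws in hand, comparisons between schools reduce to counting; for example, a sketch of the posterior probability that school A's mean effect exceeds school C's (names follow the code above):

```r
y.bar <- c(28, 8, -3, 7)
se2 <- c(15, 10, 16, 11)
theta.0 <- rep( 7.87, 4)
sigma.0 <- rep(17.35, 4)
post.means <- (sigma.0^2 / (sigma.0^2 + se2)) * y.bar +
  ( se2 / (sigma.0^2 + se2)) * theta.0
post.sd <- sqrt(sigma.0^2 * se2 / (sigma.0^2 + se2))

# draw from the two posteriors and count how often A beats C
A <- rnorm(10000, post.means[1], post.sd[1])
C <- rnorm(10000, post.means[3], post.sd[3])
mean(A > C)
```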
3 Bayesian regression
In this section we look at the familiar regression model Yi ~ N(α + β(xi − x̄), σ²) from a Bayesian perspective. To this end, one can consider priors α ~ N(μα, σα²) and β ~ N(μβ, σβ²), while σ is regarded as known. The function bayes.lin.reg() computes posterior distributions for α and β, and it enables us to specify either specific parameters for the prior or improper uniform ('flat') priors. If unspecified, the regression variance σ² is estimated as usual from the residuals. Consider 11 measurements on heart rate vs oxygen uptake.
O2 <- c(0.47, 0.75, 0.83, 0.98, 1.18, 1.29, 1.40, 1.60, 1.75, 1.90, 2.23)
HR <- c(94, 96, 94, 95, 104, 106, 108, 113, 115, 121, 131)
plot(HR, O2, xlab = "Heart Rate", ylab = "Oxygen uptake (Percent)")
x <- HR - mean(HR)
y <- O2
coef(summary(lm(y ~ x))) # usual regression fit
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.30727273 0.039282918 33.27840 9.840739e-11
## x 0.04265141 0.003379797 12.61952 5.009294e-07
Next, compute posterior distributions for the parameters under flat and informative priors. Explore the effect of different choices of prior on the final inference for the regression slope.
1. Fit the Bayesian regression model with flat priors.
bayes.lin.reg(y = O2, x = HR, slope.prior = "flat", intcpt.prior = "flat")
## Standard deviation of residuals: 0.13
## Posterior Mean Posterior Std. Deviation
## ————– ————————
## Intercept: 1.305 0.039253
## Slope: 0.04265 0.0033798
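As a sanity check, with flat priors the posterior means essentially reproduce the least-squares estimates; a sketch recomputing the fit on centred heart rate:

```r
O2 <- c(0.47, 0.75, 0.83, 0.98, 1.18, 1.29, 1.40, 1.60, 1.75, 1.90, 2.23)
HR <- c(94, 96, 94, 95, 104, 106, 108, 113, 115, 121, 131)

fit <- lm(O2 ~ I(HR - mean(HR)))
round(coef(fit), 5)  # slope 0.04265 matches the flat-prior posterior mean
```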
[Figure: prior, likelihood and posterior for the slope β and the intercept α, and predictions with 95% bounds]
2. Re-do it with prior parameters μα = 1.2, σα = 0.05, μβ = 0.04, σβ = 0.005:
bayes.lin.reg(O2, HR, slope.prior = "n", intcpt.prior = "n",
              ma0 = 1.2, mb0 = 0.04, sa0 = 0.05, sb0 = 0.005)
## Standard deviation of residuals: 0.13
## Posterior Mean Posterior Std. Deviation
## ————– ————————
## Intercept: 1.266 0.03089
## Slope: 0.04182 0.0028001
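The informative-prior slope can be cross-checked with the one-dimensional normal-normal update, treating the least-squares slope and its standard error as the likelihood summary (a sketch; the numbers are taken from the lm() output above):

```r
prior.mean <- 0.04;  prior.sd <- 0.005
ls.mean <- 0.04265141;  ls.sd <- 0.003379797  # from the least-squares fit

w <- prior.sd^2 / (prior.sd^2 + ls.sd^2)
post.mean <- w * ls.mean + (1 - w) * prior.mean
post.sd <- sqrt(prior.sd^2 * ls.sd^2 / (prior.sd^2 + ls.sd^2))
c(post.mean, post.sd)  # close to the reported 0.04182 and 0.0028001
```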
[Figure: prior, likelihood and posterior for the slope β and the intercept α, and predictions with 95% bounds]

Exercises
1. Consider question 2 from the tutorial problems. For both choices of prior, do the following:
(a) Draw a plot showing the prior and posterior distributions.

(b) Calculate a 95% credible interval.
(c) For the drug to be commercially successful, you need it to be effective in at least 3% of patients. What is your posterior probability that it will be successful?
2. Consider part 6 from Section 1. Estimate, via simulations, the coverage of the two interval estimates for n = 20 and θ ∈ {0.05, 0.10, . . . , 0.95}. Draw a plot comparing these results. What do you notice? What are the relative widths of these intervals?