Example: Bayesian logistic regression
Bayesian Statistics, Statistics 4224/5224, Spring 2021
February 2, 2021
The two-parameter normal model
Let $y_1, \ldots, y_n \mid \mu, \sigma^2 \sim$ iid $\text{Normal}(\mu, \sigma^2)$.

A conjugate prior

If we have a prior distribution given by
$$\mu \mid \sigma^2 \sim \text{Normal}(\mu_0,\, \sigma^2/\kappa_0)$$
$$\sigma^2 \sim \text{Inv-}\chi^2(\nu_0,\, \sigma_0^2)$$
then the resulting posterior is
$$\mu \mid \sigma^2, y \sim \text{Normal}(\mu_n,\, \sigma^2/\kappa_n)$$
$$\sigma^2 \mid y \sim \text{Inv-}\chi^2(\nu_n,\, \sigma_n^2)$$
where
$$\kappa_n = \kappa_0 + n \quad\text{and}\quad \nu_n = \nu_0 + n$$
and
$$\mu_n = \frac{\kappa_0 \mu_0 + n \bar{y}}{\kappa_0 + n}$$
and
$$\sigma_n^2 = \frac{1}{\nu_n}\left[\nu_0 \sigma_0^2 + (n-1)s^2 + \frac{\kappa_0 n}{\kappa_0 + n}(\mu_0 - \bar{y})^2\right]$$
where
$$s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2.$$
Posterior simulation
A simulated random sample $(\mu^{(s)}, \sigma^{2(s)})$, for $s = 1, \ldots, S$, based on
$$\sigma^{2(s)} \sim \text{Inv-}\chi^2(\nu_n, \sigma_n^2)$$
$$\mu^{(s)} \sim \text{Normal}(\mu_n, \sigma^{2(s)}/\kappa_n)$$
is produced by the following R code:

# draw sigma^2 from its scaled-inverse-chi^2 marginal posterior
sigma2.sim <- nu.n * sigma2.n / rchisq(S, df=nu.n)
# draw mu from its normal conditional posterior, given each sigma^2 draw
mu.sim <- rnorm(S, mu.n, sqrt(sigma2.sim / kappa.n))
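The code above assumes the posterior hyperparameters have already been computed. Here is a minimal sketch of that computation; the data vector y and the prior settings mu0, kappa0, nu0, and sigma2.0 below are illustrative assumptions, not values from the slides:

# Hypothetical data and prior parameters, for illustration only
y <- c(9.8, 10.2, 10.5, 9.6, 10.1)
mu0 <- 10; kappa0 <- 1; nu0 <- 1; sigma2.0 <- 1
S <- 1000

n <- length(y); ybar <- mean(y); s2 <- var(y)

# Posterior hyperparameters, following the formulas above
kappa.n <- kappa0 + n
nu.n <- nu0 + n
mu.n <- (kappa0 * mu0 + n * ybar) / (kappa0 + n)
sigma2.n <- (nu0 * sigma2.0 + (n - 1) * s2 +
             kappa0 * n / (kappa0 + n) * (mu0 - ybar)^2) / nu.n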
A noninformative prior
Letting κ0 → 0 and ν0 → 0 gives the improper prior distribution p(μ, σ2) ∝ 1/σ2, which yields the posterior distribution
$$\mu \mid \sigma^2, y \sim \text{Normal}(\bar{y},\, \sigma^2/n)$$
$$\sigma^2 \mid y \sim \text{Inv-}\chi^2(n-1,\, s^2).$$

The marginal posterior distribution of μ is given by
$$\left.\frac{\mu - \bar{y}}{s/\sqrt{n}}\,\right|\, y \sim t_{n-1}.$$
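Simulation under this prior needs no prior settings at all; a sketch, reusing the illustrative data from above:

# Posterior simulation under p(mu, sigma^2) proportional to 1/sigma^2
y <- c(9.8, 10.2, 10.5, 9.6, 10.1)   # same illustrative data as before
S <- 1000
n <- length(y); ybar <- mean(y); s2 <- var(y)
sigma2.sim <- (n - 1) * s2 / rchisq(S, df = n - 1)
mu.sim <- rnorm(S, ybar, sqrt(sigma2.sim / n))
# Standardized draws (mu.sim - ybar) / (sqrt(s2 / n)) should look t_{n-1}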
Homework 2
The second homework assignment is nominally due before class on Tuesday, February 9. Papers can be submitted to Courseworks through the end of the day on Wednesday, February 10, after which no late homework will be accepted.

Note that the 'HW2 Hints' document was recently updated — make sure you have the most current version!
1. Problem 1 is a straightforward probability calculation. The R function pgamma is likely to prove useful on this problem.
2. Let the number of home runs Reggie Jackson hits in n0 regular season games be Poisson(n0θ0), and the number of home runs he hits in n1 World Series games be Poisson(n1θ1). Assign independent noninformative priors to θ0 and θ1, and plot their posterior densities on a common set of axes. Quantify your degree of belief that Reggie does indeed perform better in the World Series by the posterior probability Pr(θ1 > θ0|y).
3. Problem 3 is a straightforward probability calculation — solve it using the χ2 probability function pchisq() in R. Remember that if the random variable X has the scaled-inverse-χ2 distribution with ν > 0 degrees of freedom and scale parameter s2, written X ∼ Inv-χ2(ν, s2), then
$$\frac{\nu s^2}{X} \sim \chi^2_\nu.$$
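To illustrate the identity (with arbitrary numbers, not the homework's), the event X ≤ x is equivalent to νs²/X ≥ νs²/x, so its probability can be read off the χ² scale:

# Illustration only: arbitrary values of nu, s2, and x
nu <- 10; s2 <- 4; x <- 5
1 - pchisq(nu * s2 / x, df = nu)   # Pr(X <= x) for X ~ Inv-chi^2(nu, s2)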
4. This problem has three parts.
(a) This will be two shifted and re-scaled t-distributions.
(b) Letting μ1 = mean weight change on placebo and μ2 = mean weight change on the treatment, find Pr(μ2 < μ1|y).
(c) Let $\tilde{y}_1$ denote the weight change for a 7th subject in the placebo group, and $\tilde{y}_2$ that for a 7th subject in the treatment group. Plot the posterior predictive densities $p(\tilde{y}_1|y)$ and $p(\tilde{y}_2|y)$ on a common set of axes, and find $\Pr(\tilde{y}_2 < \tilde{y}_1|y)$, the posterior predictive probability that the subject getting the treatment loses more weight.
Example: Analysis of a bioassay experiment
Here we discuss the example described in Sections 3.7–3.8 of Bayesian Data Analysis, by Gelman et al.; R code to reproduce the calculations can be found at
Courseworks → Files → Examples → BayesLogistic .
In this example, twenty animals were tested, five at each of four dose levels.
Dose, xi (log g/ml)   Number of animals, ni   Number of deaths, yi
      −0.86                     5                      0
      −0.30                     5                      1
      −0.05                     5                      3
       0.73                     5                      5
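In R, the data can be entered as follows (variable names are my own, chosen to match the notation):

x <- c(-0.86, -0.30, -0.05, 0.73)   # dose, log g/ml
n <- rep(5, 4)                      # animals per dose group
y <- c(0, 1, 3, 5)                  # deaths per dose group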
Modeling the dose-response relation
It seems reasonable to model the outcomes within each group i as independent with equal probabilities, which implies the data points yi are binomially distributed:
$$y_i \mid \theta_i \sim \text{Binomial}(n_i, \theta_i)$$
where θi is the probability of death for animals given dose xi.
The standard approach is to use a transformation of the θ’s, such as the logit, in the dose-response relation:
$$\text{logit}(\theta_i) = \log\!\left(\frac{\theta_i}{1 - \theta_i}\right) = \alpha + \beta x_i\,.$$
This is called a logistic regression model.
The likelihood
We can write the sampling model, or likelihood, for each group i in terms of the parameters α and β as
$$p(y_i \mid \alpha, \beta) \propto [\text{logit}^{-1}(\alpha + \beta x_i)]^{y_i}\,[1 - \text{logit}^{-1}(\alpha + \beta x_i)]^{n_i - y_i}.$$

The model is characterized by the parameters α and β, whose joint posterior distribution is
$$p(\alpha, \beta \mid y) \propto p(\alpha, \beta)\, p(y \mid \alpha, \beta) = p(\alpha, \beta) \prod_{i=1}^{k} p(y_i \mid \alpha, \beta)\,.$$

The prior distribution

We present an analysis based on a prior distribution for α and β that is independent and locally uniform in the two parameters; that is, p(α, β) ∝ 1.
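The unnormalized log posterior takes a few lines of R; this is a sketch assuming x, n, and y as entered above (under the flat prior, the log prior contributes nothing):

# Unnormalized log posterior of (alpha, beta); logit^{-1} is plogis() in R
log.post <- function(alpha, beta) {
  theta <- plogis(alpha + beta * x)
  sum(dbinom(y, size = n, prob = theta, log = TRUE))
}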
A rough estimate of the parameters
We will compute the joint posterior distribution p(α, β|y) at a grid of points (α, β), but before doing so, it is a good idea to get a rough estimate of (α, β) so we know where to look. Using the glm() function in R, we find the maximum likelihood estimate of (α, β) is $(\hat{\alpha}, \hat{\beta}) = (0.8, 7.8)$, with standard errors of 1.0 and 4.9 for α and β, respectively.
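The fit can be obtained along the following lines, using R's standard two-column (successes, failures) response for a binomial glm():

# Maximum likelihood fit of the logistic regression
fit <- glm(cbind(y, n - y) ~ x, family = binomial)
summary(fit)   # coefficients give the estimates and standard errors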
Obtaining a contour plot of the joint posterior density
We are now ready to compute the posterior density at a grid of points (α,β). After some experimentation, we use the range (α, β) ∈ [−4, 8] × [−5, 45], which captures almost all the mass of the posterior distribution.
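One way to carry out the grid computation, continuing the sketch from log.post above (the 200-point resolution is an arbitrary choice):

# Evaluate the posterior on a grid over the stated range
alpha.grid <- seq(-4, 8, length.out = 200)
beta.grid <- seq(-5, 45, length.out = 200)
lp <- outer(alpha.grid, beta.grid, Vectorize(log.post))
post <- exp(lp - max(lp))   # subtract the max for numerical stability
contour(alpha.grid, beta.grid, post, xlab = "alpha", ylab = "beta")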
Sampling from the joint posterior distribution
We sample 1000 random draws $(\alpha^s, \beta^s)$ from the posterior distribution using the following procedure (an R sketch follows the steps):
1. Compute the marginal posterior of α by numerically summing over β in the discrete distribution computed on the grid.
2. For s = 1, ..., S = 1000:

(a) Draw $\alpha^s$ from the discretely computed p(α|y).

(b) Draw $\beta^s$ from the discrete conditional distribution $p(\beta|\alpha^s, y)$.

(c) For each of $\alpha^s$ and $\beta^s$, add a uniform random jitter centered at zero with a width equal to the spacing of the sampling grid (this gives the simulation draws a continuous distribution).
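A sketch of this procedure in R, continuing from the post matrix above (rows index α, columns index β):

S <- 1000
# Step 1: marginal of alpha, by summing the grid posterior over beta
p.alpha <- rowSums(post)
# Step 2(a): draw row indices from the discrete marginal of alpha
i <- sample(length(alpha.grid), S, replace = TRUE, prob = p.alpha)
# Step 2(b): draw beta from the discrete conditional given each alpha
j <- sapply(i, function(ii) sample(length(beta.grid), 1, prob = post[ii, ]))
# Step 2(c): jitter within grid cells for a continuous distribution
d.alpha <- diff(alpha.grid)[1]; d.beta <- diff(beta.grid)[1]
alpha.sim <- alpha.grid[i] + runif(S, -d.alpha/2, d.alpha/2)
beta.sim <- beta.grid[j] + runif(S, -d.beta/2, d.beta/2)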
The posterior distribution of the LD50
A parameter of common interest in bioassay studies is the LD50 — the dose level at which the probability of death is 50%.
In our logistic model, a 50% survival rate means
$$\text{LD50:} \quad E\!\left(\frac{y_i}{n_i}\right) = \text{logit}^{-1}(\alpha + \beta x_i) = 0.5\,;$$
thus, $\alpha + \beta x_i = \text{logit}(0.5) = 0$, and the LD50 is $x_i = -\alpha/\beta$.
In the Bayesian approach, computing the posterior distribution of any such summary is straightforward.
To simulate the posterior distribution of the LD50, we just compute $-\alpha^s/\beta^s$ for each of the 1000 draws of (α, β).
In the context of this example, LD50 is a meaningless concept if β ≤ 0, in which case increasing the drug dose does not cause the probability of death to increase.
We summarize the inference on the LD50 scale by reporting two results: (1) Pr(β > 0|y) — that is, the posterior probability that the drug is harmful — and (2) the posterior distribution for the LD50 conditional on β > 0.
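Continuing the R sketch with the simulated draws from above:

# (1) Posterior probability that the drug is harmful
mean(beta.sim > 0)
# (2) Posterior distribution of the LD50, conditional on beta > 0
ld50.sim <- -alpha.sim[beta.sim > 0] / beta.sim[beta.sim > 0]
hist(ld50.sim, breaks = 30, main = "Posterior of LD50 given beta > 0")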
Summary of elementary modeling and computation
A general strategy for computation of simple Bayesian posterior distributions:
1. Write the likelihood part of the model, p(y|θ), ignoring any factors that are free of θ.
2. Write the posterior density, p(θ|y) ∝ p(θ)p(y|θ).

If prior information is well formulated, include it in p(θ). Otherwise, use a weakly informative prior distribution or temporarily set p(θ) ∝ constant, with the understanding that the prior density can be altered later.
3. Create a crude estimate of the parameters, θ, for use as a starting point and a comparison to the computation in the next step.
4. Draw simulations $\theta^1, \ldots, \theta^S$ from the posterior distribution. Use the sample draws to compute the posterior distribution of any functions of θ that may be of interest.
5. If any predictive quantities $\tilde{y}$ are of interest, simulate $\tilde{y}^1, \ldots, \tilde{y}^S$ by drawing each $\tilde{y}^s$ from the sampling distribution conditional on the drawn value $\theta^s$; that is, $\tilde{y}^s \sim p(\tilde{y} \mid \theta^s)$.
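For example, in the two-parameter normal model above, posterior predictive draws for a single new observation follow directly from the simulated $(\mu^{(s)}, \sigma^{2(s)})$ pairs (a sketch reusing mu.sim and sigma2.sim from earlier):

# One predictive draw per posterior draw of (mu, sigma^2)
y.tilde <- rnorm(S, mu.sim, sqrt(sigma2.sim))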