
Example: Bayesian logistic regression
Bayesian Statistics, Statistics 4224/5224, Spring 2021
February 2, 2021

The two-parameter normal model
Let y1, . . . , yn | μ, σ² ∼ iid Normal(μ, σ²).

A conjugate prior

If we have a prior distribution given by

μ|σ² ∼ Normal(μ0, σ²/κ0)
σ² ∼ Inv-χ²(ν0, σ0²)

then the resulting posterior is

μ|σ², y ∼ Normal(μn, σ²/κn)
σ²|y ∼ Inv-χ²(νn, σn²)

where

κn = κ0 + n   and   νn = ν0 + n,

μn = (κ0 μ0 + n ȳ) / (κ0 + n),

and

σn² = (1/νn) [ ν0 σ0² + (n − 1) s² + (κ0 n / (κ0 + n)) (μ0 − ȳ)² ],

where

s² = 1/(n − 1) ∑_{i=1}^n (yi − ȳ)².
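In R, the posterior parameters can be computed directly from the data and the prior hyperparameters. Here is a minimal sketch with made-up data and prior values (y, mu0, kappa0, nu0, and sigma2.0 are placeholders); the variable names are chosen to match the simulation code below:

y <- c(2.1, 1.4, 3.3, 0.7, 2.9)                  # hypothetical data
n <- length(y); ybar <- mean(y); s2 <- var(y)
mu0 <- 0; kappa0 <- 1; nu0 <- 1; sigma2.0 <- 1   # hypothetical prior values
kappa.n <- kappa0 + n
nu.n <- nu0 + n
mu.n <- (kappa0 * mu0 + n * ybar) / kappa.n
sigma2.n <- (nu0 * sigma2.0 + (n - 1) * s2 +
             kappa0 * n / kappa.n * (mu0 - ybar)^2) / nu.n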

Posterior simulation
A simulated random sample {(μ(s), σ²(s)) : s = 1, . . . , S} based on

σ²(s) ∼ Inv-χ²(νn, σn²)
μ(s) ∼ Normal(μn, σ²(s)/κn)

for s = 1, . . . , S is produced by the following R code:
sigma2.sim <- nu.n * sigma2.n / rchisq(S, df=nu.n)
mu.sim <- rnorm(S, mu.n, sqrt(sigma2.sim / kappa.n))

A noninformative prior

Letting κ0 → 0 and ν0 → 0 gives the improper prior distribution p(μ, σ²) ∝ 1/σ², which yields the posterior distribution

μ|σ², y ∼ Normal(ȳ, σ²/n)
σ²|y ∼ Inv-χ²(n − 1, s²).

The marginal posterior distribution of μ is given by

(μ − ȳ) / (s/√n) | y ∼ t_{n−1}.

Homework 2

The second homework assignment is nominally due before class on Tuesday, February 9. Papers can be submitted to Courseworks through the end of the day on Wednesday, February 10, after which no late homework will be accepted.

Note that the ‘HW2 Hints’ document was recently updated; make sure you have the most current version!

1. Problem 1 is a straightforward probability calculation. The R function pgamma is likely to prove useful on this problem.

2. Let the number of home runs Reggie Jackson hits in n0 regular season games be Poisson(n0 θ0), and the number of home runs he hits in n1 World Series games be Poisson(n1 θ1). Assign independent noninformative priors to θ0 and θ1, and plot their posterior densities on a common set of axes. Quantify your degree of belief that Reggie does indeed perform better in the World Series by the posterior probability Pr(θ1 > θ0 | y).
3. Problem 3 is a straightforward probability calculation; solve it using the χ² probability function pchisq() in R. Remember that if the random variable X has the scaled-inverse-χ² distribution with ν > 0 degrees of freedom and scale parameter s², written X ∼ Inv-χ²(ν, s²), then

νs²/X ∼ χ²_ν.
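To see how this relation turns a scaled-inverse-χ² probability into a single pchisq() call, here is a minimal sketch; the values of nu, s2, and the cutoff c are made up for illustration:

# Pr(X <= c) for X ~ Inv-chi^2(nu, s2), using nu*s2/X ~ chisq(nu);
# since X > 0, the event X <= c is the same as nu*s2/X >= nu*s2/c
nu <- 10; s2 <- 4; c <- 5                # hypothetical values
1 - pchisq(nu * s2 / c, df = nu)         # Pr(X <= c)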

4. This problem has three parts.
(a) This will be two shifted and re-scaled t-distributions.
(b) Letting μ1 = mean weight change on placebo and μ2 = mean weight change on the treatment, find Pr(μ2 < μ1 | y).

(c) Let ỹ1 denote the weight change for a 7th subject in the placebo group, and ỹ2 that for a 7th subject in the treatment group. Plot the posterior predictive densities p(ỹ1|y) and p(ỹ2|y) on a common set of axes, and find Pr(ỹ2 < ỹ1 | y), the posterior predictive probability that the subject getting the treatment loses more weight.

Example: Analysis of a bioassay experiment

Here we discuss the example described in Sections 3.7–3.8 of Bayesian Data Analysis, by Gelman et al.; R code to reproduce the calculations can be found at Courseworks → Files → Examples → BayesLogistic.

In this example, twenty animals were tested, five at each of four dose levels.

Dose, xi      Number of      Number of
(log g/ml)    animals, ni    deaths, yi
 −0.86             5              0
 −0.30             5              1
 −0.05             5              3
  0.73             5              5

Modeling the dose-response relation

It seems reasonable to model the outcomes within each group i as independent with equal probabilities, which implies the data points yi are binomially distributed:

yi|θi ∼ Binomial(ni, θi)

where θi is the probability of death for animals given dose xi.

The standard approach is to use a transformation of the θ's, such as the logit, in the dose-response relation:

logit(θi) = log(θi / (1 − θi)) = α + β xi.

This is called a logistic regression model.

The likelihood

We can write the sampling model, or likelihood, for each group i in terms of the parameters α and β as

p(yi|α, β) ∝ [logit⁻¹(α + β xi)]^yi [1 − logit⁻¹(α + β xi)]^(ni − yi).

The model is characterized by the parameters α and β, whose joint posterior distribution is

p(α, β|y) ∝ p(α, β) p(y|α, β) = p(α, β) ∏_{i=1}^k p(yi|α, β).

The prior distribution

We present an analysis based on a prior distribution for α and β that is independent and locally uniform in the two parameters, that is, p(α, β) ∝ 1.

A rough estimate of the parameters

We will compute the joint posterior distribution p(α, β|y) at a grid of points (α, β), but before doing so, it is a good idea to get a rough estimate of (α, β) so we know where to look. Using the glm() function in R, we find the maximum likelihood estimate of (α, β) is (α̂, β̂) = (0.8, 7.8), with standard errors of 1.0 and 4.9 for α and β, respectively.

Obtaining a contour plot of the joint posterior density

We are now ready to compute the posterior density at a grid of points (α, β). After some experimentation, we use the range (α, β) ∈ [−4, 8] × [−5, 45], which captures almost all the mass of the posterior distribution.

Sampling from the joint posterior distribution

We sample 1000 random draws (α(s), β(s)) from the posterior distribution using the following procedure:

1. Compute the marginal posterior of α by numerically summing over β in the discrete distribution computed on the grid.

2. For s = 1, . . . , S = 1000:

(a) Draw α(s) from the discretely computed p(α|y).

(b) Draw β(s) from the discrete conditional distribution p(β|α(s), y).

(c) For each of α(s) and β(s), add a uniform random jitter centered at zero with a width equal to the spacing of the sampling grid (this gives the simulation draws a continuous distribution).

The posterior distribution of the LD50

A parameter of common interest in bioassay studies is the LD50, the dose level at which the probability of death is 50%. In our logistic model, a 50% death rate means

LD50: E(yi/ni) = logit⁻¹(α + β xi) = 0.5;

thus, α + β xi = logit(0.5) = 0, and the LD50 is xi = −α/β.
Computing the posterior distribution of any such summary is straightforward in the Bayesian approach. To simulate the posterior distribution of the LD50, we just compute −α(s)/β(s) for each of the 1000 draws of (α, β).

In the context of this example, the LD50 is a meaningless concept if β ≤ 0, in which case increasing the drug dose does not cause the probability of death to increase. We therefore summarize the inference on the LD50 scale by reporting two results: (1) Pr(β > 0|y), the posterior probability that the drug is harmful, and (2) the posterior distribution of the LD50 conditional on β > 0.
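The grid computation and sampling procedure described above can be reproduced in a few lines of R. The sketch below is an illustration under my own choices of grid resolution and variable names, not the course's BayesLogistic script:

# Bioassay data (Gelman et al., Sections 3.7-3.8)
x <- c(-0.86, -0.30, -0.05, 0.73)
n <- c(5, 5, 5, 5)
y <- c(0, 1, 3, 5)

# Rough estimate of (alpha, beta) by maximum likelihood
fit <- glm(cbind(y, n - y) ~ x, family = binomial)
coef(fit)   # approximately (0.8, 7.8)

# Unnormalized posterior on a grid, with p(alpha, beta) proportional to 1
A <- seq(-4, 8, length.out = 200)
B <- seq(-5, 45, length.out = 200)
log.post <- matrix(NA, length(A), length(B))
for (i in seq_along(A))
  for (j in seq_along(B))
    log.post[i, j] <- sum(dbinom(y, n, plogis(A[i] + B[j] * x), log = TRUE))
post <- exp(log.post - max(log.post))
contour(A, B, post)   # contour plot of the joint posterior density

# Draw S = 1000 samples: alpha from its marginal, beta given alpha, then jitter
S <- 1000
i <- sample(seq_along(A), S, replace = TRUE, prob = rowSums(post))
j <- sapply(i, function(row) sample(seq_along(B), 1, prob = post[row, ]))
alpha <- A[i] + runif(S, -1, 1) * (A[2] - A[1]) / 2
beta  <- B[j] + runif(S, -1, 1) * (B[2] - B[1]) / 2

# Summaries: Pr(beta > 0 | y) and the LD50 draws given beta > 0
mean(beta > 0)
hist(-alpha[beta > 0] / beta[beta > 0], main = "LD50")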

Summary of elementary modeling and computation
A general strategy for computation of simple Bayesian posterior distributions:
1. Write the likelihood part of the model, p(y|θ), ignoring any factors that are free of θ.
2. Write the posterior density, p(θ|y) ∝ p(θ)p(y|θ).
If prior information is well-formulated, include it in p(θ).
Otherwise, use a weakly informative prior distribution or temporarily set p(θ) ∝ constant, with the understanding that the prior density can be altered later.

3. Create a crude estimate of the parameters, θ, for use as a starting point and a comparison to the computation in the next step.
4. Draw simulations θ(1), . . . , θ(S) from the posterior distribution. Use the sampled draws to compute the posterior density of any functions of θ that may be of interest.

5. If any predictive quantities ỹ are of interest, simulate ỹ(1), . . . , ỹ(S) by drawing each ỹ(s) from the sampling distribution conditional on the drawn value θ(s), that is, ỹ(s) ∼ p(ỹ|θ(s)).
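As a concrete instance of steps 4 and 5, here is a minimal R sketch using the noninformative-prior normal model from earlier; the data vector y is a made-up placeholder:

y <- c(2.1, 1.4, 3.3, 0.7, 2.9)             # hypothetical data
n <- length(y); S <- 1000
# Step 4: posterior draws under p(mu, sigma^2) proportional to 1/sigma^2
sigma2.sim <- (n - 1) * var(y) / rchisq(S, df = n - 1)
mu.sim <- rnorm(S, mean(y), sqrt(sigma2.sim / n))
# Step 5: one predictive draw per posterior draw
y.tilde <- rnorm(S, mu.sim, sqrt(sigma2.sim))
quantile(y.tilde, c(0.025, 0.975))          # posterior predictive interval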