
The Gibbs Sampler
Bayesian Statistics, Statistics 4224/5224, Spring 2021
March 11, 2021

Homework 4
Homework 4 is nominally due before class on Tue, March 16. Courseworks will accept submissions through the end of the day on Wed, March 17, after which no further submissions will be accepted.
1. In problem 1 you will use the Bayesian CLT to approximate a posterior that is, in truth, a truncated gamma distribution. This is similar to the first example discussed on Feb 23, with R code posted as ‘Example04 NormalApprox’ (though here it is a gamma rather than a beta posterior distribution). You will have to solve for the mean and variance of the normal approximation by finding the MAP estimator (posterior mode) and the inverse Fisher information. In part (d), plot the exact posterior and the normal approximation on the same axes.
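A generic R sketch of this mode-plus-curvature recipe (the log_post below is a stand-in Gamma(5, 2) log density, not the homework's truncated gamma posterior; the search interval and step size h are arbitrary choices):

  # find the posterior mode numerically, then approximate the posterior
  # variance by the negative inverse second derivative of the log posterior
  log_post <- function(theta) dgamma(theta, shape = 5, rate = 2, log = TRUE)   # stand-in
  post_mode <- optimize(log_post, interval = c(0, 50), maximum = TRUE)$maximum
  h <- 1e-4
  d2 <- (log_post(post_mode + h) - 2 * log_post(post_mode) + log_post(post_mode - h)) / h^2
  post_var <- -1 / d2
  # normal approximation: theta | y is approximately Normal(post_mode, post_var)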

2. We did this problem together; the R code was posted as ‘Example10b RS IS’ and discussed on Feb 23. You should feel free to use that code, but at least change the seed for your random number generator — it’s no fun to get exactly the same answer. Note that part (d) calls for a rejection sampler with a fixed number of proposals and asks you to report the effective sample size, which is simply the number of those proposals that are accepted. Which method is more efficient in your implementation, rejection sampling or importance sampling?
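For the flavor of part (d), here is a toy rejection sampler in R (the Beta(3, 2) target and Uniform(0, 1) proposals are illustrative only, not the homework's distributions):

  # rejection sampler with a fixed number of proposals; the effective sample
  # size is the number of proposals that end up accepted
  S <- 10000
  M <- dbeta(2/3, 3, 2)                     # density bound (mode of Beta(3, 2) is 2/3)
  x <- runif(S)                             # proposals from Uniform(0, 1)
  accept <- runif(S) < dbeta(x, 3, 2) / M   # accept with probability p(x) / (M g(x))
  draws <- x[accept]
  ess <- sum(accept)                        # effective sample size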
3. On problem 3 I was nice and posed the problem as ‘Show that the full conditionals are …’ rather than ‘Find the full conditionals for ….’ Arrange the plots of the posteriors for the σi² in a 2 × 5 display, and use a consistent scale (same xlim and ylim on each plot) to facilitate comparison.

4. Problem 4 is like the example at the end of today’s lecture, with R code provided as ‘Example11c Gibbs.’ The code is basically ready to go for part (a); you just need to update the data entry part. Parts (b) and (c) call for some posterior predictions. If you’re cool, you’ll plot the posteriors for θ̃ and ỹ on the same axes, so one can clearly see the additional uncertainty about a single observation versus a long-run average.
5. Back to the NBA free throws data! Here we consider an informative prior based on each player’s overall free throw percentage. Part (c) requires a ‘Metropolised Gibbs sampler,’ which is quite challenging to implement and is not assigned.

The remainder of these notes summarizes Sections 11.1, 11.3 and 11.6 of Bayesian Data Analysis, by Andrew Gelman et al.
Gibbs sampler
Suppose the parameter vector θ has been divided into d components or subvectors, θ = (θ1, …, θd).

Each iteration of the Gibbs sampler cycles through the subvectors of θ, drawing each subset conditional on the value of all the others; there are thus d steps in iteration t.

At each iteration t, an ordering of the d subvectors of θ is chosen and, in turn, each θj^t is sampled from the conditional distribution given all the other components of θ.

The algorithm
Given a starting point θ^0 = (θ1^0, …, θd^0) and a target density p(θ1, …, θd | y), the d-dimensional Markov chain

{ θ^t = (θ1^t, …, θd^t) : t = 1, …, T },

for which p(θ | y) is the unique stationary density, is generated by the following update rule.

Given θ^{t−1} = (θ1^{t−1}, …, θd^{t−1}), generate θ^t = (θ1^t, …, θd^t) by drawing, for j = 1, …, d,

θj^t ∼ p(θj | θ1^t, …, θ_{j−1}^t, θ_{j+1}^{t−1}, …, θd^{t−1}, y).

Thus each subvector θj is updated conditional on the latest values of the other components of θ, which are the iteration-t values for the components already updated and the iteration t−1 values for the others.
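In R, one sweep of this update rule can be sketched as a loop over the d components; draw_full_conditional below is a hypothetical placeholder for whatever routine samples component j from its full conditional:

  # one iteration (sweep) of a generic Gibbs sampler
  gibbs_sweep <- function(theta, y, draw_full_conditional) {
    d <- length(theta)
    for (j in 1:d) {
      # component j is drawn given the most recent values of all the others
      theta[j] <- draw_full_conditional(j, theta, y)
    }
    theta
  }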

Convergence properties
As T → ∞, the marginal distribution of (θ1^T, …, θd^T) approaches the target distribution p(θ1, …, θd | y), no matter the starting value (θ1^0, …, θd^0).

More importantly, for any scalar-valued function h,

(1/T) Σ_{t=1}^T h(θ^t) → E[h(θ) | y] = ∫ h(θ) p(θ | y) dθ   almost surely, as T → ∞,

justifying the use of the Gibbs sampler to approximate features of the posterior distribution.

Full conditionals
Definition: The conditional distributions

p(θj | θ(−j), y),   where   θ(−j) = (θ1, …, θ_{j−1}, θ_{j+1}, …, θd),

are called the full conditional distributions, or full conditionals, of the joint posterior distribution p(θ1, …, θd | y).
For many problems involving standard statistical models, it is possible to sample directly from most or all of the posterior full conditionals.

Example
Consider a single observation (y1, y2) from a bivariate normally distributed population with unknown mean θ = (θ1, θ2) and known covariance matrix

[ 1  ρ ]
[ ρ  1 ] .

With a uniform prior distribution on θ, the posterior distribution of (θ1, θ2) given y is bivariate normal with mean (y1, y2) and the same covariance matrix.

Although it is simple to draw directly from the joint posterior distribution of (θ1, θ2), for the purpose of exposition we demonstrate the Gibbs sampler here.

From properties of the multivariate normal distribution, the full conditionals are

θ1 | θ2, y ∼ Normal( y1 + ρ(θ2 − y2), 1 − ρ² )
θ2 | θ1, y ∼ Normal( y2 + ρ(θ1 − y1), 1 − ρ² ) .
The Gibbs sampler proceeds by alternately sampling from these two normal distributions.
Consider the case ρ = 0.8, data (y1, y2) = (0, 0), and four independent sequences started at (±2.5, ±2.5).
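A minimal R sketch of this two-component Gibbs sampler, in the spirit of the posted Example11a code (function and object names here are illustrative, not the posted code itself):

  # Gibbs sampler for the bivariate normal posterior of (theta1, theta2)
  gibbs_bvn <- function(n_iter, start, y = c(0, 0), rho = 0.8) {
    theta <- matrix(NA, nrow = n_iter, ncol = 2)
    th1 <- start[1]; th2 <- start[2]
    for (t in 1:n_iter) {
      th1 <- rnorm(1, y[1] + rho * (th2 - y[2]), sqrt(1 - rho^2))   # draw theta1 | theta2
      th2 <- rnorm(1, y[2] + rho * (th1 - y[1]), sqrt(1 - rho^2))   # draw theta2 | theta1
      theta[t, ] <- c(th1, th2)
    }
    theta
  }

  # four independent chains from the overdispersed starting values
  starts <- list(c(2.5, 2.5), c(-2.5, 2.5), c(2.5, -2.5), c(-2.5, -2.5))
  chains <- lapply(starts, function(s) gibbs_bvn(1000, s))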

Figure: Four independent sequences of the Gibbs sampler with overdispersed starting values. Three panels show the first 10 updates, the first 200 updates, and the final 500 draws of the 4 Gibbs samplers; each panel plots theta1 (horizontal axis) against theta2 (vertical axis), both running from −4 to 4.
The R code to produce these figures is given in
Courseworks → Files → Examples → Example11a Gibbs .
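The ergodic-average result above can then be applied to the simulated draws; a small sketch, assuming the chains object from the Gibbs sketch above and discarding the first 500 draws of each chain as warm-up:

  # pool the final 500 draws from each of the four chains
  post_draws <- do.call(rbind, lapply(chains, function(ch) ch[501:1000, ]))
  colMeans(post_draws)        # approximates E[theta1 | y] and E[theta2 | y]
  mean(post_draws[, 1] > 1)   # approximates Pr(theta1 > 1 | y)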

Using Gibbs and Metropolis as building blocks
The Gibbs sampler and the Metropolis algorithm can be used in various combinations to sample from complicated distributions. If some of the full conditionals for a joint posterior can be sampled from directly and some cannot, then the parameters can be updated one at a time, with the Gibbs sampler used where possible and one-dimensional Metropolis updating used otherwise.
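As a sketch, a single one-dimensional Metropolis update that could stand in for a direct draw of one component might look like this in R (log_full_cond and the jump scale are hypothetical placeholders):

  # one Metropolis update for component j, used where the full conditional
  # p(theta_j | theta_(-j), y) can only be evaluated (up to a constant), not sampled
  metropolis_step <- function(j, theta, y, log_full_cond, jump_sd = 1) {
    prop <- theta
    prop[j] <- rnorm(1, theta[j], jump_sd)                 # symmetric random-walk proposal
    log_r <- log_full_cond(prop[j], theta, y) - log_full_cond(theta[j], theta, y)
    if (log(runif(1)) < log_r) theta <- prop               # accept, otherwise keep current
    theta
  }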
Gibbs sampler with approximations
For some problems, sampling from some or all of the full conditionals p(θj | θ(−j), y) is impossible, but one can construct approximations from which sampling is possible. The general form of the Metropolis-Hastings algorithm can be used to compensate for the approximation.

Example: Hierarchical normal model
Extending the problem discussed on Feb 09 (Example05b) by allowing an unknown data variance σ².
Data from a small experiment
Coagulation time in seconds for blood drawn from 24 animals randomly allocated to four different diets.
Diet Measurements
A 62, 60, 63, 59
B 63, 67, 71, 64, 65, 66
C 68, 66, 71, 67, 68, 68
D 56, 62, 60, 61, 63, 64, 63, 59

The model
Under the hierarchical normal model, data yij, i = 1, …, nj and j = 1, …, J, are independently normally distributed within each of J groups, with means θj and common variance σ².

The total number of observations is n = Σ_{j=1}^J nj.

The group means are assumed to follow a normal distribution with unknown mean μ and variance τ².
A uniform prior distribution is assumed for (μ, log σ, τ ), with σ > 0 and τ > 0; equivalently p(μ, σ, τ ) ∝ 1/σ.

The joint posterior density of all the parameters is
p(θ, μ, σ, τ | y) ∝ σ⁻¹ ∏_{j=1}^J N(θj | μ, τ²) ∏_{j=1}^J ∏_{i=1}^{nj} N(yij | θj, σ²) .
Starting values
Reasonable starting values are easy to come by in this problem. Take θj^0 to be the sample mean ȳj for each j, take μ^0 to be the average of the θj^0, and τ^0 to be their standard deviation.

We can take σ^0 to be the average of the within-group sample standard deviations.
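In R, with the coagulation data held in a response vector y and a group indicator group, these starting values might be computed as follows (the object names are illustrative):

  # coagulation data and starting values for the Gibbs sampler
  y <- c(62, 60, 63, 59,
         63, 67, 71, 64, 65, 66,
         68, 66, 71, 67, 68, 68,
         56, 62, 60, 61, 63, 64, 63, 59)
  group <- rep(1:4, times = c(4, 6, 6, 8))

  theta0 <- tapply(y, group, mean)       # group sample means
  mu0    <- mean(theta0)                 # average of the theta0_j
  tau0   <- sd(theta0)                   # standard deviation of the theta0_j
  sigma0 <- mean(tapply(y, group, sd))   # average within-group standard deviation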

Gibbs sampler
The conditional distributions for this model all have simple conjugate forms:
Conditional posterior distribution of each θj. The factors in the joint posterior density that involve θj are the Normal(μ, τ²) prior distribution and the normal likelihood in the jth group, yij for i = 1, …, nj. The conditional posterior distribution of each θj given the other parameters in the model is

θj | θ(−j), μ, σ, τ, y ∼ Normal(θ̂j, Vθj) ,

where, with ȳj the sample mean in group j,

θ̂j = (μ/τ² + nj ȳj/σ²) / (1/τ² + nj/σ²)   and   Vθj = 1 / (1/τ² + nj/σ²) .

Conditional posterior distribution of μ. Conditional on y and the other model parameters, μ has a normal distribution determined by the θj’s:

μ | θ, σ, τ, y ∼ Normal(μ̂, τ²/J) ,

where

μ̂ = (1/J) Σ_{j=1}^J θj .
Conditional posterior of σ². The conditional posterior for σ² has the form corresponding to a normal variance with known mean; there are n observations yij with means θj. The conditional posterior distribution is

σ² | θ, μ, τ, y ∼ Inv-χ²(n, σ̂²) ,

where

σ̂² = (1/n) Σ_{j=1}^J Σ_{i=1}^{nj} (yij − θj)² .

Conditional posterior distribution of τ². Conditional on the data and the other model parameters, τ² has a scaled inverse-χ² distribution, with parameters depending only on μ and θ:

τ² | θ, μ, σ, y ∼ Inv-χ²(J − 1, τ̂²) ,

with

τ̂² = (1/(J − 1)) Σ_{j=1}^J (θj − μ)² .

The expressions for τ² have J − 1 degrees of freedom instead of J because p(τ) ∝ 1 rather than 1/τ.
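Putting the four full conditionals together, a minimal R sketch of the Gibbs sampler for this model, in the spirit of the posted Example11c code (it reuses the y, group and starting values defined above; it is not the posted code itself):

  # Gibbs sampler for the hierarchical normal model with unknown sigma
  n     <- length(y)
  J     <- length(unique(group))
  nj    <- as.numeric(table(group))
  ybarj <- tapply(y, group, mean)

  S <- 5000
  sims <- matrix(NA, S, J + 3,
                 dimnames = list(NULL, c(paste0("theta", 1:J), "mu", "sigma", "tau")))

  theta <- theta0; mu <- mu0; sigma <- sigma0; tau <- tau0
  for (s in 1:S) {
    # theta_j | rest ~ Normal(theta_hat_j, V_theta_j)
    V      <- 1 / (1 / tau^2 + nj / sigma^2)
    th.hat <- (mu / tau^2 + nj * ybarj / sigma^2) * V
    theta  <- rnorm(J, th.hat, sqrt(V))
    # mu | rest ~ Normal(mean(theta), tau^2 / J)
    mu <- rnorm(1, mean(theta), tau / sqrt(J))
    # sigma^2 | rest ~ Inv-chi^2(n, sigma_hat^2), drawn as n * sigma_hat^2 / chisq_n
    sigma2.hat <- sum((y - theta[group])^2) / n
    sigma <- sqrt(n * sigma2.hat / rchisq(1, n))
    # tau^2 | rest ~ Inv-chi^2(J - 1, tau_hat^2)
    tau2.hat <- sum((theta - mu)^2) / (J - 1)
    tau <- sqrt((J - 1) * tau2.hat / rchisq(1, J - 1))
    sims[s, ] <- c(theta, mu, sigma, tau)
  }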

Numerical results with coagulation data
We illustrate the Gibbs sampler with the coagulation data.
          2.5pct  25pct  50pct  75pct  97.5pct
theta1      58.8   60.5   61.2   62.0     63.6
theta2      64.0   65.3   65.9   66.5     67.8
theta3      65.9   67.1   67.8   68.4     69.7
theta4      59.5   60.6   61.2   61.7     62.8
mu          54.7   62.2   64.0   65.8     73.9
sigma        1.8    2.2    2.4    2.7      3.4
tau          1.9    3.5    5.1    8.2     28.6

See Courseworks → Files → Examples → Example11c Gibbs .
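The quantile table can be reproduced, up to Monte Carlo error, from the simulation draws; a brief sketch, assuming the sims matrix from the sampler sketch above and dropping the first half of the run as warm-up:

  # posterior quantile summary from the Gibbs draws
  keep <- sims[(nrow(sims) / 2 + 1):nrow(sims), ]
  round(t(apply(keep, 2, quantile, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))), 1)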