1 The Gibbs sampler as a special case of the Metropolis-Hastings algorithm (Based on Gelman et al BDA 2nd edn, p.293)
1.1 Preliminaries
Suppose the vector of unknown parameters is θ = (θ1, . . . , θM). The unknowns could include missing data as well as model parameters, but we will just use the generic notation θ to represent the unknowns of interest. Our aim is to compute the posterior distribution p(θ|data).
The tth iteration of a Gibbs sampler consists of a series of steps. In each of these steps a draw is made from the full conditional distribution of one component of θ given all other components of θ. The Gibbs sampling algorithm can be described as follows:
1    initialise θ2, θ3, . . . , θM to θ2^(0), θ3^(0), . . . , θM^(0)
2    for (t in 1:T) {
2.1      draw θ1^(t) from p(θ1 | θ2^(t−1), . . . , θM^(t−1), data)
2.2      draw θ2^(t) from p(θ2 | θ1^(t), θ3^(t−1), . . . , θM^(t−1), data)
         ⋮
2.M      draw θM^(t) from p(θM | θ1^(t), θ2^(t), . . . , θM−1^(t), data)
     }
At convergence, the parameter values generated from this algorithm can be regarded as draws from the posterior p(θ|data).
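As a concrete illustration (not part of the original notes), the algorithm above can be sketched in Python for a simple assumed target: a zero-mean bivariate normal posterior with correlation ρ, for which each full conditional is a univariate normal. The function and variable names here are choices made for this sketch.

```python
import random

def gibbs_bivariate_normal(rho, T=5000, seed=0):
    """Gibbs sampler for a zero-mean bivariate normal target with
    unit variances and correlation rho.

    Full conditionals: theta1 | theta2 ~ N(rho*theta2, 1 - rho^2),
    and symmetrically for theta2 | theta1.
    """
    rng = random.Random(seed)
    sd = (1 - rho**2) ** 0.5           # conditional standard deviation
    theta1, theta2 = 0.0, 0.0          # step 1: initialise
    draws = []
    for t in range(T):                 # step 2: for (t in 1:T)
        # step 2.1: draw theta1^(t) from p(theta1 | theta2^(t-1), data)
        theta1 = rng.gauss(rho * theta2, sd)
        # step 2.2: draw theta2^(t) from p(theta2 | theta1^(t), data)
        theta2 = rng.gauss(rho * theta1, sd)
        draws.append((theta1, theta2))
    return draws

draws = gibbs_bivariate_normal(rho=0.8, T=20000)
burned = draws[2000:]   # discard burn-in before treating draws as posterior samples
```

After discarding an initial burn-in period, the retained pairs behave like draws from the joint posterior, so their sample correlation should be close to ρ.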
To simplify the presentation we will use notation such as θj to denote the jth parameter and θ−j to denote all parameters in θ except θj. Thus for any j in 1, . . . , M we can write θ = (θj, θ−j).
1.2 Relating the Gibbs sampler to the Metropolis-Hastings algorithm
The Metropolis-Hastings algorithm works with the full parameter vector, θ,
and jumps through the parameter space using, at the tth iteration, the jump-
ing distribution Jt(θnew|θ(t−1)). To see the connection between the Gibbs
sampler and Metropolis-Hastings algorithm we first note that, for any j in
(1, . . . , M), we can write θnew = (θj^new, θ−j^new) and θ(t−1) = (θj^(t−1), θ−j^(t−1)). The
tth iteration of Gibbs sampler comprises M steps, corresponding to the draws
from each of the full conditional posterior distributions. Each of these steps
can be thought of as a jump in the Metropolis-Hastings sense but with a
jumping density that allows only jumps to points in the θ space that agree
with the current point in all components except the one being updated. Let
θ−j^(t−1)∗ denote the subvector of θ that does not contain the jth element and has each component set to its most recently updated value. Thus if θj refers to the first element to be updated, θ−j^(t−1)∗ = θ−j^(t−1), whereas if θj is updated at a later step in the updating sequence, θ−j^(t−1)∗ will comprise values updated at the tth iteration for components updated earlier than θj in the updating sequence, and values corresponding to the (t − 1)th iteration for components of θ updated later in the updating sequence than θj.
At the jth step of the tth Gibbs sampler iteration, the draw from the full conditional p(θj^new | θ−j^(t−1)∗, data) can be viewed as a jump from θ(t−1)∗ to the proposed new point θnew using a jumping distribution that ensures

θ−j^new = θ−j^(t−1)∗.    (1)
This idea can be formalised by defining the jumping distribution at each step of the tth iteration of the Gibbs sampler as follows:

for j = 1, . . . , M, draw from

Jj,t(θnew | θ(t−1)∗) = p(θj^new | θ−j^(t−1)∗, data)   if θ−j^new = θ−j^(t−1)∗,
                     = 0                               otherwise.    (2)
The jumping distribution (2) enforces the condition that at each step j of the tth iteration the only possible jumps are to points which are identical with respect to all elements except those in the jth block.
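In code, the restriction imposed by (2) simply says that a proposal copies every component of the current point except the jth, which is redrawn from its full conditional. A minimal sketch (the particular full-conditional sampler below is a hypothetical stand-in, chosen only so the example runs):

```python
import random

def gibbs_proposal(theta, j, draw_full_conditional):
    """Propose a new point under the restricted jumping distribution (2).

    `draw_full_conditional(j, theta)` samples theta_j from
    p(theta_j | theta_-j, data); every other component is copied
    unchanged, so theta_new agrees with theta in all positions except j.
    """
    theta_new = list(theta)                        # theta_new[-j] = theta[-j]
    theta_new[j] = draw_full_conditional(j, theta)
    return theta_new

# Hypothetical full conditional for illustration:
# theta_j | theta_-j ~ N(mean of the other components, 1)
rng = random.Random(1)
def draw_fc(j, theta):
    others = [v for i, v in enumerate(theta) if i != j]
    return rng.gauss(sum(others) / len(others), 1.0)

new = gibbs_proposal([0.5, -0.2, 1.3], j=1, draw_full_conditional=draw_fc)
# new[0] and new[2] are unchanged; only new[1] has been redrawn
```

Any proposal that altered a component other than θj would receive jumping density 0 under (2), i.e. it could never be generated.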
1.3 Deriving the Metropolis-Hastings acceptance ratio
Suppose θj^new is a draw from the jth full conditional of a Gibbs sampler at the tth iteration. The jumping distribution (2) ensures

θnew = (θj^new, θ−j^(t−1)∗).    (3)
In addition, since for any j, and any value of θ, θ = (θj , θ−j )
p(θ|data) = p(θj |θ−j , data)p(θ−j |data). (4)
The Metropolis-Hastings ratio at the jth step of the tth iteration is, by definition,

rMH,j(θnew, θ(t−1)∗) = [p(θnew|data) / Jj,t(θnew | θ(t−1)∗)] / [p(θ(t−1)∗|data) / Jj,t(θ(t−1)∗ | θnew)]    (5)

                     = [p(θnew|data) / p(θj^new | θ−j^(t−1)∗, data)] / [p(θ(t−1)∗|data) / p(θj^(t−1)∗ | θ−j^new, data)]    (6)

But since our jumping density ensures θ−j^new = θ−j^(t−1)∗, we can replace θ−j^new with θ−j^(t−1)∗ in the denominator of (6). This gives

rMH,j(θnew, θ(t−1)∗) = [p(θnew|data) / p(θj^new | θ−j^(t−1)∗, data)] / [p(θ(t−1)∗|data) / p(θj^(t−1)∗ | θ−j^(t−1)∗, data)]    (7)
Now, since θ = (θj , θ−j ), the joint posterior density, p(θ|data), can be written as the product of a conditional and marginal density as in (4). Applying the decomposition (4) in the numerator and denominator of (7) gives
rMH,j(θnew, θ(t−1)∗) = [p(θj^new | θ−j^new, data) p(θ−j^new|data) / p(θj^new | θ−j^(t−1)∗, data)]
                       / [p(θj^(t−1)∗ | θ−j^(t−1)∗, data) p(θ−j^(t−1)∗|data) / p(θj^(t−1)∗ | θ−j^(t−1)∗, data)]

                     = [p(θj^new | θ−j^new, data) p(θ−j^new|data) / p(θj^new | θ−j^(t−1)∗, data)] / p(θ−j^(t−1)∗|data)    (8)

where the second line follows because the conditional density in the denominator cancels. Since our jumping density requires θ−j^new = θ−j^(t−1)∗, (8) can be written

rMH,j(θnew, θ(t−1)∗) = [p(θj^new | θ−j^(t−1)∗, data) p(θ−j^(t−1)∗|data) / p(θj^new | θ−j^(t−1)∗, data)] / p(θ−j^(t−1)∗|data)

                     = p(θ−j^(t−1)∗|data) / p(θ−j^(t−1)∗|data)

                     = 1.    (9)
Thus a Gibbs sampler is equivalent to a Metropolis-Hastings sampler with jumping distributions defined by the full conditionals as in (2) and acceptance probability equal to 1.
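This conclusion is easy to check numerically (an illustration under an assumed target, not part of the original derivation): for a zero-mean bivariate normal with unit variances and correlation ρ, evaluate the ratio (5) directly at a Gibbs-style proposal, using the full conditional as the jumping density, and confirm it equals 1.

```python
import math

def log_joint(t1, t2, rho):
    """Log density of a zero-mean bivariate normal with unit variances."""
    q = (t1**2 - 2 * rho * t1 * t2 + t2**2) / (1 - rho**2)
    return -0.5 * q - math.log(2 * math.pi * math.sqrt(1 - rho**2))

def log_conditional(tj, t_other, rho):
    """Log full conditional p(theta_j | theta_-j): N(rho*t_other, 1 - rho^2)."""
    var = 1 - rho**2
    return -0.5 * (tj - rho * t_other) ** 2 / var - 0.5 * math.log(2 * math.pi * var)

rho = 0.8
old = (0.3, -1.1)        # current point theta^(t-1)*
new = (1.7, old[1])      # Gibbs-style proposal: update theta_1, keep theta_2 fixed

# Metropolis-Hastings log ratio, equation (5), with J given by the full conditional
log_r = (log_joint(*new, rho) - log_conditional(new[0], new[1], rho)) \
      - (log_joint(*old, rho) - log_conditional(old[0], old[1], rho))
r = math.exp(log_r)      # equals 1 up to floating-point error
```

The cancellation mirrors the derivation: dividing the joint by the full conditional leaves the marginal p(θ−j|data), which is evaluated at the same θ−j value in numerator and denominator.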