1 Outline proof of convergence of the Metropolis sampler to the target posterior distribution (based on Gelman et al., BDA, 3rd edn, pp. 279-80)
1.1 Preliminaries
Suppose we are interested in the posterior distribution

p(θ|data) = p(data|θ)p(θ) / ∫ p(data|θ)p(θ) dθ ∝ p(data|θ)p(θ) = q(θ|data)

and propose to approximate this distribution using a Metropolis (or Metropolis-Hastings) algorithm.
Let Jt(θnew|θ(t−1)) denote a symmetric jumping or proposal distribution
from which a jump from the value of θ at the (t − 1)th iteration of
the chain is generated. In this context symmetry means Jt(θnew|θ(t−1)) =
Jt(θ(t−1)|θnew) for all values of θnew and θ(t−1). For the Metropolis-Hastings
algorithm the requirement that Jt() be symmetric is relaxed.
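As an illustrative aside, the standard random-walk proposal θnew = θ(t−1) + ε with ε ∼ N(0, σ²) is symmetric, since the normal density depends only on |θnew − θ(t−1)|. A minimal numerical check (the value of σ is an arbitrary choice for illustration):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def J(theta_new, theta_old, sd=1.0):
    """Random-walk proposal density J(theta_new | theta_old)."""
    return normal_pdf(theta_new, theta_old, sd)

# Symmetry: J(b|a) == J(a|b) for arbitrary points a, b.
a, b = 0.3, -1.7
assert abs(J(b, a) - J(a, b)) < 1e-12
```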
1. Assume the sequence of parameter values θ̃ = (θ(0),θ(1), . . . ,θ(N)) constitutes
a Markov chain, i.e.

p(θ(t)|θ(t−1), . . . ,θ(0)) = p(θ(t)|θ(t−1)) (1)

This holds by construction. (Here superscripts are iteration indices,
not powers.)
2. Assume the chain is
(a) aperiodic (no deterministic movement);
(b) not transient (positive recurrent: some chance of returning to the
initial value);
(c) irreducible (the jumping distribution Jt() is able eventually to
jump to any state).
3. The above conditions imply the Markov Chain has a unique stationary
distribution.
4. Hence the marginal distributions from which the θ values are drawn
converge to a unique stationary distribution.
5. If we can show that the stationary distribution of the Markov Chain is
the target posterior distribution p(θ|data) then we are done.
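The algorithm these conditions refer to can be sketched in a few lines. Below is a minimal random-walk Metropolis sampler for a one-dimensional θ; the standard-normal unnormalized target q and the jump scale are illustrative choices, not part of the text:

```python
import math
import random

def q(theta):
    """Unnormalized target density q(theta|data). A standard normal is
    assumed here purely for illustration; any positive q would do."""
    return math.exp(-0.5 * theta ** 2)

def metropolis(q, theta0, n_iter, jump_sd=1.0, seed=0):
    """Random-walk Metropolis with symmetric N(theta, jump_sd^2) proposals."""
    rng = random.Random(seed)
    theta = theta0
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, jump_sd)  # symmetric jump
        r = q(proposal) / q(theta)                  # acceptance ratio r_M
        if rng.random() < min(1.0, r):              # accept w.p. min(1, r_M)
            theta = proposal
        draws.append(theta)                         # on rejection, repeat old value
    return draws

draws = metropolis(q, theta0=0.0, n_iter=20000)
```

Note that the Markov property (1) holds by construction: each proposal depends only on the current value of the chain, and a Gaussian random walk on the real line is aperiodic and irreducible.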
Definition of stationarity
If g(θ) is some density such that, whenever θ(t−1), θ(t) are successive values from the
Markov Chain θ̃,

p(θ(t−1) = θa) = g(θa) ⇒ p(θ(t) = θa) = g(θa) for all θa (2)

then g is a stationary distribution of the Markov Chain. That is, if θ(t−1) being
a draw from the density g() implies θ(t) is also a draw from the density g(),
then g() is a stationary distribution of the Markov Chain. Here, I am using
notation such as p(θ(t) = θa) to refer to the density function of θ(t) evaluated
at the specific point θa (so p(θ(t) = θa) ∀θa refers to the full density function).
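For a finite state space, the definition in (2) can be checked directly: g is stationary when the marginal distribution is unchanged by one step of the chain. A toy illustration with a hypothetical 3-state transition matrix (all numbers are made up for illustration):

```python
def step_distribution(g, P):
    """One step of the chain at the level of marginal distributions:
    returns the distribution of theta^(t) given theta^(t-1) ~ g."""
    n = len(P)
    return [sum(g[i] * P[i][j] for i in range(n)) for j in range(n)]

# A hypothetical 3-state transition matrix (rows sum to 1).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

# Iterating from any starting distribution converges to the unique
# stationary g, which then satisfies the implication in (2).
g = [1.0, 0.0, 0.0]
for _ in range(200):
    g = step_distribution(g, P)

g_next = step_distribution(g, P)
assert all(abs(a - b) < 1e-10 for a, b in zip(g, g_next))
```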
1.2 Outline Proof that the posterior distribution is the
stationary distribution of the chain
Suppose the chain is aperiodic, not transient and irreducible and has been
running for some time. Consider the chain at the (t − 1)th iteration and
suppose that the marginal distribution of θ(t−1) is the target distribution
p(θ|data), that is p(θ(t−1) = θa) = p(θ = θa|data), for all θa. If we can
show that this implies the marginal distribution of θ(t) is also p(θ|data),
meaning p(θ(t) = θa) = p(θ = θa|data) ∀θa, then we are done. In fact we
need only show that if the (t − 1)th draw in the chain is a draw from the
target posterior distribution then (θ(t−1),θ(t)) are exchangeable, since the
marginals of exchangeable r.v.'s are identical (proved in Assignment 4). Recall
that exchangeability means

p(θ(t) = θa, θ(t−1) = θb) = p(θ(t) = θb, θ(t−1) = θa), for all θa, θb.
We will sometimes just write p(θa|data) to refer to the value of the posterior
density function for θ evaluated at the point θa, i.e. p(θa|data) is shorthand
for p(θ = θa|data).
Let θa, θb be arbitrary points drawn from the target posterior density but
labelled so that

p(θb|data) ≥ p(θa|data) (3)

implying that the Metropolis acceptance ratio for a jump from θa to θb is

rM(θb,θa) = p(θb|data) / p(θa|data) = q(θb|data) / q(θa|data) ≥ 1. (4)
Consequently, a jump from θa to θb will always be accepted, since the acceptance
probability for a proposed jump in that direction is min(1, rM(θb,θa)).
Note in (4), the second equality follows because the normalising constant
cancels from the numerator and denominator of the ratio of posterior densities.
As usual, we need only be able to compute the unnormalized posterior
densities.
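In practice the ratio in (4) is computed from unnormalized (and usually log) densities, precisely because the normalising constant cancels. A small sketch; the hypothetical log_q (a standard-normal log density up to a constant) is an illustrative assumption:

```python
import math
import random

def log_q(theta):
    """Hypothetical unnormalized log posterior (standard normal here);
    the normalising constant never needs to be computed."""
    return -0.5 * theta ** 2

def accept(theta_new, theta_old, rng=random):
    """Metropolis accept/reject on the log scale:
    log r_M = log q(new) - log q(old); accept with prob min(1, r_M)."""
    log_r = log_q(theta_new) - log_q(theta_old)
    return rng.random() < math.exp(min(0.0, log_r))

# A jump to a point of higher density is always accepted (r_M >= 1):
assert accept(0.0, 2.0)
```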
On the other hand, the Metropolis acceptance ratio for a proposed jump
from θb to θa is

rM(θa,θb) = p(θa|data) / p(θb|data) = q(θa|data) / q(θb|data) < 1, (5)

so a jump from θb to θa will be accepted with probability rM(θa,θb).

The joint probability density for observing θa at the (t − 1)th iteration and
θb at the tth iteration is

p(θ(t−1) = θa, θ(t) = θb) = p(θ(t−1) = θa) p(θ(t) = θb|θ(t−1) = θa)
                          = p(θ = θa|data) Jt(θb|θa) × min(1, rM(θb,θa)) (6)
                          = p(θ = θa|data) Jt(θb|θa) (7)
                          = p(θa|data) Jt(θb|θa) (8)

since rM(θb,θa) ≥ 1 because of the way we have arranged θa and θb. In (6)
I have used the assumption that θ(t−1) is a draw from p(θ|data).

The joint probability density for observing θb at iteration (t − 1) and θa at
iteration t is

p(θ(t−1) = θb, θ(t) = θa) = p(θb|data) p(θ(t) = θa|θ(t−1) = θb)
                          = p(θb|data) Jt(θa|θb) × min(1, rM(θa,θb))
                          = p(θb|data) Jt(θa|θb) p(θa|data) / p(θb|data)
                          = p(θa|data) Jt(θa|θb)
                          = p(θa|data) Jt(θb|θa) (9)

where (9) follows because of the symmetry of Jt(). Thus, comparing (8) and
(9), it is clear that

p(θ(t−1) = θa, θ(t) = θb) = p(θ(t−1) = θb, θ(t) = θa)

and this holds for any θa, θb. That is, (θ(t−1),θ(t)) are exchangeable, hence
p(θ(t)) = p(θ(t−1)) = p(θ|data), since we assumed θ(t−1) is a draw from the
posterior distribution. Thus we have shown that p(θ(t−1)) = p(θ|data) implies
p(θ(t)) = p(θ|data). In other words, the posterior distribution is a stationary
distribution of the Markov Chain. In view of the uniqueness of the stationary
distribution, the chain converges to the posterior distribution.

In the case of the Metropolis-Hastings algorithm the proof is very similar
because the acceptance probabilities adjust for the asymmetry in Jt().
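The exchangeability argument above can be verified numerically on a small discrete state space: build the Metropolis transition probabilities from an unnormalized target q and a symmetric proposal, then check that p(θa) P(θa → θb) = p(θb) P(θb → θa), which is exactly the equality of (8) and (9). The 4-state target and the uniform proposal below are illustrative choices:

```python
# Verify p(a) P(a->b) == p(b) P(b->a) for a discrete Metropolis chain.
q = [1.0, 3.0, 2.0, 4.0]             # unnormalized target q(theta|data)
n = len(q)
Z = sum(q)
p = [qi / Z for qi in q]             # normalized target p(theta|data)

# Symmetric proposal: J(b|a) = 1/n for every pair (uniform).
J = 1.0 / n

# Metropolis transition matrix: off-diagonal P[a][b] = J * min(1, q_b/q_a);
# the diagonal absorbs the rejected proposals.
P = [[J * min(1.0, q[b] / q[a]) if b != a else 0.0 for b in range(n)]
     for a in range(n)]
for a in range(n):
    P[a][a] = 1.0 - sum(P[a])

# Exchangeability of (theta^(t-1), theta^(t)): the joint density is
# symmetric in its two arguments, as in the comparison of (8) and (9).
for a in range(n):
    for b in range(n):
        assert abs(p[a] * P[a][b] - p[b] * P[b][a]) < 1e-12

# Hence p is stationary: one step of the chain leaves p unchanged.
p_next = [sum(p[a] * P[a][b] for a in range(n)) for b in range(n)]
assert all(abs(x - y) < 1e-12 for x, y in zip(p, p_next))
```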