Statistical Machine Learning
© 2020 Ong & Walder & Webers
Data61 | CSIRO
The Australian National University
Outline
Overview
Introduction
Linear Algebra
Probability
Linear Regression 1
Linear Regression 2
Linear Classification 1
Linear Classification 2
Kernel Methods
Sparse Kernel Methods
Mixture Models and EM 1
Mixture Models and EM 2
Neural Networks 1
Neural Networks 2
Principal Component Analysis
Autoencoders
Graphical Models 1
Graphical Models 2
Graphical Models 3
Sampling
Sequential Data 1
Sequential Data 2
Statistical Machine Learning
Christian Walder
Machine Learning Research Group
CSIRO Data61
and
College of Engineering and Computer Science
The Australian National University
Canberra
Semester One, 2020.
(Many figures from C. M. Bishop, “Pattern Recognition and Machine Learning”)
Motivation
Sampling from the Uniform Distribution
Sampling from Standard Distributions
Rejection Sampling
Adaptive Rejection Sampling
Importance Sampling
Markov Chain Monte Carlo
Gibbs Sampling
Part XVI
Sampling
Approximating π – Buffon’s Needle (1777)
Drop a needle of length c onto parallel lines a distance d apart (c ≤ d).
Needle falls perpendicular to the lines:
Probability of crossing a line is c/d.
Needle falls at an arbitrary angle a:
Probability of crossing a line is c sin(a)/d.
Every angle is equally probable, so average over the angle:
$$p(\text{crossing}) = \frac{c}{d}\int_0^\pi \sin(a)\,dp(a) = \frac{1}{\pi}\,\frac{c}{d}\int_0^\pi \sin(a)\,da = \frac{2}{\pi}\,\frac{c}{d}$$
Observing n crossings in N experiments gives
$$\frac{n}{N} \approx \frac{2}{\pi}\,\frac{c}{d}, \qquad\text{so}\qquad \pi \approx \frac{2\,c\,N}{d\,n}.$$
[Figures: (i) needle falling perpendicular (a = π/2); (ii) needle falling at an arbitrary angle a.]
Motivation
For most probabilistic models of practical interest, exact inference is intractable, so we need approximations.
Numerical sampling (Monte Carlo methods) is one such family of approximations.
Fundamental problem: find the expectation of some function f(z) w.r.t. a probability distribution p(z):
$$\mathbb{E}[f] = \int f(\mathbf{z})\,p(\mathbf{z})\,d\mathbf{z}$$
Key idea: draw independent samples z^{(l)}, l = 1, …, L, from p(z) and approximate the expectation by
$$\hat{f} = \frac{1}{L}\sum_{l=1}^{L} f(\mathbf{z}^{(l)})$$
Problem: How to obtain independent samples from p(z) ?
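The estimator f̂ can be sketched in a few lines of NumPy (names such as `mc_expectation` are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_expectation(f, sampler, L=100_000):
    """Approximate E[f(z)] by the sample mean of f over L independent draws from p(z)."""
    z = sampler(L)
    return np.mean(f(z))

# example: E[z^2] under a standard normal, which equals 1
est = mc_expectation(lambda z: z**2, lambda L: rng.standard_normal(L))
print(est)
```

The rest of this part is about supplying the `sampler` argument: how to actually draw from p(z) when no library routine exists for it.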
Approximating the Expectation of f (z)
Samples must be independent, otherwise the effective sample size is much smaller than the apparent sample size.
For dependent sample sequences especially: if f(z) is small in regions where p(z) is large (or vice versa), large sample sizes are needed to capture contributions from all regions.
[Figure: a function f(z) and a density p(z) over z whose high-value regions do not coincide.]
Sampling from the Uniform Distribution
On a computer this is usually done via a pseudorandom number generator: an algorithm generating a sequence of numbers that approximates the properties of random numbers.
Use a mathematically well-crafted pseudorandom number generator.
From now on we assume that a good pseudorandom number generator for uniformly distributed data is available.
If you don’t trust any algorithm: three carefully adjusted radio receivers picking up atmospheric noise provide real random numbers at http://www.random.org/
Sampling from Standard Distributions
Goal: Sample from p(y) which is given in analytical form.
Suppose uniformly distributed samples of z in the interval
(0, 1) are available.
Calculate the cumulative distribution function
$$h(y) = \int_{-\infty}^{y} p(x)\,dx$$
Transform the samples from U(z | 0, 1) by
$$y = h^{-1}(z)$$
to obtain samples y distributed according to p(y).
It is easy to check that h is then the CDF of y:
$$P(y < x) = P(h^{-1}(z) < x) = P(z < h(x)) = h(x).$$
Sampling from Standard Distributions
Goal: Sample from p(y) which is given in analytical form.
If a uniformly distributed random variable z is transformed
using y = h−1(z) then y will be distributed according to p(y).
[Figure: the CDF h(y) maps uniform samples on (0, 1) to samples from the density p(y).]
Sampling from the Exponential Distribution
Goal: Sample from the exponential distribution
$$p(y) = \begin{cases} \lambda e^{-\lambda y} & y \ge 0 \\ 0 & y < 0 \end{cases}$$
with rate parameter λ > 0.
Suppose uniformly distributed samples of z in the interval (0, 1) are available.
Calculate the cumulative distribution function
$$h(y) = \int_{-\infty}^{y} p(x)\,dx = \int_0^y \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda y}$$
Transform the samples from U(z | 0, 1) by
$$y = h^{-1}(z) = -\frac{1}{\lambda}\ln(1 - z)$$
to obtain samples y distributed according to the exponential distribution.
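The inverse-CDF transformation above can be checked numerically; a minimal sketch (the function name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_exponential(lam, L):
    """Inverse-CDF sampling: y = h^{-1}(z) = -ln(1 - z)/lam for z ~ U(0, 1)."""
    z = rng.uniform(0.0, 1.0, size=L)
    return -np.log(1.0 - z) / lam

y = sample_exponential(lam=2.0, L=100_000)
print(y.mean())   # should be close to 1/lam = 0.5
```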
Sampling from Multivariate Distributions
Generalisation to multiple variables is straightforward.
Consider a change of variables via the Jacobian:
$$p(y_1, \dots, y_M) = p(z_1, \dots, z_M)\left|\frac{\partial(z_1, \dots, z_M)}{\partial(y_1, \dots, y_M)}\right|$$
Technical challenge: multiple integrals; inverting nonlinear functions of multiple variables.
Sampling the Gaussian Distribution – Box-Muller
1 Generate pairs of uniformly distributed random numbers z₁, z₂ ∈ (−1, 1) (e.g. zᵢ = 2z − 1 for z from U(z | 0, 1)).
2 Discard any pair (z₁, z₂) unless z₁² + z₂² ≤ 1. This leaves a uniform distribution inside the unit circle, p(z₁, z₂) = 1/π.
3 Evaluate r² = z₁² + z₂² and
$$y_1 = z_1\left(\frac{-2\ln r^2}{r^2}\right)^{1/2}, \qquad y_2 = z_2\left(\frac{-2\ln r^2}{r^2}\right)^{1/2}$$
4 Then y₁ and y₂ are independent with joint distribution
$$p(y_1, y_2) = p(z_1, z_2)\left|\frac{\partial(z_1, z_2)}{\partial(y_1, y_2)}\right| = \frac{1}{\sqrt{2\pi}}e^{-y_1^2/2}\,\frac{1}{\sqrt{2\pi}}e^{-y_2^2/2}$$
Sampling the Multivariate Gaussian
Generate x ∼ N(0, I) and let y = Ax + b. Then
$$\mathbb{E}[\mathbf{y}] = \mathbf{b}$$
$$\mathbb{E}\big[(\mathbf{y} - \mathbb{E}\mathbf{y})(\mathbf{y} - \mathbb{E}\mathbf{y})^\top\big] = A\,\mathbb{E}[\mathbf{x}\mathbf{x}^\top]\,A^\top = AA^\top$$
We know y is Gaussian (an affine transformation of a Gaussian is Gaussian). So to sample y ∼ N(m, Σ), set
$$\mathbf{b} = \mathbf{m}, \qquad AA^\top = \Sigma,$$
where A is the Cholesky factor of Σ.
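A minimal sketch of this recipe using `numpy.linalg.cholesky`, which returns the lower-triangular factor A with A Aᵀ = Σ (the function name and example parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_mvn(m, Sigma, L):
    """Sample y ~ N(m, Sigma) via y = A x + m with A the Cholesky factor, x ~ N(0, I)."""
    A = np.linalg.cholesky(Sigma)          # lower-triangular, A @ A.T == Sigma
    x = rng.standard_normal((len(m), L))   # standard normal columns
    return (A @ x).T + m                   # shape (L, dim)

m = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
y = sample_mvn(m, Sigma, 200_000)
print(y.mean(axis=0))
print(np.cov(y.T))
```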
Rejection Sampling
Assumption 1: sampling directly from p(z) is difficult, but we can evaluate p(z) up to some unknown normalisation constant Z_p:
$$p(z) = \frac{1}{Z_p}\,\tilde{p}(z)$$
Assumption 2: we can draw samples from a simpler distribution q(z), and for some constant k,
$$k\,q(z) \ge \tilde{p}(z) \quad \text{for all } z.$$
[Figure: the envelope k q(z) lying above p̃(z); a proposed point (z₀, u₀) with u₀ uniform on [0, k q(z₀)].]
Rejection Sampling
1 Generate a random number z₀ from the distribution q(z).
2 Generate a number u₀ from the uniform distribution over [0, k q(z₀)].
3 If u₀ > p̃(z₀), reject the pair (z₀, u₀).
4 The remaining pairs are uniformly distributed under the curve p̃(z).
5 The z values are then distributed according to p(z).
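The five steps can be sketched as follows, here with an unnormalised Beta(2, 2) target p̃(z) = z(1 − z) under a uniform proposal with envelope constant k = 1/4 (this example target is an illustrative choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)

def rejection_sample(p_tilde, q_sample, q_pdf, k, L):
    """Draw L samples from p(z) proportional to p_tilde(z), given k*q(z) >= p_tilde(z)."""
    out = []
    while len(out) < L:
        z0 = q_sample(L)                        # step 1: propose from q
        u0 = rng.uniform(0.0, k * q_pdf(z0))    # step 2: u0 ~ U[0, k q(z0)]
        out.extend(z0[u0 <= p_tilde(z0)])       # step 3: keep unless u0 > p_tilde(z0)
    return np.array(out[:L])

# unnormalised Beta(2,2): p_tilde(z) = z(1-z) on (0,1); envelope q = U(0,1), k = 1/4
p_tilde = lambda z: z * (1 - z)
z = rejection_sample(p_tilde, lambda L: rng.uniform(0, 1, L),
                     lambda z: np.ones_like(z), k=0.25, L=100_000)
print(z.mean())   # Beta(2,2) has mean 1/2
```

With this envelope the acceptance rate is (1/6)/(1/4) = 2/3, illustrating the cost of a loose bound: the slack between k q(z) and p̃(z) is exactly the rejected mass.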
Rejection Sampling – Example
Consider the gamma distribution for a > 1:
$$\mathrm{Gam}(z \mid a, b) = \frac{b^a z^{a-1}\exp(-bz)}{\Gamma(a)}$$
A suitable q(z) could be (similar to the Cauchy distribution)
$$q(z) = \frac{k}{1 + (z - c)^2/b^2}$$
Samples z from q(z) are obtained from uniformly distributed y via the transformation z = b tan y + c, with c = a − 1, b² = 2a − 1, and k as small as possible subject to k q(z) ≥ p̃(z).
[Figure: the gamma density p(z) beneath its Cauchy-like envelope k q(z), for z from 0 to 30.]
Adaptive Rejection Sampling
A suitable form for the proposal distribution q(z) might be difficult to find.
If p(z) is log-concave (ln p(z) has nonincreasing derivatives), use the derivatives to construct an envelope:
1 Start with an initial grid of points z₁, …, z_M and construct the envelope from the tangents to ln p(z) at the zᵢ, i = 1, …, M.
2 Draw a sample from the envelope function; if it is accepted, use it as a sample from p(z). Otherwise, use it to refine the grid.
[Figure: piecewise-linear envelope formed by tangents to ln p(z) at grid points z₁, z₂, z₃.]
Adaptive Rejection Sampling – Example
The piecewise exponential distribution is defined as
$$p(z) = k_m \lambda_m e^{-\lambda_m (z - z_{m-1})}, \qquad \hat{z}_{m-1,m} < z \le \hat{z}_{m,m+1},$$
where ẑ_{m−1,m} is the point of intersection of the tangent lines at z_{m−1} and z_m, λ_m is the slope of the tangent at z_m, and k_m accounts for the corresponding offset.
Rejection Sampling - Problems
Need to find a proposal distribution q(z) which is a close
upper bound to p(z); otherwise many samples are rejected.
Curse of dimensionality for multivariate distributions.
Importance Sampling
Directly approximate E_p[f(z)] w.r.t. some p(z), without sampling from p(z) as an intermediate step.
Again use samples z^{(l)} from some proposal q(z):
$$\mathbb{E}[f] = \int f(\mathbf{z})\,p(\mathbf{z})\,d\mathbf{z} = \int f(\mathbf{z})\,\frac{p(\mathbf{z})}{q(\mathbf{z})}\,q(\mathbf{z})\,d\mathbf{z} \approx \frac{1}{L}\sum_{l=1}^{L}\frac{p(\mathbf{z}^{(l)})}{q(\mathbf{z}^{(l)})}\,f(\mathbf{z}^{(l)})$$
Importance Sampling - Unnormalised
Consider unnormalised densities p(z) = p̃(z)/Z_p and q(z) = q̃(z)/Z_q.
It follows that, with r̃_l = p̃(z^{(l)})/q̃(z^{(l)}),
$$\mathbb{E}[f] = \int f(\mathbf{z})\,p(\mathbf{z})\,d\mathbf{z} = \frac{Z_q}{Z_p}\int f(\mathbf{z})\,\frac{\tilde{p}(\mathbf{z})}{\tilde{q}(\mathbf{z})}\,q(\mathbf{z})\,d\mathbf{z} \approx \frac{Z_q}{Z_p}\,\frac{1}{L}\sum_{l=1}^{L}\tilde{r}_l\,f(\mathbf{z}^{(l)}).$$
Using the same samples (and f(z) = 1) gives
$$\frac{Z_p}{Z_q} \approx \frac{1}{L}\sum_{l=1}^{L}\tilde{r}_l \qquad\Rightarrow\qquad \mathbb{E}[f] \approx \sum_{l=1}^{L}\frac{\tilde{r}_l}{\sum_{m=1}^{L}\tilde{r}_m}\,f(\mathbf{z}^{(l)})$$
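The self-normalised estimator can be sketched as follows; the Gaussian target/proposal pair is an illustrative choice, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(6)

def self_normalised_is(f, p_tilde, q_tilde, q_sample, L):
    """Self-normalised importance sampling: E[f] ~ sum_l w_l f(z_l), w_l = r_l / sum_m r_m."""
    z = q_sample(L)
    r = p_tilde(z) / q_tilde(z)    # unnormalised importance weights r_l
    w = r / r.sum()                # normalising cancels the unknown Z_p / Z_q
    return np.sum(w * f(z))

# target: N(0,1) up to a constant; proposal: N(0, 2^2), also unnormalised
p_tilde = lambda z: np.exp(-0.5 * z**2)
q_tilde = lambda z: np.exp(-0.5 * (z / 2.0)**2)
est = self_normalised_is(lambda z: z**2, p_tilde, q_tilde,
                         lambda L: 2.0 * rng.standard_normal(L), 200_000)
print(est)   # E[z^2] under N(0,1) is 1
```

Note that neither density needs its normalisation constant: the ratio r̃_l and the final normalisation by Σ r̃_m absorb both.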
Importance Sampling - Key Points
Importance weights r_l correct the bias introduced by sampling from the proposal distribution q(z) instead of the wanted distribution p(z).
Success depends on how well q(z) approximates p(z).
Wherever p(z) > 0, we also need q(z) > 0.
Markov Chain Monte Carlo
Goal: sample from p(z) by generating a sequence using a Markov chain.
1 Generate a new sample z* ∼ q(z | z^{(l)}), conditional on the previous sample z^{(l)}.
2 Accept or reject the new sample according to an appropriate criterion:
$$z^{(l+1)} = \begin{cases} z^* & \text{if accepted} \\ z^{(l)} & \text{if rejected} \end{cases}$$
3 For an appropriate proposal and corresponding acceptance criterion, as l → ∞, z^{(l)} approaches an independent sample from p(z).
Metropolis Algorithm
1 Choose a symmetric proposal distribution: q(z_A | z_B) = q(z_B | z_A).
2 Accept the new sample z* with probability
$$A(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)}{\tilde{p}(z^{(\tau)})}\right)$$
e.g., draw u ∼ Uniform(0, 1) and accept if p̃(z*)/p̃(z^{(τ)}) > u.
3 Unlike rejection sampling, we keep the previous sample when the proposal is rejected:
$$z^{(\tau+1)} = \begin{cases} z^* & \text{if accepted} \\ z^{(\tau)} & \text{if rejected} \end{cases}$$
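A minimal sketch of the algorithm for a one-dimensional unnormalised Gaussian target, with a symmetric Gaussian random-walk proposal (function name and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def metropolis(p_tilde, z0, sigma, n_steps):
    """Metropolis with symmetric Gaussian proposal q(z*|z) = N(z, sigma^2)."""
    z = z0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        z_star = z + sigma * rng.standard_normal()         # propose
        if p_tilde(z_star) / p_tilde(z) > rng.uniform():   # accept w.p. min(1, ratio)
            z = z_star                                     # accepted: move
        chain[t] = z                                       # rejected: keep old state
    return chain

# target: N(0,1) up to a constant
chain = metropolis(lambda z: np.exp(-0.5 * z**2), z0=0.0, sigma=1.0, n_steps=50_000)
print(chain[5_000:].mean(), chain[5_000:].std())   # after burn-in: roughly 0 and 1
```

Note the crucial difference from rejection sampling in the last comment: a rejected proposal still contributes a (repeated) sample, which is why consecutive samples are correlated.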
Metropolis Algorithm – Illustration
Sampling from a Gaussian distribution (the black contour shows one standard deviation).
Proposal: q(z | z^{(l)}) = N(z | z^{(l)}, σ²I) with σ = 0.2.
150 candidates generated; 43 rejected.
[Figure: the Metropolis random walk over the Gaussian; accepted steps and rejected steps shown.]
Markov Chain Monte Carlo – Why it works
A first-order Markov chain is a series of random variables z^{(1)}, …, z^{(M)} such that the following property holds:
$$p(z^{(m+1)} \mid z^{(1)}, \dots, z^{(m)}) = p(z^{(m+1)} \mid z^{(m)})$$
Marginal probability:
$$p(z^{(m+1)}) = \sum_{z^{(m)}} p(z^{(m+1)} \mid z^{(m)})\,p(z^{(m)}) = \sum_{z^{(m)}} T_m(z^{(m)}, z^{(m+1)})\,p(z^{(m)})$$
where T_m(z^{(m)}, z^{(m+1)}) ≡ p(z^{(m+1)} | z^{(m)}) are the transition probabilities.
[Figure: graphical model of the chain z^{(1)} → z^{(2)} → ⋯ → z^{(m)} → z^{(m+1)}.]
Markov Chain Monte Carlo – Why it works
Marginal probability:
$$p(z^{(m+1)}) = \sum_{z^{(m)}} T_m(z^{(m)}, z^{(m+1)})\,p(z^{(m)})$$
A Markov chain is called homogeneous if the transition probabilities are the same for all m, denoted by T(z′, z).
A distribution is invariant, or stationary, with respect to a Markov chain if each step leaves the distribution unchanged. For a homogeneous Markov chain, the distribution p*(z) is invariant if
$$p^\star(z) = \sum_{z'} T(z', z)\,p^\star(z'). \qquad (1)$$
(Note: there can be many invariant distributions. If T is the identity, every distribution is invariant.)
Markov Chain Monte Carlo – Why it works
Detailed balance
$$p^\star(z)\,T(z, z') = p^\star(z')\,T(z', z) \qquad (2)$$
is sufficient (but not necessary) for p*(z) to be invariant (to check, substitute (2) into (1)). A Markov chain that respects detailed balance is called reversible.
A Markov chain is ergodic if it converges to the invariant distribution irrespective of the choice of initial conditions; the invariant distribution is then called the equilibrium distribution. An ergodic Markov chain can have only one equilibrium distribution.
Why does this work? Choose the transition probabilities T to satisfy detailed balance for our goal distribution p(z).
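The link between detailed balance and invariance can be verified numerically for a small discrete chain; the Metropolis-style construction of T below (symmetric base proposal, acceptance ratio of target probabilities) is an illustrative choice:

```python
import numpy as np

# a 3-state target distribution
p_star = np.array([0.5, 0.3, 0.2])

# build T from a symmetric base proposal Q via Metropolis acceptance,
# so that p*(z) T(z, z') = p*(z') T(z', z) by construction
Q = np.full((3, 3), 1.0 / 3.0)
T = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            T[i, j] = Q[i, j] * min(1.0, p_star[j] / p_star[i])
    T[i, i] = 1.0 - T[i].sum()          # remaining probability mass stays put

# detailed balance: the matrix of flows p*(z) T(z, z') is symmetric
flows = p_star[:, None] * T
assert np.allclose(flows, flows.T)
# invariance follows: one step of the chain maps p* to itself
print(p_star @ T)
```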
Markov Chain Monte Carlo – Metropolis-Hastings
Generalisation of the Metropolis algorithm to nonsymmetric proposal distributions q_k.
At step τ, draw a sample z* from the distribution q_k(z | z^{(τ)}), where k labels the set of possible transitions. Accept with probability
$$A_k(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)\,q_k(z^{(\tau)} \mid z^*)}{\tilde{p}(z^{(\tau)})\,q_k(z^* \mid z^{(\tau)})}\right)$$
The choice of proposal distribution is critical. A common choice is a Gaussian centred on the current state:
small variance → high acceptance rate, but a slow walk through the state space; samples are not independent
large variance → high rejection rate
Markov Chain Monte Carlo – Metropolis-Hastings
The transition probability of this Markov chain is
$$T(z, z') = q_k(z' \mid z)\,A_k(z', z)$$
To prove that p(z) is the invariant distribution, show that detailed balance holds:
$$p(z)\,T(z, z') = T(z', z)\,p(z').$$
Using the symmetry min(a, b) = min(b, a):
$$\begin{aligned}
p(z)\,q_k(z' \mid z)\,A_k(z', z) &= \min\big(p(z)\,q_k(z' \mid z),\; p(z')\,q_k(z \mid z')\big) \\
&= \min\big(p(z')\,q_k(z \mid z'),\; p(z)\,q_k(z' \mid z)\big) \\
&= p(z')\,q_k(z \mid z')\,A_k(z, z').
\end{aligned}$$
Markov Chain Monte Carlo – Metropolis-Hastings
Isotropic Gaussian proposal distribution (blue).
To keep the rejection rate low, the proposal scale should be the smallest standard deviation σ_min of the target multivariate Gaussian (red).
This leads to random-walk behaviour → slow exploration of the state space.
The number of steps separating approximately independent states is of order (σ_max/σ_min)².
[Figure: a strongly correlated Gaussian with length scales σ_min and σ_max, explored by an isotropic proposal of scale ρ.]
Gibbs Sampling
Goal: sample from a joint distribution p(z) = p(z₁, …, z_M).
How? Sample each variable in turn from its distribution conditioned on all the other variables.
Example: given p(z₁, z₂, z₃), at step τ we have samples z₁^{(τ)}, z₂^{(τ)}, z₃^{(τ)}. Obtain the samples for step τ + 1 via
$$\begin{aligned}
z_1^{(\tau+1)} &\sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)}) \\
z_2^{(\tau+1)} &\sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)}) \\
z_3^{(\tau+1)} &\sim p(z_3 \mid z_1^{(\tau+1)}, z_2^{(\tau+1)})
\end{aligned}$$
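A minimal sketch for a bivariate Gaussian with correlation ρ, whose full conditionals are the standard results z₁ | z₂ ∼ N(ρ z₂, 1 − ρ²) and symmetrically for z₂ (the example target and names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(8)

def gibbs_bivariate_gaussian(rho, n_steps):
    """Gibbs sampling for a zero-mean, unit-variance bivariate Gaussian, corr = rho."""
    z1, z2 = 0.0, 0.0
    s = np.sqrt(1.0 - rho**2)        # conditional standard deviation
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        z1 = rho * z2 + s * rng.standard_normal()   # z1 ~ p(z1 | z2)
        z2 = rho * z1 + s * rng.standard_normal()   # z2 ~ p(z2 | new z1)
        chain[t] = z1, z2
    return chain

chain = gibbs_bivariate_gaussian(rho=0.8, n_steps=100_000)
print(np.corrcoef(chain[10_000:].T)[0, 1])   # roughly rho = 0.8
```

Each update uses the most recent value of the other coordinate, exactly as in the three-variable scheme above.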
Gibbs Sampling – Why does it work?
1 p(z) is invariant under each of the Gibbs sampling steps, and hence under the whole Markov chain: a) p(z_i | z_{\i}) leaves p(z) invariant because the marginal distribution p(z_{\i}) does not change, and b) by definition each step samples from the correct conditional p(z_i | z_{\i}).
2 Ergodicity: it is sufficient that none of the conditional distributions is zero anywhere.
3 Gibbs sampling is Metropolis-Hastings sampling in which every step is accepted (acceptance probability 1).
[Figure: Gibbs sampling path over a correlated bivariate Gaussian with conditional scale l and marginal scale L.]
Decision Theory and Modelling
Inference and Decision
Probabilistic Modelling in the Abstract
Model Selection