Gibbs Sampling for Approximate Inference in Bayesian Networks
Let P (X1, . . . , Xn|e1, . . . , em) denote the joint distribution of a set of random variables (X1, . . . , Xn)
conditioned on a set of evidence variables (e1, . . . , em). Gibbs sampling is an algorithm to generate
a sequence of samples from such a joint probability distribution. The purpose of such a sequence
is to approximate the joint distribution (as with a histogram), or to compute an integral (such as
an expected value).
Gibbs sampling is applicable when the joint distribution is not known explicitly, but the con-
ditional distribution of each variable is known. The Gibbs sampling algorithm is used to generate
an instance from the distribution of each variable in turn, conditional on the current values of
the other variables. It can be shown that the sequence of samples comprises a Markov chain, and
the stationary distribution of that Markov chain is just the sought-after joint distribution. Gibbs
sampling is particularly well-adapted to sampling the posterior distribution of a Bayesian network,
since Bayesian networks are typically specified as a collection of conditional distributions.
1 The Gibbs Sampler
A Gibbs sampler runs a Markov chain on (X1, . . . , Xn). For convenience of notation, we denote the
set (X1, . . . , Xi−1, Xi+1, . . . , Xn) as X(−i), and e = (e1, . . . , em). Then, the following method gives
one possible way of creating a Gibbs sampler:
1. Initialize:
(a) Instantiate Xi to one of its possible values xi, 1 ≤ i ≤ n.
(b) Let x(0) = (x1, . . . , xn)
2. For t = 1, 2, . . .
(a) Pick an index i, 1 ≤ i ≤ n, uniformly at random. (Several other schedules are possible here; one of the simplest is to go cyclically over all the indices.)
(b) Sample a new value xi from P (Xi|x(−i), e), where x(−i) here consists of the current (time t − 1) values of all variables other than Xi.
(c) Let x(t) = (x(−i), xi), i.e., x(t) agrees with x(t−1) on every coordinate except the i-th, which is set to the newly sampled xi.
The sampler generates a sequence of samples x(0),x(1), . . . ,x(t), . . . from the Markov chain over
all possible states. The stationary distribution of the Markov chain is the joint distribution
P (X1, . . . , Xn|e). Thus, drawing samples from the Markov chain at sufficiently long intervals, i.e., allowing enough time for the chain to reach its stationary distribution, yields approximately independent samples from the distribution P (X1, . . . , Xn|e).
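To make the procedure concrete, the following is a minimal Python sketch of the sampler above; the names gibbs_sampler and full_conditional are illustrative assumptions, not from the text. The caller supplies, for each index i, a routine that draws Xi from P (Xi|x(−i), e), since that is the only distribution the algorithm ever needs.

    import random

    def gibbs_sampler(x0, full_conditional, n_steps, rng=random):
        # x0               -- initial state x(0), a list of values (x1, ..., xn)
        # full_conditional -- full_conditional(i, x) draws a value for Xi from
        #                     P(Xi | x(-i), e), i.e., given the current values
        #                     of all other variables and the evidence
        x = list(x0)
        states = [tuple(x)]
        for _ in range(n_steps):
            i = rng.randrange(len(x))       # step 2(a): pick an index uniformly
            x[i] = full_conditional(i, x)   # step 2(b): resample the i-th variable
            states.append(tuple(x))         # step 2(c): record x(t)
        return states

In practice one discards an initial burn-in prefix of the returned states and keeps only every k-th state thereafter, for exactly the reason given above: consecutive states of the chain are highly correlated.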
2 Gibbs Sampling in the Rain Network
The rain network is shown in Figure 2.1.

[Figure 2.1: The Rain network. Cloudy is a parent of both Sprinkler and Rain; Sprinkler and Rain are both parents of WetGrass.]

Consider the inference problem of estimating P (Rain|Sprinkler = true, WetGrass = true). Since Sprinkler and WetGrass are both set to true, the Gibbs sampler draws samples from P (Rain, Cloudy|Sprinkler = true, WetGrass = true). The Gibbs sampler for the
rain network works as follows:
1. Initialize:
(a) Instantiate Rain = true, Cloudy = true
(b) Let x(0) = (Rain = true, Cloudy = true)
2. For t = 1, 2, . . .
(a) Pick variable to update from {Rain, Cloudy} uniformly at random.
(b) If Rain was picked:
i. Sample Rain from P (Rain|Cloudy = [value]t−1, Sprinkler = true, WetGrass = true).
ii. Let x(t) = (Rain = [value]t, Cloudy = [value]t−1).
(c) If Cloudy was picked:
i. Sample Cloudy from P (Cloudy|Rain = [value]t−1, Sprinkler = true, WetGrass = true).
ii. Let x(t) = (Rain = [value]t−1, Cloudy = [value]t).
Thus, we need the conditional probabilities of Rain and Cloudy given values for all other variables. Such conditional probabilities can be obtained using the conditional probability tables (CPTs) and the Markov blanket of
the variable of interest. For convenience, we denote Sprinkler = true by s, WetGrass = true by
w, Rain = true by r, Rain = false by ¬r, Cloudy = true by c and Cloudy = false by ¬c. With
this notation, from Figure 2.1 it follows that

P (c|r, s, w) = P (c|s, r)    (WetGrass is independent of Cloudy given Sprinkler and Rain)
             = P (c)P (s, r|c) / P (s, r)
             = P (c)P (s|c)P (r|c) / [P (c)P (s|c)P (r|c) + P (¬c)P (s|¬c)P (r|¬c)]
             = (0.5 × 0.1 × 0.8) / (0.5 × 0.1 × 0.8 + 0.5 × 0.5 × 0.2)
             = 0.04/0.09
             = 0.4444 .
Clearly, P (¬c|r, s, w) = 1 − P (c|r, s, w) = 0.5556. The probabilities P (c|¬r, s, w) and P (¬c|¬r, s, w)
can be similarly computed. Also,
P (r|c, s, w) = P (r|c, s)P (w|r, c, s) / P (w|c, s)
             = P (r|c)P (w|r, s) / P (w|c, s)
             = P (r|c)P (w|r, s) / [P (r|c)P (w|r, s) + P (¬r|c)P (w|¬r, s)]
             = (0.8 × 0.99) / (0.8 × 0.99 + 0.2 × 0.9)
             = 0.8148 .
Clearly, P (¬r|c, s, w) = 1−P (r|c, s, w) = 0.1852. The probabilities P (r|¬c, s, w) and P (¬r|¬c, s, w)
can be similarly computed.
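As a sketch of how these numbers arise, the following Python computes the Markov-blanket conditionals from the CPT entries appearing in the calculations above (the variable and function names are ours; CPT entries not listed, such as those for Sprinkler = false, are never needed because the evidence fixes Sprinkler and WetGrass to true):

    # CPT entries used in the calculations above
    p_c = 0.5                        # P(c)
    p_s = {True: 0.1, False: 0.5}    # P(s | Cloudy = key)
    p_r = {True: 0.8, False: 0.2}    # P(r | Cloudy = key)
    p_w = {True: 0.99, False: 0.9}   # P(w | Rain = key, s)

    def p_cloudy_given(rain):
        # P(c | Rain = rain, s, w); the Markov blanket of Cloudy is
        # {Sprinkler, Rain}
        pr_c = p_r[True] if rain else 1 - p_r[True]      # P(Rain = rain | c)
        pr_nc = p_r[False] if rain else 1 - p_r[False]   # P(Rain = rain | ¬c)
        num = p_c * p_s[True] * pr_c
        return num / (num + (1 - p_c) * p_s[False] * pr_nc)

    def p_rain_given(cloudy):
        # P(r | Cloudy = cloudy, s, w); the Markov blanket of Rain is
        # {Cloudy, Sprinkler, WetGrass}
        num = p_r[cloudy] * p_w[True]
        return num / (num + (1 - p_r[cloudy]) * p_w[False])

    print(p_cloudy_given(True))   # 0.4444 = P(c | r, s, w)
    print(p_rain_given(True))     # 0.8148 = P(r | c, s, w)

The same two functions evaluated at False give the remaining conditionals P (c|¬r, s, w) and P (r|¬c, s, w).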
The above analysis gives the exact conditional probabilities needed by the Gibbs sampler de-
scribed earlier in the section. Samples are drawn by running the Markov chain implemented by the
Gibbs sampler to convergence (to the stationary distribution).
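Continuing the sketch above (and reusing p_cloudy_given and p_rain_given from it), the two-variable chain of this section can be run directly; the burn-in and thinning constants below are illustrative choices rather than values from the text. Under the CPT values above, the printed estimate should approach the exact posterior P (r|s, w) ≈ 0.32.

    import random

    def gibbs_rain(n_samples, burn_in=1000, thin=10, seed=0):
        # Estimate P(r | s, w) with the Gibbs sampler of this section.
        rng = random.Random(seed)
        rain, cloudy = True, True           # step 1: Rain = true, Cloudy = true
        kept = []
        for t in range(burn_in + n_samples * thin):
            if rng.random() < 0.5:          # step 2(a): pick a variable uniformly
                rain = rng.random() < p_rain_given(cloudy)    # step 2(b) for Rain
            else:
                cloudy = rng.random() < p_cloudy_given(rain)  # step 2(b) for Cloudy
            if t >= burn_in and (t - burn_in) % thin == 0:    # discard burn-in, thin
                kept.append(rain)
        return sum(kept) / len(kept)

    print(gibbs_rain(20000))   # ≈ 0.32 with enough samples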