Bayesian Networks
AIMA 13.1-13.2
CMPSC 442
Week 8, Meeting 23, 3 Segments
Outline
● Syntax and Semantics of Bayesian Networks
● Conditional Probability Tables in a Bayesian Network
● More on Bayesian Networks
Bayesian Networks
AIMA 13.1-13.2
CMPSC 442
Week 8, Meeting 23, Segment 1 of 3: Syntax and Semantics
of Bayesian Networks
Bayesian Networks
● Represent any full joint probability distribution, typically much more concisely than a full table
○ Graphically represents conditional independence
○ Defined by the topology and the local probability information
● Also known as belief nets
A Naïve Bayes Model is a Bayesian Network
● A graphical representation has one node for each random variable
● Directed arrows represent the conditioning effect of a parent node on
its children
Conditional Probability Tables at Each Node
P(F) = 0.10

Flu    P(X1|F)   P(X2|F)   P(X3|F)   P(X4|F)   P(X5|F)
T      0.90      0.60      0.75      0.95      0.28
F      0.10      0.05      0.33      0.05      0.30
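As a concrete illustration, the sketch below evaluates this Naive Bayes model in plain Python using the CPT values above; the variable and function names (P_FLU, posterior_flu) and the chosen symptom vector are only illustrative.

```python
# Minimal sketch of the Naive Bayes flu model above, using the CPT values shown.
# Computes P(Flu = true | x1..x5) by multiplying the prior by each symptom's
# conditional probability and normalizing over Flu = true / false.

P_FLU = 0.10                       # P(F = true)
P_X_GIVEN_FLU = {                  # P(Xi = true | F), for X1..X5
    True:  [0.90, 0.60, 0.75, 0.95, 0.28],
    False: [0.10, 0.05, 0.33, 0.05, 0.30],
}

def posterior_flu(symptoms):
    """symptoms: list of five booleans, one per Xi (True = symptom observed)."""
    scores = {}
    for flu, prior in ((True, P_FLU), (False, 1.0 - P_FLU)):
        p = prior
        for present, p_true in zip(symptoms, P_X_GIVEN_FLU[flu]):
            p *= p_true if present else 1.0 - p_true
        scores[flu] = p
    return scores[True] / (scores[True] + scores[False])

print(posterior_flu([True, True, False, True, False]))   # ~0.99 for this evidence
```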
Syntax
● A set of nodes, one per random variable (discrete or continuous)
● A directed, acyclic graph (DAG)
○ Directed arcs point from each parent node (the conditioning context) to its child nodes
● A conditional distribution for each node Xi given its parents that
quantifies the effect of the parents on the node using a finite number of
parameters θ
Syntax
● The nodes and edges represent the topology (or structure) of the
network
● In the simplest case, the conditional distribution of each random
variable is represented as a conditional probability table (CPT)
● A CPT gives the distribution over Xi for each combination of parent
values
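In code, a CPT is just a table keyed by parent assignments. The sketch below is illustrative only: the node X, its parents U and V, and the numbers are hypothetical.

```python
# Minimal sketch: a CPT for a Boolean node X with two Boolean parents U and V.
# Each row stores P(X = true | u, v); P(X = false | u, v) is the complement,
# so one number per row is enough.

cpt_x = {
    (True,  True):  0.95,
    (True,  False): 0.80,
    (False, True):  0.30,
    (False, False): 0.05,
}

def p_x(x_is_true, u, v):
    p_true = cpt_x[(u, v)]
    return p_true if x_is_true else 1.0 - p_true

print(p_x(False, True, False))    # 1 - 0.80 = 0.20
```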
Example
You have a new burglar alarm installed at home. It is fairly reliable at
detecting a burglary, but is occasionally set off by minor earthquakes. You
also have two neighbors, John and Mary, who have promised to call you at
work when they hear the alarm. John nearly always calls when he hears
the alarm, but sometimes confuses the telephone ringing with the alarm,
and calls then, too. Mary, on the other hand, likes rather loud music and
often misses the alarm altogether. Given the evidence of who has or has
not called, we would like to estimate the probability of a burglary.
(From Judea Pearl, developer of Bayesian Networks and causal models)
Bayesian Network for Example
● Random Variables
○ Burglary
○ Earthquake
○ Alarm
○ JohnCalls
○ MaryCalls
● Knowledge: John and Mary usually call if my burglar alarm goes off,
and occasionally otherwise. A minor earthquake can cause the alarm.
● I’m at work, neighbor John calls to say my alarm is ringing, but
neighbor Mary doesn’t call. Is there a burglar?
Network Topology Can Reflect Causal Knowledge
● P(Alarm|Burglary,Earthquake)
○ A burglar can set the alarm off
○ An earthquake can set the alarm off
● P(JohnCalls,MaryCalls|Alarm)
○ The alarm can cause Mary to call
○ The alarm can cause John to call
● Alarm is conditioned on Burglary and Earthquake
● Given Alarm, JohnCalls and MaryCalls are conditionally independent of Burglary and Earthquake
Semantics
● Local semantics: given its parents, each node is conditionally
independent of its other ancestors
● Review conditional independence
○ X and Y are conditionally independent given Z iff
○ P(X|Y,Z) = P(X|Z)
○ P(Y|X,Z) = P(Y|Z)
○ P(X ∧ Y|Z) = P(X|Z) P(Y|Z)
● Local semantics give rise to global semantics
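The third form follows from the first by the product rule:

```latex
P(X \wedge Y \mid Z) \;=\; P(X \mid Y, Z)\, P(Y \mid Z) \;=\; P(X \mid Z)\, P(Y \mid Z)
```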
Semantics
● Where the network contains n random variables Xi, each entry in the joint probability distribution is P(x1, …, xn)
● In a Bayesian network, each entry P(x1, …, xn) in the joint probability distribution is defined in terms of the parameters θ, as shown below:
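Writing θ(xi | parents(Xi)) for the parameter stored at node Xi, the defining equation is:

```latex
P(x_1, \ldots, x_n) \;=\; \prod_{i=1}^{n} \theta\!\left(x_i \mid parents(X_i)\right)
```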
Semantics
● The full joint distribution is defined as the product of the local conditional distributions
● Therefore, as shown below, the joint can be written directly as a product of conditional probabilities
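Since each parameter θ equals the corresponding conditional probability:

```latex
P(x_1, \ldots, x_n) \;=\; \prod_{i=1}^{n} P\!\left(x_i \mid parents(X_i)\right)
```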
Deriving the Equation for the Global Semantics
● Step 1: apply the definition of conditional probability
● Step 2: marginalize over y, the assignment to all the other variables in the full joint probability
● Step 3: substitute the network definition to show that the network’s local parameters θ are the conditional probabilities of the full joint probability distribution (see the sketch below)
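A sketch of the three steps, following the standard argument in AIMA:

```latex
% Step 1: definition of conditional probability
P(x_i \mid parents(X_i)) \;=\; \frac{P(x_i,\, parents(X_i))}{P(parents(X_i))}

% Step 2: marginalize over y, the assignment to all the other variables
P(x_i,\, parents(X_i)) \;=\; \sum_{y} P(x_i,\, parents(X_i),\, y)

% Step 3: substitute P(x_1,\ldots,x_n) = \prod_j \theta(x_j \mid parents(X_j));
% the sums over y leave only \theta(x_i \mid parents(X_i)) \cdot P(parents(X_i)), so
P(x_i \mid parents(X_i)) \;=\; \theta(x_i \mid parents(X_i))
```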
Chain Rule to Construct Bayesian Networks
● Chain rule: a joint probability can be expressed as a product of conditional probabilities, as shown below
● Given the global semantics from the preceding slide, this yields a general assertion about the variables in the network, and a construction procedure
● Generalization of Naive Bayes
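In LaTeX, for any ordering X1, …, Xn of the variables:

```latex
P(x_1, \ldots, x_n)
  \;=\; \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)
  \;=\; \prod_{i=1}^{n} P(x_i \mid parents(X_i)),
\quad \text{provided } Parents(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}
```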
Bayesian Networks
AIMA 13.1-13.2
CMPSC 442
Week 8, Meeting 23, Segment 2 of 3: Conditional
Probability Tables in a Bayesian Network
Constructing a Bayesian Network
A Bayesian network is a correct representation of a domain only if each
node is conditionally independent of its other predecessors in the node
ordering, given its parents
1. Determine the variables to model the domain, and order them such that
causes precede effects
2. Choose a minimal set of parents for Xi to satisfy the equation for the local &
global semantics, and insert an edge from every Parent(Xi) to Xi
3. Add the CPT at each node.
This procedure guarantees that the network is acyclic and contains no redundant probability values, so it cannot violate the axioms of probability.
Full Bayesian Network for Judea Pearl’s Example
● P(Alarm|Burglary,Earthquake)
○ A burglar can set the alarm off
○ An earthquake can set the alarm off
● P(JohnCalls,MaryCalls|Alarm)
○ The alarm can cause Mary to call
○ The alarm can cause John to call
Compactness of a Bayesian Network
● A CPT for Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values
● Each row requires one probability value p for Xi = true (the number for Xi = false is 1 − p)
● If no Boolean variable has more than k Boolean parents, the complete network requires O(n ⋅ 2^k) numbers
● Grows linearly with n, versus O(2^n) for the full joint distribution
● For the example, 1 + 1 + 4 + 2 + 2 = 10 numbers vs. 2^5 − 1 = 31 (see the sketch below)
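A minimal Python sketch of that count for the burglary network (structure only; variable names are illustrative):

```python
# One number per CPT row (P(X = true | parent values)); 2**k rows for k parents.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

cpt_rows = {node: 2 ** len(ps) for node, ps in parents.items()}
print(cpt_rows)                  # {'Burglary': 1, 'Earthquake': 1, 'Alarm': 4, ...}
print(sum(cpt_rows.values()))    # 10 numbers, vs. 2**5 - 1 = 31 for the full joint
```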
Reading the CPTs
● The probability of a burglary
on any day is 0.001
● The probability of an
earthquake on any day is
0.002
Reading the CPTs
● An earthquake and a burglary together can set the alarm off with probability 0.70
● A burglary with no earthquake can set the alarm off with probability 0.01
● An earthquake with no burglary can set the alarm off with probability 0.70
● In the absence of a burglary or an earthquake, the alarm can go off with probability 0.01
Example Knowledge
● If the alarm goes off, JohnCalls
has probability = 0.90
● If the alarm goes off, MaryCalls
has probability = 0.70
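The sketch below (illustrative names only) plugs the priors and CPT entries quoted on the preceding slides into the global semantics to evaluate one full-joint entry; it uses the 0.01 entry for the alarm going off with neither a burglary nor an earthquake.

```python
# Global semantics: P(x1, ..., xn) = product over nodes of P(xi | parents(Xi)).
# Values are the ones quoted on the CPT slides above.

P_B = 0.001          # P(Burglary = true)
P_E = 0.002          # P(Earthquake = true)
P_A_NO_CAUSE = 0.01  # P(Alarm = true | Burglary = false, Earthquake = false)
P_J_GIVEN_A = 0.90   # P(JohnCalls = true | Alarm = true)
P_M_GIVEN_A = 0.70   # P(MaryCalls = true | Alarm = true)

# P(j, m, a, ¬b, ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
p = P_J_GIVEN_A * P_M_GIVEN_A * P_A_NO_CAUSE * (1 - P_B) * (1 - P_E)
print(p)             # ≈ 0.0063
```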
A Non-Causal Network
● Suppose we choose the ordering M, J, A, B, E, linking symptoms to causes; each node is conditioned on its predecessors in the ordering
○ P(M) (1 entry)
○ P(J | M) (2 entries; no simplification)
○ P(A | J, M) (4 entries; no simplification)
○ P(B | A, J, M) = P(B | A) (2 entries)
○ P(E | A, B, J, M) = P(E | A, B) (4 entries)
Diagnostic Knowledge versus Causal Knowledge
● Diagnostic network is less compact than the causal network: 1 + 2 + 4 +
2 + 4 = 13 CPT entries instead of 10
● Causal models and conditional independence seem hardwired for
humans!
○ Expert physicians prefer to estimate probabilities for causal rules rather than for diagnostic ones (Tversky & Kahneman, 1982)
○ Deciding conditional independence is hard in non-causal directions
Bayesian Networks
AIMA 13.1-13.2
CMPSC 442
Week 8, Meeting 23, Segment 3 of 3: More on Bayesian
Networks
Non-descendants Property
● The conditional probabilities at a node (random variable) capture its dependence on its parent nodes; all other independence information is implied by the network topology
● A random variable is conditionally independent of its non-descendants,
given its parents
Markov Blanket
Independent of the global network structure:
● For a node A, the only part of the network that must be considered when predicting the behavior of A
○ The parents of A, the children of A, and the other parents of A’s children
● This Markov Blanket property is exploited by
inference algorithms that use local and
distributed stochastic sampling processes
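A minimal Python sketch (illustrative function name) that reads the Markov blanket off a parents map for the burglary network:

```python
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def markov_blanket(node):
    """Parents of node, children of node, and the other parents of its children."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])   # co-parents of each child
    blanket.discard(node)
    return blanket

print(markov_blanket("Alarm"))   # {'Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls'}
```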
Extensions to Bayesian Networks
● Hidden variables
● Decision (action) variables
● Random variables with continuous distributions
● “Plate” models
○ Latent Dirichlet Allocation (LDA), a generative statistical model that allows
sets of observations to be explained by unobserved groups
● Dynamic Bayesian Networks (DBNs): change over time
○ Hidden Markov Models (HMMs): a special case of DBNs in which the
entire state of the world is represented by a single hidden state variable
Brief Illustration of Hidden Variables
● Initial evidence: car won’t start
● Testable variables (green)
● Action variables (orange): Conduct repairs
● Hidden variables (gray): ensure sparse structure, reduce parameters
Summary
● A Bayesian network provides a compact representation of a joint
probability distribution
○ Guaranteed to be a consistent specification (obeys the axioms of
probability)
○ Graphically shows conditional independence
○ Can be extended to include “action” nodes, to use hidden variables, to
incorporate change over time, to use continuous variables
● Networks that capture causality tend to be sparser
● Estimating and querying belief nets is straightforward if everything is
observable (but computationally expensive for large networks)