CMPUT 366 F20: Belief Networks
James Wright & Vadim Bulitko
October 22, 2020
Lecture Outline
Midterm
coming Tuesday
on eClass, open book, no collaboration of any kind
individualized question sets, hard enough that communicating or consulting the Internet will harm your results
24-hour window to take the exam, starting at our normal class time
80 minutes to do the exam, timed by eClass
I will be in the usual Zoom lecture call to answer your questions during the exam
material coverage: up to and including this week's Tuesday lecture
but today's lecture clarifies material from the Tuesday lecture
coming Monday lab: midterm review session by the TAs
Belief Networks
PM 8.3-8.4
Marginal Independence
Definition. X and Y are marginally or unconditionally independent iff P(X|Y) = P(X) and P(Y|X) = P(Y).
Theorem. X and Y are marginally or unconditionally independent iff P(X, Y) = P(X)P(Y).
Proof:
←: suppose P(X|Y) = P(X) and P(Y|X) = P(Y). Then
P(X, Y) = P(X|Y)P(Y) = P(X)P(Y).
→: suppose P(X, Y) = P(X)P(Y). Then
P(X|Y) = P(X, Y)/P(Y) = P(X)P(Y)/P(Y) = P(X).
A similar argument shows that P(Y|X) = P(Y).
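The two characterizations can be checked numerically. A minimal Python sketch, using a made-up joint distribution over two binary variables that was chosen to factorize (the numbers are illustrative assumptions, not from the slides):

```python
# Toy joint P(X, Y) over binary X, Y, constructed so that X and Y
# are marginally independent (each cell is the product of marginals).
joint = {
    (0, 0): 0.12, (0, 1): 0.28,
    (1, 0): 0.18, (1, 1): 0.42,
}

def marginal_x(x):
    return sum(p for (xx, _), p in joint.items() if xx == x)

def marginal_y(y):
    return sum(p for (_, yy), p in joint.items() if yy == y)

# Theorem form: P(X, Y) = P(X) P(Y) for every cell...
factorizes = all(abs(joint[(x, y)] - marginal_x(x) * marginal_y(y)) < 1e-12
                 for (x, y) in joint)

# ...and definition form: P(X | Y) = P(X) for every cell.
conditional = all(abs(joint[(x, y)] / marginal_y(y) - marginal_x(x)) < 1e-12
                  for (x, y) in joint)

assert factorizes and conditional  # the two definitions agree here
```

As the theorem predicts, either check passes exactly when the other does.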
Conditional Independence
Definition. X is conditionally independent of Y given Z iff P(X|Y, Z) = P(X|Z).
Theorem. X is conditionally independent of Y given Z iff P(X, Y|Z) = P(X|Z)P(Y|Z).
Proof:
←: suppose P(X|Y, Z) = P(X|Z). Then multiply both sides by P(Y|Z)P(Z):
P(X|Y, Z)P(Y|Z)P(Z) = P(X|Z)P(Y|Z)P(Z)
P(X, Y, Z) = P(X|Z)P(Y|Z)P(Z)
P(X, Y|Z)P(Z) = P(X|Z)P(Y|Z)P(Z)
P(X, Y|Z) = P(X|Z)P(Y|Z)
→: follow the steps above in reverse.
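This equivalence can also be verified numerically. A sketch that builds a joint P(X, Y, Z) in which X and Y are conditionally independent given Z by construction (all numbers are made-up illustrations):

```python
from itertools import product

p_z = {0: 0.5, 1: 0.5}
px1 = {0: 0.2, 1: 0.8}   # assumed P(X=1 | Z=z)
py1 = {0: 0.3, 1: 0.9}   # assumed P(Y=1 | Z=z)

def pxz(x, z): return px1[z] if x == 1 else 1 - px1[z]
def pyz(y, z): return py1[z] if y == 1 else 1 - py1[z]

# Joint built from the factorization P(X|Z) P(Y|Z) P(Z)
joint = {(x, y, z): pxz(x, z) * pyz(y, z) * p_z[z]
         for x, y, z in product([0, 1], repeat=3)}

def P(pred):  # probability of an event, by summing matching cells
    return sum(p for xyz, p in joint.items() if pred(*xyz))

# Theorem form: P(X, Y | Z) = P(X | Z) P(Y | Z) on every assignment
factorized = all(
    abs(P(lambda a, b, c: (a, b, c) == (x, y, z)) / P(lambda a, b, c: c == z)
        - pxz(x, z) * pyz(y, z)) < 1e-12
    for x, y, z in product([0, 1], repeat=3))

# Definition form: P(X | Y, Z) = P(X | Z) on every assignment
conditional = all(
    abs(P(lambda a, b, c: (a, b, c) == (x, y, z))
        / P(lambda a, b, c: (b, c) == (y, z)) - pxz(x, z)) < 1e-12
    for x, y, z in product([0, 1], repeat=3))

assert factorized and conditional

# Note: X and Y are NOT marginally independent in this example,
# which shows conditional independence is a genuinely different notion.
assert abs(P(lambda a, b, c: a == 1 and b == 1)
           - P(lambda a, b, c: a == 1) * P(lambda a, b, c: b == 1)) > 0.01
```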
Fire Alarm Scenario
Agent wants to deduce whether there is a fire in the building next door
The fire alarm detects heat from fires
But it can also be set off by tampering
A fire causes visible smoke
People usually leave the building as a group when the fire alarm goes off
When lots of people leave the building, our friend will report it to us
Fire Alarm Scenario
The graph representation encodes a specific factorization of the full joint distribution
Distribution on each node conditional on its parents
Marginal distributions on nodes with no parents
Semantics: every node is independent of its non-descendants, conditional on its parents
Belief/Bayesian Networks
A belief (or Bayesian) network consists of:
a directed acyclic graph, with each node labelled by a random variable
a domain for each random variable
a conditional probability table for each variable given its parents
a marginal distribution for nodes with no parents
Queries
The most common task for a belief network is to query posterior probabilities given some observations
Easy case:
Observations are the parents of the query target
More common cases:
Observations are the children of the query target
Observations have no straightforward relationship to the target
Extracting Joint Probabilities: Variable Ordering
To compute the joint probability distribution, we need a variable ordering that is consistent with the graph
Algorithm 1: Ordering Variables
for i = 1, …, n do
    select a variable/vertex that is (i) unlabelled and (ii) whose parents are all labelled or do not exist
    label it i
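Algorithm 1 is a topological sort of the network. A minimal sketch, where the graph encoding (a dict mapping each node to its list of parents) is our own choice, not from the slides:

```python
def order_variables(parents):
    """Label nodes 1..n so every node's parents are labelled first.

    parents: dict mapping each variable to the list of its parents.
    """
    label = {}
    for i in range(1, len(parents) + 1):
        # pick any unlabelled variable whose parents are all labelled
        v = next(v for v in parents
                 if v not in label and all(p in label for p in parents[v]))
        label[v] = i
    return label

# The fire alarm network from the slides:
fire_alarm = {
    "Tampering": [], "Fire": [],
    "Alarm": ["Tampering", "Fire"],
    "Smoke": ["Fire"],
    "Leaving": ["Alarm"],
}
labels = order_variables(fire_alarm)
# Consistency check: every node's label exceeds all its parents' labels.
assert all(labels[v] > labels[p] for v in fire_alarm for p in fire_alarm[v])
```

Any ordering the loop produces is valid; ties (several eligible nodes) can be broken arbitrarily.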
Extracting Joint Probabilities: Multiplications
Multiply in the label order:
P(Tampering) = P(Tampering)
P(Fire) = P(Fire)
P(Tampering, Fire) = P(Tampering)P(Fire)
P(Tampering, Fire, Alarm) = P(Alarm | Tampering, Fire)P(Tampering)P(Fire)
P(Tampering, Fire, Alarm, Smoke) = P(Smoke | Fire)P(Alarm | Tampering, Fire)P(Tampering)P(Fire)
P(Tampering, Fire, Alarm, Smoke, Leaving) = P(Leaving | Alarm)P(Smoke | Fire)P(Alarm | Tampering, Fire)P(Tampering)P(Fire)
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))
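Evaluating one entry of the full joint is then one multiplication per factor. A sketch, with CPT numbers that are illustrative assumptions (the slides do not give them):

```python
# Assumed CPTs for the fire alarm network (illustrative values only)
P_tampering = 0.02
P_fire = 0.01
P_alarm_given = {(True, True): 0.5, (True, False): 0.85,
                 (False, True): 0.99, (False, False): 0.0001}
P_smoke_given_fire = {True: 0.9, False: 0.01}
P_leaving_given_alarm = {True: 0.88, False: 0.001}

# P(Tampering=F, Fire=T, Alarm=T, Smoke=T, Leaving=T):
# one factor per node, each conditioned only on its parents
p = ((1 - P_tampering) * P_fire
     * P_alarm_given[(False, True)]
     * P_smoke_given_fire[True]
     * P_leaving_given_alarm[True])
print(p)  # ≈ 0.00768
```

Five factors replace a lookup in a 2^5-entry joint table; the saving grows quickly with the number of variables.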
Observing Children
Observing children can render conditionally independent nodes conditionally dependent
the coins scenario
observing both B and C1 uniquely determines C2
Similarly:
we start with prior probabilities of Tampering and Fire
if we observe that Alarm is ringing, how are these posterior probabilities different? Both increase
if we then observe Smoke, how do these posterior probabilities change? P(Fire) increases, P(Tampering) decreases
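This "explaining away" effect can be reproduced by brute-force enumeration. A sketch with assumed CPT numbers (they are not given on the slide; Leaving is omitted since it is unobserved and marginalizing it out changes nothing):

```python
from itertools import product

# Assumed CPTs, for illustration only
P_t, P_f = 0.02, 0.01                      # priors on Tampering, Fire
P_a = {(True, True): 0.5, (True, False): 0.85,
       (False, True): 0.99, (False, False): 0.0001}   # P(Alarm | T, F)
P_s = {True: 0.9, False: 0.01}             # P(Smoke | Fire)

def joint(t, f, a, s):
    return ((P_t if t else 1 - P_t) * (P_f if f else 1 - P_f)
            * (P_a[(t, f)] if a else 1 - P_a[(t, f)])
            * (P_s[f] if s else 1 - P_s[f]))

def posterior(target, evidence):
    """P(target=True | evidence) by summing out the other variables."""
    num = den = 0.0
    for t, f, a, s in product([False, True], repeat=4):
        w = {"t": t, "f": f, "a": a, "s": s}
        if all(w[k] == v for k, v in evidence.items()):
            den += joint(t, f, a, s)
            num += joint(t, f, a, s) if w[target] else 0.0
    return num / den

# Observing the alarm raises both posteriors above their priors...
print(posterior("f", {"a": True}))              # ≈ 0.37 (prior 0.01)
print(posterior("t", {"a": True}))              # ≈ 0.63 (prior 0.02)
# ...and then observing smoke "explains away" tampering:
print(posterior("f", {"a": True, "s": True}))   # ≈ 0.98 (up)
print(posterior("t", {"a": True, "s": True}))   # ≈ 0.03 (down)
```

With these numbers, smoke strongly supports a real fire, which in turn accounts for the alarm, so tampering becomes much less likely.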
Recap: Independence & Factorization
Joint distribution P(X1, …, Xn)
allows us to answer all sorts of queries
exponential in the number of variables: can be too large to represent and reason over
We want to factor the full joint distribution into smaller distributions
easier to represent
can be easier to reason over
Example:
P(C1 = H, C2 = H, C3 = H, C4 = T) = P(C1 = H)P(C2 = H)P(C3 = H)P(C4 = T)
Recap: Belief Networks
Sometimes variables are dependent on each other
example: B = 1 if C1 = C2, and B = 0 otherwise
cannot just take the product: P(C1, C2, B) ≠ P(C1)P(C2)P(B)
Need conditional independence:
P(C1, C2, B) = P(C1)P(C2)P(B|C1, C2)
network labelled with:
P(C1)
P(C2)
P(B|C1, C2)
[Network diagram: C1 → B ← C2]
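The coins factorization can be built and checked directly. A sketch (B is deterministic given the coins, so its CPT contains only 0s and 1s):

```python
from itertools import product

P_c = 0.5                                  # each coin is fair: P(Ci = H)

def P_b_given(b, c1, c2):                  # P(B = b | C1 = c1, C2 = c2)
    return float(b == (c1 == c2))

# Joint from the factorization P(C1) P(C2) P(B | C1, C2)
joint = {(c1, c2, b): P_c * P_c * P_b_given(b, c1, c2)
         for c1, c2, b in product(["H", "T"], ["H", "T"], [True, False])}

assert abs(sum(joint.values()) - 1.0) < 1e-12   # a valid distribution

# The naive product P(C1)P(C2)P(B) fails, because B depends on the coins:
P_b = sum(p for (c1, c2, b), p in joint.items() if b)   # P(B = 1) = 0.5
assert joint[("H", "H", True)] != P_c * P_c * P_b       # 0.25 vs 0.125
```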
Intuition
Given an order of variables X1, …, Xn, the chain rule gives us:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, X_{i−1})
For example: if we label C1, C2, B as X1, X2, X3 then the chain rule gives us:
P(C1, C2, B) = P(C1)P(C2|C1)P(B|C1, C2)
Do we need all conditional probabilities? Not here: P(C2|C1) = P(C2)
Thus: P(C1, C2, B) = P(C1)P(C2)P(B|C1, C2)
So we set parents(C2) = ∅ and parents(B) = {C1, C2}
Then ∏_{i=1}^{n} P(Xi | X1, …, X_{i−1}) simplifies to ∏_{i=1}^{n} P(Xi | parents(Xi))
Constructing Belief Networks: Coins and B
Order the variables X1, X2, …, Xn and associate each one with a node
For each variable Xi:
Choose a minimal set of variables parents(Xi) from X1, …, X_{i−1} such that P(Xi | X1, …, X_{i−1}) = P(Xi | parents(Xi))
Add an arrow from each variable in parents(Xi) to Xi
Label the node for Xi with the conditional probability table P(Xi | parents(Xi))
Construct a belief network for the coins example, given the following orderings of the variables (skip the last step of the algorithm):
1. C1, C2, B   2. C2, C1, B   3. C1, B, C2   4. C2, B, C1   5. B, C1, C2   6. B, C2, C1
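The construction procedure can be run mechanically against the coins joint. A brute-force sketch (the joint encoding and the smallest-subset-first search are our own choices, not from the slides):

```python
from itertools import product, combinations

# Full joint of the coins example: C1, C2 fair, B true iff C1 = C2
VALS = {"C1": ["H", "T"], "C2": ["H", "T"], "B": [True, False]}

def joint(w):
    return 0.25 * float(w["B"] == (w["C1"] == w["C2"]))

def worlds():
    names = list(VALS)
    for vs in product(*(VALS[n] for n in names)):
        yield dict(zip(names, vs))

def cond(var, val, given):   # P(var=val | given); None if P(given) = 0
    den = sum(joint(w) for w in worlds()
              if all(w[k] == v for k, v in given.items()))
    if den == 0:
        return None
    num = sum(joint(w) for w in worlds()
              if w[var] == val and all(w[k] == v for k, v in given.items()))
    return num / den

def minimal_parents(order):
    """For each Xi, find a smallest predecessor set S with
    P(Xi | X1..Xi-1) = P(Xi | S) on all assignments."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for size in range(len(preds) + 1):       # smallest subsets first
            for s in combinations(preds, size):
                ok = True
                for w in worlds():
                    full = cond(x, w[x], {p: w[p] for p in preds})
                    sub = cond(x, w[x], {p: w[p] for p in s})
                    if full is not None and abs(full - sub) > 1e-12:
                        ok = False
                        break
                if ok:
                    parents[x] = set(s)
                    break
            if x in parents:
                break
    return parents

print(minimal_parents(["C1", "C2", "B"]))  # B needs both coins as parents
print(minimal_parents(["B", "C1", "C2"]))  # C2 needs both B and C1
```

Different orderings yield different (but all correct) networks, which is exactly the point of the exercise.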
Constructing Belief Networks: Alice, Bob and Time
Order the variables X1, X2, …, Xn and associate each one with a node
For each variable Xi:
Choose a minimal set of variables parents(Xi) from X1, …, X_{i−1} such that P(Xi | X1, …, X_{i−1}) = P(Xi | parents(Xi))
Add an arrow from each variable in parents(Xi) to Xi
Label the node for Xi with the conditional probability table P(Xi | parents(Xi))
Construct a belief network for the Alice, Bob and Time example, given the following orderings of the variables (skip the last step of the algorithm):
1. A, B, T   2. A, T, B   3. B, A, T   4. B, T, A   5. T, A, B   6. T, B, A
Checking Correctness of a Given Belief Network: Intuition
Suppose we are given a network over some variables X1, …, Xn
The chain rule gives us P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, X_{i−1})
The network gives us P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))
So the network is claiming conditional independence of some variables
Check if the claims are correct
Checking Correctness of a Given Belief Network: Intuition
Assume the variables are ordered C1, C2, B
The chain rule gives us
P(C1, C2, B) = P(C1)P(C2|C1)P(B|C1, C2)
The network gives us
P(C1, C2, B) = P(C1)P(C2)P(B|C1, C2)
Thus the network claims P(C2|C1) = P(C2). Is this correct?
List the network's claim if the variables are ordered C2, C1, B. Is that claim correct as well?
[Network diagram: C1 → B ← C2]
Causality
The arrows in belief networks do not, in general, represent causal relationships
T → A is a causal relationship if T causes the value of A
B does not cause T but the network on the right is correct
Reasoning about causal relationships is often a good way to construct a natural encoding as a belief network
We can often reason about causal independence even when we do not know the full joint distribution
Summary
Belief networks represent a factoring of a joint distribution
Graph structure encodes conditional independence relationships
Can query posterior probabilities of subsets of variables given observations
Each joint distribution has multiple correct representations as a belief network
Some are more compact than others
Some are more natural than others
Arcs in a belief network often represent causal relationships
But they don't have to!