CIS 471/571(Fall 2020): Introduction to Artificial Intelligence
Lecture 15: Bayes Nets – Inference
Thanh H. Nguyen
Source: http://ai.berkeley.edu/home.html
Reminder
§ Homework 4: Bayes Nets
§ Deadline: Nov 24th, 2020
Bayes’ Net Representation
§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node
  § A collection of distributions over X, one for each combination of parents' values
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
    P(x1, x2, …, xn) = Π_i P(xi | parents(Xi))
Example: Alarm Network
Nodes: Burglary (B), Earthquake (E), Alarm (A), John calls (J), Mary calls (M)
Edges: B → A, E → A, A → J, A → M

P(B):
  +b  0.001
  -b  0.999

P(E):
  +e  0.002
  -e  0.998

P(J|A):
  +a  +j  0.9
  +a  -j  0.1
  -a  +j  0.05
  -a  -j  0.95

P(M|A):
  +a  +m  0.7
  +a  -m  0.3
  -a  +m  0.01
  -a  -m  0.99

P(A|B,E):
  +b  +e  +a  0.95
  +b  +e  -a  0.05
  +b  -e  +a  0.94
  +b  -e  -a  0.06
  -b  +e  +a  0.29
  -b  +e  -a  0.71
  -b  -e  +a  0.001
  -b  -e  -a  0.999
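To make the "product of local conditionals" concrete, here is a minimal sketch (not course-provided code) that stores the alarm network's CPTs as Python dicts and multiplies the relevant conditionals for one full assignment:

```python
# Alarm-network CPTs from the tables above, keyed by assignment tuples.
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9, ('+a', '-j'): 0.1, ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7, ('+a', '-m'): 0.3, ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

print(joint('+b', '-e', '+a', '+j', '+m'))
# 0.001 * 0.998 * 0.94 * 0.9 * 0.7 ≈ 5.91e-4
```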
Bayes’ Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data
Inference
§ Inference: calculating some useful quantity from a joint probability distribution
§ Examples:
  § Posterior probability: P(Q | E1 = e1, …, Ek = ek)
  § Most likely explanation: argmax_q P(Q = q | E1 = e1, …)
Inference by Enumeration
§ General case:
  § Evidence variables: E1 … Ek = e1 … ek
  § Query* variable: Q
  § Hidden variables: H1 … Hr
  (together: all variables)
§ We want: P(Q | e1 … ek)
§ Step 1: Select the entries consistent with the evidence
§ Step 2: Sum out H to get joint of Query and evidence
§ Step 3: Normalize (multiply by 1/Z)
* Works fine with multiple query variables, too
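The three steps can be written in a few lines. Below is a minimal, illustrative sketch (not course code): the joint is assumed to be a dict mapping full assignments, given as tuples of (variable, value) pairs, to probabilities; `enumerate_posterior` is a hypothetical helper name.

```python
# Inference by enumeration over an explicit joint table.
def enumerate_posterior(joint, query_var, evidence):
    """Return P(query_var | evidence) as a dict {value: probability}."""
    totals = {}
    for assignment, p in joint.items():
        a = dict(assignment)
        # Step 1: keep only entries consistent with the evidence.
        if all(a[var] == val for var, val in evidence.items()):
            # Step 2: sum out the hidden variables by accumulating over them.
            totals[a[query_var]] = totals.get(a[query_var], 0.0) + p
    # Step 3: normalize.
    z = sum(totals.values())
    return {value: p / z for value, p in totals.items()}

# Tiny example using the joint P(R,T) that appears later in this lecture.
joint = {(('R', '+r'), ('T', '+t')): 0.08, (('R', '+r'), ('T', '-t')): 0.02,
         (('R', '-r'), ('T', '+t')): 0.09, (('R', '-r'), ('T', '-t')): 0.81}
print(enumerate_posterior(joint, 'R', {'T': '+t'}))  # {'+r': ≈0.47, '-r': ≈0.53}
```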
Inference by Enumeration in Bayes’ Net
§ Given unlimited time, inference in BNs is easy
§ Reminder of inference by enumeration by example:

[Alarm network: B → A ← E, A → J, A → M]

P(B | +j, +m) ∝_B P(B, +j, +m)
  = Σ_{e,a} P(B, e, a, +j, +m)
  = Σ_{e,a} P(B) P(e) P(a|B,e) P(+j|a) P(+m|a)
  = P(B) P(+e) P(+a|B,+e) P(+j|+a) P(+m|+a)
  + P(B) P(+e) P(-a|B,+e) P(+j|-a) P(+m|-a)
  + P(B) P(-e) P(+a|B,-e) P(+j|+a) P(+m|+a)
  + P(B) P(-e) P(-a|B,-e) P(+j|-a) P(+m|-a)
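As a sanity check, this sum can be evaluated directly from the CPT dicts in the earlier sketch (P_B, P_E, P_A, P_J, P_M); the code below is illustrative, not course code.

```python
# P(B | +j, +m) by enumerating the hidden variables E and A, then normalizing.
def posterior_burglary():
    unnormalized = {}
    for b in ('+b', '-b'):
        total = 0.0
        for e in ('+e', '-e'):
            for a in ('+a', '-a'):
                total += P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, '+j')] * P_M[(a, '+m')]
        unnormalized[b] = total
    z = sum(unnormalized.values())
    return {b: p / z for b, p in unnormalized.items()}

print(posterior_burglary())  # roughly {'+b': 0.284, '-b': 0.716}
```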
Inference by Enumeration?
Inference by Enumeration vs. Variable Elimination
§ Why is inference by enumeration so slow?
§ You join up the whole joint distribution before you sum out the hidden variables
§ Idea: interleave joining and marginalizing!
§ Called “Variable Elimination”
§ Still NP-hard, but usually much faster than inference by enumeration
§ First we’ll need some new notation: factors
Factor Zoo
Factor Zoo I
§ Joint distribution: P(X,Y)
  § Entries P(x,y) for all x, y
  § Sums to 1

  P(T,W):
    hot   sun   0.4
    hot   rain  0.1
    cold  sun   0.2
    cold  rain  0.3

§ Selected joint: P(x,Y)
  § A slice of the joint distribution
  § Entries P(x,y) for fixed x, all y
  § Sums to P(x)

  P(cold, W):
    cold  sun   0.2
    cold  rain  0.3

§ Number of capitals = dimensionality of the table
Factor Zoo II
§ Single conditional: P(Y | x)
  § Entries P(y | x) for fixed x, all y
  § Sums to 1

  P(W | T=cold):
    cold  sun   0.4
    cold  rain  0.6

§ Family of conditionals: P(Y | X)
  § Multiple conditionals
  § Entries P(y | x) for all x, y
  § Sums to |X|

  P(W | T):
    hot   sun   0.8
    hot   rain  0.2
    cold  sun   0.4
    cold  rain  0.6
Factor Zoo III
§ Specified family: P(y | X)
  § Entries P(y | x) for fixed y, but for all x
  § Sums to … who knows!

  P(rain | T):
    hot   rain  0.2
    cold  rain  0.6
Factor Zoo Summary
§ In general, when we write P(Y1 … YN | X1 … XM)
  § It is a "factor," a multi-dimensional array
  § Its values are P(y1 … yN | x1 … xM)
  § Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array
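One simple way to hold such a factor in code is a list of variables (its scope) plus a table mapping assignments to numbers. This is a minimal, illustrative representation, not the course's project code; the names `Factor`, `p_w_given_t`, `p_w_given_cold` are assumptions.

```python
from typing import Dict, List, Tuple

# A factor: (scope, table) where table maps assignment tuples to numbers.
Factor = Tuple[List[str], Dict[Tuple[str, ...], float]]

# Family of conditionals P(W | T) from the table above: sums to |T| = 2.
p_w_given_t: Factor = (['T', 'W'], {
    ('hot', 'sun'): 0.8, ('hot', 'rain'): 0.2,
    ('cold', 'sun'): 0.4, ('cold', 'rain'): 0.6,
})

# Selecting T=cold drops that dimension, giving P(W | T=cold), which sums to 1.
p_w_given_cold: Factor = (['W'], {
    (w,): p for (t, w), p in p_w_given_t[1].items() if t == 'cold'
})
print(p_w_given_cold[1])  # {('sun',): 0.4, ('rain',): 0.6}
```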
Example: Traffic Domain
§ Random Variables
  § R: Raining
  § T: Traffic
  § L: Late for class!

(Network: R → T → L)

P(R):
  +r  0.1
  -r  0.9

P(T|R):
  +r  +t  0.8
  +r  -t  0.2
  -r  +t  0.1
  -r  -t  0.9

P(L|T):
  +t  +l  0.3
  +t  -l  0.7
  -t  +l  0.1
  -t  -l  0.9

P(L) = Σ_{r,t} P(r, t, L)
     = Σ_{r,t} P(r) P(t|r) P(L|t)
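The traffic-domain CPTs are small enough to evaluate this sum directly. A minimal sketch (not course code), with the CPTs stored as dicts:

```python
# Traffic-domain CPTs from the tables above.
P_R = {'+r': 0.1, '-r': 0.9}
P_T = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2, ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7, ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# P(L) = sum over r, t of P(r) P(t|r) P(L|t).
P_of_L = {l: sum(P_R[r] * P_T[(r, t)] * P_L[(t, l)]
                 for r in ('+r', '-r') for t in ('+t', '-t'))
          for l in ('+l', '-l')}
print(P_of_L)  # {'+l': 0.134, '-l': 0.866}
```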
Inference by Enumeration: Procedural Outline
§ Track objects called factors
§ Initial factors are local CPTs (one per node)
  P(R):
    +r  0.1
    -r  0.9

  P(T|R):
    +r  +t  0.8
    +r  -t  0.2
    -r  +t  0.1
    -r  -t  0.9

  P(L|T):
    +t  +l  0.3
    +t  -l  0.7
    -t  +l  0.1
    -t  -l  0.9
§ Any known values are selected
§ E.g. if we know L = +l, the initial factors are
  P(R):
    +r  0.1
    -r  0.9

  P(T|R):
    +r  +t  0.8
    +r  -t  0.2
    -r  +t  0.1
    -r  -t  0.9

  P(+l|T):
    +t  +l  0.3
    -t  +l  0.1
§ Procedure: Join all factors, eliminate all hidden variables, normalize
Operation 1: Join Factors
§ First basic operation: joining factors
§ Combining factors:
  § Just like a database join
  § Get all factors over the joining variable
  § Build a new factor over the union of the variables involved
§ Example: Join on R, i.e. P(R) × P(T|R) → P(R,T)
§ Computation for each entry: pointwise products, e.g. P(r,t) = P(r) · P(t|r)

  P(R):
    +r  0.1
    -r  0.9

  P(T|R):
    +r  +t  0.8
    +r  -t  0.2
    -r  +t  0.1
    -r  -t  0.9

  → P(R,T):
    +r  +t  0.08
    +r  -t  0.02
    -r  +t  0.09
    -r  -t  0.81
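A join can be written generically over the factor representation sketched earlier. This is a hypothetical helper (`join`), not the course's project code; `domains` maps each variable to its list of values.

```python
from itertools import product

def join(f1, f2, domains):
    """Pointwise product of two (scope, table) factors over the union of their variables."""
    scope1, t1 = f1
    scope2, t2 = f2
    scope = scope1 + [v for v in scope2 if v not in scope1]
    table = {}
    for values in product(*(domains[v] for v in scope)):
        assign = dict(zip(scope, values))
        key1 = tuple(assign[v] for v in scope1)
        key2 = tuple(assign[v] for v in scope2)
        table[values] = t1[key1] * t2[key2]
    return scope, table

domains = {'R': ['+r', '-r'], 'T': ['+t', '-t']}
p_r = (['R'], {('+r',): 0.1, ('-r',): 0.9})
p_t_given_r = (['R', 'T'], {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                            ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
print(join(p_r, p_t_given_r, domains)[1])
# {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```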
Example: Multiple Joins
(Network: R → T → L)

Join R: P(R) × P(T|R) → P(R,T)
Join T: P(R,T) × P(L|T) → P(R,T,L)

  P(R):
    +r  0.1
    -r  0.9

  P(T|R):
    +r  +t  0.8
    +r  -t  0.2
    -r  +t  0.1
    -r  -t  0.9

  P(L|T):
    +t  +l  0.3
    +t  -l  0.7
    -t  +l  0.1
    -t  -l  0.9

  Join R → P(R,T):
    +r  +t  0.08
    +r  -t  0.02
    -r  +t  0.09
    -r  -t  0.81

  Join T → P(R,T,L):
    +r  +t  +l  0.024
    +r  +t  -l  0.056
    +r  -t  +l  0.002
    +r  -t  -l  0.018
    -r  +t  +l  0.027
    -r  +t  -l  0.063
    -r  -t  +l  0.081
    -r  -t  -l  0.729
Operation 2: Eliminate
§ Second basic operation: marginalization
§ Take a factor and sum out a variable
  § Shrinks a factor to a smaller one
  § A projection operation
§ Example: sum out R from P(R,T) to get P(T)

  P(R,T):
    +r  +t  0.08
    +r  -t  0.02
    -r  +t  0.09
    -r  -t  0.81

  Sum out R → P(T):
    +t  0.17
    -t  0.83
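Summing out a variable is equally short in the same factor representation. Again a hypothetical helper (`eliminate`), offered as an illustrative sketch:

```python
def eliminate(factor, var):
    """Sum a (scope, table) factor over all values of `var`."""
    scope, table = factor
    keep = [v for v in scope if v != var]
    idx = [i for i, v in enumerate(scope) if v != var]
    out = {}
    for values, p in table.items():
        key = tuple(values[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return keep, out

p_rt = (['R', 'T'], {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
                     ('-r', '+t'): 0.09, ('-r', '-t'): 0.81})
print(eliminate(p_rt, 'R')[1])  # {('+t',): 0.17, ('-t',): 0.83}
```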
Multiple Elimination
  P(R,T,L):
    +r  +t  +l  0.024
    +r  +t  -l  0.056
    +r  -t  +l  0.002
    +r  -t  -l  0.018
    -r  +t  +l  0.027
    -r  +t  -l  0.063
    -r  -t  +l  0.081
    -r  -t  -l  0.729

  Sum out R → P(T,L):
    +t  +l  0.051
    +t  -l  0.119
    -t  +l  0.083
    -t  -l  0.747

  Sum out T → P(L):
    +l  0.134
    -l  0.866
Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
Marginalizing Early (= Variable Elimination)
Traffic Domain
P(L) = ?

(Network: R → T → L)

§ Inference by Enumeration
  P(L) = Σ_t Σ_r P(L|t) P(r) P(t|r)
  Join on r → Join on t → Eliminate r → Eliminate t

§ Variable Elimination
  P(L) = Σ_t P(L|t) Σ_r P(r) P(t|r)
  Join on r → Eliminate r → Join on t → Eliminate t
Marginalizing Early! (aka VE)
Steps: Join R → Sum out R → Join T → Sum out T

Initial factors:
  P(R):
    +r  0.1
    -r  0.9
  P(T|R):
    +r  +t  0.8
    +r  -t  0.2
    -r  +t  0.1
    -r  -t  0.9
  P(L|T):
    +t  +l  0.3
    +t  -l  0.7
    -t  +l  0.1
    -t  -l  0.9

Join R → P(R,T):
  +r  +t  0.08
  +r  -t  0.02
  -r  +t  0.09
  -r  -t  0.81

Sum out R → P(T):
  +t  0.17
  -t  0.83

Join T → P(T,L):
  +t  +l  0.051
  +t  -l  0.119
  -t  +l  0.083
  -t  -l  0.747

Sum out T → P(L):
  +l  0.134
  -l  0.866
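Chaining the `join` and `eliminate` sketches above reproduces this exact sequence of tables. A minimal sketch (illustrative, not course code), reusing those helpers:

```python
# Marginalizing early on the chain R -> T -> L to get P(L).
domains = {'R': ['+r', '-r'], 'T': ['+t', '-t'], 'L': ['+l', '-l']}
p_r = (['R'], {('+r',): 0.1, ('-r',): 0.9})
p_t_given_r = (['R', 'T'], {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                            ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
p_l_given_t = (['T', 'L'], {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                            ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})

p_t = eliminate(join(p_r, p_t_given_r, domains), 'R')   # Join R, sum out R
p_l = eliminate(join(p_t, p_l_given_t, domains), 'T')   # Join T, sum out T
print(p_l[1])  # {('+l',): 0.134, ('-l',): 0.866} (up to floating-point rounding)
```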
Evidence
§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:

  P(R):
    +r  0.1
    -r  0.9

  P(T|R):
    +r  +t  0.8
    +r  -t  0.2
    -r  +t  0.1
    -r  -t  0.9

  P(L|T):
    +t  +l  0.3
    +t  -l  0.7
    -t  +l  0.1
    -t  -l  0.9

§ Computing P(L | +r), the initial factors become:

  P(+r):
    +r  0.1

  P(T|+r):
    +r  +t  0.8
    +r  -t  0.2

  P(L|T):
    +t  +l  0.3
    +t  -l  0.7
    -t  +l  0.1
    -t  -l  0.9
§ We eliminate all vars other than query + evidence
Evidence II
§ Result will be a selected joint of query and evidence
  § E.g. for P(L | +r), we would end up with:

  P(+r, L):
    +r  +l  0.026
    +r  -l  0.074

§ To get our answer, just normalize this!
§ That's it!

  Normalize → P(L | +r):
    +l  0.26
    -l  0.74
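Evidence handling only needs one extra step: restrict each factor to the observed value before joining and eliminating. A minimal sketch reusing the `join` and `eliminate` helpers and the traffic factors defined above; `restrict` is a hypothetical helper name.

```python
def restrict(factor, var, value):
    """Keep only the rows of a (scope, table) factor consistent with var = value."""
    scope, table = factor
    return scope, {k: p for k, p in table.items() if k[scope.index(var)] == value}

evid_domains = {**domains, 'R': ['+r']}          # R is observed to be +r
evid_r = restrict(p_r, 'R', '+r')
evid_t = restrict(p_t_given_r, 'R', '+r')
f = join(join(evid_r, evid_t, evid_domains), p_l_given_t, evid_domains)
joint_rl = eliminate(f, 'T')                     # selected joint P(+r, L)
z = sum(joint_rl[1].values())
print({k: v / z for k, v in joint_rl[1].items()})
# {('+r','+l'): 0.26, ('+r','-l'): 0.74} (up to rounding)
```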
General Variable Elimination
§ Query: P(Q | E1 = e1, …, Ek = ek)
§ Start with initial factors:
  § Local CPTs (but instantiated by evidence)
§ While there are still hidden variables (not Q or evidence):
  § Pick a hidden variable H
  § Join all factors mentioning H
  § Eliminate (sum out) H
§ Join all remaining factors and normalize
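Putting the pieces together, the general procedure is a short loop. The sketch below reuses the `join`, `eliminate`, and `restrict` helpers from earlier; it is an illustrative implementation under those assumptions, not the course's project code, and it picks hidden variables in an arbitrary order.

```python
def variable_elimination(factors, query_var, evidence, domains):
    """Return (scope, table) for P(query_var, evidence), normalized."""
    # Instantiate local CPTs by the evidence.
    doms = dict(domains)
    for var, val in evidence.items():
        factors = [restrict(f, var, val) if var in f[0] else f for f in factors]
        doms[var] = [val]
    # While there are still hidden variables: join all factors mentioning H, sum out H.
    hidden = [v for v in domains if v != query_var and v not in evidence]
    for h in hidden:
        mentioning = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        if not mentioning:
            continue
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f, doms)
        factors = rest + [eliminate(joined, h)]
    # Join all remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f, doms)
    z = sum(result[1].values())
    return result[0], {k: p / z for k, p in result[1].items()}

print(variable_elimination([p_r, p_t_given_r, p_l_given_t], 'L', {'R': '+r'}, domains))
# (['R', 'L'], {('+r', '+l'): 0.26, ('+r', '-l'): 0.74}) up to rounding
```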
Example
Choose A
Choose E
Finish with B
Normalize
Another Variable Elimination Example
Computational complexity critically depends on the largest factor generated in this process. The size of a factor is the number of entries in its table. In the example above (assuming binary variables), all factors generated are of size 2, as they each have only one variable (Z, Z, and X3, respectively).
Variable Elimination Ordering
§ For the query P(Xn | y1, …, yn), work through the following two different orderings, as done in the previous slide: Z, X1, …, Xn-1 and X1, …, Xn-1, Z. What is the size of the maximum factor generated for each of the orderings?
§ Answer: 2^(n+1) versus 2^2 (assuming binary variables)
§ In general: the ordering can greatly affect efficiency.
VE: Computational and Space Complexity
§ The computational and space complexity of variable elimination is determined by the largest factor
§ The elimination ordering can greatly affect the size of the largest factor.
  § E.g., the previous slide's example: 2^(n+1) vs. 2^2
§ Does there always exist an ordering that only results in small factors?
  § No!
Worst Case Complexity?
§ CSP: [figure: a 3-SAT formula encoded as a Bayes' net, with a node z that is true iff the formula is satisfied]
§ If we can determine whether P(z) is zero or not, we have answered whether the 3-SAT problem has a solution.
§ Hence inference in Bayes' nets is NP-hard. No known efficient probabilistic inference in general.
Polytrees
§ A polytree is a directed graph with no undirected cycles
§ For polytrees you can always find an ordering that is efficient
  § Try it!!
§ Cut-set conditioning for Bayes' net inference
  § Choose a set of variables such that if removed only a polytree remains
  § Exercise: Think about how the specifics would work out!
Bayes’ Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data