CIS 471/571(Fall 2020): Introduction to Artificial Intelligence
Lecture 13: Bayes Nets
Thanh H. Nguyen
Source: http://ai.berkeley.edu/home.html
Reminder:
§ Written assignment 3: deadline Nov 10, 2020
§ Programming project 3: deadline Nov 10, 2020
Probabilistic Models
§ Models describe how (a portion of) the world works
§ Models are always simplifications
§ May not account for every variable
§ May not account for all interactions between variables
§ “All models are wrong, but some are useful.” – George E. P. Box
§ What do we do with probabilistic models?
§ We (or our agents) need to reason about unknown variables, given evidence
§ Example: explanation (diagnostic reasoning)
§ Example: prediction (causal reasoning)
Probability Recap
§ Conditional probability: P(x | y) = P(x, y) / P(y)
§ Product rule: P(x, y) = P(x | y) P(y)
§ Chain rule: P(x1, x2, ..., xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ... = ∏_i P(xi | x1, ..., xi-1)
Independence
§ Two variables X and Y are independent if:
∀x, y: P(x, y) = P(x) P(y)
§ This says that their joint distribution factors into a product of two simpler distributions
§ Another form: ∀x, y: P(x | y) = P(x)
§ We write: X ⊥ Y
§ Independence is a simplifying modeling assumption
§ What could we assume for {Weather, Traffic, Cavity, Toothache}?
Example: Independence?

P(T):
T    | P
hot  | 0.5
cold | 0.5

P(W):
W    | P
sun  | 0.6
rain | 0.4

Observed joint P(T, W):
T    | W    | P
hot  | sun  | 0.4
hot  | rain | 0.1
cold | sun  | 0.2
cold | rain | 0.3

Product of the marginals, P(T) P(W):
T    | W    | P
hot  | sun  | 0.3
hot  | rain | 0.2
cold | sun  | 0.3
cold | rain | 0.2

Since P(hot, sun) = 0.4 ≠ P(hot) P(sun) = 0.5 × 0.6 = 0.3, the joint does not factor: T and W are not independent.
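To make the check concrete, here is a minimal Python sketch (not part of the original slides; names are mine) that marginalizes the observed joint and compares it against the product of the marginals:

```python
# Check independence numerically: T and W are independent
# iff P(t, w) = P(t) P(w) for every (t, w).
from itertools import product

# Observed joint distribution P(T, W) from the example above
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

# Marginalize to get P(T) and P(W)
p_t = {t: sum(p for (t2, _), p in joint.items() if t2 == t) for t in ("hot", "cold")}
p_w = {w: sum(p for (_, w2), p in joint.items() if w2 == w) for w in ("sun", "rain")}

for t, w in product(p_t, p_w):
    print(t, w, joint[(t, w)], round(p_t[t] * p_w[w], 2))
# hot sun: 0.4 vs 0.3 -> the joint does NOT factor, so T and W are dependent
```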
Example: Independence
§ N fair, independent coin flips X1, X2, ..., Xn, each with the same distribution:

P(Xi):
H | 0.5
T | 0.5
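Because the flips are independent, the joint is just the product of the individual tables: P(X1, ..., Xn) = ∏_i P(Xi), so for example P(X1 = H, ..., Xn = H) = 0.5^n.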
Conditional Independence
§ Unconditional (absolute) independence is very rare
§ Conditional independence is our most basic and robust form of knowledge about uncertain environments
§ X is conditionally independent of Y given Z if and only if:
∀x, y, z: P(x, y | z) = P(x | z) P(y | z)
or, equivalently, if and only if
∀x, y, z: P(x | z, y) = P(x | z)
Conditional Independence
§ P(Toothache, Cavity, Catch)
§ If I have a cavity, the probability that the probe catches in it doesn’t depend on whether I have a toothache:
§ P(+catch | +toothache, +cavity) = P(+catch | +cavity)
§ The same independence holds if I don’t have a cavity:
§ P(+catch | +toothache, -cavity) = P(+catch | -cavity)
§ Catch is conditionally independent of Toothache given Cavity:
§ P(Catch | Toothache, Cavity) = P(Catch | Cavity)
§ Equivalent statements:
§ P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
§ P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
§ One can be derived from the other easily
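To see why one can be derived from the other, here is the short derivation (a sketch using only the product rule plus the independence stated above):

```latex
% Product rule, applied conditionally on Cavity:
P(Toothache, Catch \mid Cavity) = P(Toothache \mid Cavity)\, P(Catch \mid Toothache, Cavity)
% Substitute the conditional independence P(Catch | Toothache, Cavity) = P(Catch | Cavity):
P(Toothache, Catch \mid Cavity) = P(Toothache \mid Cavity)\, P(Catch \mid Cavity)
```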
Conditional Independence
§ What about this domain:
§ Traffic
§ Umbrella
§ Raining
Conditional Independence
§ What about this domain:
§ Fire
§ Smoke
§ Alarm
Conditional Independence and the Chain Rule
§ Chain rule:
P(X1, X2, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) ...
§ Trivial decomposition:
P(Traffic, Rain, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain, Traffic)
§ With assumption of conditional independence (Umbrella independent of Traffic given Rain):
P(Traffic, Rain, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain)
§ Bayes’ nets / graphical models help us express conditional independence assumptions
Ghostbusters Chain Rule
§ Each sensor depends only on where the ghost is
§ That means, the two sensors are conditionally independent, given the ghost position
§ T: Top square is red
§ B: Bottom square is red
§ G: Ghost is in the top
§ Givens:
P(+g) = 0.5    P(-g) = 0.5
P(+t | +g) = 0.8    P(+t | -g) = 0.4
P(+b | +g) = 0.4    P(+b | -g) = 0.8

P(T, B, G) = P(G) P(T | G) P(B | G)
T  | B  | G  | P(T, B, G)
+t | +b | +g | 0.16
+t | +b | -g | 0.16
+t | -b | +g | 0.24
+t | -b | -g | 0.04
-t | +b | +g | 0.04
-t | +b | -g | 0.24
-t | -b | +g | 0.06
-t | -b | -g | 0.06
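To verify the table, here is a minimal Python sketch (not from the slides; the variable names are mine) that rebuilds the joint from the three local factors:

```python
# Rebuild the Ghostbusters joint from its factored form:
# P(T, B, G) = P(G) * P(T | G) * P(B | G)
from itertools import product

p_g = {"+g": 0.5, "-g": 0.5}                      # prior on ghost position
p_t = {("+t", "+g"): 0.8, ("+t", "-g"): 0.4,      # P(T | G)
       ("-t", "+g"): 0.2, ("-t", "-g"): 0.6}
p_b = {("+b", "+g"): 0.4, ("+b", "-g"): 0.8,      # P(B | G)
       ("-b", "+g"): 0.6, ("-b", "-g"): 0.2}

for t, b, g in product(("+t", "-t"), ("+b", "-b"), ("+g", "-g")):
    prob = p_g[g] * p_t[(t, g)] * p_b[(b, g)]
    print(t, b, g, round(prob, 2))   # reproduces the eight rows above
```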
Bayes’ Nets: Big Picture
§ Two problems with using full joint distribution tables as our probabilistic models:
§ Unless there are only a few variables, the joint is WAY too big to represent explicitly
§ Hard to learn (estimate) anything empirically about more than a few variables at a time
§ Bayes’ nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
§ More properly called graphical models
§ We describe how variables locally interact
§ Local interactions chain together to give global, indirect interactions
§ For about 10 min, we’ll be vague about how these interactions are specified
Example Bayes’ Net: Insurance
Example Bayes’ Net: Car
Graphical Model Notation
§ Nodes: variables (with domains)
§ Can be assigned (observed) or unassigned (unobserved)
§ Arcs: interactions
§ Similar to CSP constraints
§ Indicate “direct influence” between variables
§ Formally: encode conditional independence (more later)
§ For now: imagine that arrows mean direct causation (in general, they don’t!)
Example: Coin Flips
§ N independent coin flips: X1, X2, ..., Xn
§No interactions between variables: absolute independence
Example: Traffic
§ Variables:
§ R: It rains
§ T: There is traffic
§ Model 1: independence (R and T are unconnected nodes)
§ Model 2: rain causes traffic (R → T)
§ Why is an agent using model 2 better?
Example: Traffic II
§Let’s build a causal graphical model!
§ Variables
§ T: Traffic
§ R: It rains
§ L: Low pressure
§ D: Roof drips
§ B: Ballgame
§ C: Cavity
Example: Alarm Network
§ Variables
§ B: Burglary
§ A: Alarm goes off
§ M: Mary calls
§ J: John calls
§ E: Earthquake!
Bayes’ Net Semantics
§ A set of nodes, one per variable X
§ A directed, acyclic graph
§ A conditional distribution for each node X given its parents A1, ..., An:
§ A collection of distributions over X, one for each combination of parents’ values: P(X | a1, ..., an)
§ CPT: conditional probability table
§ Description of a noisy “causal” process

A Bayes net = Topology (graph) + Local Conditional Probabilities
Probabilities in BNs
§ Bayes’ nets implicitly encode joint distributions
§ As a product of local conditional distributions
§ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
P(x1, x2, ..., xn) = ∏_i P(xi | parents(Xi))
§ Example:
P(+cavity, +catch, -toothache) = P(+cavity) P(+catch | +cavity) P(-toothache | +cavity)
Probabilities in BNs
§ Why are we guaranteed that setting P(x1, ..., xn) = ∏_i P(xi | parents(Xi)) results in a proper joint distribution?
§ Chain rule (valid for all distributions):
P(x1, ..., xn) = ∏_i P(xi | x1, ..., xi-1)
§ Assume conditional independences:
P(xi | x1, ..., xi-1) = P(xi | parents(Xi))
→ Consequence:
P(x1, ..., xn) = ∏_i P(xi | parents(Xi))
§ Not every BN can represent every joint distribution
§ The topology enforces certain conditional independencies
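A minimal Python sketch of this semantics (the dictionary representation and names are my own, not from the slides), using the Rain → Traffic net that appears later in the lecture:

```python
# A Bayes net as: parents (topology) + CPTs (local conditional distributions).
# CPT convention: cpt[X][(x,) + parent_values] = P(X = x | parents = parent_values)

parents = {"R": (), "T": ("R",)}                   # R -> T
cpt = {
    "R": {("+r",): 0.25, ("-r",): 0.75},
    "T": {("+t", "+r"): 0.75, ("-t", "+r"): 0.25,
          ("+t", "-r"): 0.50, ("-t", "-r"): 0.50},
}

def joint_probability(assignment):
    """P(x1, ..., xn) = product over X of P(x | parents(X))."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[pv] for pv in parents[var])
        p *= cpt[var][(value,) + parent_values]
    return p

print(joint_probability({"R": "+r", "T": "+t"}))   # 0.25 * 0.75 = 0.1875 = 3/16
```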
Example: Coin Flips
X1, X2, ..., Xn (a net with no arcs)

P(Xi) for each flip:
h | 0.5
t | 0.5
Only distributions whose variables are absolutely independent can be represented by a Bayes’ net with no arcs.
Example: Traffic

R → T

P(R):
+r | 1/4
-r | 3/4

P(T | R):
+r | +t | 3/4
+r | -t | 1/4
-r | +t | 1/2
-r | -t | 1/2
Example: Alarm Network

Structure: Burglary → Alarm ← Earthquake; Alarm → John calls; Alarm → Mary calls

P(B):
+b | 0.001
-b | 0.999

P(E):
+e | 0.002
-e | 0.998

P(A | B, E):
+b | +e | +a | 0.95
+b | +e | -a | 0.05
+b | -e | +a | 0.94
+b | -e | -a | 0.06
-b | +e | +a | 0.29
-b | +e | -a | 0.71
-b | -e | +a | 0.001
-b | -e | -a | 0.999

P(J | A):
+a | +j | 0.9
+a | -j | 0.1
-a | +j | 0.05
-a | -j | 0.95

P(M | A):
+a | +m | 0.7
+a | -m | 0.3
-a | +m | 0.01
-a | -m | 0.99
Example: Alarm Network

(Same network and CPTs as above: B → A ← E, A → J, A → M.)
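As a sanity check with these CPTs, the net assigns, for example:
P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a | +b, -e) P(+j | +a) P(+m | +a) = 0.001 × 0.998 × 0.94 × 0.9 × 0.7 ≈ 5.9 × 10^-4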
Example: Traffic

§ Causal direction: R → T

P(R):
+r | 1/4
-r | 3/4

P(T | R):
+r | +t | 3/4
+r | -t | 1/4
-r | +t | 1/2
-r | -t | 1/2

Joint P(R, T):
+r | +t | 3/16
+r | -t | 1/16
-r | +t | 6/16
-r | -t | 6/16
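Each joint entry is the product of the two local tables; e.g., P(+r, +t) = P(+r) P(+t | +r) = 1/4 × 3/4 = 3/16.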
Example: Reverse Traffic

§ Reverse causality? T → R

P(T):
+t | 9/16
-t | 7/16

P(R | T):
+t | +r | 1/3
+t | -r | 2/3
-t | +r | 1/7
-t | -r | 6/7

Joint P(R, T) (unchanged):
+r | +t | 3/16
+r | -t | 1/16
-r | +t | 6/16
-r | -t | 6/16
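The reversed net encodes the same joint: marginalizing gives P(+t) = 3/16 + 6/16 = 9/16, and Bayes’ rule gives P(+r | +t) = (3/16) / (9/16) = 1/3, matching the CPT above.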
Size of a Bayes’ Net
§ How big is a joint distribution over N Boolean variables?
2^N
§ How big is an N-node net if nodes have up to k parents?
O(N × 2^(k+1))
§ Both give you the power to calculate P(X1, ..., XN)
§ BNs: Huge space savings!
§ Also easier to elicit local CPTs
§ Also faster to answer queries (coming)
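For a concrete comparison (illustrative numbers, not from the slides): with N = 30 Boolean variables and at most k = 5 parents per node, the full joint has 2^30 ≈ 10^9 entries, while the BN needs at most 30 × 2^6 = 1920 CPT entries.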
Causality?
§ When Bayes’ nets reflect the true causal patterns:
§ Often simpler (nodes have fewer parents)
§ Often easier to think about
§ Often easier to elicit from experts
§ BNs need not actually be causal
§ Sometimes no causal net exists over the domain (especially if variables are missing)
§ E.g. consider the variables Traffic and Drips
§ End up with arrows that reflect correlation, not causation
§ What do the arrows really mean?
§ Topology may happen to encode causal structure
§ Topology really encodes conditional independence
Bayes’ Nets
§ So far: how a Bayes’ net encodes a joint distribution
§ Next: how to answer queries about that distribution
§ Today:
§ First assembled BNs using an intuitive notion of conditional independence as causality
§ Then saw that key property is conditional independence
§ Main goal: answer queries about conditional independence and influence
§ After that: how to answer numerical queries (inference)