留学生辅导 COMP30024 – Artificial Intelligence Chapter 13.1-3

Week 10: Bayesian Networks
COMP30024 – Artificial Intelligence Chapter 13.1-3
Dr Wafa Johal University of Melbourne

Copyright By PowCoder代写 加微信 powcoder

• Semantics
• Exact inference by enumeration
• Exact inference by variable elimination
AIMA Slides © and ; Dr Wafa Johal 2

Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
a set of nodes, one per variable
a directed, acyclic graph (link ≈ “directly influences”)
a conditional distribution for each node given its parents:
P(Xi |Parents(Xi ))
In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
AIMA Slides © and ; Dr Wafa Johal 3

Topology of network encodes conditional independence assertions:
Toothache Catch
Weather is independent of the other variables
Toothache and Catch are conditionally independent given Cavity
AIMA Slides © and ; Dr Wafa Johal 4

Scenario: I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects “causal” knowledge: – A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call
AIMA Slides © and ;
Dr Wafa Johal 5

Example 2 contd.
Earthquake
TT TF FT FF
.95 .94 .29 .001
AIMA Slides © and ;
Dr Wafa Johal 6

Compactness
A CPT for Boolean Xi with k Boolean parents has 2k rows for the combinations of parent values
Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p)
If each variable has no more than k parents,
the complete network requires O(n · 2k) numbers
I.e., grows linearly with n, vs. O(2n) for the full joint distribution For burglary net, 1+1+4+2+2 = 10 numbers (vs. 25 −1 = 31)
AIMA Slides © and ; Dr Wafa Johal 7

Global semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions:
P(x1, . . . , xn) =
e.g., P(j∧m∧a∧¬b∧¬e)
AIMA Slides © and ;
Dr Wafa Johal 8
P(xi|parents(Xi))

Global semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions:
P(x1, . . . , xn) =
e.g., P(j∧m∧a∧¬b∧¬e)
= P(j|a)P(m|a)P(a|¬b, ¬e)P(¬b)P(¬e) = 0.9×0.7×0.001×0.999×0.998
AIMA Slides © and ;
Dr Wafa Johal 9
P(xi|parents(Xi))

Local semantics
Local semantics: each node is conditionally independent of its nondescendants given its parents
Theorem: Local semantics ≡ global semantics AIMA Slides © and ;
Dr Wafa Johal 10

Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
1. Choose an ordering of variables X1 , . . . , Xn 2. Fori=1ton
add Xi to the network selectparentsfromX1,…,Xi−1 suchthat
P(Xi|Parents(Xi)) = P(Xi|X1, . . . , Xi−1)
This choice of parents guarantees the global semantics:
P(X1,…,Xn) = =
P(Xi|X1, …, Xi−1) P(Xi|Parents(Xi))
(chain rule) (by construction)
AIMA Slides © and ;
Dr Wafa Johal 11

Suppose we choose the ordering M, J, A, B, E
P(J|M) = P(J)?
AIMA Slides © and ; Dr Wafa Johal 12

Suppose we choose the ordering M, J, A, B, E
P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)?
JohnCalls Alarm
AIMA Slides © and ; Dr Wafa Johal 13

. No P(B|A, J, M) = P(B|A)?
P(B|A, J, M) = P(B)?
AIMA Slides © and ;
Dr Wafa Johal 14
JohnCalls Alarm

. Yes . No P(E|B, A, J, M) = P(E|A)? P(E|B, A, J, M) = P(E|A, B)?
AIMA Slides © and ;
Dr Wafa Johal 15
JohnCalls Alarm
Earthquake

JohnCalls Alarm
.AIMA Slides © and ; Dr Wafa Johal 16
Earthquake

Example contd.
JohnCalls Alarm
Earthquake
Deciding conditional independence is hard in noncausal directions (symptoms → causes)
Causal models (causes → symptoms) and conditional independence seem easier for humans!
Assessing conditional probabilities is hard in noncausal directions Network is less compact: 1+2+4+2+4 = 13 numbers needed
AIMA Slides © and ; Dr Wafa Johal 17

Example: Car diagnosis
Initial evidence: car won’t start
Testable variables (green), “broken, so fix it” variables (orange) Hidden variables (gray) ensure sparse structure, reduce parameters
battery age
battery dead
battery meter
alternator broken
no charging
fanbelt broken
battery flat
car won’t start
fuel line blocked
starter broken
AIMA Slides © and ;
Dr Wafa Johal

Example: Car insurance
GoodStudent
RiskAversion SeniorTrain
ExtraCar VehicleYear
DrivingSkill
MakeModel Antilock
DrivingHist DrivQuality
Ruggedness
Cushioning
Airbag CarValue HomeBase AntiTheft
OwnDamage OtherCost
MedicalCost
LiabilityCost
PropertyCost
AIMA Slides © and ;
Dr Wafa Johal

Inference tasks
Simple queries: compute posterior marginal P(Xi|E = e) e.g., P(NoGas|Gauge = empty, Lights = on, Starts = false)
Conjunctive queries: P(Xi,Xj|E = e) = P(Xi|E = e)P(Xj|Xi,E = e) probabilistic inference required for P(outcome|action, evidence) Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical? Explanation: why do I need a new starter motor?
We focus on simple and conjunctive queries
AIMA Slides © and ; Dr Wafa Johal 20

Inference by enumeration
Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation
Simple query on the burglary network:
= P(B, j, m)/P(j, m)
= α 􏰀 􏰀 P(B,e,a,j,m) ea
= αP(B, j, m)
Rewrite full joint entries using product of CPT entries:
= α 􏰀 􏰀 P(B)P(e)P(a|B, e)P(j|a)P(m|a)
RAIMeAcuSlridsesiv©eStudaretpRutshsel-lfianrdsPteternNuormvig;eCrharitsiLoecnki:e O(n)space,O(dD)rWtiamfaJeohal 21
P(e) 􏰀 P(a|B, e)P(j|a)P(m|a) ea

Evaluation tree
P( a|b, e)
Enumeration is inefficient: repeated computation e.g., computes P(j|a)P(m|a) for each value of e
AIMA Slides © and ;
Dr Wafa Johal

Inference by variable elimination
Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation
P(B|j,m) 􏰀 􏰀
= α P(B) e P(e) a P(a|B, e) P(j|a) P(m|a)
􏰅􏰄􏰃􏰆 􏰅􏰄􏰃􏰆 􏰅 􏰄􏰃 􏰆􏰅􏰄􏰃􏰆􏰅 􏰄􏰃 􏰆 B􏰀E􏰀AJM
= αP(B) 􏰀e P(e) 􏰀a P(a|B, e)P(j|a)fM (a)
= αP(B) 􏰀e P(e) 􏰀a P(a|B, e)fJ (a)fM (a) = αP(B)􏰀e P(e) a fA(a,b,e)fJ(a)fM(a) = αP(B) e P(e)fA ̄JM(b, e) (sum out A) = αP(B)fE ̄A ̄JM(b) (sum out E)
= αfB(b) × fE ̄A ̄JM(b)
AIMA Slides © and ;
Dr Wafa Johal 23

Variable elimination: Basic operations
Summing out a variable from a product of factors:
move any constant factors outside the summation
add up submatrices in pointwise product of remaining factors
assuming f1,…,fi do not depend on X
Pointwise product of factors f1 and f2: f1(x1,…,xj,y1,…,yk) × f2(y1,…,yk,z1,…,zl)
= f(x1,…,xj,y1,…,yk,z1,…,zl) e.g., f1(a, b) × f2(b, c) = f(a, b, c)
x f1 ×···×fk = f1 ×···×fi x fi+1 ×···×fk = f1 ×···×fi ×fX ̄
AIMA Slides © and ;
Dr Wafa Johal 24

Irrelevant variables
Consider the query P(JohnCalls|Burglary = true) BE
P(J|b) = αP(b)
P(e) P(a|b, e)P(J|a) eam
P(m|a) Sum over m is identically 1; M is irrelevant to the query
Thm 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)
Here, X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake}
so MaryCalls is irrelevant
AIMA Slides © and ; Dr Wafa Johal 25

Bayes nets provide a natural representation for (causally induced) conditional independence
Topology + CPTs = compact representation of joint distribution Generally easy for (non)experts to construct
Exact inference by enumeration
Exact inference by variable elimination
Examples of skills expected:
• Formulate a belief network for a given problem domain
• Derive expression for joint probability distribution for given belief network
• Use inference by enumeration to answer a query
about simple or conjunctive queries on a given belief network
AIMA Slides © and ; Dr Wafa Johal 26

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com