
Bayesian Networks
AIMA 13.1-13.2

CMPSC 442
Week 8, Meeting 23, 3 Segments

Outline

● Syntax and Semantics of Bayesian Networks
● Conditional Probability Tables in a Bayesian Network
● More on Bayesian Networks


Bayesian Networks
AIMA 13.1-13.2

CMPSC 442
Week 8, Meeting 23, Segment 1 of 3: Syntax and Semantics
of Bayesian Networks

Bayesian Networks

● Concisely represent any full joint probability distribution
○ Graphically represent conditional independence
○ Defined by the topology and the local probability information

● Also known as belief nets


A Naïve Bayes Model is a Bayesian Network

● A graphical representation has one node for each random variable
● Directed arrows represent the conditioning effect of a parent node on its children


Conditional Probability Tables at Each Node


P(F) = .10

Flu   P(X1|F)   P(X2|F)   P(X3|F)   P(X4|F)   P(X5|F)
 T      .90       .60       .75       .95       .28
 F      .10       .05       .33       .05       .30
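As a concrete illustration of how these tables get used, here is a short Python sketch (not part of the original slides) that plugs the values above into the naive Bayes product P(F) ∏i P(xi | F) to obtain the posterior probability of flu when all five symptoms are observed:

# Sketch: naive Bayes posterior for the flu model above, assuming all five
# symptoms X1..X5 are observed as true. Values are the CPT entries shown above.
p_flu = 0.10
p_x_given_flu   = [0.90, 0.60, 0.75, 0.95, 0.28]   # P(Xi = true | Flu = true)
p_x_given_noflu = [0.10, 0.05, 0.33, 0.05, 0.30]   # P(Xi = true | Flu = false)

joint_flu, joint_noflu = p_flu, 1.0 - p_flu
for p_t, p_f in zip(p_x_given_flu, p_x_given_noflu):
    joint_flu *= p_t        # P(Flu = true,  x1..x5)
    joint_noflu *= p_f      # P(Flu = false, x1..x5)

posterior = joint_flu / (joint_flu + joint_noflu)   # normalize
print(round(posterior, 3))  # about 0.998 with these numbers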

Syntax

● A set of nodes, one per random variable (discrete or continuous)
● A directed, acyclic graph (DAG)

○ Directed arcs point from each parent node (the conditioning context) to its
child nodes

● A conditional distribution for each node Xi given its parents that
quantifies the effect of the parents on the node using a finite number of
parameters θ


Syntax

● The nodes and edges represent the topology (or structure) of the
network

● In the simplest case, the conditional distribution of each random
variable is represented as a conditional probability table (CPT)

● A CPT gives the distribution over Xi for each combination of parent
values
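To make the CPT idea concrete, one simple representation (an illustrative sketch, not a prescribed one) is a Python dictionary keyed by the tuple of parent values, storing P(Xi = true | parents); the example below uses the X1 table from the flu model:

# Sketch: a CPT as a dict from parent-value tuples to P(Xi = true | parents).
# Values are taken from the flu example's X1 table.
cpt_x1 = {
    (True,):  0.90,   # P(X1 = true | Flu = true)
    (False,): 0.10,   # P(X1 = true | Flu = false)
}

def p_x1(x1: bool, flu: bool) -> float:
    """Look up P(X1 = x1 | Flu = flu); the false case is 1 - P(true)."""
    p_true = cpt_x1[(flu,)]
    return p_true if x1 else 1.0 - p_true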


Example

You have a new burglar alarm installed at home. It is fairly reliable at
detecting a burglary, but is occasionally set off by minor earthquakes. You
also have two neighbors, John and Mary, who have promised to call you at
work when they hear the alarm. John nearly always calls when he hears
the alarm, but sometimes confuses the telephone ringing with the alarm,
and calls then, too. Mary, on the other hand, likes rather loud music and
often misses the alarm altogether. Given the evidence of who has or has
not called, we would like to estimate the probability of a burglary.

(From Judea Pearl, developer of Bayesian Networks and causal models)


Bayesian Network for Example

● Random Variables
○ Burglary
○ Earthquake
○ Alarm
○ JohnCalls
○ MaryCalls

● Knowledge: John and Mary usually call if my burglar alarm goes off,
and occasionally otherwise. A minor earthquake can cause the alarm.

● I’m at work, neighbor John calls to say my alarm is ringing, but
neighbor Mary doesn’t call. Is there a burglar?


Network Topology Can Reflect Causal Knowledge

● P(Alarm|Burglary,Earthquake)
○ A burglar can set the alarm off
○ An earthquake can set the alarm off

● P(JohnCalls,MaryCalls|Alarm)
○ The alarm can cause Mary to call
○ The alarm can cause John to call

● Alarm is conditioned on Burglary,
Earthquake

● JohnCalls and MaryCalls are conditionally
independent of Burglary and Earthquake, given Alarm


Semantics

● Local semantics: given its parents, each node is conditionally
independent of its other ancestors

● Review conditional independence
○ X and Y are conditionally independent given Z iff
○ P(X|Y,Z) = P(X|Z)
○ P(Y|X,Z) = P(Y|Z)
○ P(X ∧ Y|Z) = P(X|Z) P(Y|Z)

● Local semantics give rise to global semantics


Semantics

● Where the network contains n random variables Xi, each entry in the
joint probability distribution is P(x1, . . . , xn )

● In a Bayesian network, each entry P(x1, . . . , xn) in the joint probability
distribution is defined in terms of the parameters θ:
○ P(x1, . . . , xn) = ∏i θ(xi | parents(Xi))


Semantics

● The full joint distribution is defined as the product of the local
conditional distributions

● Therefore: P(x1, . . . , xn) = ∏i P(xi | parents(Xi))
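A minimal Python sketch of this product rule, assuming Boolean variables and a dictionary-based CPT representation (the data layout here is an assumption, not the textbook's code):

# Sketch: full-joint entry as the product of local conditional probabilities,
# P(x1, ..., xn) = prod_i P(xi | parents(Xi)), for Boolean variables.
def joint_entry(assignment, parents, cpt):
    """assignment: {var: bool}; parents: {var: [parent vars]};
    cpt[var][tuple of parent values] = P(var = true | parents)."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[pa] for pa in parents[var])
        p_true = cpt[var][parent_values]       # roots use the empty tuple () as key
        p *= p_true if value else 1.0 - p_true
    return p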


Deriving the Equation for the Global Semantics

● Step 1: start from the definition of the network semantics, P(x1, . . . , xn) = ∏i θ(xi | parents(Xi))
● Step 2: sum over y, all the other variables in the full joint probability, to recover P(xi | parents(Xi))
● Step 3: this shows that the network’s local parameters θ are the conditional
probabilities of the full joint probability distribution

Chain Rule to Construct Bayesian Networks

● Chain rule: any joint probability can be expressed as a product of conditional
probabilities: P(x1, . . . , xn) = ∏i P(xi | xi−1, . . . , x1)

● Given the global semantics from the preceding slide, this gives a general
assertion about variables in the network, and a construction procedure

● Generalization of Naive Bayes


Bayesian Networks
AIMA 13.1-13.2

CMPSC 442
Week 8, Meeting 23, Segment 2 of 3: Conditional
Probability Tables in a Bayesian Network

Constructing a Bayesian Network

A Bayesian network is a correct representation of a domain only if each
node is conditionally independent of its other predecessors in the node
ordering, given its parents

1. Determine the variables to model the domain, and order them such that
causes precede effects

2. Choose a minimal set of parents for Xi to satisfy the equation for the local &
global semantics, and insert an edge from every Parent(Xi) to Xi

3. Add the CPT for each node
This procedure guarantees that the network is acyclic and contains no redundant
probability values, so it cannot violate the axioms of probability.
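As a sketch of steps 1 and 2 (the data structures here are assumptions for illustration), the burglary example's variables can be listed with causes before effects and each node's parents recorded; because every parent appears earlier in the ordering, the resulting graph cannot contain a cycle:

# Sketch: burglary-network topology, variables ordered so causes precede effects.
variables = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
parents = {
    "Burglary":   [],
    "Earthquake": [],
    "Alarm":      ["Burglary", "Earthquake"],
    "JohnCalls":  ["Alarm"],
    "MaryCalls":  ["Alarm"],
}

# Every parent appears earlier in the ordering, so edges Parent(Xi) -> Xi can
# never form a cycle; a CPT would then be attached to each node (step 3).
for i, var in enumerate(variables):
    assert all(variables.index(pa) < i for pa in parents[var])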


Full Bayesian Network for Judea Pearl’s Example

● P(Alarm|Burglary,Earthquake)
○ A burglar can set the alarm off
○ An earthquake can set the alarm off

● P(JohnCalls,MaryCalls|Alarm)
○ The alarm can cause Mary to call
○ The alarm can cause John to call


Compactness of a Bayesian Network

● A CPT for Boolean Xi with k Boolean parents has 2^k rows for the
combinations of parent values
● Each row requires one probability value p for Xi = true (the number for
Xi = false is 1 - p)
● If no Boolean variable has more than k Boolean parents, the complete
network requires O(n ⋅ 2^k) numbers
● Grows linearly with n, versus O(2^n) for the full joint distribution
● For the example, 1 + 1 + 4 + 2 + 2 = 10 numbers vs. 2^5 - 1 = 31
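These counts can be checked directly; the short Python sketch below assumes the burglary network's topology from this lecture:

# Sketch: count CPT entries for the burglary network. A Boolean node with k
# Boolean parents needs 2**k numbers; the full joint over n Booleans needs 2**n - 1.
parents = {
    "Burglary":   [],                           # 1 number
    "Earthquake": [],                           # 1 number
    "Alarm":      ["Burglary", "Earthquake"],   # 4 numbers
    "JohnCalls":  ["Alarm"],                    # 2 numbers
    "MaryCalls":  ["Alarm"],                    # 2 numbers
}
bn_numbers = sum(2 ** len(ps) for ps in parents.values())
full_joint_numbers = 2 ** len(parents) - 1
print(bn_numbers, full_joint_numbers)           # 10 31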


Reading the CPTs

● The probability of a burglary
on any day is 0.001

● The probability of an
earthquake on any day is
0.002


Reading the CPTs

● An earthquake and a burglary
can cause the alarm with p=0.70

● A burglary with no earthquake
can cause the alarm with p=0.01

● An earthquake with no burglary
can cause the alarm to go off
with probability = 0.70

● In the absence of a burglary or
earthquake, the alarm can go off
with probability = 0.01


Example Knowledge

● If the alarm goes off, JohnCalls
has probability = 0.90

● If the alarm goes off, MaryCalls
has probability = 0.70
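Combining these CPT values with the global semantics gives any full-joint entry; for example, this sketch (an illustration, not from the slides) evaluates the entry in which both neighbors call, the alarm sounds, and there is no burglary and no earthquake:

# Sketch: one full-joint entry, P(j, m, a, ¬b, ¬e)
#   = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e), using the slide's CPT values.
p_b, p_e = 0.001, 0.002      # P(Burglary), P(Earthquake)
p_a_nb_ne = 0.01             # P(Alarm | ¬Burglary, ¬Earthquake)
p_j_a, p_m_a = 0.90, 0.70    # P(JohnCalls | Alarm), P(MaryCalls | Alarm)

p = p_j_a * p_m_a * p_a_nb_ne * (1 - p_b) * (1 - p_e)
print(round(p, 5))           # about 0.00628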


A Non-Causal Network

● Suppose we choose the ordering M, J, A, B, E to link symptoms to causes
○ P(M) (1 entry)
○ P(J | M) (2 entries)
○ P(A | J, M) (4 entries)
○ P(B | A, J, M) = P(B | A) (2 entries)
○ P(E | B, A, J, M) = P(E | A, B) (4 entries)


Diagnostic Knowledge versus Causal Knowledge

● Diagnostic network is less compact than the causal network: 1 + 2 + 4 +
2 + 4 = 13 CPT entries instead of 10

● Causal models and conditional independence seem hardwired for
humans!
○ Expert physicians prefer to estimate probabilities for causal rules rather than
for diagnostic ones (Tversky & Kahneman, 1982)
○ Deciding conditional independence is hard in non-causal directions


Bayesian Networks
AIMA 13.1-13.2

CMPSC 442
Week 8, Meeting 23, Segment 3 of 3: More on Bayesian
Networks

Non-descendants Property

● The conditional probabilities at a node (random variable) capture its
dependence on its parent nodes, and its independence from all other
non-descendant nodes

● A random variable is conditionally independent of its non-descendants,
given its parents


Markov Blanket

Independently of the global network structure:

● For node A, the Markov blanket is the only part of the network to
consider when predicting the behavior of A and its children
○ Parents of A, children of A, and the other parents of A’s children

● This Markov Blanket property is exploited by
inference algorithms that use local and
distributed stochastic sampling processes
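A short Python sketch (assuming the node-to-parents dictionary used in the earlier sketches) that collects a node's Markov blanket from the network structure:

# Sketch: Markov blanket = parents of A, children of A, and the children's
# other parents, computed from a {node: [parents]} dictionary.
def markov_blanket(node, parents):
    children = [v for v, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])   # the children's other parents
    blanket.discard(node)                # A itself is not in its own blanket
    return blanket

parents = {
    "Burglary":   [],
    "Earthquake": [],
    "Alarm":      ["Burglary", "Earthquake"],
    "JohnCalls":  ["Alarm"],
    "MaryCalls":  ["Alarm"],
}
print(markov_blanket("Alarm", parents))
# {'Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls'} (in some order)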


Extensions to Bayesian Networks

● Hidden variables
● Decision (action) variables
● Random variables with continuous distributions
● “Plate” models

○ Latent Dirichlet Allocation (LDA), a generative statistical model that allows
sets of observations to be explained by unobserved groups

● Dynamic Bayesian Networks (DBNs): change over time
○ Hidden Markov Models (HMMs): a special case of DBNs in which the
entire state of the world is represented by a single hidden state variable


Brief Illustration of Hidden Variables

● Initial evidence: car won’t start
● Testable variables (green)
● Action variables (orange): Conduct repairs
● Hidden variables (gray): ensure sparse structure, reduce parameters


Summary

● A Bayesian network provides a compact representation of a joint
probability distribution
○ Guaranteed to be a consistent specification (obeys the axioms of probability)
○ Graphically shows conditional independence
○ Can be extended to include “action” nodes, to use hidden variables, to
incorporate change over time, and to use continuous variables

● Networks that capture causality tend to be sparser
● Estimating and querying belief nets is straightforward if everything is
observable (but computationally expensive for large networks)