Ve492: Introduction to Artificial Intelligence
Bayesian Networks: Representation and Conditional Independences
UM-SJTU Joint Institute
Slides adapted from http://ai.berkeley.edu, CMU, AIMA, UM
Learning Objectives
❖ How to reason with uncertain knowledge?
❖ What is a Bayes’ net?
❖ What can we use Bayes’ net for?
❖ How to check for conditional independences?
❖ How to perform probabilistic reasoning?
❖ Representation
❖ Conditional Independences
❖ Probabilistic Inference
Probabilistic Models
❖ Models describe how (a portion of) the world works
❖ Models are always simplifications
❖ May not account for every variable
❖ May not account for all interactions between variables
❖ “All models are wrong, but some are useful.”
– G. E. P. Box
❖ Assumptions:
❖ World possibly non-deterministic
❖ Partially observable
❖ Factored representation
❖ What do we do with probabilistic models?
❖ We (or our agents) need to reason about unknown variables,
given evidence
❖ Example: explanation (diagnostic reasoning)
❖ Example: prediction (causal reasoning)
Motivation for Bayes’ Nets
❖ Joint distribution can be used for inference
❖ Three problems with directly using full joint distribution tables as our probabilistic models:
❖ Space complexity for storing joint distribution
❖ Computational complexity of inference
❖ Hard to learn/estimate the full joint distribution directly
❖ Bayes’ nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
❖ Instance of graphical models
❖ We describe how variables locally interact
❖ Local interactions chain together to give global, indirect interactions
Bayes’ Net Notation
❖ Nodes: variables (with domains)
❖ Can be assigned (observed) or unassigned (unobserved)
❖ Arcs: interactions
❖ Similar to CSP constraints
❖ Indicate “direct influence” between variables
❖ Formally: encode conditional independence (more later)
❖ For now: imagine that arrows mean direct causation (in general, they don’t!)
Example: Coin Flips
❖ N independent coin flips
❖ No interactions between variables: absolute independence
Example: Traffic
❖ Variables:
❖ R: It rains
❖ T: There is traffic
❖ Model 1: independence (R and T unconnected)
❖ Model 2: rain causes traffic (R → T)
❖ Why is an agent using model 2 better?
Example: Alarm Network
❖ Variables
❖ B: Burglary
❖ A: Alarm goes off
❖ M: Mary calls
❖ J: John calls
❖ E: Earthquake!
Example Bayes’ Net: Car
Example Bayes’ Net: Insurance
Bayes’ Net Definition and Semantics
❖ A set of nodes, one per variable
❖ A directed, acyclic graph
❖ A conditional distribution for each node
❖ A collection of distributions over X, one for each combination of parents’ values
❖ CPT: conditional probability table
❖ Description of a noisy “causal” process
A Bayes net = Topology (graph) + Local Conditional Probabilities
Probabilities in BNs
❖ Bayes’ nets implicitly encode joint distributions
❖ As a product of local conditional distributions
❖ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))
❖ Example:
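A minimal Python sketch of this product for the alarm network above. The CPT numbers are the usual textbook (AIMA) values, shown only for illustration; the slides themselves do not list them.

# Burglary/alarm network: B -> A <- E, A -> J, A -> M
# P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | b, e)
P_J = {True: 0.90, False: 0.05}                       # P(+j | a)
P_M = {True: 0.70, False: 0.01}                       # P(+m | a)

def joint(b, e, a, j, m):
    """Multiply one entry from each local CPT."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

print(joint(True, False, True, True, True))   # P(+b, -e, +a, +j, +m) ≈ 0.000591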
Probabilities in BNs
Why are we guaranteed that setting
P(x1, ..., xn) = ∏i P(xi | parents(Xi))
results in a proper joint distribution?
❖ Chain rule (valid for all distributions):
P(x1, ..., xn) = ∏i P(xi | x1, ..., xi-1)
❖ Assume conditional independences:
P(xi | x1, ..., xi-1) = P(xi | parents(Xi))
→ Consequence:
P(x1, ..., xn) = ∏i P(xi | parents(Xi))
Not every BN can represent every joint distribution
❖ The topology enforces certain conditional independencies
Example: Coin Flips
Only distributions whose variables are absolutely independent can be represented by a Bayes’ net with no arcs.
Example: Traffic
Quiz: Alarm Network
Example: Traffic
❖ Causal direction
Example: Reverse Traffic
❖ Reverse causality?
Causality?
❖ When Bayes’ nets reflect the true causal patterns:
❖ Often simpler (nodes have fewer parents)
❖ Often easier to think about
❖ Often easier to elicit from experts
❖ BNs need not actually be causal
❖ Sometimes no causal net exists over the domain (especially if variables are missing)
❖ E.g., a domain that is not well understood or is complex, such as the medical domain
❖ End up with arrows that reflect correlation, not causation
❖ What do the arrows really mean?
❖ Topology may happen to encode causal structure
❖ Topology really encodes conditional independence
Size of a Bayes’ Net
❖ How big is a joint distribution over N Boolean variables?
2^N
❖ How big is an N-node net if nodes have up to k parents?
O(N * 2^(k+1))
❖ Both give you the power to calculate any joint probability P(X1, ..., XN)
❖ BNs: Huge space savings!
❖ Also easier to elicit local CPTs
❖ Also faster to answer queries (coming)
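For a quick sense of scale (example numbers, not from the slides): with N = 30 Boolean variables and at most k = 5 parents per node, the full joint table has 2^30 ≈ 1.07 × 10^9 entries, while the Bayes’ net stores at most 30 · 2^6 = 1,920 CPT entries.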
Questions about Bayes’ Nets
❖ A Bayes’ net is an efficient encoding of a probabilistic model of a domain
❖ Questions we can ask:
❖ Inference: given a fixed BN, what is P(X | e)?
❖ Representation: given a BN graph, what kinds of distributions can it encode?
❖ Modeling: what BN is most appropriate for a given domain?
Bayes’ Nets
❖ Representation
❖ Conditional Independences
❖ Probabilistic Inference
Conditional Independence
❖ X and Y are independent if
P(x, y) = P(x) P(y) for all x, y      (written X ⟂ Y)
❖ X and Y are conditionally independent given Z if
P(x, y | z) = P(x | z) P(y | z) for all x, y, z      (written X ⟂ Y | Z)
❖ (Conditional) independence is a property of a joint distribution
❖ Example:
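A small Python sketch of checking this definition numerically from a joint table; the joint below is constructed (with made-up numbers) as P(z) P(x|z) P(y|z), so X ⟂ Y | Z holds by construction.

from itertools import product

P_Z = {True: 0.3, False: 0.7}
P_X_given_Z = {True: 0.9, False: 0.2}   # P(+x | z)
P_Y_given_Z = {True: 0.6, False: 0.1}   # P(+y | z)

def joint(x, y, z):
    px = P_X_given_Z[z] if x else 1 - P_X_given_Z[z]
    py = P_Y_given_Z[z] if y else 1 - P_Y_given_Z[z]
    return P_Z[z] * px * py

def cond_indep(joint, tol=1e-9):
    """Check P(x, y | z) == P(x | z) P(y | z) for every assignment."""
    for z in (True, False):
        pz = sum(joint(x, y, z) for x, y in product((True, False), repeat=2))
        for x, y in product((True, False), repeat=2):
            pxy_z = joint(x, y, z) / pz
            px_z = sum(joint(x, yy, z) for yy in (True, False)) / pz
            py_z = sum(joint(xx, y, z) for xx in (True, False)) / pz
            if abs(pxy_z - px_z * py_z) > tol:
                return False
    return True

print(cond_indep(joint))   # True: X and Y are conditionally independent given Z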
Bayes’ Nets: Assumptions
❖ Assumptions we are required to make to define the Bayes’ net when given the graph:
P(xi | x1, ..., xi-1) = P(xi | parents(Xi))
❖ Beyond the above conditional independence assumptions
❖ Often additional conditional independences
❖ They can be read off the graph
❖ Important for modeling: understand assumptions made when choosing a Bayes’ net graph
❖ Conditional independence assumptions directly from simplifications in chain rule:
❖ Additional implied conditional independence assumptions?
Independence in a BN
❖ Important question about a BN:
❖ Are two nodes independent given certain evidence?
❖ If yes, can prove using algebra (tedious in general)
❖ If no, can prove with a counterexample
❖ Example:
❖ Question: are X and Z necessarily independent?
❖ Answer: No, e.g., low pressure causes rain, which causes traffic.
❖ X can influence Z, Z can influence X (via Y)
❖ Addendum: they could be independent: how?
Conditional Independences
D-separation
D-separation: Outline
❖ Study independence properties for triples
❖ Analyze complex cases in terms of member triples
❖ D-separation: a condition / algorithm for answering such queries
Serial Chains
❖ This configuration is a “serial chain”
❖ Is X always independent of Z ? ❖ No!
❖ One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
❖ Counter-example:
❖ Low pressure => rain => traffic,
high pressure => no rain => no traffic
❖ In numbers:
P( +y | +x ) = 1, P( -y | – x ) = 1, P( +z | +y ) = 1, P( -z | -y ) = 1
(X: Low pressure → Y: Rain → Z: Traffic)
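A quick numeric check of this counter-example; the prior P(+x) = 0.5 is an assumption, since the slide only gives the conditionals.

# Deterministic serial chain X -> Y -> Z from the counter-example.
P_x = 0.5                                  # assumed prior (not on the slide)
P_y_given_x = {True: 1.0, False: 0.0}      # P(+y | x)
P_z_given_y = {True: 1.0, False: 0.0}      # P(+z | y)

def joint(x, y, z):
    py = P_y_given_x[x] if y else 1 - P_y_given_x[x]
    pz = P_z_given_y[y] if z else 1 - P_z_given_y[y]
    return (P_x if x else 1 - P_x) * py * pz

p_z = sum(joint(x, y, True) for x in (True, False) for y in (True, False))
p_z_given_x = sum(joint(True, y, True) for y in (True, False)) / P_x
print(p_z, p_z_given_x)   # 0.5 vs 1.0, so P(+z | +x) != P(+z): X and Z are dependent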
Serial Chains
❖ This configuration is a “serial chain”
❖ Is X always independent of Z given Y?
❖ Yes!
❖ Proof: P(z | x, y) = P(x, y, z) / P(x, y) = P(x) P(y | x) P(z | y) / [P(x) P(y | x)] = P(z | y)
(X: Low pressure → Y: Rain → Z: Traffic)
❖ Evidence along the chain “blocks” the influence
Divergent Chain
❖ This configuration is a “divergent chain”
❖ Is X always independent of Z ? ❖ No!
(Y: Project due, X: Piazza busy, Z: OH busy)
❖ One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
❖ Counter-example:
❖ Project due => Piazza busy and OH busy
❖ In numbers:
P( +x | +y ) = 1, P( -x | -y ) = 1, P( +z | +y ) = 1, P( -z | -y ) = 1
Divergent Chain
❖ This configuration is a “divergent chain”
❖ Guaranteed X and Z independent given Y? ❖ Yes!
❖ Proof: P(x, z | y) = P(x, y, z) / P(y) = P(y) P(x | y) P(z | y) / P(y) = P(x | y) P(z | y)
(Y: Project due, X: Piazza busy, Z: OH busy)
❖ Observing the cause blocks influence between effects.
Convergent Chain
❖ Last configuration: “convergent chain” (v-structures)
❖ Are X and Y independent?
❖ Yes: the ballgame and the rain cause traffic, but they are not correlated
❖ Still need to prove they must be (try it!)
❖ Are X and Y independent given Z?
❖ No: seeing traffic puts the rain and the ballgame in competition as explanation.
❖ This is backwards from the other cases
❖ Observing an effect activates influence between possible causes.
(X: Raining, Y: Ballgame, Z: Traffic)
P(x,y,z) = P(x)P(y)P(z|x,y)
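A small numeric sketch of explaining away in this v-structure; all of the probabilities below are made-up illustrative values, not taken from the slides.

# Rain (X) and Ballgame (Y) are independent causes of Traffic (Z).
P_x = {True: 0.1, False: 0.9}              # P(Raining)
P_y = {True: 0.2, False: 0.8}              # P(Ballgame)
P_z = {(True, True): 0.95, (True, False): 0.8,
       (False, True): 0.7, (False, False): 0.1}   # P(+z | x, y)

def joint(x, y, z):
    pz = P_z[(x, y)] if z else 1 - P_z[(x, y)]
    return P_x[x] * P_y[y] * pz            # P(x, y, z) = P(x) P(y) P(z | x, y)

def p_rain_given(z, y=None):
    """P(+x | z) or, if y is given, P(+x | z, y)."""
    ys = (True, False) if y is None else (y,)
    num = sum(joint(True, yy, z) for yy in ys)
    den = sum(joint(x, yy, z) for x in (True, False) for yy in ys)
    return num / den

print(p_rain_given(True))            # P(+x | +z)     ≈ 0.295
print(p_rain_given(True, y=True))    # P(+x | +z, +y) ≈ 0.131: the ballgame "explains away" the rain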
Conditional Independences
D-separation
General Case
The General Case
❖ General question: in a given BN, are two variables independent (given evidence)?
❖ Solution: analyze the graph
❖ Any complex example can be broken into repetitions of the three canonical cases
Active / Inactive Paths
❖ Question: Are X and Y conditionally independent given evidence variables {Z}?
❖ Yes, if X and Y are “d-separated” by Z
❖ Consider all (undirected) paths from X to Y
❖ No active paths = independence!
❖ A path is active if each triple is active:
❖ Serial chain A → B → C where B is unobserved (either direction)
❖ Divergent chain A ← B → C where B is unobserved
❖ Convergent chain (aka v-structure) A → B ← C where B or one of its descendants is observed
❖ All it takes to block a path is a single inactive segment
Active Triples
Inactive Triples
D-Separation
❖ Query: is Xi ⟂ Xj | {Xk1, ..., Xkn} guaranteed?
❖ Check all (undirected!) paths between Xi and Xj
❖ If one or more are active, then independence is not guaranteed
❖ Otherwise (i.e., if all paths are inactive), then independence is guaranteed
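A compact Python sketch of this d-separation check, assuming the two query nodes are not themselves in the evidence set. It enumerates every undirected path and applies the active-triple rules above; this is for illustration, not an optimized implementation.

def d_separated(x, y, evidence, parents):
    """True iff every undirected path from x to y is inactive given `evidence`.
    `parents` maps each node to the list of its parents; `evidence` is a set."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    neighbors = {n: set(parents[n]) | set(children[n]) for n in parents}

    def descendants(node):
        seen, stack = set(), [node]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def triple_active(a, b, c):
        if b in parents[a] and c in parents[b]:      # a <- b <- c  (serial)
            return b not in evidence
        if a in parents[b] and b in parents[c]:      # a -> b -> c  (serial)
            return b not in evidence
        if a in children[b] and c in children[b]:    # a <- b -> c  (divergent)
            return b not in evidence
        # a -> b <- c  (convergent / v-structure)
        return b in evidence or bool(descendants(b) & evidence)

    def active_path_exists(node, path):
        if node == y:
            return all(triple_active(path[i], path[i + 1], path[i + 2])
                       for i in range(len(path) - 2))
        return any(active_path_exists(n, path + [n])
                   for n in neighbors[node] if n not in path)

    return not active_path_exists(x, [x])

# Example: the alarm network B -> A <- E, A -> J, A -> M
alarm = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(d_separated("B", "E", set(), alarm))    # True: unobserved v-structure blocks the path
print(d_separated("B", "E", {"A"}, alarm))    # False: observing the effect activates it
print(d_separated("J", "M", {"A"}, alarm))    # True: observing the common cause blocks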
Structure Implications
❖ Given a Bayes net structure, we can run the d-separation algorithm to build a complete list of conditional independences that are necessarily true, of the form Xi ⟂ Xj | {Xk1, ..., Xkn}
❖ This list determines the set of probability distributions that can be represented
Topology Limits Distributions
❖ Given some graph topology G, only certain joint distributions can be encoded
❖ The graph structure guarantees certain (conditional) independences
❖ (There might be more independence)
❖ Adding arcs increases the set of distributions, but has several costs
❖ Full conditioning can encode any distribution
Concluding Remarks
❖ Bayes’ nets
❖ Compact representation of a joint distribution
❖ Guaranteed (conditional) independencies of distributions can be deduced from BN graph structure
❖ CPTs can encode further (conditional) independencies
❖ D-separation checks for (conditional) independencies
❖ For more information:
❖ AIMA, Chapter 14 for Probabilistic Reasoning