Ve492: Introduction to Artificial Intelligence
Bayesian Networks: Inference
Paul, UM-SJTU Joint Institute
Some slides adapted from http://ai.berkeley.edu
Bayes’ Nets
❖ Representation
❖ Conditional Independences
❖ Probabilistic Inference
❖ Enumeration
❖ Variable elimination
❖ Probabilistic inference is NP-complete
❖ Approximate inference (next)
Inference
❖ Inference: calculating some useful quantity from a joint probability distribution
❖ Examples:
❖ Marginal probability 𝑃(𝑄)
❖ e.g., how often does a disease occur?
❖ Posterior probability 𝑃(𝑄|𝐸 = 𝑒)
❖ e.g., what’s the probability of a disease given some symptom?
❖ Most likely explanation: argmax_q P(Q = q | E = e)
❖ e.g., what’s the most probable disease given some symptom?
Inference by Enumeration
❖ General case:
❖ Evidence variables: E1, …, Ek = e1, …, ek
❖ Query* variable: Q
❖ Hidden variables: H1, …, Hr
(All variables: X1, …, Xn)
❖ We have the joint and we want: P(Q | e1, …, ek)
❖ Step 1: Select the entries consistent with the evidence
❖ Step 2: Sum out H1, …, Hr to get the joint of Q and the evidence
❖ Step 3: Normalize
* Works fine with multiple query variables, too
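A minimal Python sketch of these three steps, assuming the full joint table is already available. The joint P(Q, H, E) below is made-up illustrative data over three binary variables (it just needs to sum to 1):

# Made-up joint distribution P(Q, H, E) over three binary variables.
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.10,
    (0, 1, 0): 0.15, (0, 1, 1): 0.05,
    (1, 0, 0): 0.10, (1, 0, 1): 0.15,
    (1, 1, 0): 0.05, (1, 1, 1): 0.20,
}

def posterior(e):
    # Step 1: select the entries consistent with the evidence E = e
    selected = {(q, h): p for (q, h, ev), p in joint.items() if ev == e}
    # Step 2: sum out the hidden variable H to get P(Q, e)
    p_q_e = {}
    for (q, h), p in selected.items():
        p_q_e[q] = p_q_e.get(q, 0.0) + p
    # Step 3: normalize to get P(Q | e)
    z = sum(p_q_e.values())
    return {q: p / z for q, p in p_q_e.items()}

print(posterior(1))   # {0: 0.3, 1: 0.7}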
Inference by Enumeration in Bayes’ Net
❖ Given unlimited time, inference in BNs is easy
❖ Example of inference by enumeration:
Inference by Enumeration?
Inference by Enumeration is Slow!
❖ Why is inference by enumeration so slow?
❖ You join up the whole joint distribution before you sum out the hidden variables
❖ Idea: interleave joining and marginalizing!
❖ Called “Variable Elimination” (VE)
❖ Still NP-hard, but usually much faster than inference by enumeration
❖ First we’ll need some new notation: factors
Factor Zoo
Factor Zoo I
❖ Joint distribution: P(X,Y)
❖ Entries P(x,y) for all x, y
❖ Sums to 1
❖ Selected joint: P(x,Y)
❖ A slice of the joint distribution
❖ Entries P(x,y) for fixed x, all y
❖ Sums to P(x)
❖ Number of capitals = dimensionality of the table
Factor Zoo II
❖ Single conditional: P(Y | x)
❖ Entries P(y | x) for fixed x, all y
❖ Sums to 1
❖ Family of conditionals: P(X | Y)
❖ Multiple conditionals
❖ Entries P(x | y) for all x, y
❖ Sums to |Y|
Factor Zoo III
❖ Specified family: P( y | X )
❖ Entries P(y | x) for fixed y, but for all x
❖ Sums to … who knows!
Factor Zoo Summary
❖ In general, when we write P(Y1 … YN | X1 … XM)
❖ It is a “factor,” a multi-dimensional array
❖ Its values are P(y1 … yN | x1 … xM)
❖ Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array
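As a concrete, hypothetical sketch: a factor can be stored as a table keyed by the assignments of its capitalized (unassigned) variables, and selecting a lower-case value drops that dimension. The `select` helper and the CPT numbers below are illustrative, not from the slides:

# Family of conditionals P(T | R), stored as a table over (r, t) pairs.
P_T_given_R = {
    ('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
    ('-r', '+t'): 0.1, ('-r', '-t'): 0.9,
}

def select(factor, index, value):
    # Fixing one variable to a value drops that dimension from the table.
    return {key[:index] + key[index + 1:]: p
            for key, p in factor.items() if key[index] == value}

# Selecting r = +r yields the single conditional P(T | +r), which sums to 1:
P_T_given_plus_r = select(P_T_given_R, 0, '+r')
print(P_T_given_plus_r, sum(P_T_given_plus_r.values()))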
Variable Elimination
Example: Traffic Domain
❖ Random Variables
❖ R: Raining
❖ T: Traffic
❖ L: Late for class!
Inference by Enumeration: Procedural Outline
❖ Track objects called factors
❖ Initial factors are local CPTs (one per node)
❖ Any known values are selected
❖ E.g., if we know L = +l, the initial factors are P(R), P(T | R), P(+l | T)
❖ Procedure: Join all factors, then eliminate all hidden variables
Operation 1: Join Factors
❖ First basic operation: joining factors
❖ Combining factors:
❖ Just like a database join
❖ Get all factors over the joining variable
❖ Build a new factor over the union of the variables involved
❖ Example: Join on R: combine P(R) and P(T | R) into P(R, T)
❖ Computation for each entry: pointwise products
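A sketch of joining on R in the traffic domain: combine P(R) and P(T | R) into a factor over (R, T) by pointwise products. The CPT numbers are illustrative:

P_R = {('+r',): 0.1, ('-r',): 0.9}
P_T_given_R = {
    ('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
    ('-r', '+t'): 0.1, ('-r', '-t'): 0.9,
}

# Each entry of the joined factor is the product P(r) * P(t | r),
# e.g. P(+r, +t) = 0.1 * 0.8 = 0.08.
P_RT = {(r, t): P_R[(r,)] * p for (r, t), p in P_T_given_R.items()}
print(P_RT)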
Example: Multiple Joins
Operation 2: Eliminate
❖ Second basic operation: marginalization
❖ Take a factor and sum out a variable
❖ Shrinks a factor to a smaller one
❖ A projection operation
❖ Example: summing R out of P(R, T) gives P(T)
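A sketch of summing a variable out: project P(R, T) down to P(T) by adding the entries that agree on T. The numbers continue the illustrative join example above:

P_RT = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

P_T = {}
for (r, t), p in P_RT.items():
    P_T[t] = P_T.get(t, 0.0) + p   # sum out R
print(P_T)   # {'+t': 0.17, '-t': 0.83}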
Multiple Elimination
P(R, T, L) → (sum out R) → P(T, L) → (sum out T) → P(L)
Traffic Domain
❖ Inference by Enumeration: Join on r, Join on t, Eliminate r, Eliminate t
❖ Variable Elimination: Join on r, Eliminate r, Join on t, Eliminate t
Marginalizing Early! (aka VE)
❖ If evidence, start with factors that select that evidence
❖ No evidence uses these initial factors: P(R), P(T | R), P(L | T)
❖ Computing P(L | +r), the initial factors become: P(+r), P(T | +r), P(L | T)
❖ We eliminate all variables other than query + evidence to obtain a selected joint of query and evidence
❖ e.g., for P(L | +r), we would end up with the selected joint P(+r, L)
How to get the answer?
❖ Just normalize this!
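As a sketch: assuming illustrative values P(+l | +t) = 0.3 and P(+l | -t) = 0.1 together with the P(R) and P(T | R) tables from the join sketch, the selected joint works out to P(+r, +l) = 0.026 and P(+r, -l) = 0.074, and the final step is a one-liner:

# Normalize the selected joint P(+r, L) to obtain P(L | +r).
P_r_L = {'+l': 0.026, '-l': 0.074}
z = sum(P_r_L.values())                        # this is P(+r)
print({l: p / z for l, p in P_r_L.items()})    # {'+l': 0.26, '-l': 0.74}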
Variable Elimination Pseudo-Code
❖ Start with initial factors:
❖ Local CPTs (but instantiated by evidence)
❖ While there are still hidden variables (not Q or evidence):
❖ Pick a hidden variable H
❖ Join all factors depending on H
❖ Eliminate (sum out) H
❖ Join all remaining factors and normalize
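A compact, hypothetical Python sketch of this loop. Factors are (variables, table) pairs, the `join` and `eliminate` helpers are the two operations above, and the traffic CPTs are the same illustrative numbers used earlier (query P(L), no evidence):

def join(f1, f2):
    # Pointwise product over the union of the two factors' variables.
    (v1, t1), (v2, t2) = f1, f2
    vs = v1 + tuple(x for x in v2 if x not in v1)
    table = {}
    for a1, p1 in t1.items():
        asg = dict(zip(v1, a1))
        for a2, p2 in t2.items():
            if all(asg.get(x, val) == val for x, val in zip(v2, a2)):
                row = {**asg, **dict(zip(v2, a2))}
                table[tuple(row[x] for x in vs)] = p1 * p2
    return vs, table

def eliminate(f, var):
    # Sum var out: add entries that agree on the remaining variables.
    vs, t = f
    keep = tuple(x for x in vs if x != var)
    out = {}
    for a, p in t.items():
        key = tuple(v for x, v in zip(vs, a) if x != var)
        out[key] = out.get(key, 0.0) + p
    return keep, out

factors = [
    (('R',), {('+r',): 0.1, ('-r',): 0.9}),                  # P(R)
    (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,      # P(T | R)
                  ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}),
    (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,      # P(L | T)
                  ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}),
]

for hidden in ['R', 'T']:                        # one elimination ordering
    touching = [f for f in factors if hidden in f[0]]
    factors = [f for f in factors if hidden not in f[0]]
    joined = touching[0]
    for f in touching[1:]:
        joined = join(joined, f)                 # join all factors on hidden
    factors.append(eliminate(joined, hidden))    # then sum hidden out

(vs, t), = factors                               # join remaining & normalize
z = sum(t.values())
print(vs, {a: round(p / z, 3) for a, p in t.items()})   # P(+l) ≈ 0.134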
Example ctd.
Finish with B
Same Example in Equations
❖ marginal can be obtained from joint by summing out
❖ use Bayes’ net joint distribution expression
❖ use x*(y+z) = xy + xz
❖ joining on a, and then summing out gives f1
❖ use x*(y+z) = xy + xz
❖ joining on e, and then summing out gives f2
❖ All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
Another Variable Elimination Example
Computational complexity critically depends on the largest factor generated in this process. Size of factor = number of entries in the table. In the example above (assuming binary variables), all generated factors are of size 2, as each has only one variable (Z, Z, and X3, respectively).
Quiz: Variable Elimination Ordering
❖ Assume all variables are binary.
❖ For the query P(Xn | y1, …, yn), work through the following two different orderings as done on the previous slide: Z, X1, …, Xn-1 and X1, …, Xn-1, Z.
❖ What is the size of the maximum factor generated for each of the orderings?
❖ In general, the ordering can greatly affect efficiency.
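A small sketch to check the quiz answer: it tracks only the scopes of the factors (not the numbers) during elimination, for the quiz network where Z is a parent of every Xi and each observed Yi has parent Xi. The helper name and n = 8 are arbitrary choices for illustration:

def max_factor_size(scopes, order):
    # Simulate elimination on variable sets only; report the largest
    # factor generated, assuming all variables are binary.
    largest = 0
    for var in order:
        touching = [s for s in scopes if var in s]
        scopes = [s for s in scopes if var not in s]
        new_scope = set().union(*touching) - {var}
        largest = max(largest, 2 ** len(new_scope))
        scopes.append(new_scope)
    return largest

n = 8
xs = [f'X{i}' for i in range(1, n + 1)]
# Scopes once evidence y1..yn is instantiated: P(Z), P(Xi | Z), P(yi | Xi).
scopes = [{'Z'}] + [{'Z', x} for x in xs] + [{x} for x in xs]
print(max_factor_size(scopes, ['Z'] + xs[:-1]))   # 2**n = 256: Z first is bad
print(max_factor_size(scopes, xs[:-1] + ['Z']))   # 2: Z last keeps factors tiny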
VE: Computational and Space Complexity
❖ The computational and space complexity of variable elimination is determined by the largest factor
❖ The elimination ordering can greatly affect the size of the largest factor.
❖ E.g., previous slide’s example: O(2^n) vs. O(1)
❖ Does there always exist an ordering that only results in small factors?
Worst Case Complexity?
❖ Encode a 3-SAT formula as a Bayes’ net whose node Z is true exactly when the formula is satisfied: then P(z) > 0 iff the formula has a satisfying assignment.
❖ So if we can answer whether P(z) equals zero or not, we have answered whether the 3-SAT problem has a solution.
❖ Hence inference in Bayes’ nets is NP-hard. No known efficient probabilistic inference algorithm exists in general.
Polytrees
❖ A polytree is a directed graph with no undirected cycles
❖ For polytrees you can always find an ordering that is efficient
❖ Idea: eliminate from the outside in (leaves first), so every generated factor stays small
❖ Cut-set conditioning for Bayes’ net inference
❖ Choose a set of variables such that, if removed, only a polytree remains
❖ Exercise: Think about how the specifics would work out!
Concluding Remarks
❖ Bayes’ nets exact inference
❖ In general, inference is NP-hard.
❖ Enumeration is an exact algorithm with exponential complexity.
❖ Variable elimination is an exact algorithm whose complexity is still exponential in the worst case, but can be much better by exploiting graph structure (e.g., polytree, cut-set conditioning).
❖ For more information:
❖ AIMA, Chapter 14 for Probabilistic Reasoning