Ve492: Introduction to Artificial Intelligence
Bayesian Networks: Sampling
Paul M-SJTU Joint Institute
Slides adapted from http://ai.berkeley.edu, AIMA, UM, CMU
Bayes’ Nets
❖ Conditional Independences
❖ Probabilistic Inference
❖ Enumeration (exact, exponential complexity)
❖ Variable elimination (exact, worst-case exponential complexity, often better)
❖ Probabilistic inference is NP-complete
❖ Approximate inference (sampling)
❖ Representation
Learning Objectives
❖ How to perform probabilistic reasoning in an approximate way?
❖ What is prior sampling?
❖ What is rejection sampling?
❖ What is likelihood weighting?
❖ What is Gibbs sampling?
Approximate Inference
❖ Basic Idea: compute approximate probability using samples instead of full distribution
❖ e.g., a distribution over colors: pr, red; pb, blue; pg, green
❖ Sampling: element selection according to distribution
❖ If distribution is known, e.g.,
❖ random.choices(['r', 'b', 'g'], [pr, pb, pg])
❖ random.uniform(a, b)
❖ If distribution is unknown, e.g.,
❖ flipping a coin, rolling a die, running an experiment
❖ Basic idea
❖ Draw N samples from a sampling distribution S
❖ Compute an approximate (posterior) probability
❖ Show this converges to the true probability P
❖ Why sample?
❖ Often very fast to get a decent approximation
❖ The algorithms are very simple and general (easy to apply to fancy models)
❖ They require very little memory (O(n))
❖ They can be applied to large models, whereas exact algorithms blow up
❖ Suppose you have two agent programs A and B for Monopoly
❖ What is the probability that A wins?
❖ Method 1:
❖ Let 𝑠 be a sequence of dice rolls and Chance and Community Chest cards
❖ Given 𝑠, the outcome 𝑉(𝑠) is determined (1 for a win, 0 for a loss)
❖ Probability that A wins is ∑_s P(s) V(s)
❖ Problem: infinite number of sequences 𝑠 !
❖ Method 2:
❖ Sample 𝑁 sequences from 𝑃(𝑠), play 𝑁 games (maybe 100)
❖ Probability that A wins is roughly (1/N) ∑_i V(s_i), i.e., the fraction of wins in the sample
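A minimal sketch of Method 2 in Python: estimate the win probability as the fraction of wins over N sampled games. Here play_game is a hypothetical stand-in (just a biased coin flip), not an actual Monopoly simulator.

import random

def play_game(rng):
    """Hypothetical stand-in for one full game between A and B:
    returns 1 if A wins, 0 if A loses. Here it is just a biased coin."""
    return 1 if rng.random() < 0.55 else 0

rng = random.Random(0)
N = 100                                    # play N games (maybe 100)
wins = sum(play_game(rng) for _ in range(N))
print("P(A wins) is roughly", wins / N)    # fraction of wins in the sample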
Sampling Basics: Discrete (Categorical) Distribution
❖ Sampling from given distribution
❖ Step 1: Get sample u from uniform distribution over [0, 1)
❖ e.g., random() in python
❖ Step 2: Convert this sample u into an outcome for the given distribution by associating each outcome with a sub-interval of [0, 1), where the size of the sub-interval equals the probability of the outcome
❖ If random() returns u = 0.83, then our sample is C = blue
❖ E.g., after sampling 8 times:
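A minimal Python sketch of this two-step procedure. The distribution over colors (red 0.6, green 0.1, blue 0.3) is assumed for illustration; it is consistent with u = 0.83 falling in blue's sub-interval [0.7, 1), but the actual numbers on the slide are not reproduced here.

import random

def sample_from(distribution):
    """Step 1: draw u uniformly from [0, 1).
    Step 2: walk the outcomes' cumulative sub-intervals until u falls inside one."""
    u = random.random()
    cumulative = 0.0
    for outcome, prob in distribution.items():
        cumulative += prob
        if u < cumulative:
            return outcome
    return outcome    # guard against floating-point round-off

# Assumed distribution for C (not necessarily the one on the slide):
P_color = {'red': 0.6, 'green': 0.1, 'blue': 0.3}
print([sample_from(P_color) for _ in range(8)])    # e.g., after sampling 8 times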
Sampling in Bayes’ Nets
❖ Prior Sampling
❖ Rejection Sampling
❖ Likelihood Weighting
Prior Sampling
Prior Sampling
[Figure: BN with Cloudy → Sprinkler, Cloudy → Rain, and Sprinkler, Rain → WetGrass, with example samples such as +c, -s, +r, +w and -c, +s, -r, +w]
Prior Sampling
For i = 1, 2, …, n (in topological order)
    Sample xi from P(Xi | Parents(Xi))
Return (x1, x2, …, xn)
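A Python sketch of prior sampling for the Cloudy/Sprinkler/Rain/WetGrass network. The CPT numbers are the classic textbook sprinkler values, assumed here since the slide's tables are not reproduced.

import random

def flip(p):
    """Return True with probability p (sampling one binary variable)."""
    return random.random() < p

def prior_sample():
    """Sample each variable in topological order, given its sampled parents."""
    c = flip(0.5)                          # P(+c), assumed
    s = flip(0.1 if c else 0.5)            # P(+s | C), assumed
    r = flip(0.8 if c else 0.2)            # P(+r | C), assumed
    w = flip({(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.90, (False, False): 0.0}[(s, r)])   # P(+w | S, R), assumed
    return (c, s, r, w)                    # one sample (x1, ..., xn)

print([prior_sample() for _ in range(5)])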
Prior Sampling
❖ This process generates samples with probability:
S_PS(x1, …, xn) = ∏_i P(xi | Parents(Xi)) = P(x1, …, xn)
i.e., the BN’s joint probability
❖ Let the number of samples of an event be N_PS(x1, …, xn); then the estimate
N_PS(x1, …, xn) / N → P(x1, …, xn) as N → ∞
❖ i.e., the sampling procedure is consistent
❖ Assume we have a bunch of samples from a BN:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
❖ How to estimate P(W)?
❖ Count realizations of W <+w:4, -w:1>
❖ Normalize to estimate P(W) = <+w:0.8, -w:0.2>
❖ This will get closer to the true distribution with more samples
❖ General approach: can estimate any (conditional) probability
❖ P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
❖ Fast: can use fewer samples if less time (what’s the drawback?)
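A short sketch of this counting-and-normalizing step, using the five samples listed above with (C, S, R, W) encoded as booleans.

from collections import Counter

samples = [(True,  False, True,  True),    # +c, -s, +r, +w
           (True,  True,  True,  True),    # +c, +s, +r, +w
           (False, True,  True,  False),   # -c, +s, +r, -w
           (True,  False, True,  True),    # +c, -s, +r, +w
           (False, False, False, True)]    # -c, -s, -r, +w

# Estimate P(W): count realizations of W, then normalize.
counts = Counter(w for (_, _, _, w) in samples)
total = sum(counts.values())
print({w: n / total for w, n in counts.items()})   # {True: 0.8, False: 0.2}

# Estimate a conditional such as P(+c | +w): restrict to samples with W = true.
matching = [c for (c, _, _, w) in samples if w]
print(sum(matching) / len(matching))               # fraction of those with C = true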
Rejection Sampling
Rejection Sampling
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
❖ Let’s say we want P(C)
❖ No point keeping all samples around
❖ Just tally counts of C as we go
❖ Let’s say we want P(C| +s)
❖ Same thing: tally C outcomes, but ignore (reject) samples which don’t have S=+s
❖ This is called rejection sampling
❖ It is also consistent for conditional probabilities (i.e., correct in the limit)
Rejection Sampling
IN: evidence instantiation
For i = 1, 2, …, n
    Sample xi from P(Xi | Parents(Xi))
    If xi not consistent with evidence
        Reject: return, and no sample is generated in this cycle
Return (x1, x2, …, xn)
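A Python sketch of rejection sampling for P(C | +s), reusing the assumed sprinkler CPTs from the prior-sampling sketch. For simplicity it draws a full sample and then checks consistency, rather than rejecting early as in the pseudocode above; the estimate is the same.

import random

def flip(p):
    return random.random() < p

def prior_sample():
    """One prior sample from the sprinkler network (assumed CPT numbers)."""
    c = flip(0.5)
    s = flip(0.1 if c else 0.5)
    r = flip(0.8 if c else 0.2)
    w = flip({(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.90, (False, False): 0.0}[(s, r)])
    return {'C': c, 'S': s, 'R': r, 'W': w}

def rejection_sample(query, evidence, n):
    """Tally the query variable only over samples consistent with the evidence."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = prior_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[query]] += 1    # keep and tally
        # otherwise: reject, no sample generated in this cycle
    total = counts[True] + counts[False]
    return {v: c / total for v, c in counts.items()} if total else None

print(rejection_sample('C', {'S': True}, 10000))   # approximate P(C | +s)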
Likelihood Weighting
Likelihood Weighting
❖ Problem with rejection sampling:
❖ If evidence is unlikely, rejects lots of samples
❖ Evidence not exploited as you sample
❖ Consider P(Shape|blue)
Samples (Shape, Color): pyramid, green; pyramid, red; sphere, blue; cube, red; sphere, green (only the blue sample is kept; the rest are rejected)
❖ Idea: fix evidence variables and sample the rest
❖ Problem: sample distribution not consistent!
❖ Solution: weight by probability of evidence given parents
Weighted samples (Shape, Color): pyramid, blue; pyramid, blue; sphere, blue; cube, blue; sphere, blue
Likelihood Weighting
[Figure: the Cloudy/Sprinkler/Rain/WetGrass network with a likelihood-weighted sample being built, e.g., +c, +s, +r, +w]
Likelihood Weighting
IN: evidence instantiation
w = 1.0
for i = 1, 2, …, n
    if Xi is an evidence variable
        xi = observation for Xi
        Set w = w * P(xi | Parents(Xi))
    else
        Sample xi from P(Xi | Parents(Xi))
return (x1, x2, …, xn), w
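A Python sketch of likelihood weighting on the same (assumed) sprinkler CPTs: evidence variables are fixed to their observed values and contribute a factor to the weight, while all other variables are sampled as usual.

import random

def flip(p):
    return random.random() < p

def weighted_sample(evidence):
    """Return one sample consistent with the evidence, together with its weight."""
    sample, w = {}, 1.0

    def p_true(var):
        """P(var = true | sampled parents), with assumed sprinkler CPT numbers."""
        if var == 'C': return 0.5
        if var == 'S': return 0.1 if sample['C'] else 0.5
        if var == 'R': return 0.8 if sample['C'] else 0.2
        return {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.0}[(sample['S'], sample['R'])]

    for var in ['C', 'S', 'R', 'W']:           # topological order
        p = p_true(var)
        if var in evidence:                     # evidence: fix value, update weight
            sample[var] = evidence[var]
            w *= p if evidence[var] else 1 - p
        else:                                   # non-evidence: sample as usual
            sample[var] = flip(p)
    return sample, w

print(weighted_sample({'S': True}))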
How to Use the Weights?
Let’s say we want P(C| +s)
❖ For each assignment c of C, sum the weights of samples matching c and +s
❖ Normalize
This sampling method is also consistent
w1: +c, +s, +r, +w
w2: +c, +s, +r, +w
w3: -c, +s, +r, -w
w4: +c, +s, +r, +w
w5: -c, +s, -r, +w
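A small helper sketching how the weights are used: sum the weights of samples matching each value of the query, then normalize. It assumes (sample, weight) pairs like those produced by the weighted_sample sketch above.

from collections import defaultdict

def estimate(query, weighted_samples):
    """Weighted estimate of P(query | evidence); every sample already matches the evidence."""
    totals = defaultdict(float)
    for sample, w in weighted_samples:
        totals[sample[query]] += w        # sum weights per query value
    z = sum(totals.values())
    return {value: t / z for value, t in totals.items()}   # normalize

# Example use (with the weighted_sample sketch defined earlier):
# pairs = [weighted_sample({'S': True}) for _ in range(10000)]
# print(estimate('C', pairs))             # approximate P(C | +s)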
Likelihood Weighting
❖ Sampling distribution if z sampled and e fixed evidence:
S_WS(z, e) = ∏_i P(zi | Parents(Zi))
❖ Now, samples have weights:
w(z, e) = ∏_j P(ej | Parents(Ej))
❖ Together, the weighted sampling distribution is consistent:
S_WS(z, e) · w(z, e) = ∏_i P(zi | Parents(Zi)) · ∏_j P(ej | Parents(Ej)) = P(z, e)
Quiz: Likelihood Weighting
❖ Two identical samples from likelihood-weighted sampling will have exactly the same weights.
A. True
B. False
C. It depends
Likelihood Weighting
❖ Likelihood weighting is good
❖ All samples are used
❖ The values of downstream variables are influenced by upstream evidence
❖ Likelihood weighting still has weaknesses
❖ The values of upstream variables are unaffected by downstream evidence
❖ E.g., suppose evidence is a video of a traffic accident
❖ With evidence in k leaf nodes, weights will be O(2^-k)
❖ With high probability, one lucky sample will have much larger weight than the others, dominating the result
❖ We would like each variable to “see” all the evidence!
Gibbs Sampling
❖ Procedure: keep track of a full instantiation x1, x2, …, xn.
1. Start with an arbitrary instantiation consistent with the evidence.
2. Sample one variable at a time, conditioned on all the rest, but keep evidence fixed.
3. Keep repeating this for a long time.
❖ Property: in the limit of repeating this infinitely many times the resulting sample is coming from the correct distribution
❖ Rationale: both upstream and downstream variables condition on evidence.
❖ In contrast, likelihood weighting only conditions on upstream evidence, so the weights it obtains can sometimes be very small. The sum of weights over all samples indicates how many “effective” samples were obtained, so we want high weights.
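A Python sketch of Gibbs sampling for P(S | +r) on the assumed sprinkler CPTs. For simplicity it resamples each chosen variable by comparing the full joint of the current instantiation with that variable set true vs. false, which is proportional to P(X | all other variables); the "Efficient Resampling" slide below makes the point that only the CPTs mentioning X are actually needed.

import random

def flip(p):
    return random.random() < p

# Assumed sprinkler CPTs: each entry returns P(var = true | parents in assignment a).
CPT = {
    'C': lambda a: 0.5,
    'S': lambda a: 0.1 if a['C'] else 0.5,
    'R': lambda a: 0.8 if a['C'] else 0.2,
    'W': lambda a: {(True, True): 0.99, (True, False): 0.90,
                    (False, True): 0.90, (False, False): 0.0}[(a['S'], a['R'])],
}

def joint(a):
    """P(full assignment a): product over all CPT entries."""
    p = 1.0
    for var, cpt in CPT.items():
        p_true = cpt(a)
        p *= p_true if a[var] else 1 - p_true
    return p

def gibbs(query, evidence, n_steps):
    """Resample one non-evidence variable at a time, keeping evidence fixed."""
    a = {var: evidence.get(var, flip(0.5)) for var in CPT}   # arbitrary consistent start
    non_evidence = [v for v in CPT if v not in evidence]
    counts = {True: 0, False: 0}
    for _ in range(n_steps):
        x = random.choice(non_evidence)
        a[x] = True
        p_t = joint(a)                    # P(x = true,  all other current values)
        a[x] = False
        p_f = joint(a)                    # P(x = false, all other current values)
        a[x] = flip(p_t / (p_t + p_f))    # sample x from P(x | all other variables)
        counts[a[query]] += 1
    return {v: c / (counts[True] + counts[False]) for v, c in counts.items()}

print(gibbs('S', {'R': True}, 20000))     # approximate P(S | +r)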
Example: P( S | +r)
❖ Step 1: Fix evidence
❖ R = +r
❖ Step 2: Initialize other variables randomly
❖ Step 3: Repeat
❖ Choose a non-evidence variable X
❖ Resample X from P(X | all other variables)
[Figure: sequence of network diagrams (C, S, R = +r, W) showing the successive states]
Example: P( S | +r)
Step 3: Repeat
❖ Choose a non-evidence variable X
❖ Resample X from P( X | all other variables)
Keep only the last sample from each iteration:
[Figure: grid of network diagrams with R = +r fixed, one per resampling step]
Efficient Resampling of One Variable
Sample from P(S | +c, +r, -w):
P(S | +c, +r, -w) = P(S, +c, +r, -w) / ∑_s P(s, +c, +r, -w)
= P(+c) P(S | +c) P(+r | +c) P(-w | S, +r) / ∑_s P(+c) P(s | +c) P(+r | +c) P(-w | s, +r)
= P(S | +c) P(-w | S, +r) / ∑_s P(s | +c) P(-w | s, +r)
Many things cancel out – only CPTs with S remain!
More generally: only CPTs that have resampled variable need to be considered, and joined together
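A numeric sketch of this cancellation, using the assumed sprinkler CPT numbers: to resample S given +c, +r, -w, only P(S | +c) and P(-w | S, +r) are joined and normalized.

# Only the CPTs that mention S are needed:
# P(S | +c, +r, -w) is proportional to P(S | +c) * P(-w | S, +r).  Numbers are assumed.
p_s_given_c = 0.1                               # P(+s | +c)
p_not_w_given = {True: 1 - 0.99,                # P(-w | +s, +r)
                 False: 1 - 0.90}               # P(-w | -s, +r)

score = {s: (p_s_given_c if s else 1 - p_s_given_c) * p_not_w_given[s]
         for s in (True, False)}
z = sum(score.values())
print({s: v / z for s, v in score.items()})     # P(S | +c, +r, -w)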
Concluding Remarks
❖ Approximate Inference via Sampling
❖ Prior Sampling, P( Q )
❖ Rejection Sampling, P( Q | e )
❖ Likelihood Weighting, P( Q | e)
❖ Gibbs Sampling, P( Q | e )
❖ Special case of Metropolis-Hastings method
❖ Special case of Markov Chain Monte Carlo methods
❖ For more information:
❖ AIMA, Chapter 14 for Probabilistic Reasoning