CMPUT 366 F20: Probability Theory II
James Wright & Vadim Bulitko
October 20, 2020
Lecture Outline
Probability Theory
PM 8.1-8.4
Bayes’ Rule
We have P(h, e) = P(h | e) P(e) = P(e | h) P(h). From here we get Bayes' rule:
P(h | e) = P(e | h) P(h) / P(e)
P(e) is the probability of the evidence
P(h) is the prior probability of a hypothesis h
P(e | h) is the likelihood, which is often easier to compute than the posterior P(h | e)
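As a quick sketch of Bayes' rule in code (every number here is made up for illustration: a rare hypothesis h tested by fairly reliable evidence e):

# Hypothetical numbers, not from the slides
p_h = 0.01               # prior P(h)
p_e_given_h = 0.9        # likelihood P(e | h)
p_e_given_not_h = 0.05   # P(e | not h)

# P(e) by summing over the two cases for h
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' rule: P(h | e) = P(e | h) P(h) / P(e)
p_h_given_e = p_e_given_h * p_h / p_e
print(p_h_given_e)       # about 0.154

Note that the posterior stays small despite the high likelihood, because the prior P(h) is small.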
Expected Value
The expected value of a random variable X is the weighted average of that variable over the domain, weighted by the probability of each value:
E[X] = Σ_x P(X = x) · x
The conditional expected value of a variable X conditioned on proposition y is its expected value weighted by the conditional probability:
E[X | y] = Σ_x P(X = x | y) · x
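A minimal sketch of both definitions in Python, using a fair six-sided die as the (assumed) distribution of X:

# P(X = x) for a fair die
p = {x: 1/6 for x in range(1, 7)}

# E[X] = Σ_x P(X = x) · x
ev = sum(prob * x for x, prob in p.items())
print(ev)  # 3.5

# Condition on the proposition y: "X is even"
p_given_y = {x: (1/3 if x % 2 == 0 else 0.0) for x in p}

# E[X | y] = Σ_x P(X = x | y) · x
ev_given_y = sum(prob * x for x, prob in p_given_y.items())
print(ev_given_y)  # 4.0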
Expected Value Examples
(Figure: example distributions, each with expected value E[X] = 3)
Summary So Far
Probability is a numerical measure of uncertainty
Formal semantics:
positive weights that sum to 1 over possible worlds
the probability of a proposition is the total weight of the worlds in which the proposition is true
Conditional probability updates the agent’s beliefs based on evidence
Expected value of a variable is its probability-weighted average over possible worlds
Unstructured Joint Distributions
Without additional constraints we need to list a full joint distribution to reason over it
For k binary variables we would need 2^k − 1 values
Need some structure to reduce the number of values to be specified
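For instance, with k = 10 binary variables the full table already needs 2^10 − 1 = 1023 numbers, and with k = 30 it needs 2^30 − 1, over a billion.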
Structure
Most real problems are generated by some sort of underlying process. This gives us structure that we can exploit to represent and reason about probabilities in a more compact way.
We can compute any required joint probabilities based on the process, instead of specifying every possible joint probability explicitly
A simple kind of structure is when random variables do not interact
Marginal Independence
When the value of one variable never gives you information about the value of the other, we say the two variables are marginally independent
Mathematically, random variables X and Y are marginally independent iff:
P(X = x | Y = y) = P(X = x)
P(Y = y | X = x) = P(Y = y)
for all values of x and y
Marginal Independence Example
Discuss:
Flip four fair coins, and get four results: C1, C2, C3, C4
What is the probability P(C1 = heads)?
Suppose that C2, C3, C4 are all tails
With that in mind, what is the probability that C1 is heads?
In other words, what is P(C1 = heads | C2 = tails, C3 = tails, C4 = tails)?
Exploiting Marginal Independence
Theorem. If X and Y are marginally independent then
P(X = x, Y = y) = P(X = x)P(Y = y) for all values of x and y
Thus instead of storing the entire joint distribution, we can store marginal distributions (one for each variable) and then use them to recover the whole joint distribution
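A minimal sketch of this in Python, with assumed marginals for two binary variables:

# Stored marginals for marginally independent X and Y (numbers assumed)
p_x = {True: 0.3, False: 0.7}
p_y = {True: 0.6, False: 0.4}

# Any joint entry is recovered by multiplication
p_joint = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}
print(p_joint[(True, False)])  # 0.3 * 0.4 = 0.12

For n binary variables this means storing n numbers instead of 2^n − 1.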
Clock Example
Discuss:
Consider a clock with no number markings
Alice and Bob both look at the clock at the same time and form opinions about what time it is
Are Alice and Bob’s opinions independent?
P(A|B) = P(A)?
Suppose it is 10am. Are A and B independent?
P(A | B, T = 10am) = P(A | T = 10am)?
Conditional Independence
When knowing the value of a third variable Z makes two variables X, Y independent, we say that they are conditionally independent given Z (or independent conditional on Z).
Definition: Random variables X and Y are conditionally independent given Z iff
P(X = x | Y = y, Z = z) = P(X = x | Z = z)
for all values of x, y, z
Example: A and B are conditionally independent given T
Exploiting Conditional Independence
Theorem: If X and Y are conditionally independent given Z, then
P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
for all values of x, y, z.
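For instance, in the clock example, suppose (with made-up numbers) P(A = 10am | T = 10am) = 0.8 and P(B = 10am | T = 10am) = 0.7. The theorem then gives P(A = 10am, B = 10am | T = 10am) = 0.8 × 0.7 = 0.56.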
Caveats
Sometimes when two variables are marginally independent, they are also conditionally independent given a third variable
Example: coin flip outcomes C1 and C2 are marginally independent, and also conditionally independent given C3, because knowing the value of C3 does not make C2 any more informative about C1
This is not always true
Consider a random variable B which is true iff C1 and C2 have the same value. Again, C1 and C2 are marginally independent:
P(C1 = heads | C2 = heads) = P(C1 = heads)
In fact, C1 and C2 are also both marginally independent of B: P(C1 | B = true) = P(C1)
But if I know the value of B, then knowing the value of C2 tells me exactly what the value of C1 is:
P(C1 = heads | B = true, C2 = heads) ≠ P(C1 = heads | B = true)
C1 and C2 are not conditionally independent given B
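A small enumeration that verifies both claims for fair coins (a sketch; prob and same are helpers defined here, not library functions):

from itertools import product

# All four equally likely outcomes of (C1, C2)
outcomes = list(product(["heads", "tails"], repeat=2))

def prob(pred):
    # Probability that a predicate over (c1, c2) holds, under the uniform distribution
    return sum(1 for o in outcomes if pred(o)) / len(outcomes)

same = lambda o: o[0] == o[1]  # the event B = true

# C1 is marginally independent of B: both of these print 0.5
print(prob(lambda o: o[0] == "heads"))
print(prob(lambda o: o[0] == "heads" and same(o)) / prob(same))

# But given B = true, C2 determines C1: this prints 1.0, not 0.5
print(prob(lambda o: o[0] == "heads" and o[1] == "heads" and same(o))
      / prob(lambda o: o[1] == "heads" and same(o)))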
Summary So Far
Unstructured joint distributions are exponentially expensive to represent (and operate on)
Marginal and conditional independence are especially important forms of structure that a distribution can have
Vastly reduces the cost of representation and computation
Joint probabilities of (conditionally or marginally) independent random variables can be computed by multiplication
Belief Networks, Informally
We can represent the pattern of dependence in a distribution as a directed acyclic graph
Nodes are random variables
Arc to each node from the variables on which it depends
Fire Alarm Scenario
Agent wants to deduce whether there is a fire in the building next door
The fire alarm detects heat from fires
But it can also be set off by tampering
A fire causes visible smoke
People usually leave the building as a group when the fire alarm goes off
When lots of people leave the building, our friend will report it to us
Fire Alarm Scenario
The graph represents a specific factorization of the full joint distribution
Distribution on each node conditional on its parents
Marginal distributions on nodes with no parents
Semantics: every node is independent of its non-descendants, conditional on its parents
Belief/Bayesian Networks
A belief (or Bayesian) network consists of:
a directed acyclic graph, with each node labelled by a random variable
a domain for each random variable
a conditional probability table for each variable given its parents
a marginal distribution for nodes with no parents
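One plain-data way to store these pieces in Python (a sketch; the structure follows the fire alarm scenario, but every probability below is a placeholder invented for illustration):

# Directed acyclic graph: each variable maps to its parents
parents = {
    "Tampering": [],
    "Fire": [],
    "Alarm": ["Tampering", "Fire"],
    "Smoke": ["Fire"],
    "Leaving": ["Alarm"],
}

# Marginal distributions for the root nodes (placeholder numbers)
p_tampering = {True: 0.02, False: 0.98}
p_fire = {True: 0.01, False: 0.99}

# Conditional probability table: (tampering, fire) -> P(Alarm = true | parents)
cpt_alarm = {
    (True, True): 0.5,   (True, False): 0.85,
    (False, True): 0.99, (False, False): 0.0001,
}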
Queries
The most common task for a belief network is to query posterior probabilities given some observations
Easy case:
Observations are the parents of the query target
More common cases:
Observations are the children of the query target
Observations have no straightforward relationship to the target
Extracting Joint Probabilities: Variable Ordering
To compute the joint probability distribution, we need a variable ordering that is consistent with the graph
Algorithm 1: Ordering Variables
for i = 1, …, n do
    select a variable/vertex that is (i) unlabelled and (ii) whose parents are labelled or do not exist
    label it i
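The loop translates directly into a topological sort. A sketch in Python, over a parents dict like the one in the earlier belief network sketch:

def order_variables(parents):
    # Label variables 1..n so that parents always come before children
    labelled = []
    while len(labelled) < len(parents):
        for v in parents:
            # (i) unlabelled, and (ii) all parents labelled (or no parents)
            if v not in labelled and all(p in labelled for p in parents[v]):
                labelled.append(v)
                break
    return labelled

# For the fire alarm structure this returns, e.g.,
# ["Tampering", "Fire", "Alarm", "Smoke", "Leaving"]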
Extracting Joint Probabilities: Multiplications
Multiply in the label order:
P(Tampering) = P(Tampering)
P(Fire) = P(Fire)
P(Tampering, Fire) = P(Tampering)P(Fire)
P(Tampering, Fire, Alarm) = P(Alarm | Tampering, Fire)P(Tampering)P(Fire)
P(Tampering, Fire, Alarm, Smoke) = P(Smoke | Fire)P(Alarm | Tampering, Fire)P(Tampering)P(Fire)
P(Tampering, Fire, Alarm, Smoke, Leaving) = P(Leaving | Alarm)P(Smoke | Fire)P(Alarm | Tampering, Fire)P(Tampering)P(Fire)
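A sketch of evaluating this final product for one complete assignment, reusing the placeholder tables from the belief network sketch plus two more assumed CPTs:

p_tampering = {True: 0.02, False: 0.98}
p_fire = {True: 0.01, False: 0.99}
cpt_alarm = {(True, True): 0.5, (True, False): 0.85,
             (False, True): 0.99, (False, False): 0.0001}
cpt_smoke = {True: 0.9, False: 0.01}      # fire -> P(Smoke = true | Fire)
cpt_leaving = {True: 0.88, False: 0.001}  # alarm -> P(Leaving = true | Alarm)

def given(table, key, value):
    # P(var = value | key), from a table storing P(var = true | key)
    return table[key] if value else 1 - table[key]

def joint(t, f, a, s, l):
    # Multiply in the label order: roots first, then each node given its parents
    return (p_tampering[t] * p_fire[f]
            * given(cpt_alarm, (t, f), a)
            * given(cpt_smoke, f, s)
            * given(cpt_leaving, a, l))

print(joint(False, True, True, True, True))  # P(no tampering, fire, alarm, smoke, leaving)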