CMPUT 366 F20: Probability Theory
James Wright & Vadim Bulitko
October 15, 2020
Lecture Outline
Probability Theory
PM 8.1-8.2
Uncertainty
In both search and RL we assumed that the agent knows its current state s
That is an abstraction/simplification
in real life agents may not know the entire state with certainty
Agent’s knowledge is uncertain
agent must consider multiple hypotheses
agent must update beliefs about which hypotheses are likely given observations
(Image: Stephen Hladky, 2009)
Example*
An AI robot has to decide between three actions:
drive without wearing a seatbelt
drive while wearing a seatbelt
stay home
If the robot knows with certainty that an accident will happen, it will just stay home
If the robot knows with certainty that an accident will not happen, it will not bother to wear a seatbelt
Wearing a seatbelt makes sense because the robot is uncertain about whether driving will lead to an accident
* This is a hypothetical example with a robot. As a human in real life, please always follow appropriate laws and regulations on wearing seatbelts.
Measuring Uncertainty
Probability is a way of measuring/quantifying uncertainty
The agent assigns a number between 0 and 1 to hypotheses:
0 means absolutely certain that the statement is false
1 means absolutely certain that the statement is true
intermediate values mean more or less certain
Probability is a measurement of uncertainty, not truth
a statement with probability 0.75 is not “mostly true”
rather, the agent believes it is more likely to be true than not
Subjective versus Objective: The Frequentist Perspective
Probabilities can be interpreted as objective statements about the world, or as subjective statements about an agent’s beliefs
Objective view is called frequentist:
The probability of an event is the proportion of times it would happen in the long run of repeated experiments
Every event has a single, true probability
Events that can only happen once do not have a well-defined probability
Subjective versus Objective: The Bayesian Perspective
Subjective view is called Bayesian
The probability of an event is a measure of an agent’s belief about its likelihood
Different agents can legitimately have different beliefs, so they can legitimately assign different probabilities to the same event
There is only one way to update those beliefs in response to new data
In this course, we will primarily take the Bayesian view
Example: Dice
Discuss:
Diane rolls a fair six-sided die and gets the number X. What is P(X = 5)?
Diane truthfully tells Oliver that she rolled an odd number. What should Oliver believe P(X = 5) is?
Diane truthfully tells Greta that she rolled a number greater than or equal to 5. What should Greta believe P(X = 5) is?
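These three questions can be checked with a short Python sketch (not from the slides), treating conditioning as "restrict to compatible outcomes, then renormalize":

```python
from fractions import Fraction

# Equally likely outcomes of one fair six-sided die
outcomes = range(1, 7)
p = {x: Fraction(1, 6) for x in outcomes}

def prob(event, given=None):
    """P(event | given): sum probabilities of compatible outcomes, renormalize."""
    if given is None:
        given = lambda x: True
    p_given = sum(p[x] for x in outcomes if given(x))
    p_both = sum(p[x] for x in outcomes if given(x) and event(x))
    return p_both / p_given

p_five = prob(lambda x: x == 5)                            # Diane: 1/6
p_five_odd = prob(lambda x: x == 5, lambda x: x % 2 == 1)  # Oliver: 1/3
p_five_ge5 = prob(lambda x: x == 5, lambda x: x >= 5)      # Greta: 1/2
```

Oliver and Greta assign different (and legitimate) probabilities to the same event, illustrating the Bayesian view from the previous slide.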
Semantics: Possible Worlds
Random variables (e.g., X) take values from a set (domain)
A possible world ω is a complete assignment of values to all random variables
A probability measure is a function P : Ω → R such that
∑_{ω∈Ω} P(ω) = 1
∀ω ∈ Ω: P(ω) ≥ 0
Discuss for the fair six-sided die example:
What is the random variable? What is its domain?
How many worlds are there? What is P?
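A minimal sketch of these semantics in Python for the one-die example, where each world is an assignment to the single variable X:

```python
from fractions import Fraction

# One possible world per value of X; a fair die makes them equally likely
Omega = [{"X": x} for x in range(1, 7)]
P = {x: Fraction(1, 6) for x in range(1, 7)}  # P(omega) for each world

# The two defining properties of a probability measure
total = sum(P[w["X"]] for w in Omega)          # must equal 1
nonneg = all(P[w["X"]] >= 0 for w in Omega)    # must hold for every world
```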
Propositions
A primitive proposition is an equality or an inequality (e.g., X = 2 or X ≥ 5)
A proposition is built up from other propositions using logical connectives (e.g., X = 1 ∨ X = 3 ∨ X = 5)
The probability of a proposition is the sum of the probabilities of the possible worlds in which that proposition is true:
P(α) = ∑_{ω∈Ω: ω⊨α} P(ω)
Example: in the dice example, P(X ≥ 5) = P(X = 5) + P(X = 6) = 1/6 + 1/6 = 1/3
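The sum-over-worlds definition translates directly into code; here is a sketch for the one-die example, with propositions represented as predicates on worlds:

```python
from fractions import Fraction

P = {x: Fraction(1, 6) for x in range(1, 7)}  # worlds for one fair die

def prob(proposition):
    """Sum P(omega) over the worlds in which the proposition is true."""
    return sum(P[x] for x in P if proposition(x))

p_odd = prob(lambda x: x in (1, 3, 5))  # X = 1 ∨ X = 3 ∨ X = 5
p_ge5 = prob(lambda x: x >= 5)          # P(X >= 5) = 1/3
```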
Basic Properties
P(α ∨ β) ≥ P(α)
P(α ∨ β) ≥ P(β)
P(α & β) ≤ P(α)
P(α & β) ≤ P(β)
P(¬α) = 1 − P(α)
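These properties can be spot-checked numerically; the particular propositions α and β below are arbitrary choices over the one-die distribution, used only for illustration:

```python
from fractions import Fraction

P = {x: Fraction(1, 6) for x in range(1, 7)}  # one fair die

def prob(prop):
    return sum(P[x] for x in P if prop(x))

alpha = lambda x: x <= 2      # example proposition α
beta = lambda x: x % 2 == 0   # example proposition β

p_or = prob(lambda x: alpha(x) or beta(x))
p_and = prob(lambda x: alpha(x) and beta(x))

checks = (
    p_or >= prob(alpha) and p_or >= prob(beta)        # disjunction bounds
    and p_and <= prob(alpha) and p_and <= prob(beta)  # conjunction bounds
    and prob(lambda x: not alpha(x)) == 1 - prob(alpha)  # negation
)
```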
Joint Distributions
In our dice example there was a single random variable X
We typically want to think about the interactions of multiple random variables
A joint distribution assigns a probability to each full assignment of values to variables
P(X = 1, Y = 5) is equivalent to P(X = 1 & Y = 5)
the cumulative probability of all worlds in which X = 1 and Y = 5
Suppose Diane now throws her fair six-sided die twice. The result of the first throw is X and the result of the second throw is Y
Discuss:
What is P(X = 1, Y = 5)? What is P(X = 1)?
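Both discussion questions can be answered from the joint distribution directly; a sketch for two independent fair-die throws:

```python
from fractions import Fraction
from itertools import product

# Joint distribution over two fair-die throws: 36 equally likely worlds
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# P(X=1, Y=5): probability of the single world (1, 5)
p_x1_y5 = joint[(1, 5)]

# P(X=1): total probability of all worlds in which X = 1
p_x1 = sum(p for (x, y), p in joint.items() if x == 1)
```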
Another Joint-Distribution Example
What might a day be like in Edmonton?
Two random variables:
Weather with domain {clear, snowing}
Temperature with domain {mild, cold, very_cold}
Joint distribution P(Weather, Temperature) (given as a table on the slide)
Marginalization
Marginalization is using a joint distribution P(X1, …, Xm, …, Xn) to compute a distribution over a smaller number of variables P(X1, …, Xm)
The smaller distribution is called the marginal distribution of its variables
We compute the marginal distribution by summing out the other variables, for instance:
P(X, Y) = ∑_z P(X, Y, Z = z)
What is the marginal distribution of Weather?
What is P(Weather = clear)? What is P(Weather = snowing)?
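A sketch of summing out Temperature in Python. The clear-weather entries match the numbers used later in the conditional-probability example; the snowing entries are illustrative assumptions chosen only so the table sums to 1 (the slide's actual table is not reproduced here):

```python
# Joint distribution P(Weather, Temperature)
joint = {
    ("clear", "mild"): 0.20,
    ("clear", "cold"): 0.30,
    ("clear", "very_cold"): 0.25,
    ("snowing", "mild"): 0.05,       # assumed value
    ("snowing", "cold"): 0.10,       # assumed value
    ("snowing", "very_cold"): 0.10,  # assumed value
}

# Marginal distribution of Weather: sum out Temperature
marginal = {}
for (w, t), p in joint.items():
    marginal[w] = marginal.get(w, 0.0) + p
```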
Conditional Probability
Agents need to be able to update their beliefs based on new observations
This process is called conditioning
We write P(h|e) to denote the probability of hypothesis h given that we have observed evidence e
P(h|e) is the probability of h conditional on e
Semantics of Conditional Probability
Evidence e lets us rule out all of the worlds that are incompatible with e
For instance, if the agent observes that the weather is clear, it should no longer assign any probability to the worlds in which it is snowing
We need to normalize the probabilities of the remaining worlds to ensure that the probabilities of possible worlds sum to 1
Modify the table on the right given the evidence that the weather is clear
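The rule-out-and-renormalize procedure can be sketched as follows. As before, the clear-weather probabilities match the numbers used in the chain-rule example, while the snowing entries are illustrative assumptions:

```python
joint = {
    ("clear", "mild"): 0.20,
    ("clear", "cold"): 0.30,
    ("clear", "very_cold"): 0.25,
    ("snowing", "mild"): 0.05,       # assumed value
    ("snowing", "cold"): 0.10,       # assumed value
    ("snowing", "very_cold"): 0.10,  # assumed value
}

# Evidence: Weather = clear. Rule out incompatible worlds...
compatible = {wt: p for wt, p in joint.items() if wt[0] == "clear"}

# ...then renormalize so the remaining worlds sum to 1
norm = sum(compatible.values())  # this is P(clear) = 0.75
posterior = {wt: p / norm for wt, p in compatible.items()}
```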
Chain Rule
Conditional probability is defined as
P(h|e) = P(h, e) / P(e)
which is exactly the sum of probabilities of all worlds in which h & e are true,
divided by the sum of probabilities of all worlds in which e is true
in the weather example, P(mild|clear) = 0.2 / (0.2 + 0.3 + 0.25)
From there we have P(h, e) = P(h|e)P(e)
More generally, we have the chain rule:
P(α1, …, αn) = P(α1) P(α2|α1) … P(αn|α1, …, αn−1)
= ∏_{i=1}^{n} P(αi|α1, …, αi−1)
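The chain rule can be verified on a small example; the sketch below (not from the slides) uses three fair coin flips and checks that the product of conditionals equals the joint probability:

```python
from fractions import Fraction
from itertools import product

# Uniform distribution over three fair coin flips (worlds are HT-triples)
P = {w: Fraction(1, 8) for w in product("HT", repeat=3)}

def prob(prop):
    return sum(p for w, p in P.items() if prop(w))

def cond(prop, given):
    return prob(lambda w: prop(w) and given(w)) / prob(given)

a = lambda w: w[0] == "H"  # first flip is heads
b = lambda w: w[1] == "H"  # second flip is heads
c = lambda w: w[2] == "H"  # third flip is heads

# Chain rule: P(a, b, c) = P(a) P(b|a) P(c|a, b)
lhs = prob(lambda w: a(w) and b(w) and c(w))
rhs = prob(a) * cond(b, a) * cond(c, lambda w: a(w) and b(w))
```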
Bayes’ Rule
We have P(h, e) = P(h|e)P(e) = P(e|h)P(h)
From here we have Bayes' rule:
P(h|e) = P(e|h)P(h) / P(e)
P(e) is the probability of the evidence
P(h) is the prior probability of a hypothesis h
P(e|h) is the likelihood, which is often easier to compute than the posterior P(h|e)
Discuss why P(wet|rain) is easier to compute than P(rain|wet)
wet is the evidence e
rain is the hypothesis h
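A sketch of Bayes' rule on the rain/wet example. All numbers below are illustrative assumptions, not from the slides; P(e) is obtained by summing the likelihood over both hypotheses:

```python
# Illustrative assumed numbers for the rain/wet example
p_rain = 0.2              # prior P(h)
p_wet_given_rain = 0.9    # likelihood P(e|h): easy to state causally
p_wet_given_dry = 0.1     # likelihood under the alternative hypothesis

# P(e): marginalize the evidence over the two hypotheses
p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)

# Bayes' rule: posterior P(h|e) = P(e|h) P(h) / P(e)
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
```

Note that stating P(wet|rain) only requires reasoning from cause to effect, while the posterior P(rain|wet) additionally requires the prior and the normalizing term P(wet).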
Expected Value
The expected value of a random variable X is the weighted average of that variable over the domain, weighted by the probability of each value:
E[X] = ∑_x P(X = x) · x
The conditional expected value of a variable X conditioned on proposition y is its expected value weighted by the conditional probability:
E[X|y] = ∑_x P(X = x|y) · x
Discuss
What is the expected value of a roll of a fair six-sided die?
What is the conditional expected value of a fair six-sided die roll, conditioned on the roll being even?
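Both discussion questions reduce to the two formulas above; a sketch for the fair die:

```python
from fractions import Fraction

P = {x: Fraction(1, 6) for x in range(1, 7)}  # one fair six-sided die

# E[X] = sum over x of P(X = x) * x
e_x = sum(p * x for x, p in P.items())

# E[X | even]: renormalize over the even outcomes, then take the average
even = {x: p for x, p in P.items() if x % 2 == 0}
norm = sum(even.values())
e_x_even = sum((p / norm) * x for x, p in even.items())
```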
Expected Value Examples: E[X] = 3 (example distributions shown as figures on the slide)
Summary
Probability is a numerical measure of uncertainty
Formal semantics:
positive weights, sum up to 1 over possible worlds
probability of proposition is total weight of worlds in which the proposition is true
Conditional probability updates the agent’s beliefs based on evidence
Expected value of a variable is its probability-weighted average over possible worlds