COMP3702
Artificial Intelligence
Module 3: Reasoning and planning under uncertainty — Part 1
Dr Archie Chapman
Semester 2, 2021
The University of Queensland
School of Information Technology and Electrical Engineering
Thanks to Dr Alina Bialkowski and Dr Hanna Kurniawati
Table of contents
1. Review of Probability
2. Search under uncertainty: AND-OR trees
3. Decision theory
Later: Sequential decision problems, i.e. Markov decision processes
1
Overview of Module 3 —
Reasoning and Planning under
uncertainty
Causes of uncertainty: System noise and errors
Where does uncertainty come from?
• Control error or disturbances from external forces
→ Effect of performing an action is non-deterministic
• Errors in sensing and in processing of sensing data
→ Imperfect observation about the world (partially-observable)
• Strategic uncertainty and interaction with other agents, aka game theory
→ Covered in Module 5
2
Causes of uncertainty: Where do the modelling errors come from?
• Too complex to model
— Lazy, e.g. rolling a die in a casino: the outcome depends on the wind direction from the air
conditioning, the number of people around the table, etc.
— Deliberate, to reduce computational complexity. We want to eliminate variables that
will not affect the solution significantly
• Accidental error
— e.g. lack of understanding about the problem
• Abstraction error. . .
3
Causes of uncertainty: Abstraction that may lead to modelling error
The actual possible states are often too large
Simplify, so it’s solvable by current computing power:
• One approach to simplification is to cluster several actual states together and assume
all actual states in the same cluster are the same
• Meaning: a state in our model corresponds to a set of actual states that are not
differentiable by the program
• Similarly with the action space
• Another approach is to use function approximation of the state or action policy, e.g. using
basis functions or machine learning methods
In both, the effect of performing an action becomes non-deterministic
Usually we deal with bounded, quantifiable uncertainty
4
Assumptions on environment in Module 3
• Does the agent know the state of the world/itself exactly?
Fully observable vs partially observable
• Does an action map one state into a single other state?
Deterministic vs non-deterministic
• Can the world change while the agent is “thinking”?
Static vs dynamic
• Are the actions and percepts discrete?
Discrete vs continuous
5
Review of Probability
Applied probability and statistics
In this course, we are only interested in applied probability and statistics.
We will not cover the mathematics of probability theory and stochastic processes, statistics
(derivations and proofs), or the design of experiments. This is not a statistics course.
If you are unfamiliar with probability or want more detail, see:
• R&N Chapter 12.1–12.5
• P&M Chapter 8.1
6
Some terminology
Experiment: An occurrence with an uncertain outcome that we can observe.
For example, rolling a die.
Outcome: The result of an experiment; one particular state of the world.
For example: 4.
Sample Space: The set of all possible outcomes for the experiment.
For example, {1, 2, 3, 4, 5, 6}.
Event: A subset of possible outcomes that together have some property we are
interested in.
For example, the event “even die roll” is the set of outcomes {2, 4, 6}.
Probability. . .
7
What is a probability distribution?
What do you think of when an event is described as “random”?
• An unexpected event?
• A uniform number between 0 and 1?
• A normally distributed random value?
A random variable, denoted X, has an element of chance associated with its value.
The level of chance associated with any particular value (or range of values), X = x, is called
its probability, P(X = x). This is a value between 0 and 1.
The collection of probabilities over the values that a variable may take is called a distribution, with
the property that the probabilities of all mutually exclusive, collectively exhaustive
events sum to 1.
The value of any function of a random variable is also a random variable. E.g. the sum of n
random variables takes a random value.
8
What is a probability distribution?
Very loosely* – just about anything that you can count or measure, has non-negative values,
and that sums to one over all outcomes, is a probability distribution.
Fundamentally, both discrete and continuous variables, X , are represented by a cumulative
distribution function, cdf, denoted F (x).
The cdf is the probability that the realised value of X is less than or equal to x :
F (x) = P(X ≤ x).
(*Take it on trust that there is a serious branch of mathematics behind this, regarding topological
spaces and measure theory.)
9
What is a probability distribution?
The terms used for the probability that X takes a particular value are different for discrete and
continuous variables.
For discrete variables, a probability mass function (pmf), P(X = x) describes the chance of a
particular event occurring.
• For finite discrete-valued variables, this is easy to understand as a finite vector of
non-negative values that sum to one. E.g. chance of a coin toss, roll of dice, poker hands,
etc.
• For countably infinite discrete variables, the probability distribution is a series of numbers
over an infinite set of distinct elements.
For continuous variables, a probability density function (pdf), f(x), is a function that
integrates to the cumulative distribution function:
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy
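As a sanity check on this relationship, here is a minimal Python sketch (not from the slides) that recovers F(x) numerically from a pdf by a Riemann sum, using the standard normal density as the example:

import math

def normal_pdf(y, mu=0.0, sigma=1.0):
    # density of a normal distribution at y
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cdf_from_pdf(x, pdf, lower=-10.0, steps=100_000):
    # approximate F(x) = P(X <= x) by a midpoint Riemann sum of the pdf from 'lower' to x
    width = (x - lower) / steps
    return sum(pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

print(cdf_from_pdf(0.0, normal_pdf))    # ~0.5
print(cdf_from_pdf(1.96, normal_pdf))   # ~0.975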
10
Probability distributions
Python will very effectively help you handle probabilities.
Many distributions have functions for probabilities, random number generation, etc.
The methods that you will learn about are available through the Python random module, which is
part of the standard library. See https://docs.python.org/3/library/random.html for
details.
11
Sampling random variables
For example, a normal distribution can be sampled using:
>>> import random as r
>>> mu, sigma = 2.0, 4.0
>>> r.gauss(mu, sigma)
1.3927394833370967
. . . or a Weibull distribution (scale a, shape b), f(x | a, b) = (b/a)(x/a)^(b−1) e^(−(x/a)^b):
>>> a, b = 1.0, 1.5
>>> r.weibullvariate(a, b)
1.9157188803236334
In this course, you don’t have to remember functional forms, etc — just focus on
understanding a distribution’s use and parameters.
12
Random numbers: discrete uniform distribution and choices from sets
Even integer from 0 to 100 inclusive:
>>> r.randrange(0, 101, 2)
26
Single random element from a sequence:
>>> options = ['win', 'lose', 'draw']
>>> r.choice(options)
'draw'
Single element according to relative weights:
>>> weights = [5, 10, 15]
>>> r.choices(options, weights)
['lose']
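As a quick check that relative weights behave as probabilities, the following sketch (using the options and weights above, with an arbitrary large sample size) estimates each option's probability empirically:

import random as r
from collections import Counter

options = ['win', 'lose', 'draw']
weights = [5, 10, 15]               # relative weights, i.e. P = 5/30, 10/30, 15/30

# draw a large sample and compare empirical frequencies with the normalised weights
sample = r.choices(options, weights, k=100_000)
counts = Counter(sample)
for o in options:
    print(o, counts[o] / len(sample))   # roughly 0.167, 0.333, 0.5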
13
Conditional probability and independence
Conditional probability is a measure of the probability of an event given that another event has
already occurred.
If the event of interest is A and the event B is known to have occurred, then the corresponding
conditional probability of A given B is denoted P(A | B).
If two events, A and B, are independent, then the probability of both occurring is
P(A ∩ B) = P(A)P(B)
Otherwise, if the events are dependent:
P(A ∩ B) = P(B)P(A | B) = P(A)P(B | A)
In both cases, the probability of events A or B occurring is:
P(A ∪ B) = P(A) + P(B)− P(B ∩ A)
(Nb. the version of the slide used in the lecture included the incorrect statement for independent events: P(A ∪ B) = P(A) + P(B). This is not true for independent events;
instead it applies only to mutually exclusive events. )
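A small sketch verifying the inclusion-exclusion identity by enumeration, using the die events from earlier in this section (the event "even roll", plus a hypothetical event "roll greater than 3" chosen for illustration):

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # fair six-sided die
A = {2, 4, 6}                       # event "even roll"
B = {4, 5, 6}                       # hypothetical event "roll greater than 3"

def P(event):
    # probability of an event under a uniform distribution over the sample space
    return Fraction(len(event), len(sample_space))

lhs = P(A.union(B))                        # P(A or B)
rhs = P(A) + P(B) - P(A.intersection(B))   # inclusion-exclusion
print(lhs, rhs, lhs == rhs)                # 2/3 2/3 True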
14
Bayes rule
Bayes’ rule rearranges the conditional probability relationships to describe the probability of an
event, given prior knowledge of related events:
P(A | B) = P(B | A) P(A) / P(B)
E.g. Knowing symptoms of a disease is easier than figuring out the disease given the symptoms.
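A small numerical sketch of Bayes' rule for a disease/symptom scenario; the numbers below are purely hypothetical, chosen only to illustrate the computation:

# Hypothetical numbers for illustration only (not from the slides):
p_disease = 0.01                  # prior P(A): prevalence of the disease
p_symptom_given_disease = 0.9     # likelihood P(B | A)
p_symptom_given_healthy = 0.1     # P(B | not A)

# total probability of observing the symptom, P(B)
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(round(p_disease_given_symptom, 3))   # ~0.083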
15
Search under uncertainty:
AND-OR trees
Making decisions
• We want to find a plan that works regardless of which outcomes actually occur
• We can no longer rely on a single sequence of actions
• We need a conditional plan: the action to perform depends on the outcome of the previous
action
• We need a different type of tree data structure
16
AND-OR search tree
• A tree with interleaving AND and OR levels
• At each node of an OR level, branching is introduced by the agent’s own choice
• At each node of an AND level, branching is introduced by the environment
17
Example: Slippery vacuum robot
States: Conjunctions of the following state factors:
• Robot Position: {in R1, in R2}
• R1 state: {clean, dirty}
• R2 state: {clean, dirty}
Actions: { Left, Right, Suck(R1), Suck(R2) }
World dynamics: Non-deterministic; after performing an action in a state,
the robot may end up in one of several possible states
Initial state:
• (Robot in R1) ∧ (R1 is clean) ∧ (R2 is dirty)
Goal state:
• (R1 is clean) ∧ (R2 is clean)
18
Example: Slippery vacuum robot
World dynamics:
• Successors of (Robot in R1, Right) = {Robot in R1, Robot in R2}
• Successors of (Robot in R2, Right) = {Robot in R1, Robot in R2}
19
AND-OR tree of the slippery vacuum robot
20
AND-OR search tree
A solution is a sub-tree that:
• Has a goal node at every leaf
• Specifies one action at each node of an OR level
• Includes every outcome branch at each node of an AND level
When do we have a solution?
21
Labelling an AND-OR tree
[A sequence of figures stepping through the labelling of the AND-OR tree]
Labelling an AND-OR tree
When a node is the same as an ancestor node. . . ?
27
Searching an AND-OR tree
Start from a state node (OR level)
• Fringe nodes are state nodes
Use any of the search algorithms we have studied,
• Select a fringe node to expand
• Select an action to use
• Insert the corresponding action node
• Insert all possible outcomes of the action as children of the action node
• Back up to (re-)label the ancestor nodes
Cost/reward calculation at an AND level:
• Weighted sum, i.e. the expectation (when uncertainty is quantified using probability)
• Take the maximum cost / minimum reward (conservative)
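The following is a minimal sketch (not from the slides) of depth-first AND-OR search in Python. It assumes a problem object providing is_goal(state), actions(state) and successors(state, action); these names are assumptions, and the conditional plan is represented as a dictionary mapping each reachable state to the action to take there:

def and_or_search(state, problem, path=()):
    # Return a conditional plan {state: action, ...} that reaches a goal from
    # every possible outcome, or None if no such plan exists.
    if problem.is_goal(state):
        return {}                        # already at a goal: empty plan
    if state in path:
        return None                      # repeated ancestor state: give up on this path
    for action in problem.actions(state):                    # OR level: the agent chooses one action
        plan = {state: action}
        for outcome in problem.successors(state, action):    # AND level: every outcome must be solved
            subplan = and_or_search(outcome, problem, path + (state,))
            if subplan is None:
                break                    # this action fails for some outcome; try the next action
            plan.update(subplan)
        else:
            return plan                  # all outcomes of this action lead to a goal
    return None                          # no action works at this OR node

For the slippery vacuum robot, successors(state, Right) would return both the "Robot in R1" and "Robot in R2" outcomes, so any returned plan necessarily handles both.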
28
Decision theory
Preferences
• Actions result in outcomes
• Agents have preferences over outcomes
• A rational agent will do the action that has the best outcome for them
• Sometimes agents don’t know the outcomes of the actions, but they still need to compare
actions
• Agents have to act.
(Doing nothing is (often) an action).
29
Preferences Over Outcomes
Some notation:
• The preference relation, ≻, means “is preferred to” or “succeeds in a preference order”
• ≺ is “precedes in a preference order,” or “is not preferred to”, or “is preferred less than”
• Indifference is ∼
If o1 and o2 are outcomes:
• o1 ⪰ o2 means o1 is at least as desirable as o2.
• o1 ∼ o2 means o1 ⪰ o2 and o2 ⪰ o1.
• o1 ≻ o2 means o1 ⪰ o2 and o2 ⋡ o1.
30
Lotteries
• An agent may not know the outcomes of its actions, but only have a probability
distribution over the outcomes.
• A lottery is a probability distribution over outcomes.
• R&N denote this [p1, o1; p2, o2; . . . ; pk, ok], and
P&M denote it [p1 : o1, p2 : o2, . . . , pk : ok] (like a Python dict),
where the oi are outcomes and pi ≥ 0 such that ∑i pi = 1.
• The lottery specifies that outcome oi occurs with probability pi.
• Alternatively, an agent may choose to select an action using a lottery.
• When we talk about outcomes, we will include lotteries over “pure” outcomes.
31
Axioms of rational preferences
Idea: preferences of a rational agent must obey certain rules.
Rational preferences imply behaviour describable as maximization of expected utility
Completeness (for some reason R&N call this Orderability): (o1 ≻ o2) ∨ (o1 ≺ o2) ∨ (o1 ∼ o2)
Transitivity: (o1 ⪰ o2) ∧ (o2 ⪰ o3) ⇒ (o1 ⪰ o3)
Monotonicity: o1 ≻ o2 ⇒ (p ≥ q ⇔ [p : o1, 1 − p : o2] ⪰ [q : o1, 1 − q : o2])
Continuity: o1 ≻ o2 ≻ o3 ⇒ ∃p ∈ [0, 1] such that o2 ∼ [p : o1, 1 − p : o3]
Substitutability: o1 ∼ o2 ⇒ [p : o1, 1 − p : o3] ∼ [p : o2, 1 − p : o3]
Decomposability: [p : o1, 1 − p : [q : o2, 1 − q : o3]] ∼ [p : o1, (1 − p)q : o2, (1 − p)(1 − q) : o3]
32
Properties of Preferences — Completeness
Completeness: Agents have to act, so they must have preferences:
∀o1 ∀o2, o1 ⪰ o2 or o2 ⪰ o1
33
Properties of Preferences — Transitivity
Transitivity: Preferences must be transitive:
if o1 ⪰ o2 and o2 ⪰ o3 then o1 ⪰ o3
(Similarly for other mixtures of ⪰ and ≻.)
Rationale: otherwise o1 ⪰ o2 and o2 ≻ o3 and o3 ⪰ o1.
If they are prepared to pay to get o2 instead of o3,
and are happy to have o1 instead of o2,
and are happy to have o3 instead of o1
−→ money pump.
34
Properties of Preferences — Monotonicity
Monotonicity: An agent prefers a larger chance of getting a better outcome than a smaller
chance:
• If o1 ≻ o2 and p > q then
[p : o1, 1 − p : o2] ≻ [q : o1, 1 − q : o2]
35
Consequence of axioms of Completeness, Transitivity and Monotonicity
• Suppose o1 ≻ o2 and o2 ≻ o3. Consider whether the agent would prefer
• o2
• the lottery [p : o1, 1 − p : o3]
for different values of p ∈ [0, 1].
• Plot which one is preferred as a function of p:
[Plot: preference for o2 versus the lottery, as a function of p from 0 to 1]
36
Properties of Preferences — Continuity
Continuity: Suppose o1 ≻ o2 and o2 ≻ o3, then there exists a p ∈ [0, 1] such that
o2 ∼ [p : o1, 1 − p : o3]
37
Properties of Preferences — Substitutability
Substitutability: if o1 ∼ o2 then the agent is indifferent between lotteries that only differ by o1
and o2:
[p : o1, 1− p : o3] ∼ [p : o2, 1− p : o3]
38
Alternative Axiom for Substitutability
Substitutability: if o1 ⪰ o2 then the agent weakly prefers lotteries that contain o1 instead of o2,
everything else being equal.
That is, for any number p and outcome o3:
[p : o1, (1 − p) : o3] ⪰ [p : o2, (1 − p) : o3]
39
Properties of Preferences — Decomposability
Decomposability: (no fun in gambling). An agent is indifferent between lotteries that have the
same probabilities and outcomes. This includes lotteries over lotteries.
For example:
[p : o1, 1− p : [q : o2, 1− q : o3]]
∼ [p : o1, (1− p)q : o2, (1− p)(1− q) : o3]
40
Summary of Rational Preferences
Completeness:
(o1 ≻ o2) ∨ (o1 ≺ o2) ∨ (o1 ∼ o2)
Transitivity:
(o1 ⪰ o2) ∧ (o2 ⪰ o3) ⇒ (o1 ⪰ o3)
Monotonicity:
o1 ≻ o2 ⇒ (p ≥ q ⇔ [p : o1, 1 − p : o2] ⪰ [q : o1, 1 − q : o2])
Continuity:
o1 ≻ o2 ≻ o3 ⇒ ∃p ∈ [0, 1] such that o2 ∼ [p : o1, 1 − p : o3]
Substitutability:
o1 ∼ o2 ⇒ [p : o1, 1 − p : o3] ∼ [p : o2, 1 − p : o3]
Decomposability:
[p : o1, 1 − p : [q : o2, 1 − q : o3]] ∼ [p : o1, (1 − p)q : o2, (1 − p)(1 − q) : o3]
41
What we would like
• We would like a measure of preference that can be combined with probabilities, so that
value([p : o1, 1 − p : o2]) = p × value(o1) + (1 − p) × value(o2)
• Money does not act like this.
What would you prefer:
$1,000,000 or [0.5 : $0, 0.5 : $2,000,000]?
• It may seem that preferences are too complex and multi-faceted to be represented by single
numbers.
42
Theorem
If preferences follow the rationality properties, then preferences can be measured by a function
utility : outcomes → [0, 1]
such that
• o1 ⪰ o2 if and only if utility(o1) ≥ utility(o2)
• o1 ∼ o2 if and only if utility(o1) = utility(o2)
• Utilities are linear with probabilities:
utility([p1 : o1, p2 : o2, . . . , pk : ok]) = ∑_{i=1}^{k} pi × utility(oi)
So we can replace preferences with real numbers, but then what is the “best” lottery?
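Since utilities are linear in probabilities, the utility of a lottery in the P&M dict-style notation can be computed directly; a minimal sketch with an arbitrary (assumed) utility assignment over three outcomes:

def lottery_utility(lottery, utility):
    # expected utility of a lottery given as {outcome: probability} (P&M-style dict)
    assert abs(sum(lottery.values()) - 1.0) < 1e-9   # probabilities must sum to 1
    return sum(p * utility(o) for o, p in lottery.items())

# toy example with an assumed utility function over abstract outcomes
u = {'o1': 1.0, 'o2': 0.4, 'o3': 0.0}
print(lottery_utility({'o1': 0.3, 'o3': 0.7}, u.get))   # 0.3*1.0 + 0.7*0.0 = 0.3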
43
Maximum Expected Utility (MEU)
Main problem: What does “best” mean?
Utility: a number that quantifies the desirability of a state. MEU is the most commonly used
definition of the “best” decision.
Idea:
• Assign a utility function to each outcome (state) to represent the agent’s preferences
• The “best” decision maximises the expected utility of the outcomes
44
Example: Buying a Used Car
Goal of buying the car: To gain profit from reselling it
Car costs $1000
Can sell the car for $1100 ⇒ $100 profit
BUT Every car is either good or bad
• Costs $40 to repair a good car
• Costs $200 to repair a bad car
• 20% of cars are bad
Should we buy the car? → Solve using MEU
45
Example: Buying a Used Car
State space: {good car, bad car}
Preference: good car ≻ bad car
Utility function:
• U(good car) = 1100 − 1000 − 40 = 60
• U(bad car) = 1100 − 1000 − 200 = −100
Lottery: [0.8, good car; 0.2, bad car]
Expected Utility if we buy the car:
• P(good car) × U(good car) + P(bad car) × U(bad car) = 0.8 × 60 + 0.2 × (−100) = 28
• Higher than not buying the car. Hence, buy!
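The same calculation in Python, using the utilities and probabilities from this slide (the utility of not buying is assumed to be 0):

# Utilities and probabilities from the used-car example
U = {'good car': 1100 - 1000 - 40,    # = 60
     'bad car':  1100 - 1000 - 200}   # = -100
P = {'good car': 0.8, 'bad car': 0.2}

eu_buy = sum(P[s] * U[s] for s in P)  # expected utility of buying
eu_not_buy = 0                        # assumed utility of not buying
print(eu_buy)                         # 28.0
print('buy' if eu_buy > eu_not_buy else 'do not buy')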
46
How about the Utility of Money?
Which one do you prefer?
A: A sure gain of $240
B: A 25% chance of winning $1000 and a 75% chance of winning nothing
Which one do you prefer?
C: A sure loss of $750
D: A 75% chance of losing $1000 and a 25% chance of losing nothing
Is decision theory useless here? No, but we need a better utility function that incorporates our
preferences (and it must still follow the axioms)!
47
Utility as a function of money
48
Factored Representation of Utility
• Suppose the outcomes can be described in terms of features X1, . . . ,Xn.
• An additive utility is one that can be decomposed into a set of factors:
u(X1, . . . ,Xn) = f1(X1) + · · ·+ fn(Xn).
This assumes additive independence.
• Strong assumption: contribution of each feature doesn’t depend on other features.
• Many ways to represent the same utility:
— a number can be added to one factor as long as it is subtracted from others.
49
Additive Utility
• An additive utility has a canonical representation:
u(X1, . . . ,Xn) = w1 × u1(X1) + · · ·+ wn × un(Xn).
• If besti is the best value of Xi, then ui(Xi = besti) = 1.
If worsti is the worst value of Xi, then ui(Xi = worsti) = 0.
• The wi are weights with ∑i wi = 1.
The weights reflect the relative importance of features.
• We can determine weights by comparing outcomes:
w1 = u(best1, x2, . . . , xn) − u(worst1, x2, . . . , xn),
for any values x2, . . . , xn of X2, . . . , Xn.
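A small sketch of a canonical additive utility with two hypothetical features (price and quality, both invented for illustration), where each sub-utility is scaled to [0, 1] and the weights sum to 1:

# Hypothetical two-feature example; sub-utilities scaled so u_i(best) = 1 and u_i(worst) = 0
def u_price(price, worst=500, best=100):
    return (worst - price) / (worst - best)     # cheaper is better

def u_quality(q):
    return q                                    # q is assumed to already lie in [0, 1]

weights = {'price': 0.7, 'quality': 0.3}        # relative importance, sums to 1

def utility(price, quality):
    return weights['price'] * u_price(price) + weights['quality'] * u_quality(quality)

print(utility(100, 1.0))   # best on both features -> 1.0
print(utility(500, 0.0))   # worst on both features -> 0.0
print(utility(300, 0.5))   # 0.7*0.5 + 0.3*0.5 = 0.5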
50
Complements and Substitutes
• Often additive independence is not a good assumption.
• Values x1 of feature X1 and x2 of feature X2 are complements if having both is better than
the sum of the two.
• Values x1 of feature X1 and x2 of feature X2 are substitutes if having both is worse than
the sum of the two.
51
Generalized Additive Utility
• A generalized additive utility can be written as a sum of factors:
u(X1, . . . , Xn) = f1(X1) + · · · + fk(Xk)
where each factor fj takes a subset Xj ⊆ {X1, . . . , Xn} of the features.
• An intuitive canonical representation is difficult to find.
• It can represent complements and substitutes.
52
Utility and time
Utility and time
• Would you prefer $1000 today or $1000 next year?
• What price would you pay now to have an eternity of happiness?
• How can you trade off pleasures today with pleasures in the future?
53
Utility and time
• How would you compare the following sequences of rewards (per week):
A: $1000000, $0, $0, $0, $0, $0,. . .
B: $1000, $1000, $1000, $1000, $1000,. . .
C: $1000, $0, $0, $0, $0,. . .
D: $1, $1, $1, $1, $1,. . .
E: $1, $2, $3, $4, $5,. . .
54
Rewards and Values
Suppose the agent receives a sequence of rewards r1, r2, r3, r4, . . . in time. What utility should
be assigned to the sequence? This is called the “return” or “value”:
• total reward: V = ∑_{i=1}^{∞} ri
• average reward: V = lim_{n→∞} (r1 + · · · + rn)/n
55
Rewards and Values
Suppose the agent receives a sequence of rewards r1, r2, r3, r4, . . . in time.
• discounted return: V = r1 + γr2 + γ²r3 + γ³r4 + · · ·
where γ is the discount factor, 0 ≤ γ ≤ 1.
56
Properties of the Discounted Rewards
• The discounted return for rewards r1, r2, r3, r4, . . . is
V = r1 + γr2 + γ²r3 + γ³r4 + · · · = r1 + γ(r2 + γ(r3 + γ(r4 + . . . )))
• If Vt is the value obtained from time step t:
Vt = rt + γVt+1
• How is the infinite future valued compared to immediate rewards?
1 + γ + γ² + γ³ + · · · = 1/(1 − γ)
Therefore
(minimum reward)/(1 − γ) ≤ Vt ≤ (maximum reward)/(1 − γ)
• We can approximate V with the first k terms, with error:
V − (r1 + γr2 + · · · + γ^(k−1) rk) = γ^k Vk+1
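A short sketch computing truncated discounted returns, applied to some of the weekly reward sequences from the earlier slide (γ = 0.9 is an arbitrary choice; 100 terms keep the truncation error γ^100 V101 negligible):

def discounted_return(rewards, gamma):
    # discounted return of a finite reward sequence r1, r2, ..., rk:
    # V ≈ r1 + gamma*r2 + gamma^2*r3 + ...
    return sum(r * gamma ** t for t, r in enumerate(rewards))

gamma = 0.9
# first 100 terms of some of the weekly reward sequences from the earlier slide
A = [1_000_000] + [0] * 99
B = [1000] * 100
E = list(range(1, 101))

print(discounted_return(A, gamma))   # 1000000.0
print(discounted_return(B, gamma))   # ~ 1000/(1 - gamma) = 10000
print(discounted_return(E, gamma))   # ~ 1/(1 - gamma)^2 = 100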
57
Allais Paradox (1953)
What would you prefer:
A: $1m — one million dollars
B: lottery [0.10 : $2.5m, 0.89 : $1m, 0.01 : $0]
What would you prefer:
C: lottery [0.11 : $1m, 0.89 : $0]
D: lottery [0.10 : $2.5m, 0.9 : $0]
It is inconsistent with the axioms of preferences to have A ≻ B and D ≻ C.
A,C: lottery [0.11 : $1m, 0.89 : X ]
B,D: lottery [0.10 : $2.5m, 0.01 : $0, 0.89 : X ]
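A sketch making the inconsistency concrete: for any utility assignment over the three monetary outcomes, EU(A) − EU(B) equals EU(C) − EU(D), so an expected-utility maximiser who prefers A over B must also prefer C over D (the particular utility values below are arbitrary):

def eu(lottery, u):
    # expected utility of a lottery given as {outcome: probability}
    return sum(p * u[o] for o, p in lottery.items())

# any increasing utility over the three monetary outcomes will do; this one is arbitrary
u = {0: 0.0, 1_000_000: 0.8, 2_500_000: 1.0}

A = {1_000_000: 1.0}
B = {2_500_000: 0.10, 1_000_000: 0.89, 0: 0.01}
C = {1_000_000: 0.11, 0: 0.89}
D = {2_500_000: 0.10, 0: 0.90}

# both differences equal 0.11*u($1m) - 0.10*u($2.5m) - 0.01*u($0), for any u
print(eu(A, u) - eu(B, u), eu(C, u) - eu(D, u))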
58
Framing Effects [Tversky and Kahneman]
• A disease is expected to kill 600 people. Two alternative programs have been proposed:
Program A: 200 people will be saved
Program B: probability 1/3: 600 people will be saved
probability 2/3: no one will be saved
Which program would you favor?
• A disease is expected to kill 600 people. Two alternative programs have been proposed:
Program C: 400 people will die
Program D: probability 1/3: no one will die
probability 2/3: 600 will die
Which program would you favor?
Tversky and Kahneman: 72% chose A over B.
22% chose C over D.
59
Attributions and References
Thanks to Dr Alina Bialkowski and Dr Hanna Kurniawati for their materials.
Many of the slides in this course are adapted from David Poole and Alan Mackworth, Artificial Intelligence:
Foundations of Computational Agents, 2E, CUP, 2017, https://artint.info/. These materials are
copyright © Poole and Mackworth, 2017, licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License.
Other materials derived from Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3E,
Prentice Hall, 2009.
All remaining errors are Archie’s — please email if you find any: archie. .au