Decision Theoretic Agents
AIMA 16
CMPSC 442
Week 10, Meeting 30, Three Segments
Outline
● Utility Functions
● Decision Networks
● Information Value Theory
Decision Theoretic Agents
AIMA 16
CMPSC 442
Week 10, Meeting 30, Segment 1 of 3: Utility Functions
Conceptual Basis for Decision Theoretic Agent
● Ability to reason about an uncertain world
○ Probabilistic models of agent’s beliefs
○ Factored state representations
● Ability to reason about conflicting goals
○ Axioms of utility: constraints on a rational agent’s preferences
○ Decision networks: nodes for belief states, actions, utilities
○ Value of information in different settings
Concept of a Utility Function
● Choosing among actions based on the desirability of their outcomes
○ Each action a in state s results in a new state s′ with probability P(Result(a) = s′)
○ The transition model gives the probabilities of action outcomes
● Given a utility function U(s) that quantifies the desirability of a state s
○ The expected utility EU(a) of an action a is the sum of the utilities of the
outcomes, weighted by their probabilities:
EU(a) = Σ_s′ P(Result(a) = s′) U(s′)
Maximum Expected Utility (MEU) Principle
● MEU defines a rational agent as one that chooses its next action to be
the one that maximizes the expected utility:
action = argmax_a EU(a)
● Implementation requires computational solutions to perception, learning,
causal knowledge about the outcomes of actions, and inference
● Instead of evaluating behavior retrospectively with an external performance
measure, a decision theoretic agent builds the performance measure into its
utility function, allowing it to anticipate how to achieve the highest
performance (see the sketch below)
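As a minimal sketch of these two definitions, the following assumes a transition model given as per-action outcome distributions; the model, the utilities, and all names here are illustrative toys, not from AIMA:

```python
# Expected utility and the MEU action choice, for a toy transition model.

def expected_utility(action, transition_model, utility):
    """EU(a) = sum over outcomes s' of P(Result(a) = s') * U(s')."""
    return sum(p * utility[s] for s, p in transition_model[action].items())

def meu_action(actions, transition_model, utility):
    """MEU principle: choose the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, transition_model, utility))

# Illustrative model: driving fast risks a crash; driving slow arrives late.
transition_model = {
    "go_fast": {"arrive_early": 0.6, "crash": 0.4},
    "go_slow": {"arrive_late": 1.0},
}
utility = {"arrive_early": 100, "crash": -1000, "arrive_late": 50}

print(meu_action(list(transition_model), transition_model, utility))  # go_slow
```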
Lotteries
● Possible action outcomes can be represented as a lottery
L = [p1, S1; p2, S2; … ; pn, Sn], where the action is the ticket and each
outcome Si, occurring with probability pi, is either an atomic state or
another lottery
● A rational decision process can then be founded on a method for
comparing and ranking lotteries
● The axioms of utility theory specify constraints on preference relations
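A short sketch of this representation, assuming atomic states are strings with utilities in a lookup table (all names illustrative); it also numerically checks the decomposability axiom from the slides that follow:

```python
# Sketch: a lottery [p1, S1; ...; pn, Sn] as a list of (probability, outcome)
# pairs, where an outcome is an atomic state (str) or another lottery (list).

def lottery_utility(outcome, U):
    """U([p1, S1; ...; pn, Sn]) = sum_i pi * U(Si).
    Atomic states are looked up directly; nested lotteries recurse."""
    if isinstance(outcome, str):          # atomic state
        return U[outcome]
    return sum(p * lottery_utility(s, U) for p, s in outcome)

U = {"A": 100, "B": 60, "C": 0}           # illustrative utilities

# Decomposability: [p, A; 1-p, [q, B; 1-q, C]] has the same expected utility
# as the flattened lottery [p, A; (1-p)q, B; (1-p)(1-q), C].
nested = [(0.5, "A"), (0.5, [(0.4, "B"), (0.6, "C")])]
flat = [(0.5, "A"), (0.2, "B"), (0.3, "C")]
assert abs(lottery_utility(nested, U) - lottery_utility(flat, U)) < 1e-9  # both 62.0
```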
Axioms of Utility Theory – One
Six Constraints on Rational Preferences for lotteries, where >
represents a preference and ∼ represents indifference
1. Orderability: (A > B) ∨ (B > A) ∨ (A ∼ B)
2. Transitivity: (A > B) ∧ (B > C) ⇒ (A > C)
3. Continuity: A > B > C ⇒ ∃p [p, A; 1−p, C] ∼ [1, B]
a. If A is preferred over B, and B over C, then there is some probability p
at which the agent is indifferent between the lottery [p, A; 1−p, C],
which offers a chance of A or C, and the lottery [1, B], which offers B
with certainty
Axioms of Utility Theory – Two
Remainder of Six Constraints on Rational Preferences for lotteries,
where > represents a preference and ∼ represents indifference
4. Substitutability: if A ∼ B, then A can be substituted for B (and vice versa)
in any lottery: [p, A; 1−p, C] ∼ [p, B; 1−p, C]
5. Monotonicity: A > B ⇒ (p > q ⇔ [p, A; 1−p, B] > [q, A; 1−q, B])
6. Decomposability: [p, A; 1−p, [q, B; 1−q, C]] ∼ [p, A; (1−p)q, B; (1−p)(1−q), C]
Utility Based on Rational Preferences
● Existence of Utility Function: if an agent’s preferences obey the axioms
of utility, then
○ There exists a function U such that U(A) > U(B) iff A > B
○ U(A) = U(B) iff A ∼ B
● Expected Utility of a Lottery:
U([p1, S1; … ; pn, Sn]) = Σ_i pi U(Si)
Deterministic Environments
● In a deterministic environment (as in game playing, e.g., minimax), a
preference ranking on states is sufficient; exact numeric values for
preferences are not needed
● Such preference rankings are called value functions (or ordinal utility
functions)
Decision Theoretic Agents
AIMA 16
CMPSC 442
Week 10, Meeting 30, Segment 2 of 3: Decision Networks
Decision Networks
● Chance nodes (ovals, as in Bayesian Networks)
○ Parents can be chance nodes or decision nodes
● Decision nodes (rectangles; no parents, treated as observed evidence)
● Utility nodes (diamonds, depend on action and chance nodes)
Example: Airport Siting Problem
● Decision nodes indicate points where the
agent can take an action, which in turn can
influence some of the variables
● Air traffic, potential for litigation, and
construction costs affect the utility function for
choosing an airport site
● The safety, quietness, and frugality nodes represent uncertain outcomes of
the decision, which in turn affect the utility function
● Utility nodes represent the agent’s utility function as a function of the parent
nodes: the uncertain outcomes of the agent’s decision
(Figure: Russell & Norvig, 3rd Ed., Fig. 16.6)
Simplified Example: Airport Siting Problem
● Outcome states are omitted
● Utility nodes represent the agent’s utility function: here the utility is an
expected utility given by an action-utility function (known as a Q-function in
reinforcement learning)
Action Selection
1. Instantiate all evidence
2. Set the action node(s) each possible way; for each action value:
a. Calculate the posterior distribution for the parents of the utility
node, given the evidence
b. Calculate the resulting expected utility of the action
3. Choose the action with the highest expected utility (see the sketch below)
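A minimal sketch of this loop for the simplest case: a single decision A and a single chance parent W of the utility node. The representation (a posterior table and a U(A, W) table) and all names are illustrative assumptions:

```python
# Decision-network action selection for one chance variable W and one
# decision A: instantiate evidence, score each action, take the max.

def select_action(actions, posterior, utility):
    """posterior: P(w | evidence) as {w: prob}.
    utility: U(a, w) as {(a, w): value}.
    Returns (best_action, its_expected_utility)."""
    def eu(action):
        # Step 2b: expected utility of the action under the posterior
        return sum(p * utility[(action, w)] for w, p in posterior.items())
    # Step 3: choose the action with the highest expected utility
    best = max(actions, key=eu)
    return best, eu(best)
```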
Simple Decision Network: Umbrella Example

W     P(W)
sun   0.70
rain  0.30

A      W     U(A, W)
leave  sun   100
leave  rain    0
take   sun    20
take   rain   70

● Umbrella = leave: EU(leave) = 0.7 · 100 + 0.3 · 0 = 70
● Umbrella = take: EU(take) = 0.7 · 20 + 0.3 · 70 = 35
● Optimal decision: Umbrella = leave, with MEU = 70

Example due to Dan Klein & Pieter Abbeel, CS 188 @ UC Berkeley, Sp 14
Umbrella Example with an Additional Variable

● A weather forecast F is observed before deciding, and the evidence
F = bad updates the belief about W:

W     P(W | F = bad)
sun   0.34
rain  0.66

A      W     U(A, W)
leave  sun   100
leave  rain    0
take   sun    20
take   rain   70

● Umbrella = leave: EU(leave | F = bad) = 0.34 · 100 + 0.66 · 0 = 34
● Umbrella = take: EU(take | F = bad) = 0.34 · 20 + 0.66 · 70 = 53
● Optimal decision: Umbrella = take, with MEU = 53

Example due to Dan Klein & Pieter Abbeel, CS 188 @ UC Berkeley, Sp 14
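Running select_action from the sketch above on the two umbrella slides reproduces both decisions:

```python
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

# No evidence: use the prior P(W).
select_action(["leave", "take"], {"sun": 0.7, "rain": 0.3}, U)
# -> ('leave', 70.0)

# Evidence F = bad: use the posterior P(W | F = bad).
select_action(["leave", "take"], {"sun": 0.34, "rain": 0.66}, U)
# -> ('take', 53.0), up to floating-point rounding
```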
Decision Theoretic Agents
AIMA 16
CMPSC 442
Week 10, Meeting 30, Segment 3 of 3: Information Value Theory
The Value of Information
● Information value theory addresses how an agent chooses what information
to acquire: the value of observing any of the potentially observable chance
variables in the model
○ Observation actions affect the agent’s belief state
○ The value of any observation derives from its potential effect on the
agent’s actions
Example: Value of Information Affecting a Purchase
● An oil company can purchase one of n blocks of oil-drilling rights at cost C/n
each, where all blocks are worthless except one, which would generate C in net revenue
● The results of a seismological survey of block #3, indicating whether it contains
oil, can be purchased. What should the company pay for the survey?
○ With probability 1/n the survey indicates oil: the company buys block #3 for C/n
and makes a profit of C − C/n = (n−1)C/n dollars
○ With probability (n−1)/n the survey indicates no oil: the company buys a different
block, where the probability of oil is now 1/(n−1), for an expected profit of
C/(n−1) − C/n = C/(n(n−1)) dollars
● The expected profit given the survey information is therefore
(1/n) · (n−1)C/n + ((n−1)/n) · C/(n(n−1)) = C/n
so the company should be willing to pay up to C/n for the survey, the price
of a block itself (checked numerically below)
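A quick numeric check of this arithmetic, with illustrative values for n and C:

```python
n, C = 10, 1_000_000  # illustrative values

profit_if_oil_in_3 = (n - 1) * C / n       # survey says oil: buy block 3
profit_otherwise = C / (n - 1) - C / n     # survey says no oil: buy another
expected_profit = (1 / n) * profit_if_oil_in_3 + ((n - 1) / n) * profit_otherwise

print(expected_profit, C / n)  # both 100000.0: the survey is worth C/n
```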
Value of Perfect Information (VPI)
● The value of discovering Ej is the average, over all its possible values ej
under the current belief state, of the expected utility of the best action given
that value, less the expected utility of the best action without the information:
VPI_e(Ej) = ( Σ_ej P(Ej = ej | e) EU(α_ej | e, Ej = ej) ) − EU(α | e)
where α is the best action given evidence e, and α_ej is the best action after
additionally observing Ej = ej
Comparison of Utilities of Different Action Choices
VPI measures the potential gain with respect to the choice of action
● Fig 1: The utility of action 1 is almost certainly greater than that of action 2;
information is very unlikely to change the choice, so there is little to gain
● Fig 2: The choice is unclear and the utilities are well separated, so
information is crucial
● Fig 3: The choice is unclear, but there is greater certainty about the
lower-utility action and the utilities are close, so the information is less
valuable
[Figure: three distributions (Fig 1–Fig 3) of the utilities of two actions;
Fig. 16.8 from Russell & Norvig, 4th Ed.]
VPI for Umbrella Example

F      P(F)
good   0.59
bad    0.41

A      W     U(A, W)
leave  sun   100
leave  rain    0
take   sun    20
take   rain   70

● MEU with no evidence: the optimal action is leave, so MEU(∅) = EU(leave) = 70
● MEU if F = good: leave is optimal; consistency with the prior
(0.7 = 0.59 · P(sun | good) + 0.41 · 0.34) forces P(sun | F = good) = 0.95,
so MEU(F = good) = EU(leave | F = good) = 95
● MEU if F = bad: take is optimal, so MEU(F = bad) = EU(take | F = bad) = 53
● VPI(F) = [0.59 · 95 + 0.41 · 53] − 70 = 77.78 − 70 = 7.78
(the sketch below reproduces this computation)

Example due to Dan Klein & Pieter Abbeel, CS 188 @ UC Berkeley, Sp 14
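The same computation in code, reusing select_action from the earlier sketch; the posterior for F = good is the one recovered from consistency above, and all other names are illustrative:

```python
# VPI of the forecast F for the umbrella network.
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}
actions = ["leave", "take"]

p_forecast = {"good": 0.59, "bad": 0.41}
posterior = {"good": {"sun": 0.95, "rain": 0.05},  # derived from consistency
             "bad":  {"sun": 0.34, "rain": 0.66}}

meu_now = select_action(actions, {"sun": 0.7, "rain": 0.3}, U)[1]       # 70.0
meu_with_forecast = sum(p_forecast[f] * select_action(actions, posterior[f], U)[1]
                        for f in p_forecast)                            # 77.78
print(meu_with_forecast - meu_now)  # VPI(F) = 7.78, up to rounding
```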
Value of Perfect Information
● Assume we have evidence E = e; the MEU value if we act now is:
MEU(e) = max_a Σ_s P(s | e) U(s, a)
● Assume we then observe E′ = e′; the value if we act after seeing it is:
MEU(e, e′) = max_a Σ_s P(s | e, e′) U(s, a)
Value of Perfect Information
● E′ is a random variable whose value is currently unknown, so we don’t know
what e′ will be. The expected value if E′ is revealed and then we act:
MEU(e, E′) = Σ_e′ P(e′ | e) MEU(e, e′)
● Value of information: how much the MEU goes up (a direct transcription
follows):
VPI(E′ | e) = MEU(e, E′) − MEU(e)
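These two formulas transcribe into a general helper, again reusing select_action from the earlier sketch (the helper and its parameter names are illustrative):

```python
def vpi(obs_values, p_obs, posteriors, current_posterior, actions, utility):
    """VPI(E' | e) = sum over e' of P(e' | e) * MEU(e, e')  -  MEU(e)."""
    meu = lambda post: select_action(actions, post, utility)[1]
    expected_meu = sum(p_obs[v] * meu(posteriors[v]) for v in obs_values)
    return expected_meu - meu(current_posterior)

# On the umbrella numbers above: vpi(["good", "bad"], p_forecast, posterior,
# {"sun": 0.7, "rain": 0.3}, actions, U) returns ~7.78.
```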
VPI Properties
● Non-negative: VPI(E′ | e) ≥ 0, because the agent can always ignore
information that turns out not to be useful
● Non-additive in general, because the value of each observation depends on
the belief state at the time it is made
● Order-independent: the value of a set of observations does not depend on
the order in which they are made, which distinguishes sensing actions from
other actions
Summary
● Probability theory describes what an agent should believe on the basis of
evidence, utility theory describes what an agent wants, and decision theory
combines the two for a decision-theoretic agent
● An agent whose preferences between lotteries are consistent with the axioms
of utility theory has a utility function, and it will select actions that
maximize its expected utility
● Decision networks extend Bayesian networks with a formalism for expressing
and solving decision problems
● The value of information is defined as the expected improvement in utility
given the information