
Decision Theoretic Agents
AIMA 16

CMPSC 442
Week 10, Meeting 30, Three Segments

Outline

● Utility Functions
● Decision Networks
● Information Value Theory


Decision Theoretic Agents
AIMA 16

CMPSC 442
Week 10, Meeting 30, Segment 1 of 3: Utility Functions

Conceptual Basis for Decision Theoretic Agent

● Ability to reason about an uncertain world
○ Probabilistic models of agent’s beliefs
○ Factored state representations

● Ability to reason about conflicting goals
○ Axioms of utility: constraints on a rational agent’s preferences
○ Decision networks: nodes for belief states, actions, utilities
○ Value of information in different settings


Concept of a Utility Function

● Choosing among actions based on the desirability of their outcomes
○ Each action a in state s results in a new state s′ with some probability P(Result(a) = s′)
○ The transition model gives the probabilities of action outcomes

● Given a utility function U(s) that quantifies the desirability of a state s
○ The expected utility EU(a) of an action a is the sum of the utilities of the outcomes, weighted by their probabilities:

EU(a) = Σ_s′ P(Result(a) = s′) U(s′)
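To make the definition concrete, here is a minimal Python sketch (not from the slides; the two-action transition model and utilities are invented toy values) that computes EU(a) and also picks the maximizing action introduced on the next slide:

```python
# Minimal sketch: expected utility of an action and the best-action choice.
# The transition model P(Result(a) = s') and utilities U(s) below are
# hypothetical toy values for illustration only.

def expected_utility(action, transition, utility):
    """EU(a) = sum over s' of P(Result(a) = s') * U(s')."""
    return sum(p * utility[s] for s, p in transition[action].items())

def meu_action(transition, utility):
    """Pick the action that maximizes expected utility."""
    return max(transition, key=lambda a: expected_utility(a, transition, utility))

transition = {
    "safe":  {"ok": 0.9, "fail": 0.1},
    "risky": {"ok": 0.5, "fail": 0.5},
}
utility = {"ok": 100, "fail": -50}

print({a: expected_utility(a, transition, utility) for a in transition})
# {'safe': 85.0, 'risky': 25.0}
print(meu_action(transition, utility))   # safe
```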


Maximum Expected Utility (MEU) Principle

● MEU defines a rational agent as one that chooses its next action to be
the one that maximizes the expected utility:

action = argmax_a EU(a)

● Implementation requires computational solutions for perception, learning,
causal knowledge about outcomes of actions, and inference

● Instead of applying a performance measure retrospectively, a decision
theoretic agent incorporates the performance measure into its utility
function, allowing it to anticipate how to achieve the highest performance


Lotteries

● Possible action outcomes can be represented as a lottery, where the
action is a ticket, and each outcome Si, occurring with probability pi, can
be an atomic state or another lottery:

L = [p1, S1; p2, S2; … ; pn, Sn]

● A rational decision process can be founded on a preference
methodology for comparing lotteries

● The axioms of utility theory specify constraints on preference relations


Axioms of Utility Theory – One

Six Constraints on Rational Preferences for lotteries, where >
represents a preference and ∼ represents indifference
1. Orderability: (A > B) ∨ (B > A) ∨ (A ∼ B)

2. Transitivity: (A > B) ∧ (B > C) ⇒ (A > C)

3. Continuity: A > B > C ⇒ ∃p [p, A; 1−p, C] ∼ [1, B]

a. If A is preferred over B, and B over C, then there is some probability p
at which the agent is indifferent between the lottery [p, A; 1−p, C],
offering a chance of A or C, and the lottery [1, B], offering B with
certainty


Axioms of Utility Theory – Two

Remainder of Six Constraints on Rational Preferences for lotteries,
where > represents a preference and ∼ represents indifference

4. Substitutability: if A ∼ B, then A can be substituted for B in any lottery (and vice versa): [p, A; 1−p, C] ∼ [p, B; 1−p, C]

5. Monotonicity: A > B ⇒ (p > q ⇔ [p, A; 1−p, B] > [q, A; 1−q, B])

6. Decomposability: [p, A; 1−p, [q, B; 1−q, C]] ∼ [p, A; (1−p)q, B;
(1−p)(1−q), C]


Utility Based on Rational Preferences

● Existence of Utility Function: if an agent’s preferences obey the axioms
of utility, then
○ There exists a function U such that U(A) > U(B) iff A > B
○ U(A) = U(B) iff A ∼ B

● Expected utility of a lottery: the probability-weighted sum of the
utilities of its outcomes:

U([p1, S1; … ; pn, Sn]) = Σ_i pi U(Si)
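Because an outcome can itself be another lottery, expected utility is naturally computed recursively. A minimal Python sketch (states A, B, C and their utilities are invented for illustration), which also demonstrates the decomposability axiom from the previous slide:

```python
# A minimal sketch (invented toy states and utilities) of evaluating a
# lottery recursively: a lottery is a list of (probability, outcome) pairs,
# and an outcome is either an atomic state name or another lottery.

def lottery_utility(outcome, U):
    if isinstance(outcome, str):              # atomic state
        return U[outcome]
    return sum(p * lottery_utility(s, U)      # nested lottery: weight, recurse
               for p, s in outcome)

U = {"A": 10, "B": 4, "C": 0}

nested = [(0.5, "A"), (0.5, [(0.25, "B"), (0.75, "C")])]
flat   = [(0.5, "A"), (0.125, "B"), (0.375, "C")]

# Decomposability (axiom 6): both forms have the same expected utility
print(lottery_utility(nested, U), lottery_utility(flat, U))   # 5.5 5.5
```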


Deterministic Environments

● In a deterministic environment (as in game playing, e.g., minimax), a
preference ranking on states is sufficient; exact numeric utilities are
not needed

● Such preference rankings are called value functions


Decision Theoretic Agents
AIMA 16

CMPSC 442
Week 10, Meeting 30, Segment 2 of 3: Decision Networks

Decision Networks

● Chance nodes (ovals, as in Bayesian Networks)
○ Parents can be chance nodes or decision nodes

● Decision nodes (rectangles; no parents, treated as observed evidence once the action is chosen)
● Utility nodes (diamonds, depend on action and chance nodes)


Example: Airport Siting Problem

● Decision nodes indicate points where the
agent can take an action, which in turn can
influence some of the variables

● Air traffic, potential for litigation, and
construction costs affect the utility function for
choosing an airport site


● Safety, quietness, and frugality represent uncertain outcomes of the
decision, which in turn affect the utility function

● Utility nodes represent the agent’s utility function as a function of the parent
nodes: the uncertain outcomes of the agent’s decision

Norvig & Russell, 3rd Ed., Fig 16.6


Simplified Example: Airport Siting Problem

● Decision nodes indicate points where the
agent can take an action, which in turn can
influence some of the variables

● Air traffic, potential for litigation, and
construction costs affect the utility function for
choosing an airport site


● Outcome states are omitted
● Utility nodes represent the agent’s utility function directly: the
utility is an expected utility given by an action-utility function (known
as a Q-function in reinforcement learning)


Action Selection

1. Instantiate all evidence

2. Set the action node(s) each possible way; for each action value:

a. Calculate the posterior over the parents of the utility node,
given the evidence

b. Calculate the expected utility of the action

3. Choose the action with the highest expected utility
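A minimal Python sketch of this procedure on the umbrella network shown on the next slide (with no evidence to instantiate, the posterior over the single chance node Weather is just its prior):

```python
# A minimal sketch of the action-selection procedure, using the umbrella
# example from the next slide. With no evidence, the posterior over the
# single chance node Weather is just its prior.

P_W = {"sun": 0.70, "rain": 0.30}                  # chance node Weather
U = {("leave", "sun"): 100, ("leave", "rain"): 0,  # utility node U(A, W)
     ("take", "sun"): 20,  ("take", "rain"): 70}

# Step 2: set the decision node each possible way and compute its EU
scores = {a: sum(p * U[(a, w)] for w, p in P_W.items())
          for a in ("leave", "take")}
print(scores)                        # {'leave': 70.0, 'take': 35.0}

# Step 3: choose the action with the highest expected utility
print(max(scores, key=scores.get))   # leave
```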


Simple Decision Network: Umbrella Example

W     P(W)
sun   0.70
rain  0.30

A      W     U(A,W)
leave  sun   100
leave  rain    0
take   sun    20
take   rain   70

● Umbrella = leave: EU(leave) = 0.7 × 100 + 0.3 × 0 = 70

● Umbrella = take: EU(take) = 0.7 × 20 + 0.3 × 70 = 35

● Optimal decision: Umbrella = leave, with MEU = 70

Example due to Dan Klein & Pieter Abbeel, CS 188 @ UC Berkeley, Sp 14

Umbrella Example with an Additional Variable

● The agent now observes a bad weather forecast, F = bad, before deciding

W     P(W | F=bad)
sun   0.34
rain  0.66

A      W     U(A,W)
leave  sun   100
leave  rain    0
take   sun    20
take   rain   70

● Umbrella = leave: EU(leave | F=bad) = 0.34 × 100 + 0.66 × 0 = 34

● Umbrella = take: EU(take | F=bad) = 0.34 × 20 + 0.66 × 70 = 53

● Optimal decision: Umbrella = take, with MEU = 53

Example due to Dan Klein & Pieter Abbeel, CS 188 @ UC Berkeley, Sp 14
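The same selection procedure as a quick numeric check under the forecast evidence, using the posterior from the table above:

```python
# Action selection after observing the forecast F=bad: the posterior
# P(W | F=bad) from the slide replaces the prior in the EU computation.
P_W_bad = {"sun": 0.34, "rain": 0.66}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20,  ("take", "rain"): 70}

scores = {a: sum(p * U[(a, w)] for w, p in P_W_bad.items())
          for a in ("leave", "take")}
print(scores)                        # {'leave': 34.0, 'take': ~53.0}
print(max(scores, key=scores.get))   # take
```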

Decision Theoretic Agents
AIMA 16

CMPSC 442
Week 10, Meeting 30, Segment 3 of 3: Information Value
Theory

The Value of Information

● How an agent chooses what information to acquire: the values of any of
the potentially observable chance variables in the model
○ Observation actions affect the agent’s belief state
○ The value of any observation derives from its potential effect on the
agent’s actions


Example: Value of Information Affecting a Purchase

● An oil company can purchase one of n blocks of oil-drilling rights at cost C/n
each, where all blocks are worthless except one, which contains oil worth C

● The results of a seismological survey of block #3, indicating whether it has oil,
can be purchased. What should the company pay for this information?
○ With probability 1/n the survey shows oil in block #3; the company buys it for
C/n and makes a profit of C − C/n = (n−1)C/n dollars
○ With probability (n−1)/n the survey shows no oil; the company buys a different
block, where the probability of oil is now 1/(n−1), for an expected profit of
C/(n−1) − C/n = C/(n(n−1)) dollars
● The company should pay up to its expected profit given the information:

(1/n) × (n−1)C/n + ((n−1)/n) × C/(n(n−1)) = (n−1)C/n² + C/n² = C/n
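A quick symbolic check of this arithmetic (assuming sympy is available; n and C are left symbolic):

```python
# Symbolic check that the expected profit given the survey is C/n.
from sympy import symbols, simplify

n, C = symbols("n C", positive=True)

profit_if_oil    = C - C/n              # buy block #3 at C/n, recover C
profit_if_no_oil = C/(n - 1) - C/n      # expected value of buying another block

expected_profit = (1/n)*profit_if_oil + ((n - 1)/n)*profit_if_no_oil
print(simplify(expected_profit))        # C/n
```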


Value of Perfect Information (VPI)

● The value of discovering Ej, given current evidence e, is the average
over its possible values ejk, weighted by the current belief state, of the
expected utility of the best action once Ej = ejk is known, less the
expected utility of the best action without the information:

VPI(Ej | e) = ( Σ_k P(Ej = ejk | e) EU(α_ejk | e, Ej = ejk) ) − EU(α | e)

where α is the best action given e alone and α_ejk is the best action
given e and Ej = ejk
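A minimal generic sketch of this computation for a one-shot decision (all names here are illustrative: a chance variable W with prior P_w, an evidence variable E with conditional table P_e_given_w, and a utility table U):

```python
# A minimal generic sketch of VPI for a one-shot decision. The model is a
# chance variable W with prior P_w, an evidence variable E with conditional
# distribution P_e_given_w[w][e], a set of actions, and utilities U[(a, w)].

def best_eu(P, U, actions):
    """MEU under belief state P: expected utility of the best action."""
    return max(sum(p * U[(a, w)] for w, p in P.items()) for a in actions)

def vpi(P_w, P_e_given_w, U, actions):
    """VPI(E) = expectation over e of MEU after observing E=e, minus MEU now."""
    evidence_values = {e for cpt in P_e_given_w.values() for e in cpt}
    value_with_info = 0.0
    for e in evidence_values:
        P_e = sum(P_w[w] * P_e_given_w[w][e] for w in P_w)      # P(E = e)
        posterior = {w: P_w[w] * P_e_given_w[w][e] / P_e for w in P_w}
        value_with_info += P_e * best_eu(posterior, U, actions)
    return value_with_info - best_eu(P_w, U, actions)
```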


Comparison of Utilities of Different Action Choices

VPI gives the potential gain with respect to the choice of action
● Fig 1: the utility of action 1 is almost always greater than that of
action 2; there is no way to gain from more information
● Fig 2: the choice is unclear, and the information is crucial
● Fig 3: the choice is unclear, but it makes little difference because
there is greater certainty about the lower-utility action; the information
is less valuable

[Fig 1, Fig 2, Fig 3: three utility distributions for the two actions, illustrating the cases above]


Fig. 16.8 from Norvig & Russell, 4th Ed

VPI for Umbrella Example

● MEU with no evidence: EU(leave) = 70, EU(take) = 35, so MEU = 70

● MEU if F = good: the posterior P(W | F=good) works out to (sun 0.95,
rain 0.05), so leave has EU = 95 = MEU

● MEU if F = bad: with P(W | F=bad) = (sun 0.34, rain 0.66), take has
EU = 53 = MEU

● VPI(F) = 0.59 × 95 + 0.41 × 53 − 70 ≈ 7.8

A      W     U(A,W)
leave  sun   100
leave  rain    0
take   sun    20
take   rain   70

F     P(F)
good  0.59
bad   0.41

Example due to Dan Klein & Pieter Abbeel, CS 188 @ UC Berkeley, Sp 14
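A numeric check of VPI(Forecast), deriving the posterior P(W | F=good) from the quantities given on these slides rather than assuming it:

```python
# Numeric check of VPI(Forecast) for the umbrella example. P(W | F=good)
# is derived from P(W), P(F), and P(W | F=bad) given on the slides.
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20,  ("take", "rain"): 70}
P_W = {"sun": 0.70, "rain": 0.30}
P_F = {"good": 0.59, "bad": 0.41}
P_W_bad = {"sun": 0.34, "rain": 0.66}

# P(sun) = P(good) P(sun|good) + P(bad) P(sun|bad)  =>  solve for P(sun|good)
p_sun_good = (P_W["sun"] - P_F["bad"] * P_W_bad["sun"]) / P_F["good"]
P_W_good = {"sun": p_sun_good, "rain": 1 - p_sun_good}      # ~0.95 / 0.05

def meu(P):
    """Expected utility of the best action under belief state P."""
    return max(sum(p * U[(a, w)] for w, p in P.items())
               for a in ("leave", "take"))

vpi_F = P_F["good"] * meu(P_W_good) + P_F["bad"] * meu(P_W_bad) - meu(P_W)
print(round(vpi_F, 2))   # 7.79 (≈ 7.8)
```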


Value of Perfect Information

● Assume we have evidence E = e; the MEU value if we act now is:

MEU(e) = max_a Σ_s P(s | e) U(s, a)

● Assume we then see that E′ = e′; the value if we act at that point is:

MEU(e, e′) = max_a Σ_s P(s | e, e′) U(s, a)


Value of Perfect Information

● E′ is a random variable whose value is currently unknown, so we don’t
know what e′ will be. The expected value if E′ is revealed and we then
act is:

MEU(e, E′) = Σ_e′ P(e′ | e) MEU(e, e′)

● Value of information: how much the MEU goes up:

VPI(E′ | e) = MEU(e, E′) − MEU(e)


VPI Properties

● Non-negative: VPI(E′ | e) ≥ 0 for every E′ and e, because the agent can
always ignore information that turns out not to be useful

● Non-additive: in general VPI(Ej, Ek | e) ≠ VPI(Ej | e) + VPI(Ek | e),
because the value of an observation depends on the current belief state

● Order-independent: VPI(Ej, Ek | e) = VPI(Ej | e) + VPI(Ek | e, ej) =
VPI(Ek | e) + VPI(Ej | e, ek); in this respect sensing actions are
distinct from other actions
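A small numeric check of the non-negativity property: a forecast that is independent of the weather carries no information, so its VPI is exactly zero and never negative (umbrella utilities and prior from the earlier example):

```python
# Non-negativity check: an evidence variable independent of the weather
# has VPI exactly 0. Utilities and prior are from the umbrella example.
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20,  ("take", "rain"): 70}
P_W = {"sun": 0.70, "rain": 0.30}

def meu(P):
    return max(sum(p * U[(a, w)] for w, p in P.items())
               for a in ("leave", "take"))

# If E is independent of W, then P(W | E=e) = P(W) for every value e, so
# each branch of the expectation equals MEU(no evidence) and VPI is 0.
P_E = {"good": 0.5, "bad": 0.5}
vpi_E = sum(P_E[e] * meu(P_W) for e in P_E) - meu(P_W)
print(vpi_E)   # 0.0
```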


Summary

● Probability theory describes what an agent should believe on the basis of
evidence, utility theory describes what an agent wants, and decision theory
combines the two to define a decision-theoretic agent

● An agent whose preferences between lotteries are consistent with the axioms
of utility theory can be described by a utility function, and acts rationally
by selecting actions that maximize its expected utility

● Decision networks are an extension of Bayesian networks that provide a
formalism for expressing and solving decision problems

● The value of information is defined as the expected improvement in utility
given the information
