COMP 424 – Artificial Intelligence
Utility Theory
Instructor: Jackie CK Cheung
Readings: R&N Ch. 16
From probabilities to decisions
Probability theory
• What is the world like, accounting for uncertainty?
• Where is the location of the cheese to steal?
Utility theory
• What do agents want? The cheese!
• Not to get caught!
Decision theory
• What should an agent do?
• Keep running or hide?
• Make rational choice based on probability and utility theory
COMP-424: Artificial intelligence 2
Actions and consequences
• Intelligent agents should not only be observers, but also actors
i.e., they should choose actions in a rational way.
• Most often, actions produce consequences, which cause
changes in the world.
• Decision-making should maximize the overall utility of the agent’s actions.
COMP-424: Artificial intelligence 3
Preferences
• Actions have consequences. We call the consequences of an action payoffs or rewards.
• A rational method would be to evaluate the benefit (desirability, value) of each consequence and weigh it by its probability
• To compare different actions, we need to know for each:
• set of consequences Ca = { c1, …, cm }
• probability distribution over consequences Pa(ci), s.t. ∑i Pa(ci) = 1.
COMP-424: Artificial intelligence 4
• A pair La = (Ca, Pa) is called a lottery (Luce and Raiffa, 1957).
• A lottery is usually represented as a list of pairs, e.g., La = [ p, A; (1-p), B ],
or as a tree-like diagram:
• Choosing between actions corresponds to choosing
between lotteries corresponding to these actions.
• Agents have preferences over consequences:
A > B : A is preferred to B
A ~ B : indifference between A and B
A ≳ B : B is not preferred to A
COMP-424: Artificial intelligence 5
The axioms of utility theory
• For an agent to act rationally, its preferences have to obey certain constraints.
• These axioms are called the axioms of utility theory.
1. Orderability
2. Continuity
3. Substitutability
4. Monotonicity
5. Reduction of compound lotteries
COMP-424: Artificial intelligence 6
The axioms of utility theory (2)
1. Orderability: A linear and transitive preference relation must exist between the prizes of any lottery.
• Linearity: (A > B) v (B > A) v (A ~ B)
• Transitivity: (A > B) ∧ (B > C) ⇒ (A > C)
Suppose an agent has the intransitive preferences B>C, A>B, C>A, and it currently owns C.
• If B>C, the agent would pay (say) 1 cent to trade C for B.
• If A>B, the agent (who now has B) would pay (say) 1 cent to trade B for A.
• If C>A, the agent (who now has A) would pay (say) 1 cent to trade A for C.
The agent loses money forever. (Not rational behaviour!)
COMP-424: Artificial intelligence 7
The axioms of utility theory (3)
2. Continuity: If A>B>C, then there exists a lottery L with prizes A and C that is equivalent to receiving B for sure: ∃p such that L = [p, A; (1-p), C] ~ B
The probability p at which equivalence occurs can be used to compare the merits of B w.r.t. A and C.
3. Substitutability: Adding the same prize with the same probability to two equivalent lotteries does not change the preference between them.
4. Monotonicity: If two lotteries have the same prizes, the one producing the best prize most often is preferred.
5. Reduction of compound lotteries (“No fun in gambling”): Two consecutive lotteries can be compressed into a single equivalent lottery.
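For example, a compound lottery that yields the lottery [q, A; (1-q), B] with probability p and the prize C with probability (1-p) reduces to the single equivalent lottery [pq, A; p(1-q), B; (1-p), C], so a rational agent is indifferent between the two forms.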
COMP-424: Artificial intelligence 8
Reminder: Expected value
• Let X be a discrete-valued random variable with n possible values {x1, …, xn}, occurring with probabilities p1, …, pn respectively.
• Then the expected value (mean) of X is: E[X] = Σi=1:n pi xi
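A minimal sketch of this computation in Python (the values and probabilities below are made up for illustration):

```python
# Expected value of a discrete random variable: E[X] = sum_i p_i * x_i
values = [0, 1, 5]          # possible values x_i (illustrative)
probs  = [0.5, 0.3, 0.2]    # corresponding probabilities p_i (must sum to 1)

expected_value = sum(p * x for p, x in zip(probs, values))
print(expected_value)       # 0.5*0 + 0.3*1 + 0.2*5 = 1.3
```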
COMP-424: Artificial intelligence 9
• Utilities map outcomes (or states) to real values.
• Given a preference behaviour, the utility function is non-unique.
e.g., behaviour is invariant w.r.t. positive affine (linear) transformations (see the sketch below):
U’(x) = k1 U(x) + k2, where k1 > 0
• With deterministic prizes only (no lottery choice), only ordinal utility matters, i.e., a total order on prizes.
• Utilities don’t need to obey the same laws as expected values.
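A small sketch of this claim (with made-up outcomes and utility values): applying a positive affine transformation U’(x) = k1 U(x) + k2 with k1 > 0 does not change which lottery is preferred.

```python
# Two lotteries over outcomes A, B, C, given as {outcome: probability}.
L1 = {"A": 0.5, "B": 0.5}
L2 = {"B": 0.2, "C": 0.8}

U  = {"A": 10.0, "B": 4.0, "C": 6.0}            # some utility function (illustrative)
U2 = {c: 3.0 * u + 7.0 for c, u in U.items()}   # U'(x) = k1*U(x) + k2, with k1 > 0

def expected_utility(lottery, utility):
    return sum(p * utility[c] for c, p in lottery.items())

# The preferred lottery is the same under U and U':
best_U  = max([L1, L2], key=lambda L: expected_utility(L, U))
best_U2 = max([L1, L2], key=lambda L: expected_utility(L, U2))
assert best_U is best_U2
```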
COMP-424: Artificial intelligence 10
• Suppose you had to choose between these two lotteries:
• L1: win $1M for sure.
• L2: win $5M with prob. 0.1
win $1M with prob. 0.89
win $0 with prob. 0.01.
• Which would you choose?
COMP-424: Artificial intelligence 11
• Suppose you had to choose between these two lotteries:
• L1: win $1M for sure.
• L2: win $5M with prob. 0.1
win $1M with prob. 0.89
lose $1M with prob. 0.01.
• Which would you choose?
• What if you were the head of the Bank of Canada?
COMP-424: Artificial intelligence 13
• Suppose you had to choose between these two lotteries:
• L1: win $5M with prob 0.1
win $0 with prob 0.9.
• L2: win $1M with prob. 0.3
win $0 with prob 0.7.
• Which would you choose?
Most people are risk-averse!
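A small sketch contrasting a risk-neutral and a risk-averse agent on these two lotteries; the concave utility function used here is only illustrative:

```python
import math

# Lotteries as lists of (probability, monetary prize) pairs.
L1 = [(0.1, 5_000_000), (0.9, 0)]
L2 = [(0.3, 1_000_000), (0.7, 0)]

def expected_utility(lottery, utility):
    return sum(p * utility(x) for p, x in lottery)

risk_neutral = lambda x: x                  # utility = money
risk_averse  = lambda x: math.log1p(x)      # concave utility (illustrative choice)

print(expected_utility(L1, risk_neutral), expected_utility(L2, risk_neutral))
# 500000.0 vs 300000.0  -> a risk-neutral agent prefers L1
print(expected_utility(L1, risk_averse), expected_utility(L2, risk_averse))
# ~1.54 vs ~4.14        -> a risk-averse agent prefers L2
```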
COMP-424: Artificial intelligence 15
Utility models
• Capture preferences for rewards and resource consumption.
• Capture risk attitude
E.g., if risk-neutral, getting $5M has half the utility of getting $10M.
[Figure: utility as a function of reward for three risk attitudes: Risk Seeking, Risk Neutral (utility = expected reward), and Risk Averse.]
COMP-424: Artificial intelligence
The utility of money
• Decision theory is normative: it describes how rational agents should act. => Useful for defining an optimization criterion for AI agents.
• People systematically violate the axioms of utility theory!
=> Or maybe we don’t understand their utility function?
COMP-424: Artificial intelligence 18
Poll: Let’s Play a Game
• You are forced to play the following pair of lotteries concurrently. For each, indicate the option that you prefer.
Decision (i)
A. a sure gain of $240
B. 25% chance to gain $1000, 75% chance to gain nothing
Decision (ii)
C. a sure loss of $750
D. 75% chance to lose $1000, 25% chance to lose nothing
COMP-424: Artificial intelligence 19
Framing Effects
• Experiment by Tversky and Kahneman (1981)
• They found that 84% of respondents chose A, and 87% of
respondents chose D, so the majority chose A&D [N=150].
• Let’s compare A&D vs B&C:
A&D. 25% chance to win $240, 75% chance to lose $760
B&C. 25% chance to win $250, 75% chance to lose $750
• If presented this way, everybody chose B&C.
• The way in which decisions are presented matters! This is
called a framing effect.
COMP-424: Artificial intelligence 20
Utility at a societal level
• Government-run health insurance: how to decide what treatments are covered, given limited resources?
• NHS (UK): utility based on quality-adjusted life years (QALY) – maximize years of good health
• But non-intuitive outcomes are possible:
• Being completely blind is much worse than being blind in one eye
• Being blind in one eye is somewhat worse than being fully sighted
• Conclusion: lower priority to prevent vision loss in just one eye
• So, expensive treatment for macular degeneration is only covered if you are already blind in one eye: http://www.telegraph.co.uk/news/uknews/4182723/You-must-lose-sight-in-one-eye-before-NHS-will-treat-you.html
• Or for current news in Canada: https://www.cbc.ca/news/politics/astrazeneca-under-55-1.5968128
COMP-424: Artificial intelligence 21
Acting Under Uncertainty
• MEU principle: Choose the action that maximizes expected utility.
• Most widely accepted as a standard for rational behavior.
• Note that an agent can be entirely rational, i.e. consistent with MEU, without ever explicitly representing or manipulating utilities and probabilities. Example?
COMP-424: Artificial intelligence
Maximizing expected utility (MEU)
Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944):
Given preferences that satisfy these axioms, there exists a real-valued function U such that:
A ≳ B iff U(A) ≥ U(B) where
U([p1, C1; …; pn, Cn]) = ∑i pi U(Ci)
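A minimal sketch of this utility rule, assuming a lottery is represented as a list of (probability, prize) pairs in which a prize may itself be a (sub-)lottery:

```python
def lottery_utility(lottery, U):
    """U([p1, C1; ...; pn, Cn]) = sum_i p_i * U(C_i).
    Prizes that are themselves lotteries are evaluated recursively,
    consistent with the reduction-of-compound-lotteries axiom."""
    total = 0.0
    for p, prize in lottery:
        if isinstance(prize, list):          # a nested lottery
            total += p * lottery_utility(prize, U)
        else:                                # an atomic prize
            total += p * U[prize]
    return total

U = {"A": 1.0, "B": 0.6, "C": 0.0}           # illustrative utilities
La = [(0.5, "A"), (0.5, [(0.4, "B"), (0.6, "C")])]
print(lottery_utility(La, U))                # 0.5*1.0 + 0.5*(0.4*0.6) = 0.62
```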
COMP-424: Artificial intelligence 23
Example: single-stage decision-making
• One random variable, X: does the child have an ear infection or not?
• One decision, d: give antibiotic (yes) or not (no)
• Utility function: associates a real value to the possible
states of the world and possible decisions.
• Unfortunately X is not directly observable!
• But we know Pr(X=yes) = 0.1 and Pr(X=no)=0.9.
• According to MEU what is the best action?
COMP-424: Artificial intelligence 24
Maximizing expected utility
• Compute the expected utility of each decision: for each value of d, sum U(X, d) over the possible values of X, weighted by Pr(X).
• The best action given this utility function and probability is d = no.
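As a worked instance (the utility values here are assumed purely for illustration, not the ones from the slide’s table): suppose U(X=no, d=no) = 10, U(X=no, d=yes) = 6, U(X=yes, d=no) = 0, U(X=yes, d=yes) = 8. Then
EU(d=yes) = 0.1 × 8 + 0.9 × 6 = 6.2
EU(d=no) = 0.1 × 0 + 0.9 × 10 = 9.0
so MEU selects d = no.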
COMP-424: Artificial intelligence 25
Useful definitions for utility theory
• Utility function U(x)
• Numerical expression of the desirability of a state
• Expected Utility EU( a | x )
= ∑i Pr( Effecti(a) | x ) U( Effecti(a) )
• Utility of each possible outcome (effect) of the action, weighted by the probability of that outcome
• Maximum Expected Utility maxa EU( a | x )
• Best average payoff that can be achieved in situation x.
• Optimal Action argmaxa EU( a | x )
• Action chosen according to the MEU principle.
• Policy π(x): X -> A
• A strategy for picking actions in all states.
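A compact sketch of these definitions, assuming the outcome probabilities Pr(c | a, x) are supplied as a Python callable and utilities as a dictionary (the names below are illustrative, not a fixed API):

```python
def expected_utility(action, x, outcome_probs, U):
    """EU(a | x) = sum over possible effects c of Pr(c | a, x) * U(c)."""
    return sum(p * U[c] for c, p in outcome_probs(action, x).items())

def max_expected_utility(actions, x, outcome_probs, U):
    """max_a EU(a | x): best average payoff achievable in situation x."""
    return max(expected_utility(a, x, outcome_probs, U) for a in actions)

def optimal_action(actions, x, outcome_probs, U):
    """argmax_a EU(a | x): the action chosen by the MEU principle."""
    return max(actions, key=lambda a: expected_utility(a, x, outcome_probs, U))

def policy(states, actions, outcome_probs, U):
    """pi: X -> A, the optimal action for every state."""
    return {x: optimal_action(actions, x, outcome_probs, U) for x in states}
```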
COMP-424: Artificial intelligence 26
Decision graphs
• Represent decision models graphically:
• Random variables are represented as oval nodes.
• Parameters associated with such nodes are probabilities.
• Decisions (actions) are represented as rectangles.
• Utilities are represented as diamonds.
• Parameters associated with such nodes are utility values for all possible values of the parents.
• Restrictions on nodes:
• Utility nodes have no out-going arcs.
• Decision nodes have no incoming arcs.
• Computing the optimal action can be viewed as an inference task.
COMP-424: Artificial intelligence
Suppose we had evidence that X=yes:
• We can set d to each possible value (yes/no).
• For each value, ask the utility node to give the utility of that situation, then pick d according to MEU.
If there is no evidence at X: we will have to sum out (marginalize) over all possible values of X, like in Bayes net inference. This gives the expected utility at node U for each choice of action d.
X can be a set of variables (incl. partially observable ones, e.g. an HMM).
COMP-424: Artificial intelligence 28
Buying oil drilling rights:
• Two blocks, A and B, exactly one has oil, worth k.
• Prior probability 0.5 for each block, mutually exclusive.
• Current price of each block is k/2.
• What does the decision network look like?
COMP-424: Artificial intelligence 29
Information gathering
• In an environment with hidden information, an agent can choose to perform information-gathering actions.
e.g., taking the child to the doctor.
e.g., scouting the price of a product at different companies.
• Sometimes, such actions take time, or have associated costs (e.g., medical tests.) When are they worth pursuing?
• The value of information specifies the utility of every piece of evidence that can be acquired.
COMP-424: Artificial intelligence 30
Example: Value of information
Buying oil drilling rights:
• Two blocks, A and B, exactly one has oil, worth k.
• Prior probability 0.5 for each block, mutually exclusive.
• Current price of each block is k/2.
• Consultant offers accurate survey of A.
What is a fair price for the survey?
COMP-424: Artificial intelligence 31
Solution for the example
• Compute:
Expected value of information =
expected value of best action given the information – expected value of best action without information.
The survey may say "oil in A" or "no oil in A", each with Pr = 0.5.
Value = [0.5 * value of "buy A" given "oil in A"
       + 0.5 * value of "buy B" given "no oil in A"]
     – [ expected return of "buy A" – cost of "buy A" ]
     = [0.5 * k/2 + 0.5 * k/2] – [ k/2 – k/2 ]
     = k/2
So a fair price for the survey is anything up to k/2.
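A quick numeric check of this calculation (treating k as a parameter; the function name is illustrative):

```python
def oil_survey_value(k=1.0):
    price = k / 2                    # current price of each block
    p_oil_in_A = 0.5

    # Without the survey: buy either block; expected return k/2, cost k/2.
    eu_without = p_oil_in_A * k - price               # = 0

    # With a perfectly accurate survey of A:
    #   "oil in A"    (prob 0.5) -> buy A, profit k - k/2 = k/2
    #   "no oil in A" (prob 0.5) -> buy B, profit k - k/2 = k/2
    eu_with = 0.5 * (k - price) + 0.5 * (k - price)   # = k/2

    return eu_with - eu_without      # value of the survey = k/2

print(oil_survey_value(k=1.0))       # 0.5, i.e. k/2
```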
COMP-424: Artificial intelligence 32
Value of Perfect Information (VPI)
• Suppose you have current evidence E and current best action a*, with possible outcomes ci. Then the expected value of a* is:
EU(a* | E) = ∑i Pr(ci | E, a*) U(ci)
• Suppose you could gather further evidence about a variable X, should you do it?
COMP-424: Artificial intelligence 33
Value of Perfect Information
• Suppose we knew X=x; then we would choose ax* such that:
EU(ax* | E, X=x) = maxa ∑i Pr(ci | E, X=x, a) U(ci)
• X is a random variable whose value is unknown, so we must compute the expected gain over all possible values:
VPIE(X) = [ ∑x Pr(X=x | E) EU(ax* | E, X=x) ] – EU(a* | E)
This is the value of knowing X exactly!
COMP-424: Artificial intelligence 34
Properties of VPI
• Non-negative:
∀ X, E: VPIE(X) ≥ 0
Note that VPI is an expectation. Depending on the actual value we find for X, there can actually be a loss post-hoc.
• Non-additive: E.g. consider obtaining X twice. VPIE(X, X) ≠ VPIE(X) + VPIE(X)
• Order-independent:
VPIE(X, Y) = VPIE(X) + VPIE,X(Y) = VPIE(Y) + VPIE,Y(X)
COMP-424: Artificial intelligence 35
A more complex example
• X1: Symptoms present
• X2: Infection observed
• X3: Infection present
• d1: Go see the doctor?
• d2: Take antibiotics?
COMP-424: Artificial intelligence 36
A more complex example (cont’d)
• Total utility is U1+U2
• X2 is only observed if we decide that d1=1.
• X3 is never observed
Now we have to optimize d1 and d2 together!
COMP-424: Artificial intelligence 37
• To make decisions under uncertainty, we need to know likelihood (probability) of different outcomes, and have preferences among outcomes:
Decision Theory = Probability Theory + Utility Theory
• An agent with consistent preferences has a utility function, which associates a real number to each possible state.
• Rational agents try to maximize their (expected) utility.
• Utility theory allows us to tell whether gathering more information is worthwhile.
• Decision graphs can be used to represent decision problems.
• An algorithm similar to variable elimination can be used to compute the optimal decision, but this is very expensive in general.
COMP-424: Artificial intelligence 38
Bernoulli’s puzzle
• You have the opportunity to play a game in which a fair coin is tossed repeatedly until it comes up heads.
• If the first head appears on the nth toss, you win 2^n dollars.
• Question: How much would you pay to play the game?
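A short sketch of why this game is puzzling: the expected monetary value diverges, while a concave (e.g. logarithmic) utility of money gives a finite expected utility, which is Bernoulli's classic resolution:

```python
import math

def partial_sums(n_terms, utility):
    """Sum over n = 1..n_terms of Pr(first head on toss n) * utility(2**n)."""
    return sum((0.5 ** n) * utility(2 ** n) for n in range(1, n_terms + 1))

print(partial_sums(30, lambda x: x))   # = 30: each term is 1, so the sum grows without bound
print(partial_sums(30, math.log))      # converges to about 2*ln(2) ~= 1.386
```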
COMP-424: Artificial intelligence 39
• The final exam for a course will test topic A or topic B.
• Your prior: Pr(Test A) = 0.4, Pr(Test B) = 0.6
• You can choose to focus your studying on topic A or topic B, with the following utilities:
• At the final review tutorial, the TA will tell you which topic will be tested. How much is this information worth to you? (After all, it takes time to attend the tutorial!)
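A sketch of how this value of information could be computed. The utility table from the slide is not reproduced here, so the numbers below are assumed purely for illustration:

```python
# Assumed (illustrative) utilities U[study_topic][tested_topic].
U = {"A": {"A": 80, "B": 30},
     "B": {"A": 40, "B": 80}}
prior = {"A": 0.4, "B": 0.6}

def eu(study, belief):
    return sum(belief[t] * U[study][t] for t in belief)

# Best expected utility without the tutorial (decide under the prior):
eu_without = max(eu(s, prior) for s in U)                           # max(50, 64) = 64

# With the tutorial, the tested topic is known, so study that topic:
eu_with = sum(prior[t] * max(U[s][t] for s in U) for t in prior)    # 0.4*80 + 0.6*80 = 80

print(eu_with - eu_without)   # value of the TA's information = 16 with these numbers
```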
COMP-424: Artificial intelligence 40