
Agent-based Systems
Paolo Turrini
Web: www.dcs.warwick.ac.uk/~pturrini · Email: p.turrini@warwick.ac.uk

Risk and Decisions (ii): Expected Utility

Today
Probabilities with value attached
How to compare options: goals vs utilities
Deciding means gambling
Risky moves and rationality

The book
Stuart Russell and Peter Norvig
Artificial Intelligence: A Modern Approach
Chapters 16-17

If you snooze you lose. Or do you?
I set the alarm clock(s) to wake up on time for the lectures.
Let action S_t = snoozing the alarm clock t times.
Will S_t get me there on time?
Potential problems, e.g.:
1 Stagecoach buses run past me, full of people
2 my phone and iPad die together
3 my mum forgets to call me

If you snooze you lose. Or do you?
Suppose I believe the following:
P(S0 gets me there on time | ...) = 0.99
P(S1 gets me there on time | ...) = 0.90
P(S3 gets me there on time | ...) = 0.6
P(S10 gets me there on time | ...) = 0.1
Which action should I choose?
IT DEPENDS on my preferences, e.g., missing class vs. sleeping in.
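
To make the dependence on preferences concrete, here is a minimal Python sketch; all utility numbers (u_on_time, u_late, sleep_bonus) are invented for illustration and are not from the lecture.

```python
# Hypothetical utilities: reward for being on time, penalty for being late,
# plus a small bonus per snooze for extra sleep. All numbers are made up.
P_ON_TIME = {0: 0.99, 1: 0.90, 3: 0.6, 10: 0.1}

def eu(snoozes, u_on_time=100, u_late=-500, sleep_bonus=5):
    p = P_ON_TIME[snoozes]
    return p * u_on_time + (1 - p) * u_late + sleep_bonus * snoozes

best = max(P_ON_TIME, key=eu)
print(best, {t: round(eu(t), 1) for t in P_ON_TIME})
# 0 {0: 94.0, 1: 45.0, 3: -125.0, 10: -390.0}
```

Changing u_late or sleep_bonus moves the optimum, which is exactly the point: the probabilities alone do not determine the rational choice.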

Chances + preferences
Utility theory is used to represent and reason with preferences.
Decision theory = utility theory + probability theory

Rewards
Sensors: Breeze, Glitter, Smell
Actuators: Up, Down, Left, Right, Grab, Release, Shoot, Climb
Rewards: +1000 escaping with gold, -1000 dying, -10 using the arrow, -1 walking
Environment:
Squares adjacent to the Wumpus are smelly
Squares adjacent to a pit are breezy
Glitter iff gold is in the same square
Shooting kills the Wumpus if you are facing it
Shooting uses up the only arrow
Grabbing picks up the gold if in the same square
Releasing drops the gold in the same square

State space
The universe in which the agent moves is a finite set of states
W = {w1,…,wn}
e.g., the possible grid configurations in the Wumpus World
States can also contain a description of:
the inner state of the agent, e.g., the knowledge base KB
relevant changes that have happened
the history of the game so far
The set of states is our sample space.

Lotteries
A lottery is a probability distribution over the set of states, e.g., for states w1 and w2, and p ∈ [0, 1]:
L1 = [p, w1; (1-p), w2]
L is the set of lotteries over W.
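
As a concrete aside, a lottery can be represented directly in code. The minimal Python sketch below (representation and names are my own, not from the lecture) stores a lottery as (probability, state) pairs and checks that it is a genuine probability distribution.

```python
from typing import List, Tuple

# A lottery as a list of (probability, state) pairs.
Lottery = List[Tuple[float, str]]

def is_valid_lottery(lottery: Lottery) -> bool:
    """Check that probabilities lie in [0, 1] and sum to 1."""
    probs = [p for p, _ in lottery]
    return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) < 1e-9

L1: Lottery = [(0.5, "w1"), (0.5, "w2")]
assert is_valid_lottery(L1)
```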

States and lotteries
Observation: A state w ∈ W can be seen as a lottery:
w is assigned probability 1
all other states probability 0
e.g.,
L1 = [1, w1; 0, w2; ...; 0, wn]
We get w1 with probability 1, and the rest with probability 0.

Compound lotteries
Consider now the set L of lotteries over W.
Observation: a lottery over L is a lottery over W. Writing L1 = [p1, w1; ...; pn, wn] and L2 = [r1, w1; ...; rn, wn]:
[q1, L1; q2, L2; ...; qm, Lm]
= [q1, [p1, w1; ...; pn, wn]; q2, L2; ...; qm, Lm]
= [q1 p1, w1; ...; q1 pn, wn; q2, L2; ...; qm, Lm]
= [q1 p1, w1; ...; q1 pn, wn; q2, [r1, w1; ...; rn, wn]; ...; qm, Lm]
= [(q1 p1 + q2 r1), w1; ...; (q1 pn + q2 rn), wn; q3, L3; ...; qm, Lm]
= ...
Compound lotteries can be reduced to simple lotteries.
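
The reduction above is mechanical, and a short sketch makes that visible; the helper below (my own construction) flattens a lottery over lotteries into a simple lottery by multiplying outer and inner probabilities and summing per state.

```python
from collections import defaultdict

def flatten(compound):
    """compound: list of (q, sub_lottery) pairs, where each sub_lottery
    is a list of (p, state) pairs. Returns a simple lottery over states."""
    probs = defaultdict(float)
    for q, sub in compound:
        for p, w in sub:
            probs[w] += q * p  # outer probability times inner probability
    return [(p, w) for w, p in probs.items()]

L1 = [(0.5, "w1"), (0.5, "w2")]
L2 = [(0.2, "w1"), (0.8, "w2")]
print(flatten([(0.5, L1), (0.5, L2)]))  # [(0.35, 'w1'), (0.65, 'w2')]
```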

Comparing lotteries: the plan
Rewards are defined only on some states, and not on others. How do we choose between lotteries?
Here is the plan:
First we introduce a comparison relation between lotteries
Then some intuitive properties this relation ought to have
Then we prove that it can be reduced to numbers
Notice: I said numbers, I haven't said money.
When we don't have numbers, we can often make them up.

Preferences
A preference relation is a relation ≽ ⊆ L × L over the set of lotteries.
A ≽ B means that lottery A is weakly preferred to lottery B.
A ≻ B = (A ≽ B and not B ≽ A) means that lottery A is strictly preferred to lottery B.
A ∼ B = (A ≽ B and B ≽ A) means that lottery A is the same as lottery B value-wise (indifference).

Orderability
(A ≻ B) ∨ (B ≻ A) ∨ (B ∼ A)
‘Either A over B, or B over A, or I don’t care.’

Transitivity
(A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
‘If A is better than B, and B better than C, then A is better than C.’

Continuity
A ≻ B ≻ C ⇒ ∃p [p, A; 1-p, C] ∼ B
'A is better than B, which is better than C.
But if you give me the right mix of A and C, then this is the same as B.'

Substitutability
A ∼ B ⇒ [p, A; 1-p, C] ∼ [p, B; 1-p, C]
'If I'm indifferent between A and B,
then I'm also indifferent between any two lotteries that mix them with C in the same way.'

Monotonicity
A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≽ [q, A; 1-q, B])
'If I like A more than B,
then I'd rather have a bit more of A than a bit more of B.'

Rational preferences
Violating the constraints leads to self-evident irrationality. Take transitivity:
If B ≻ C, then an agent who has C would pay (say) 1 cent to get B
If A ≻ B, then an agent who has B would pay (say) 1 cent to get A
If C ≻ A, then an agent who has A would pay (say) 1 cent to get C
The agent ends up where it started, 3 cents poorer: it can be "money-pumped" indefinitely.
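
A toy simulation of this money pump (my own construction, not from the slides): the agent's intransitive strict preferences send it around the cycle forever, paying a cent per trade.

```python
# Intransitive strict preferences: B > C, A > B, C > A.
prefers = {("B", "C"), ("A", "B"), ("C", "A")}

holding, cents_paid = "C", 0
for _ in range(6):  # six trades = two full laps around the cycle
    holding = next(x for x in "ABC" if (x, holding) in prefers)
    cents_paid += 1  # the agent pays a cent for each "upgrade"
print(holding, cents_paid)  # back to 'C', 6 cents poorer
```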

Representation Theorem
Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944)
A preference relation ≽ satisfies the five previous principles if and only if there exists a real-valued function u : L → R such that:
u(A) ≥ u(B) ⇔ A ≽ B
u([p1, w1; ...; pn, wn]) = Σ_i p_i u(w_i)
Proof idea:
[⇐] By contraposition: e.g., pick transitivity and show that if the relation is not transitive, there is no way of associating numbers to outcomes.
[⇒] We use the axioms to show that there are infinitely many functions that satisfy them, but they are all "equivalent" (up to positive affine transformation) to a unique utility function.

Representation Theorem
Michael Maschler, Eilon Solan and Shmuel Zamir
Game Theory (Ch. 2)
Cambridge University Press, 2013.
The main message: Give me any order on outcomes that makes sense and I can turn it into a real-valued function!

Utility functions
A utility function is a function
u : W → R
associating a real number to each state.
Important:
Utility functions are not the same as money. They are a representation of happiness, goal satisfaction, fulfilment and the like: a mathematical tool to represent a comparison between outcomes. So altruism, unselfishness, and so forth can also be modelled using utility functions.

Expected utility
Let L = [p1, w1; p2, w2; ...; pn, wn] be a lottery. The expected utility of L is
u(L) = Σ_i p_i × u(w_i)
e.g., rolling a fair six-sided die, I win 27k if a 6 comes up and lose 3k otherwise. The expected utility is (1/6) × 27k − (5/6) × 3k = 4.5k − 2.5k = 2k.
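
The definition is a one-liner in code; the sketch below (my own helper) checks the die example.

```python
def expected_utility(lottery):
    """lottery: list of (probability, utility) pairs."""
    return sum(p * u for p, u in lottery)

die_gamble = [(1/6, 27_000), (5/6, -3_000)]
print(expected_utility(die_gamble))  # ≈ 2000.0, i.e., 2k
```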

Humans and expected utility
'Rolling a fair six-sided die, you win 27k if a 6 comes up, lose 3k otherwise.'
What would you do? Let's change the setup a little bit...
By modifying utilities and probabilities we can find the indifference point, past which we change our mind. It is not the same for everyone!

Humans and expected utility
Tversky and Kahneman's Prospect Theory: humans have complex utility estimates (risk aversion, satisfaction levels).
[Figure: typical empirical utility data]
Warning! Controversial statement:
PT does not refute the principle of maximization of expected utility.
We can incorporate risk aversion and satisfaction as properties of outcomes.

Beliefs and expected utility
Rewards:
-1000 for dying
0 for any other square
What's the expected utility of going to [3, 1], [2, 2], [1, 3]?

Using conditional independence (contd.)
P(P1,3 | known, b) = α' ⟨0.2(0.04 + 0.16 + 0.16), 0.8(0.04 + 0.16)⟩ ≈ ⟨0.31, 0.69⟩
P(P2,2 | known, b) ≈ ⟨0.86, 0.14⟩
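
A quick sanity check of the normalization step (α' simply rescales the two unnormalized entries so they sum to 1):

```python
def normalize(pair):
    """Rescale a pair of unnormalized probabilities to sum to 1."""
    total = sum(pair)
    return tuple(x / total for x in pair)

unnorm = (0.2 * (0.04 + 0.16 + 0.16), 0.8 * (0.04 + 0.16))
print(normalize(unnorm))  # ≈ (0.31, 0.69)
```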

Beliefs and expected utility
The expected utility u(1, 3) of the action (1, 3) of going to [1, 3] from an explored adjacent square is:
u(1, 3) = u[0.31, -1000; 0.69, 0] = -310
u(3, 1) = u(1, 3)
u(2, 2) = u[0.86, -1000; 0.14, 0] = -860
Clearly going to [2, 2] from either [1, 2] or [2, 1] is irrational. Going to either [1, 3] or [3, 1] is the rational choice.
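
The same numbers fall out of the expected-utility one-liner from earlier (a sketch, with the pit probabilities taken from the conditional-independence slide):

```python
def expected_utility(lottery):
    return sum(p * u for p, u in lottery)

u_13 = expected_utility([(0.31, -1000), (0.69, 0)])  # pit risk 0.31
u_22 = expected_utility([(0.86, -1000), (0.14, 0)])  # pit risk 0.86
print(u_13, u_22)  # -310.0 -860.0
```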

Risky moves

Actuators
Sensors: Breeze, Glitter, Smell
Actuators: Turn L/R, Go, Grab, Release, Shoot, Climb
Rewards: +1000 escaping with gold, -1000 dying, -10 using the arrow, -1 walking
Environment:
Squares adjacent to the Wumpus are smelly
Squares adjacent to a pit are breezy
Glitter iff gold is in the same square
Shooting kills the Wumpus if you are facing it
Shooting uses up the only arrow
Grabbing picks up the gold if in the same square
Releasing drops the gold in the same square

Deterministic actions
Actions in the Wumpus World are deterministic:
if I want to go from [2, 3] to [2, 2], I just go.
The probability of successfully executing action (2, 2) at [2, 3] is:
P([2, 2] | [2, 3], (2, 2)) = 1

Stochastic actions
Stochastic actions 'simulate' lack of control. The agent can try to go in the intended direction, but much can work against it:
The environment
The opponents
The agent themselves!

Stochastic actions
The result of performing action a in state w is a lottery over W, i.e., a probability distribution over the set of all possible states:
(w, a) = [p1, w1; p2, w2; ...; pn, wn]
e.g., the agent decides to go from [2, 1] to [2, 2] but:
Goes to [2, 2] with probability 0.5
Goes to [3, 1] with probability 0.3
Goes back to [1, 1] with probability 0.1
Bumps their head against the wall and stays in [2, 1] with probability 0.1
Goes to any other square with probability 0

Beliefs, expected utility and stochastic actions
Rewards:
-1000 for dying
0 for any other square
What's the expected utility of going to [3, 1], [2, 2], [1, 3]?

Beliefs, expected utility and stochastic actions
Let (w, a) = [p1, L1; p2, L2; ...; pn, Ln] be the result of performing action a in state w, where each Li is of the form [q_1^i, w_1^i; q_2^i, w_2^i; ...; q_n^i, w_n^i].
Then the utility of the action is given by:
u(w, a) = Σ_i p_i u(L_i) = Σ_i p_i Σ_j q_j^i u(w_j^i)
The expected utility of each outcome times the probability of reaching it:
it is a lottery of lotteries!
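
A sketch of this 'lottery of lotteries' computation (helper names are my own); the example reproduces u(1, 3) for the stochastic agent from the next slide, under the slide's assumption that slipping sideways can land in a known-safe square with utility 0.

```python
def expected_utility(lottery):
    return sum(p * u for p, u in lottery)

def action_utility(outcomes):
    """outcomes: list of (p, inner_lottery) pairs -- a lottery over lotteries.
    Weight each inner expected utility by the outer probability."""
    return sum(p * expected_utility(inner) for p, inner in outcomes)

u_13 = action_utility([
    (0.8, [(0.31, -1000), (0.69, 0)]),  # intended square: pit risk 0.31
    (0.1, [(1.0, 0)]),                  # slips into a known-safe square
    (0.1, [(0.86, -1000), (0.14, 0)]),  # slips toward [2, 2]: pit risk 0.86
])
print(u_13)  # ≈ -334.0
```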

Beliefs, expected utility and stochastic actions
u(1, 3) = 0.8 × u[0.31, -1000; 0.69, 0] + 0.1 × u[1, 0] + 0.1 × u[0.86, -1000; 0.14, 0]
= 0.8 × (-310) + 0.1 × 0 + 0.1 × (-860) = -248 - 86 = -334
We can get to [2, 2] from two directions, but by symmetry it's the same.

Beliefs, expected utility and stochastic actions
u(2, 2) = 0.8 × u[0.86, -1000; 0.14, 0] + 0.1 × u[0.31, -1000; 0.69, 0] + 0.1 × u[1, 0]
= 0.8 × (-860) + 0.1 × (-310) = -688 - 31 = -719
u(1, 3) = u(3, 1) (because of symmetry)
Going to [2, 2] is still the irrational choice, but not as bad as before.
The rational choice is going to either [1, 3] or [3, 1].

Beliefs versus knowledge
A purely knowledge-based agent has nothing better to do than choose at random, which means (2/3) u(1, 3) + (1/3) u(2, 2).
A belief-based agent can improve the payoff using probabilistic reasoning and going for u(1, 3).
Obviously, the more chaotic the decision system, the smaller the impact of the reward difference.
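For instance, with the stochastic-action values just computed (my arithmetic, using the slide's numbers): the random agent expects (2/3)(-334) + (1/3)(-719) ≈ -462, against -334 for the belief-based agent.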

New probability model
Assume pits can be in a square with probability 0.01

The fringe
Obviously, we can use exactly the same reasoning!

Beliefs, expected utility and stochastic actions
With deterministic agents, the chance of death when trying to go to [2, 2] is 0.9902.
With deterministic agents, it tends to 1 as the probability of a pit in a square tends to 0:
the more deterministic the agent, the higher the chance of death.
Because of the way rewards are defined, the expected utility follows the same pattern.
Belief-based agents perform much better than knowledge-based ones.

Today’s class
Utility, lotteries and preferences
Maximisation of expected utility
Stochastic actions
Knowledge-based versus belief-based agents

Coming next
Game Theory = Decision Theory with many agents