What to do?
(40 Acres and a Mule Filmworks/Universal Pictures)
© -Trenn, King’s College London
What to do?
(mystorybook.com/books/42485)
What to do?
( & /Google )
Sequential decision making?
(mystorybook.com/books/42485)
Ultimately, we are interested in sequential decision making. One decision leads to another.
Each decision depends on the ones before, and affects the ones after.
How to decide what to do
Start simple. Single decision.
Consider being offered a bet in which you pay £2 if an odd number is rolled on a die, and win £3 if an even number appears.
Is this a good bet?
To analyse this, we need the expected value of the bet.
How to decide what to do
We do this in terms of a random variable, which we will call X. X can take two values:
$X = 3$ if the die rolls even
$X = -2$ if the die rolls odd
And we can also calculate the probability of these two values:
$$P(X = 3) = 0.5 \qquad P(X = -2) = 0.5$$
How to decide what to do
The expected value is then the weighted sum of the values, where the weights are the probabilities.
Formally, the expected value of $X$ is defined by:
$$E[X] = \sum_{k} k \cdot P(X = k)$$
where the summation is over all values of $k$ for which $P(X = k) \neq 0$.
How to decide what to do
Here the expected value is:
$$E[X] = 3 \cdot 0.5 + (-2) \cdot 0.5 = 0.5$$
Thus the expected value of $X$, $E[X]$, is £0.5, and we take this to be the value of the bet.
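As a minimal sketch of the calculation above, the weighted sum can be written in a few lines of Python. The function name `expected_value` and the payoff-to-probability dictionary are illustrative choices, not part of the lecture. The payoffs follow the bet as stated (win £3 on an even roll, pay £2 on an odd roll), and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def expected_value(distribution):
    """Return the sum of k * P(X = k) over a {value: probability} mapping."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in distribution.items())


# Win £3 on an even roll, pay £2 on an odd roll; each happens with probability 1/2.
bet = {3: Fraction(1, 2), -2: Fraction(1, 2)}
bet_value = expected_value(bet)
print(bet_value)  # 1/2
```

The same function works for any finite payoff distribution, so it can be reused for the later bets by changing only the dictionary.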
How to decide what to do
Do you take the bet?
Compare that £0.5 with not taking the bet, which has an (expected) value of £0. In expectation, taking the bet is the better choice.
How to decide what to do
£0.5 is not the value you will get.
You can think of it as the long run average if you were offered the bet many times.
Again, even after a large number of rounds you won’t get exactly that value (there will be some noise).
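The long-run-average reading can be illustrated with a small simulation. This is a sketch under assumptions: the function name `simulate_bet`, the fixed seed, and the round counts are all chosen here for illustration. The average payoff per round should move towards £0.5 as the number of rounds grows, without ever being guaranteed to hit it exactly.

```python
import random


def simulate_bet(rounds, seed=0):
    """Play the bet `rounds` times and return the average payoff per round."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    total = 0
    for _ in range(rounds):
        roll = rng.randint(1, 6)  # fair six-sided die
        total += 3 if roll % 2 == 0 else -2  # win £3 on even, pay £2 on odd
    return total / rounds


for rounds in (10, 1_000, 100_000):
    print(rounds, simulate_bet(rounds))
```

With few rounds the average can be far from 0.5; with many rounds it is typically close, which is the "noise" mentioned above.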
Sometimes the unlikely event can occur … it doesn’t mean the prediction was bad
(fivethirtyeight.com)
Example
Pacman is at a T-junction
Based on their knowledge, Pacman estimates that if they go Left:
• Probability of 0.3 of getting a payoff of 10
• Probability of 0.2 of getting a payoff of 1
• With the remaining probability, a payoff of -5
What is the expected value of Left?
$$E[X] = 0.3 \cdot 10 + 0.2 \cdot 1 + (1 - 0.3 - 0.2) \cdot (-5) = 3 + 0.2 - 2.5 = 0.7$$
How to decide what to do
Another bet: you get £1 if a 2 or a 3 is rolled, £5 if a six is rolled, and pay £3 otherwise.
What’s the expected value?
$$E[X] = \frac{2}{6} \cdot 1 + \frac{1}{6} \cdot 5 + \frac{3}{6} \cdot (-3) = -\frac{1}{3}$$
How to decide what to do
What happens if you repeat this bet 10 times? In each game you get £1 if a 2 or a 3 is rolled, £5 if a six is rolled, and pay £3 otherwise.
What’s the expected value now? (i.e., after all 10 games)
How to decide what to do
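Let $X_i$, $i \in \{1, 2, \dots, 10\}$, be the payoff of the $i$-th game, and let $X = \sum_{i=1}^{10} X_i$.
The expected value of a single game is:
$$E[X_i] = \frac{2}{6} \cdot 1 + \frac{1}{6} \cdot 5 + \frac{3}{6} \cdot (-3) = -\frac{1}{3}$$
Thus, by linearity of expectation (i.e., $E[\alpha Y + Z] = \alpha E[Y] + E[Z]$ for all $Y$, $Z$ and $\alpha$),
$$E[X] = E\left[\sum_{i=1}^{10} X_i\right] = \sum_{i=1}^{10} E[X_i] = 10 \cdot E[X_i] = -10 \cdot \frac{1}{3} = -\frac{10}{3}$$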
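The result can be cross-checked without using linearity by building the exact distribution of the total payoff over 10 games and taking its expected value directly. This is only a sketch: the helper names `expected_value` and `add_independent` are hypothetical, and the construction assumes the ten die rolls are independent (linearity itself does not need that assumption).

```python
from collections import defaultdict
from fractions import Fraction

# Payoff distribution of one game: £1 on a 2 or 3, £5 on a 6, -£3 otherwise.
single_game = {1: Fraction(2, 6), 5: Fraction(1, 6), -3: Fraction(3, 6)}


def expected_value(distribution):
    """Sum of value * probability over a {value: probability} mapping."""
    return sum(value * prob for value, prob in distribution.items())


def add_independent(dist_a, dist_b):
    """Distribution of the sum of two independent payoffs."""
    result = defaultdict(Fraction)
    for a, prob_a in dist_a.items():
        for b, prob_b in dist_b.items():
            result[a + b] += prob_a * prob_b
    return dict(result)


# Start from a certain payoff of 0 and add the ten games one by one.
total = {0: Fraction(1)}
for _ in range(10):
    total = add_independent(total, single_game)

print(expected_value(single_game))       # -1/3
print(expected_value(total))             # -10/3
print(10 * expected_value(single_game))  # -10/3
```

Both routes give the same value; linearity simply avoids having to construct the full distribution of the sum.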
How an agent might decide what to do
Consider an agent with a set of possible actions $A$. Each $a \in A$ has a set of possible outcomes $s_a$. Which action should the agent pick?
How an agent might decide what to do
The action $a^*$ that a rational agent should choose is the one that maximises the agent’s utility.
In other words, the agent should pick:
$$a^* = \arg\max_{a \in A} u(s_a)$$
where
• $s_a$ is the state obtained by choosing action $a$, and
• $u(s_a)$ is the utility of that state.
The problem is that in any realistic situation, the resulting state is probabilistic.
Instead we have to calculate the expected utility of each action and make the choice on the basis of that.
How an agent might decide what to do
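In other words, for each action $a$ with a set of outcomes $s_a$, the agent should calculate:
$$E[u(a)] = \sum_{s' \in s_a} u(s') \cdot \Pr(s_a = s')$$
and pick the best. Here: decide between $E[u(a_1)]$ and $E[u(a_2)]$.
[Figure: state $s$ with two actions, $a_1$ and $a_2$, each leading to several of the possible outcome states $s_1, \dots, s_6$]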
How an agent might decide what to do
That is, it picks the action that has the greatest expected utility.
• The right thing to do.
(40 Acres and a Mule Filmworks/Universal Pictures)
Here “rational” means “rational in the sense of maximising expected utility”.
Example
Pacman is at a T-junction
Based on their knowledge, Pacman estimates that if they go Left:
• Probability of 0.3 of getting a payoff of 10
• Probability of 0.2 of getting a payoff of 1
• Probability of 0.5 of getting a payoff of -5
If they go Right:
• Probability of 0.5 of getting a payoff of -5
• Probability of 0.4 of getting a payoff of 3
• Probability of 0.1 of getting a payoff of 15
Should they choose Left or Right (MEU)?
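One way to work through this example is to compute the expected utility of each action and take the arg max, as in the definition of $a^*$. The sketch below assumes payoffs are used directly as utilities. The names `expected_utility`, `actions`, and `best_action` are illustrative, and `Fraction` is used only to avoid floating-point rounding.

```python
from fractions import Fraction


def expected_utility(outcomes):
    """Sum of probability * payoff over a list of (probability, payoff) pairs."""
    return sum(prob * payoff for prob, payoff in outcomes)


# (probability, payoff) pairs for each action, taken from the example.
actions = {
    "Left": [(Fraction(3, 10), 10), (Fraction(2, 10), 1), (Fraction(5, 10), -5)],
    "Right": [(Fraction(5, 10), -5), (Fraction(4, 10), 3), (Fraction(1, 10), 15)],
}

utilities = {name: expected_utility(outcomes) for name, outcomes in actions.items()}
best_action = max(utilities, key=utilities.get)  # arg max over the actions

print(utilities["Left"])   # 7/10
print(utilities["Right"])  # 1/5
print(best_action)         # Left
```

Under these estimates Left has expected utility 0.7 and Right has 0.2, so an agent maximising expected utility goes Left.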
Stochastic
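Note that we are dealing with stochastic actions here.
[Figure: state $s$ with two actions, $a_1$ and $a_2$, each leading to several of the possible outcome states $s_1, \dots, s_6$]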
A given action has several possible outcomes.
We don’t know, in advance, which one will happen.
Stochastic
(fivethirtyeight.com)
A lot like life.
Limitations of our notion of “rational”
Consider the following game. Let’s say your monthly income is $m$.
With probability $2/3$, I double your income every month.
With the remaining probability, you have to give me your monthly income every month.
Expected value if playing: $E[\text{Income}] = 2m \cdot \frac{2}{3} + 0 \cdot \frac{1}{3} = \frac{4}{3}m$.
Expected value if not playing: $E[\text{Income}] = m$.
Would you play?
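The numbers behind this question can be laid out in a short sketch. The income value `m = 3000` is a hypothetical figure chosen only for illustration, and the helper name `play_outcomes` is likewise an assumption. The code computes the two expected values from the slide and, alongside them, the probability of ending up with no income at all.

```python
from fractions import Fraction


def play_outcomes(m):
    """Outcomes of playing, as (probability, monthly income) pairs."""
    return [(Fraction(2, 3), 2 * m), (Fraction(1, 3), 0)]


m = 3000  # hypothetical monthly income, for illustration only

expected_if_playing = sum(prob * income for prob, income in play_outcomes(m))
expected_if_not_playing = m
prob_no_income = sum(prob for prob, income in play_outcomes(m) if income == 0)

print(expected_if_playing)      # 4000
print(expected_if_not_playing)  # 3000
print(prob_no_income)           # 1/3
```

Playing has the higher expected income, yet one time in three it leaves you with nothing, which is why maximising expected value alone may not match what people would actually choose.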