CS计算机代考程序代写 Bayesian algorithm Agent-based Systems

Agent-based Systems
Paolo Turrini
™ www.dcs.warwick.ac.uk/~pturrini R p.turrini@warwick.ac.uk

Opponent Modelling
Aggressive Moves
Paolo Turrini Opponent Modelling Agent-based Systems

Plan for today
We have seen extensive games and backwards induction. Now we look at situations in which this does not work.
games are too big to be calculated
players have weaknesses
We will look at how to play aggressively: maximising the expected reward by tricking the opponents into playing bad moves.
If you wanna know more:
Davide Grossi and Paolo Turrini
Short Sight in Extensive Games
International Conference on Autonomous Agents and Multiagent Systems, 2012
Paolo Turrini
Computing Rational Decisions in Extensive Games with Limited Foresight
AAAI Conference on Advances on Artificial Intelligence, 2015
Paolo Turrini Opponent Modelling Agent-based Systems

Rethinking backwards induction
At the beginning of the twentieth century, Zermelo proved a proposition which can be interpreted, “chess is a trivial game.” (…) To see this we can use the backwards induction technique
Ariel Rubinstein
Modelling Bounded Rationality.
MIT Press, 1998;
Paolo Turrini Opponent Modelling Agent-based Systems

Rethinking backwards induction
1 2 1 2 (35,30)
(5, 5) (0, 15) (30, 10) (25, 35)
Paolo Turrini Opponent Modelling Agent-based Systems

Rethinking backwards induction
1 2 1 2 (35,30)
(5, 5) (0, 15) (30, 10) (25, 35)
Paolo Turrini Opponent Modelling Agent-based Systems

Rethinking backwards induction
1 2 1 2 (35,30)
(5, 5) (0, 15) (30, 10) (25, 35)
Paolo Turrini Opponent Modelling Agent-based Systems

Rethinking backwards induction
1 2 1 2 (35,30)
(5, 5) (0, 15) (30, 10) (25, 35)
Paolo Turrini Opponent Modelling Agent-based Systems

Rethinking backwards induction
1 2 1 2 (35,30)
(5, 5) (0, 15) (30, 10) (25, 35)
Paolo Turrini Opponent Modelling Agent-based Systems

Rethinking backwards induction
1 2 1 2 (35,30)
(5, 5) (0, 15) (30, 10) (25, 35)
Backwards Induction implicitly assumes:
that everyone is playing the same game that everyone is playing it rationally
that all of the above is common knowledge
Paolo Turrini Opponent Modelling Agent-based Systems

Rethinking backwards induction
But [in games like chess] this calculation requires going through a huge number of steps, something no human being can accomplish. 1
Modeling games with limited foresight remains a great challenge [and the frameworks studied thus far] fall short of capturing the spirit of limited- foresight reasoning.
Ariel Rubinstein
Modelling Bounded Rationality.
MIT Press, 1998;
1Nor supercomputers for that matter.
Paolo Turrini Opponent Modelling Agent-based Systems

Calculating Variations
8 7 6 5 4 3 2 1
0Z0Z0j0Z
Z0Z0ZPZ0
0Z0ZPZ0Z
Z0Z0Z0J0
0Z0Z0Z0Z
ZpZ0Z0Z0
0Z0Z0Z0Z
Z0Z0Z0Z0
abcdefgh
Paolo Turrini
Opponent Modelling
Agent-based Systems

Calculating Variations
8 7 6 5 4 3 2 1
0Z0Z0j0Z
Z0Z0ZPZ0
0Z0ZPZ0Z
Z0Z0Z0J0
0Z0Z0Z0Z
Z0Z0Z0Z0
0o0Z0Z0Z
Z0Z0Z0Z0
abcdefgh
Paolo Turrini
Opponent Modelling
Agent-based Systems

Calculating Variations
8 7 6 5 4 3 2 1
0Z0Z0j0Z
Z0Z0ZPZ0
0Z0ZPJ0Z
Z0Z0Z0Z0
0Z0Z0Z0Z
Z0Z0Z0Z0
0o0Z0Z0Z
Z0Z0Z0Z0
abcdefgh
Paolo Turrini
Opponent Modelling
Agent-based Systems

Calculating Variations
8 7 6 5 4 3 2 1
0Z0Z0j0Z
Z0Z0ZPZ0
0Z0ZPJ0Z
Z0Z0Z0Z0
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z
ZqZ0Z0Z0
abcdefgh
Paolo Turrini
Opponent Modelling
Agent-based Systems

Calculating Variations
8 7 6 5 4 3 2 1
0Z0Z0j0Z
Z0Z0OPZ0
0Z0Z0J0Z
Z0Z0Z0Z0
0Z0Z0Z0Z
Z0Z0Z0Z0
0Z0Z0Z0Z
ZqZ0Z0Z0
abcdefgh
Paolo Turrini
Opponent Modelling
Agent-based Systems

Short Sight
Black loses for two reasons:
Partial view of the game Wrong evaluation
Chess-like games:
Limited foresight
Heuristic assessment of intermediate positions
Paolo Turrini Opponent Modelling Agent-based Systems

Calculating Variations
8 7 6 5 4 3 2 1
0Z0Z0j0Z
Z0Z0ZPZ0
0Z0ZPZ0Z
Z0Z0Z0Z0
0Z0Z0Z0J
ZpZ0Z0Z0
0Z0Z0Z0Z
Z0Z0Z0Z0
abcdefgh
Paolo Turrini Opponent Modelling Agent-based Systems

Calculating Variations
8 7 6 5 4 3 2 1
White is lost
0Z0Z0j0Z
Z0Z0ZPZ0
0Z0ZPZ0Z
Z0Z0Z0Z0
0Z0Z0Z0J
ZpZ0Z0Z0
0Z0Z0Z0Z
Z0Z0Z0Z0
abcdefgh
Paolo Turrini Opponent Modelling Agent-based Systems

Calculating Variations
8 7 6 5 4 3 2 1
Should he or she resign?
0Z0Z0j0Z
Z0Z0ZPZ0
0Z0ZPZ0Z
Z0Z0Z0Z0
0Z0Z0Z0J
ZpZ0Z0Z0
0Z0Z0Z0Z
Z0Z0Z0Z0
abcdefgh
Paolo Turrini Opponent Modelling Agent-based Systems

The idea
White should not resign if (s)he believes that Black won’t be able to solve the problem posed by White after Kg5!
Kg5 is technically as good as any other move, but it is aggressive and might turn out to be winning.
Paolo Turrini Opponent Modelling Agent-based Systems

Outline
1 The formal setup
2 An intuitive solution
3 Discussion
Paolo Turrini Opponent Modelling Agent-based Systems

The formal setup
Extensive Games
Definition
An extensive form game is a tuple (N, H, t, Σi , o, ≽i ) where: N is a set of players
H is a prefix-closed set of histories (sequences of actions);
t is a turn function, assigning a player to each non terminal history;
Σi is a set of strategies available to i i.e., functions that assign an action to each history h with turn(h) = i;
o is an outcome function assigning to each strategy profile σ ∈ 􏰣i∈N Σi the induced terminal history;
≽i is a total preorded on the set of terminal histories, i.e., the preference of i;
Paolo Turrini Opponent Modelling Agent-based Systems

The formal setup
An extensive game
1
50:50 70:30 222
Acc Rej Acc Rej Acc Rej (5,5) (0,0) (7,3) (0,0) (9,1) (0,0)
90:10
Paolo Turrini Opponent Modelling Agent-based Systems

The formal setup
Sight function
Definition (Sight function)
For an extensive game G a (short) sight function is a function s associating to each non-terminal history h a finite, non-empty and prefix-closed set of histories in H extending h.
The idea: at each decision point h, s(h) is the set of continuations that the player is considering before taking a decision;
Paolo Turrini Opponent Modelling Agent-based Systems

The formal setup
A sight
b(A) de
f gB
A
C
Paolo Turrini
Opponent Modelling
Agent-based Systems

The formal setup
Histories sequences
Definition (Histories-Sequences)
Let G be an extensive game and s a sight function. A histories-sequence q is a sequence of histories of the form (h0, h1, h2, · · · , hk ) such that
hj ∈ s(h0) for every j ∈ {1, 2, · · · k}, i.e., histories following h0 in the sequence are histories within the sight of the player moving at h0;
hj ▹hj+1 for each j with 0 􏰤 j < k, i.e., each history is a prefix of the ones with higher index; They encode sight-compatible higher-order points of view: (h0, h1, h2, · · · , hk ) is what turn(h0) thinks turn(h1) thinks turn(h2) thinks... Notice there can be jumps! hj+1 need not be hj plus one action. Paolo Turrini Opponent Modelling Agent-based Systems The formal setup A sight b(A) de f gB A Question: What are the possible histories-sequences? C Paolo Turrini Opponent Modelling Agent-based Systems The formal setup Belief Structures Definition (Epistemic games with short sight) Let G be a game. To each histories-sequence q = (h0, h1, h2, · · · , hk ) we associate an extensive game G(q), that is a subtree of s(h0) following hk; Intuitively, G(q) is what turn(h0) thinks turn(h1) thinks turn(h2) thinks... turn(hk) can see and how turn(h0) thinks turn(h1) thinks turn(h2) thinks... turn(hk) evaluates it; Notice that the preference relation on the terminal nodes of G(q) is completely independent of the preference relation in the underlying games. Paolo Turrini Opponent Modelling Agent-based Systems The formal setup A sight with beliefs b(A) de b(A)b(B) f C gB A Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Rethinking Backwards Induction b(A) de f gB A A can only see b(A) She can evaluate its terminal nodes C Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Rethinking Backwards Induction b(A) de f gB A A needs a belief about what C can see, b(A)b(C) ... and about how C evaluates those terminal nodes C Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Rethinking Backwards Induction b(A) de f gB A Then she can start solving the game; C Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Rethinking Backwards Induction b(A) de C gB A b(A)b(B) f To determine what B will do A has a belief about what B can see b(A)b(B) ... and about B evaluates those terminal nodes Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Rethinking Backwards Induction b(A) de C gB A b(A)b(B) f But on top of that A needs a belief about what B believes C can see ... and how B believes C evaluates those terminal nodes Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Rethinking Backwards Induction b(A) de b(A)b(B) f and then she keeps solving the game... C gB A Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Rethinking Backwards Induction b(A) de b(A)b(B) f and then she keeps solving the game... C gB A Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Rethinking Backwards Induction b(A) de f gB A and, at last, A will choose her favourite continuation, given her beliefs and her evaluation. C Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Sight-compatible epistemic solution A Sight-Compatible Epistemic Solution is the composition of best moves of players at each history; Each such move is a best response to what the current player believes other players will do; Each such belief is supported by all higher-order beliefs, compatible with the player’s sight, about what the opponents can perceive and how they evaluate it. Paolo Turrini Opponent Modelling Agent-based Systems An intuitive solution Complexity Proposition Computing a SCES is PTIME-complete, on the size of the (finite) game description. The algorithm constructs a SCES in PTIME; It’s PTIME hard because backwards induction is PTIME complete (Szymanik 2014), and we can show that backwards induction is a special SCES. Paolo Turrini Opponent Modelling Agent-based Systems Discussion Monte-Carlo Tree Search Selecting random endgames Inducing a preference relation over non-terminal histories Paolo Turrini Opponent Modelling Agent-based Systems Discussion Limited Foresight and MCTS Suppose we are playing Connect 4 and our opponent has a fixed sight: they can see either 1,2,3 or 4 steps ahead. Then we can start saying things like: "if she saw 4 steps ahead she wouldn’t have made this move" Using Monte-Carlo Tree Search as a way of evaluate non-terminal histories, Cataldo Azzariti showed that MCTS+Limited Foresight+Bayesian Updates can make accurate predictions and plays better than MCTS only. Cataldo Azzariti Adaptive MCTS Game-Play Algorithms Masters Thesis, Imperial College London (2016). Ongoing research line, with plenty of open questions to answer. Paolo Turrini Opponent Modelling Agent-based Systems Discussion Discussion Beyond backwards induction Players try to exploit their opponents’ believed weaknesses Built on complex beliefs about game restrictions relativised backwards induction no information sets What else? Evaluation criteria the way we do it are still caveman modelling More restrictions and reflection on the computational properties Is checking that h is SCES also PTIME complete, can fixpoint logics be of any help? It’s unawareness (Halpern, Rêgo, Heifets, Meier, Schipper), but what kind? Specific restrictions allow SCES! Paolo Turrini Opponent Modelling Agent-based Systems