
Agent-based Systems
Paolo Turrini
www.dcs.warwick.ac.uk/~pturrini · p.turrini@warwick.ac.uk

Plan for Today
In this second lecture on extensive games, we are going to introduce the notion of imperfect information:
uncertainty about the current state of the game being played
translation to (and now also from) the normal form
imperfect-information games with perfect recall
subtle differences between mixed and behavioural strategies
Kuhn’s Theorem: condition under which they coincide
Computer Poker: example for an application of these concepts
Much of this (and more) is also covered in Chapter 5 of the Essentials.
K. Leyton-Brown and Y. Shoham.
Essentials of Game Theory: A Concise, Multidisciplinary Introduction.
Morgan & Claypool Publishers, 2008. Chapter 5.
Paolo Turrini Imperfect Information Agent-based Systems

Terminology: Incomplete vs. Imperfect Information
Distinguish imperfect information from incomplete information:
Incomplete information: uncertainty about the game itself, such as the utilities of other players (→ Bayesian games)
Example: Chess in practice (you don’t know what your opponent can see)
Imperfect information: uncertainty about the current state of play, such as the actions taken by others (→ today’s games)
Example: Poker (you don’t know what cards your opponent drew)

Extensive Games with Imperfect Information
An imperfect-information game (in extensive form) is a tuple ⟨N,A,H,Z,i,A,σ,u,∼⟩, where ⟨N,A,H,Z,i,A,σ,u⟩ is a (finite) extensive-form game (with perfect information), i.e.,
N = {1,…,n} is a finite set of players;
A is a (single) set of actions;
H is a set of choice nodes (non-leaf nodes of the tree);
Z is a set of outcome nodes (leaf nodes of the tree);
i : H → N is the turn function, fixing whose turn it is when;
A : H → 2^A is the action function, fixing the playable actions;
σ : H × A → H ∪ Z is the (injective) successor function;
u = (u1,…,un) is a profile of utility functions ui : Z → R;
and ∼ = (∼1,…,∼n) is a profile of equivalence relations ∼i on H for which h ∼i h′ implies i(h) = i(h′) and A(h) = A(h′) for all h, h′ ∈ H.
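The tuple above can be sketched as plain Python data. This is a minimal illustration only: the node names, the two-player tree, and the `well_formed` check are invented, not part of the lecture.

```python
# Minimal sketch of (part of) the imperfect-information game tuple.
# All node names and numbers are invented for illustration.

H = {"h0", "h1", "h2"}                     # choice nodes
turn = {"h0": 1, "h1": 2, "h2": 2}         # i : H -> N
actions = {"h0": {"L", "R"},               # A : H -> 2^A
           "h1": {"a", "b"}, "h2": {"a", "b"}}

# ~_i given as a partition of each player's nodes into information sets
info_sets = {1: [{"h0"}], 2: [{"h1", "h2"}]}

def well_formed(info_sets, turn, actions):
    """Check that h ~_i h' implies i(h) = i(h') and A(h) = A(h')."""
    for player, cells in info_sets.items():
        for cell in cells:
            if len({turn[h] for h in cell}) > 1:
                return False   # two different turns inside one information set
            if len({frozenset(actions[h]) for h in cell}) > 1:
                return False   # differing action sets inside one information set
    return True

print(well_formed(info_sets, turn, actions))  # -> True
```

The check is exactly the constraint stated in the definition: within an information set, turn and action set must agree.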

Modelling Imperfect Information
We use ∼i to relate states of the game that player i cannot distinguish.
Write Hi := {h ∈ H | i(h) = i} for the set of choice nodes in which it is player i’s turn. We only really need to know ∼i on Hi (not on H \ Hi).
Thus, for every player i ∈ N, ∼i is an equivalence relation on Hi such that h ∼i h′ implies i(h) = i(h′) and A(h) = A(h′) for all h, h′ ∈ Hi .
Remark: The constraints on i and A are needed to ensure genuine indistinguishability. Otherwise, you might infer extra information from i or A.

Example
An imperfect-information game where Player 1 cannot tell apart the two lower choice nodes in which it is her turn:
[Game tree: Player 1 chooses L or R at the root. R ends the game with payoff (1,1). After L, Player 2 chooses A or B, each leading to a choice node of Player 1; these two nodes are related by ∼1. In both, Player 1 chooses X or Y, with payoffs (0,0) and (2,4) after A, and (2,4) and (0,0) after B.]
Remark: In later examples we will omit the subscript in ∼1, as it is always clear from
context who is uncertain.

Equivalence Classes
The indistinguishability relation ∼i partitions the space Hi . Notation:
The set of all choice nodes that are indistinguishable from node h as far as player i is concerned (equivalence class): [h]∼i := {h′ ∈ Hi | h ∼i h′}
The set of all such equivalence classes for player i (quotient set): Hi/∼i := {[h]∼i | h ∈ Hi}
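Both notions are easy to compute. A minimal Python sketch, with invented node names and an invented helper `quotient` (it assumes the relation passed in really is an equivalence relation):

```python
# Computing [h]~i and the quotient set Hi/~i from an indistinguishability
# relation given as a set of pairs. Node names are illustrative.

def quotient(nodes, related):
    """Partition `nodes` into equivalence classes, where related(g, h)
    decides g ~ h (assumed to be an equivalence relation)."""
    classes = []
    for h in nodes:
        for cell in classes:
            if related(h, next(iter(cell))):
                cell.add(h)     # h joins the class of its representative
                break
        else:
            classes.append({h})  # h starts a new class
    return classes

pairs = {("h1", "h2"), ("h2", "h1")}   # player 1 cannot tell h1 from h2
related = lambda g, h: g == h or (g, h) in pairs

print(quotient(["h0", "h1", "h2"], related))  # h0 alone; h1 and h2 together
```
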

Pure Strategies
A pure strategy for player i maps any given equivalence class [h]∼i to an action that is playable in (all of!) the choice nodes in [h]∼i .
So: a function αi : Hi/∼i → A with αi([h]∼i) ∈ A(h′) for all h′ ∈ [h]∼i.

Alternative Definition of Pure Strategies
Conceptually, pure strategies map equivalence classes to actions.
But mathematically, we could just as well choose to define pure strategies as mapping choice nodes to actions:
A pure strategy is a function αi : Hi → A with αi(h) ∈ A(h) for all h ∈ Hi, such that αi(h) = αi(h′) whenever h ∼i h′.
Thus, we can think of an imperfect-information game as a standard extensive game where certain strategies are not permitted.
Remark: This does not mean that imperfect-information games are special cases of extensive games; rather, the opposite is true: every perfect-information game is the special case in which each ∼i is the identity relation.

Translation to Normal Form
We can translate imperfect-information games into normal-form games, just as we did for (perfect-information) extensive games.
Clear once you understand that imperfect-information games are just extensive games with restricted pure strategies (→ previous slide).
Thus: full machinery available (mixed Nash equilibria, . . . ).
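As a concrete sketch of this translation, the example game from earlier (Player 1 cannot tell apart the two lower choice nodes) induces a 4 × 2 normal form. The encoding below is an invented illustration of that construction:

```python
# Inducing the normal form of the earlier example game. A leaf is reached
# via (Player 1's root move, Player 2's move, Player 1's second move);
# R at the root ends the game immediately with payoff (1, 1).
leaf = {("L", "A", "X"): (0, 0), ("L", "A", "Y"): (2, 4),
        ("L", "B", "X"): (2, 4), ("L", "B", "Y"): (0, 0)}

# Player 1 picks one action per information set: the root {L, R} and the
# class of the two indistinguishable nodes (a single shared X or Y).
p1_strategies = [(m, x) for m in "LR" for x in "XY"]
p2_strategies = ["A", "B"]

normal_form = {}
for m, x in p1_strategies:
    for a in p2_strategies:
        normal_form[((m, x), a)] = (1, 1) if m == "R" else leaf[("L", a, x)]

print(normal_form[(("L", "X"), "B")])  # -> (2, 4)
print(normal_form[(("R", "Y"), "A")])  # -> (1, 1)
```

Note how the restriction to ∼-respecting pure strategies is what keeps Player 1 down to four strategies rather than eight.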

Translation from Normal Form
We have seen that not every normal-form game can be translated into a corresponding (perfect-information) extensive game . . .
In contrast, every NF-game can be translated into an II-game. Use indistinguishability relation to “hide” sequential nature. Example:
Normal form:

        C             D
C   (−10, −10)    (−25, 0)
D   (0, −25)      (−20, −20)

[Game tree: Row chooses C or D at the root; each choice leads to a choice node of Col, and these two nodes are related by ∼. There Col chooses C or D, and the four leaves carry the payoffs from the matrix above.]

Exercise: High-Risk/Low-Risk Gamble
Consider the following imperfect-information game:
[Game tree: Player 1 chooses High or Low at the root. After either choice, Player 2 chooses Safe or Gamble; Safe yields (5,5) in both cases. The two choice nodes following Gamble belong to Player 1 and are related by ∼; there she chooses Left or Right, with payoffs (9,1) and (1,9) after High, and (4,6) and (6,4) after Low.]
This meets our definitions. Still: what’s “wrong” with this game?

Perfect Recall
In the game on the previous slide, in the final round Player 1 is assumed to forget the action she played in the first round . . .
An imperfect-information game has perfect recall, if h ∼i h0 implies h = h0 for the root node h0 and all players i ∈ N, and if the following hold for all i ∈ N, all choice nodes h,h′ ∈ H, and all actions a,a′ ∈ A:
(i) if σ(h, a) ∼i σ(h′, a′), then h ∼i h′
(ii) if σ(h,a) ∼i σ(h′,a′) and i(h) = i, then a = a′
Thus, in a perfect-recall game no player i can resolve indistinguishability by inspecting the history of (i) nodes visited or (ii) actions she played.
Note: Every perfect-information extensive game has perfect recall.
Remark: Games w/o perfect recall can make sense in certain contexts. Think of having different agents play on your behalf in different rounds and suppose communication between them is limited.
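A convenient way to test perfect recall is to compare players’ “experiences”: the sequence of a player’s own information sets visited and own actions taken along the path from the root. Indistinguishable nodes must yield the same experience. A minimal Python sketch, encoding the High/Low exercise game with invented node and information-set names:

```python
# Perfect-recall check via player experiences. The tree encodes the
# High/Low exercise game; all labels are invented for illustration.

parent = {"hH": ("h0", "High"), "hL": ("h0", "Low"),
          "g1": ("hH", "Gamble"), "g2": ("hL", "Gamble")}
turn = {"h0": 1, "hH": 2, "hL": 2, "g1": 1, "g2": 1}
info_set = {"h0": "I0", "hH": "I1", "hL": "I2", "g1": "I3", "g2": "I3"}

def experience(h, player):
    """Sequence of (own information set, own action) on the path to h."""
    trace = []
    while h in parent:
        h, action = parent[h]
        if turn[h] == player:
            trace.append((info_set[h], action))
    return list(reversed(trace))

# g1 ~ g2 for Player 1, yet her own first move differs (High vs Low):
print(experience("g1", 1))  # -> [('I0', 'High')]
print(experience("g2", 1))  # -> [('I0', 'Low')]
# Different experiences within one information set: no perfect recall.
```
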

Mixed vs. Behavioural Strategies
Let Hi/∼i = {[hi¹]∼i , … , [hiᵐ]∼i} be the set of equivalence classes of choice nodes where it is player i’s turn.
Recall: A pure strategy for i is some αi ∈ A(hi¹) × ⋯ × A(hiᵐ).
Thus: A mixed strategy for i is some si ∈ Π(A(hi¹) × ⋯ × A(hiᵐ)).
Clean definition. Nice. But are mixed strategies the right concept? More natural to assume players mix locally in each choice node . . .
A behavioural strategy for i is some si ∈ Π(A(hi¹)) × ⋯ × Π(A(hiᵐ)).
Issue: Can we work with behavioural instead of mixed strategies?
Definition: Two strategies for player i are called outcome-equivalent if, for every partial profile of pure strategies of the other players, the induced probability distributions over outcomes are the same.
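To see concretely what outcome-equivalence asks for, here is a minimal Python sketch computing outcome distributions in the forgetful one-player game discussed on the next slide (Ann moves twice and forgets her first move); the encoding is invented:

```python
# A behavioural strategy induces a *product* distribution over the two
# moves, while a mixed strategy can correlate them.
from itertools import product

def behavioural_outcomes(p, q):
    """p = P(L on the first move), q = P(L on the second move)."""
    return {(a, b): (p if a == "L" else 1 - p) * (q if b == "L" else 1 - q)
            for a, b in product("LR", repeat=2)}

mixed = {("L", "L"): 0.5, ("R", "R"): 0.5}   # mass only on LL and RR

print(mixed)
print(behavioural_outcomes(0.5, 0.5))        # every pair gets mass 0.25
```

Any (p, q) strictly between 0 and 1 puts positive mass on (L, R), and any degenerate (p, q) puts all mass on a single pair; neither matches the 50/50 correlated distribution of `mixed`.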

Example: Mixed ≠ Behavioural Strategy
In this one-player game, Ann is asked to play two actions in sequence and assumed to forget what she did after the first action got played. She wins (utility 1) if she chooses the same action twice.
[Game tree: Ann chooses L or R; both successors are again Ann’s choice nodes and are related by ∼, so she has forgotten her first move. There she again chooses L or R. Utilities: 1 after LL and after RR, 0 after LR and after RL.]
Pure strategies: LL, LR, RL, RR
Suppose we want a non-pure strategy that maximises expected utility:
Mixed strategy (1/2, 0, 0, 1/2) does the job: EU = 1/2 · 1 + 1/2 · 1 = 1
Any behavioural strategy ((p, 1−p), (q, 1−q)) must specify probabilities p to choose L first and q to choose L second:
EU = p · q + (1−p) · (1−q) = 1 + 2pq − p − q < 1, unless p = q = 0 or p = q = 1
So only the degenerate (pure) behavioural strategies reach EU = 1, and no behavioural strategy induces the same 50/50 distribution over LL and RR as the mixed strategy above.

Another Example: Mixed ≠ Behavioural Strategy
Recall: Just now we saw that there exist mixed strategies that do not admit any outcome-equivalent behavioural strategy.
It gets worse. Here is another one-player game:
[Game tree: At the root Ann chooses L or R; R ends the game with utility 0. After L it is again Ann’s turn, at a node related to the root by ∼, and she chooses L (utility 0) or R (utility 1).]
So Ann wins if she manages to play two distinct actions. But she forgets what (and whether!) she played.
Her pure strategies are L and R (more precisely, functions H/∼ → {L, R}).
Playing the behavioural strategy (1/2, 1/2), she wins 25% of the time.
Playing whatever mixed strategy (picking L or R), she never wins.
Thus: There exist behavioural strategies that do not admit any outcome-equivalent mixed strategy. End of story?

Kuhn’s Theorem
Good news:
Theorem (Kuhn, 1953). In a (finite) imperfect-information game with perfect recall, for any given mixed strategy of a given player, there exists an outcome-equivalent behavioural strategy for the same player.
Proof on next slide.
Remark: The converse holds as well (and sometimes is considered part of Kuhn’s Theorem). Proof omitted.
Thus: We can freely move between our two types of randomised strategies. Nice.
H.W. Kuhn (1925–2014).
Extensive Games and the Problem of Information.
In: H.W. Kuhn and A.W. Tucker (eds.), Contributions to the Theory of Games II, 1953.

Proof
Claim: For any perfect-recall game, for any given mixed strategy si of player i we can find an outcome-equivalent behavioural strategy s⋆i.
Proof: We need to fix s⋆i(h)(a) for any h ∈ Hi and a ∈ A(h) . . .
Let p(si, h) denote the probability of passing through h when player i plays si and the others play pure strategies consistent with reaching h. Let p(si, σ(h, a)) be defined analogously.
Fix: s⋆i(h)(a) := p(si, σ(h, a)) / p(si, h)   [ and s⋆i(h)(a) := 1/|A(h)| if p(si, h) = 0 ]
Clear: probabilities add up to 1 in each node: Σ a∈A(h) s⋆i(h)(a) = 1 ✓
But to be a well-defined behavioural strategy, s⋆i also must respect ∼i: h ∼i h′ ⇒ s⋆i(h)(a) = s⋆i(h′)(a)
Due to perfect recall, the actions played by i on the path to h are the same as those on the path to h′. Nothing else affects probabilities. ✓

Application: Computer Poker
The well-known recent work on building an intelligent agent to play Poker (for two players and a simplified set of rules) models the game as an extensive imperfect-information game and attempts to compute Nash equilibria in behavioural strategies.
The main challenge is in dealing with the sheer size of such games. This is tackled using a mix of game theory, abstraction techniques, combinatorial optimisation, and machine learning.
T.W. Sandholm. Solving Imperfect-Information Games. Science, 347(6218):122–123, 2015.

Summary
This concludes our review of extensive games in general and of imperfect-information games in particular:
model: basic extensive games + indistinguishability relation
same expressive power as normal form: translation possible
behavioural strategies more natural than usual mixed strategies
Kuhn’s Theorem: for imperfect-information games of perfect recall, we can rewrite any mixed strategy as a behavioural strategy
What next? Cracking poker with Counterfactual Regret Minimisation.
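As a closing illustration, the construction from the proof of Kuhn’s Theorem can be sketched in code. The tiny perfect-recall game, node names, and probability numbers below are all invented: Ann moves L/R at h0 and then l/r at hL or hR, remembering her first move, so a pure strategy is a triple (move at h0, move at hL, move at hR).

```python
# Sketch of s*(h)(a) := p(s, sigma(h, a)) / p(s, h), with the uniform
# fallback 1/|A(h)| when h is unreachable under s.

def reach_prob(mixed, consistent):
    """p(s_i, h): total weight of pure strategies consistent with h."""
    return sum(w for pure, w in mixed.items() if consistent(pure))

def behavioural_at(mixed, at_h, at_h_then_a, n_actions):
    num, den = reach_prob(mixed, at_h_then_a), reach_prob(mixed, at_h)
    return num / den if den > 0 else 1 / n_actions  # uniform if unreachable

mixed = {("L", "l", "l"): 0.3, ("L", "r", "l"): 0.3, ("R", "l", "r"): 0.4}

# s*(hL)(l): condition on reaching hL (first move L), then playing l there
p_l_at_hL = behavioural_at(
    mixed,
    at_h=lambda s: s[0] == "L",
    at_h_then_a=lambda s: s[0] == "L" and s[1] == "l",
    n_actions=2)

print(p_l_at_hL)  # -> 0.5  (= 0.3 / 0.6)
```

Perfect recall guarantees that this local probability is the same at every node of an information set, which is what makes the resulting behavioural strategy well defined.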