Algorithmic Game Theory and Applications
Lecture 10: Games in Extensive Form
Copyright By PowCoder代写 加微信 powcoder
the setting and motivation
Most games in “real life” are not in “strategic form”: players don’t pick their entire strategies independently (“simultaneously”). Instead, the game transpires over time, with players making “moves” to which other players react with “moves”, etc. Examples: chess, poker, bargaining, dating, . . .
the setting and motivation
Most games in “real life” are not in “strategic form”: players don’t pick their entire strategies independently (“simultaneously”). Instead, the game transpires over time, with players making “moves” to which other players react with “moves”, etc. Examples: chess, poker, bargaining, dating, . . .
A “game tree” looks something like this:
Player II:
But we may also need some other “features”.
chance, information, etc.
Some tree nodes may be chance (probabilistic) nodes, controlled by no player (by “nature”). (Poker, Backgammon.) Also, a player may not be able to distinguish between several of its “positions” or “nodes”, because not all information is available to it. (Think Poker, with opponent’s cards hidden.) Whatever move a player employs at a node must be employed at all nodes in the same “information set”.
Player II:
information−set LRLRLRLR
To define extensive form games, we have to formalize all these.
Trees: a formal definition
Let Σ = {a1,a2,…,ak} be an alphabet. A tree over Σ is a setT⊆Σ∗,ofnodesw∈Σ∗ suchthat:ifw=w′a∈T, then w′ ∈ T. (I.e., it is a prefix-closed subset of Σ∗.)
For a node w ∈ T, the children of w are
ch(w) = {w′ ∈ T | w′ = wa , for some a ∈ Σ}. For w ∈ T, let Act(w) = {a ∈ Σ | wa ∈ T} be “actions” available at w.
A leaf (or terminal) node w ∈ T is one where ch(w) = ∅. LetLT ={w∈T|waleaf}.
A (finite or infinite) path π in T is a sequence
π = w0,w1,w2,… of nodes wi ∈ T, where if wi+1 ∈ T then wi+1 = wia, for some a ∈ Σ. It is a complete path (or play) if w0 = ε and every non-leaf node in π has a child in π.
Let ΨT denote the set of plays of T.
Tree example
games in extensive form
A Game in Extensive Form, G, consists of: 1. A set N = {1,…,n} of players.
games in extensive form
A Game in Extensive Form, G, consists of:
1. A set N = {1,…,n} of players.
2. A tree T, called the game tree, over some Σ.
games in extensive form
A Game in Extensive Form, G, consists of:
1. A set N = {1,…,n} of players.
2. A tree T, called the game tree, over some Σ.
3. A partition of the tree nodes T into disjoint sets Pl0,Pl1,…,Pln, such that T = ni=0 Pli. Where Pli, i ≥ 1, denotes the nodes of player i, where it is player i’s turn to move. (And Pl0 denotes the “chance”/”nature” nodes.)
games in extensive form
A Game in Extensive Form, G, consists of:
1. A set N = {1,…,n} of players.
2. A tree T, called the game tree, over some Σ.
3. A partition of the tree nodes T into disjoint sets Pl0,Pl1,…,Pln, such that T = ni=0 Pli. Where Pli, i ≥ 1, denotes the nodes of player i, where it is player i’s turn to move. (And Pl0 denotes the “chance”/”nature” nodes.)
4. For each “nature” node, w ∈ Pl0, a probability distribution qw : Act(w) → R over its actions: qw(a) ≥ 0, &
q (a)=1. a∈Act(w ) w
games in extensive form
A Game in Extensive Form, G, consists of:
1. A set N = {1,…,n} of players.
2. A tree T, called the game tree, over some Σ.
3. A partition of the tree nodes T into disjoint sets
Pl0,Pl1,…,Pln, such that T = ni=0 Pli. Where Pli, i ≥ 1,
denotes the nodes of player i, where it is player i’s turn to
move. (And Pl0 denotes the “chance”/”nature” nodes.)
4. For each “nature” node, w ∈ Pl0, a probability distribution
qw : Act(w) → R over its actions: qw(a) ≥ 0, &
q (a)=1. a∈Act(w ) w
5. For each player i ≥ 1, a partition of Pli into disjoint
non-empty information sets Infoi ,1 , . . . , Infoi ,ri , such that
Pli = ri Infoi,j. Moreover, for any i,j, & all nodes j=0
w,w′ ∈ Infoi,j, Act(w) = Act(w′). (I.e., the set of possible “actions” from all nodes in one information set is the same.)
games in extensive form
A Game in Extensive Form, G, consists of:
1. A set N = {1,…,n} of players.
2. A tree T, called the game tree, over some Σ.
3. A partition of the tree nodes T into disjoint sets
Pl0,Pl1,…,Pln, such that T = ni=0 Pli. Where Pli, i ≥ 1,
denotes the nodes of player i, where it is player i’s turn to
move. (And Pl0 denotes the “chance”/”nature” nodes.)
4. For each “nature” node, w ∈ Pl0, a probability distribution
qw : Act(w) → R over its actions: qw(a) ≥ 0, &
q (a)=1. a∈Act(w ) w
5. For each player i ≥ 1, a partition of Pli into disjoint
non-empty information sets Infoi ,1 , . . . , Infoi ,ri , such that
Pli = ri Infoi,j. Moreover, for any i,j, & all nodes j=0
w,w′ ∈ Infoi,j, Act(w) = Act(w′). (I.e., the set of possible “actions” from all nodes in one information set is the same.) 6. For each player i, a function ui : ΨT → R, from (complete) plays to the payoff for player i.
explanation and comments
Question: Why associate payoffs to “plays” rather than to leaves at the “end” of play?
explanation and comments
Question: Why associate payoffs to “plays” rather than to leaves at the “end” of play?
Answer: We in general allow infinite trees. We will later consider “infinite horizon” games in which play can go on for ever. Payoffs are then determined by the entire history of play. For “finite horizon” games, where tree T is finite, it suffices to associate payoffs to the leaves, i.e., ui : LT → R.
explanation and comments
Question: Why associate payoffs to “plays” rather than to leaves at the “end” of play?
Answer: We in general allow infinite trees. We will later consider “infinite horizon” games in which play can go on for ever. Payoffs are then determined by the entire history of play. For “finite horizon” games, where tree T is finite, it suffices to associate payoffs to the leaves, i.e., ui : LT → R.
We defined our alphabet of possible actions Σ to be finite, which is generally sufficient for our purposes. In other words, the tree is finitely branching. In more general settings, even the set of possible actions at a given node can be infinite.
explanation and comments
Question: Why associate payoffs to “plays” rather than to leaves at the “end” of play?
Answer: We in general allow infinite trees. We will later consider “infinite horizon” games in which play can go on for ever. Payoffs are then determined by the entire history of play. For “finite horizon” games, where tree T is finite, it suffices to associate payoffs to the leaves, i.e., ui : LT → R.
We defined our alphabet of possible actions Σ to be finite, which is generally sufficient for our purposes. In other words, the tree is finitely branching. In more general settings, even the set of possible actions at a given node can be infinite.
Later, we will focus on the following class of games: Definition An extensive form game G is called a game of perfect information, if every information set Infoi,j contains only 1 node.
pure strategies
A pure strategy si for player i in an extensive game G is a function si : Pli → Σ that assigns actions to each of player i’s nodes, such that si(w) ∈ Act(w), & such that if
w,w′ ∈ Infoi,j, then si(w) = si(w′).
Let Si be the set of pure strategies for player i.
Given pure profile s = (s1, . . . , sn) ∈ S1 × . . . × Sn,
if there are no chance nodes (i.e., Pl0 = ∅) then s uniquely determines a play πs of the game: players move according their strategies:
Initialize j := 0, and w0 := ε;
While (wj is not at a terminal node)
If wj ∈ Pli, then wj+1 := wj si(wj);
j := j + 1; πs =w0,w1,…
What if there are chance nodes?
pure strategies and chance
If there are chance nodes, then s ∈ S determines a
probability distribution over plays π of the game.
For finite extensive games, where T is finite, we can calculate
the probability ps (π) of play π, using probabilities qw (a):
Suppose π = w0,…,wm, is a play of T. Suppose further that
for each j < m, if wj ∈ Pli, then wj+1 = wj si(wj). Otherwise,
let ps(π) = 0.
Let wj1,...,wjr be the chance nodes in π, and suppose, for
each k = 1,...,r, wjk+1 = wjk ajk , i.e., the required action to
get from node wjk to node wjk +1 is ajk . Then r
ps (π) := qwjk (ajk ) k=1
For infinite extensive games, defining these distributions in general requires much more elaborate definitions (proper “measure theoretic” probability). We will avoid the heavy stuff.
chance and expected payoffs
For a finite extensive game, given pure profile
s = (s1, . . . , sn) ∈ S1 × . . . × Sn, we can, define the “expected payoff” for player i under s as:
hi(s):= ps(π)∗ui(π) π∈Ψt
Again, for infinite games, much more elaborate definitions of “expected payoffs” would be required.
Note: This “expected payoff” does not arise because any player is mixing its strategies. It arises because the game itself contains randomness.
We can also combine both: players may also randomize amongst their strategies, and we could then define the overall expected payoff.
from strategic to extensive games
Every finite strategic game Γ can be encoded easily and concisely as an extensive game GΓ. We illustrate this via the Rock-Paper-Scissor 2-player game (the n-player case is an easy
generalization):
"information set"
Player II:
from extensive to strategic games
Every extensive game G can be viewed as a strategic game ΓG: In ΓG , the strategies for player i are the mappings si ∈ Si . In ΓG, we define payoff ui(s) := hi(s), for all pure profiles s. If the extensive game G is finite, i.e., tree T is finite, then the strategic game ΓG is also finite.
Thus, all the theory we developed for finite strategic games also applies to finite extensive games.
Unfortunately, the strategic game ΓG is generally exponentially bigger than G. Note that the number of pure strategies for a player i with |Pli | = m nodes in the tree, is in the worst case |Σ|m.
So it is often unwise to naively translate a game from extensive to strategic form in order to “solve” it.
If we can find a way to avoid this blow-up, we should.
imperfect information & “perfect recall”
An extensive form game (EFG) is a game of imperfect information if it has non-trivial (size > 1) information sets. Players don’t have full knowledge of the current “state” (current node of the game tree).
Informally, an imperfect information EFG has perfect recall if each player i never “forgets” its own sequence of prior actions and information sets. I.e., a EFG has perfect recall if whenever w,w′ ∈ Infoi,j belong to the same information set, then the “visible history” for player i (sequence of information sets and actions of player i during the play) prior to hitting node w and w′ must be exactly the same.
[Kuhn’53]: with perfect recall it suffices to restrict players’ strategies to behavior strategies: strategies that only randomize (independently) on actions at each information set. Perfect recall is often assumed as a “sanity condition” for EFGs (most games we encounter do have perfect recall).
subgames and (subgame) perfection
A subgame of an extensive form game is any subtree of the game tree which has self-contained information sets. (I.e., every node in that subtree must be contained in an information set that is itself entirely contained in that subtree.)
For an extensive form game G, a profile of behavior strategies b = (b1 , . . . , bn ) for the players is called a subgame perfect equilibrium (SGPE) if it defines a Nash equilibrium for every subgame of G.
[Selten’75]: Nash equilibrium (NE) (and even SPGE) is inadequately refined as a solution concept for extensive form games. In particular, such equilibria can involve “Non-credible threats”:
Addressing this general inadequecy of NE and SGPE requires a more refined notion of equilibrium called trembling-hand perfect equilibrium [Selten’73].
solving games of imperfect info.
For EFGs with perfect recall there are ways to avoid the exponential blow-up of converting to normal form. We only briefly mention algorithms for imp-inf games.
(See, e.g., [Koller-Megiddo-von Stengel’94].)
In strategic form 2-player zero-sum games we can find minimax solutions efficiently (P-time) via LP. For 2-player zero-sum extensive imp-info games (without perfect recall), finding a minimax solution is NP-hard. NE’s of 2-player EFGs can be found in exponential time.
The situation is better with perfect recall: 2-player zero-sum imp-info games of perfect recall can be solved in P-time, via LP, and 2-player NE’s for arbitrary perfect recall games can be found in exponential time using a Lemke-type algorithm.
[Etessami’2014]: For EFGs with ≥ 3 players with perfect recall, computing refinements of Nash equilibrium (including “trembling-hand perfect” and “quasi-perfect”) can be reduced to computing a NE for a 3-player normal form game.
Our main focus will be games of perfect information. There the situation is much easier.
games of perfect information
A game of perfect information has only 1 node per information set. So, for these we can forget about information sets. Examples: Chess, Backgammon, . . .
counter-Examples: Poker, Bridge, . . .
Theorem([Kuhn’53]) Every finite extensive game of perfect information, G, has a NE (in fact a SGPE) in pure strategies. In other words, there is a pure profile (s1,…,sn) ∈ S that is a (and a subgame perfect equilibrium).
Our proof provides an efficient algorithm to compute such a pure profile, given G, using “backward induction”.
A special case of this theorem says the following: Proposition([Zermelo’1912]) In Chess, either:
1. White has a “winning strategy”, or
2.Black has a “winning strategy”, or
3.Both players have strategies to force a draw.
Next time, perfect information games.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com