程序代写 Algorithmic Game Theory and Applications

Algorithmic Game Theory and Applications
Lecture 13:
Parity Games and Mean-Payoff Games

Copyright By PowCoder代写 加微信 powcoder

AGTA: Lecture 13

G = (V0,V1,E,λ)
What are Parity Games?
Parity games are a class of h.o. win-lose games on a
• players: P0 (Even) and P1 (Odd)
• game graph: (V, E), where V = V0 ∪ V1
• priority function: λ: V →N
value λ(v) is called a priority of vertex v
• Notation: 􏰐 ∈ V0,􏰑 ∈ V1
What is the winning condition?
• play: π: v1v2 …
• λ(π) = λ(v1)λ(v2) . . .
• The top priority of a play:
Top(π) = max{p|p appears infinitely often in λ(π)}
• a play π is winning for player P0 if Top(π) is
even, and for player P1 if Top(π) is odd.
In other words, player P0 (P1) wins play π if the highest priority appearing infinitely often in λ(π) is even (odd).
AGTA: Lecture 13

Why are parity games interesting?
• Every h.o. win-lose game (i.e., every Muller game) can be converted to an “equivalent” parity game.
This can be done by attaching some data called a Latest Appearance Record to the current state (keeping track of some things about the history of the play so far).
However the translation will cause worst-case exponential blow-up in the size of the game graph.
• Several important problems in automated verification (model checking) boil down to solving a parity game. (μ-calculus Model checking.)
• Special complexity-theoretic status:
deciding which player has a winning strategy is known to be in NP ∩ co-NP, but not known to be in polynomial time.
AGTA: Lecture 13

More on parity games
• infinite duration games on graphs
• win-lose games of perfect information
• history oblivious, but not finitistic.
• but still, as we shall see, memorylessly
determined.
For a parity game graph G, we let Gv denote the
corresponding parity game that starts in state v.
We want to decide which player has a winning strategy, in Gv. (And to compute such a winning strategy.)
These win-lose games are determined (i.e., one player or the other has a winning strategy), because they can easily be seen to be particular examples of Borel games.
AGTA: Lecture 13

Memoryless determinacy
Main Theorem Parity games are memorylessly
determined.
This will give us a simple exponential-time algorithm to solve parity games: for every memoryless strategy σ of P0, try it against every memoryless strategy τ of player 1, and see if it beats each such strategy.
Proof. There are several different proofs of this theorem. The following is based on [Ehrenfeucht, Mycielski’73].
Finite duration parity game (FPG), G ̃v is defined and played as a normal PG, Gv, except that the play stops once some vertex w is revisited. The winner depends on the maximum priority p on the loop from w to w. If p is even, P0 wins, otherwise P1 wins.
(FPGdeterminacy)TheFPGgameG ̃ isdetermined v
(i.e., one player or the other has a winning strategy).
Follows easily from the Zermelo’s theorem: every finite zero-sum game of perfect information is determined.
AGTA: Lecture 13

Finite duration parity game with special node z (FPGz): Given Gv, and another vertex z, the game G ̃v,z, with “special” vertex z, is just like the FPG game except that the first time vertex z is visited, the “history is erased”.
The winner depends again on the maximum priority p on the first “loop” created, but now if z is visited then this “loop” might be arrived at “later”, because we ignore the history before z was visited.
Clearly, game G ̃v,z is also a finite zero-sum game, and thus determined, meaning one player or the other has a winning strategy.
Furthermore, if z belongs to Pi, then clearly any optimal strategy for Pi is memoryless from the vertex z, because the vertex is only encountered at most ONCE, and “history” is ignored if it is encountered.
Theorem The PG game Gv, its FPG versions G ̃v, and its FPGz version G ̃v,z, all have the same “value” (i.e., the same player has a winning strategy), for all vertices v and z.
Furthermore, in all these games both players have optimal memoryless strategies.
AGTA: Lecture 13

Proof of memoryless determinacy
First, suppose that a strategy σ ̃ is winning for player P0 in the game G ̃v. From this, we construct a strategy σ that is winning for P0 in Gv as follows: In σ, play just like in σ ̃ until a loop Lk is created. Then, erase the loop from the history, and continue playing according to σ ̃ from the last visited state of the loop (but without the loop as part of history).
”Set aside the loop, Lk, let k := k + 1.
S i n c e σ ̃ i s w i n n i n g i n G ̃ v i t m e a n s t h a t e v e r y created “loop” L1, L2, . . . will have top priority even. Therefore, for the entire infinite play in Gv the top priority occuring infinitely often is also even. (Because, the top priority occuring infinitely often must have the top priority in infinitely many such loops Lk.)
There was nothing special about player P0. We could have done the same arguement for player P1.
Therefore, the same player has a winning strategy in G v a n d i n G ̃ v .
AGTA: Lecture 13

7 Next (Key Point), we claim that the same player
has a winning strategy in Gv and G ̃v,z, for any z.
The reasoning is entirely the same: let σ ̃ be a winning strategy for player P0 in G ̃v,z, then we construct from it a winning strategy σ for P0 in Gv by erasing loops from the “history”, except that we treat the first occurence of z as erasing all history (not just the loop history).
But note: Pi playing out of z has a optimal memoryless move out of z in G ̃v,z. In other words,
we can rephrase G ̃v,z, as the new game G′v where we fix the edge out of z to be an optimal memoryless edge.
We have thus created a new game, G′v, with one fewer vertex for player Pi which has more than one outedge, and which has the same value as the game G ̃v,z (and thus Gv), and moreover, such that an optimal strategy in G′v yields an optimal strategy in Gv.
Now, by induction, we can find a game G′, which has no vertex of Pi with more than one outedge, and the (optimal) memoryless strategy for Pi in G′ yields an optimal memoryless strategy for Pi in Gv.
AGTA: Lecture 13

Solving 1-player parity games in P-time
Fact Let G be a 1-player parity game (no vertices V1 belonging to P1). Then P0 has a winning strategy starting at a vertex v in G iff there is a cycle reachable from v in which the top priority is even.
Proof: easy: if such a cycle exists, P0 can force it and win. If not, P0 can’t win with any memoryless strategy (any such strategy creates one cycle).
Deciding who has a winning strategy in a 1-player parity game can be solved efficiently in P-time.
We need to check whether there is a cycle with top priority even reachable from v:
1. If the current top priority p′ labeling any node is odd, eliminate nodes labeled p′ the graph.
2. otherwise (top priority p is even), check if ∃ node u labeled p reachable from start v, such that u is on a cycle.
If so, HALT: OUTPUT “P0 has a winning strat.”. If not, eliminate nodes labeled p from the graph.
Until (no nodes remain)
HALT: “P0 does not have a winning strat.”.
AGTA: Lecture 13

Implications for complexity of 2-player Parity Games
Now, since we know that we can solve 1-player parity games efficiently, in polynomial time, then to solve a 2-player parity game we can use the following approach:
1. “Nondeterministically guess” a memoryless optimal strategy σ′ for one of the two players (say, for player P1).
2. Check if in the remaining 1-player parity game player P0 has a winning stretegy. If not, then (since the game is determined), that means σ′ is a winning stretegy for player P1.
Of course, we could have done the same with the roles of P0 and P1 reversed.
But how do we “nondeterminstically guess” a memoryless strategy?? There are exponentially many (2m) such strategies for player P0 (exponential in the number m of nodes controlled by player P0), even if we only have 2 choices at each node.
AGTA: Lecture 13

We can’t do this deterministically in polynomial time, but we can do it in Nondeterministic polynomial time, and that is what the complexity class NP. (And coNP is the class of decision problems whose complement is in NP.)
Since we can do this for either player, this shows that the problem of deciding whether a player, say P0, has a winning strategy in a given parity game is in the complexity class NP ∩ coNP.
(N.B. We will later learn that this problem is actually also reducible to computing a Nash equilibrium in a 2-player normal form game.)
AGTA: Lecture 13

Mean-Payoff Games
Every parity game can be reduced to a mean payoff game.
Zero-sum game.
Two players: I (Max) and II (Min)
Payoff (utility) function: u : V 􏰎→ N for player I. Node v ∈ V has payoff u(v) ∈ N for player I (Max).
AGTA: Lecture 13

Player I wants to maximize the following
limiting average (mean) payoff
in an infinite play π = v1v2v3 . . .: Maximize: lim inf 􏰂ni=1 u(vi)
Thm: Mean-payoff games are memoryless determined (i.e., both players have deterministic memoryless strategies that achieve the value of the game.)
Proof: Basically the same proof as for parity games works.
The only modification needed is that in “Finite Mean- Payoff Games”, we calculate the mean payoff of the finite play to be the average payoff of those nodes on the single loop that was created when the game finished.
In the finite mean-payoff game with a special node z, we treat z in exactly the same way as before: it “erases” history up to that point.
Everything then in the argument works: we can show that whatever mean-payoff either player, Pi, can force
AGTA: Lecture 13

in the finite game can also be forced in the infinite game. We do this by taking the average of the infinitely many “optimal loops” that will be created by the optimal strategy in the finite game when it is translated to the infinite mean-payoff game in a way analogous to the way we went from a strategy for finite parity games to infinite parity game.
In fact, this proof was originally devised for mean payoff games by Ehrenfeucht and Mycielski (1973).
For both parity games and mean-payoff games, we do not know a worst-case efficient (polynomial time) algorithm. But we know that the relevant decision problems are in NP ∩ coNP.
(1-player Mean-payoff games can be solved in polynomial time by an algorithm due to Karp (1978) for finding the minimum mean-weight cycle value in a directed graph.)
AGTA: Lecture 13

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com