Lecture 6: Dynamic games with complete information III

1 Last time

Finitely repeated games:

• If G has a unique NE then, in the finitely repeated game G(T), this NE is played in
every stage game.

• Today we show that this is not true if G may be repeated infinitely.

• Recall that if G has multiple NE then, in G(T), players can play strategies which are
not NE strategies of G for all t < T. The non-stage-game-NE strategies are supported
in the SPE of the repeated game if players anticipate that these strategies will induce
the better equilibrium in the end game.

• We use a similar intuition for infinitely repeated games.

2 Infinitely repeated games

2.1 Definition and discount factor/probability of continuation

Example 1. Suppose the following prisoners' dilemma game is repeated infinitely.

                             Prisoner 2
                             Confess    Not Confess
Prisoner 1    Confess        1, 1       5, 0
              Not Confess    0, 5       4, 4

• Bad idea to assume that total payoff = sum of stage game payoffs: “get 1 every period”
and “get 4 every period” both have a total sum of ∞.

• We discount future payoffs by a factor of δ ∈ (0, 1) and calculate the present value of
the infinite game.

Definition 1. Given the discount factor δ ∈ (0, 1), the present value of the infinite sequence
of payoffs π1, π2, π3, ... is

$$\pi_1 + \delta \pi_2 + \delta^2 \pi_3 + \dots = \sum_{t=1}^{\infty} \delta^{t-1} \pi_t$$

• Two applications of the discount factor:

– Interest loss: suppose the interest rate per period is r. Then one dollar tomorrow is
equivalent to $\frac{1}{1+r}$ dollars today. In this case, $\delta = \frac{1}{1+r}$.

– Uncertainty of game continuation: suppose that after each stage is played, there is a
chance that the game will end immediately with probability 1 − δ. The expected payoff
from the next stage is only δπ.

Remark 1. Calculate the value of a geometric series
When δ ∈ (0, 1),

$$\pi + \delta\pi + \delta^2\pi + \delta^3\pi + \dots = \frac{\pi}{1-\delta}$$

For example, if both prisoners play “Confess” in every stage game and the discount factor is
0.9, then the present value of the infinite game for each prisoner is $\frac{1}{1-0.9} = 10$.

2.2 Strategy

Definition 2. In the finitely repeated game G(T) or the infinitely repeated game G(∞, δ), a
player's strategy specifies the action the player will take in each stage, for each possible
history of play through the previous stage.

E.g. in the two-stage prisoners' dilemma, each player's strategy specifies 5 elements:

1. Action in stage 1

2. Action in stage 2 if outcome of stage 1 is (C, C)

3. Action in stage 2 if outcome of stage 1 is (C, NC)

4. Action in stage 2 if outcome of stage 1 is (NC, C)

5. Action in stage 2 if outcome of stage 1 is (NC, NC)

Example 2. Trigger strategy in the infinitely repeated prisoners' dilemma
Play “Not Confess” in the first stage. In the tth stage, if the outcome of all t − 1 preceding
stages has been (Not Confess, Not Confess) then play “Not Confess”; otherwise, play “Confess”.

Comment:

• We will show that this is a SPE of the infinitely repeated prisoners' dilemma for high δ.

• We know SPE = players' strategies constitute a NE in every subgame.

• But what is a subgame in an infinitely repeated game?

2.3 Subgame perfect equilibrium

Definition 3. In a repeated game, there is one subgame beginning at stage t + 1 for each of
the possible histories of play through stage t.

Example 3. In the two-stage PD, there are four subgames corresponding to the second-stage
games that follow the 4 possible first-stage outcomes.

In the infinitely repeated prisoners' dilemma, the “future” in any subgame is the same: play
the prisoners' dilemma infinitely many times. Therefore, a subgame is identified by its
history. Two different subgames = two subgames with different histories.

Therefore, SPE in a repeated game means

Definition 4. A Nash equilibrium of a repeated game is subgame-perfect if the players'
strategies constitute a Nash equilibrium in every subgame following any history of play.

• But even a 5-stage prisoners' dilemma game has $1 + 4 + 4^2 + 4^3 + 4^4 = 341$ subgames,
and an infinitely repeated game has infinitely many subgames! Plus, each subgame has
infinitely many possible deviations! How do we check SPE?

• One-shot deviation principle! (not in textbook but very important)

Theorem 1. In a finitely repeated game G(T) or an infinitely repeated game G(∞, δ), a
strategy profile s is subgame perfect if and only if it satisfies the one-shot deviation
condition: no player can gain by deviating from s in a single stage and conforming to s
thereafter.

Proof. (Draw diagram to illustrate.)

“Only if”: easy. Directly from the definition of SPE: if s is a SPE then players cannot gain
from any deviation.

“If”: to see why the one-shot deviation condition is sufficient, let's prove by contradiction:

• Suppose s satisfies the one-shot deviation condition but is not SPE.

• Then following some history h(t), there is a profitable deviation ŝ for finitely many
stages t, t+1, ..., t+K.

– Why is this true for an infinite game? Because the discount factor δ < 1, the present
value of a distant tail in an infinitely repeated game is close to zero. Therefore, if there
is a profitable deviation for infinitely many stages, we can cut the distant tail of this
deviation, and there will still be a profitable deviation for finitely many periods.

• Construct an alternative deviation s̃ which lasts one fewer stage, i.e., s̃ = ŝ except in
stage t+K, in which s̃ = s.

• At any history in stage t+K, “s̃ = s from stage t+K” + “ŝ only differs from s by one
period” + “s satisfies the one-shot deviation condition” ⇒ s̃ must be at least as good as ŝ
from stage t+K.

• I.e., if it is profitable to deviate for K periods, it must be profitable to deviate for
K−1 periods.

• By induction, if it is profitable to deviate for K−1 periods, it must be profitable to
deviate for K−2 periods, for K−3 periods, ..., for 2 periods, for 1 period.

• But a profitable 1-period deviation violates the assumption that s satisfies the one-shot
deviation condition!

• We have a contradiction. ⇒ If s satisfies the one-shot deviation condition, it must be a SPE!

• This principle is fundamental to the theory of dynamic games. It not only applies to
repeated games but also to multi-stage games generally.
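The tail-cutting step in the proof can be illustrated numerically. The following is a minimal sketch, assuming stage payoffs are bounded in absolute value by some constant (the names `tail_bound`, `stages_needed`, and `max_abs_payoff` are chosen here for illustration and do not come from the lecture). With payoffs discounted as in Definition 1, everything received after stage K is worth at most δ^K · M/(1 − δ) in present value, which goes to zero as K grows.

```python
def tail_bound(delta: float, max_abs_payoff: float, k: int) -> float:
    """Upper bound on the present value of all payoffs received after stage k.

    Assumes every stage payoff has absolute value at most max_abs_payoff
    (an assumption of this sketch) and that stage t is discounted by
    delta ** (t - 1).
    """
    return delta ** k * max_abs_payoff / (1 - delta)


def stages_needed(delta: float, max_abs_payoff: float, eps: float) -> int:
    """Smallest k such that the tail after stage k is worth less than eps."""
    k = 0
    while tail_bound(delta, max_abs_payoff, k) >= eps:
        k += 1
    return k


# In the prisoners' dilemma of Example 1, payoffs are at most 5 in absolute value.
k = stages_needed(0.9, 5, 0.01)
print(k, tail_bound(0.9, 5, k))
```

So a deviation whose total gain exceeds some ε > 0 remains profitable after it is truncated at a sufficiently late stage, which is the step the proof relies on.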

Now let’s find out when the trigger strategy is SPE.

Example 4. Apply the one-shot deviation principle: trigger strategy with infinite-stage penalty

                             Prisoner 2
                             Confess    Not Confess
Prisoner 1    Confess        1, 1       5, 0
              Not Confess    0, 5       4, 4

Recall: trigger strategy
Play “Not Confess” in the first stage.
In the tth stage, if the outcome of all t − 1 preceding stages has been (Not Confess, Not
Confess) then play “Not Confess”; otherwise, play “Confess”.

Now let’s find the condition on δ so that the trigger strategy is SPE.

• Need to check: no profitable one-shot deviation in any subgame

– All subgames can be separated into two classes:

1. Some prisoner played C in the past.

2. No prisoner has played C yet.

– Within the same class, the trigger strategy prescribes the same sequence of actions in
the future, so it suffices to check for profitable deviations from the trigger strategy in
these two classes.

1. Suppose some prisoner played C in the past so that, according to the trigger
strategy, both prisoners always confess in future stages.
Given that the opponent plays C, deviating to NC is never profitable: it yields 0
instead of 1 today and leaves future play unchanged. [X]

2. Suppose neither prisoner played C in the past.

∗ If both players stick to the trigger strategy, they will never confess and
the present value of the total payoff is

$$4 + 4\delta + 4\delta^2 + 4\delta^3 + \dots = \frac{4}{1-\delta}$$

∗ The best one-shot deviation for prisoner 1 is to play C for one stage and
then revert to the trigger strategy afterwards. This leads to (C, NC) for
one stage and (C, C) thereafter. The present value for player 1 is

$$5 + \delta + \delta^2 + \delta^3 + \dots = 5 + \frac{\delta}{1-\delta}$$

∗ In order to prevent a profitable one-shot deviation, we must have

$$\frac{4}{1-\delta} \geq 5 + \frac{\delta}{1-\delta}$$

$$4 \geq 5 - 5\delta + \delta$$

$$4\delta \geq 1$$

$$\delta \geq \frac{1}{4}$$

– Conclude: the trigger strategy constitutes a SPE if and only if $\delta \geq \frac{1}{4}$.

– The average payoff for the prisoners in this SPE is 4. The present value at t = 1
is $\frac{4}{1-\delta}$.
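The threshold in Example 4 can be checked with exact arithmetic. This is a minimal sketch, assuming the stage payoffs of the table in Example 4; the constant and function names (`REWARD`, `TEMPTATION`, `PUNISHMENT`, `grim_trigger_is_spe`) are labels introduced here for illustration only. It covers only the class of subgames in which nobody has confessed yet, since the other class passes for every δ.

```python
from fractions import Fraction

# Player 1's stage payoffs in the prisoners' dilemma of Example 4.
REWARD = 4       # outcome (NC, NC)
TEMPTATION = 5   # outcome (C, NC): player 1 confesses alone
PUNISHMENT = 1   # outcome (C, C)


def cooperate_value(delta):
    """Present value of (NC, NC) forever: 4 / (1 - delta)."""
    return REWARD / (1 - delta)


def deviate_value(delta):
    """Present value of (C, NC) once, then (C, C) forever: 5 + delta / (1 - delta)."""
    return TEMPTATION + delta * PUNISHMENT / (1 - delta)


def grim_trigger_is_spe(delta):
    """One-shot deviation check for subgames with no past confession."""
    return cooperate_value(delta) >= deviate_value(delta)


for delta in (Fraction(1, 5), Fraction(1, 4), Fraction(1, 2)):
    print(delta, grim_trigger_is_spe(delta))
```

Using `Fraction` rather than floats keeps the comparison exact at the boundary δ = 1/4, where the two present values coincide.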

• Question: is the trigger strategy the only strategy that constitutes a subgame perfect
equilibrium?

Answer: No. Here’s another strategy that also constitutes a subgame perfect equilibrium.

Example 5. Trigger strategy with 1-period punishment
In the infinitely repeated prisoners’ dilemma

                             Prisoner 2
                             Confess    Not Confess
Prisoner 1    Confess        1, 1       5, 0
              Not Confess    0, 5       4, 4

Play “Not Confess” in the first stage.
If some player deviated to “Confess” in the last period when he was supposed to play “Not
Confess”, then play “Confess” for 1 period and then restart the game. I.e., after playing
“Confess” for one stage, play “Not Confess” in the next stage as if it is the first stage of the
game. Otherwise, play “Not Confess”.

Let’s check that this strategy constitutes a subgame perfect equilibrium. The first step is to
categorize the infinitely many subgames into a few classes.

Notice that according to this strategy, the game has two phases:

1. Cooperation phase: either both players have always been playing NC or both players
have restarted the game after completing a one-stage punishment phase

2. Punishment phase: a deviation in the last period triggered the punishment this period

Therefore, it is sufficient to check one-shot deviations in two types of subgames: (a) a
subgame that starts with a cooperation phase, and (b) a subgame that starts with the
punishment phase.

(a) Check deviation from a subgame that starts with a cooperation phase:

• Given that player 2 is playing the 1-period trigger strategy, if player 1 also plays the
trigger strategy then the outcome path is (NC, NC) forever and the present value is

$$4 + 4\delta + 4\delta^2 + 4\delta^3 + \dots = \frac{4}{1-\delta}$$

• Given that player 2 is playing the 1-period trigger strategy, if player 1 deviates to C
for one period and then reverts back to the 1-period trigger strategy, then the outcome
path is (C, NC), (C, C), (NC, NC), (NC, NC), (NC, NC), … and the present value of
this outcome sequence is

$$5 + \delta + 4\delta^2 + 4\delta^3 + 4\delta^4 + \dots$$

• This deviation is not profitable when

$$4 + 4\delta + 4\delta^2 + 4\delta^3 + \dots \geq 5 + \delta + 4\delta^2 + 4\delta^3 + 4\delta^4 + \dots$$

$$\Rightarrow 4 + 4\delta \geq 5 + \delta$$

$$\Rightarrow \delta \geq \frac{1}{3}$$

(b) Check deviation from a subgame that starts with a punishment phase:

• In the punishment phase, given that player 2 is playing the 1-period trigger strategy, if
player 1 also plays the 1-period trigger strategy then the outcome path is (C, C), (NC,
NC), (NC, NC), (NC, NC), … and the present value of this outcome sequence is

$$1 + 4\delta + 4\delta^2 + 4\delta^3 + 4\delta^4 + \dots$$

• Given that player 2 is playing the 1-period trigger strategy, if player 1 deviates to NC
for one period and then reverts back to the 1-period trigger strategy, then the outcome
path is (NC, C), (NC, NC), (NC, NC), (NC, NC), … and the present value of this
outcome sequence is

$$0 + 4\delta + 4\delta^2 + 4\delta^3 + 4\delta^4 + \dots$$

• This deviation is never profitable for any δ ∈ (0, 1): the two paths differ only in the
first stage, where the deviation earns 0 instead of 1.

(a)+(b): the trigger strategy with 1-period punishment is subgame perfect if and only if
$\delta \geq \frac{1}{3}$.
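The two-phase argument of Example 5 can also be written as a small state machine. The sketch below is one possible encoding, not the lecture's own method: the phase labels `"coop"` and `"punish"`, the dictionaries, and the function names are all assumptions made for illustration. It computes the value of following the strategy from each phase and then tests the single one-shot deviation available in each phase.

```python
from fractions import Fraction

# Player 1's stage payoffs, indexed by (own action, opponent's action).
PAYOFF = {("C", "C"): 1, ("C", "NC"): 5, ("NC", "C"): 0, ("NC", "NC"): 4}

# Action prescribed by the 1-period punishment strategy in each phase.
PRESCRIBED = {"coop": "NC", "punish": "C"}


def next_phase(phase, a1, a2):
    """Phase transition: a confession during cooperation triggers one punishment stage."""
    if phase == "coop" and (a1, a2) != ("NC", "NC"):
        return "punish"
    return "coop"  # punishment lasts one stage, then the game restarts


def phase_values(delta):
    """Present value of following the strategy from the start of each phase."""
    v_coop = PAYOFF[("NC", "NC")] / (1 - delta)
    v_punish = PAYOFF[("C", "C")] + delta * v_coop
    return {"coop": v_coop, "punish": v_punish}


def passes_one_shot_check(delta):
    """True if no one-shot deviation by player 1 is strictly profitable in either phase."""
    values = phase_values(delta)
    for phase, own in PRESCRIBED.items():
        other = PRESCRIBED[phase]  # the opponent follows the strategy
        deviation = "C" if own == "NC" else "NC"
        after = next_phase(phase, deviation, other)
        if PAYOFF[(deviation, other)] + delta * values[after] > values[phase]:
            return False
    return True


for delta in (Fraction(3, 10), Fraction(1, 3), Fraction(1, 2)):
    print(delta, passes_one_shot_check(delta))
```

Because the game is symmetric, checking player 1 is enough here; an asymmetric game would need the same loop for each player.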

Remark 2. Compare the two trigger strategies

• Trigger strategy with infinite-period punishment: SPE iff $\delta \geq \frac{1}{4}$.

• Trigger strategy with 1-period punishment: SPE iff $\delta \geq \frac{1}{3}$.

• Conclusion: strategy with heavier penalty ⇒ easier to sustain cooperation

2.4 Folk theorem

• Note that the strategy “always play Confess in every stage” is also a SPE: given that
the opponent always confesses, it’s never profitable to deviate to “Not Confess”.

• The average payoff for the prisoners in this SPE is 1. The present value at t = 1 is
$\frac{1}{1-\delta}$.

• We can express the average payoff of a game even when the stage payoff is not constant:
first calculate the present value of the game, V. Let the average payoff be π. If π is
received every period, then the present value is $\frac{\pi}{1-\delta}$. Hence, the average payoff is
$\pi = (1-\delta)V$.

Definition 5. Given the discount factor δ, the average payoff of the infinite sequence of
payoffs π1, π2, π3, … is (1 − δ) · (present value), i.e.

$$(1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} \pi_t$$
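Definition 5 is easy to evaluate for payoff streams that become constant after finitely many stages, which covers every outcome path discussed in this lecture. A minimal sketch, assuming such a stream is described by a finite `prefix` followed by a constant `tail_payoff` (both names, and the functions themselves, are introduced here for illustration):

```python
from fractions import Fraction


def present_value(prefix, tail_payoff, delta):
    """Present value of a stream that follows `prefix`, then pays `tail_payoff` forever."""
    head = sum(delta ** t * p for t, p in enumerate(prefix))
    # Geometric series for the constant tail, discounted back to stage 1.
    tail = delta ** len(prefix) * tail_payoff / (1 - delta)
    return head + tail


def average_payoff(prefix, tail_payoff, delta):
    """Average payoff as in Definition 5: (1 - delta) times the present value."""
    return (1 - delta) * present_value(prefix, tail_payoff, delta)


half = Fraction(1, 2)
print(average_payoff([], 4, half))       # (NC, NC) forever
print(average_payoff([], 1, half))       # (C, C) forever
print(average_payoff([5, 1], 4, half))   # deviate once, one-stage punishment, then cooperate
```

A constant stream has an average payoff equal to its stage payoff for any δ, which is why the two SPE discussed above have average payoffs 4 and 1.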

• In the infinitely repeated PD, 1 and 4 can both be achieved as average payoffs in some SPE.

• Can we define the range of average payoffs in all SPE of an infinitely repeated game?

1. What are the feasible payoffs of a stage game?

• Answer: all convex combinations of the payoff pairs
Example: Prisoners’ dilemma (all convex combinations of (1, 1), (5, 0), (0, 5) and (4, 4))

2. What’s the lowest possible average payoff one can get in a stage game?

• Answer: the minmax value
– Suppose your opponent is your evil archenemy: no matter what strategy you
choose, he always chooses the strategy that minimizes your payoff.

– Play your best response to this archenemy.

– The payoff you get is called the “minmax” value. This is the lowest possible
payoff for any rational individual in a stage game.

– For example (the archenemy plays Stag with probability q and Hare with probability 1 − q):

                  archenemy
                  Stag (q)    Hare (1 − q)
you    Stag       2, 2        0, 1
       Hare       1, 0        1, 1

EU(Stag) = 2q
EU(Hare) = 1
Your best response function is

∗ Stag if $q > \frac{1}{2}$ ⇒ payoff = 2q > 1

∗ randomize between Stag and Hare with any probability if $q = \frac{1}{2}$ ⇒ payoff = 1

∗ Hare if $q < \frac{1}{2}$ ⇒ payoff = 1

Knowing your best response function, your archenemy picks $q \leq \frac{1}{2}$ to minimize your payoff.
This results in your minmax payoff: 1
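The minmax reasoning for the stag hunt can be reproduced by brute force. This is a rough sketch rather than a general solver: it assumes the archenemy's mixed strategy is searched over an evenly spaced grid of probabilities q, and the names `YOUR_PAYOFF`, `best_response_payoff`, and `minmax_on_grid` are made up for illustration. A grid search is only approximate in general; it is exact here because the minimum is attained at every q ≤ 1/2.

```python
from fractions import Fraction

# Your payoffs in the stag hunt, indexed by (your action, archenemy's action).
YOUR_PAYOFF = {
    ("Stag", "Stag"): 2, ("Stag", "Hare"): 0,
    ("Hare", "Stag"): 1, ("Hare", "Hare"): 1,
}


def best_response_payoff(q):
    """Your best expected payoff when the archenemy plays Stag with probability q."""
    return max(
        q * YOUR_PAYOFF[(action, "Stag")] + (1 - q) * YOUR_PAYOFF[(action, "Hare")]
        for action in ("Stag", "Hare")
    )


def minmax_on_grid(steps=100):
    """Minimum of the best-response payoff over q in {0, 1/steps, ..., 1}."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    return min(best_response_payoff(q) for q in grid)


print(best_response_payoff(Fraction(3, 4)))  # Stag is the best response here
print(minmax_on_grid())
```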

Theorem 2. [Fudenberg and Maskin, 1986]
In an infinitely repeated game with two players, suppose that, in the stage game G, the
minmax payoffs for the two players are $\underline{v}_1$ and $\underline{v}_2$. Moreover, let V denote the set
of all feasible payoffs of G. Then,

for all feasible payoffs $(v_1, v_2) \in V$ such that $v_1 > \underline{v}_1$ and $v_2 > \underline{v}_2$, if δ is sufficiently
close to 1, then the infinitely repeated game G(∞, δ) has a subgame perfect equilibrium that
achieves $(v_1, v_2)$ as the average payoff.

Remark 3. Why is the minmax payoff important?
Because we can use it to construct a credibly heavy penalty for deviation!
“If you don’t play the target strategy, I will be your archenemy and make you earn your
minmax forever.”

3 Practice problems

1. Find the minmax value for player 1 in the following game

                  Player 2
                  L         R
Player 1    U     -2, 2     1, -2
            M     1, -2     -2, 2
            D     0, 1      0, 1

2. Let G be the stage game represented below:

                  Player 2
                  L        M        R
Player 1    u     2, 1     3, 1     6, 5
            m     4, 2     6, 0     4, 1
            d     6, 4     3, 3     3, 1

Consider three possible outcomes of the infinitely repeated game G(∞, δ):

outcome 1: (m, L), (u, M), (m, L), (u, M), (m, L), (u, M), …

outcome 2: (u, M), (m, L), (u, M), (m, L), (u, M), (m, L), …

outcome 3: (d, R), (u, R), (d, R), (u, R), (d, R), (u, R), …

For each δ ∈ (0, 1), find which outcome each player prefers.

3. Alternative trigger strategies

Suppose that the following prisoners’ dilemma is repeated infinitely.

                             Prisoner 2
                             Confess    Not Confess
Prisoner 1    Confess        1, 1       5, 0
              Not Confess    0, 5       4, 4

Check whether each of the following strategies can constitute a subgame perfect Nash
equilibrium. If so, specify the range of the discount factor δ for which the strategy is SPE.
If not, explain why.


(a) s1: Play “Not Confess” in the first stage. Also play “Not Confess” if the opponent
has played “Not Confess” in every previous period. Otherwise, play “Confess”.
(This is like the trigger strategy, but the punishment is not triggered by own
deviation.)

(b) s2: Play “Not Confess” in the first stage. Also play “Not Confess” if both players
chose “Not Confess” in every previous period except for the last period. Choose
“Confess” otherwise. (This is like the trigger strategy, but the punishment is
delayed by one period.)

(c) s3: Play “Not Confess” in the first stage. Also play “Not Confess” if the opponent
chose “Confess” in at most one period. Choose “Confess” otherwise. (This is
like s1, but one mistake is forgiven.)
