Lecture 6: Dynamic games with complete information III
1 Last time
Finitely repeated games:
• If G has a unique NE then, in the unique SPE of the finitely repeated game G(T), this NE is
played in every stage game.
• Today we show that this need not hold when G is repeated infinitely.
• Recall that if G has multiple NE then, in G(T), players can play actions that are not NE
actions of G in stages t < T. These non-stage-game-NE actions are supported
in an SPE of the repeated game if players anticipate that playing them will induce
the better equilibrium in the end game.
• We use a similar intuition for infinitely repeated games.
2 Infinitely repeated games
2.1 Definition and discount factor/probability of continuation
Example 1. Suppose the following prisoners' dilemma game is repeated infinitely.
                           Prisoner 2
                           Confess   Not Confess
Prisoner 1   Confess       1, 1      5, 0
             Not Confess   0, 5      4, 4
• It is a bad idea to take the total payoff to be the plain sum of stage-game payoffs: "get 1
every period" and "get 4 every period" both sum to ∞, so the sum cannot rank them.
• Instead, we discount future payoffs by a factor δ ∈ (0, 1) and calculate the present value of
the infinite game.
Definition 1. Given the discount factor δ ∈ (0, 1), the present value of the infinite sequence
of payoffs $\pi_1, \pi_2, \pi_3, \dots$ is
$$\pi_1 + \delta\pi_2 + \delta^2\pi_3 + \dots = \sum_{t=1}^{\infty} \delta^{t-1}\pi_t$$
• Two interpretations of the discount factor:
- Interest loss: if the interest rate per period is r, then 1 dollar tomorrow is equivalent
to 1/(1 + r) dollars today. In this case, δ = 1/(1 + r).
- Uncertainty of game continuation: suppose that after each stage is played, the game
ends immediately with probability 1 − δ. Then a payoff of π in the next stage has an
expected value of only δπ.
Remark 1. The value of a geometric series
When δ ∈ (0, 1),
$$\pi + \delta\pi + \delta^2\pi + \delta^3\pi + \dots = \frac{\pi}{1-\delta}$$
For example, if both prisoners play "Confess" in every stage game and the discount factor
is 0.9, then the present value of the infinite game for each prisoner is
$$\frac{1}{1-0.9} = 10$$
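The geometric-series formula can be checked numerically. The following is a minimal sketch; the function names `present_value` and `constant_stream_value` are illustrative and not part of the notes, and the infinite stream is approximated by its first 500 periods.

```python
def present_value(payoffs, delta):
    """Present value of a finite payoff stream pi_1, pi_2, ... discounted by delta."""
    # enumerate starts at 0, so the first payoff is weighted by delta**0 = 1
    return sum(delta ** t * pi for t, pi in enumerate(payoffs))


def constant_stream_value(pi, delta):
    """Closed form pi / (1 - delta) for receiving pi in every period."""
    return pi / (1 - delta)


delta = 0.9
approx = present_value([1] * 500, delta)  # first 500 periods of (Confess, Confess)
exact = constant_stream_value(1, delta)
print(approx, exact)
```

With δ = 0.9 the truncated sum and the closed form agree up to floating-point error, because the discarded tail is weighted by 0.9^500.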
2.2 Strategy
Definition 2. In the finitely repeated game G(T) or the infinitely repeated game G(∞, δ),
a player's strategy specifies the action the player will take in each stage, for each possible
history of play through the previous stage.
E.g. in the two-stage prisoners' dilemma, each player's strategy specifies 5 elements:
1. Action in stage 1
2. Action in stage 2 if outcome of stage 1 is (C, C)
3. Action in stage 2 if outcome of stage 1 is (C, NC)
4. Action in stage 2 if outcome of stage 1 is (NC, C)
5. Action in stage 2 if outcome of stage 1 is (NC, NC)
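To make the five elements concrete, the sketch below enumerates every pure strategy of one player in the two-stage prisoners' dilemma. The dictionary layout and the names `HISTORIES` and `all_strategies` are assumptions made for illustration only.

```python
from itertools import product

ACTIONS = ("C", "NC")
# The four possible stage-1 outcomes (prisoner 1's action, prisoner 2's action).
HISTORIES = list(product(ACTIONS, repeat=2))


def all_strategies():
    """Yield every pure strategy: a stage-1 action plus one stage-2 action per history."""
    for first in ACTIONS:
        for plan in product(ACTIONS, repeat=len(HISTORIES)):
            yield {"stage1": first, "stage2": dict(zip(HISTORIES, plan))}


strategies = list(all_strategies())
print(len(strategies))  # 2 choices for each of the 5 elements
```

Each of the 5 elements is a binary choice, so a player has 2^5 = 32 pure strategies even in this small game.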
Example 2. Trigger strategy in the infinitely repeated prisoners' dilemma
Play "Not Confess" in the first stage.
In the t-th stage, if the outcome of all t − 1 preceding stages has been (Not Confess, Not
Confess), then play "Not Confess"; otherwise, play "Confess".
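A strategy is a function from histories to actions, so the trigger strategy can be written as one. This is a sketch under the assumption that a history is stored as a list of past outcomes; `always_confess` and `play` are hypothetical helpers added only to show the resulting outcome paths.

```python
NC, C = "Not Confess", "Confess"


def trigger(history):
    """Trigger strategy: history is the list of past outcomes (action 1, action 2)."""
    if all(outcome == (NC, NC) for outcome in history):
        return NC  # also covers the empty history of the first stage
    return C


def always_confess(history):
    """Comparison strategy: confess after every history."""
    return C


def play(strategy1, strategy2, stages):
    """Return the outcome path when both strategies observe the same history."""
    history = []
    for _ in range(stages):
        history.append((strategy1(history), strategy2(history)))
    return history


print(play(trigger, trigger, 3))
print(play(trigger, always_confess, 3))
```

Two trigger players never confess, while a trigger player facing `always_confess` cooperates once and confesses from the second stage on.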
Comment:
• We will show that this is an SPE of the infinitely repeated prisoners' dilemma for
sufficiently high δ.
• We know SPE = players' strategies constitute a NE in every subgame.
• But what is a subgame in an infinitely repeated game?
2.3 Subgame perfect equilibrium
Definition 3. In a repeated game, there is one subgame beginning at stage t + 1 for each
of the possible histories of play through stage t.
Example 3. In the two-stage PD, there are four subgames (besides the whole game itself)
corresponding to the second-stage games that follow the 4 possible first-stage outcomes.
In the infinitely repeated prisoners' dilemma, the "future" in any subgame is the same:
play the prisoners' dilemma infinitely many times. Therefore, a subgame is identified by its
history. Two different subgames = two subgames with different histories.
Therefore, SPE in a repeated game means:
Definition 4. A Nash equilibrium of a repeated game is subgame-perfect if the players'
strategies constitute a Nash equilibrium in every subgame following any history of play.
• But even a 5-stage prisoners' dilemma game has $1 + 4 + 4^2 + 4^3 + 4^4 = 341$ subgames,
and an infinitely repeated game has infinitely many subgames! Plus, each subgame
has infinitely many possible deviations! How do we check SPE?
• The one-shot deviation principle! (Not in the textbook, but very important.)
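The subgame count above is a sum over histories of each length. A small sketch, where the function name `count_subgames` is an illustrative assumption:

```python
def count_subgames(stages, outcomes_per_stage=4):
    """Subgames of a finitely repeated game: one per history of length 0..stages-1."""
    return sum(outcomes_per_stage ** k for k in range(stages))


print(count_subgames(2))  # the whole game plus the 4 second-stage subgames
print(count_subgames(5))
```

The count grows geometrically in the number of stages, which is why checking every subgame directly is hopeless for long or infinite horizons.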
Theorem 1. In a finitely repeated game G(T) or an infinitely repeated game G(∞, δ), a strategy
profile s is subgame perfect if and only if it satisfies the one-shot deviation condition:
after any history, no player can gain by deviating from s in a single stage and conforming to s
thereafter.
Proof. (Draw a diagram to illustrate.)
"Only if": easy. It follows directly from the definition of SPE: in an SPE, players cannot gain
from any deviation in any subgame, so in particular not from a one-shot deviation.
"If": to see why the one-shot deviation condition is sufficient, we argue by contradiction:
• Suppose s satisfies the one-shot deviation condition but is not an SPE.
• Then, following some history h(t), there is a profitable deviation ŝ that differs from s in
only finitely many stages t, t + 1, ..., t + K.
- Why is this true for an infinite game? Because the discount factor δ < 1, the
present value of a distant tail in an infinitely repeated game is close to zero.
Therefore, if there is a profitable deviation lasting infinitely many stages, we can cut
off its distant tail and still have a profitable deviation lasting
finitely many stages.
• Construct an alternative deviation s̃ that lasts one fewer stage, i.e., s̃ = ŝ except in
stage t + K, in which s̃ = s.
• At any history in stage t + K: "s̃ = s from stage t + K on" + "ŝ differs from s in only one
period from stage t + K on" + "s satisfies the one-shot deviation condition" ⇒ s̃ must be at
least as good as ŝ from stage t + K on.
• I.e., if it is profitable to deviate for K periods, it must be profitable to deviate for K − 1
periods.
• By induction, if it is profitable to deviate for K − 1 periods, it must be profitable to
deviate for K − 2 periods, for K − 3 periods, ..., for 2 periods, for 1 period.
• But a profitable 1-period deviation violates the assumption that s satisfies the one-shot
deviation condition!
• We have a contradiction. ⇒ If s satisfies the one-shot deviation condition, it must be an SPE!
• This principle is fundamental to the theory of dynamic games. It applies not only to
repeated games but also to multi-stage games generally.
Now let's find out when the trigger strategy is an SPE.
Example 4. Applying the one-shot deviation principle: trigger strategy with infinite-stage penalty
                           Prisoner 2
                           Confess   Not Confess
Prisoner 1   Confess       1, 1      5, 0
             Not Confess   0, 5      4, 4
Recall the trigger strategy:
Play "Not Confess" in the first stage.
In the t-th stage, if the outcome of all t − 1 preceding stages has been (Not Confess, Not
Confess), then play "Not Confess"; otherwise, play "Confess".
Now let's find the condition on δ under which the trigger strategy is an SPE.
• Need to check: no profitable one-shot deviation in any subgame.
- All subgames can be separated into two classes:
1. Some prisoner played C in the past.
2. No prisoner has played C yet.
- Within the same class, the trigger strategy prescribes the same sequence of actions in
the future, so it suffices to check for a profitable deviation from the trigger strategy in
these two classes.
1. Suppose some prisoner played C in the past so that, according to the trigger
strategy, both prisoners always confess in future stages.
Given that the opponent plays C, deviating to NC lowers today's payoff from 1 to 0 and
leaves the future path (C, C) unchanged, so it is never profitable.
2. Suppose neither prisoner played C in the past.
∗ If both players stick to the trigger strategy, they will never confess and
the present value of the total payoff is
$$4 + 4\delta + 4\delta^2 + 4\delta^3 + \dots = \frac{4}{1-\delta}$$
∗ The best one-shot deviation for prisoner 1 is to play C for one stage and
then revert to the trigger strategy afterwards. This leads to (C, NC) for
one stage and (C, C) thereafter. The present value for prisoner 1 is
$$5 + \delta + \delta^2 + \delta^3 + \dots = 5 + \frac{\delta}{1-\delta}$$
∗ To rule out a profitable one-shot deviation, we must have
$$\frac{4}{1-\delta} \ge 5 + \frac{\delta}{1-\delta}$$
$$4 \ge 5 - 5\delta + \delta$$
$$4\delta \ge 1$$
$$\delta \ge \frac{1}{4}$$
- Conclude: the trigger strategy constitutes an SPE if and only if δ ≥ 1/4.
- The average payoff for the prisoners in this SPE is 4. The present value at t = 1
is 4/(1 − δ).
• Question: is the trigger strategy the only strategy that constitutes a subgame perfect
equilibrium?
Answer: No. Here is another strategy that also constitutes a subgame perfect equilibrium.
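Before turning to that second strategy, the threshold δ ≥ 1/4 derived above for the infinite-punishment trigger strategy can be double-checked with exact arithmetic. This is only a sketch: the function names are illustrative, and it covers the cooperative class of subgames, since the other class admits no profitable deviation for any δ.

```python
from fractions import Fraction


def conform_value(delta):
    """Present value of (NC, NC) forever: 4 / (1 - delta)."""
    return 4 / (1 - delta)


def deviate_value(delta):
    """Present value of (C, NC) once, then (C, C) forever: 5 + delta / (1 - delta)."""
    return 5 + delta / (1 - delta)


def no_profitable_deviation(delta):
    """One-shot deviation condition in subgames where nobody has confessed yet."""
    return conform_value(delta) >= deviate_value(delta)


for delta in (Fraction(1, 5), Fraction(1, 4), Fraction(1, 2)):
    print(delta, no_profitable_deviation(delta))
```

`Fraction` is used instead of floats so that the boundary case δ = 1/4, where both sides equal 16/3, is compared exactly.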
Example 5. Trigger strategy with 1-period punishment
In the infinitely repeated prisoners' dilemma
                           Prisoner 2
                           Confess   Not Confess
Prisoner 1   Confess       1, 1      5, 0
             Not Confess   0, 5      4, 4
Play "Not Confess" in the first stage.
If some player deviated to "Confess" in the last period when he was supposed to play "Not
Confess", then play "Confess" for 1 period and then restart the game. I.e., after playing
"Confess" for one stage, play "Not Confess" in the next stage as if it were the first stage of the
game. Otherwise, play "Not Confess".
Let's check that this strategy constitutes a subgame perfect equilibrium. The first step is to
categorize the infinitely many subgames into a few classes.
Notice that according to this strategy, the game has two phases:
1. Cooperation phase: either both players have always played NC, or both players
have restarted the game after completing a one-stage punishment phase.
2. Punishment phase: a deviation in the last period triggered the punishment this period.
Therefore, it is sufficient to check one-shot deviations in two types of subgames: (a) a
subgame that starts with a cooperation phase, and (b) a subgame that starts with a
punishment phase.
(a) Check deviations in a subgame that starts with a cooperation phase:
• Given that player 2 is playing the 1-period trigger strategy, if player 1 also plays the
1-period trigger strategy, then the outcome path is (NC, NC) forever and the present value is
$$4 + 4\delta + 4\delta^2 + 4\delta^3 + \dots = \frac{4}{1-\delta}$$
• Given that player 2 is playing the 1-period trigger strategy, if player 1 deviates to C
for one period and then reverts to the 1-period trigger strategy, then the outcome
path is (C, NC), (C, C), (NC, NC), (NC, NC), (NC, NC), … and the present value of
this outcome sequence is
$$5 + \delta + 4\delta^2 + 4\delta^3 + 4\delta^4 + \dots$$
• This deviation is not profitable when
$$4 + 4\delta + 4\delta^2 + 4\delta^3 + \dots \ge 5 + \delta + 4\delta^2 + 4\delta^3 + 4\delta^4 + \dots$$
$$\Rightarrow 4 + 4\delta \ge 5 + \delta$$
$$\Rightarrow \delta \ge \frac{1}{3}$$
(b) Check deviations in a subgame that starts with a punishment phase:
• In the punishment phase, given that player 2 is playing the 1-period trigger strategy, if
player 1 also plays the 1-period trigger strategy, then the outcome path is (C, C), (NC,
NC), (NC, NC), (NC, NC), … and the present value of the outcome sequence is
$$1 + 4\delta + 4\delta^2 + 4\delta^3 + 4\delta^4 + \dots$$
• Given that player 2 is playing the 1-period trigger strategy, if player 1 deviates to NC
for one period and then reverts to the 1-period trigger strategy, then the outcome
path is (NC, C), (NC, NC), (NC, NC), (NC, NC), … and the present value of the
outcome sequence is
$$0 + 4\delta + 4\delta^2 + 4\delta^3 + 4\delta^4 + \dots$$
• This deviation is never profitable for any δ ∈ (0, 1): it lowers the current payoff from 1
to 0 and leaves the continuation unchanged.
(a)+(b): the trigger strategy with 1-period punishment is subgame perfect if and only if
δ ≥ 1/3.
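Both phase checks can be reproduced with exact arithmetic. The sketch below uses closed forms for the tails of the payoff streams; the helper `tail` and the function name `one_period_punishment_is_spe` are assumptions made for illustration.

```python
from fractions import Fraction


def tail(delta, start):
    """Present value of receiving 4 in every period from period `start` (0-based) on."""
    return 4 * delta ** start / (1 - delta)


def one_period_punishment_is_spe(delta):
    """Check the one-shot deviation condition in both phases."""
    # (a) cooperation phase: (NC, NC) forever vs. (C, NC), (C, C), then (NC, NC) forever
    coop_conform = tail(delta, 0)
    coop_deviate = 5 + delta * 1 + tail(delta, 2)
    # (b) punishment phase: (C, C) vs. (NC, C) today, then (NC, NC) forever
    punish_conform = 1 + tail(delta, 1)
    punish_deviate = 0 + tail(delta, 1)
    return coop_conform >= coop_deviate and punish_conform >= punish_deviate


for delta in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
    print(delta, one_period_punishment_is_spe(delta))
```

At δ = 1/4 the check fails even though the infinite-punishment trigger strategy is already an SPE there, which illustrates why the milder punishment needs more patient players.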
Remark 2. Comparing the two trigger strategies
• Trigger strategy with infinite-period punishment: SPE iff δ ≥ 1/4.
• Trigger strategy with 1-period punishment: SPE iff δ ≥ 1/3.
• Conclusion: a heavier penalty makes cooperation easier to sustain (it works for a wider
range of δ).
2.4 Folk theorem
• Note that the strategy "always play Confess in every stage" is also an SPE: given that
the opponent always confesses, it's never profitable to deviate to "Not Confess".
• The average payoff for the prisoners in this SPE is 1. The present value at t = 1 is
1/(1 − δ).
• We can express the average payoff of a game even when the stage payoff is not constant:
first calculate the present value of the game, V. Let the average payoff be π. If π is
received every period, then the present value is π/(1 − δ). Setting π/(1 − δ) = V gives the
average payoff π = (1 − δ)V.
Definition 5. Given the discount factor δ, the average payoff of the infinite sequence of
payoffs $\pi_1, \pi_2, \pi_3, \dots$ is (1 − δ) · (present value), i.e.
$$(1-\delta)\sum_{t=1}^{\infty} \delta^{t-1}\pi_t$$
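Definition 5 also applies to payoff streams that are not constant. The sketch below approximates the average payoff of a stream that repeats a fixed cycle by truncating the infinite sum; the function name `average_payoff`, the truncation length, and the alternating stream 5, 0, 5, 0, … are illustrative assumptions, not examples from the notes.

```python
def average_payoff(cycle, delta, periods=2000):
    """(1 - delta) times the present value of a stream repeating `cycle`,
    truncated after `periods` stages."""
    value = sum(delta ** t * cycle[t % len(cycle)] for t in range(periods))
    return (1 - delta) * value


delta = 0.9
print(average_payoff([4], delta))     # (NC, NC) forever
print(average_payoff([1], delta))     # (C, C) forever
print(average_payoff([5, 0], delta))  # hypothetical stream alternating 5 and 0
```

For a constant stream the average payoff equals the stage payoff, while the alternating stream is worth 5/(1 + δ) on average because the early 5 is discounted less than the later 0.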
• In the infinitely repeated PD, 1 and 4 can both be achieved as average payoffs in some SPE.
• Can we characterize the range of average payoffs in all SPE of an infinitely repeated game?
1. What are the feasible payoffs of a stage game?
• Answer: all convex combinations of the payoff pairs.
Example: in the prisoners' dilemma, the feasible set is the convex hull of (1, 1), (5, 0),
(0, 5) and (4, 4).
2. What's the lowest possible average payoff one can get in a stage game?
• Answer: the minmax value
- Suppose your opponent is your evil archenemy: no matter what strategy you
choose, he always chooses the strategy that minimizes your payoff.
- Play your best response to this archenemy.
- The payoff you get is called the "minmax" value. This is the lowest possible
payoff for any rational individual in a stage game.
- For example:
                     Archenemy
                     Stag (q)   Hare (1 − q)
You   Stag           2, 2       0, 1
      Hare           1, 0       1, 1
EU(Stag) = 2q
EU(Hare) = 1
Your best response is
∗ Stag if q > 1/2 ⇒ payoff = 2q > 1
∗ randomize between Stag and Hare with any probability if q = 1/2 ⇒ payoff = 1
∗ Hare if q < 1/2 ⇒ payoff = 1
Knowing your best response function, your archenemy picks q ≤ 1/2 to minimize your payoff.
This results in your minmax payoff: 1
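The same minmax computation can be sketched numerically by searching over the archenemy's mixing probability q. The grid search and the names `PAYOFF`, `best_response_payoff`, and `minmax_value` are assumptions for illustration; a grid only approximates the minimum in general, but here the minimum is attained on the grid.

```python
from fractions import Fraction

# Your payoffs in the stag hunt above: PAYOFF[your action][archenemy's action].
PAYOFF = {"Stag": {"Stag": 2, "Hare": 0}, "Hare": {"Stag": 1, "Hare": 1}}


def best_response_payoff(q):
    """Your best expected payoff when the archenemy plays Stag with probability q."""
    return max(q * row["Stag"] + (1 - q) * row["Hare"] for row in PAYOFF.values())


def minmax_value(grid_size=100):
    """Minimize the best-response payoff over q = 0, 1/grid_size, ..., 1."""
    grid = [Fraction(k, grid_size) for k in range(grid_size + 1)]
    return min(best_response_payoff(q) for q in grid)


print(minmax_value())
```

Because you can always guarantee 1 by playing Hare, the best-response payoff never falls below 1, and any q ≤ 1/2 holds you to exactly that value.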
Theorem 2. [Fudenberg and Maskin, 1986]
In an infinitely repeated game with two players, suppose that, in the stage game G, the
minmax payoffs of the two players are $\underline{v}_1$ and $\underline{v}_2$. Moreover, let V denote the set of all feasible
payoffs of G. Then,
for all feasible payoffs $(v_1, v_2) \in V$ such that $v_1 > \underline{v}_1$ and $v_2 > \underline{v}_2$, if δ is sufficiently
close to 1, then the infinitely repeated game G(∞, δ) has a subgame perfect equilibrium that
achieves $(v_1, v_2)$ as the average payoff.
Remark 3. Why is the minmax payoff important?
Because we can use it to construct a credible, heavy penalty for deviation!
"If you don't play the target strategy, I will be your archenemy and make you earn your
minmax forever."
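The threat in Remark 3 can be turned into a threshold on δ. The following is a sketch under assumptions that are not in the notes: player i receives a constant target payoff $v_i$ every period on the equilibrium path, $d_i$ denotes the best stage payoff from a one-shot deviation (with $d_i > \underline{v}_i$), and the minmax punishment is itself credible, as in the prisoners' dilemma where minmaxing coincides with the stage-game NE (Confess, Confess).

```latex
\[
\frac{v_i}{1-\delta} \;\ge\; d_i + \frac{\delta\,\underline{v}_i}{1-\delta}
\quad\Longleftrightarrow\quad
v_i \;\ge\; (1-\delta)\,d_i + \delta\,\underline{v}_i
\quad\Longleftrightarrow\quad
\delta \;\ge\; \frac{d_i - v_i}{d_i - \underline{v}_i}.
\]
```

For the prisoners' dilemma, $v_i = 4$, $d_i = 5$ and $\underline{v}_i = 1$ give $\delta \ge \frac{5-4}{5-1} = \frac{1}{4}$, which agrees with the threshold found in Example 4.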
3 Practice problems
1. Find the minmax value for player 1 in the following game.
                Player 2
                L        R
Player 1   U    -2, 2    1, -2
           M    1, -2    -2, 2
           D    0, 1     0, 1
2. Let G be the stage game represented below:
                Player 2
                L       M       R
Player 1   u    2, 1    3, 1    6, 5
           m    4, 2    6, 0    4, 1
           d    6, 4    3, 3    3, 1
Consider three possible outcomes of the infinitely repeated game G(∞, δ):
outcome 1: (m, L), (u, M), (m, L), (u, M), (m, L), (u, M), …
outcome 2: (u, M), (m, L), (u, M), (m, L), (u, M), (m, L), …
outcome 3: (d, R), (u, R), (d, R), (u, R), (d, R), (u, R), …
For each δ ∈ (0, 1), find which outcome each player prefers.
3. Alternative trigger strategies
Suppose that the following prisoners' dilemma is repeated infinitely.
                           Prisoner 2
                           Confess   Not Confess
Prisoner 1   Confess       1, 1      5, 0
             Not Confess   0, 5      4, 4
Check whether each of the following strategies can constitute a subgame perfect Nash
equilibrium. If so, specify the range of the discount factor δ for which the strategy is an SPE.
If not, explain why.
(a) s1: Play "Not Confess" in the first stage. Also play "Not Confess" if the opponent
has played "Not Confess" in every previous period. Otherwise, play "Confess".
(This is like the trigger strategy, but the punishment is not triggered by one's own
deviation.)
(b) s2: Play "Not Confess" in the first stage. Also play "Not Confess" if both players
chose "Not Confess" in every previous period except for the last period. Choose
"Confess" otherwise. (This is like the trigger strategy, but the punishment is
delayed by one period.)
(c) s3: Play "Not Confess" in the first stage. Also play "Not Confess" if the opponent
chose "Confess" in at most one period. Choose "Confess" otherwise. (This is
like s1, but one mistake is forgiven.)