
Computational Methods & Game Theory
Iterated Prisoner’s Dilemma
Computer Science, University of Auckland


What about computation?
In the last lecture we learned the basics of game theory.
– Pay-off matrices
– Pure strategies
– Mixed strategies
We also talked about the simplifying assumptions…
– Only two players
– Games are discrete events that take place once
– Decisions are discrete, either/or choices rather than continuous, ever-changing ones, etc.
Simplifying assumptions are great; they help to make models “analytically tractable” i.e. they make it possible to prove sweeping claims about their properties
– e.g. identifying equilibria or dominant strategies
But sometimes we want to consider the more complex cases. Computational models provide a way for us to study (rigorously!) these more complex systems, where classic approaches cannot be applied.
We’ll look at an example of this today.

Relaxing assumptions in the Prisoner’s Dilemma
As we talked about last time, the PD is not just about prisoners, but can also be used to study cooperation.
The basic formulation poses a situation with two possible actions (defect or cooperate).
This omits the possibility of interacting with the same individual repeatedly, i.e. situations where “trust” can develop.
What happens when we allow repeated interactions?

Iterated Prisoner’s Dilemma
The game is essentially the same, but the two players play against each other more than once, and can change how they play based upon previous games.
How does this change the game?
The qualitative diversity of possible strategies increases. It is no longer just a matter of picking defect/cooperate, but deciding how to act based upon a history of interaction.
– Always defect
– Always cooperate
– Defect XX% of the time
– “tit-for-tat”
– “tit-for-tat with forgiveness”
– “tit-for-two-tats”
– “win-stay, lose-shift”
– “grim trigger”
https://plato.stanford.edu/entries/prisoner-dilemma/strategy-table.html

Strategies are now more like programs
tit-for-tat
– On the first round, cooperate.
– On subsequent rounds, if the opponent defected previously, then defect; otherwise cooperate.
grim trigger
– If the opponent has ever defected before, then defect; otherwise cooperate.
These strategies can be complicated!
https://plato.stanford.edu/entries/prisoner-dilemma/strategy-table.html
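To make the “strategies as programs” idea concrete, here is a minimal sketch of the two rules above written as functions of the opponent’s move history. The representation (moves encoded as "C"/"D", invented function names) is an assumption for illustration, not code from the lecture or the Stanford table.

    # Illustrative sketch: strategies as functions of the opponent's history.
    # Moves are "C" (cooperate) or "D" (defect); the encoding is an assumption.

    def tit_for_tat(opponent_history):
        """Cooperate on the first round, then copy the opponent's last move."""
        if not opponent_history:         # first round: no history yet
            return "C"
        return opponent_history[-1]      # echo whatever the opponent did last

    def grim_trigger(opponent_history):
        """Cooperate until the opponent defects once, then defect forever."""
        return "D" if "D" in opponent_history else "C"

    print(tit_for_tat(["C", "D"]))             # -> "D"
    print(grim_trigger(["C", "D", "C", "C"]))  # -> "D": one defection is never forgiven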

The set of qualitatively different strategies is very large. How can claims about which strategies do well be established?
1. Impose constraints (e.g. restrict attention to N given strategies and ask which of them does best)
2. Monte Carlo / evolutionary sampling

“Axelrod solicited iterated prisoner’s-dilemma algorithms from many [human] sources, and ran them in a competitive environment, in which the highest-scoring algorithms were allowed to “reproduce,” and the lowest-scoring ones were eliminated.” (Chess, 1988, p. 664)
Chess, D. M. (1988). Simulating the evolution of behavior: the iterated prisoners’ dilemma problem. Complex Systems, 2(6), 663-670.
Axelrod, R. (1984). The Evolution of Cooperation. New York: Basic Books.


Prisoner’s Dilemma
“defecting completely dominates cooperation; that is, no matter what one’s partner does, one’s own payoff is higher if one defects.”
“On the other hand, the highest total payoff is obtained when both parties cooperate […] the payoff to each party in that case is in fact higher than if both parties behave “rationally” and defect.”
– Both cooperate: both prisoners keep their mouths shut and are convicted, but only do a small amount of prison time.
– One defects, one cooperates: the defector goes free for ratting out their partner, who gets the maximum sentence (and symmetrically with the roles reversed).
– Both defect: both criminals betray each other and the police can then convict them both.
Chess, D. M. (1988). Simulating the evolution of behavior: the iterated prisoners’ dilemma problem. Complex systems, 2(6), 663-670.
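The dominance argument is easy to check numerically. The sketch below uses the conventional textbook payoffs (T=5, R=3, P=1, S=0); the lecture does not give numbers, so these values are an assumption.

    # Assumed conventional PD payoffs: T=5 (temptation), R=3 (reward),
    # P=1 (punishment), S=0 (sucker). PAYOFF[(my_move, their_move)] = my score.
    PAYOFF = {
        ("C", "C"): 3, ("C", "D"): 0,
        ("D", "C"): 5, ("D", "D"): 1,
    }

    # Defection dominates: whatever the opponent does, "D" pays more than "C".
    for their_move in ("C", "D"):
        assert PAYOFF[("D", their_move)] > PAYOFF[("C", their_move)]

    # ...yet each player's payoff under mutual cooperation beats mutual defection.
    assert PAYOFF[("C", "C")] > PAYOFF[("D", "D")]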

Chess (1988) simulated a kind of evolutionary process, where
1. strategies compete against each other;
2. losing strategies “die” and are replaced with copies of more successful strategies;
3. the copying process is imperfect, i.e. when a winning strategy is copied, there is a chance that the copy is “mutated”: changed in a random (generally small) way.
(A minimal sketch of such a loop is given below.)
Chess, D. M. (1988). Simulating the evolution of behavior: the iterated prisoners’ dilemma problem. Complex systems, 2(6), 663-670.
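As a rough illustration of how such a loop can be organised, here is a sketch. It is not Chess’s actual code; play_match and mutate are hypothetical helpers standing in for the iterated-PD tournament and the random perturbation of a strategy.

    # Illustrative outline of the evolutionary loop (not Chess's actual code).
    # `play_match(a, b)` returns the two players' scores for one iterated game;
    # `mutate(s)` returns a (possibly slightly altered) copy of strategy s.
    def evolve(population, generations, play_match, mutate, replace_fraction=0.2):
        for _ in range(generations):
            # 1. strategies compete: total score from a round-robin tournament
            scores = {i: 0 for i in range(len(population))}
            for i in range(len(population)):
                for j in range(i + 1, len(population)):
                    score_i, score_j = play_match(population[i], population[j])
                    scores[i] += score_i
                    scores[j] += score_j
            # 2. the lowest scorers "die" and are replaced by copies of the highest
            ranked = sorted(range(len(population)), key=lambda k: scores[k])
            n_replace = int(replace_fraction * len(population))
            losers, winners = ranked[:n_replace], ranked[-n_replace:]
            for loser, winner in zip(losers, winners):
                # 3. copying is imperfect: the copy may be mutated
                population[loser] = mutate(population[winner])
        return population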

We’ve seen this before!


Mutatable Strategies
For this to work, strategies must not all be pre-specified. They must be mutatable.
To make this possible, Chess represents strategies as arithmetic expressions involving three variables:

Mutatable Strategies
During a competition, if the expression is less than 0, then the action produced by the strategy is defect; otherwise it is cooperate.
Classic strategies are easily represented in this form:
But an (infinite!) variety of other strategies can also be generated by the random mutations of the evolutionary process…
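As an illustration of how an arithmetic expression can act as a strategy and be mutated, here is a sketch. The particular variables used (the opponent’s last move, a running defection count, the number of rounds played) and the mutation scheme are assumptions for illustration only, not Chess’s actual encoding; only the “expression < 0 means defect” rule comes from the text above.

    import random

    # Illustrative only: a linear expression over some history features acts as a
    # strategy. The features chosen here are assumptions, not Chess's actual variables.
    def make_strategy(coeffs):
        a, b, c = coeffs
        def strategy(opp_history):
            opp_last_defected = 1 if opp_history and opp_history[-1] == "D" else 0
            opp_defections = opp_history.count("D")
            rounds_played = len(opp_history)
            value = a * opp_last_defected + b * opp_defections + c * rounds_played
            # Rule from the slide: expression < 0 means defect, otherwise cooperate.
            return "D" if value < 0 else "C"
        return strategy

    def mutate(coeffs, scale=0.5):
        """Copy the coefficients, nudging one of them by a small random amount."""
        new = list(coeffs)
        k = random.randrange(len(new))
        new[k] += random.uniform(-scale, scale)
        return tuple(new)

    tit_for_tat_like = make_strategy((-1.0, 0.0, 0.0))  # defect iff opponent just defected

Small mutations of the coefficients produce nearby strategies, which is what lets the evolutionary process explore an (in principle infinite) space of behaviours.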

Chess’s Evolutionary process

Chess observes four stages that commonly occur in the evolutionary process.
1. The “Era of Exploitation” – the initial population of random strategies tends to be pure always defect or always cooperate strategies.
[HOMEWORK: Why is that true?]
Those that are pure defectors tend to do quite well, and represent more and more of the population as time passes.
2. “The Nadir” – Virtually all strategies other than always defect are eliminated. Total population score is quite low.
3. The “Growth of Trust” – strategies with Axelrod’s desirable properties (nice, provokable and forgiving) emerge.
4. “Equilibrium” – Pure defectors are eliminated from the population, except for the occasional one that emerges via mutation. Strategies such as tit-for-tat and nice-but-unforgiving dominate and remain dominant despite the occasional appearance of a pure defector.

Questions to think about and discuss
Q1. Is this the final equilibrium?
Q2. Do the arithmetic expressions used by Chess capture all possible strategies?
Chess lists some strategies found at the equilibrium stage. What can you say about the first three?
– tit-for-tat
– nice but unforgiving (a.k.a. grim-trigger)
Q3. Why do we see so many versions of the same strategy?

In this final “equilibrium” phase, we see a glimpse of what is called an Evolutionarily Stable Strategy: a strategy that cannot be invaded by (small numbers of) another strategy.
If most of the population is playing tit-for-tat, and a small number of pure defectors appear, the defectors will NOT win enough to take over the population.
tit-for-tat is “robust against invasion” by pure defectors.
If tit-for-tat is robust against invasion from all relevant strategies, then it is considered an evolutionarily stable strategy.
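A quick back-of-the-envelope check of this robustness, assuming the conventional payoffs (T=5, R=3, P=1, S=0) and a game of n rounds (neither number comes from the lecture):

    # A lone always-defector meeting a resident tit-for-tat player exploits it once,
    # then faces mutual defection; two tit-for-tat players cooperate throughout.
    T, R, P, S = 5, 3, 1, 0   # assumed conventional payoffs
    n = 20                    # rounds per iterated game (assumption)

    payoff_tft_vs_tft = n * R               # 60: residents mostly meet other residents
    payoff_alld_vs_tft = T + (n - 1) * P    # 24: the invader does much worse
    print(payoff_tft_vs_tft, payoff_alld_vs_tft)

Since residents almost always meet other residents, while the rare invader almost always meets residents, the invader’s average score is lower and it cannot spread.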

In extending the Prisoner’s Dilemma to include repeated interaction (the Iterated Prisoner’s Dilemma), the diversity of possible strategies grew enormously.
Computational methods allow us to consider formally a wide variety of strategies, and the computational simulation of an evolution-like process allowed us to understand how populations of these different strategies typically change over time.
The results of this kind of simulation are less “proof-like”; they are used more as a kind of data in support of an argument. This is part of the idea of “artificial life” research: to study “life as it could be”.

Considering space

What happens when a population of competitors is spatially distributed?

In the last section, we considered a population of competing strategies. This population was “well-stirred”, meaning that each individual strategy had an equal chance of interacting with any other individual.
When populations are spatially distributed, that “well-stirred” assumption generally no longer holds. To be in a location essentially means being more likely to interact with certain individuals than with others.

When thinking about spatially distributed individuals, we can consider a graph of interactions.
– Vertices indicate individuals
– Connections (lines) show which individuals can interact with each other
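To make the idea concrete, here is a small sketch of an interaction graph in which each individual sits on a grid and is connected to its 8 surrounding neighbours. The grid layout and the wrap-around edges are assumptions chosen for illustration.

    # Illustrative sketch: an interaction graph for individuals on a wrap-around grid,
    # where each individual can interact only with its 8 surrounding (Moore) neighbours.
    def grid_neighbours(width, height):
        graph = {}
        for x in range(width):
            for y in range(height):
                graph[(x, y)] = [
                    ((x + dx) % width, (y + dy) % height)   # wrap around the edges
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)
                ]
        return graph

    neighbours = grid_neighbours(5, 5)
    print(len(neighbours[(0, 0)]))  # each vertex is connected to 8 others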

Rock Paper Scissors
Rock beats Scissors; Scissors beats Paper; Paper beats Rock.
A > B > C > A…
So there’s no good strategy to play, right?
https://www.vox.com/2014/5/1/5671404/your-complete-guide-to-using-science-to-win-at-rock-paper-scissors

Each individual will randomly either cooperate (blue) or defect (red) at the start of the model.
Each cycle, it interacts with all of its 8 neighbors.
If it cooperates, its score will be the number of neighbors that also cooperated.
If it defects, then its score is the product of the Defection-Award multiple and the number of neighbors that cooperated (i.e. the patch has taken advantage of the patches that cooperated).
In the subsequent round, the patch will set its old-cooperate? to the strategy it used in the previous round. For the upcoming round, the patch will adopt the strategy of one of its neighbors that scored the highest in the previous round.
Wilensky, U. (2002). NetLogo PD Basic Evolutionary model. http://ccl.northwestern.edu/netlogo/models/PDBasicEvolutionary. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
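The update rules described above can be sketched in a few lines of Python (the model itself is written in NetLogo). Grid size, wrap-around edges, and the Defection-Award value used here are assumptions, not the model’s defaults.

    import random

    WIDTH = HEIGHT = 20
    DEFECTION_AWARD = 1.6   # assumed value of the Defection-Award multiple
    # True = cooperate (blue), False = defect (red), assigned at random initially.
    grid = {(x, y): random.choice([True, False])
            for x in range(WIDTH) for y in range(HEIGHT)}

    def neighbours(x, y):
        return [((x + dx) % WIDTH, (y + dy) % HEIGHT)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

    def step(grid):
        # Score every patch against its 8 neighbours.
        scores = {}
        for (x, y), cooperates in grid.items():
            coop_neighbours = sum(grid[n] for n in neighbours(x, y))
            scores[(x, y)] = coop_neighbours if cooperates else DEFECTION_AWARD * coop_neighbours
        # Each patch then adopts the strategy its best-scoring neighbour just used.
        return {(x, y): grid[max(neighbours(x, y), key=lambda n: scores[n])]
                for (x, y) in grid}

    for _ in range(10):
        grid = step(grid)
    print(sum(grid.values()), "cooperators remain out of", WIDTH * HEIGHT)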
