CITS4404 Artificial Intelligence Semester 2, 2019
Game Theory
Unit Coordinator: Yuliya Karpievitch
From simple rules to modeling people, animals and other entities that interacts and gather information about the world
Complex Dynamics & Agent-based models
Are Complex Dynamics a product of complex rules?
– Can emerge as a result of the interaction of simple rules that change over time
What is an Agent-Based Model?
– The world can be modelled using agents, an environment, and a description about how agents interact with other agents and the environment
Game Theory
Strategic Interactions for self-interested agents Repeated games
Will communication affect the game?
Rational opponent
Prisoner’s Dilemma
Here prisoners cooperate with each other, not the police
Prisoner’s Dilemma – Cooperate or Defect
Prisoner’s Dilemma
ØThe dilemma has been studied extensively in Game Theory and has been applied widely in biology, business, sociology, philosophy, etc.
ØIn most experiments, people cooperate much more than the rational agent models predict
ØThe problem of nice
Altruism: many people are willing to incur a cost to themselves in order to benefit other people https://www.youtube.com/watch?v=1tz6WE4ALUs
Prisoner’s Dilemma Tournaments
ØProposed by Robert Axelrod in late 70s
ØIterated version of the PD, in which the agents play multiple rounds
against the same opponent, so their decisions can be based on history ØA simple strategy that did surprisingly well was called “tit for tat”, or TFT.
Ø TFT always cooperates during the first round of an iterated match; after that, it copies whatever the opponent did during the previous round
Øhttps://www.youtube.com/watch?v=BOvAbjfJ0x0
Prisoner’s Dilemma Tournaments (Cont.)
Characteristics of the strategies that did well:
ØNice: The strategies that do well cooperate during the first round, and
generally cooperate as often as they defect in subsequent rounds
ØRetaliating: Strategies that cooperate all the time did not do as well as strategies that retaliate if the opponent defects
ØForgiving: But strategies that were too vindictive tended to punish themselves as well as their opponents
ØNon-envious: Some of the most successful strategies seldom outscore their opponents; they are successful because they do well enough against a wide variety of opponents
TFT has all of the above
Game Theory – Rock-Paper-Scissors
This is a Zero-sum game where one player wins the other looses, can have a draw
In non-zero/non-constant sum games, players’ interests are not always in direct conflict, there are opportunities for both to gain
Nash Equilibrium – John Nash is known for…
Nash Equilibrium
A stable state of a system that involves several interacting participants in which NO participant can gain by a change of strategy as long as all the other participants remain unchanged
Nash Equilibrium
A stable state of a system that involves several interacting participants in which NO participant can gain by a change of strategy as long as all the other participants remain unchanged
Nash Equilibrium
A stable state of a system that involves several interacting participants in which NO participant can gain by a change of strategy as long as all the other participants remain unchanged
Dominant Strategy Nash Equilibrium
Dominant Strategy
A dominant strategy is a particular choice that a player can make such that the payoff (utility) is strictly higher than any of the payoffs if the player makes another choice
Dominant Strategy Nash Equilibrium
Very Weakly Dominant Strategy
A very weakly dominant strategy is a particular choice that a player can make such that the payoff (utility) is higher or equal to any of the payoffs if the palyer makes another choice
Prisoner’s Dilemma
The optimal outcome of a game is one where no player has an incentive to deviate from his chosen strategy after considering an opponent’s choice.
Early publications
Nash, John Forbes (1950). “Equilibrium Points in N-person Games”. Proceedings of the National Academy of Sciences of the United States of America. 36 (1): 48–49. doi:10.1073/pnas.36.1.48. MR 0031701. PMC 1063129. PMID 16588946.
Nash, John Forbes (1950). “The Bargaining Problem”. Econometrica. Basel, Switzerland: MDPI. 18 (2): 155–62. doi:10.2307/1907266. JSTOR 1907266. MR 0035977.
Nash, John Forbes (1951). “Non-cooperative Games”. Annals of Mathematics. Princeton, New Jersey: Princeton University. 54 (2): 286–95. doi:10.2307/1969529. JSTOR 1969529. MR 0043432.
Nash, John Forbes (1953). “Two-person Cooperative Games”. Econometrica. Basel, Switzerland: MDPI. 21 (1): 128–40. doi:10.2307/1906951. MR 0053471. Archived from the original (PDF) on March 29, 2017. Retrieved January 4, 2017.
Extensive Form vs Normal Form Games
Extensive form game – players take turns
A specification of a game allowing for the explicit representation of the sequencing of players’ possible moves, their choices at every decision point, the information each player has about the other player’s moves when they make a decision, and their payoffs for all possible game outcomes.
Perfect information
Imperfect information
Normal form games – players make a move at the same time
Perfect Information Extensive Form Games
Players
Actions
Utility function – assigns a value to each terminal node – for each choice node we have an Action function and a Player function
Terminal nodes
Successor function – maps choice nodes and action to a new node
Extensive-form vs Normal-form
Extensive form
Player 1: 4 pure strategies: S1 = {(A,G), (A,H), (B,G), (B,H)} Player 2: 4 pure strategies: S2 = {(C,E), (C,F), (D,E), (D,F)}
A pure strategy completely specifies how a player will play that game. Pure strategy is a specific move that a player will follow in every possible situation in a game. Every combination of moves.
Extensive-form vs Normal-form
Extensive form Normal form
Can convert Extensive form to Normal from – List Pure Strategies Normal form has repetitions –increase in size!
Cannot always do a transformation in reverse
Theorem: Every Perfect information extensive form game has a Pure Strategy Nash Equilibrium
Three pure strategy equilibria
Extensive form Normal form
BHCE – can any of the players get a better payoff if they deviate? Player 1 options are 5(tie) or 3
Player 2 options are 5(tie) or 1
https://www.coursera.org/learn/game-theory-1/lecture/1eI6P/4-3-perfect- information-extensive-form-strategies-br-ne
Why is BG-CF not an equilibrium?
Extensive form Normal form
BG-CF – can any of the players get a better payoff if they deviate? Player 1 options are 3 (better!) or 1 – can deviate!
Player 2 options are 10 (tie) or 5
https://www.coursera.org/learn/game-theory-1/lecture/1eI6P/4-3-perfect- information-extensive-form-strategies-br-ne
What is the advantage of Extensive form?
Extensive form Normal form
Clear what the order of play is
The tree shows that player 1 moves first and player 2 observes this move… We can see the actions as a sequence
Extensive form for Prisoner’s dilemma
Playoff shown for Player 1 only!
Strategy = always H
P2: Always guess H à P1 can choose Play every time coin lands T EV for Play becomes $1
For P1
EV H = 0.5
EV T = 1 Average = 0.75
Strategy = always T
P2: Always guess H à P1 can choose Play every time coin lands H EV for Play becomes $1
For P1
EV H = 1
EV T = -0.5 Average = 0.25
Strategy = Mixed!
We take a weighted average!
P2
What is the Average Expected Payoff for P1?
For P1
EV H = 0.5 EV T = -0.5 Average = 0
What if we switch the payoffs?
P2 has to change strategy also – by changing the payoff in C1 we have affected the P2’s strategy
In imperfect information games the ultimate strategy in some subgame it may depend on the ultimate strategy in the unrelated part of the tree
Important to know the EV at the ‘unrelated’ cell not the actual strategy that follows
Libratus (Noam Brown): https://www.youtube.com/watch?v=2dX0lwaQRX0
Imperfect Information Games
Poker – challenge for Game Theory
2005 – Rhode Island Hold’em Solved (109 decisions)
2015 – Heads-up limits Texas Hold’em (1013 decisions)
2017 – Heads-up No-Limits Texas Hold’em – no limit betting structure
Unique decisions points 10161
Two-player game allows us to determine a clear winner (assuming no collusion)
Libratus
Played 120k hands in 20 days in Jan 2017 Against 4 out of 10 best poker players
Optimal strategy for a subgame cannot be determined from that subgame alone
Perfect Information Game
Imperfect Information Game
Find Nash Equilibrium – a set of strategies in which no player can improve by deviating
Libratus
Noam Brown: https://www.youtube.com/watch?v=2dX0lwaQRX0
L
Action Abstraction
Action Abstraction
• Various betting amounts can create large # of branches
• Limit allowed bet amount – can cause rounding errors
Information (Card) Abstraction
Use one strategy for various card combinations Can be problematic…
Opponents best hand
Libratus Abstraction
For next week
1. We are switching topics: next topic is Sentiment Analysis! We will learn about Computer Vision
Watch the Video: (TBA)
2. Implement the Prisoner’s Dilemma Tournament
• Allow for a selection of games to play – n
• Allow for at least 4 strategy selections including
Always Cooperate Always Defect Tit-For-Tat
Your choice
Workshop!!
Driving side
2 pure strategy Nash Equilibria
Watching a movie
2 pure strategy Nash Equilibria
Practice Problems
NO pure strategy Nash Equilibria
Watching a movie
Many people guess a number [0 100]
Winner is closest number to the 2/3 of the average
Many people guess a number [0 100]
Winner is closest number to the 2/3 of the average
Guess a number [0 100]
Winner is closest number to the 2/3 of the average
Play again
https://www.coursera.org/learn/game-theory- 1/lecture/vay6D/1-6-strategic-reasoning
What is the best response? What is the dominant strategy?
Consider the following ‘collective-action’ game
When P1 plays ‘Not’, P2 gets -1 from ‘Revolt’ and 0 from ‘Not. Thus ‘Not’ is the best response.
No strategy is a dominant strategy:
When the other player plays ‘Not’, it is strictly better to play ‘Not’; When the other player plays ‘Revolt’, it is strictly better to play ‘Revolt’; No strategy always dominates the other strategy.