代写 game math theory CITS4404 Artificial Intelligence Semester 2, 2019

CITS4404 Artificial Intelligence Semester 2, 2019
Game Theory
Unit Coordinator: Yuliya Karpievitch

From simple rules to modeling people, animals and other entities that interacts and gather information about the world

Complex Dynamics & Agent-based models
Are Complex Dynamics a product of complex rules?
– Can emerge as a result of the interaction of simple rules that change over time
What is an Agent-Based Model?
– The world can be modelled using agents, an environment, and a description about how agents interact with other agents and the environment

Game Theory
Strategic Interactions for self-interested agents Repeated games
Will communication affect the game?
Rational opponent

Prisoner’s Dilemma
Here prisoners cooperate with each other, not the police

Prisoner’s Dilemma – Cooperate or Defect

Prisoner’s Dilemma
ØThe dilemma has been studied extensively in Game Theory and has been applied widely in biology, business, sociology, philosophy, etc.

ØIn most experiments, people cooperate much more than the rational agent models predict
ØThe problem of nice
Altruism: many people are willing to incur a cost to themselves in order to benefit other people https://www.youtube.com/watch?v=1tz6WE4ALUs

Prisoner’s Dilemma Tournaments
ØProposed by Robert Axelrod in late 70s
ØIterated version of the PD, in which the agents play multiple rounds
against the same opponent, so their decisions can be based on history ØA simple strategy that did surprisingly well was called “tit for tat”, or TFT.
Ø TFT always cooperates during the first round of an iterated match; after that, it copies whatever the opponent did during the previous round
Øhttps://www.youtube.com/watch?v=BOvAbjfJ0x0

Prisoner’s Dilemma Tournaments (Cont.)
Characteristics of the strategies that did well:
ØNice: The strategies that do well cooperate during the first round, and
generally cooperate as often as they defect in subsequent rounds
ØRetaliating: Strategies that cooperate all the time did not do as well as strategies that retaliate if the opponent defects
ØForgiving: But strategies that were too vindictive tended to punish themselves as well as their opponents
ØNon-envious: Some of the most successful strategies seldom outscore their opponents; they are successful because they do well enough against a wide variety of opponents
TFT has all of the above

Game Theory – Rock-Paper-Scissors
This is a Zero-sum game where one player wins the other looses, can have a draw
In non-zero/non-constant sum games, players’ interests are not always in direct conflict, there are opportunities for both to gain

Nash Equilibrium – John Nash is known for…

Nash Equilibrium
A stable state of a system that involves several interacting participants in which NO participant can gain by a change of strategy as long as all the other participants remain unchanged

Nash Equilibrium
A stable state of a system that involves several interacting participants in which NO participant can gain by a change of strategy as long as all the other participants remain unchanged

Nash Equilibrium
A stable state of a system that involves several interacting participants in which NO participant can gain by a change of strategy as long as all the other participants remain unchanged
Dominant Strategy Nash Equilibrium

Dominant Strategy
A dominant strategy is a particular choice that a player can make such that the payoff (utility) is strictly higher than any of the payoffs if the player makes another choice
Dominant Strategy Nash Equilibrium

Very Weakly Dominant Strategy
A very weakly dominant strategy is a particular choice that a player can make such that the payoff (utility) is higher or equal to any of the payoffs if the palyer makes another choice

Prisoner’s Dilemma
The optimal outcome of a game is one where no player has an incentive to deviate from his chosen strategy after considering an opponent’s choice.

Early publications
Nash, John Forbes (1950). “Equilibrium Points in N-person Games”. Proceedings of the National Academy of Sciences of the United States of America. 36 (1): 48–49. doi:10.1073/pnas.36.1.48. MR 0031701. PMC 1063129. PMID 16588946.
Nash, John Forbes (1950). “The Bargaining Problem”. Econometrica. Basel, Switzerland: MDPI. 18 (2): 155–62. doi:10.2307/1907266. JSTOR 1907266. MR 0035977.
Nash, John Forbes (1951). “Non-cooperative Games”. Annals of Mathematics. Princeton, New Jersey: Princeton University. 54 (2): 286–95. doi:10.2307/1969529. JSTOR 1969529. MR 0043432.
Nash, John Forbes (1953). “Two-person Cooperative Games”. Econometrica. Basel, Switzerland: MDPI. 21 (1): 128–40. doi:10.2307/1906951. MR 0053471. Archived from the original (PDF) on March 29, 2017. Retrieved January 4, 2017.

Extensive Form vs Normal Form Games
Extensive form game – players take turns
A specification of a game allowing for the explicit representation of the sequencing of players’ possible moves, their choices at every decision point, the information each player has about the other player’s moves when they make a decision, and their payoffs for all possible game outcomes.
Perfect information
Imperfect information
Normal form games – players make a move at the same time

Perfect Information Extensive Form Games
Players
Actions
Utility function – assigns a value to each terminal node – for each choice node we have an Action function and a Player function
Terminal nodes
Successor function – maps choice nodes and action to a new node

Extensive-form vs Normal-form
Extensive form
Player 1: 4 pure strategies: S1 = {(A,G), (A,H), (B,G), (B,H)} Player 2: 4 pure strategies: S2 = {(C,E), (C,F), (D,E), (D,F)}
A pure strategy completely specifies how a player will play that game. Pure strategy is a specific move that a player will follow in every possible situation in a game. Every combination of moves.

Extensive-form vs Normal-form
Extensive form Normal form
Can convert Extensive form to Normal from – List Pure Strategies Normal form has repetitions –increase in size!
Cannot always do a transformation in reverse
Theorem: Every Perfect information extensive form game has a Pure Strategy Nash Equilibrium

Three pure strategy equilibria
Extensive form Normal form
BHCE – can any of the players get a better payoff if they deviate? Player 1 options are 5(tie) or 3
Player 2 options are 5(tie) or 1
https://www.coursera.org/learn/game-theory-1/lecture/1eI6P/4-3-perfect- information-extensive-form-strategies-br-ne

Why is BG-CF not an equilibrium?
Extensive form Normal form
BG-CF – can any of the players get a better payoff if they deviate? Player 1 options are 3 (better!) or 1 – can deviate!
Player 2 options are 10 (tie) or 5
https://www.coursera.org/learn/game-theory-1/lecture/1eI6P/4-3-perfect- information-extensive-form-strategies-br-ne

What is the advantage of Extensive form?
Extensive form Normal form
Clear what the order of play is
The tree shows that player 1 moves first and player 2 observes this move… We can see the actions as a sequence

Extensive form for Prisoner’s dilemma

Playoff shown for Player 1 only!

Strategy = always H
P2: Always guess H à P1 can choose Play every time coin lands T EV for Play becomes $1
For P1
EV H = 0.5
EV T = 1 Average = 0.75

Strategy = always T
P2: Always guess H à P1 can choose Play every time coin lands H EV for Play becomes $1
For P1
EV H = 1
EV T = -0.5 Average = 0.25

Strategy = Mixed!
We take a weighted average!
P2
What is the Average Expected Payoff for P1?
For P1
EV H = 0.5 EV T = -0.5 Average = 0

What if we switch the payoffs?
P2 has to change strategy also – by changing the payoff in C1 we have affected the P2’s strategy
In imperfect information games the ultimate strategy in some subgame it may depend on the ultimate strategy in the unrelated part of the tree
Important to know the EV at the ‘unrelated’ cell not the actual strategy that follows
Libratus (Noam Brown): https://www.youtube.com/watch?v=2dX0lwaQRX0

Imperfect Information Games
Poker – challenge for Game Theory
2005 – Rhode Island Hold’em Solved (109 decisions)
2015 – Heads-up limits Texas Hold’em (1013 decisions)
2017 – Heads-up No-Limits Texas Hold’em – no limit betting structure
Unique decisions points 10161
Two-player game allows us to determine a clear winner (assuming no collusion)

Libratus
Played 120k hands in 20 days in Jan 2017 Against 4 out of 10 best poker players
Optimal strategy for a subgame cannot be determined from that subgame alone

Perfect Information Game

Imperfect Information Game
Find Nash Equilibrium – a set of strategies in which no player can improve by deviating

Libratus
Noam Brown: https://www.youtube.com/watch?v=2dX0lwaQRX0

L

Action Abstraction
Action Abstraction
• Various betting amounts can create large # of branches
• Limit allowed bet amount – can cause rounding errors

Information (Card) Abstraction
Use one strategy for various card combinations Can be problematic…
Opponents best hand

Libratus Abstraction

For next week
1. We are switching topics: next topic is Sentiment Analysis! We will learn about Computer Vision
Watch the Video: (TBA)
2. Implement the Prisoner’s Dilemma Tournament
• Allow for a selection of games to play – n
• Allow for at least 4 strategy selections including
Always Cooperate Always Defect Tit-For-Tat
Your choice

Workshop!!

Driving side
2 pure strategy Nash Equilibria

Watching a movie
2 pure strategy Nash Equilibria

Practice Problems
NO pure strategy Nash Equilibria

Watching a movie

Many people guess a number [0 100]
Winner is closest number to the 2/3 of the average

Many people guess a number [0 100]
Winner is closest number to the 2/3 of the average

Guess a number [0 100]
Winner is closest number to the 2/3 of the average

Play again
https://www.coursera.org/learn/game-theory- 1/lecture/vay6D/1-6-strategic-reasoning

What is the best response? What is the dominant strategy?
Consider the following ‘collective-action’ game
When P1 plays ‘Not’, P2 gets -1 from ‘Revolt’ and 0 from ‘Not. Thus ‘Not’ is the best response.
No strategy is a dominant strategy:
When the other player plays ‘Not’, it is strictly better to play ‘Not’; When the other player plays ‘Revolt’, it is strictly better to play ‘Revolt’; No strategy always dominates the other strategy.