Game is a search problem
Example:tic‐tac‐toe(alsocallednoughtsandcrosses)isa game for two players, X and O, who take turns marking the spaces in a 3×3 grid. The player who succeeds in placing three of their marks in a diagonal, horizontal, or vertical row is the winner.
Start state Goal state
Game Tree
Game is typically an Adversarial Search problem where your opponent has something to say about your strategy.
Types of Games
Manydifferenttypesofgames
Istheresomerandomnesselementinthegame
Deterministic, e.g. Tic‐Tac‐Toe, Chess, Chinese Chess, Go
Stochastic, e.g. Poker, Mahjong
Howmanyplayers
One, e.g. Solitaire, various puzzle games
Two, e.g. Tic‐Tac‐Toe, Chess, Chinese Chess, Go
More than two, e.g. many Poker games, Mahjong
Areyouplayingagainsteachother(strictlycompetitive)
Zero‐sum, e.g. Tic‐Tac‐Toe, Chess, Go, Poker
Non‐zero‐sum, e.g. Prisoner’s Dilemma
Canyouseethestate
Perfect information, e.g. Tic‐Tac‐Toe, Chess, Chinese Chess, Go
Imperfect information, e.g. Poker, Mahjong
Formalisation
States : (Start State )
Actions: (may depend on player/state)
Transitionfunction:
Terminaltest:
Players: (usuallytaketurns)
Utilities: (valuesonoutcomes)
A utility function (also called an objective function or payoff function), defines the final numeric value for a game that ends in terminal state for a player .
We want algorithms to find a strategy (Policy) which recommends a move for each state, i.e.
Value of a State
Valueofastate:thebestachievableoutcome(utility)fromthatstate. Example:4‐Queenspuzzle,letussaytheutilityofavalidsolutionis1
and an invalid solution is 0.
Q1 Q2
1
Q1 Q1 Q1 Q1
Q2 Q2 Q2 Q2
Q3 Q3 Q3 Q3
0001
Q1 Q1 Q1 Q1 Q2 Q2 Q2 Q2
Q3 Q3 Q3 Q3 Q4 Q4 Q4 Q4
0100
State value in adversarial games
Example:tic‐tac‐toe,utilityforX:win(1),draw(0),loss(‐1).
Agent X
XOX OO
current state
Agent O
XOX XOX XOX XOO OO OO XXXXX
Agent X
XOX XOX XOX XOX XOO XOO OOO OO OXXOXXXXO
XOX XOX OOO OO XX OXX
0
X
0 ‐1 ‐1
0 1 ‐1 1 ‐1 0
XOX XOX XOO XOO OXX XXO
XOX XOX XOO XOO XXO OXX
01
10
Minimax
Minimaxvalueofanodeistheutilityoftheterminalstateto which both players play optimally from that node.
Sotheprocessofatwo‐playergameisthatoneplayer(called player Max) is to maximise its utility whereas its opponent (called player Min) is to minimise the utility of Max.
For a state , its minimax value
is
∈ ∈
Minimax: example
Max
3
Min
321
4 3 8 2 6 7 17 1 5