
We are given a game search tree showing a sequence of states s1, …, s5 leading to an outcome.


At the terminal step N = 5, the reward is r(s5) ∈ {+1, 0, −1}.

A good evaluation function should be reasonably stable from one move to the next.

eval(s1, w) ∈ (−∞, +∞)

eval(s2, w) ∈ (−∞, +∞)

eval(s3, w) ∈ (−∞, +∞)

eval(s4, w) ∈ (−∞, +∞)

r(s5) ∈ {+1, 0, −1}

A good evaluation function should be reasonably stable from one move to the next.

eval(s1, w) ∈ (−∞, +∞)   r(s1, w) = tanh(eval(s1, w))

eval(s2, w) ∈ (−∞, +∞)   r(s2, w) = tanh(eval(s2, w))

eval(s3, w) ∈ (−∞, +∞)   r(s3, w) = tanh(eval(s3, w))

eval(s4, w) ∈ (−∞, +∞)   r(s4, w) = tanh(eval(s4, w))

r(s5) ∈ {+1, 0, −1}

We can convert an evaluation score into a reward using the tanh function, which "squashes" (−∞, +∞) into (−1, +1).
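As a minimal sketch, this squashing step maps directly onto Python's `math.tanh`; the scores fed in below are made-up illustrations, not values from the slides.

```python
import math

def tanh_reward(eval_score):
    """Squash an unbounded evaluation score into a pseudo-reward in (-1, +1)."""
    return math.tanh(eval_score)

# Illustrative scores: a winning position, a balanced one, a losing one.
for score in (4.0, 0.0, -4.0):
    print(round(tanh_reward(score), 3))
```

Because tanh is monotonic, the squashing preserves the ordering of positions while keeping the pseudo-rewards on the same scale as the true terminal rewards.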

Better still, we can use the minimax value at node si, computed by minimax search from si down to a cut-off depth L, applying our evaluation function at the cut-off.
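A depth-limited minimax search of this kind can be sketched as below. The helper names (`evaluate`, `successors`, `is_terminal`, `terminal_reward`) and the toy two-leaf tree in the usage example are assumptions for illustration, not part of the slides.

```python
def minimax(state, depth, maximizing, evaluate, successors, is_terminal, terminal_reward):
    """Minimax value of `state`, searching at most `depth` plies deeper.

    At a true terminal state we return the game's reward; at the cut-off
    depth we fall back on the (imperfect) evaluation function.
    """
    if is_terminal(state):
        return terminal_reward(state)
    if depth == 0:
        return evaluate(state)
    child_values = [
        minimax(child, depth - 1, not maximizing,
                evaluate, successors, is_terminal, terminal_reward)
        for child in successors(state)
    ]
    return max(child_values) if maximizing else min(child_values)

# Toy tree: root 'a' has leaves 'b' (eval 3.0) and 'c' (eval -1.0).
tree = {'a': ['b', 'c'], 'b': [], 'c': []}
vals = {'b': 3.0, 'c': -1.0}
root_value = minimax('a', 1, True,
                     lambda s: vals.get(s, 0.0),
                     lambda s: tree[s],
                     lambda s: False,
                     lambda s: 0.0)
print(root_value)
```

The value returned for an interior node si can then be squashed with tanh, just as the raw evaluation scores were, to obtain a pseudo-reward in (−1, +1).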

eval(sL1, w) ∈ (−∞, +∞)   r(sL1, w) = tanh(eval(sL1, w))

eval(sL2, w) ∈ (−∞, +∞)   r(sL2, w) = tanh(eval(sL2, w))

eval(sL3, w) ∈ (−∞, +∞)   r(sL3, w) = tanh(eval(sL3, w))

eval(sL4, w) ∈ (−∞, +∞)   r(sL4, w) = tanh(eval(sL4, w))

r(s5) ∈ {+1, 0, -1}

