Evaluation Functions and Rewards in Game Search
We are given a game search tree showing a sequence of states s1, …, s5 leading to a final outcome.
At the final move N = 5, the reward is r(s5) ∈ {+1, 0, −1}.
A good evaluation function should be reasonably stable from one move to the next. The raw evaluation scores are unbounded, while the terminal reward is not:

eval(s1, w) ∈ (−∞, +∞)   r(s1, w) = tanh(eval(s1, w))
eval(s2, w) ∈ (−∞, +∞)   r(s2, w) = tanh(eval(s2, w))
eval(s3, w) ∈ (−∞, +∞)   r(s3, w) = tanh(eval(s3, w))
eval(s4, w) ∈ (−∞, +∞)   r(s4, w) = tanh(eval(s4, w))
r(s5) ∈ {+1, 0, −1}
We can convert evaluation scores into rewards using the tanh function, which "squashes" (−∞, +∞) into (−1, +1), so that every r(si, w) is on the same scale as the terminal reward r(s5).
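As a concrete illustration, here is a minimal Python sketch of this conversion, assuming a linear evaluation function over a feature vector phi(s). The weights, feature values, and function names are hypothetical; the real eval function depends on the game.

import numpy as np

def evaluate(phi, w):
    # Linear evaluation: an unbounded score in (-inf, +inf).
    return float(np.dot(w, phi))

def reward(phi, w):
    # Squash the score into (-1, +1) so it matches the scale of
    # the terminal rewards {+1, 0, -1}.
    return float(np.tanh(evaluate(phi, w)))

# Usage with made-up weights and a made-up 3-feature state s1.
w = np.array([0.5, -1.0, 2.0])
phi_s1 = np.array([1.0, 0.0, 0.3])
print(reward(phi_s1, w))  # strictly between -1 and +1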
Better still, we can use the minimax value at node si, computed by minimax search from si down to a cut-off depth L, applying our evaluation function at the cut-off leaves (see the sketch after the list below).
eval(s1^L, w) ∈ (−∞, +∞)   r(s1^L, w) = tanh(eval(s1^L, w))
eval(s2^L, w) ∈ (−∞, +∞)   r(s2^L, w) = tanh(eval(s2^L, w))
eval(s3^L, w) ∈ (−∞, +∞)   r(s3^L, w) = tanh(eval(s3^L, w))
eval(s4^L, w) ∈ (−∞, +∞)   r(s4^L, w) = tanh(eval(s4^L, w))
r(s5) ∈ {+1, 0, −1}

Here si^L denotes the leaf state reached at cut-off depth L by minimax search from si.
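Below is a minimal sketch of this depth-limited minimax, assuming a hypothetical game interface with is_terminal(s), terminal_reward(s) ∈ {+1, 0, −1}, successors(s), and phi(s) (a numeric feature vector); the real interface is game-specific.

import math

def minimax_reward(state, w, depth, maximizing, game):
    # Terminal states return their true reward in {+1, 0, -1}.
    if game.is_terminal(state):
        return game.terminal_reward(state)
    if depth == 0:
        # Cut-off leaf s^L: evaluate and squash into (-1, +1).
        score = sum(wi * fi for wi, fi in zip(w, game.phi(state)))
        return math.tanh(score)
    values = [minimax_reward(s, w, depth - 1, not maximizing, game)
              for s in game.successors(state)]
    return max(values) if maximizing else min(values)

Because tanh is strictly increasing, squashing leaf scores before backing them up selects the same principal leaf si^L as backing up the raw scores, so the value returned at si equals tanh(eval(si^L, w)).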