More on Reinforcement Learning
AIMA 22.3
All CS188 materials are available at http://ai.berkeley.edu.
● Q
● Q
Value Iteration Cannot Work
● T(s,a,s’)
○ s
○ T(s,a,s’)
Solution: Q-Value Iteration
● Q
○ Q0(s,a) = 0
○ Given Qk, compute the depth k+1 Q-values for all Q-states
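The iteration above can be sketched in a few lines. This is a minimal illustration, not the course's reference code: it assumes the standard update Q(s,a) ← Σ over s' of T(s,a,s') · [R(s,a,s') + γ · max over a' of Q(s',a')], and the reward function `R`, the discount `gamma`, and the two-state toy MDP are all hypothetical names and values chosen for the example.

```python
def q_value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    """Run Q-value iteration for a fixed number of sweeps.

    T maps (s, a) to a list of (next_state, probability) pairs.
    R(s, a, s2) returns the reward for that transition (assumed signature).
    """
    # Q0(s,a) = 0 for every Q-state.
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(iterations):
        new_Q = {}
        for s in states:
            for a in actions:
                total = 0.0
                # Expectation over next states s2, weighted by T(s,a,s2).
                for s2, p in T.get((s, a), []):
                    best_next = max(Q[(s2, a2)] for a2 in actions)
                    total += p * (R(s, a, s2) + gamma * best_next)
                new_Q[(s, a)] = total
        Q = new_Q  # every new value is computed from the previous sweep only
    return Q


# Hypothetical toy MDP: moving from A to B pays 1, everything else pays 0.
states = ["A", "B"]
actions = ["stay", "go"]
T = {
    ("A", "stay"): [("A", 1.0)],
    ("A", "go"): [("B", 1.0)],
    ("B", "stay"): [("B", 1.0)],
    ("B", "go"): [("A", 1.0)],
}


def R(s, a, s2):
    return 1.0 if (s == "A" and a == "go") else 0.0


Q = q_value_iteration(states, actions, T, R, gamma=0.5)
```

Note that the sketch needs T and R as inputs, so it applies only when the model is known or has been estimated.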
● Q
● Q(s,a)
● ɑ
○ t
1/t
k
● k
● t
k
● q a t
● Qt(a) a
● t, a Qt(a)
● Qt(a)
○ Qt(a)
○ Qt(a) q*(a)
Sutton & Barto, Reinforcement Learning, Fig. 2.2
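Assuming Qt(a) here denotes the sample-average estimate of an arm's true value q*(a) in a k-armed bandit, as in Sutton & Barto chapter 2, the estimate can be maintained incrementally with a 1/n step size. The sketch below is an illustration under those assumptions: the epsilon-greedy selection rule, the Gaussian reward noise, the function name `run_bandit`, and the arm values are all hypothetical choices, not taken from the slides.

```python
import random


def run_bandit(true_values, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a k-armed bandit with sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_values)
    Q = [0.0] * k  # Q[a] plays the role of Qt(a), the current estimate
    N = [0] * k    # number of times each arm has been pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                  # explore: random arm
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit: greedy arm
        # Assumed reward model: true value plus unit-variance Gaussian noise.
        reward = rng.gauss(true_values[a], 1.0)
        N[a] += 1
        # Incremental mean: step size 1/N[a] gives the plain sample average.
        Q[a] += (reward - Q[a]) / N[a]
    return Q, N


estimates, counts = run_bandit([0.0, 2.0, 1.0])
```

Because every arm keeps being sampled with probability at least epsilon/k per step, each count grows without bound and each sample average tends toward the arm's true value.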
Q
● Q
○ s, a, r, s’, a’
○ Q a’
Q- s’
Q
● Q
●
● Q