CMPUT397 Winter 2021
Final2: Sample Final
Instructor: Baihong Qi
Final Exam
LOTE: Law of total expectation
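As a reminder of what the law states, here is a sketch for discrete random variables X and Y (the symbols X, Y, x, y are chosen for illustration and do not come from the slides):

```latex
\mathbb{E}[X] \;=\; \mathbb{E}\big[\,\mathbb{E}[X \mid Y]\,\big]
\;=\; \sum_{y} \Pr(Y = y)\,\mathbb{E}[X \mid Y = y]
```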
MP: Markov Property
• The conditional probability of the next state depends only on the present state, not on the earlier history
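One common way to write this, assuming the usual MDP notation with state S_t, action A_t, and reward R_{t+1} (notation assumed here, not taken from the slides):

```latex
\Pr(S_{t+1} = s',\, R_{t+1} = r \mid S_t, A_t)
\;=\;
\Pr(S_{t+1} = s',\, R_{t+1} = r \mid S_0, A_0, R_1, \dots, S_t, A_t)
```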
LOTUS: Law of the unconscious statistician
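For a discrete random variable X with probability mass function p_X, LOTUS gives the expectation of g(X) without first deriving the distribution of g(X). The symbols g, X, and p_X are illustrative:

```latex
\mathbb{E}[g(X)] \;=\; \sum_{x} g(x)\, p_X(x)
```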
3. Sarsa algorithm (on-policy)
Sarsa
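A minimal sketch of the tabular Sarsa update, assuming a dictionary-based Q-table and epsilon-greedy action selection. The function names, the `terminal` flag, and the default value of 0.0 for unseen pairs are illustrative choices, not taken from the slides.

```python
import random

def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, terminal=False):
    """One Sarsa step: move Q(s, a) toward r + gamma * Q(s_next, a_next).

    a_next is the action actually chosen in s_next by the behaviour policy,
    which is what makes the method on-policy.
    """
    next_value = 0.0 if terminal else q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    td_error = r + gamma * next_value - old
    q[(s, a)] = old + alpha * td_error
    return td_error

# Usage: one update on a hand-made transition.
q = {("s1", "left"): 1.0}
td_error = sarsa_update(q, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=0.9)
```

Because the target uses the sampled next action `a_next`, the agent has to choose its next action before it can update the previous state-action pair.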
4. Expected Sarsa formula (off-policy)
• Look back once S_{t+1} is reached
• The current state is S_{t+1}
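For reference, the Expected Sarsa update in its standard textbook form, assuming step size α, discount γ, and target policy π (this notation is assumed, since the formula itself did not survive in the notes):

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
+ \alpha \Big[ R_{t+1}
+ \gamma \sum_{a} \pi(a \mid S_{t+1})\, Q(S_{t+1}, a)
- Q(S_t, A_t) \Big]
```

The update is made from S_{t+1}, looking back at the pair (S_t, A_t). The sum over actions replaces the single sampled Q(S_{t+1}, A_{t+1}) that Sarsa would use.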
5. Relationship between Sarsa and Expected Sarsa
• Sarsa is on-policy; Expected Sarsa is in most cases used off-policy (it can also be run on-policy)
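The relationship can be illustrated with a small sketch that compares the two update targets for one next state. The Q-values, policy probabilities, reward, and discount below are made-up numbers, and the function names are hypothetical.

```python
def sarsa_target(q_next, a_next, r, gamma):
    """Sarsa target: uses the one action actually sampled in the next state."""
    return r + gamma * q_next[a_next]

def expected_sarsa_target(q_next, pi_next, r, gamma):
    """Expected Sarsa target: averages Q over the target policy's actions."""
    return r + gamma * sum(pi_next[a] * q_next[a] for a in q_next)

# Illustrative values for a single next state.
q_next = {"left": 1.0, "right": 3.0}
pi_next = {"left": 0.25, "right": 0.75}
r, gamma = 1.0, 0.5

# Averaging the Sarsa target over actions drawn from pi_next gives the
# Expected Sarsa target, so Expected Sarsa removes the sampling variance.
expected = expected_sarsa_target(q_next, pi_next, r, gamma)
averaged = sum(pi_next[a] * sarsa_target(q_next, a, r, gamma) for a in q_next)

# With a greedy target policy the Expected Sarsa target equals the
# Q-learning target r + gamma * max_a Q(s', a), an off-policy case.
greedy = {"left": 0.0, "right": 1.0}
greedy_target = expected_sarsa_target(q_next, greedy, r, gamma)
q_learning_target = r + gamma * max(q_next.values())
```

The target policy in `expected_sarsa_target` does not have to be the policy that generated the data, which is why Expected Sarsa can be used off-policy.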
3. Dyna-Q
Planning & Learning
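A minimal sketch of one tabular Dyna-Q step, showing how learning from real experience and planning from a learned model share the same update. It assumes a deterministic environment, a dictionary model, and that terminal states never receive Q-values, so they default to 0.0. All names and parameters are illustrative.

```python
import random

def dyna_q_step(q, model, s, a, r, s_next, actions, alpha, gamma, n_planning, rng):
    """One Dyna-Q step: direct RL update, model update, then n planning updates."""

    def q_learning_update(s0, a0, r0, s1):
        # Q-learning target: bootstrap from the best action in the next state.
        best_next = max(q.get((s1, b), 0.0) for b in actions)
        old = q.get((s0, a0), 0.0)
        q[(s0, a0)] = old + alpha * (r0 + gamma * best_next - old)

    # (a) Learning: update Q from the real transition.
    q_learning_update(s, a, r, s_next)

    # (b) Model learning: remember what this state-action pair led to.
    model[(s, a)] = (r, s_next)

    # (c) Planning: replay randomly chosen remembered transitions.
    seen = list(model)
    for _ in range(n_planning):
        ps, pa = rng.choice(seen)
        pr, ps_next = model[(ps, pa)]
        q_learning_update(ps, pa, pr, ps_next)

# Usage: one real transition followed by five simulated updates.
q, model = {}, {}
dyna_q_step(q, model, "s0", "go", 1.0, "end", ["go"],
            alpha=0.5, gamma=0.9, n_planning=5, rng=random.Random(0))
```

Setting `n_planning=0` reduces the step to plain one-step Q-learning plus model bookkeeping. Larger values reuse each real transition more often.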