CMPUT397 Winter 2021
Final2: Sample Final
Tutor: Baihong Qi

Final Exam

LOTE: Law of total expectation
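As a reminder of what the law states, here is the standard discrete form, followed by one place it typically appears in this course material. The symbols X, Y, v_π, q_π, and π are assumed notation, not taken from the slides.

```latex
\mathbb{E}[X] \;=\; \mathbb{E}\big[\,\mathbb{E}[X \mid Y]\,\big]
\;=\; \sum_{y} P(Y = y)\,\mathbb{E}[X \mid Y = y]

% Example: conditioning the return on the first action gives
v_\pi(s) \;=\; \sum_{a} \pi(a \mid s)\, q_\pi(s, a)
```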

MP: Markov Property
• The conditional probability of the next state depends only on the present state, not on the earlier history
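Written out for an MDP, the property can be stated roughly as follows. The notation (S_t, A_t, R_t for state, action, and reward at time t) is an assumption here, chosen to match the S_{t+1} used later in these notes.

```latex
P\big(S_{t+1} = s',\, R_{t+1} = r \,\big|\, S_t, A_t\big)
\;=\;
P\big(S_{t+1} = s',\, R_{t+1} = r \,\big|\, S_0, A_0, R_1, \dots, S_t, A_t\big)
```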

LOTUS: Law of the unconscious statistician
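Assuming LOTUS here stands for the law of the unconscious statistician, its discrete form is shown below. The function g and random variable X are generic placeholders, not symbols from the slides.

```latex
\mathbb{E}[g(X)] \;=\; \sum_{x} g(x)\, P(X = x)
```

It lets you compute the expectation of g(X) directly from the distribution of X, without first deriving the distribution of g(X).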

3. Sarsa algorithm (on-policy)
Sarsa
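The slide gives only the name, so the following is a minimal sketch of one tabular Sarsa update, assuming the usual rule Q(S,A) ← Q(S,A) + α[R + γQ(S',A') − Q(S,A)]. The function names, the dictionary-based Q table, and the example states and actions are all hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0, terminal=False):
    """Apply one Sarsa update to Q[(s, a)] and return the new value.

    The target uses the action a_next that the behaviour policy actually
    selected in s_next, which is what makes Sarsa on-policy.
    """
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical two-state example.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0
rng = random.Random(0)
a_next = epsilon_greedy(Q, "s1", ["left", "right"], epsilon=0.0, rng=rng)
new_value = sarsa_update(Q, "s0", "left", 1.0, "s1", a_next, alpha=0.5, gamma=0.9)
```

With epsilon set to 0 the next action is the greedy one, so the target here is 1.0 + 0.9 × 2.0.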

4. Expected Sarsa update (off-policy)
Sarsa
• The update looks back from S_{t+1}
• The current state is S_{t+1}
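The update formula itself did not survive on the slide. A minimal sketch of the standard Expected Sarsa update follows, assuming the target R + γ Σ_a π(a|S_{t+1}) Q(S_{t+1}, a). The function name, the `policy_probs` dictionary, and the example values are hypothetical.

```python
from collections import defaultdict


def expected_sarsa_update(Q, s, a, r, s_next, actions, policy_probs,
                          alpha=0.5, gamma=1.0, terminal=False):
    """Apply one Expected Sarsa update to Q[(s, a)] and return the new value.

    Instead of the value of one sampled next action, the target uses the
    expectation of Q(s_next, .) under the target policy's probabilities.
    """
    if terminal:
        expected_q = 0.0
    else:
        expected_q = sum(policy_probs[b] * Q[(s_next, b)] for b in actions)
    target = r + gamma * expected_q
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical example: we are now in s1 and look back to update (s0, left).
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 3.0
probs = {"left": 0.25, "right": 0.75}  # assumed target policy in s1
new_value = expected_sarsa_update(
    Q, "s0", "left", 1.0, "s1", ["left", "right"], probs, alpha=0.5, gamma=0.9
)
```

If `policy_probs` puts all its mass on the greedy action, the target becomes the max over next actions, which is the Q-learning target.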
5. Relationship between Sarsa and Expected Sarsa
• Sarsa is on-policy; Expected Sarsa is, in most cases, off-policy

3. Dyna-Q
Planning & Learning
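To illustrate how planning and learning combine, here is a minimal sketch of one tabular Dyna-Q step, assuming a deterministic environment, a Q-learning update for both real and simulated experience, and no terminal-state handling. All names (`dyna_q_step`, `model`, `n_planning`) and the example transition are hypothetical.

```python
import random
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """One-step Q-learning update toward r + gamma * max_b Q(s_next, b)."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def dyna_q_step(Q, model, s, a, r, s_next, actions,
                n_planning=5, alpha=0.5, gamma=0.9, rng=None):
    """Process one real transition: learn, update the model, then plan."""
    if rng is None:
        rng = random.Random(0)
    # (1) Direct RL: learn from the real transition.
    q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma)
    # (2) Model learning: remember what (s, a) led to (deterministic assumption).
    model[(s, a)] = (r, s_next)
    # (3) Planning: replay transitions sampled from the learned model.
    for _ in range(n_planning):
        ps, pa = rng.choice(list(model.keys()))
        pr, ps_next = model[(ps, pa)]
        q_learning_update(Q, ps, pa, pr, ps_next, actions, alpha, gamma)


# Hypothetical single transition with three planning updates.
Q = defaultdict(float)
model = {}
actions = ["left", "right"]
dyna_q_step(Q, model, "s0", "right", 1.0, "s1", actions, n_planning=3)
```

Because the model holds only one transition here, every planning update replays it, so the estimate for ("s0", "right") moves toward the reward faster than a single real update would.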