Carleton University
Department of Systems and Computer Engineering
SYSC 5401
Adaptive and Learning Systems Assignment #5
Winter 2019
Date: Thursday, March 21, 2019
Due Date: Tuesday, April 9, 2019
Question 1:
Simulate the 10-armed bandit problem. Set the true action payoffs arbitrarily by drawing each one from a normal distribution N(0, 1). Then set the first estimate of the value function to another random variable with the same statistics. Plot the average reward as a function of the number of plays. Run the simulation three times, for ε = 0.0, ε = 0.01 and ε = 0.1. Essentially, replicate the results on Schwartz p. 16 or S&B p. 29.
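One way to set up the experiment in Question 1 is sketched below in plain Python (the assignment allows tools other than Matlab). Several details are assumptions that the question does not state: rewards are drawn with unit variance around each arm's true payoff, estimates are updated with the incremental sample average, and the curve is averaged over `n_runs` independent bandit tasks of `n_plays` plays each. The name `run_bandit` and its parameters are illustrative only.

```python
import random


def run_bandit(epsilon, n_arms=10, n_plays=1000, n_runs=200, seed=0):
    """Return the average reward at each play, averaged over n_runs bandit tasks."""
    rng = random.Random(seed)
    avg_reward = [0.0] * n_plays
    for _ in range(n_runs):
        # True payoffs and initial estimates are both drawn from N(0, 1).
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
        estimates = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
        counts = [0] * n_arms
        for t in range(n_plays):
            if rng.random() < epsilon:
                arm = rng.randrange(n_arms)  # explore: uniform random arm
            else:
                arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
            # Assumption: observed reward is the true payoff plus N(0, 1) noise.
            reward = rng.gauss(true_values[arm], 1.0)
            counts[arm] += 1
            # Incremental sample average; the first pull overwrites the
            # random initial estimate for that arm.
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            avg_reward[t] += reward / n_runs
    return avg_reward


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1):
        curve = run_bandit(eps)
        tail = sum(curve[-100:]) / 100
        print(f"epsilon={eps}: mean reward over the last 100 plays = {tail:.3f}")
```

Each returned list holds one curve, so plotting average reward against the number of plays amounts to drawing the three lists on the same axes with whatever plotting tool is at hand.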
Question 2:
You are given the following 3×3 grid game. If the robot enters state 1, it then moves to state 9 under any action and receives a reward of +20. In any other state, if it hits a wall it receives a reward of -2. Set up the 9 equations in 9 unknowns given by equation 2.13 of Schwartz p. 18 or equation 4.4 p. 90 of S&B.
Use Matlab to solve the equations and determine the value of each state, given that in each state the actions up, down, right and left are each taken with probability 25%.
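The 9 equations can be written as a linear system (I - γP)v = r and solved directly. The sketch below does this in plain Python with a small Gauss-Jordan solver instead of Matlab. It rests on assumptions the question leaves open: states are numbered row by row from the top-left corner (so state 1 is top-left and state 9 is bottom-right), hitting a wall leaves the robot where it is, ordinary moves earn a reward of 0, and the discount factor is γ = 0.9. The helper names `step`, `build_system` and `solve` are illustrative.

```python
GAMMA = 0.9  # assumed discount factor (not given in the question)
N = 3        # the grid is N x N
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for a 0-based state index (assumed row-major layout)."""
    if state == 0:                      # state 1: any action jumps to state 9, reward +20
        return N * N - 1, 20.0
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < N and 0 <= new_col < N):
        return state, -2.0              # hit a wall: stay put (assumption), reward -2
    return new_row * N + new_col, 0.0   # ordinary move: reward 0 (assumption)


def build_system(gamma=GAMMA):
    """Return (A, b) such that A v = b encodes the 9 Bellman equations."""
    n = N * N
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for s in range(n):
        for action in MOVES:            # each action has probability 0.25
            s_next, reward = step(s, action)
            A[s][s_next] -= 0.25 * gamma
            b[s] += 0.25 * reward
    return A, b


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= factor * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


if __name__ == "__main__":
    values = solve(*build_system())
    for row in range(N):
        print("  ".join(f"{values[row * N + col]:7.3f}" for col in range(N)))
```

In Matlab the same system would typically be solved with the backslash operator once `A` and `b` are built; the hand-written solver here only keeps the sketch free of external libraries.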
Question 3:
Using Algorithm 2.1 on p. 26 of Schwartz or the algorithm in S&B Fig. 4.1, p. 92, write a Matlab (or other) program to compute the value of each state in the grid described in Question 2.
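A minimal sketch of iterative policy evaluation for the equiprobable random policy follows. It is not a transcription of either cited algorithm, only the usual sweep-until-converged scheme with in-place updates. The grid model carries the same assumptions as before, none of which come from the question: row-by-row state numbering from the top-left, wall hits leave the robot in place, ordinary moves earn 0, and γ = 0.9. The stopping threshold `theta` and the function names are likewise illustrative.

```python
N = 3  # the grid is N x N
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for a 0-based state index (assumed row-major layout)."""
    if state == 0:                      # state 1: any action jumps to state 9, reward +20
        return N * N - 1, 20.0
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < N and 0 <= new_col < N):
        return state, -2.0              # hit a wall: stay put (assumption), reward -2
    return new_row * N + new_col, 0.0   # ordinary move: reward 0 (assumption)


def policy_evaluation(gamma=0.9, theta=1e-8):
    """Evaluate the random policy (each action with probability 0.25)."""
    V = [0.0] * (N * N)
    sweeps = 0
    while True:
        delta = 0.0
        for s in range(N * N):
            # Expected one-step return under the random policy.
            new_value = 0.0
            for action in MOVES:
                s_next, reward = step(s, action)
                new_value += 0.25 * (reward + gamma * V[s_next])
            delta = max(delta, abs(new_value - V[s]))
            V[s] = new_value            # in-place update
        sweeps += 1
        if delta < theta:               # stop when a full sweep changes nothing much
            return V, sweeps


if __name__ == "__main__":
    values, sweeps = policy_evaluation()
    print(f"converged after {sweeps} sweeps")
    for row in range(N):
        print("  ".join(f"{values[row * N + col]:7.3f}" for col in range(N)))
```

Under the same assumptions, the values produced here should agree with the direct solution of the 9 linear equations in Question 2 up to the tolerance `theta`, which gives a convenient cross-check.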
Question 4:
Using Algorithm 2.2 on p. 27 of Schwartz or the algorithm in S&B Fig. 4.3, p. 98, write a Matlab (or other) program to compute the optimal policy and the optimal value function. Show the optimal policy decisions on a grid.
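The sketch below uses value iteration, one standard dynamic programming method for obtaining an optimal value function and a greedy policy; it may not be the same procedure as the algorithm cited in the question. The grid model again assumes row-by-row state numbering from the top-left, wall hits that leave the robot in place, a reward of 0 for ordinary moves, and γ = 0.9, none of which are stated in the assignment. All function names are illustrative.

```python
N = 3  # the grid is N x N
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def step(state, action):
    """Return (next_state, reward) for a 0-based state index (assumed row-major layout)."""
    if state == 0:                      # state 1: any action jumps to state 9, reward +20
        return N * N - 1, 20.0
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < N and 0 <= new_col < N):
        return state, -2.0              # hit a wall: stay put (assumption), reward -2
    return new_row * N + new_col, 0.0   # ordinary move: reward 0 (assumption)


def action_value(state, action, V, gamma):
    """One-step lookahead value of taking `action` in `state`."""
    s_next, reward = step(state, action)
    return reward + gamma * V[s_next]


def value_iteration(gamma=0.9, theta=1e-8):
    """Return (V, policy): optimal state values and a greedy action per state."""
    V = [0.0] * (N * N)
    while True:
        delta = 0.0
        for s in range(N * N):
            best = max(action_value(s, a, V, gamma) for a in MOVES)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy; ties are broken by the order of MOVES.
    policy = [max(MOVES, key=lambda a, s=s: action_value(s, a, V, gamma))
              for s in range(N * N)]
    return V, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for row in range(N):
        print("  ".join(f"{values[row * N + col]:7.3f}" for col in range(N)))
    for row in range(N):
        # State 1 is shown as "*" because every action there has the same effect.
        print(" ".join("*" if row * N + col == 0 else ARROWS[policy[row * N + col]]
                       for col in range(N)))
```

Several states have more than one optimal action under these assumptions, so the printed arrows show one optimal policy rather than the only one.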
[Figure: 3×3 grid game with states numbered 1 to 9]