Instruction:
Homework for Section 3
This assignment includes one written part and one programming part.
➢ The answers to the written part are expected in Section3_Report_StudentName_StudentID.pdf.
➢ The solution to the programming part should also be explained in
Section3_Report_StudentName_StudentID.pdf.
➢ Write your code in Section3_Submission_StudentName_StudentID.py. The code
should be well organized and fully commented. A ReadMe file explaining how to run the code is required.
Problem: Q1 (Written Part):
Given a basic game, the set of possible states is {-2, -1, 0, 1, 2}. You start at state 0, and if you reach either -2 or 2, the game ends. At each state, you can take one of two actions: {-1, +1}.
If you’re in state s and choose -1:
⚫ You have an 80% chance of reaching the state s−1.
⚫ You have a 20% chance of reaching the state s+1.
If you’re in state s and choose +1:
⚫ You have a 70% chance of reaching the state s+1.
⚫ You have a 30% chance of reaching the state s−1.
If your action results in transitioning to state -2, then you receive a reward of 20. If your action results in transitioning to state 2, then your reward is 100. Otherwise, your reward is -5. Assume the discount factor γ is 1.
Give the value of V_opt(s) for each state s after 0, 1, and 2 iterations of value iteration. Iteration
0 just initializes all the values of V to 0. Terminal states do not have any optimal policies and take on a value of 0.
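The value-iteration update for the game described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a required solution format. The names STATES, transitions, and value_iteration are hypothetical, and the sketch assumes synchronous updates: every state in iteration k is computed from the values of iteration k-1.

```python
STATES = [-2, -1, 0, 1, 2]
TERMINAL = {-2, 2}
ACTIONS = [-1, +1]
GAMMA = 1.0  # discount factor given in the problem


def transitions(s, a):
    """Return (probability, next_state, reward) triples for action a in state s."""
    # Action -1 succeeds with probability 0.8, action +1 with probability 0.7.
    p_intended = 0.8 if a == -1 else 0.7
    result = []
    for p, s_next in [(p_intended, s + a), (1.0 - p_intended, s - a)]:
        if s_next == -2:
            r = 20
        elif s_next == 2:
            r = 100
        else:
            r = -5
        result.append((p, s_next, r))
    return result


def value_iteration(num_iterations):
    """Return a list of value tables, one per iteration (index 0 is the all-zero start)."""
    V = {s: 0.0 for s in STATES}
    history = [dict(V)]
    for _ in range(num_iterations):
        new_V = {}
        for s in STATES:
            if s in TERMINAL:
                new_V[s] = 0.0  # terminal states keep value 0
            else:
                # Bellman optimality backup: best expected (reward + discounted value).
                new_V[s] = max(
                    sum(p * (r + GAMMA * V[s2]) for p, s2, r in transitions(s, a))
                    for a in ACTIONS
                )
        V = new_V
        history.append(dict(V))
    return history


if __name__ == "__main__":
    for k, table in enumerate(value_iteration(2)):
        print(k, table)
```

Running the script prints one value table per iteration, which can be checked against a hand calculation for the report.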
Q2 (Programming Part):
Consider the cliff walk task as shown. The rules are as follows:

⚫ The start node is colored yellow and the terminal node is colored orange. Walking into the terminal node gives a reward of 10. At each step, you can take one of four actions: “up”, “down”, “left”, or “right”.
⚫ The cliff is colored gray. Walking into the cliff gives a reward of -100 as a penalty.
⚫ Walking into a red node gives a reward of -1.
⚫ Black nodes are barriers and cannot be entered.
⚫ The discount factor is set to 0.9.
Note: The code for creating the environment is also provided; you can try other environments if interested.
Implement the SARSA algorithm with an 𝜖-greedy policy in cliffwalk.py. Select
suitable parameters, such as the maximum number of episodes, the update ratio (learning rate) 𝛼, and 𝜖.
Show the final Q(s, a) matrix and the final policy in the report.
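For orientation, a minimal tabular SARSA loop with 𝜖-greedy action selection might look like the sketch below. It does not use the provided cliffwalk.py environment. Instead, it builds a small hypothetical 3x4 grid (GRID) that is not the layout from the assignment figure. Several details are assumptions made only for this sketch: ordinary cells give reward 0, walking into the cliff ends the episode, moves off the grid or into a barrier leave the agent in place, and the default values of episodes, alpha, and epsilon are arbitrary.

```python
import random

# Hypothetical layout (NOT the assignment's grid):
# S = start, G = terminal, C = cliff, R = red node, # = barrier, . = free cell
GRID = ["..R.",
        ".#..",
        "SCCG"]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ACTION_NAMES = list(ACTIONS)


def step(state, action):
    """Return (next_state, reward, done) for one move on the hypothetical grid."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    # Assumption: moves off the grid or into a barrier leave the agent in place.
    if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])) or GRID[nr][nc] == "#":
        nr, nc = r, c
    cell = GRID[nr][nc]
    if cell == "G":
        return (nr, nc), 10.0, True
    if cell == "C":
        return (nr, nc), -100.0, True  # assumption: the cliff ends the episode
    if cell == "R":
        return (nr, nc), -1.0, False
    return (nr, nc), 0.0, False  # assumption: ordinary cells give reward 0


def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTION_NAMES)
    return max(ACTION_NAMES, key=lambda a: Q[state][a])


def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Run tabular SARSA and return the Q-table as {state: {action: value}}."""
    rng = random.Random(seed)
    Q = {(r, c): {a: 0.0 for a in ACTION_NAMES}
         for r in range(len(GRID)) for c in range(len(GRID[0]))}
    start = next((r, c) for r, row in enumerate(GRID)
                 for c, ch in enumerate(row) if ch == "S")
    for _ in range(episodes):
        state = start
        action = epsilon_greedy(Q, state, epsilon, rng)
        for _ in range(max_steps):
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            # On-policy target: uses the action actually chosen for the next step.
            target = reward if done else reward + gamma * Q[next_state][next_action]
            Q[state][action] += alpha * (target - Q[state][action])
            state, action = next_state, next_action
            if done:
                break
    return Q


if __name__ == "__main__":
    Q = sarsa()
    policy = {s: max(q, key=q.get) for s, q in Q.items()}
    for s in sorted(Q):
        print(s, policy[s], {a: round(v, 2) for a, v in Q[s].items()})
```

The greedy policy is read off the final Q-table by taking the highest-valued action in each state. Ties are broken by action order here, so states that were never visited default to “up”.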