SEHS4678 Artificial Intelligence
Due 23:59, 18 Mar 2022 (Friday, Week 7)
The objective of this assignment is twofold: (a) to explore the use of a reinforcement learning model to solve a real-world problem, and (b) to get familiar with OpenAI Gym, one of the most popular toolkits for implementing reinforcement learning simulation environments. More specifically, you will use the classical reinforcement learning algorithm Q-learning to solve OpenAI Gym’s FrozenLake problem.
You should have acquired sufficient knowledge to do this assignment right after Lecture 3. So please start doing this assignment as early as you can, as this will help you follow and understand this course. Don’t wait until Week 7!
A. Things to do
1. Study Chapters 7 and 8 of our textbook [P] AI Crash Course to learn Q-learning and its implementation in Python.
2. Study both the documentation on OpenAI Gym at https://gym.openai.com/docs/
and the FrozenLake problem at https://gym.openai.com/envs/FrozenLake-v0/ .
[Note that you don’t need to install Gym on your computer as you will use Google Colab which already has Gym installed.]
Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you’ll fall into the freezing water. At this time, there’s an international frisbee shortage, so it’s absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won’t always move in the direction you intend.
The surface of the lake is described using a grid like the following:
SFFF (S: starting point, safe)
FHFH (F: frozen surface, safe)
FFFH (H: hole, fall to your doom)
HFFG (G: goal, where the frisbee is located)
The episode (one iteration) ends when you reach the goal or fall into a hole. You receive a reward of 1 if you reach the goal, and 0 otherwise.
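To make the dynamics concrete, here is a small, Gym-free sketch of the deterministic 4x4 FrozenLake transitions described above (i.e., assuming is_slippery = False, as in the skeleton in Table 1). The state numbering (0 to 15, row by row) and action encoding (0 = Left, 1 = Down, 2 = Right, 3 = Up) follow the notes later in this assignment; the real env.step() in Gym plays the same role but also returns an info dictionary.

```python
# Gym-free sketch of the deterministic 4x4 FrozenLake dynamics.
# States 0-15 are numbered row by row; actions: 0=Left, 1=Down, 2=Right, 3=Up.

GRID = "SFFF" "FHFH" "FFFH" "HFFG"   # 16 tiles, row-major

def step(state, action):
    """Return (next_state, reward, done) for one deterministic move."""
    row, col = divmod(state, 4)
    if action == 0:   col = max(col - 1, 0)   # Left  (bumping a wall keeps you in place)
    elif action == 1: row = min(row + 1, 3)   # Down
    elif action == 2: col = min(col + 1, 3)   # Right
    elif action == 3: row = max(row - 1, 0)   # Up
    next_state = 4 * row + col
    tile = GRID[next_state]
    done = tile in "HG"                        # episode ends in a hole or at the goal
    reward = 1.0 if tile == "G" else 0.0       # reward 1 only on reaching the goal
    return next_state, reward, done

# From state 14 (one square left of G), moving Right reaches the goal:
print(step(14, 2))   # (15, 1.0, True)
```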
Solve this problem using Q-learning by following the approach in Chapters 7 and 8 of our textbook [P] AI Crash Course. The skeleton of the AI program (in Python) is provided in Table 1 below. Follow the comments on the right to fill in the corresponding missing Python code on the left (rows 1–9).
Python code
Import the toolkit Gym. 1 line.
Import the library Numpy. 1 line.
env = gym.make('FrozenLake-v0', is_slippery=False)
Reset the environment. 1 line.
Display the environment in its current state. 1 line.
print(env.observation_space.n)
print(env.action_space.n)
Create and initialize (to zeros) the matrix of Q-values, named Q. 1 line.
Set the parameters alpha, gamma and the number of episodes (iterations) for Q-learning to be 0.6, 0.75 and 1000, respectively. 3 lines.
Training Mode –
Update the matrix of Q-values using the Bellman equation. Actions are chosen by random selection only. You need to use the functions env.reset() and env.step(). Find out what these functions do and what values they return from the documentation. Estimated number of lines: 10
print(Q[14, 2])
route = [0]
actions = [ ]
Inference Mode –
Utilize the updated matrix of Q-values above to find a route to move from S to G. The variable “route” is a list holding the states (in order) in the route. Initially, it holds just the state 0 as it starts at S. The variable “actions” is a list holding the actions done (in order) in moving along the route. Initially, it is empty. You need to use the functions env.reset() and env.step(). Estimated number of lines: 10
print(route)
print(actions)
Display the states of the environment and the actions done, one after another, in moving along the route, starting from S to G. You need to use the functions env.reset(), env.render() and env.step().
Estimated number of lines: 6
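For reference, the Bellman update used in the training mode (in its temporal-difference form, as presented in AI Crash Course) is:

Q(s, a) <- Q(s, a) + alpha * [ r + gamma * max over a' of Q(s', a') - Q(s, a) ]

where s is the current state, a is the action taken, r is the reward received, s' is the next state returned by env.step(), and a' ranges over the actions available in s'.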
Your marks:
Some points to note:
1. The numbers below show the states of the environment, numbered 0 to 15 row by row from the top-left (so S is state 0 and G is state 15). For example, when you are in the yellow square, the state is 6, and when you are in the brown square, the state is 13.
2. Action 0 is Left, action 1 is Down, action 2 is Right, and action 3 is Up.
3. The estimated number of lines is only a rough guide. You may, of course, write more or fewer lines so long as your code works.
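Putting the notes above together, here is one possible shape of the training and inference loops. This is a hedged sketch only, not a model answer: it replaces env.reset() and env.step() with a hand-coded deterministic grid (so it runs without Gym installed), whereas your submission must call the real Gym API as required by Table 1. The update rule, the random-action training, and the greedy inference pass are the same.

```python
# Hedged sketch of Table 1's training and inference modes on a hand-coded
# deterministic 4x4 grid (stand-in for Gym's FrozenLake-v0, is_slippery=False).
import random

GRID = "SFFFFHFHFFFHHFFG"                 # row-major 4x4 map: SFFF/FHFH/FFFH/HFFG
ALPHA, GAMMA, EPISODES = 0.6, 0.75, 1000  # parameters from Table 1

def step(state, action):
    """Stand-in for env.step(): deterministic move, returns (state, reward, done)."""
    row, col = divmod(state, 4)
    row, col = [(row, max(col - 1, 0)),   # 0 = Left
                (min(row + 1, 3), col),   # 1 = Down
                (row, min(col + 1, 3)),   # 2 = Right
                (max(row - 1, 0), col)][action]  # 3 = Up
    s2 = 4 * row + col
    return s2, (1.0 if GRID[s2] == "G" else 0.0), GRID[s2] in "HG"

Q = [[0.0] * 4 for _ in range(16)]        # 16 states x 4 actions, all zeros

# Training mode: actions chosen purely at random; Bellman (TD) update on Q.
random.seed(0)                            # for reproducibility of this sketch
for _ in range(EPISODES):
    s, done = 0, False                    # env.reset() would return state 0
    while not done:
        a = random.randrange(4)
        s2, r, done = step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# Inference mode: follow the greedy action from each state, recording the route.
route, actions, s, done = [0], [], 0, False
while not done and len(route) < 20:       # cap guards against an untrained Q
    a = max(range(4), key=lambda i: Q[s][i])
    s, r, done = step(s, a)
    actions.append(a)
    route.append(s)

print(route)     # one shortest route is 0 -> 4 -> 8 -> 9 -> 13 -> 14 -> 15
print(actions)
```

Note the design choice in inference mode: because the environment is deterministic and holes are terminal, a well-trained Q makes the greedy route reach G without ever entering a hole; the length cap is only a safeguard.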
Log in to Google Colab and open a new notebook named YourName_Asm1 (e.g., ChanTaiMan_Asm1). Type your completed Python program from Table 1 above (including the code already written for you) into your notebook, one row per cell. Make sure that your program runs and gives the correct outputs.
B. Things to submit
1. This file named YourName_Asm1.docx (e.g., ChanTaiMan_Asm1.docx)
2. Your notebook named YourName_Asm1.ipynb (e.g., ChanTaiMan_Asm1.ipynb)
This assignment is intended to be easy. Don’t blindly copy and paste code from the Internet. Study Chapters 7 and 8 of our textbook [P] AI Crash Course. You are welcome to consult the lecturer if you encounter any difficulties in completing this assignment. Please take the initiative to seek help!
~ End of Assignment ~