CSCI 5512: Artificial Intelligence II (Spring 2022)
Homework 3
(Due Tue, Mar 29, 11:59 pm central)
1. (60 points) [Programming Assignment] A Markov Decision Process (MDP) specifies a set of actions, set of states, transition function, and reward function. In this problem, you will need to implement the Value Iteration and Policy Iteration algorithms to compute the optimal policy (which maps states to actions).
Copyright By PowCoder代写 加微信 powcoder
We will consider the MDP specified in Figure 1. The set of actions are to move up, down, left, and right, the set of states are those empty squares (all except (2,2)), the transition function is specified in Figure 1(b) where with probability 0.8 the desired action will take the agent into the next state in that direction, with probability 0.1 the agent will move to the left, and with probability 0.1 the agent will move to the right. The reward function is defined by the amount of reward given to the agent when moving to another state (either terminal or non-terminal state). When the agent moves to one of the terminal states, either (4,3) or (4,2), the agent will receive a reward of +1 or -1 respectively. When the agent moves to a non-terminal state, we will consider three different rewards the agent may receive (i) r = −2, (ii) r = −0.2, and (iii) r = −0.01.
1234 (a) States of the MDP
(b) Transitions due to action
Figure 1: Markov Decision Process. For each of the three reward functions, do the following:
(a) (30 points) Write Python code (from scratch) to implement the Value Iteration algo- rithm in file mdpVI.py. Use your code to compute the optimal policy for the MDP in Figure 1.
(b) (30 points) Write Python code (from scratch) to implement the Policy Iteration algo- rithm in file mdpPI.py. Use your code to compute the optimal policy for the MDP in Figure 1. For solving the linear equation Ax = b, you can use numpy.
For (a) and (b) above, the computed optimal policy and the utility of each state has to be shown in the state space diagram. You have to submit code for mdpVI.py and mdpPI.py which each take in one argument: reward, which determines the reward for every non-terminal state. The output should be the policy of each state, on separate lines, in the form: row, column, policy, where row ∈ {1,2,3}, column ∈ {1,2,3,4}, and policy ∈ {u,d,l,r}, which respectively correspond to up, down, left, and right. Note that the policy has to be stated for 9 non-terminal states, since (2, 2) is not a valid state, and (4, 2) and (4, 3) are terminal states. Sample input when r = −2: $python mdpVI.py -2 and $python mdpPI.py -2.
2. (40 points) [Programming Assignment] In this problem, we will consider decision trees of depth d to classify the Restaurant dataset (hw3 dataset.csv) shown in Table 1. You will need to compute the leave-one-out-cross-validation (LOOCV) classification error rate for both the training and test folds. For LOOCV, the number of folds is k = n so you will train on k − 1 = 11 data points (training folds) and predict on the 1 left out data point (test fold).
Attributes Target
Alt Bar Fri Price Rain Res Type Est WillWait
X1 TFFTSome$$$FTFrench0–10 T
X2 TFFTFull$FFThai30–60 F
X3 FTFFSome$FFBurger0–10 T
X4 TFTTFull$FFThai10–30 T
X5 TFTFFull$$$FTFrench>60 F
X6 FTFTSome$$TTItalian0–10 T
X7 FTFFNone$TFBurger0–10 F
X8 FFFTSome$$TTThai0–10 T
X9 FTTFFull$TFBurger>60 F
X10 T T T T Full $$$ F T Italian 10–30 F
X11 F F F F None $ F F Thai 0–10 F
X12 T T T T Full $ F F Burger 30–60 T
Table 1: Restaurant dataset.
For this problem, we want to compare the training error rate and the test error rates. So we will compute both during cross validation. To compute the training error rate, fit the model using the training data, then pass the training data back through the model and predict the label of each of the 11 training data points in the current training fold. The number of misclassified data points divided by 11 is the error rate for that round. Do this for each of the 12 rounds and then compute the average error rate across the 12 rounds and report this number. Similarly, for the test data, for each of the 12 rounds of cross validation, use the fitted model to predict the label of the 1 held out data point. After the 12 rounds of cross validation, you will have 12 predicted labels for the test data points, compute the average error rate across the 12 data points and report this number.
Write Python code (from scratch) to implement the decision tree algorithm in file dtree.py using information gain. Apply your code to the hw3 dataset.csv and learn a decision tree of depth d. For example, for d = 1 the decision tree splits on only one attribute; for d = 2 it splits on two attributes; etc. Report the LOOCV classification error rates for both training and test data for each value of d = 1, 2, 3, and 4. Compare the training and test error rates. What do the error rates imply is happening? (Note, you must implement all code from scratch. You may only use 3rd party tools (like numpy) for basic mathematical computations.)
Instructions
Code can only be written in Python 3.6+; no other programming languages will be accepted. One should be able to execute all programs from the Python command prompt or terminal. Each program must take the inputs in the order specified in the problem and display the textual output via the terminal and plots/figures should be included in the report. Please specify instructions on how to run your program in the README file.
For each part, you can submit additional files/functions as needed. You may use libraries for basic matrix computations and plotting such as numpy, pandas, and matplotlib. Put comments in your code so that one can follow the key parts and steps in your code.
Your code must be runnable on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu). One option is to SSH into a machine. Learn about SSH, and other options, at these links: https:// cse.umn.edu/cseit/self-help-guides/secure-shell-ssh and https://cse.umn.edu/cseit/ self-help-guides.
Follow the rules strictly. If we cannot run your code, you will not get any credit. Things to submit
1. hw3.pdf: A document which contains solutions to all problems. This document must be in PDF format (doc, jpg, etc. formats are not accepted). If you submit a scanned copy of a hand-written document, make sure the copy is clearly readable, otherwise no credit may be given.
2. Python code for Problem 1 (must include the required mdpVI.py and mdpPI.py files).
3. Python code for Problem 2 (must include the required dtree.py file).
4. README.txt: README file that contains your name, student ID, email, assumptions you are making, other students you discussed the homework with, and other necessary details.
5. Any other files, except the data, which are necessary for your code (such as package dependencies like a requirements.txt or yml file).
Homework Policy. (1) You are encouraged to collaborate with your classmates on homework problems, but each person must write up the final solutions individually. You need to list in the README.txt which problems were a collaborative effort and with whom. (2) Regarding online resources, you should not:
Google around for solutions to homework problems,
Ask for help on online,
Look up things/post on sites like Quora, StackExchange, etc.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com