6CCS3AIN & 7CCSMAIN, 2016, Coursework 2 (Version 1)

1 Introduction
This coursework exercise asks you to implement a version of value iteration and then use the results to control an agent. It makes use of the same Wumpus World environment as the previous coursework.
This exercise will be assessed.
2 Getting started
You should download the file wumpus-mdp.nlogo from the KEATS page (right by where you found this document). This is a text file, so your browser will probably open it in a text editor. If it does, save the file as wumpus-mdp.nlogo
so that NetLogo handles it properly.
Then start up NetLogo and load the program using File > Open.
This should bring up the following interface:
If the interface does not have the same version number you see here, then you are using the wrong version. Go back and download it again.
Details of what the program does can be found using the Info tab, but the basic functionality of the world has not changed from the version you used before. Your job in this coursework is to treat the world as a Markov
Decision Process and implement value iteration to establish an optimal policy, and then program agent controllers that make use of the results.
3 What you need to do
This coursework requires you to do four things:
1. Write a version of value iteration that assumes a deterministic action model and implement an agent control program that makes use of this model.
We start with this version of value iteration because it is simpler to get running and is a good point of comparison for the more complex case.
2. Evaluate the results of using this agent control program.
3. Write a version of value iteration that assumes a non-deterministic action model and implement an agent control program that makes use of this model.
This is the more complex “full” implementation of the solution to an MDP.
4. Evaluate the results of using this second agent control program.
Check Section 3.6 for a list of the limitations on what you are allowed to do with the code.
I suggest that you read through the whole of this document before starting on the coursework.
3.1 Value iteration with a deterministic model
As downloaded, the agents in the wumpus-mdp.nlogo model do not do anything — if you run the code you will just see the wumpus doing its thing. However, there is quite a lot of scaffolding to help you write the value iteration and controllers.
For value iteration, my suggestion is that you use patches, writing the function get-patch-utility. Each patch has a value attribute which can hold the utility when it is computed, and there are API functions by which the agent can access the value of the patches around the patch that it is on.
If you take this route, value iteration will be automatically invoked when setup is run, and if the value of the global variable renew? is true, value iteration will be run again the next time around the go loop.
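For concreteness, here is a minimal sketch of the deterministic update written as a patch procedure. It assumes reward is a patches-own variable you add, that the discount factor is 0.9, and that each patch runs get-patch-utility once per sweep; none of these details are dictated by the scaffolding, so adapt them to however the supplied code actually invokes the function.

    ;; A minimal sketch, not the required implementation. Assumptions:
    ;;  - reward is a new patches-own variable holding the reward for the patch
    ;;  - the discount factor 0.9 is an arbitrary choice
    ;;  - the scaffolding asks each patch to run this during a sweep
    to get-patch-utility  ;; patch procedure, deterministic action model
      ;; with deterministic actions the best move simply goes to the
      ;; neighbouring patch with the highest current utility
      set value reward + 0.9 * max [value] of neighbors4
    end

Updating the values in place like this still converges; if you want a strictly synchronous sweep, write the new utilities into a temporary patches-own variable and copy them across at the end of each sweep. Either way, repeat the sweep until the values stop changing.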
Of course, you can also implement value iteration in the individual agents, building a model of the world in the same way as you did for the first coursework, and you can do this if you want. I just think it is harder.
Clearly you will need to assign a reward to the patches that contain the pits to make the agents avoid them, and to assign a reward to the patches that contain the gold to make the agents travel to them. (In this version of the Wumpus World, there is no gold in pits.) What you do with the Wumpus is up to you. You can:
• Make the Wumpus stationary and give its patch a suitable value so that it is taken into account by value iteration.
• Ignore the Wumpus and either have it move or sit still — either way the agent(s) may avoid it.
• Have the Wumpus be mobile, factor its location into the reward scheme and re-run value iteration every tick so that agents can be sure to avoid it.
and there are probably other solutions I have not thought of.
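Whichever option you pick, assigning the rewards themselves can be very simple. The sketch below is an illustration only: the numbers are arbitrary, and pit-here? and gold-here? are hypothetical stand-ins for however you choose to record which patches contain pits and gold.

    ;; Illustration only: the reward values are arbitrary, and pit-here?
    ;; and gold-here? are hypothetical stand-ins for whatever mechanism
    ;; you use to mark the patches that contain pits and gold.
    to assign-rewards
      ask patches [
        set reward -0.04                 ;; small cost for every ordinary square
        if pit-here?  [ set reward -1 ]  ;; make the agents avoid pits
        if gold-here? [ set reward  1 ]  ;; make the agents head for the gold
      ]
    end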
When it comes to writing the agent controller, you have to implement the function move-deterministic. To actually move the agent, you need to use the functions north, south, east and west, each of which comes in a deterministic and a non-deterministic flavour (controlled by the switch in the interface). The deterministic one moves just as the name suggests — north moves the agent up the grid and so on. The non-deterministic version moves just like the model we saw in the lecture — north moves the agent up the grid 80% of the time, moves it west (left) 10% of the time and east (right) 10% of the time. The other actions move the agent symmetrically.
The agent is allowed to get information about the value of patches only through direct calls to the functions here-value, north-value, south-value, east-value and west-value. If you compute patch utility using the patches directly these are the only values you need to get optimal performance.
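As a starting point, a greedy controller that uses only these reporters might look like the sketch below. It is an illustration rather than the required implementation: the glitters?/grab-gold check and the way ties are broken are choices you are free to make differently.

    ;; A minimal sketch of move-deterministic using only the API calls
    ;; described above; tie-breaking between equal values is arbitrary.
    to move-deterministic
      if glitters? [ grab-gold stop ]   ;; pick up gold if we are standing on it
      let best max (list north-value south-value east-value west-value)
      ;; move towards the neighbouring patch with the highest utility
      if best = north-value [ north stop ]
      if best = south-value [ south stop ]
      if best = east-value  [ east  stop ]
      west
    end

For move-non-deterministic the same idea applies, except that, if you want the truly optimal policy, each candidate action should be scored by its expected utility under the 80/10/10 action model rather than by the value of the single patch it aims at.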
As part of your solution, you need to find a way to visualize the utility of each patch. If you have the value stored in each patch, this is relatively easy to do by colouring the patches. The code to do this can be placed in the function color-by-value.
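If the utilities are stored in the patches, something along the following lines is enough; the colour and the scaling are arbitrary choices.

    ;; A minimal sketch of color-by-value, assuming the utilities live in
    ;; the patches' value variable; green and the scaling range are
    ;; arbitrary choices.
    to color-by-value
      let lo min [value] of patches
      let hi max [value] of patches
      if hi = lo [ set hi lo + 1 ]   ;; avoid a degenerate scale before any values are set
      ask patches [ set pcolor scale-color green value lo hi ]
    end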
3.2 Evaluate the first version
Now evaluate the combination of your controller and the result of value iteration, just as you did for the first coursework. That is, you should run the program 20 times, and for each run you need to record:
• The outcome: win, lose or draw.
• The score.
• The number of agents left at the end of the run.
• The number of golds left in the dungeon at the end of the run.
• The number of ticks at the end of the run.
• The value of the pits and gold sliders.
You should also compute the average and standard deviation of each of these measures over the 20 runs and record that in the Info tab as well. (It would be best if you kept the value of the pits and gold sliders constant for all 20 runs. Of course, with the sliders constant, the actual number of pits and gold will vary.)
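If you record the results in NetLogo lists, the built-in reporters mean and standard-deviation will do the arithmetic; the reporter below is an illustration only, and the name run-scores is hypothetical.

    ;; Illustration only: run-scores is a hypothetical list holding one
    ;; recorded score per run; mean and standard-deviation are built in.
    to-report summarise [ run-scores ]
      report (list (mean run-scores) (standard-deviation run-scores))
    end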
When you do the evaluation, you have a choice of how many agents to use, from 1 up to 10. Record how many you use along with the rest of the data.
3.3 Value iteration with a non-deterministic model
Now add code to perform value iteration for a non-deterministic action model. The code that you hand in should be able to run both the simple case, where the motion model is deterministic, and the more complex case, where the motion model is non-deterministic (in other words, I want to be able to run both versions when I mark your work).
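A minimal sketch of the non-deterministic update is below: each action is scored by its expected outcome under the 80/10/10 model from Section 3.1, and the patch takes the best of the four. The reward variable, the 0.9 discount, the edge handling (a move that would leave the grid keeps the current patch's utility) and the helper names are all assumptions, not part of the scaffolding.

    ;; A minimal sketch of the non-deterministic update. Assumptions:
    ;; reward is a patches-own variable, 0.9 is the discount, and a move
    ;; that would leave the grid keeps the utility of the current patch
    ;; (the value-at helper below); all names other than get-patch-utility
    ;; and value are hypothetical.
    to get-patch-utility  ;; patch procedure, non-deterministic action model
      let n value-at 0 1      ;; north
      let s value-at 0 -1     ;; south
      let e value-at 1 0      ;; east
      let w value-at -1 0     ;; west
      let best max (list (expected-value n w e)    ;; try to go north
                         (expected-value s e w)    ;; try to go south
                         (expected-value e n s)    ;; try to go east
                         (expected-value w s n))   ;; try to go west
      set value reward + 0.9 * best
    end

    to-report expected-value [ intended side-1 side-2 ]
      ;; 80% chance of the intended square, 10% for each side square
      report 0.8 * intended + 0.1 * side-1 + 0.1 * side-2
    end

    to-report value-at [ x-off y-off ]  ;; patch reporter
      let p patch-at x-off y-off
      report ifelse-value (p = nobody) [ value ] [ [value] of p ]
    end

In the code you hand in you will presumably want a single get-patch-utility that performs whichever update matches the deterministic/non-deterministic switch in the interface, since both versions have to run when the work is marked.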
3.4 Evaluate the second version
Repeat the evaluation from Section 3.2.
3.5 Write a report
Write up your evaluation of the controllers in a separate report that you will submit along with your code.
Your report should assess the performance of the controllers you wrote and the value iteration on which they are based. Your analysis should be based on the evaluation, and all your conclusions should be justified by the data that you have collected.
In addition to the experiments that you are required to run, you might like to see (1) how the utilities established by running value iteration with the deterministic action model work when the agent runs the controller with non-deterministic actions, and (2) how the utilities established by running value iteration with the non-deterministic action model work when the agent runs the controller with deterministic actions. We wouldn’t expect optimal behaviour, but we might see quite reasonable behaviour.
3.6 Limitations
As with the first coursework, there are some limitations on what you are allowed to do in the code. These limitations are primarily to ensure that each student’s work is solving the same problem, and so there is, in the cliché, a level playing field.
This is why:
• You are not allowed to modify the code of the simulator — this means you all have to write agents that work in the same world.
• You can only connect with the simulator through the API — this means that everyone’s agents have the exact same kind of information about the world, and are constrained to move in the world in the same way.
To try to ensure this, you must:
• Write a controller by modifying the methods move-deterministic, move-non-deterministic and, if you want to take the patch route to value iteration, get-patch-utility.
You can, and good software design practice suggests that you should, break these into sub-functions, but the top-level functions must be as above.
• Write a controller that interacts with the world through the API functions in wumpus-mdp.nlogo.
Your controller must only sense the environment using direct calls to breezy?, smelly? and glitters? so that your agent has exactly the information specified in the original Wumpus World.
Your controller can only access information about patch utility from patches adjacent to the agent. If you compute patch value using the patches themselves, you must access this information using here-value, north-value, south-value, east-value and west-value.
Your agent must act in the environment using the functions north, south, east, west and grab-gold. Only one of these can be called each tick.
You must not:
• Modify any of the existing functionality.
You are not allowed to change the API that defines how agents interact with the world, nor are you allowed to change the way that the simulator works.
That means that you cannot modify any of the existing code, and you cannot add any code to the API or to the simulator.
You are allowed to add monitors, for example for debugging purposes. However, since you are not allowed to modify the existing code, you cannot introduce new global (or wumpus) variables to use in a monitor.
You are allowed to introduce new agents-own variables, and if you use patches to do the value iteration, you can add new patches-own variables.
• Fix any “bugs” you find in wumpus-mdp.nlogo
This is really covered by the previous point, but just to be clear, you cannot change the simulator or the API even if you think there is a bug.
If you think there is a bug, let me know and if it needs fixing, I will release a new version of wumpus-mdp.nlogo.
4 What you have to hand in
You should hand in your NetLogo program and your report.
The file containing your program should be named: wumpus-mdp-<surname>-<first name>.nlogo, so my program would be named wumpus-parsons-simon.nlogo. Your report must be a PDF document, and should be named: wumpus-mdp-<surname>-<first name>.pdf
The names you use must be the names under which you are registered for the module. Zip these two files together into a single ZIP archive and submit it through KEATS.
5 How your work will be marked
There will be four components of the mark for your work:
1. Functionality
As discussed above, the aim of your code is to provide two implementations of value iteration. I will assess the functionality of your code by looking at your code, running your code, and reading your evaluation of your code.
2. Style
There are no particular requirements on the way that your code is written this time, but it should follow standard good practice in software development.
3. Documentation
All good code is well documented, and your work will be partly assessed by the documentation you provide.
NetLogo offers two approaches to documentation: through the use of comments in the code, and through the use of the Info tab. The sample program I gave you is an example of what I consider to be acceptable documentation.
4. Results
In addition to looking at your experimental results to assess the functionality of your code, I will be looking to check that you did the experiments described above, that you analysed the data as required, and that you have drawn sensible conclusions from the experiments.
6 Remember
NetLogo is a free download. If you want to obtain a copy to run on your own computer, please go to: https://ccl.northwestern.edu/netlogo/download.shtml
and follow the instructions to download version 5.3.1.
Version 5.3.1 is the one that is running in the labs, and is the version that will be used to run your code for the assessed exercises.
7 Version list
• Version 1, October 27th 2016