COMP3702 Artificial Intelligence (Semester 2, 2021)
Assignment 2: DragonGame MDP
Key information:
• Due: 4pm, Friday 1 October (Extended by 1 week from previous date of 24 September)
• This assignment will assess your skills in developing algorithms for solving MDPs.
• Assignment 2 contributes 15% to your final grade.
• This assignment consists of two parts: (1) programming and (2) a report.
• This is an individual assignment.
• Both code and report are to be submitted via Gradescope (https://www.gradescope.com/). You
can find instructions on how to register for the COMP3702 Gradescope site on Blackboard.
• Your program (Part 1, 60/100) will be graded using the Gradescope code autograder, using testcases
similar to those in the support code provided at https://gitlab.com/3702-2021/a2-support.
• Your report (Part 2, 40/100) should fit the template provided, be in .pdf format and named according
to the format a2-COMP3702-[SID].pdf, where SID is your student ID. Reports will be graded by the
teaching team.
The DragonGame AI Environment
“Untitled Dragon Game”¹, or simply DragonGame, is a 2.5D platformer game in which the player must
collect all of the gems in each level and reach the exit portal, making use of a jump-and-glide movement
mechanic, and avoiding landing on lava tiles. DragonGame is inspired by the “Spyro the Dragon” game
series from the original PlayStation. For this assignment, there are some changes to the game environment,
indicated in pink font. In Assignment 2, actions may have non-deterministic outcomes!
To optimally solve a level, your AI agent must find a policy (mapping from states to actions) which collects
all gems and reaches the exit while incurring the minimum possible expected action cost.
Levels in DragonGame are composed of a 2D grid of tiles, where each tile contains a character representing
the tile type. An example game level is shown in Figure 1.
XXXXXXXXXXXXXXXXXXXX
XX XXXXXX
X G XXXX
XXX=X XX
X = G X
X = XX= X
X = G = X
X=XXXXXX = X
X= X = G X
X= X = XXXX
X= =X = XX X
X=XX G =XX = EX
X=PXXX XXXXXXX =X
XXXXXX**XXXXXXX**XXX
Figure 1: Example game level of DragonGame
¹ The full game title was inspired by Untitled Goose Game (https://goose.game), an indie game developed by some Australians in 2019.
Game state representation
Each game state is represented as a character array, representing the tiles and their positions on the board. In
the visualiser and interactive sessions, the tile descriptions are triples of characters, whereas in the input file
they are single characters.
Levels can contain the tile types described in Table 1.
Table 1: Table of tiles in DragonGame, their corresponding symbol and effect
• Solid (input file symbol ‘X’, visualiser symbol ‘XXX’): The player cannot move into a Solid tile. Walk and
jump actions are valid when the player is directly above a Solid tile.
• Ladder (input file symbol ‘=’, visualiser symbol ‘===’): The player can move through Ladder tiles. Walk,
jump, glide and drop actions are all valid when the player is directly above a Ladder tile. When performing
a walk action on top of a Ladder tile, there is a small chance to fall downwards by 2 tiles (when the position
2 tiles below is non-solid).
• Air (input file symbol ‘ ’, a single space; visualiser symbol ‘   ’, three spaces): The player can move through
Air tiles. Glide and drop actions are valid when the player is directly above an Air tile.
• Lava (input file symbol ‘*’, visualiser symbol ‘***’): Moving into a Lava tile, or a tile directly above a Lava
tile, results in Game Over.
• Super Jump (input file symbol ‘J’, visualiser symbol ‘[J]’): When the player performs a ‘Jump’ action while
on top of a Super Jump tile, the player will move upwards by between 2 and 5 tiles (random outcome;
probabilities given in the input file). If blocked by a solid tile, probabilities for moving to each of the blocked
tiles roll over to the furthest non-blocked tile. For all other actions, Super Jump tiles behave as ‘Solid’ tiles.
• Super Charge (input file symbol ‘C’, visualiser symbol ‘[C]’): When the player performs a ‘Walk Left’ or
‘Walk Right’ action while on top of a Super Charge tile, the player will move (left or right) until on top of
the last adjoining Super Charge tile (see example in Table 2), then move in the same direction by between
2 and 5 tiles (random outcome; probabilities given in the input file). If blocked by a solid tile, probabilities
for moving to each of the blocked tiles roll over to the furthest non-blocked tile. For all other actions, Super
Charge tiles behave as ‘Solid’ tiles.
• Gem (input file symbol ‘G’, visualiser symbol ‘ G ’): Gems are collected (disappearing from view) when the
player moves onto the tile containing the gem. The player must collect all gems in order to complete the
level. Gem tiles behave as ‘Air’ tiles, and become ‘Air’ tiles after the gem is collected.
• Exit (input file symbol ‘E’, visualiser symbol ‘ E ’): Moving to the Exit tile after collecting all gems completes
the level. Exit tiles behave as ‘Air’ tiles.
• Player (input file symbol ‘P’, visualiser symbol ‘ P ’): The player starts at the position in the input file where
this tile occurs. The player always starts on an ‘Air’ tile. In the visualiser, the type of tile behind the player
is shown in place of the spaces.
An example of performing the Walk Right action on a Super Charge tile is shown in Table 2:
Table 2: Example Walk Right action on a Super Charge tile
     P  >>> >>> >>>     (P) (P) (P) (P)
XXX [C] [C] [C] [C] XXX XXX XXX XXX XXX
‘P’ represents the current player position, ‘>>>’ represents tiles which are skipped over, and each (P)
represents a possible new position of the player.
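The probability “roll over” rule for blocked Super Jump and Super Charge outcomes can be interpreted as in the following sketch, where the probability mass of each blocked landing distance is added to the furthest distance that is not blocked (the data structures and helper names here are assumptions, not part of the support code):

# Illustrative sketch of the roll-over rule for Super Jump / Super Charge
# outcomes. probs maps move distance (2..5) to probability; blocked(d) is a
# hypothetical helper returning True if moving by d tiles would pass through
# or land in a solid tile. Assumes at least one distance is unblocked.
def rollover_distribution(probs, blocked):
    furthest_free = max(d for d in probs if not blocked(d))
    outcome = {}
    for d, p in probs.items():
        target = d if not blocked(d) else furthest_free
        outcome[target] = outcome.get(target, 0.0) + p
    return outcome

For example, if moving by 4 or 5 tiles would hit a solid tile, those probabilities are added to the probability of moving by 3 tiles.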
Actions
At each time step, the player is prompted to select an action. Each action has an associated cost, representing
the amount of energy used by performing that action. Each action also has requirements which must be
satisfied by the current state in order for the action to be valid. The set of available actions, costs and
requirements for each action are shown in Table 3.
Table 3: Table of available actions, costs and requirements
• Walk Left (symbol wl, cost 1.0): Move left by 1 position; if above a Ladder tile, random chance to move
down by 2; if above a Super Charge tile, move a variable amount (see example in Table 2).
• Walk Right (symbol wr, cost 1.0): Move right by 1 position; if above a Ladder tile, random chance to move
down by 2; if above a Super Charge tile, move a variable amount (see example in Table 2).
• Jump (symbol j, cost 2.0): Move up by 1 position; if above a Super Jump tile, move up by between 2 and 5
(random outcome). Valid only when the current player is above a Solid or Ladder tile, and the new player
position is not a Solid tile.
• Glide Left 1 (symbol gl1, cost 0.7): Move left by between 0 and 2 (random outcome) and down by 1.
• Glide Left 2 (symbol gl2, cost 1.0): Move left by between 1 and 3 (random outcome) and down by 1.
• Glide Left 3 (symbol gl3, cost 1.2): Move left by between 2 and 4 (random outcome) and down by 1.
• Glide Right 1 (symbol gr1, cost 0.7): Move right by between 0 and 2 (random outcome) and down by 1.
• Glide Right 2 (symbol gr2, cost 1.0): Move right by between 1 and 3 (random outcome) and down by 1.
• Glide Right 3 (symbol gr3, cost 1.2): Move right by between 2 and 4 (random outcome) and down by 1.
All glide actions are valid only when the current player is above a Ladder or Air tile, and all tiles in the
axis-aligned rectangle enclosing both the current position and the new position are non-solid (i.e. Air or Ladder
tiles). See example below.
• Drop 1 (symbol d1, cost 0.3): Move down by 1.
• Drop 2 (symbol d2, cost 0.4): Move down by 2.
• Drop 3 (symbol d3, cost 0.5): Move down by 3.
All drop actions are valid only when the current player is above a Ladder or Air tile, and all cells in the line
between the current position and the new position are non-solid (i.e. Air or Ladder tiles).
Example of glide action validity requirements for GLIDE RIGHT 2 (‘gr2’):
Current Position  | Must be Non-Solid | Must be Non-Solid
Must be Non-Solid | Must be Non-Solid | New Position
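A minimal sketch of this validity check, assuming a single-character grid indexed as grid[row][col] (the names and coordinate convention are assumptions rather than the support-code representation):

# Sketch of the glide validity rule: every tile in the axis-aligned rectangle
# spanned by the current and new positions must be non-solid. Gem and Exit
# tiles behave as Air, so they count as non-solid here.
def glide_is_valid(grid, row, col, new_row, new_col):
    non_solid = {' ', '=', 'G', 'E'}
    for r in range(min(row, new_row), max(row, new_row) + 1):
        for c in range(min(col, new_col), max(col, new_col) + 1):
            if grid[r][c] not in non_solid:
                return False
    return True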
Interactive mode
A good way to gain an understanding of the game is to play it. You can launch an interactive game session
from the terminal with the following command:
$ python play_game.py
In interactive mode, type the symbol for your chosen action and press enter to perform the action. Type ‘q’
and press enter to quit the game.
DragonGame as an MDP
In this assignment, you will write the components of a program to play DragonGame, with the objective
of finding a high-quality solution to the problem using various sequential decision-making algorithms based
on the Markov decision process (MDP) framework. This assignment will test your skills in defining an MDP
for a practical problem and developing effective algorithms for large MDPs.
What is provided to you
We will provide supporting code in Python only, in the form of:
1. A class representing the DragonGame game map, and a number of helper functions
2. A parser method to take an input file (testcase) and convert it into a DragonGame map
3. A policy visualiser
4. A simulator script to evaluate the performance of your solution
5. Testcases to test and evaluate your solution
6. A solution file template
The support code can be found at: https://gitlab.com/3702-2021/a2-support. Autograding of code
will be done through Gradescope, so that you can test your submission and continue to improve it based on
this feedback — you are strongly encouraged to make use of this feedback.
Your assignment task
Your task is to develop algorithms for computing paths (series of actions) for the agent (i.e. the Dragon), and
to write a report on your algorithms’ performance. You will be graded on both your submitted program (Part
1, 60%) and the report (Part 2, 40%). These percentages will be scaled to the 15% course weighting for
this assessment item.
The provided support code formulates DragonGame as an MDP, and your task is to submit code imple-
menting the following MDP algorithms:
1. Value Iteration (VI)
2. Policy Iteration (PI)
3. Monte Carlo Tree Search (MCTS) [optional]
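For reference, a single sweep of value iteration for a cost-minimising MDP such as this one can be written generically as follows (a sketch only; the state, action and transition helpers are illustrative and are not the support-code API):

# Generic value-iteration backup, minimising expected discounted cost.
# states: iterable of states; actions(s): valid actions in s;
# transitions(s, a): iterable of (probability, next_state) pairs;
# cost(s, a): immediate action cost; gamma: discount factor.
# Terminal-state handling is omitted for brevity.
def vi_backup(V, states, actions, transitions, cost, gamma):
    return {
        s: min(
            cost(s, a) + gamma * sum(p * V[ns] for p, ns in transitions(s, a))
            for a in actions(s)
        )
        for s in states
    }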
Individual testcases will not impose the use of a particular strategy (value iteration/policy iteration/MCTS),
but the difficulty of higher-level testcases will be designed to require a more advanced solution (e.g. linear
algebra policy iteration or MCTS). Each testcase will specify different amounts of time available for offline
planning and for online planning, and these can be used to guide your choice of solution for a particular
testcase, e.g. towards offline or online solving. You can choose which approach to use in your code at runtime
if desired (e.g. by checking the given offline time and online time, and choosing offline planning if
offline time ≫ online time, and online planning otherwise).
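A minimal sketch of such a runtime choice (the attribute names are assumptions; check the support code for how the time limits are actually exposed):

# Choose a solution strategy based on the testcase's time limits.
def choose_strategy(game_env):
    if game_env.offline_time >= game_env.online_time:
        return 'offline'   # e.g. value iteration or policy iteration
    return 'online'        # e.g. MCTS performed during select_action()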
Once you have implemented and tested the algorithms above, you are to complete the questions listed in the
section “Part 2 – The Report” and submit the report to Gradescope.
More detail on what is required for the programming and report parts is given below.
Part 1 — The programming task
Your program will be graded using the Gradescope autograder, using testcases similar to those in the support
code provided at https://gitlab.com/3702-2021/a2-support.
Interaction with the testcases and autograder
We now provide you with some details explaining how your code will interact with the testcases and the
autograder (with special thanks to Nick Collins for his efforts making this work seamlessly).
Implement your solution using the supplied solution.py template file. You are required to fill in the following
method stubs:
• __init__(game_env)
• plan_offline()
• select_action()
You can add additional helper methods and classes (either in solution.py or in files you create) if you wish.
To ensure your code is handled correctly by the autograder, you should avoid using any try-except blocks in
your implementation of the above methods (as this can interfere with our time-out handling). Refer to the
documentation in solution.py for more details.
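As a rough illustration only, a skeleton consistent with these stubs might look like the following (the class name, attribute names and the select_action parameter are assumptions; follow the signatures and docstrings in the provided template):

# Illustrative skeleton only; follow the structure of the provided template.
class Solver:
    def __init__(self, game_env):
        # Store the environment and precompute anything cheap (e.g. a state list).
        self.game_env = game_env
        self.policy = None

    def plan_offline(self):
        # Run offline planning (e.g. VI or PI) within the offline time limit,
        # storing the resulting policy/values on self.
        pass

    def select_action(self, state):
        # Return the action chosen for the given state; online methods
        # (e.g. MCTS) may perform further search here within the online limit.
        pass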
Grading rubric for the programming component (total marks: 60/100)
For marking, we will use 8 different testcases of ascending level of difficulty to evaluate your solution.
There will be a total of 60 code marks, consisting of:
• 20 Threshold Marks
– Program runs without errors (+5 marks)
– Program solves at least 1 testcase within 2x time limit (+7.5 marks)
– Program solves at least 2 testcases within 2x time limit (+7.5 marks)
• 40 Testcase Marks
– 8 testcases worth 5 marks each
– A maximum of 5 marks for each testcase, with deductions for exceeding the time limit or for a
solution cost higher than the optimal cost (CostTgt), proportional to the amount exceeded
– For each testcase (see the illustrative sketch after this list):
  mark = 5.0 × [ 1.0 − clip(Cost − CostTgt, 0)/CostTgt − clip(Time − TimeLimit, 0)/TimeLimit ]
– Program will be terminated after 2× time limit has elapsed
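As an illustration, the per-testcase mark calculation above can be read as the following Python sketch (this is not the autograder's code, and clamping the final mark at zero is an assumption):

# Sketch of the per-testcase mark calculation described above.
def testcase_mark(cost, cost_tgt, time_taken, time_limit):
    def clip(x):               # clip(x, 0) in the formula
        return max(x, 0.0)
    mark = 5.0 * (1.0
                  - clip(cost - cost_tgt) / cost_tgt
                  - clip(time_taken - time_limit) / time_limit)
    return max(mark, 0.0)      # assumption: no negative marks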
Part 2 — The report
The report tests your understanding of MDP algorithms and the methods you have used in your code, and
contributes 40/100 of your assignment mark.
Question 1. MDP problem definition (15 marks)
a) Define the State space, Action space, Transition function, and Reward function components of the
DragonGame MDP as well as where these are represented in your code. (10 marks)
b) Describe the purpose of a discount factor in MDPs, and if you are using a discount factor, γ ∈ [0, 1),
please justify how you selected its value. (2.5 marks)
c) State the following dimensions of complexity of your agent in the DragonGame MDP. (See
https://artint.info/html/ArtInt_12.html for definitions) (2.5 marks)
• Planning Horizon
• Sensing Uncertainty
• Effect Uncertainty
• Computational Limits
• Learning
Question 2. Comparison of algorithms and optimisations (15 marks)
a) Consider a representative testcase (or set of testcases) and compare the performance of Value Iteration
(VI) and Policy Iteration (PI) according to the following measures. Please include the numerical values
of your experiments in your comparisons.
• Time to converge to the solution. (4 marks)
• Number of iterations to converge to the solution. (4 marks)
b) Describe any optimisation procedures you have used to speed up performance or to trade off the online
vs offline time limits, e.g. a linear algebra policy iteration solution, or utilising online algorithms such as
Monte Carlo Tree Search (MCTS). If you were unable to implement any of these, describe how such
approaches could speed up performance or trade off the offline/online time limits. (7 marks)
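For example, the policy evaluation step of a linear algebra policy iteration solves the linear system V = R + γPV directly rather than iterating to convergence. A generic sketch (the matrices here come from your own MDP representation, not the support code):

import numpy as np

# Exact policy evaluation for a fixed policy: solve (I - gamma * P_pi) V = R_pi.
# P_pi: (n, n) transition matrix under the policy; R_pi: (n,) expected
# immediate reward (or negative cost) per state; gamma: discount in [0, 1).
def evaluate_policy(P_pi, R_pi, gamma):
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)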
Question 3. “Risky Business”: Optimal policy variation based on probabilities & costs (10 marks)
One consideration in the solution of a Markov Decision Process (i.e. the optimal policy) is the trade-off between
a risky, higher-reward option and a lower-risk, lower-reward option, which depends on the probabilities of the
non-deterministic dynamics of the environment and the rewards associated with certain states and actions.
Consider testcase a2-t6.txt, and explore how the policy of the agent changes with ladder_fall_prob and
game_over_penalty. The agent must cross from the bottom left (to collect a gem) to the bottom right (to
reach the exit). There are 3 possible paths from left to right (bottom, middle and top). Because of the
chance to fall when performing a walk action while on top of a ladder, there is a risk of falling into the lava
in the centre.
• The bottom path has the lowest cost (fewest jumps needed) and highest risk, as the lava tiles above
prevent using jump+glide (which is more expensive than walking but has no fall risk).
• The middle path also has lava tiles above preventing jump+glide, but if the agent falls once it can
glide and land in a safe part of the bottom path for a second chance at making it across.
• The top path has enough headroom to allow jump+glide (which eliminates the risk of falling), but
requires a large number of jumps to reach, so is expensive.
The level of risk presented by the lower paths can be tuned by adjusting ladder_fall_prob (a higher value
favours the safest path) and game_over_penalty.
Hint: This question can be completed using VI. If necessary, feel free to increase the offline time limit of
the test case when answering this question.
a) Fix game_over_penalty = 500 and vary ladder_fall_prob = {0.001, 0.01, 0.1}. For each value of
ladder_fall_prob, is the agent’s optimal policy to take the safe top path or the risky bottom path (or
something else)? (2.5 marks)
b) Fix ladder_fall_prob = 0.01 and vary game_over_penalty = {100, 500, 750, 1000}. For each value of
game_over_penalty, is the agent’s optimal policy to take the safe top path or the risky bottom path
(or something else)? (2.5 marks)
c) Generate a plot over the (game_over_penalty, ladder_fall_prob) parameter space indicating whether the
agent’s optimal policy is to traverse the top, middle or bottom path, using colours to denote which path
is optimal for each combination. (5 marks)
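One possible way to produce this plot is sketched below. Here optimal_path is a hypothetical helper that re-runs VI on a2-t6.txt with the given parameter values and classifies the resulting policy as 'top', 'middle' or 'bottom'; the parameter values reuse those from parts a) and b):

import matplotlib.pyplot as plt

def optimal_path(game_over_penalty, ladder_fall_prob):
    # Hypothetical: re-run VI on a modified a2-t6.txt with these parameters
    # and classify the resulting policy. Replace with your own implementation.
    raise NotImplementedError

# Sweep (game_over_penalty, ladder_fall_prob) combinations and colour each
# point by which path the optimal policy takes.
penalties = [100, 500, 750, 1000]
fall_probs = [0.001, 0.01, 0.1]
colours = {'top': 'tab:blue', 'middle': 'tab:orange', 'bottom': 'tab:red'}

for penalty in penalties:
    for prob in fall_probs:
        path = optimal_path(penalty, prob)
        plt.scatter(penalty, prob, color=colours[path])

plt.xlabel('game_over_penalty')
plt.ylabel('ladder_fall_prob')
plt.yscale('log')   # the fall probabilities span orders of magnitude
plt.show()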
Academic Misconduct
The University defines Academic Misconduct as involving “a range of unethical behaviours that are designed
to give a student an unfair and unearned advantage over their peers.” UQ takes Academic Misconduct very
seriously and any suspected cases will be investigated through the University’s standard policy
(https://ppl.app.uq.edu.au/content/3.60.04-student-integrity-and-misconduct). If you are found guilty,
you may be expelled from the University with no award.
It is the responsibility of the student to ensure that you understand what constitutes Academic Misconduct
and to ensure that you do not break the rules. If you are unclear about what is required, please ask.
It is also the responsibility of the student to take reasonable precautions to guard against unauthorised access
by others to their work, however stored and in whatever format, both before and after assessment.
In the coding part of COMP3702 assignments, you are allowed to draw on publicly-accessible resources and
provided tutorial solutions, but you must reference or attribute the source, by doing the following:
• All blocks of code that you take from public sources must be referenced in adjacent comments in your
code.
• Please also include a list of references indicating code you have drawn on in your solution.py docstring.
However, you must not show your code to, or share your code with, any other student under any
circumstances. You must not post your code to public discussion forums (including Ed Discussion)
or save your code in publicly accessible repositories (check your security settings). You must not
look at or copy code from any other student.
All submitted files (code and report) will be subject to electronic plagiarism detection and misconduct proceed-
ings will be instituted against students where plagiarism or collusion is suspected. The electronic plagiarism
detection can detect similarities in code structure even if comments, variable names, formatting etc. are
modified. If you collude to develop your code or answer your report questions, you will be caught.
For more information, please consult the following University web pages:
• Information regarding Academic Integrity and Misconduct:
– https://my.uq.edu.au/information-and-services/manage-my-program/student-integrity-and-conduct/academic-integrity-and-student-conduct
– http://ppl.app.uq.edu.au/content/3.60.04-student-integrity-and-misconduct
• Information on Student Services:
– https://www.uq.edu.au/student-services/
Late submission
Students should not leave assignment preparation until the last minute and must plan their workloads to meet
advertised or notified deadlines. It is your responsibility to manage your time effectively.
Late submission of the assignment will not be accepted. Unless advised, assessment items received after the
due date will receive a zero mark unless you have been approved to submit the assessment item after the due
date.
In the event of exceptional circumstances, you may submit a request for an extension. You can find guidelines
on acceptable reasons for an extension here: https://my.uq.edu.au/information-and-services/manage-my-program/exams-and-assessment/applying-extension. All requests for extension must be submitted on the UQ
Application for Extension of Progressive Assessment form (https://my.uq.edu.au/node/218/2) at least 48 hours prior to the submission deadline.