CMSC 421, Fall 2018, Project 2:
Racetrack with Unreliable Steering Last update October 8, 2018
Due date: Oct 29, 11:59pm
Late date (10% off): Nov 1, 11:59pm
1
Racetrack problem with unreliable steering
Just as in Project 1,
• Current state si−1 = (pi−1, zi−1) location pi−1 = (xi−1, yi−1) velocity zi−1 = (ui−1, vi−1)
• You choose new velocity zi = (ui, vi), where
ui ∈ {ui−1 − 1, ui−1, ui−1 + 1}, vi ∈ {vi−1 − 1, vi−1, vi−1 + 1}.
What’s different
• control system introduces random error ei = (qi, ri)
• you might move to a different location than you had planned
20 |
||||||||||||||||||||||||||||||||||
obstac |
le |
|||||||||||||||||||||||||||||||||
10 |
||||||||||||||||||||||||||||||||||
p2 |
= |
(5, |
8 |
) |
||||||||||||||||||||||||||||||
p1 |
||||||||||||||||||||||||||||||||||
p |
||||||||||||||||||||||||||||||||||
0 |
(( |
24,8), |
(26 |
,8)) |
||||||||||||||||||||||||||||||
fi |
ni |
sh line |
||||||||||||||||||||||||||||||||
0 |
1 |
0 |
2 |
0 |
3 |
0 |
2
Steering errors
20 |
||||||||||||||||||||||||||||||||||
obstac |
le |
|||||||||||||||||||||||||||||||||
10 |
||||||||||||||||||||||||||||||||||
p2 |
= |
(5, |
8 |
) |
||||||||||||||||||||||||||||||
p1 |
||||||||||||||||||||||||||||||||||
p |
||||||||||||||||||||||||||||||||||
0 |
(( |
24,8), |
(26 |
,8)) |
||||||||||||||||||||||||||||||
fi |
ni |
sh line |
||||||||||||||||||||||||||||||||
0 |
1 |
0 |
2 |
0 |
3 |
0 |
• Suppose you choose zi =(ui,vi)
• Steering error ei =(qi,ri)
qi,ri∈{−1,0,1} are independent
random variables
• If|ui|>1
then Pr[qi =0]=0.6 and Pr[qi =1]=Pr[qi =−1]=0.2
else Pr[qi =0]=1 and Pr[qi =1]=Pr[qi =−1]=0
• If|vi|>1
then Pr[ri =0]=0.6 and Pr[ri =1]=Pr[ri =−1]=0.2
else Pr[ri =0]=1 and Pr[ri =1]=Pr[ri =−1]=0
3
Example
- The control error changes your position, but not your velocity
Unrealistic,butmakesthe problems easier to solve
- State s2 = ((5, 8), (0, 2)) p2, z2
- You choose
z3 =z2 +(1,0)=(1,2) - Control error e3 = (0, 1)
- Newlocationp3 =p2+z3+e3 =(5,8)+(1,2)+(0,1)=(6,11)
- New state s3 = (p3, z3) = ((6, 11), (1, 2))
20 |
||||||||||||||||||||||||||||||||||
obstac |
le |
|||||||||||||||||||||||||||||||||
10 |
||||||||||||||||||||||||||||||||||
p2 |
= |
(5, |
8 |
) |
||||||||||||||||||||||||||||||
p1 |
||||||||||||||||||||||||||||||||||
p |
||||||||||||||||||||||||||||||||||
0 |
(( |
24,8), |
(26 |
,8)) |
||||||||||||||||||||||||||||||
fi |
ni |
sh line |
||||||||||||||||||||||||||||||||
0 |
1 |
0 |
2 |
0 |
3 |
0 |
4
obstac
5)
le
0
(4,
start
1
0
((
24,8),
fi
ni
(26
sh line
,8))
2
0
3
0
Examples
20
10
Steering always works perfectly Pr = (0.36)10
le
0
p
obstac
0
1
p2
0
=(
((
5, 8)
24,8),
fi
ni
(26
sh line
,8))
2
0
3
0
p
obstac
0
le
((
24,8),
(26
,8))
0
1
0
fi
ni
sh line
2
0
3
0
20 20
10 10
Always pulls left, Pr = (0.12)10
Always pulls right, Pr = (0.12)10
5
• State s4 = ((7, 13), (2, 1)) p4, z4
Example
20 |
||||||||||||||||||||||||||||||||||
obstac |
le |
|||||||||||||||||||||||||||||||||
10 |
||||||||||||||||||||||||||||||||||
p2 |
= |
(5, |
8 |
) |
||||||||||||||||||||||||||||||
p1 |
||||||||||||||||||||||||||||||||||
p |
||||||||||||||||||||||||||||||||||
0 |
(( |
24,8), |
(26 |
,8)) |
||||||||||||||||||||||||||||||
fi |
ni |
sh line |
||||||||||||||||||||||||||||||||
0 |
1 |
0 |
2 |
0 |
3 |
0 |
- You choose
z5 =z4 +(1,−1)=(3,0) - Crash if control error e5 is (1, 0) or (1, −1)
Pr=0.2(0.6)+0.2(0.2) = 0.16
- Want to minimize the probability of crashing
Ideally0,butthatisn’talwayspossible
• Also want to minimize the number of moves Useaweightedcombination
6
What you need to do
Write a Python function proj2.main(s, f, walls) to choose the next velocity UseUCT,butyouprobablyshouldmodifyit
See page 10 for ideas about possible modifications We’llgiveyouasupervisorprogram
At each turn i, it will call proj2.main(si−1, f, walls)
proj2.main should search for better and better choices for zi = (ui, vi)
– additional Monte Carlo rollouts
Should print each choice, followed by a linebreak, to a file called
choices.txt (2, 2)
(1, 3) (1, 2) (1, 2)
You may have it exit whenever you think is best
•
• •
•
•
7
The supervisor program
The supervisor will do the following:
• Initialization (optional, see below)
• loop:
call proj2.main as a process, let it run for time limit seconds
killproj2.mainifithasn’talreadyexited,anduseitslastchoiceforzi
choose ei using the probability distribution discussed earlier
computethenewstatesi,andexitiftherunhasfinished
if vehicle crashed, or reached the finish line with velocity (0, 0) elsecontinuetheloop
• Initialization: probably not needed unless you use a heuristic function
- If your proj2 file contains a proj2.initialize function, then
the supervisor will call it as a process, let it run for 10 seconds
- Itshouldwritetoafile,elseitsworkwillbelost
•
8
UCT for Racetrack
global Seen ← ∅ // all states we’ve generated
UCT(s, d)
if s ∈ Sg then return 0
if d = 0 then return h(s) // depth limit = 0, so call a heuristic function
if s ∈/ Seen then do // {generated states}; you’ll need to add loop-checking
add s to Seen
t[s] ← 0 // how many moves tried at s for every velocity z ∈ applicable(s) do
Q[s,z] ← 0 // average cost of using z at s
t[s,z] ← 0 // how many times z was tried at s Untried ← {z ∈ applicable(s) | t[s, z] = 0}
if Untried ̸= ∅ then choose z′ at random from Untried else z′ ← arg minz∈applicable(s) Q[s, z] − klog t[s]/t[s, z] e ← steering error, computed as described earlier
if s, z′, e cause a crash then cost ← a (large?) penalty else cost ← 1 + UCT(s′, d − 1)
Q[s, z′] ← (t[s, z′] ∗ Q[s, z′] + cost)/(1 + t[s, z′])
t[s] ← t[s] + 1
t[s, z′] ← t[s, t′] + 1
return cost
// usually k ≥ 2
// average cost of z′ at s
9
Possible Modifications to UCT
• Below are a few ideas; you can probably think of others
ObjectiveistoachievegoodperformanceinRacetrackproblems
• Maybe put a loop in the code, to try several z′ values rather than just 1
• Experiment with various values of k ≥ 2, to see what works best
• What penalty to use for crashes?
Probablyshouldn’talwaysbethesame;somecrashesworsethanothers Maybecomputeheuristically?
10
What to Submit
• One file, proj2.py
Inthefile,yourfunctionshouldbenamedmain
Inthedocstringatthestartofthefile,includeacopyofthehonorpledge:
I pledge on my honor that I have not given or received any unauthorized assistance on this project. <your name>
• Submit it at the submit server, submit.cs.umd.edu
11
Grading
35% correctness: – whether your heuristic works correctly, whether your submission follows the instructions
15% programming style – see the following
Style guide: https://www.python.org/dev/peps/pep-0008/ Python essays: https://www.python.org/doc/essays/
15% documentation
Docstrings at the start of the file and the start of each function Comments elsewhere
35% performance
running time, length of solution path, how many times you crash Top n performers (n ≈ 5) will get extra credit
• •
• •
• •
12