python代写 CMSC 421

CMSC 421, Fall 2018, Project 2:

Racetrack with Unreliable Steering Last update October 8, 2018

􏰀 Due date: Oct 29, 11:59pm
􏰀 Late date (10% off): Nov 1, 11:59pm

1

Racetrack problem with unreliable steering

Just as in Project 1,

• Current state si−1 = (pi−1, zi−1) location pi−1 = (xi−1, yi−1) velocity zi−1 = (ui−1, vi−1)

• You choose new velocity zi = (ui, vi), where

ui ∈ {ui−1 − 1, ui−1, ui−1 + 1}, vi ∈ {vi−1 − 1, vi−1, vi−1 + 1}.

What’s different
• control system introduces random error ei = (qi, ri)
• you might move to a different location than you had planned

20

obstac

le

10

p2

=

(5,

8

)

p1

p

0

((

24,8),

(26

,8))

fi

ni

sh line

0

1

0

2

0

3

0

2

Steering errors

20

obstac

le

10

p2

=

(5,

8

)

p1

p

0

((

24,8),

(26

,8))

fi

ni

sh line

0

1

0

2

0

3

0

• Suppose you choose zi =(ui,vi)

• Steering error ei =(qi,ri)

􏰀 qi,ri∈{−1,0,1} are independent

random variables

• If|ui|>1
then Pr[qi =0]=0.6 and Pr[qi =1]=Pr[qi =−1]=0.2

else Pr[qi =0]=1 and Pr[qi =1]=Pr[qi =−1]=0

• If|vi|>1
then Pr[ri =0]=0.6 and Pr[ri =1]=Pr[ri =−1]=0.2

else Pr[ri =0]=1 and Pr[ri =1]=Pr[ri =−1]=0

3

Example

  • The control error changes your position, but not your velocity

    􏰀 Unrealistic,butmakesthe problems easier to solve

  • State s2 = ((5, 8), (0, 2)) p2, z2
  • You choose
    z3 =z2 +(1,0)=(1,2)
  • Control error e3 = (0, 1)
  • Newlocationp3 =p2+z3+e3 =(5,8)+(1,2)+(0,1)=(6,11)
  • New state s3 = (p3, z3) = ((6, 11), (1, 2))

20

obstac

le

10

p2

=

(5,

8

)

p1

p

0

((

24,8),

(26

,8))

fi

ni

sh line

0

1

0

2

0

3

0

4

obstac

5)

le

0

(4,

start

1

0

((

24,8),

fi

ni

(26

sh line

,8))

2

0

3

0

Examples

20

10

Steering always works perfectly Pr = (0.36)10

le

0

p

obstac

0

1

p2

0

=(

((

5, 8)

24,8),

fi

ni

(26

sh line

,8))

2

0

3

0

p

obstac

0

le

((

24,8),

(26

,8))

0

1

0

fi

ni

sh line

2

0

3

0

20 20

10 10

Always pulls left, Pr = (0.12)10

Always pulls right, Pr = (0.12)10

5

• State s4 = ((7, 13), (2, 1)) p4, z4

Example

20

obstac

le

10

p2

=

(5,

8

)

p1

p

0

((

24,8),

(26

,8))

fi

ni

sh line

0

1

0

2

0

3

0

  • You choose
    z5 =z4 +(1,−1)=(3,0)
  • Crash if control error e5 is (1, 0) or (1, −1)

    􏰀 Pr=0.2(0.6)+0.2(0.2) = 0.16

  • Want to minimize the probability of crashing

􏰀 Ideally0,butthatisn’talwayspossible

• Also want to minimize the number of moves 􏰀 Useaweightedcombination

6

What you need to do

Write a Python function proj2.main(s, f, walls) to choose the next velocity 􏰀 UseUCT,butyouprobablyshouldmodifyit

See page 10 for ideas about possible modifications 􏰀 We’llgiveyouasupervisorprogram

At each turn i, it will call proj2.main(si−1, f, walls)
proj2.main should search for better and better choices for zi = (ui, vi)

– additional Monte Carlo rollouts

Should print each choice, followed by a linebreak, to a file called

       choices.txt
          (2, 2)
          (1, 3)
          (1, 2)
          (1, 2)

You may have it exit whenever you think is best

• •

7

The supervisor program

The supervisor will do the following:

• Initialization (optional, see below)

• loop:

􏰀 call proj2.main as a process, let it run for time limit seconds

􏰀 killproj2.mainifithasn’talreadyexited,anduseitslastchoiceforzi

􏰀 choose ei using the probability distribution discussed earlier

􏰀 computethenewstatesi,andexitiftherunhasfinished

if vehicle crashed, or reached the finish line with velocity (0, 0) 􏰀 elsecontinuetheloop

• Initialization: probably not needed unless you use a heuristic function

  • 􏰀  If your proj2 file contains a proj2.initialize function, then

    the supervisor will call it as a process, let it run for 10 seconds

  • 􏰀  Itshouldwritetoafile,elseitsworkwillbelost

8

UCT for Racetrack

global Seen ← ∅ // all states we’ve generated

UCT(s, d)
if s ∈ Sg then return 0
if d = 0 then return h(s) // depth limit = 0, so call a heuristic function
if s ∈/ Seen then do // {generated states}; you’ll need to add loop-checking

add s to Seen
t[s] ← 0 // how many moves tried at s for every velocity z ∈ applicable(s) do

Q[s,z] ← 0 // average cost of using z at s

t[s,z] ← 0 // how many times z was tried at s Untried ← {z ∈ applicable(s) | t[s, z] = 0}
if Untried ̸= ∅ then choose z′ at random from Untried else z′ ← arg minz∈applicable(s) Q[s, z] − k􏰕log t[s]/t[s, z] e ← steering error, computed as described earlier
if s, z′, e cause a crash then cost ← a (large?) penalty else cost ← 1 + UCT(s′, d − 1)
Q[s, z′] ← (t[s, z′] ∗ Q[s, z′] + cost)/(1 + t[s, z′])
t[s] ← t[s] + 1
t[s, z′] ← t[s, t′] + 1
return cost

// usually k ≥ 2

// average cost of z′ at s

9

Possible Modifications to UCT

• Below are a few ideas; you can probably think of others
􏰀 ObjectiveistoachievegoodperformanceinRacetrackproblems

• Maybe put a loop in the code, to try several z′ values rather than just 1

• Experiment with various values of k ≥ 2, to see what works best

• What penalty to use for crashes?
􏰀 Probablyshouldn’talwaysbethesame;somecrashesworsethanothers 􏰀 Maybecomputeheuristically?

10

What to Submit

• One file, proj2.py
􏰀 Inthefile,yourfunctionshouldbenamedmain

􏰀 Inthedocstringatthestartofthefile,includeacopyofthehonorpledge:

     I pledge on my honor that I have not given or received
     any unauthorized assistance on this project.
     <your name>

• Submit it at the submit server, submit.cs.umd.edu

11

Grading

35% correctness: – whether your heuristic works correctly, whether your submission follows the instructions

15% programming style – see the following

Style guide: https://www.python.org/dev/peps/pep-0008/ Python essays: https://www.python.org/doc/essays/

15% documentation

Docstrings at the start of the file and the start of each function Comments elsewhere

35% performance

running time, length of solution path, how many times you crash Top n performers (n ≈ 5) will get extra credit

• •

• •

• •

12