
Reinforcement Learning

In-class tutorial: Worked examples
[DP, MC, basics of TD]

Subramanian Ramamoorthy
School of Informatics

17 January 2017

Plan for the Session

•  Problems chosen to illustrate concepts covered in earlier
lectures

•  We will work out problems on the board and take questions
to clarify concepts

•  These slides provide the outline sketch of the questions to be
covered


0. Interpretation of V and Q


Using the task of selecting
a club to play the game of
golf, discuss the meaning
of V and Q

What are:
-  States
-  Actions
-  Rewards

What do you understand
by the shape and numbers
in this figure?
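As a reminder of the quantities being interpreted (standard definitions, restated here rather than read off the figure): for a policy π, the state-value and action-value functions are

V^\pi(s) = \mathbb{E}_\pi\big[\textstyle\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\big|\, s_t = s\big],
\qquad
Q^\pi(s,a) = \mathbb{E}_\pi\big[\textstyle\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\big|\, s_t = s,\ a_t = a\big].

If the figure is the golf example from Sutton & Barto, one reasonable reading is: the state is the ball's location, the action is the choice of club for the next stroke (e.g. putter or driver), and the reward is -1 per stroke until the ball is holed, so -V(s) is the expected number of strokes remaining and the contours in the figure group states of equal value.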

I. Interpretation of Vπ and π

•  Cells = States
•  NSEW actions, each resulting in movement by 1 cell
•  Actions taking the agent off the grid have no effect but incur
reward of -1
•  All other actions result in a reward of 0
   –  except those that move the agent out of the special states A and B.


Inspect and interpret Vπ
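For reference, a minimal iterative policy-evaluation sketch for this gridworld under the equiprobable random policy. The special-state coordinates, rewards and discount below are assumptions taken from the standard textbook version of this example, not read off the slide:

import numpy as np

# Iterative policy evaluation for the equiprobable random policy on the
# 5x5 gridworld (assumed parameters from the standard example).
N, GAMMA = 5, 0.9
A, A_PRIME = (0, 1), (4, 1)      # special state A: any action gives +10, jump to A'
B, B_PRIME = (0, 3), (2, 3)      # special state B: any action gives +5, jump to B'
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # N, S, W, E

def step(state, action):
    if state == A:
        return A_PRIME, 10.0
    if state == B:
        return B_PRIME, 5.0
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c), 0.0
    return state, -1.0            # off-grid move: no effect, reward -1

V = np.zeros((N, N))
for _ in range(1000):             # sweep until (approximate) convergence
    V_new = np.zeros_like(V)
    for r in range(N):
        for c in range(N):
            for a in ACTIONS:     # each action taken with probability 1/4
                nxt, reward = step((r, c), a)
                V_new[r, c] += 0.25 * (reward + GAMMA * V[nxt])
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new
print(np.round(V, 1))             # compare with the V^pi figure on the slide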

I. Interpretation of Vπ


I. Interpretation of V* and π*


Calculate and show that Bellman's equation holds for the centre
state, to understand the nature of V*.
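The equation to verify is the Bellman optimality equation, restated here in standard notation (not copied from the slide):

V^*(s) = \max_a \sum_{s'} p(s' \mid s, a)\,\big[r(s, a, s') + \gamma V^*(s')\big],

which for this deterministic gridworld reduces to V^*(s) = \max_a [\,r(s,a) + \gamma V^*(s')\,]. If the gridworld is the standard one, the centre cell and its four neighbours are ordinary states, all rewards for the centre's moves are 0 and no move leaves the grid, so the check is simply that the centre value equals γ times the largest of the four neighbouring values in the V* figure (with γ = 0.9 if the usual parameters are assumed).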

Interpreting V: Cost-to-go


Finding the shortest path in a graph using optimal substructure; a
straight line indicates a single edge; a wavy line indicates a shortest
path between the two vertices it connects (other nodes on these paths
are not shown); the bold line is the overall shortest path from start
to goal. [From Wikipedia]

Understanding the recursion:
If the shortest path from LA to NY must include Chicago, then the
shortest path from LA to Chicago can be computed separately from the
last leg (Chicago to NY).
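A small sketch of the same optimal-substructure idea on a toy city graph. The endpoint cities follow the slide's example, but the intermediate cities and the edge weights are made up purely for illustration:

import heapq

# Dijkstra's algorithm: the cost-to-go it computes is well defined only
# because every prefix of a shortest path is itself a shortest path.
GRAPH = {
    "LA":      {"Phoenix": 370, "Denver": 1200},
    "Phoenix": {"Denver": 800},
    "Denver":  {"Chicago": 1000},
    "Chicago": {"NY": 790},
    "NY":      {},
}

def shortest_cost(source, goal):
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue                     # stale queue entry
        for nbr, w in GRAPH[node].items():
            if d + w < dist.get(nbr, float("inf")):
                dist[nbr] = d + w
                heapq.heappush(queue, (d + w, nbr))
    return float("inf")

print(shortest_cost("LA", "NY"))         # LA -> Phoenix -> Denver -> Chicago -> NY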

II. Value / Policy Iteration
using Grid World

•  Calculate initial steps of Policy Evaluation using a grid world
example seen in our earlier lectures


Vπ and Greedy π at k = 2
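A sketch of how the k = 2 snapshot can be reproduced, assuming the grid world from the earlier DP lecture is the standard 4x4 one (terminal states in two opposite corners, reward -1 on every transition, undiscounted, equiprobable random policy); these details are assumptions, not read from the slide:

import numpy as np

# Two synchronous sweeps of policy evaluation, then greedy policy extraction.
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}

def step(s, a):
    r, c = s[0] + a[0], s[1] + a[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else s

V = np.zeros((N, N))
for k in range(2):                       # computes V_1, then V_2
    V_new = np.zeros_like(V)
    for r in range(N):
        for c in range(N):
            if (r, c) in TERMINALS:
                continue
            V_new[r, c] = sum(0.25 * (-1 + V[step((r, c), a)])
                              for a in ACTIONS.values())
    V = V_new

greedy = {}
for r in range(N):
    for c in range(N):
        if (r, c) in TERMINALS:
            continue
        q = {name: -1 + V[step((r, c), a)] for name, a in ACTIONS.items()}
        best = max(q.values())
        greedy[(r, c)] = [name for name, v in q.items() if np.isclose(v, best)]

print(V)        # compare with V^pi at k = 2 on the slide
print(greedy)   # arrows of the greedy policy at k = 2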


III. MC Value Evaluation

•  Work out some steps of the MC value evaluation process for
the 5-state Markov Chain example (for a random walker who
goes one step to the left or right with equal probability)
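A minimal first-visit Monte Carlo prediction sketch for this chain. The setup below assumes the usual version of the example (states A..E, every episode starts in C, exiting off the right end gives reward +1, exiting off the left end gives 0, no discounting, so V(s) is the probability of finishing on the right); those details are assumptions:

import random
from collections import defaultdict

STATES = ["A", "B", "C", "D", "E"]
returns = defaultdict(list)

def generate_episode():
    idx, visited = 2, []                       # start in the centre state C
    while 0 <= idx < len(STATES):
        visited.append(STATES[idx])
        idx += random.choice((-1, 1))          # fair coin flip
    reward = 1.0 if idx == len(STATES) else 0.0
    return visited, reward

for _ in range(10_000):
    episode, G = generate_episode()            # undiscounted return = terminal reward
    seen = set()
    for s in episode:                          # first visit of each state only
        if s not in seen:
            seen.add(s)
            returns[s].append(G)

V = {s: sum(returns[s]) / len(returns[s]) for s in STATES if returns[s]}
print(V)                                       # should approach 1/6, 2/6, ..., 5/6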


IV. Understanding MC through modified
random walk


•  The transition probabilities for state C are as shown. For all
other states, the transitions are based on a fair coin flip. The
square is an absorbing terminal state with reward as shown.

Perform some initial steps of the calculation of Vπ using
first-visit MC.

Discuss MC with Exploring Starts, etc.

Exploring starts: Every state-action pair
has a non-zero probability of being the
starting pair

V. Cliff Walking: TD


Discuss SARSA and Q-learning
procedures with respect to this example
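A compact sketch contrasting the two procedures on cliff walking. The environment details below (4x12 grid, start bottom-left, goal bottom-right, -100 and reset to the start for stepping onto the cliff along the bottom row, -1 per step otherwise, ε-greedy behaviour) follow the standard version of the example and are assumptions rather than values read off the slide:

import random
from collections import defaultdict

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
ALPHA, GAMMA, EPS = 0.5, 1.0, 0.1

def step(s, a):
    r = min(max(s[0] + a[0], 0), ROWS - 1)
    c = min(max(s[1] + a[1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:      # stepped onto the cliff
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL

def eps_greedy(Q, s):
    if random.random() < EPS:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(s, a)])

def run(method, episodes=500):
    Q = defaultdict(float)
    for _ in range(episodes):
        s, a, done = START, eps_greedy(Q, START), False
        while not done:
            s2, r, done = step(s, ACTIONS[a])
            a2 = eps_greedy(Q, s2)              # action the behaviour policy will actually take
            if method == "sarsa":               # on-policy: bootstrap on that same action
                target = r + GAMMA * Q[(s2, a2)] * (not done)
            else:                               # Q-learning, off-policy: bootstrap on the greedy action
                target = r + GAMMA * max(Q[(s2, b)] for b in range(len(ACTIONS))) * (not done)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s, a = s2, a2
    return Q

Q_sarsa, Q_qlearning = run("sarsa"), run("q-learning")
# Comparing the greedy policies: Q-learning learns the short path hugging the
# cliff edge, while SARSA (evaluated under epsilon-greedy behaviour) prefers
# the longer, safer path away from the cliff.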