Outline
Dynamic Programming
A method for Multi-Stage Decision Making
• Basics of dynamic programming • Application examples
4/23/2018
@2017 New York University Tandon School of Engineering
406
Richard Bellman
Bellman’s Principle of Optimality
PO: An optimal policy has the property that whatever the initial state and the initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
4/23/2018 @2017 New York University Tandon 407 School of Engineering
4/23/2018
@2017 New York University Tandon
408
Illustrative Figure of PO
Optimal trajectory cost
x(tf )
x(t)
J*(t0,x(t0))
x(t0 ) School of Engineering
Illustrative Example of A Routing Network
4/23/2018
School of Engineering
409
d
k h
8 4b26
9 11
3 g 16
9 5 e 10i
a
7c47
l 14f 6
12j 8 @2017 New York University Tandon
m
Illustrative Example of A Routing Network (cont’d)
Problem:
Find the minimum-cost path that starts at node a and ends at any of the nodes k, l, m
4/23/2018 @2017 New York University Tandon 410 School of Engineering
Illustrative Example of A Routing Network (cont’d)
We will find the optimal path by performing a backward pass through the network.
Let J * be the minimum cost from a generic node q
q to any one of three possible final nodes. Then, J* 16, J* 8, J* 7, J* 6.
ghij
The next stage yields
J* min 3J*, 11J* min 19, 19 19.
dgh
4/23/2018 @2017 New York University Tandon 411 School of Engineering
Illustrative Example of A Routing Network (cont’d)
Thus, we have two optimal paths from node d: dgk and dhk. Similarly,
J* min 6J*, 10J*
min 68, 107 14
ehi and
J* min 4J*, 12J*
min 47, 126 11.
fij
Therefore, the optimal path emanating from node e
is ehk, and the path from node f is fil.
4/23/2018 @2017 New York University Tandon 412 School of Engineering
Illustrative Example of A Routing Network (cont’d)
The next stage yields
J* min 9J*, 2J*
min 919, 214 16
bde and
J* min 5 J*, 14 J*
min 514, 1411 19.
cef
Therefore, the optimal path emanating from node b is behk, and the path from node c is cehk.
4/23/2018 @2017 New York University Tandon 413 School of Engineering
Illustrative Example of A Routing Network (cont’d)
Finally, the initial, and thus final, stage yields
J* min 4J*, 7J* min 416, 719 20.
abc
Hence, the optimal, minimum total cost route from node a to any one of the three nodes k, l, m is abehk, and the total cost of this path is 20.
4/23/2018 @2017 New York University Tandon 414 School of Engineering
Comment:
In a nutshell,
“Dynamic Programming” is a technique for solving a complex problem in such a way that the problem is broken into several simpler steps.
4/23/2018 @2017 New York University Tandon 415 School of Engineering
4/23/2018
4573
@2017 New York University Tandon
416
1
43266 10
2751
413 343
9
4
Exercise
Find the shortest path from node 1 to node 10 in the following network. Also, find the shortest path from node 3 to node 10.
264 3383
School of Engineering
Solution to the “Blood Diamond” Problem
(97, 0, 1, 2, 0) or (97, 0, 1, 0, 2)
Do you know why?
4/23/2018 @2017 New York University Tandon 417 School of Engineering
Detailed Analysis based on PO
• IfNos.1‐3die,itremainsonlyNo.4andNo.5.
Then, No. 5 will vote against whatever No.4 proposes
to earn all diamonds.
So, the only option for No. 4 is to vote for No.3.
• AssoonasNo.3knowsthispoint,hewillpropose (100, 0, 0) to earn all diamonds while No.4 has to vote “yes” in order to survive.
4/23/2018 @2017 New York University Tandon 418 School of Engineering
Detailed Analysis based on PO
• As soon as No.2 knows this point, he will propose (98, 0, 1, 1) to win the support from No.4 and No.5, while giving up on No.3
• When No. 1 is aware of the optimal strategy of No.2, he will give up on No.2 and then propose (97, 0, 1, 2, 0) or (97, 0, 1, 0, 2) to win 3 votes including his own vote, the majority votes!!
4/23/2018 @2017 New York University Tandon 419 School of Engineering
Application to a Control Problem
Find a sequence of control actions {u(k)} to minimize the performance index
N 1 kk0
J (N,x(N))
x(k1) f(k,x(k),u(k)), x(k0)x0.
F(k,x(k),u(k))
where x(k) is the solution of the discrete-time system:
4/23/2018 @2017 New York University Tandon 420 School of Engineering
4/23/2018
@2017 New York University Tandon 421 School of Engineering
A Control Problem (cont’d)
Let J * be the minimum cost of transferring the system from k
x(k)totheterminalstatex(N).Clearly,J* (N,x(N)). N
Using the PO, we obtain J* (x(N 1))
N1
min F(N1,x(N1),u(N1))J*(x(N))
N
minF(N1,x(N1),u(N1))(N,x(N))
u(N1) u(N1)
min{F(N1,x(N1),u(N1)) u(N1)
( N , f ( N 1, x ( N 1) , u ( N 1) ) ) } .
A Control Problem (cont’d)
A recursive expression of this backward pass is (k0 k N ): J*(x(k)) min F(k,x(k),u(k))J* (x(k1))
k u(k) k1
k1
min F(k,x(k),u(k))J* (f(k,x(k),u(k)) .
u(k)
The backward pass is completed when k k0 .
Sincex0 xk0isknown,wefinduu(k0)intermsofx0. Then, proceed with the forward pass:
x(k), u(k) x(k 1), u(k 1)
4/23/2018 @2017 New York University Tandon 422 School of Engineering
Consider a scalar discrete-time system x(k1)2x(k)3u(k), x(0)4,
and the performance index 211
An Exercise
J x(2)10 x2(k)u2(k) .
2 k0
Goal: findtheoptimalu*=(u(0), u(1))thatminimizesJ.
4/23/2018 @2017 New York University Tandon 423 School of Engineering
Solution
u(0) = 2.0153, u(1) = ‐1.9237.
4/23/2018 @2017 New York University Tandon 424 School of Engineering
Allocation Problem: the scalar case
4/23/2018
@2017 New York University Tandon 425 School of Engineering
We want to maximize a noninteracting performance index
subject to
K
g (x ) b, g (x ) 0, x . iiiii
K
P f (x )
ii i1
i1
Solution based on PO
Foreaseofnotation,introducetheresourcevariableK : K
g(x) b
iiK i1
and define
F ( ) max f (x ) f(x).
KK KK
x ,,x 1K
K1
ii
i1
Also, define the policy variable x ( ) which maximizes ˆK K
theaboveF ( )foragiven .Particularly,x (b)x* . KK K ˆKK
4/23/2018 @2017 New York University Tandon 426 School of Engineering
Solution based on PO (cont’d)
F ( )f (x ( )) max
K1
KK KˆKK K1
x ,,x
1 K1
f(x) ii
ˆ subjectto g(x) g x ( ).
i1 Therefore,
K1 F()f(x())F {g x()}
K K K ˆK K K1 K K ˆK K
or, equivalently,
F ( )max f (x )F g (x )
K K xK K K K1 K K K subjectto gK(xK)K.
4/23/2018 @2017 New York University Tandon 427 School of Engineering
i1
iiKKKK
F K1 K1
Solution based on PO (cont’d)
In brief, the K -stage problem reduces to a simpler (K 1)-stage problem:
K 1 F()max f(x)
K1 K1
ii 1 K1
subjectto
i1 K 1
4/23/2018
@2017 New York University Tandon 428 School of Engineering
x ,,x
g(x) g (x ) .
i i K K K K1 i1
Solution based on PO (cont’d)
In general, the following recurrence relation holds:
F()maxf(x)F g(x)
k k xk k k k1 k k k subject to
g ( x ) , k 1, 2 , , K , F ( ) 0 . kkk 00
So, the original K-stage problem is replaced by K (easier) one-stage decision problems.
4/23/2018 @2017 New York University Tandon 429 School of Engineering
4/23/2018
@2017 New York University Tandon 430 School of Engineering
Allocation Problem: the vector case
K
max P f (x )
ii i1
K
subjectto g (x )b, i1,2,,n.
ij j i j1
Or, in matrix notation, K T
g(x)bbbb .
jj12n j1
Allocation Problem: the vector case (Cont’d)
Like the scalar case, we obtain the n-state counterpart of the recurrence relation:
F b max f(x)F bg(x)
kk xk kk k1k kk k1,2,,K, withconstraints:
T x 0, g (x )b , ,,
k k k k 1k 2k nk
4/23/2018 @2017 New York University Tandon 431 School of Engineering
Computational Complexity
Note that the argument b of F is a n-th order vector.
k1 k1
If each component of b may assume m distinct values,
k1
the potential total number N of values of F which
k1
n 10
needbestoredism .(Whenm100andn5,N10 !) Bellman’s “Curse of Dimensionality” !
4/23/2018 @2017 New York University Tandon 432 School of Engineering
4/23/2018
@2017 New York University Tandon 433 School of Engineering
Lagrange Multiplier Approach:
an idea for reducing computational complexity
For simplicity, consider
K
max P(x) f (x )
subject to
xi 0,i1,2,,K
K i1
g (x)b, 1i i 1
K i1
g (x)b. 2ii 2
ii i1
Reduce the problem to a scalar problem:
Assume that x* maximizes P(x) with the constraints and
that xo (h) maximizes the Lagrangian
P(x,h) a
i1
f(x)hg (x), hscalar
K
ii 2ii
K
subjecttog (x)b, x 0.Ifadditionallyho iss.t.
1i i 1 i i1
K
g xo(ho)b, 2ii 2
i1
then it holds P(x* ) P x0 (ho ).
See the textbook for a proof (pp.413-414).
4/23/2018 @2017 New York University Tandon 434 School of Engineering
4/23/2018
@2017 New York University Tandon 435 School of Engineering
Exercise: Integer Programming 3
minPf(x), x
subjectto
x b, b 1, 2,, 15
iii i1
3
i i1
4/23/2018
@2017 New York University Tandon 436 School of Engineering
0 asx 0, 1
f(x)510x as1x 5, 1111
158x as5x 15, 11
0
f 2 ( x 2 ) 2 8 x 2
a s x 2 0 ,
a s 1 x 2 5 , as5x2 15,
1812×2 0
a s x 3 0 , as1x 5,
f(x) 19x 333 3
410x as5x 15. 13
Other Applications of Dynamic Programming
4/23/2018
@2017 New York University Tandon 437 School of Engineering
• •
Optimal control system design Two puzzles
subject to
u
ta
Optimal Control
u
y
min J tb f (x,u,t)dt
P
xq(x,u,t), x(t)n, u(t)U(t)m.
Remark: Could be solved using the theory of Calculus of Variations, instead of “Dynamic Programming”.
4/23/2018 @2017 New York University Tandon 438 School of Engineering
first-difference approximate model is: xk1 xk tk1 tk q(xk ,uk ,tk )
First Pass:
A Sampled‐Data System Approach
K
Take an appropriate set ti i0 of sampling times
over[ta,tb],e.g.,ti ta iT.Then,theEuler-based
and the performance index is approximated by: P K 1 t k 1 t k f k ( x k , u k )
k0
Thus, an optimal sampled-data controller can be obtained
using the previously discussed approach.
4/23/2018 @2017 New York University Tandon 439 School of Engineering
ttb subject to
Second Pass:
A Continuous‐Time Infinitesimal Approach
Instead of minimizing J over ta , tb , we consider
Bx(t),t min tb f (x(),u(),)d uU() t
x()q(x(),u(),), tt b
Clearly, Bx(ta ),ta is the desired minimum.
4/23/2018 @2017 New York University Tandon 440 School of Engineering
u(t)U(t)
u()U() t
4/23/2018
@2017 New York University Tandon School of Engineering
441
Continuous Recurrence Relation
For sufficiently small 0, Bx(t),t
t t min f(x(),u(),)db f(x(),u(),)d
t t
min f(x(t),u(t),t) b f(x(),u(),)d o(2) t
t
b
min f(x(t),u(t),t) min t f(x(),u(),)d o(2)
tt b
Continuous Recurrence Relation (c’td)
4/23/2018
@2017 New York University Tandon 442 School of Engineering
Therefore,
B x(t),t min f(x(t),u(t),t)B x(t),t o(2)
u(t)U(t)
By Taylor’s formula,
x(t) x(t)q(x(t),u(t),t)o(2),
x ( t )
which implies, for small 0,
Bx(t),t (*)
min f(x(t),u(t),t)B x(t)q(x(t),u(t),t),t
u(t)U(t)
4/23/2018
@2017 New York University Tandon School of Engineering
443
Specifications to Linear Systems
In this case, we consider
t
min J b eT (t)Qe(t) uT (t)Ru(t) dt
subject to
ta
xAxBu, e(t)x(t)x (t),
d QQT 0, RRT 0.
(Kalman, 1960)
A Standing Assumption
The function Bx(t),t takes the quadratic form: T
Bx(t),txx Hxx kT xx k ddd0
where H,k,ko are time-variant functions.
4/23/2018 @2017 New York University Tandon 444 School of Engineering
Necessary Condition for Optimality
By applying Taylor’s formula to (*), we obtain
B BT t minf(x,u,t)x q(x,u,t)
Clearly,
4/23/2018
@2017 New York University Tandon 445 School of Engineering
u
BT
mineTQeuTRux (AxBu) (**)
u
B H H T e k x
B eTHeeT k HHT x kTx k
t dd0
On the other hand, at optimal control u* (t), BT
ueTQeuTRux (AxBu)0
u*R1BTHHTek/2
4/23/2018 @2017 New York University Tandon 446 School of Engineering
Direct substitution into the above identity (**) yields:
e Z e e z z 0 e, with abc
TT
T1 T1T T Za HQ HH A4 HH BR B HH
zkA HH BRB
T 1 T1T b2
HHT Ax x , d d
T 11T zkk AxxBRBk
c 0 d d 4
4/23/2018 @2017 New York University Tandon 447 School of Engineering
Therefore,Za 0, zb 0, zc 0. So, M (H H T ) / 2 satisfies the
differential Riccati equation:
where BR1BT . Also, k and k0 satisfy
4/23/2018
@2017 New York University Tandon 448 School of Engineering
T
M A M MAMM Q
d d
T
kMA kMAxx ,
1 kx Ax
k kT d
4 d d
Terminal Conditions
SinceBxt ,t 0, itholds: bb
M(tb)0, k(tb)k0(tb)0.
4/23/2018 @2017 New York University Tandon 449 School of Engineering
4/23/2018
@2017 New York University Tandon 450 School of Engineering
The Infinite‐Horizon LQ Regulator
min J eT Qe uT Ru dt ta
subject to
xAxBu,x Ax,tt
d d a and
A, B, Q, R are constant matrices, QQT 0, RRT 0.
4/23/2018
@2017 New York University Tandon 451 School of Engineering
Steady-State Riccati Equations
ˆˆˆˆ
0 AT M MA M M Q,
0MAT k, ˆ
01kTk, withBR1BT. 4
ˆ
[note: Unique solution M if ( A, B) controllable,
A, Q observable.] Optimal controller:
* 1Tˆˆ u R B MeKe
An Example
Consider a frictionless inverted pendulum with
x , A 0 1 , B0 2 0 1
Problem: Find, if possible, a control law to minimize
V ((t) (t))2 u2(t)dt, uangularvelocity
4/23/2018
@2017 New York University Tandon 452 School of Engineering
dc ta
2
Answer
The weighting matrices are
Q1 0,R1 00 c2
Denote the performance matrix:
ˆmm M1 2
mm
23
So,thegainmatrixisK R B M c m , c m
ˆ1Tˆ2 2 23
4/23/2018 @2017 New York University Tandon 453 School of Engineering
Answer (cont’d)
(1) 02m 2 c2m2 1 22
(2)0mm2c2mm 1323
(3) 02m c2m2 23
eq.(1) gives: m2 With eq.(3), we have:
24c2 c2
24c2 1
m2 c2 ,m3c2m2
4/23/2018 @2017 New York University Tandon 454 School of Engineering
Answer (cont’d)
Then the gain matrix is:
1 / 2 K:24c2, 224c2
ˆ
4/23/2018 @2017 New York University Tandon 455 School of Engineering
Puzzle (i)
Suppose there are 30 matches on a table. Group A begins by picking up 1, 2 or 3 matches. Group B then must also pick 1, 2 or 3 matches.
We continue in this fashion until there is no match on the table. The one who picks the last match loses.
What is the best strategy of Group A for surely winning the game?
4/23/2018 @2017 New York University Tandon 456 School of Engineering
Puzzle (ii)
Suppose there are 40 matches on a table.
Group A begins by picking up 1, 2, 3 or 4 matches. Group B then must also pick 1, 2, 3 or 4 matches.
We continue in this fashion until there is no match on the table. The one who picks the last match loses.
What is the best strategy of Group A for surely winning the game?
4/23/2018 @2017 New York University Tandon 457 School of Engineering
Answer to Puzzle (i)
Group A should always make sure that Group B plays, at each of its turn, with 5, 9, 13, 17, 21, 25, or 29 matches on the table.
That means, Group A should only take one match when it starts!!
4/23/2018 @2017 New York University Tandon 458 School of Engineering
Two More
(A) You are given a 9‐oz cup and a 4‐oz cup. How can you bring home exactly 6‐oz of milk? (Note: No additional cups are available.)
(B) We have 21 coins and are told that one is heavier than any of the other coins. How many weighings on a balance will it take to find the heaviest coin?
4/23/2018 @2017 New York University Tandon 459 School of Engineering
Back to Optimal Control…
4/23/2018
@2017 New York University Tandon 460 School of Engineering
u
y
min J tb f (x,u,t)dt u ta
subject to
P
x q(x,u,t), x(t)n , u(t)U(t) m.
Forward Recurrence Relation
Motivation: to address “jump discontinuities” (or right discontinuity)
The optimal control problem can also be approached by considering
Fx(t),t min t f x(),u(),d. uU ta
ta t
Clearly, F x(tb ), tb is the desired minimum.
4/23/2018 @2017 New York University Tandon 461 School of Engineering
Forward Recurrence Relation
Similar to the development of “backward recurrence relation”, it holds:
Fx(t),t
t t
min f x(),u(),d f x(),u(),d uU ta t
ta t
u(t)U(t)
min f x(t),u(t),t F x(t),t o(2)
4/23/2018 @2017 New York University Tandon 462 School of Engineering
Necessary Condition for Optimal u*
Using theidentityxtx(t)x(t)o(2) for sufficiently small 0, it holds:
F x(t),t min f x(t),u(t),t
4/23/2018
@2017 New York University Tandon 463 School of Engineering
u(t)U(t)
Fx(t)x(t)o(2),t o(2)
4/23/2018
@2017 New York University Tandon 464 School of Engineering
Hamilton‐Jacobi Equation
Assume that F is continuously differentiable. F x(t),t min f x(t),u(t),t
u(t)U(t)
Fx(t)x(t)o(2),t o(2)
min f x(t),u(t),t F x(t),t u(t)U(t)
F x(t),tT F x(t),t
x(t) o(2)
x t
Fx(t),t
Fx(t),tT
t 4/23/2018
u ( t ) U ( t ) x
Hamilton‐Jacobi Equation
Fx(t),t
t min f x(t),u(t),t
Fx(t),tT
u(t)U(t)
x(t) o(2) x
min f x(t),u(t),t q(x(t),u(t),t)
@2017 New York University Tandon 465 School of Engineering
Fx(t),t t
Fx(t),tT
f x(t),u*,t x q(x(t),u*,t)
4/23/2018
@2017 New York University Tandon 466 School of Engineering
Hamilton‐Jacobi PDE
Thus, for any given x,t, optimal controller u* u(x,t) satisfies:
Comment
Instead of solving the difficult HJ PDE, we could introduce the costate vector :
F x*(t),t *(t) .
x
Using the fact that u* is a stationary point for the
minimum, we have the ODE:
f x*(t),u*(t),t qx*(t),u*(t),tT
* *
(t) x x (t)
See the textbook (pp. 434-435) for a detailed derivation.
4/23/2018 @2017 New York University Tandon 467 School of Engineering
t into:
u ( t ) U ( t ) x
t or,
max H x*(t),*(t),u(t),t
4/23/2018
@2017 New York University Tandon 468 School of Engineering
F x*(t),t
Hamiltonian Functions
Introduce “Hamiltonian Function”, arising from classical mechanics, of the form: H (t)T qx(t),u(t),t f x(t),u(t),t
we simplify the HJ PDE
Fx(t),t Fx(t),tT
min f x(t),u(t),t q(x(t),u(t),t)
F x*(t),t **
t
u(t)U(t)
min H x (t), (t),u(t),t ,
u(t)U(t)
Pontryagin’s Maximum Principle for the Optimum
(i) H x*(t),*(t),u*(t),t
u*(t)U(t)
(1956)
H x*(t),*(t),u(t),t
u(t)U(t)
tta,tb, u(t)U(t), and
H x*(t),(t),u*(t),t
iix*(t) q(x*(t),u*(t),t)
H x*(t),*(t),u*(t),t
Hamiltonian Canonical Equations
* iii (t) x .
4/23/2018 @2017 New York University Tandon 469 School of Engineering
4/23/2018
@2017 New York University Tandon School of Engineering
Example of Bang‐Bang Control
Consider an affine system
xa(x,t)B(x,t)u, um with
a (x,t) 1
b b
a2(x,t) 11 1m
a(x,t) ,B(x,t)
b b
a (x,t) n
n1
nm 470
4/23/2018
@2017 New York University Tandon 471 School of Engineering
Problem Statement
Find, if possible, an optimal control u, uk,min uk uk,max, k1,2,,m,
that minimizes the performance measure J tb f (x,t)dt, f C1.
ta
Hamiltonian Function and Necessary Conditions
4/23/2018
@2017 New York University Tandon School of Engineering
472
Consider the Hamiltonian as
H T a(x,t)B(x,t)u f(x,t)
Following the previous arguments, the following necessary condition holds:
*T B(x*,t)u* *T B(x*,t)u, admissible u
maxn a*(t)u n a*(t)u*, a*(t)n *b (x*,t)
u
ii
iii
k ki i1 i1 k1
Hamiltonian Function and Necessary Conditions
nnn
i1 i1
k1
maxa*(t)u a*(t)u*, a*(t)
*b (x*,t) k ki
u
ii
iii
Optimal “bang-bang control”
4/23/2018
@2017 New York University Tandon 473 School of Engineering
u
i,max
if a*(t)0, i
uu
i,min
if a*(t)0, i
u
i , m i n
u* u
i i , m a x
if a*(t)0. i
“switching function”
Practicing Problems for the Final
Problem 1:
Which of the following characterize convex sets?
(1.a)x2 x 1. 12
(1.b)x2 x 1. 12
(1.c)xx x 2,x2x 1. 12312
(1.d)xx x 2,x2x 1. 12312
(1.e)x2 x2 1, x x 1. 1212
(1. f ) The intersection of two convex sets.
4/23/2018 @2017 New York University Tandon 474 School of Engineering
4/23/2018
@2017 New York University Tandon 475 School of Engineering
Practicing Problems for the Final
Problem 2:
MaximizeP3x 2x 2x , subjectto 123
4x 2x 3x 20, 123
2x 2x 4x 6, 123
x 0, x 0, x free. 123
4/23/2018
@2017 New York University Tandon 476
Practicing Problems for the Final
Problem 3:
Minimizef x2 x2 xx , 1212
subject to
2x x 2,
12
x x 4,
12 x 0.
1
School of Engineering
4/23/2018
@2017 New York University Tandon 477 School of Engineering
Practicing Problems for the Final
Problem 4:
Minimizef x x , 12
subject to
(x 1)2 x2 2,
1 2
(x 1)2 x2 2.
12
Practicing Problems for the Final
Problem 5:
ConsiderminJ
under the terminal conditions:
T2
2
xe x1 dt t
T 1, x01, x(1)0.5.
0
(5.1) What are the necessary conditions for optimal x* (t ) ? (5.2) Can x* (t) contain corner points?
4/23/2018 @2017 New York University Tandon 478 School of Engineering