CS计算机代考程序代写 Outline

Outline
Dynamic Programming
A method for Multi-Stage Decision Making
• Basics of dynamic programming • Application examples
4/23/2018
@2017 New York University Tandon School of Engineering
406
Richard Bellman

Bellman’s Principle of Optimality
PO: An optimal policy has the property that whatever the initial state and the initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
4/23/2018 @2017 New York University Tandon 407 School of Engineering

4/23/2018
@2017 New York University Tandon
408
Illustrative Figure of PO
Optimal trajectory cost
x(tf )
x(t)
J*(t0,x(t0))
x(t0 ) School of Engineering

Illustrative Example of A Routing Network
4/23/2018
School of Engineering
409
d
k h
8 4b26
9 11
3 g 16
9 5 e 10i
a
7c47
l 14f 6
12j 8 @2017 New York University Tandon
m

Illustrative Example of A Routing Network (cont’d)
Problem:
Find the minimum-cost path that starts at node a and ends at any of the nodes k, l, m
4/23/2018 @2017 New York University Tandon 410 School of Engineering

Illustrative Example of A Routing Network (cont’d)
We will find the optimal path by performing a backward pass through the network.
Let J * be the minimum cost from a generic node q
q to any one of three possible final nodes. Then, J* 16, J* 8, J* 7, J* 6.
ghij
The next stage yields
J* min 3J*, 11J* min 19, 19 19.


dgh
4/23/2018 @2017 New York University Tandon 411 School of Engineering

Illustrative Example of A Routing Network (cont’d)
Thus, we have two optimal paths from node d: dgk and dhk. Similarly,
J* min 6J*, 10J* 
min 68, 107 14 
ehi and
J* min 4J*, 12J* 
min 47, 126 11. 
fij
Therefore, the optimal path emanating from node e
is ehk, and the path from node f is fil.
4/23/2018 @2017 New York University Tandon 412 School of Engineering

Illustrative Example of A Routing Network (cont’d)
The next stage yields
J* min 9J*, 2J* 
min 919, 214 16 
bde and
J*  min 5 J*, 14 J* 
 min 514, 1411 19. 
cef
Therefore, the optimal path emanating from node b is behk, and the path from node c is cehk.
4/23/2018 @2017 New York University Tandon 413 School of Engineering

Illustrative Example of A Routing Network (cont’d)
Finally, the initial, and thus final, stage yields
J* min 4J*, 7J* min 416, 719 20.


abc
Hence, the optimal, minimum total cost route from node a to any one of the three nodes k, l, m is abehk, and the total cost of this path is 20.
4/23/2018 @2017 New York University Tandon 414 School of Engineering

Comment:
In a nutshell,
“Dynamic Programming” is a technique for solving a complex problem in such a way that the problem is broken into several simpler steps.
4/23/2018 @2017 New York University Tandon 415 School of Engineering

4/23/2018
4573
@2017 New York University Tandon
416
1
43266 10
2751
413 343
9
4
Exercise
Find the shortest path from node 1 to node 10 in the following network. Also, find the shortest path from node 3 to node 10.
264 3383
School of Engineering

Solution to the “Blood Diamond” Problem
(97, 0, 1, 2, 0) or (97, 0, 1, 0, 2)
Do you know why?
4/23/2018 @2017 New York University Tandon 417 School of Engineering

Detailed Analysis based on PO
• IfNos.1‐3die,itremainsonlyNo.4andNo.5.
Then, No. 5 will vote against whatever No.4 proposes
to earn all diamonds.
So, the only option for No. 4 is to vote for No.3.
• AssoonasNo.3knowsthispoint,hewillpropose (100, 0, 0) to earn all diamonds while No.4 has to vote “yes” in order to survive.
4/23/2018 @2017 New York University Tandon 418 School of Engineering

Detailed Analysis based on PO
• As soon as No.2 knows this point, he will propose (98, 0, 1, 1) to win the support from No.4 and No.5, while giving up on No.3
• When No. 1 is aware of the optimal strategy of No.2, he will give up on No.2 and then propose (97, 0, 1, 2, 0) or (97, 0, 1, 0, 2) to win 3 votes including his own vote, the majority votes!!
4/23/2018 @2017 New York University Tandon 419 School of Engineering

Application to a Control Problem
Find a sequence of control actions {u(k)} to minimize the performance index
N  1 kk0
J (N,x(N))
x(k1) f(k,x(k),u(k)), x(k0)x0.
F(k,x(k),u(k))
where x(k) is the solution of the discrete-time system:
4/23/2018 @2017 New York University Tandon 420 School of Engineering

4/23/2018
@2017 New York University Tandon 421 School of Engineering
A Control Problem (cont’d)
Let J * be the minimum cost of transferring the system from k
x(k)totheterminalstatex(N).Clearly,J* (N,x(N)). N
Using the PO, we obtain J* (x(N 1))
N1

min F(N1,x(N1),u(N1))J*(x(N))
N
 minF(N1,x(N1),u(N1))(N,x(N))
u(N1) u(N1)
 min{F(N1,x(N1),u(N1)) u(N1)
  ( N , f ( N  1, x ( N  1) , u ( N  1) ) ) } .

A Control Problem (cont’d)
A recursive expression of this backward pass is (k0  k  N ): J*(x(k)) min F(k,x(k),u(k))J* (x(k1))
 k u(k) k1
 k1
min F(k,x(k),u(k))J* (f(k,x(k),u(k)) .
u(k)
The backward pass is completed when k  k0 .
Sincex0 xk0isknown,wefinduu(k0)intermsofx0. Then, proceed with the forward pass:
x(k), u(k)  x(k 1), u(k 1)
4/23/2018 @2017 New York University Tandon 422 School of Engineering

Consider a scalar discrete-time system x(k1)2x(k)3u(k), x(0)4,
and the performance index 211
An Exercise
J  x(2)10  x2(k)u2(k) .    
2 k0
Goal: findtheoptimalu*=(u(0), u(1))thatminimizesJ.
4/23/2018 @2017 New York University Tandon 423 School of Engineering

Solution
u(0) = 2.0153, u(1) = ‐1.9237.
4/23/2018 @2017 New York University Tandon 424 School of Engineering

Allocation Problem: the scalar case
4/23/2018
@2017 New York University Tandon 425 School of Engineering
We want to maximize a noninteracting performance index
subject to
K
g (x )  b, g (x )  0, x  . iiiii
K
P   f (x )
ii i1
i1

Solution based on PO
Foreaseofnotation,introducetheresourcevariableK : K
g(x) b
iiK i1
and define
F ( ) max f (x ) f(x).
KK KK
x ,,x 1K
K1 
ii
 i1 
Also, define the policy variable x ( ) which maximizes ˆK K
theaboveF ( )foragiven .Particularly,x (b)x* . KK K ˆKK
4/23/2018 @2017 New York University Tandon 426 School of Engineering

Solution based on PO (cont’d)
F ( )f (x ( )) max
K1 
KK KˆKK K1
x ,,x
1 K1
f(x) ii
ˆ subjectto g(x) g x ( ).
i1 Therefore,

K1 F()f(x())F {g x()}
 K K K ˆK K K1 K K ˆK K
or, equivalently,
F ( )max f (x )F  g (x )
  
K K xK K K K1 K K K subjectto gK(xK)K.
4/23/2018 @2017 New York University Tandon 427 School of Engineering
i1 
iiKKKK
F   K1 K1

Solution based on PO (cont’d)
In brief, the K -stage problem reduces to a simpler (K 1)-stage problem:
K  1 F()max f(x)
K1 K1
ii 1 K1
subjectto
i1 K  1
4/23/2018
@2017 New York University Tandon 428 School of Engineering
x ,,x
g(x) g (x ) .
i i K K K K1 i1

Solution based on PO (cont’d)
In general, the following recurrence relation holds:
F()maxf(x)F g(x)
  
k k xk k k k1 k k k subject to
g ( x )   ,  k  1, 2 ,  , K , F (  )  0 . kkk 00
So, the original K-stage problem is replaced by K (easier) one-stage decision problems.
4/23/2018 @2017 New York University Tandon 429 School of Engineering

4/23/2018
@2017 New York University Tandon 430 School of Engineering
Allocation Problem: the vector case
K
max P   f (x )
ii i1
K
subjectto g (x )b, i1,2,,n.
ij j i j1
Or, in matrix notation, K T
g(x)bbbb . 
jj12n j1

Allocation Problem: the vector case (Cont’d)
Like the scalar case, we obtain the n-state counterpart of the recurrence relation:
F b max f(x)F bg(x)  
kk xk kk k1k kk k1,2,,K, withconstraints:
T x 0, g (x )b   ,  ,, 
k k k k 1k 2k nk

4/23/2018 @2017 New York University Tandon 431 School of Engineering

Computational Complexity


Note that the argument b of F is a n-th order vector.
k1 k1
If each component of b may assume m distinct values,
k1
the potential total number N of values of F which
k1
n 10
needbestoredism .(Whenm100andn5,N10 !) Bellman’s “Curse of Dimensionality” !
4/23/2018 @2017 New York University Tandon 432 School of Engineering

4/23/2018
@2017 New York University Tandon 433 School of Engineering
Lagrange Multiplier Approach:
an idea for reducing computational complexity
For simplicity, consider
K
max P(x)   f (x )
subject to
xi 0,i1,2,,K
K i1
g (x)b, 1i i 1
K i1
g (x)b. 2ii 2
ii i1

Reduce the problem to a scalar problem:
Assume that x* maximizes P(x) with the constraints and
that xo (h) maximizes the Lagrangian
P(x,h) a
 i1
f(x)hg (x), hscalar 
K
ii 2ii
K
subjecttog (x)b, x 0.Ifadditionallyho iss.t.
1i i 1 i i1
K  
g xo(ho)b, 2ii 2
i1
then it holds P(x* )  P x0 (ho ).
See the textbook for a proof (pp.413-414).

4/23/2018 @2017 New York University Tandon 434 School of Engineering

4/23/2018
@2017 New York University Tandon 435 School of Engineering
Exercise: Integer Programming 3
minPf(x), x
subjectto
x b, b 1, 2,, 15
iii i1
3

i i1

4/23/2018
@2017 New York University Tandon 436 School of Engineering
0 asx 0, 1
f(x)510x as1x 5, 1111
158x as5x 15, 11
 0
f 2 ( x 2 )   2  8 x 2
a s x 2  0 ,
a s 1  x 2  5 , as5x2 15,
1812×2  0
a s x 3  0 , as1x 5,
f(x) 19x 333 3
410x as5x 15. 13

Other Applications of Dynamic Programming
4/23/2018
@2017 New York University Tandon 437 School of Engineering
• •
Optimal control system design Two puzzles

subject to
u
ta
Optimal Control
u
y
min J  tb f (x,u,t)dt
P

xq(x,u,t), x(t)n, u(t)U(t)m.
Remark: Could be solved using the theory of Calculus of Variations, instead of “Dynamic Programming”.
4/23/2018 @2017 New York University Tandon 438 School of Engineering

first-difference approximate model is: xk1  xk tk1 tk q(xk ,uk ,tk )
First Pass:
A Sampled‐Data System Approach
K
Take an appropriate set ti i0 of sampling times

over[ta,tb],e.g.,ti ta iT.Then,theEuler-based
and the performance index is approximated by: P  K  1  t k  1  t k  f k ( x k , u k )
k0
Thus, an optimal sampled-data controller can be obtained
using the previously discussed approach.
4/23/2018 @2017 New York University Tandon 439 School of Engineering

ttb subject to
Second Pass:
A Continuous‐Time Infinitesimal Approach
Instead of minimizing J over ta , tb , we consider 
Bx(t),t min tb f (x(),u(),)d uU() t
x()q(x(),u(),), tt b

Clearly, Bx(ta ),ta  is the desired minimum.
4/23/2018 @2017 New York University Tandon 440 School of Engineering

u(t)U(t) 
 u()U() t 
4/23/2018
@2017 New York University Tandon School of Engineering
441
Continuous Recurrence Relation
For sufficiently small   0, Bx(t),t
t t  min  f(x(),u(),)db f(x(),u(),)d
t t
min f(x(t),u(t),t) b f(x(),u(),)d o(2) t 
t 
b
 min f(x(t),u(t),t) min t f(x(),u(),)d o(2)

tt b
 

Continuous Recurrence Relation (c’td)
4/23/2018
@2017 New York University Tandon 442 School of Engineering
Therefore,
B x(t),t  min f(x(t),u(t),t)B x(t),t o(2)
     u(t)U(t)
By Taylor’s formula,
x(t) x(t)q(x(t),u(t),t)o(2),
 x ( t )
which implies, for small   0,
Bx(t),t (*)
  
 min f(x(t),u(t),t)B x(t)q(x(t),u(t),t),t
u(t)U(t)

4/23/2018
@2017 New York University Tandon School of Engineering
443
Specifications to Linear Systems
In this case, we consider
t
min J  b eT (t)Qe(t)  uT (t)Ru(t) dt
 subject to
ta

xAxBu, e(t)x(t)x (t),
d QQT 0, RRT 0.
(Kalman, 1960)

A Standing Assumption
The function Bx(t),t takes the quadratic form: T
Bx(t),txx  Hxx kT xx k ddd0
where H,k,ko are time-variant functions.
4/23/2018 @2017 New York University Tandon 444 School of Engineering

Necessary Condition for Optimality
By applying Taylor’s formula to (*), we obtain
B  BT  t  minf(x,u,t)x q(x,u,t)
Clearly,
4/23/2018
@2017 New York University Tandon 445 School of Engineering
u 
 BT 
 mineTQeuTRux (AxBu) (**)
u 
B  H  H T e  k x
B     eTHeeT k HHT x kTx k
t  dd0

On the other hand, at optimal control u* (t),  BT 
ueTQeuTRux (AxBu)0 

 u*R1BTHHTek/2 
4/23/2018 @2017 New York University Tandon 446 School of Engineering

Direct substitution into the above identity (**) yields:
e Z e  e z  z  0 e, with abc
TT
 T1 T1T T Za HQ HH A4 HH BR B HH

zkA HH BRB
T 1 T1T b2

HHT Ax x , d d
T 11T zkk AxxBRBk

c 0  d d 4 

4/23/2018 @2017 New York University Tandon 447 School of Engineering

Therefore,Za 0, zb 0, zc 0. So, M  (H  H T ) / 2 satisfies the
differential Riccati equation:

where   BR1BT . Also, k and k0 satisfy
4/23/2018
@2017 New York University Tandon 448 School of Engineering
T
M A M MAMM Q
 d d
T
kMA kMAxx ,

1 kx Ax
k kT d
4 d d 

Terminal Conditions
SinceBxt ,t 0, itholds: bb
M(tb)0, k(tb)k0(tb)0.
4/23/2018 @2017 New York University Tandon 449 School of Engineering

4/23/2018
@2017 New York University Tandon 450 School of Engineering
The Infinite‐Horizon LQ Regulator
min J   eT Qe  uT Ru dt ta
subject to
xAxBu,x Ax,tt
 d d a and
A, B, Q, R are constant matrices, QQT 0, RRT 0.

4/23/2018
@2017 New York University Tandon 451 School of Engineering
Steady-State Riccati Equations
ˆˆˆˆ
0   AT M  MA  M M  Q,
0MAT k, ˆ
01kTk, withBR1BT. 4
ˆ
[note: Unique solution M if ( A, B) controllable,
A, Q  observable.] Optimal controller:
* 1Tˆˆ u R B MeKe

An Example
Consider a frictionless inverted pendulum with
x , A 0 1 , B0  2 0 1
   
Problem: Find, if possible, a control law to minimize
V ((t) (t))2 u2(t)dt, uangularvelocity
4/23/2018
@2017 New York University Tandon 452 School of Engineering
dc ta  
2

Answer
The weighting matrices are
Q1 0,R1 00 c2
Denote the performance matrix:
ˆmm M1 2
mm
23
So,thegainmatrixisK R B M c m , c m 

ˆ1Tˆ2 2 23
4/23/2018 @2017 New York University Tandon 453 School of Engineering

Answer (cont’d)
(1) 02m 2 c2m2 1 22
(2)0mm2c2mm 1323
(3) 02m c2m2 23
 eq.(1) gives: m2  With eq.(3), we have:
24c2 c2
24c2 1
m2 c2 ,m3c2m2
4/23/2018 @2017 New York University Tandon 454 School of Engineering

Answer (cont’d)
Then the gain matrix is:
  1 / 2  K:24c2, 224c2 
ˆ

4/23/2018 @2017 New York University Tandon 455 School of Engineering

Puzzle (i)
Suppose there are 30 matches on a table. Group A begins by picking up 1, 2 or 3 matches. Group B then must also pick 1, 2 or 3 matches.
We continue in this fashion until there is no match on the table. The one who picks the last match loses.
What is the best strategy of Group A for surely winning the game?
4/23/2018 @2017 New York University Tandon 456 School of Engineering

Puzzle (ii)
Suppose there are 40 matches on a table.
Group A begins by picking up 1, 2, 3 or 4 matches. Group B then must also pick 1, 2, 3 or 4 matches.
We continue in this fashion until there is no match on the table. The one who picks the last match loses.
What is the best strategy of Group A for surely winning the game?
4/23/2018 @2017 New York University Tandon 457 School of Engineering

Answer to Puzzle (i)
Group A should always make sure that Group B plays, at each of its turn, with 5, 9, 13, 17, 21, 25, or 29 matches on the table.
That means, Group A should only take one match when it starts!!
4/23/2018 @2017 New York University Tandon 458 School of Engineering

Two More
(A) You are given a 9‐oz cup and a 4‐oz cup. How can you bring home exactly 6‐oz of milk? (Note: No additional cups are available.)
(B) We have 21 coins and are told that one is heavier than any of the other coins. How many weighings on a balance will it take to find the heaviest coin?
4/23/2018 @2017 New York University Tandon 459 School of Engineering

Back to Optimal Control…
4/23/2018
@2017 New York University Tandon 460 School of Engineering
u
y
min J  tb f (x,u,t)dt u ta
subject to
P

x  q(x,u,t), x(t)n , u(t)U(t)  m.

Forward Recurrence Relation
Motivation: to address “jump discontinuities” (or right discontinuity)
The optimal control problem can also be approached by considering
Fx(t),t min t f x(),u(),d. uU ta
ta t
Clearly, F x(tb ), tb  is the desired minimum.
4/23/2018 @2017 New York University Tandon 461 School of Engineering

Forward Recurrence Relation
Similar to the development of “backward recurrence relation”, it holds:
Fx(t),t
t t 
 min  f x(),u(),d f x(),u(),d uU ta t
ta t
u(t)U(t)
  
 min f x(t),u(t),t F x(t),t o(2)
4/23/2018 @2017 New York University Tandon 462 School of Engineering

Necessary Condition for Optimal u*
Using theidentityxtx(t)x(t)o(2) for sufficiently small   0, it holds:
F x(t),t  min f x(t),u(t),t 
4/23/2018
@2017 New York University Tandon 463 School of Engineering
u(t)U(t) 
Fx(t)x(t)o(2),t o(2) 


4/23/2018
@2017 New York University Tandon 464 School of Engineering
 
Hamilton‐Jacobi Equation
Assume that F is continuously differentiable. F x(t),t  min f x(t),u(t),t
 u(t)U(t)
 
 Fx(t)x(t)o(2),t o(2)
min f x(t),u(t),t  F x(t),t u(t)U(t)
F x(t),tT F x(t),t
x(t)  o(2)
x t   

 Fx(t),t
  
 Fx(t),tT 
 t 4/23/2018
u ( t ) U ( t )    x  
Hamilton‐Jacobi Equation
Fx(t),t
 t  min f x(t),u(t),t
 Fx(t),tT 
u(t)U(t)

x(t) o(2)  x   
 min  f x(t),u(t),t q(x(t),u(t),t)
 
@2017 New York University Tandon 465 School of Engineering

Fx(t),t t
  Fx(t),tT
 f x(t),u*,t  x  q(x(t),u*,t)
4/23/2018
@2017 New York University Tandon 466 School of Engineering
Hamilton‐Jacobi PDE
Thus, for any given x,t, optimal controller u*  u(x,t) satisfies:


Comment
Instead of solving the difficult HJ PDE, we could introduce the costate vector :
F x*(t),t *(t)  .
x
Using the fact that u* is a stationary point for the
minimum, we have the ODE:
f x*(t),u*(t),t qx*(t),u*(t),tT
*   *
(t) x  x  (t) 
See the textbook (pp. 434-435) for a detailed derivation.
4/23/2018 @2017 New York University Tandon 467 School of Engineering


 t into:
u ( t ) U ( t )    x  
t or,
   max H x*(t),*(t),u(t),t
4/23/2018
@2017 New York University Tandon 468 School of Engineering

F x*(t),t 

Hamiltonian Functions
Introduce “Hamiltonian Function”, arising from classical mechanics, of the form: H  (t)T qx(t),u(t),t f x(t),u(t),t
we simplify the HJ PDE
Fx(t),t  Fx(t),tT 
 min  f x(t),u(t),t q(x(t),u(t),t)
 
F x*(t),t **
t
u(t)U(t)
 min H x (t), (t),u(t),t ,
u(t)U(t)

Pontryagin’s Maximum Principle for the Optimum
(i) H x*(t),*(t),u*(t),t
 u*(t)U(t)
(1956)
 H x*(t),*(t),u(t),t
 u(t)U(t)
tta,tb, u(t)U(t), and
H x*(t),(t),u*(t),t
iix*(t)     q(x*(t),u*(t),t)
 
H x*(t),*(t),u*(t),t
Hamiltonian Canonical Equations
*   iii  (t) x .
4/23/2018 @2017 New York University Tandon 469 School of Engineering

4/23/2018
@2017 New York University Tandon School of Engineering
Example of Bang‐Bang Control
Consider an affine system 
xa(x,t)B(x,t)u, um with
a (x,t) 1
b  b 
a2(x,t)  11 1m 
a(x,t) ,B(x,t)  


b  b 
a (x,t) n
 n1
nm  470

4/23/2018
@2017 New York University Tandon 471 School of Engineering
Problem Statement
Find, if possible, an optimal control u, uk,min uk uk,max, k1,2,,m,
that minimizes the performance measure J  tb f (x,t)dt, f C1.
ta

Hamiltonian Function and Necessary Conditions
4/23/2018
@2017 New York University Tandon School of Engineering
472
Consider the Hamiltonian as
H T a(x,t)B(x,t)u f(x,t)
Following the previous arguments, the following necessary condition holds:
*T B(x*,t)u*  *T B(x*,t)u,  admissible u 
maxn a*(t)u n a*(t)u*, a*(t)n *b (x*,t)
u
ii
iii
k ki i1 i1 k1

Hamiltonian Function and Necessary Conditions
nnn
 i1 i1
 k1
maxa*(t)u  a*(t)u*, a*(t) 
*b (x*,t) k ki
u
ii
iii
 Optimal “bang-bang control”
4/23/2018
@2017 New York University Tandon 473 School of Engineering
u
 i,max
if a*(t)0, i
uu
 i,min
if a*(t)0, i
u
 i , m i n
u* u
i i , m a x
if a*(t)0. i
“switching function”

Practicing Problems for the Final
Problem 1:
Which of the following characterize convex sets?
(1.a)x2 x 1. 12
(1.b)x2 x 1. 12
(1.c)xx x 2,x2x 1. 12312
(1.d)xx x 2,x2x 1. 12312
(1.e)x2 x2 1, x x 1. 1212
(1. f ) The intersection of two convex sets.
4/23/2018 @2017 New York University Tandon 474 School of Engineering

4/23/2018
@2017 New York University Tandon 475 School of Engineering
Practicing Problems for the Final
Problem 2:
MaximizeP3x 2x 2x , subjectto 123
4x 2x 3x 20, 123
2x 2x 4x 6, 123
x  0, x  0, x free. 123

4/23/2018
@2017 New York University Tandon 476
Practicing Problems for the Final
Problem 3:
Minimizef x2 x2 xx , 1212
subject to
2x x 2,
12
x x 4,
12 x 0.
1
School of Engineering

4/23/2018
@2017 New York University Tandon 477 School of Engineering
Practicing Problems for the Final
Problem 4:
Minimizef x x , 12
subject to
(x 1)2 x2 2,
1 2
(x 1)2 x2 2.
12

Practicing Problems for the Final
Problem 5:
ConsiderminJ
under the terminal conditions:
T2 
2 
xe  x1 dt t 
 T 1, x01, x(1)0.5.
0
(5.1) What are the necessary conditions for optimal x* (t ) ? (5.2) Can x* (t) contain corner points?
4/23/2018 @2017 New York University Tandon 478 School of Engineering

Related Posts