
14. The Linear Quadratic Regulator
The linear quadratic regulator is an optimal control strategy, based on using a quadratic cost (which the controller should minimize) and a linear system. It is called a regulator, as the goal is to ‘regulate’ the tracking error to zero. In addition to being computationally tractable, the resulting controller has many desirable properties. We will derive the linear quadratic controller using dynamic programming.
14.1 Introduction to dynamic programming
• Developed in the 1950s by Richard Bellman
• “programming” ∼ “planning” (it has nothing to do with computers)
• a useful concept with many applications in e.g. control, computer science, and economics; examples include path planning and air traffic control.
The key idea is to solve a complex and difficult problem by solving a collection of (easier) sub-problems. We won’t prove any properties of dynamic programming; more information may be found in e.g. D. P. Bertsekas, Dynamic Programming and Optimal Control (3rd edition), Athena Scientific, Belmont, MA, 2005, vol. 1.
Example: Path planning
We will illustrate the dynamic programming algorithm with a path planning problem. Our goal is to obtain the minimum-cost path from S to E along the network below:
[Figure: cost network with nodes S, A, B, C, D, E; edge costs as used in the analysis below.]
The numerical values are the cost (e.g., distance, time, etc.) of the path.
We solve the problem by working backwards from the goal, E. We define our cost function, the distance to the node:
dist(N) := minimum cost to move from S to N

Backward analysis:

dist(E) = min { dist(B) + 2, dist(D) + 1 }
dist(D) = min { dist(B) + 1, dist(C) + 3 }
dist(B) = dist(A) + 6
dist(A) = min { 1, dist(C) + 4 }
dist(C) = 2

Forward computation:

dist(C) = 2
dist(A) = 1
dist(B) = 1 + 6 = 7
dist(D) = min { 7 + 1, 2 + 3 } = 5
dist(E) = min { 7 + 2, 5 + 1 } = 6
This gives us the minimum cost to move from S to E (which is 6) as well as the optimal path (working backwards, E ← D ← C ← S; or put in the forward direction again, S → C → D → E).
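The backward recursion above is easy to mechanize. Below is a minimal Python sketch of it, assuming the edge costs that can be read off the analysis above (the variable and function names are illustrative only):

# Backward dynamic programming on the example network.
# Edge costs assumed from the backward analysis above:
# S->A: 1, S->C: 2, C->A: 4, C->D: 3, A->B: 6, B->D: 1, B->E: 2, D->E: 1.
predecessors = {
    "A": [("S", 1), ("C", 4)],
    "B": [("A", 6)],
    "C": [("S", 2)],
    "D": [("B", 1), ("C", 3)],
    "E": [("B", 2), ("D", 1)],
}

dist = {"S": 0}          # minimum cost to move from S to a node
best_pred = {}           # remembers the optimal predecessor of each node

def cost_to(node):
    """Return dist(node), computed recursively via Bellman's recursion."""
    if node not in dist:
        candidates = [(cost_to(p) + c, p) for p, c in predecessors[node]]
        dist[node], best_pred[node] = min(candidates)
    return dist[node]

print("dist(E) =", cost_to("E"))    # 6

# Reconstruct the optimal path by walking the predecessors backwards.
path, node = ["E"], "E"
while node != "S":
    node = best_pred[node]
    path.append(node)
print(" -> ".join(reversed(path)))  # S -> C -> D -> E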
Bellman’s principle of optimality
This principle allows us to break a hard problem into a series of sub-problems. Specifically, it states: “From any point on an optimal trajectory, the remaining trajectory is optimal for the corresponding problem initiated at that point.”
Example: the fastest driving route from Seattle to San Diego goes through Portland and Los Angeles. Therefore, the fastest driving route from Portland to San Diego must pass through Los Angeles.
From our numerical example above, we saw the shortest path was S → C → D → E. We thus know that the shortest path starting at C has to be C → D → E.
14.2 General optimal control problems
We now generalize this to the case where the choices are no longer discrete: unlike the simple example above, the possible choices are no longer a finite list; however, we can still exploit dynamic programming.
Consider the general discrete-time plant:
x(k+1) = q_k(x(k), u(k))

x(k) ∈ X_k ⊆ R^n (state constraints)
u(k) ∈ U_k ⊆ R^m (input constraints)

where q_k(·,·) may be nonlinear.
We have a cost, or performance index:
J = S(x(N)) + Σ_{k=0}^{N−1} L_k(x(k), u(k))

where S and L_k are real, scalar-valued functions, and N is the final time (the optimization horizon).
Our goal is to obtain the optimal control sequence
{ u^o(0), u^o(1), …, u^o(N−1) }
where we mean “optimal” in the sense of minimizing the cost J.
14.2.1 Dynamic programming for optimal control
We define the input sequence U_k, which contains the inputs remaining from time k, as U_k := { u(k), u(k+1), …, u(N−1) }.
Define also the optimal cost-to-go at time k: the cost at time k, as a function of the current state x(k), when executing the optimal policy from that state, written as J_k^o(x(k)). We now construct a recursion for this cost-to-go:
J_k^o(x(k)) := min_{U_k} { S(x(N)) + Σ_{j=k}^{N−1} L_j(x(j), u(j)) }

 = min_{u(k)} min_{U_{k+1}} { L_k(x(k), u(k)) + S(x(N)) + Σ_{j=k+1}^{N−1} L_j(x(j), u(j)) }

 = min_{u(k)} { L_k(x(k), u(k)) + min_{U_{k+1}} [ S(x(N)) + Σ_{j=k+1}^{N−1} L_j(x(j), u(j)) ] }

 = min_{u(k)} { L_k(x(k), u(k)) + J_{k+1}^o(x(k+1)) }    (14.1)

From the second to the third line: L_k(x(k), u(k)) does not depend on U_{k+1}.
The cost-to-go function has to satisfy the boundary condition J_N^o(x(N)) = S(x(N)). We can further substitute x(k+1) = q_k(x(k), u(k)) to obtain the more explicit
J_k^o(x(k)) = min_{u(k)} { L_k(x(k), u(k)) + J_{k+1}^o( q_k(x(k), u(k)) ) }    (14.2)
Thus the original optimal control problem can now be solved by solving a sequence of problems J_{N−1}^o, J_{N−2}^o, …, J_1^o, J_0^o, that is, starting at the end and working backwards to the beginning.
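As a concrete illustration of how recursion (14.2) is used in practice, here is a short Python sketch that computes J_k^o on a discretized grid; the scalar dynamics, costs, and grid sizes below are placeholder choices for illustration, not part of the lecture material:

import numpy as np

# Illustrative scalar problem: q_k(x, u) = x + u, stage cost L_k = x^2 + u^2,
# terminal cost S(x) = x^2, over a coarse grid. All numbers are placeholders.
X = np.linspace(-2.0, 2.0, 41)      # discretized state set
U = np.linspace(-1.0, 1.0, 21)      # discretized input set
N = 10                               # horizon

def q(x, u):
    return x + u

def L(x, u):
    return x ** 2 + u ** 2

def S(x):
    return x ** 2

def snap(x):
    """Map a successor state to the nearest grid point (a crude approximation)."""
    return int(np.argmin(np.abs(X - x)))

J = np.array([S(x) for x in X])      # J_N^o on the grid (boundary condition)
policy = []                          # optimal input index for each (k, state)

for k in reversed(range(N)):         # backward pass: k = N-1, ..., 0
    J_new = np.empty_like(J)
    u_opt = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        costs = [L(x, u) + J[snap(q(x, u))] for u in U]   # recursion (14.2)
        u_opt[i] = int(np.argmin(costs))
        J_new[i] = costs[u_opt[i]]
    J = J_new
    policy.append(u_opt)

policy.reverse()                     # policy[k][i]: optimal input index at time k, state X[i]
print("J_0^o(0) ~", J[snap(0.0)])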
14.2.2 Discrete-time, finite-horizon Linear Quadratic control
We will first consider linear, time-varying systems of the form
x(k+1) = A(k)x(k) + B(k)u(k),   x(0) = x_0    (14.3)
Note that, without loss of generality, we assume that we always start the problem at time k = 0.
Although most practical systems are nonlinear, and thus do not fit the above model, they can often be well approximated by such a linear model.
We define a quadratic cost function (performance index), J, as follows:
J = ½ x(N)^T S x(N) + ½ Σ_{k=0}^{N−1} [ x(k)^T Q(k) x(k) + u(k)^T R(k) u(k) ]

Q(k) = Q(k)^T ≥ 0,   S = S^T ≥ 0,   R(k) = R(k)^T > 0

This is often called “LQ” control: Linear plant with Quadratic cost. Related is the control strategy “LQR”: Linear Quadratic Regulator. The optimal cost-to-go is then

J_k^o(x(k)) = min_{u(k)} { ½ x(k)^T Q(k) x(k) + ½ u(k)^T R(k) u(k) + J_{k+1}^o(x(k+1)) }

with boundary condition J_N^o(x(N)) = ½ x(N)^T S x(N).
Time step
We look at the recursion, assuming that we have a solution at step k and computing from it the solution at step k − 1 (moving backwards in time). Specifically, we assume that we know that, at time k, the optimal cost-to-go is quadratic in x(k):
J_k^o(x(k)) = ½ x(k)^T U(k) x(k)   for some matrix U(k) = U(k)^T > 0
We want to compute the optimal cost-to-go at time k − 1, J_{k−1}^o(x(k−1)). By the derived recursion (14.1):

J_{k−1}^o(x(k−1)) = min_{u(k−1)} { L_{k−1}(x(k−1), u(k−1)) + J_k^o(x(k)) }

 = min_{u(k−1)} { ½ ( x(k−1)^T Q(k−1) x(k−1) + u(k−1)^T R(k−1) u(k−1) ) + ½ x(k)^T U(k) x(k) }

From the system dynamics (14.3), we have
x(k) = A(k−1)x(k−1) + B(k−1)u(k−1)
which we can substitute, to get
J_{k−1}^o(x(k−1)) = ½ min_{u(k−1)} { x(k−1)^T Q(k−1) x(k−1) + u(k−1)^T R(k−1) u(k−1)
      + ( A(k−1)x(k−1) + B(k−1)u(k−1) )^T U(k) ( A(k−1)x(k−1) + B(k−1)u(k−1) ) }

 = min_{u(k−1)} { ½ u(k−1)^T M(k−1) u(k−1) + p(k−1) u(k−1) + ½ q(k−1) }

where

q(k−1) = x(k−1)^T ( Q(k−1) + A(k−1)^T U(k) A(k−1) ) x(k−1)
p(k−1) = x(k−1)^T A(k−1)^T U(k) B(k−1)
M(k−1) = R(k−1) + B(k−1)^T U(k) B(k−1)
We can solve for the optimal control u^o(k−1) by setting ∂J_{k−1}/∂u(k−1) = 0:
u^o(k−1) = −( R(k−1) + B(k−1)^T U(k) B(k−1) )^{−1} B(k−1)^T U(k) A(k−1) x(k−1)
The optimal strategy is a proportional feedback controller! We introduce∗ the gain matrix
F(k−1), so that
F(k−1) := ( R(k−1) + B(k−1)^T U(k) B(k−1) )^{−1} B(k−1)^T U(k) A(k−1)

u^o(k−1) = −F(k−1) x(k−1).
If we substitute this optimal strategy into the cost function J_{k−1}^o, we immediately see that this is quadratic in x(k−1). In other words, we have so far shown that, if the optimal cost-to-go is quadratic at time k:
• the optimal control input at time k − 1 is u^o(k−1) = −F(k−1)x(k−1), a linear feedback controller.
• the optimal cost-to-go at time k − 1 is quadratic in the state at time k − 1.
To complete the inductive proof, we simply need to show that the optimal cost-to-go at the final time step is quadratic in the state. At time N, the optimal cost-to-go is

J_N^o(x(N)) = ½ x(N)^T S x(N) := ½ x(N)^T U(N) x(N)

where U(N) = S. This completes the proof by induction, and yields the linear quadratic regulator.
∗Note the sign convention in the definition of F: being controls engineers, we feel more comfortable with negative feedback.
14.2.3 Implementation
Initialize the Riccati equation with U(N) = S. Iterate backwards to compute U(k):

U(k−1) := Q(k−1) + A(k−1)^T U(k) A(k−1)
          − A(k−1)^T U(k) B(k−1) ( R(k−1) + B(k−1)^T U(k) B(k−1) )^{−1} B(k−1)^T U(k) A(k−1)

Compute the feedback gain F(k−1) using U(k):

F(k−1) := ( R(k−1) + B(k−1)^T U(k) B(k−1) )^{−1} B(k−1)^T U(k) A(k−1)

Finally, implement the controller:

u^o(k−1) = −F(k−1) x(k−1).

We note that u^o(k) depends on
• the state vector x(k)
• system matrices A(k) and B(k) and the cost matrix R(k)
• U(k+1), which depends on Q(k+1), R(k+1), A(k+1), B(k+1), and U(k+2), …
Iterating gives: u(0) depends on { A(k), B(k), R(k), Q(k+1) }_{k=0}^{N−1} (i.e., it depends on the present and the future of the system).
Note that U(k) does not depend on the actual values that the state x(k) takes, and can thus be computed offline.
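A compact Python sketch of this offline backward pass, assuming time-invariant A, B, Q, R for brevity (the function name and example matrices are illustrative, not taken from the lecture):

import numpy as np

def lqr_finite_horizon(A, B, Q, R, S, N):
    """Backward Riccati recursion: returns the gains F(0..N-1) and cost matrices U(0..N)."""
    U = [None] * (N + 1)
    F = [None] * N
    U[N] = S                                  # boundary condition U(N) = S
    for k in range(N, 0, -1):                 # k = N, N-1, ..., 1
        M = R + B.T @ U[k] @ B
        F[k - 1] = np.linalg.solve(M, B.T @ U[k] @ A)                 # F(k-1)
        U[k - 1] = Q + A.T @ U[k] @ A - A.T @ U[k] @ B @ F[k - 1]     # Riccati step
    return F, U

# Illustrative usage with placeholder matrices:
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2); R = np.array([[1.0]]); S = np.eye(2)
F, U = lqr_finite_horizon(A, B, Q, R, S, N=50)
x = np.array([[1.0], [0.0]])
u0 = -F[0] @ x                                # u^o(0) = -F(0) x(0)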
14.3 Infinite horizon LQ control
Next, we will investigate what happens as the horizon N → ∞. There will be many similarities between this and the steady-state Kalman filter.
See the script.
14.3.1 Example: double integrator
Plant dynamics:
[ x1(k+1) ]   [ 1  1 ] [ x1(k) ]   [  1  ]
[ x2(k+1) ] = [ 0  1 ] [ x2(k) ] + [ 1/2 ] u(k),        x(k+1) = Ax(k) + Bu(k)
Performance index:
J_N(x_0) = ½ x(N)^T S x(N) + ½ Σ_{k=0}^{N−1} ( x(k)^T Q x(k) + R u²(k) )

where

Q = [ 2  0 ]
    [ 0  0 ],    R = 10

Note that the matrices A, B, Q, and R are time invariant.

We examine the convergence of U(k) under the Riccati recursion

U(k) = A^T U(k+1) A + Q − A^T U(k+1) B ( R + B^T U(k+1) B )^{−1} B^T U(k+1) A

for different choices of the terminal cost S = U(N) ≥ 0:
• N = 30, U(N) = [ 1  0 ; 0  1 ]:
[Figure: Riccati solution entries P_{1,1}(k), P_{1,2}(k), P_{2,2}(k) over time steps k = 0, …, 30.]
• N = 30, U(N) = [ 20  0 ; 0  0 ]:
[Figure: Riccati solution entries P_{1,1}(k), P_{1,2}(k), P_{2,2}(k) over time steps k = 0, …, 30, for this second terminal condition.]
Note:
• regardless of the boundary condition U(N) ≥ 0, the solution of the Riccati equation converges to the same steady state U∞,
• the control law u^o(k) = −( R + B^T U(k+1) B )^{−1} B^T U(k+1) A x(k) thus converges (backwards) to

u^o(k) = −( R + B^T U∞ B )^{−1} B^T U∞ A x(k) =: −F∞ x(k).
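This experiment is easy to reproduce numerically. A short sketch using the example data above (the iteration count matches the plots; everything else follows directly from the Riccati recursion):

import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0], [0.5]])
Q = np.array([[2.0, 0.0], [0.0, 0.0]])
R = np.array([[10.0]])
N = 30

def riccati_backward(U_N):
    """Iterate U(k) backwards from U(N) and return the whole sequence."""
    U = U_N
    history = [U]
    for _ in range(N):
        U = A.T @ U @ A + Q - A.T @ U @ B @ np.linalg.solve(R + B.T @ U @ B, B.T @ U @ A)
        history.append(U)
    return history

# Two different terminal conditions, as in the plots above:
hist1 = riccati_backward(np.eye(2))
hist2 = riccati_backward(np.diag([20.0, 0.0]))
print(hist1[-1])   # both runs end at (approximately) the same steady state U_infinity
print(hist2[-1])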
Convergence of the Riccati equation solution
We state without proof: For N → ∞, if (A, B) is stabilizable then U(k) is bounded.
Example 14.1: Some intuition behind this statement: if (A,B) is not stabilizable, then there are unstable, uncontrollable modes that may cause

lim_{N→∞} J_N(x_0) = lim_{N→∞} [ ½ x(N)^T S x(N) + ½ Σ_{k=0}^{N−1} ( x(k)^T Q x(k) + u(k)^T R(k) u(k) ) ] = ∞

yielding J_N^o(x_0) = ½ x_0^T U(0) x_0 = ∞  ⇒  lim_{N→∞} ||U(0)|| = ∞.

Consider for example x(k+1) = 2x(k) + 0·u(k); Q = 1; and R = 1. The uncontrollable mode is unstable. Σ_{k=0}^{∞} x(k)^T Q x(k) blows up regardless of u(k), and the Riccati equation becomes

U(k) = 4U(k+1) + 1

The backward iteration drives U(0) to infinity. Note that the forward difference equation is U(k+1) = ¼ U(k) − ¼, and is thus stable.
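A quick numerical confirmation of this divergence (the starting value and iteration count are arbitrary):

# Scalar unstabilizable example: A = 2, B = 0, Q = 1, R = 1, so U(k) = 4 U(k+1) + 1.
U = 0.0
for k in range(20):
    U = 4.0 * U + 1.0
print(U)   # grows without bound as the horizon increases (~ 4^20 / 3)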
We’ll need some background before we investigate this statement further:
Lyapunov stability
Loosely speaking, a system is stable in the sense of Lyapunov if we can find a Lyapunov function V(x(k)) which has the properties:
• V(x(k)) ≥ 0 for all x(k) ≠ 0,
• V(0) = 0, and
• V(x(k+1)) < V(x(k)), i.e. the function decreases over time.

For linear systems, a typical choice of Lyapunov function is quadratic; i.e. for some matrix U = U^T > 0:
V(x(k)) := x(k)^T U x(k)    (14.4)
Theorem 14.1. For a system x(k+1) = Ax(k), and given a Q = Q^T > 0, if the following Lyapunov equation has a unique solution U = U^T > 0, then A is asymptotically stable:

A^T U A − U = −Q
Proof. Consider the following Lyapunov function: V(x) := x^T U x, which satisfies V(x) > 0 for x ≠ 0 and V(0) = 0 since U > 0.

We now look at the evolution of V along a trajectory x(k+1) = Ax(k):

V(x(k+1)) − V(x(k)) = x(k+1)^T U x(k+1) − x(k)^T U x(k)
 = x(k)^T A^T U A x(k) − x(k)^T U x(k)
 = x(k)^T ( A^T U A − U ) x(k)
 = −x(k)^T Q x(k) < 0   for x(k) ≠ 0, because Q > 0.

V is therefore positive definite and strictly decreasing along every nonzero trajectory, so x(k) → 0 and A is asymptotically stable.
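A quick numerical illustration of Theorem 14.1 using SciPy's discrete Lyapunov solver; the stable A and the choice Q = I below are placeholders:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Placeholder stable system and positive-definite Q.
A = np.array([[0.5, 0.2], [0.0, 0.8]])
Q = np.eye(2)

# solve_discrete_lyapunov(a, q) solves a X a^T - X + q = 0; passing a = A.T
# gives A^T U A - U = -Q, the Lyapunov equation of Theorem 14.1.
U = solve_discrete_lyapunov(A.T, Q)

print(np.allclose(A.T @ U @ A - U, -Q))        # True: U solves the Lyapunov equation
print(np.all(np.linalg.eigvalsh(U) > 0))       # True: U > 0, consistent with A being stable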
We will require the following extension of Theorem 14.1:
Theorem 14.2 (An extension of Lyapunov theory based on observability). If a Lyapunov equation A^T U A − U = −Q has a solution U > 0 for Q = C^T C ≥ 0, and (A,C) is observable, then the system x(k+1) = Ax(k) is asymptotically stable.
Proof. (By contradiction) Assume (A,C) is observable and there exist U, Q satisfying A^T U A − U = −Q, and that A has at least one eigenvalue on the unit circle, i.e. is not asymptotically stable.

Since U > 0 and Q ≥ 0, the system is stable in the sense of Lyapunov (note, not necessarily asymptotically stable). All eigenvalues of A are hence on or inside the unit circle (at least one on the unit circle exists by assumption). Pick any eigenvalue-eigenvector pair (λ, v) where λ is on the unit circle. Then

||Cv||² = v*Qv = −v*( A^T U A − U )v = −( |λ|² − 1 ) v*Uv = 0,

which implies Cv = 0. We thus have
Av = λv,   Cv = 0,   CAv = λCv = 0,   · · · ,   CA^{n−1}v = 0

so that

      [ C         ]
  ⇒   [ CA        ] v = 0
      [ ⋮         ]
      [ CA^{n−1}  ]
Since the observability matrix is not full rank, (A, C) is not observable, but this contradicts our assumption.
Observability Gramian
Consider the discrete-time system

x(k+1) = Ax(k)
y(k) = Cx(k)
We define the observability Gramian W as follows:

W := Σ_{k=0}^{∞} ( A^T )^k C^T C A^k    (14.5)
The Gramian is related to the observability of a system: specifically, if (A, C) is observable, then the sum converges and W exists (we won't prove this). The Gramian allows us to compare the relative observability of two systems (recall, the rank tests only gave binary answers).
Closely related is the controllability Gramian, which is a measure of the cost to control a system to/from a certain state.
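In practice the observability Gramian need not be computed by summing the series: for a stable A, the W defined in (14.5) satisfies the Lyapunov-type equation A^T W A − W = −C^T C, so it can be obtained with the same solver as above. A small sketch with placeholder A and C:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Placeholder stable, observable system.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
C = np.array([[1.0, 0.0]])

# W = sum_k (A^T)^k C^T C A^k satisfies A^T W A - W + C^T C = 0,
# a discrete Lyapunov equation in A^T.
W = solve_discrete_lyapunov(A.T, C.T @ C)

# Cross-check against a truncated version of the defining sum (14.5).
W_sum = sum(np.linalg.matrix_power(A.T, k) @ C.T @ C @ np.linalg.matrix_power(A, k)
            for k in range(200))
print(np.allclose(W, W_sum))   # True (up to truncation error)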
Proofs of some properties as N → ∞

• system dynamics (time invariant):
x(k+1) = Ax(k) + Bu(k),   x(0) = x_0
• performance index:
J = ½ Σ_{k=0}^{∞} ( x(k)^T Q x(k) + u(k)^T R u(k) ),   Q ≥ 0, R > 0    (14.6)

Note that there is no terminal cost x(N)^T S x(N) – the relative influence of the terminal cost decreases with increasing horizon.
The optimal state-feedback control law is

u^o(k) = −( R + B^T U∞ B )^{−1} B^T U∞ A x(k) =: −F∞ x(k)
which requires the solution of the Discrete Algebraic Riccati equation†:
U∞ = A^T U∞ A + Q − A^T U∞ B ( R + B^T U∞ B )^{−1} B^T U∞ A    (14.7)

†e.g. using scipy.linalg.solve_discrete_are.

We will now look for conditions under which (14.7) gives a meaningful result. Introduce the matrix C so that Q = C^T C, which is always possible for Q = Q^T ≥ 0.
Then, if (A,B) is stabilizable and (A,C) is detectable, we have guaranteed closed-loop asymptotic stability for x(k+1) = ( A − B F∞ ) x(k).
We will prove two parts of this, given observability of (A,C): (part one) this implies U∞ > 0; (part two) the resulting closed-loop system is stable.
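Before the proof, a numerical sketch of how U∞ and F∞ are obtained in practice, using the DARE solver mentioned in the footnote and the double-integrator data from Section 14.3.1:

import numpy as np
from scipy.linalg import solve_discrete_are

# Double-integrator example from Section 14.3.1.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0], [0.5]])
Q = np.array([[2.0, 0.0], [0.0, 0.0]])
R = np.array([[10.0]])

U_inf = solve_discrete_are(A, B, Q, R)                        # stabilizing solution of (14.7)
F_inf = np.linalg.solve(R + B.T @ U_inf @ B, B.T @ U_inf @ A)

A_cl = A - B @ F_inf
print(np.abs(np.linalg.eigvals(A_cl)))   # all magnitudes < 1: closed loop is asymptotically stable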
Part one
Here, we will show only that “(A,C) observability ⇒ U∞ > 0”, with u^o(k) = −F∞ x(k) and Q = C^T C. We define A_cl := ( A − B F∞ ), and note that

x(k) = A_cl x(k−1) = A_cl² x(k−2) = · · · = A_cl^k x(0),   u(k) = −F∞ x(k)
We define the following pseudo-output for the system under closed-loop control:
ỹ(k) = [ C ; R^{1/2} F∞ ] x(k)

where ( R^{1/2} )^T R^{1/2} = R.
By assumption, the pair (A,C) is observable, which means that the system with output ỹ is also observable (we've simply added more outputs). Since closed-loop feedback does not affect observability, the closed-loop system is thus also observable, and therefore the corresponding observability Gramian as defined by (14.5) exists:
W_cl := Σ_{k=0}^{∞} ( A_cl^T )^k [ C ; R^{1/2} F∞ ]^T [ C ; R^{1/2} F∞ ] A_cl^k

and W_cl > 0.

We now compute the cost-to-go at time zero:

J_∞^o(x(0)) = ½ x(0)^T U∞ x(0)
 = ½ Σ_{k=0}^{∞} ( x(k)^T Q x(k) + u(k)^T R u(k) )
 = ½ Σ_{k=0}^{∞} x(k)^T [ C ; R^{1/2} F∞ ]^T [ C ; R^{1/2} F∞ ] x(k)
 = ½ Σ_{k=0}^{∞} x_0^T ( A_cl^k )^T [ C ; R^{1/2} F∞ ]^T [ C ; R^{1/2} F∞ ] A_cl^k x_0
 = ½ x_0^T W_cl x_0,

i.e. U∞ = W_cl > 0.

An analogous analysis can be applied to the detectability case.

Part two

Now we show the closed-loop stability of the system below, where we've introduced the output ỹ:
x(k+1) = ( A − B F∞ ) x(k) = A_cl x(k)
ỹ(k) = [ C ; R^{1/2} F∞ ] x(k)    (14.8)
We use Theorem 14.2 by transforming the Riccati equation into a Lyapunov equation. Recalling F∞ = ( R + B^T U∞ B )^{−1} B^T U∞ A, and noting that A^T U∞ B F∞ is symmetric,

U∞ = A^T U∞ A + Q − A^T U∞ B ( R + B^T U∞ B )^{−1} B^T U∞ A
   = A^T U∞ A + Q − A^T U∞ B F∞
   = ( A − B F∞ )^T U∞ ( A − B F∞ ) + 2 A^T U∞ B F∞ − F∞^T B^T U∞ B F∞ + Q − A^T U∞ B F∞
   = A_cl^T U∞ A_cl + A^T U∞ B F∞ − F∞^T B^T U∞ B F∞ + Q
   = A_cl^T U∞ A_cl + F∞^T ( R + B^T U∞ B ) F∞ − F∞^T B^T U∞ B F∞ + Q      (using A^T U∞ B = F∞^T ( R + B^T U∞ B ))
   = A_cl^T U∞ A_cl + F∞^T R F∞ + Q    (14.9)

from which it follows that

A_cl^T U∞ A_cl − U∞ = −( Q + F∞^T R F∞ ) = −[ C ; R^{1/2} F∞ ]^T [ C ; R^{1/2} F∞ ],   where [ C ; R^{1/2} F∞ ]^T [ C ; R^{1/2} F∞ ] ≥ 0.    (14.10)
Observability of (14.8) plus U∞ > 0 ⇒ closed-loop stability, from (14.10) and Theorem 14.2.

A counter-example for the detectability requirement
Consider
x(k+1) = 2x(k) + u(k),   Q = 0,   R = 1

• the state cost Q is zero, so the optimal control is u^o(k) = 0,
• the closed-loop system is thus unstable, with eigenvalue at 2,
• (A, C) is not detectable, since C = 0 and A is unstable,
• the Riccati equation, however, still converges to U∞ = 3, as (A, B) is controllable.
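A one-loop numerical check that the backward Riccati iteration indeed converges to U∞ = 3 here (the starting value stands in for an arbitrary terminal cost S > 0):

# Scalar counter-example: x(k+1) = 2 x(k) + u(k), Q = 0, R = 1.
A, B, Q, R = 2.0, 1.0, 0.0, 1.0
U = 1.0                              # arbitrary terminal cost S > 0
for _ in range(50):
    U = A * U * A + Q - (A * U * B) ** 2 / (R + B * U * B)   # scalar Riccati step
print(U)                             # -> 3.0, even though (A, C) with C = 0 is not detectable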
Discussion
• Guaranteed closed-loop stability (just proven)
• Good margins for single-input systems (stated without proof):

Phase margin > 2 sin^{−1}( ½ √( R / ( R + B^T U∞ B ) ) ),   Gain margin > 1 / ( 1 − √( R / ( R + B^T U∞ B ) ) )

LQR is an alternative to pole placement for designing a static feedback gain F. Why choose LQR, rather than pole placement?

• it is often easier to reason about the “cost” of tracking errors for the various states than to choose “good” closed-loop poles,
• the quadratic cost function is unlikely to be exactly what we care about; however, it is something we can solve for, and it captures the idea that “large” control values or state deviations are “worse” than “small” ones.
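For concreteness, these bounds can be evaluated for the double-integrator example of Section 14.3.1; a short sketch (using the margin expressions as stated above):

import numpy as np
from scipy.linalg import solve_discrete_are

# Double-integrator example data from Section 14.3.1.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0], [0.5]])
Q = np.array([[2.0, 0.0], [0.0, 0.0]])
R = np.array([[10.0]])

U_inf = solve_discrete_are(A, B, Q, R)
beta = np.sqrt(R.item() / (R + B.T @ U_inf @ B).item())

print("phase margin >", np.degrees(2.0 * np.arcsin(beta / 2.0)), "deg")
print("gain  margin >", 1.0 / (1.0 - beta))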