15. Linear Quadratic Gaussian Control
We have looked at how to design a feedback controller for a deterministic system, where we used dynamic programming to derive the LQR controller for linear systems. Earlier in the class we looked at how to estimate the state of stochastic systems, for example using the Kalman filter to track the PDF of a linear system corrupted by Gaussian noise. In this chapter, we investigate feedback control of stochastic systems, where we will restrict ourselves to linear systems corrupted by Gaussian noise. The result will be Linear Quadratic Gaussian (LQG) control.
This derivation follows from D. P. Bertsekas, Dynamic programming and optimal control (3rd edition). Athena Scientific Belmont, MA, 2005, vol. 1, Chapters 4 & 5.
15.1 Stochastic control with perfect state knowledge
We begin by looking at the stochastic optimal control problem where the randomness only enters in the dynamics, but we have perfect measurements of the full system state:
x(k+1) = q_k(x(k), u(k), v(k))
z(k) = x(k)
with x(0) = x_0 known, E[v(k)] = 0, {v(·)} independent and with known PDFs, and with the cost
J = E[ S(x(N)) + sum_{k=0}^{N−1} L_k(x(k), u(k)) ]
where v(k) is the process noise, q_k is the dynamics function, S(·) captures the terminal cost, and L_k(·, ·) is the stage cost function. (There is no measurement noise here, since we measure the state x(k) directly.)
Compared to our earlier optimal control discussion, the difference is of course the process noise v(k), which means that the future states cannot be perfectly predicted (even if we knew the inputs u(k) in advance). That is why we include an expectation in the cost value – this makes it a deterministic quantity, even if the states x(k) are random.
We could attempt to solve this as a large batch-optimization problem, but this would be extremely difficult. Instead, we will again apply dynamic programming, and we again use the optimal cost-to-go function J_k^o(·), which is defined below:
J_k^o(x(k)) := min_{U_k} E[ S(x(N)) + sum_{j=k}^{N−1} L_j(x(j), u(j)) ]    (15.1)
with U_k = {u(k), u(k+1), . . . , u(N−1)} the inputs remaining from time k onwards. Note that x(k) is the argument, and the expectation is thus over the randomness from the process noise v(k), v(k+1), . . ..
With dynamic programming, we start at the end and work backwards. The boundary condition here is J_N^o(x(N)) := S(x(N)). The iteration follows as below, where we assume we have the optimal cost-to-go at time k + 1:
J_k^o(x(k)) := min_{U_k} E[ S(x(N)) + sum_{j=k}^{N−1} L_j(x(j), u(j)) ]    (15.2)

= min_{u(k)} min_{U_{k+1}} E[ L_k(x(k), u(k)) + S(x(N)) + sum_{j=k+1}^{N−1} L_j(x(j), u(j)) ]    (15.3)

= min_{u(k)} { L_k(x(k), u(k)) + min_{U_{k+1}} E[ S(x(N)) + sum_{j=k+1}^{N−1} L_j(x(j), u(j)) ] }    (15.4)

= min_{u(k)} { L_k(x(k), u(k)) + E[ J_{k+1}^o( q_k(x(k), u(k), v(k)) ) ] }    (15.5)
wherein:
• from (15.2) to (15.3) is just a re-ordering,
• from (15.3) to (15.4) exploits the fact that x(k) is not random (it’s the argument), neither is u(k) (it’s a decision variable), so they can be pulled out of the expectation & minimization over Uk+1, and
• from (15.4) to (15.5) uses the cost-to-go from k + 1 (which we assumed is known).
This is, of course, great – it’s exactly what we had for the deterministic case. Note, though, that there are some subtleties which we won’t go into detail on (see Bertsekas):
• This is technically a functional minimization, in that we’re minimizing over policies for u(k) (i.e., functions of the state x(k)) rather than over specific values. Intuitively – even if we know x(0) perfectly, we cannot state all future optimal decisions, because they will depend on the actual noise values v(k) encountered. Thus, our result here is a feedback rule, rather than a feedback value.
• Is it legal to swap the expectation and minimization?
As in the deterministic case, we note that we are recursively solving for the functions J_k^o, and that they implicitly contain the optimal feedback rule at each time k. In general, it is impossible to solve for these functions in closed form.
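For small problems, however, the recursion can be carried out numerically. The following is a minimal sketch of (15.5) in Python; all dynamics, costs, grids, and noise values below are illustrative assumptions (not from the text). It restricts x and u to finite grids and approximates the expectation over v(k) by sampling:

```python
# A minimal sketch of the backward recursion (15.5) for a scalar example.
import numpy as np

np.random.seed(0)
N = 10                                       # horizon
xs = np.linspace(-5.0, 5.0, 101)             # state grid
us = np.linspace(-2.0, 2.0, 41)              # input grid
v_samples = np.random.normal(0.0, 0.3, 200)  # samples of the process noise

def q(x, u, v):                              # dynamics x(k+1) = q(x(k), u(k), v(k))
    return 0.9 * x + u + v

def L(x, u):                                 # stage cost
    return x**2 + 0.1 * u**2

J = 0.5 * xs**2                              # boundary condition J_N(x) = S(x)
policy = np.zeros((N, xs.size))
for k in reversed(range(N)):
    J_new = np.empty_like(J)
    for i, x in enumerate(xs):
        # E[J_{k+1}(q(x, u, v))] for each candidate u, via interpolation
        exp_J = np.array([
            np.mean(np.interp(q(x, u, v_samples), xs, J)) for u in us
        ])
        total = L(x, us) + exp_J
        j_best = np.argmin(total)
        J_new[i] = total[j_best]
        policy[k, i] = us[j_best]            # optimal feedback *rule* u = mu_k(x)
    J = J_new
```

The array policy[k, :] is precisely the feedback rule u = µ_k(x) discussed above; the cost of this brute-force approach grows rapidly with the state dimension, which is why the linear quadratic special case below is so valuable.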
15.1.1 Linear quadratic problems
In the special case that the dynamics are linear and the cost is quadratic, we can however solve the recursion exactly. In the following, we will immediately restrict ourselves to time-invariant systems, as it makes the notation simpler. The extension to time-varying systems is straightforward.
x(k+1) = Ax(k) + Bu(k) + v(k)
z(k) = x(k)
with E[v(k)] = 0, {v(·)} independent, and Var[v(k)] = V. Instead, we optimize the expected quadratic cost:
J = E[ (1/2) x(N)^T S x(N) + (1/2) sum_{j=0}^{N−1} ( x(j)^T Q x(j) + u(j)^T R u(j) ) ]    (15.6)

with S = S^T ≥ 0, Q = Q^T ≥ 0, R = R^T > 0.
We will show (using induction) that the optimal cost-to-go function has the following form:
J_k^o(x(k)) = (1/2) x(k)^T U(k) x(k) + b(k)    (15.7)

with U(k) = U(k)^T ≥ 0, and b(k) a scalar.
Recursion
We first assume that at time k + 1 we have

J_{k+1}^o(x(k+1)) = (1/2) x(k+1)^T U(k+1) x(k+1) + b(k+1)    (15.8)

We can immediately apply the general solution (15.5) to get

J_k^o(x(k)) = min_{u(k)} { L_k(x(k), u(k)) + E[ J_{k+1}^o( q_k(x(k), u(k), v(k)) ) ] }    (15.9)

= min_{u(k)} { (1/2) ( x(k)^T Q x(k) + u(k)^T R u(k) )
    + E[ (1/2) (Ax(k) + Bu(k) + v(k))^T U(k+1) (Ax(k) + Bu(k) + v(k)) + b(k+1) ] }    (15.10)

= min_{u(k)} { (1/2) ( x(k)^T Q x(k) + u(k)^T R u(k) )
    + (1/2) E[ (Ax(k) + Bu(k) + v(k))^T U(k+1) (Ax(k) + Bu(k) + v(k)) ] } + b(k+1)    (15.11)

This can be reduced, noting that it will have terms that are:
• quadratic in x(k) and u(k), neither of which is random,
• linear in v(k); all of these terms evaluate to zero in expectation (since v(k) is the only random quantity, and E[v(k)] = 0), and
• quadratic in v(k).
Doing the algebra yields:
J_k^o(x(k)) = (1/2) min_{u(k)} { x(k)^T ( Q + A^T U(k+1) A ) x(k)
    + u(k)^T ( R + B^T U(k+1) B ) u(k)
    + 2 x(k)^T A^T U(k+1) B u(k)
    + E[ v(k)^T U(k+1) v(k) ] } + b(k+1)    (15.12)

The last term can be expressed in terms of Var[v(k)]:

E[ v(k)^T U(k+1) v(k) ] = E[ trace( v(k)^T U(k+1) v(k) ) ]    (15.13)
= E[ trace( U(k+1) v(k) v(k)^T ) ]    (15.14)
= trace( E[ U(k+1) v(k) v(k)^T ] )    (15.15)
= trace( U(k+1) E[ v(k) v(k)^T ] )    (15.16)
= trace( U(k+1) V )    (15.17)

We note that (15.12) is then a minimization over a quadratic function of u(k), which we can solve by taking the derivative and setting it to zero. This yields:
u^o(k) = − ( R + B^T U(k+1) B )^{−1} B^T U(k+1) A x(k)    (15.18)

This is the same equation we had for deterministic LQR! The input is again a linear function of the state x(k).
Substituting this u^o(k) into the cost-to-go function (15.12), we see that J_k^o also has the form

J_k^o(x(k)) = (1/2) x(k)^T U(k) x(k) + b(k)
where

U(k) = A^T U(k+1) A − A^T U(k+1) B ( R + B^T U(k+1) B )^{−1} B^T U(k+1) A + Q    (15.19)
b(k) = (1/2) trace( U(k+1) V ) + b(k+1)    (15.20)
We note that the basis step for the induction is

J_N^o(x(N)) = (1/2) x(N)^T S x(N)

that is, U(N) = S and b(N) = 0,
thereby completing the proof.
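As a sanity check, the recursion (15.18)–(15.20) is straightforward to implement; below is a short numpy sketch, where the matrices A, B, Q, R, S, V are illustrative stand-ins rather than anything from the text:

```python
# A sketch of the backward recursion (15.18)-(15.20) for stochastic LQR.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2); R = np.array([[1.0]]); S = np.eye(2)
V = 0.01 * np.eye(2)                     # Var[v(k)]
N = 50

U = S                                    # U(N) = S
b = 0.0                                  # b(N) = 0
gains = []
for k in reversed(range(N)):
    F = np.linalg.solve(R + B.T @ U @ B, B.T @ U @ A)   # (15.18): u = -F x
    b = 0.5 * np.trace(U @ V) + b                       # (15.20)
    U = A.T @ U @ A - A.T @ U @ B @ F + Q               # (15.19)
    gains.append(F)
gains.reverse()                          # gains[k] is the gain F(k)

# Optimal cost from x0, cf. (15.21): deterministic part plus noise penalty b(0)
x0 = np.array([[1.0], [0.0]])
J_opt = float(0.5 * x0.T @ U @ x0 + b)
```

Note that V never enters the gain computation: the process noise changes the achievable cost through b(k), but not the optimal feedback law.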
Discussion
We note:
• the optimal control law is a proportional feedback controller,
• the Riccati equation is the same as in the regular LQ problem, with the additional cost b(k),

• this additional cost is due to the process noise v(k):

b(N) = 0,    b(k) = (1/2) trace( U(k+1) V ) + b(k+1) = (1/2) sum_{j=k+1}^{N} trace( U(j) V )

If v(k) = 0 identically, then there is no additional cost (as a sanity check: we recover the deterministic result).

• the final optimal cost is as below, with x(0) = x_0:

J^o(x(0)) = E[ (1/2) x(0)^T U(0) x(0) + b(0) ]
= (1/2) x_0^T U(0) x_0 + (1/2) sum_{j=1}^{N} trace( U(j) V )    (15.21)
We note that the first term is the familiar deterministic LQ cost, with the additional term capturing the added cost due to uncertainty in the system. Since V ≥ 0 and U(j) ≥ 0, we see that the process noise increases the cost of control.
15.2 Stochastic control with imperfect state knowledge
We will now consider the case where we cannot measure the full state without noise. We consider the general noisy output system
x(k+1) = q_k(x(k), u(k), v(k))
z(k) = h_k(x(k), w(k))
with the cost
J = E[ S(x(N)) + sum_{k=0}^{N−1} L_k(x(k), u(k)) ]
where v(k) is process noise, w(k) is measurement noise, and the initial condition x(0) is also random, with our usual independence assumptions. In practice, of course, our systems look like this:
• the dimension of the measurements is usually lower than the dimension of the state (we cannot measure all states),
• the sensors are noisy.
15.2.1 Transformation to a system with perfect state knowledge
We will first see one way in which we can rewrite this as a problem for a system with perfect state knowledge, so that we can apply our knowledge from Sec. 15.1. We will gloss over some details; see Bertsekas for a complete discussion.
We introduce an information vector i(k), which captures everything that is available to us at time k, and which has the dynamics:
i(k+1) = (i(k), z(k+1), u(k))
We note that this defines the evolution of a system where we consider u(k) a known input and z(k+1) a random disturbance. Furthermore, the size of the state space increases at every time step.
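As a toy illustration of this growth (hypothetical code; the measurement and input values are made up), the information vector can be represented as a tuple that is extended at every step:

```python
# A toy illustration of the information vector dynamics
# i(k+1) = (i(k), z(k+1), u(k)): the "state" is a tuple that grows each step.
def info_update(i_k, z_next, u_k):
    return i_k + (z_next, u_k)           # append the new measurement and input

i = (0.3,)                               # i(0) = (z(0),)
for k in range(3):
    u_k = 0.0                            # some input (placeholder policy)
    z_next = 0.1 * (k + 1)               # some new measurement (made up)
    i = info_update(i, z_next, u_k)
print(len(i))                            # 1 + 2*3 = 7 entries: the state grows
```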
We look at the optimal cost-to-go at time N − 1:
J_{N−1}^o(i(N−1)) = min_{u(N−1)} E_{x(N−1), v(N−1)} [ S( q(x(N−1), u(N−1), v(N−1)) )
    + L( x(N−1), u(N−1), v(N−1) ) | i(N−1) ]
and more generally, using (15.5),

J_k^o(i(k)) = min_{u(k)} E_{x(k), v(k), z(k+1)} [ L_k(x(k), u(k)) + J_{k+1}^o(i(k+1)) | i(k) ]    (15.22)
Note that we’re writing this as the expectation conditioned on the information vector of
the current time step, to make the functional dependence clear.
This yields a dynamic programming problem for a system with perfect state knowledge, where an optimal policy can be found by minimizing over all possible information vectors i(k), and working backwards (though this may be prohibitively expensive computationally).
15.2.2 Linear systems
We consider again the LTI system (with the time-varying extension omitted for simplicity):

x(k+1) = Ax(k) + Bu(k) + v(k)    (15.23)
z(k) = Hx(k) + w(k)    (15.24)
with {v(·)} and {w(·)} zero-mean, independent noise sequences with known PDFs.
Our cost function will be the usual
J = E[ (1/2) x(N)^T S x(N) + (1/2) sum_{j=0}^{N−1} ( x(j)^T Q x(j) + u(j)^T R u(j) ) ]    (15.25)
In our previous analyses, we had access to the system state, and the optimal input was given by a function of x(k) (which turned out to be a linear feedback). Now, this is no longer possible, as we don’t have access to the system state, and instead only have access to the measurement sequence z(0), . . . , z(k) and the past inputs u(0), . . . , u(k−1).
We look at the cost-to-go at time N − 1, from (15.22), using the information vector i(N−1):

J_{N−1}^o(i(N−1)) = (1/2) min_{u(N−1)} E_{x(N−1), v(N−1)} [ x(N−1)^T Q x(N−1) + u(N−1)^T R u(N−1)
    + ( Ax(N−1) + Bu(N−1) + v(N−1) )^T S ( Ax(N−1) + Bu(N−1) + v(N−1) ) | i(N−1) ]    (15.26)
Simplifying, and exploiting that E [v(k)] = 0 and that v(k) is independent of i(k), gives
J_{N−1}^o(i(N−1)) = (1/2) E_{x(N−1)} [ x(N−1)^T ( Q + A^T S A ) x(N−1) | i(N−1) ]
    + (1/2) E_{v(N−1)} [ v(N−1)^T S v(N−1) ]
    + (1/2) min_{u(N−1)} { u(N−1)^T ( B^T S B + R ) u(N−1)
        + 2 E[ x(N−1) | i(N−1) ]^T A^T S B u(N−1) }    (15.27)
We can minimize this explicitly, to get

u^o(N−1) = − ( B^T S B + R )^{−1} B^T S A E[ x(N−1) | i(N−1) ]
= − F(N−1) E[ x(N−1) | i(N−1) ]    (15.28)

with

F(N−1) = ( B^T S B + R )^{−1} B^T S A    (15.29)

We note that (15.28) has the same form as the LQR feedback, except that we now have E[x(N−1)|i(N−1)] in the place of the previously perfectly known state x(N−1).
The cost-to-go at N − 1 is

J_{N−1}^o(i(N−1)) = (1/2) E_{x(N−1)} [ x(N−1)^T U(N−1) x(N−1) | i(N−1) ]
    + (1/2) E_{x(N−1)} [ ( x(N−1) − E[x(N−1)|i(N−1)] )^T M(N−1) ( x(N−1) − E[x(N−1)|i(N−1)] ) | i(N−1) ]
    + (1/2) E_{v(N−1)} [ v(N−1)^T S v(N−1) ]

with

M(N−1) = A^T S B ( R + B^T S B )^{−1} B^T S A
U(N−1) = A^T S A − M(N−1) + Q
Comparing this to the perfect state information case, we notice an additional term that penalizes the estimation error x(k) − E[x(k)|i(k)].
We can now repeat this for N − 2, where we will exploit the fact that the input u(N−2) will not affect the state estimation error at time N − 1∗. This will, by induction, yield the input sequence
u^o(k) = − F(k) E[ x(k) | i(k) ]

with F(k) = ( R + B^T U(k+1) B )^{−1} B^T U(k+1) A, and the recursion

U(N) = S    (15.30)
U(k) = A^T U(k+1) A + Q − A^T U(k+1) B ( R + B^T U(k+1) B )^{−1} B^T U(k+1) A    (15.31)
Thus, the optimal input is a linear feedback law, based on an estimate of the state E [x(k)|i(k)]. This estimate is generated based on the information i(k) available at time k, and is therefore implementable.
We note that this optimal control can be decomposed into two parts:
1. an estimator, which uses the information vector to generate the estimate E [x(k)|i(k)]
2. a linear feedback controller, with gain matrix F(k) a function of known problem data, and independent of all statistical data of the noise sources.
Separation theorem
This gives the separation theorem: the optimal control strategy for the noisy system can be split into two parts:
1. estimate the state x(k) while ignoring feedback control,
2. choose an input u(k) as though the estimate were exact, ignoring all stochasticity.
∗This is intuitive: for a linear system, the known input doesn’t affect the estimate quality. For a proof, see D. P. Bertsekas, Dynamic programming and optimal control (3rd edition). Athena Scientific, Belmont, MA, 2005, vol. 1, Lemma 5.2.1.
Certainty equivalence
This is related to the following observation: at time k, the optimal controller applies the control that would be optimal for the deterministic system with the corresponding cost-to-go, replacing the process noise v(·) with its expected value (zero), and the state with its estimate E[x(k)|i(k)]. This is called certainty equivalence: the observation that the optimal decision for the quadratic cost depends only on the expected value of the random variables, so that we can replace the random variables with their expectations and then solve a “certain” problem, rather than a random one.
We can get some intuition for this by looking at the following simple problem: let a and b be known scalars, let x and w be two independent scalar random variables (with E[x] = x̂, E[w] = ŵ), and let u be a decision variable. We will compare the optimal choice of u for two different cost functions: the full cost J and the ‘certainty’ cost J_c, defined as:
J := E[ ( a x + b u + w )^2 ],    J_c := ( a x̂ + b u + ŵ )^2
Let u^o be the input that minimizes J, and u_c^o the input that minimizes J_c:

u^o = argmin_u E[ ( a x + b u + w )^2 ]
= argmin_u { b^2 u^2 + ( 2 a b x̂ + 2 b ŵ ) u }
= − (a/b) x̂ − (1/b) ŵ

u_c^o = argmin_u ( a x̂ + b u + ŵ )^2 = − (a/b) x̂ − (1/b) ŵ
This means that the optimal u depends on the random variables only through their expectation†.
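A quick Monte Carlo sketch can confirm this numerically; the constants and distributions below are arbitrary choices:

```python
# Numerical check that the minimizer of the expected cost J matches the
# certainty-equivalent minimizer of Jc (illustrative values throughout).
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 0.5
x = rng.normal(1.0, 0.7, 100_000)        # E[x] = 1.0
w = rng.normal(0.3, 0.2, 100_000)        # E[w] = 0.3

us = np.linspace(-10, 2, 2001)
J = [np.mean((a * x + b * u + w) ** 2) for u in us]
u_mc = us[int(np.argmin(J))]             # minimizer of the sampled cost

u_ce = -(a / b) * 1.0 - (1 / b) * 0.3    # -(a/b) x_hat - (1/b) w_hat = -4.6
print(u_mc, u_ce)                        # agree up to grid and MC error
```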
15.2.3 Linear systems with Gaussian noise
We have seen that the optimal input at time k is a linear function of the estimate at that time, using the information vector i(k). For the special case that the process noise v, the measurement noise w, and the initial condition x(0) additionally all have Gaussian distributions, we know that we can compute the conditional estimate recursively using the Kalman filter.
In this case, we have thus proven that the optimal control strategy consists of using a Kalman filter to generate the state estimate E [x(k)|i(k)] (which we can do recursively, and
†Does this hold in general? (Do the PDFs matter? What if we were interested in some nonlinear combination of x, w, and u? What if our cost were not quadratic?) Is the optimal cost value the same?
computationally efficiently, without needing to keep the whole information vector i(k)). We then apply the linear feedback rule to get uo = −F(k)E[x(k)|i(k)], which we saw had exactly the same form as the LQR problem.
This is illustrated in the figure (omitted here).
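The following is a compact simulation sketch of this structure, using illustrative matrices and, for brevity, the steady-state gains of Sec. 15.3 throughout: the Kalman filter produces x̂(k), and the LQR gain acts on that estimate.

```python
# LQG loop sketch: Kalman filter + linear feedback on the estimate.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
H = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[1.0]])
V, W = 0.01 * np.eye(2), np.array([[0.1]])

# Steady-state Riccati solutions by fixed-point iteration
U = Q.copy()
for _ in range(500):
    U = A.T @ U @ A - A.T @ U @ B @ np.linalg.solve(R + B.T @ U @ B, B.T @ U @ A) + Q
F = np.linalg.solve(R + B.T @ U @ B, B.T @ U @ A)

P = V.copy()
for _ in range(500):
    Pp = A @ P @ A.T + V                                  # prior variance
    K = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + W)
    P = (np.eye(2) - K @ H) @ Pp

x = np.array([[1.0], [0.0]]); xhat = np.zeros((2, 1))
for k in range(100):
    u = -F @ xhat                                         # feedback on estimate
    x = A @ x + B @ u + rng.multivariate_normal([0, 0], V).reshape(2, 1)
    z = H @ x + rng.normal(0.0, np.sqrt(W[0, 0]), (1, 1))
    xpred = A @ xhat + B @ u                              # predict
    xhat = xpred + K @ (z - H @ xpred)                    # measurement update
```

Note that the measurement update is exactly the estimator equation in the summary below: x̂(k+1) = Ax̂(k) + Bu(k) + K(z(k+1) − H(Ax̂(k) + Bu(k))).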
15.2.4 Summary
LQR (deterministic systems, or systems with perfect state information):
J = E[ (1/2) x(N)^T S x(N) + (1/2) sum_{j=0}^{N−1} ( x(j)^T Q x(j) + u(j)^T R u(j) ) ]

x(k+1) = Ax(k) + Bu(k) + v(k)
u^o(k) = − ( R + B^T U(k+1) B )^{−1} B^T U(k+1) A x(k)

LQG (stochastic systems with imperfect state information):

J = E[ (1/2) x̂(N)^T S x̂(N) + (1/2) sum_{j=0}^{N−1} ( x̂(j)^T Q x̂(j) + u(j)^T R u(j) ) ]

x(k+1) = Ax(k) + Bu(k) + v(k)
x̂(k+1) = A x̂(k) + B u(k) + K(k+1) ( z(k+1) − H ( A x̂(k) + B u(k) ) )

with K(k) the Kalman filter gain at time k. The optimal control is proportional to the estimated state:
u^o(k) = − ( R + B^T U(k+1) B )^{−1} B^T U(k+1) A x̂(k)
U(k) = A^T U(k+1) A − A^T U(k+1) B ( R + B^T U(k+1) B )^{−1} B^T U(k+1) A + Q

Comparison of costs
• Deterministic linear system, quadratic cost (LQR):

J^o = (1/2) x_0^T U(0) x_0    (15.32)

• Stochastic linear system but with perfect, full-state measurement, quadratic cost (see (15.21)):

J^o = (1/2) x_0^T U(0) x_0 + (1/2) sum_{k=0}^{N−1} trace( U(k+1) V )

• Stochastic linear system with Gaussian noise, and quadratic cost (LQG):

Ĵ^o = (1/2) x_0^T U(0) x_0 + (1/2) trace( U(0) P(0) )
    + (1/2) sum_{k=0}^{N−1} trace( K(k+1)^T U(k+1) K(k+1) ( H ( A P(k) A^T + V ) H^T + W ) )    (15.33)

where P(k) is the estimate variance at time k, and K(k) is the Kalman filter gain.
Note: the additional noise does not change the control feedback function (always linear, with the same gain), but it does increase the cost.
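As a numerical illustration, the three costs can be computed side by side; the sketch below uses made-up matrices, takes x_0 as the mean of the initial state, and P(0) = Var[x(0)]:

```python
# Side-by-side evaluation of (15.32), (15.21), and (15.33) as written above.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.0], [0.1]])
H = np.array([[1.0, 0.0]])
Q, R, S = np.eye(2), np.array([[1.0]]), np.eye(2)
V, W = 0.01 * np.eye(2), np.array([[0.1]])
P0 = 0.1 * np.eye(2)                      # Var[x(0)]
x0 = np.array([[1.0], [0.0]])
N = 50

# Backward pass: Us = [U(0), U(1), ..., U(N)]
Us = [S]
for _ in range(N):
    U = Us[0]
    Us.insert(0, A.T @ U @ A
              - A.T @ U @ B @ np.linalg.solve(R + B.T @ U @ B, B.T @ U @ A) + Q)

J_lqr = float(0.5 * x0.T @ Us[0] @ x0)                                   # (15.32)
J_perfect = J_lqr + 0.5 * sum(np.trace(Us[k + 1] @ V) for k in range(N)) # (15.21)

# Forward pass for the Kalman filter variance P(k) and gains K(k+1)
J_lqg = J_lqr + 0.5 * np.trace(Us[0] @ P0)
P = P0
for k in range(N):
    Pp = A @ P @ A.T + V                                                 # prior
    Kk = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + W)                      # K(k+1)
    J_lqg += 0.5 * np.trace(Kk.T @ Us[k + 1] @ Kk @ (H @ Pp @ H.T + W))  # (15.33)
    P = (np.eye(2) - Kk @ H) @ Pp

print(J_lqr, J_perfect, J_lqg)   # compare the effect of noise and estimation
```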
15.3 Steady-state LQG
Next, we investigate stationary solutions to the LQG problem.

The goal is to minimize

J′ = lim_{N→∞} J_N = lim_{N→∞} E[ (1/(2N)) x(N)^T S x(N) + (1/(2N)) sum_{j=0}^{N−1} ( x(j)^T Q x(j) + u(j)^T R u(j) ) ]

= (1/2) E[ x(k)^T Q x(k) + u(k)^T R u(k) ]

where the last expectation is taken in steady state.

We know that, for finite horizons, the solution is equivalent to solving separately for a deterministic LQR controller and a Kalman filter. Therefore, for the infinite horizon, the solution is u(k) = −F∞ x̂(k): a steady-state LQR gain combined with a steady-state Kalman filter, where
F∞ = ( R + B^T U∞ B )^{−1} B^T U∞ A    (15.34)
U∞ = A^T U∞ A − A^T U∞ B ( R + B^T U∞ B )^{−1} B^T U∞ A + Q    (15.35)
K∞ = P∞ H^T ( H P∞ H^T + W )^{−1}    (15.36)
P∞ = A P∞ A^T − A P∞ H^T ( H P∞ H^T + W )^{−1} H P∞ A^T + V    (15.37)

Convergence and stability

We know that if

• (A, C_v) and (A, B) are stabilizable (with V = C_v^T C_v), and
• (A, C_q) and (A, H) are detectable (with Q = C_q^T C_q),

then the Riccati equations (for the LQR and for the Kalman filter) will converge, and the separate systems will be stable (that is, the estimator in isolation, and the perfect-information controller in isolation).
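Numerically, F∞ and K∞ can be obtained by iterating (15.35) and (15.37) to convergence, or directly from the two discrete algebraic Riccati equations; here is a sketch using scipy, again with illustrative matrices:

```python
# Steady-state gains (15.34)-(15.37) via the discrete algebraic Riccati equation.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
H = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[1.0]])
V, W = 0.01 * np.eye(2), np.array([[0.1]])

U = solve_discrete_are(A, B, Q, R)                    # controller DARE (15.35)
F = np.linalg.solve(R + B.T @ U @ B, B.T @ U @ A)     # F_inf, (15.34)

P = solve_discrete_are(A.T, H.T, V, W)                # filter DARE (15.37), by duality
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + W)          # K_inf, (15.36)
```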
We can look at the closed-loop system, ignoring noise, and with feedback matrix F∞ and state observer gain K∞:
x(k+1) = Ax(k) − B F∞ x̂(k)
z(k) = Hx(k)

x̂(k+1) = A x̂(k) − B F∞ x̂(k) + K∞ ( H ( Ax(k) − B F∞ x̂(k) ) − H ( A x̂(k) − B F∞ x̂(k) ) )
= ( A − B F∞ − K∞ H A ) x̂(k) + K∞ H A x(k)
We introduce the estimation error, e(k) := x(k) − x̂(k), and note that it evolves as

e(k+1) = ( I − K∞ H ) A e(k)
and rewrite
x(k+1) = (A − BF∞) x(k) + BF∞e(k)
so that
[ x(k+1) ]   [ A − B F∞    B F∞          ] [ x(k) ]
[ e(k+1) ] = [ 0           (I − K∞ H) A  ] [ e(k) ]
We can thus conclude that the eigenvalues of the closed loop systems are the eigenvalues of A − BF∞ (that is, the perfect-information LQR) and (I − K∞H)A (that is, the Kalman filter). Since both of these are stable by the assumptions above, we can conclude that the closed-loop system is also stable.
Note that this argument (separation of the eigenvalues) also holds for any stabilizing con- troller and stabilizing estimator (not only the optimal ones).
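A self-contained numerical check of this eigenvalue separation, with the same illustrative matrices as in the earlier sketches:

```python
# Verify: eigenvalues of the closed loop = eig(A - B F) together with
# eig((I - K H) A), as the block upper-triangular structure implies.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.0], [0.1]])
H = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[1.0]]); V, W = 0.01 * np.eye(2), np.array([[0.1]])
U = solve_discrete_are(A, B, Q, R)
F = np.linalg.solve(R + B.T @ U @ B, B.T @ U @ A)
P = solve_discrete_are(A.T, H.T, V, W)
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + W)

n = A.shape[0]
cl = np.block([[A - B @ F, B @ F],
               [np.zeros((n, n)), (np.eye(n) - K @ H) @ A]])
eigs_cl = np.sort(np.abs(np.linalg.eigvals(cl)))
eigs_sep = np.sort(np.abs(np.concatenate([
    np.linalg.eigvals(A - B @ F),
    np.linalg.eigvals((np.eye(n) - K @ H) @ A)])))
print(np.allclose(eigs_cl, eigs_sep), np.all(eigs_cl < 1.0))  # True True
```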
15.4 Caveat
We saw that the LQR controller had very good robustness properties, for all systems to which it applied. Likewise, the Kalman filter has very good robustness properties. However, it turns out that the LQG setup has no such margin guarantees. Counterexamples can be constructed where the margins are arbitrarily small; thus, the designer must check the margins of a design if the state is not known perfectly (as is the case for all practical systems). See J. Doyle, “Guaranteed margins for LQG regulators,” IEEE Transactions on Automatic Control, vol. 23, no. 4, p. 756, 1978.