
7. The optimal state estimator
In this short chapter, we will present the optimal state estimator for a very general class of systems. Although this estimator will be easy to define, we will see that it is, in general, intractable to solve.
7.1 Model
This is our general problem statement. Let x(k) ∈ X be the vector-valued state at time k (k = 0, 1, ...), to be estimated. The state can be a discrete random variable (DRV, with finite or infinite domain) or a continuous random variable (CRV). Let z(k), k = 1, 2, ..., be a vector-valued measurement that we can observe; it, too, can be a continuous or a discrete random variable.
x(k) = q_{k−1}(x(k−1), v(k−1)),   k = 1, 2, ...
z(k) = h_k(x(k), w(k)),   k = 1, 2, ...,
where x(0), {v(·)} and {w(·)} are independent with known PDFs.
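To make this model class concrete, the following is a minimal simulation sketch (assuming NumPy; the particular functions q and h and the noise levels are made up for illustration and are not part of the notes):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical instance of the model class: a scalar nonlinear system
    # x(k) = q(x(k-1), v(k-1)) with measurement z(k) = h(x(k), w(k)).
    def q(x, v):
        return 0.9 * np.sin(x) + v      # made-up process function q_{k-1}

    def h(x, w):
        return x**2 + w                 # made-up measurement function h_k

    x = rng.normal(0.0, 1.0)            # x(0) drawn from its known PDF
    for k in range(1, 6):
        v = rng.normal(0.0, 0.1)        # process noise v(k-1)
        w = rng.normal(0.0, 0.2)        # sensor noise w(k)
        x = q(x, v)                     # state propagation
        z = h(x, w)                     # measurement available to the estimator
        print(k, x, z)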
7.2 Optimal recursive estimator
Our goal is to compute an estimate xˆ of the system state x, using prior knowledge (that is, knowledge of x(0)) and the sequence of measurements until time k. We want to do this recursively, that is, we don’t want to store the full history of past measurements.
The estimate will be optimal in a sense that we are free to define as a function of the PDF of the state. A typical choice is the minimum mean squared error (MMSE) estimator, which is the conditional mean of the state given the measurement sequence.
There are thus two components to the optimal estimator:
1. Compute the conditional PDF of the state, f(x(k)|z(1:k)).
2. Extract the optimal estimate xˆ from this PDF using the techniques from the previous chapter.
Recursive estimator equations
Only the results are given here, written for the case where x is a CRV (for a DRV, simply replace integrals with sums). The recursion has two steps: a prior update and a measurement update.
Initialization
We are given knowledge about the system’s initial condition in the following PDF: f(x(0)|z(1:0)) := f(x(0))
Prior update
The prior update pushes the time index forward to compute f(x(k)|z(1:k−1)).
• Start with the total probability theorem, conditioning on x(k−1):

f(x(k)|z(1:k−1)) = ∫_{x(k−1)∈X} f(x(k)|x(k−1), z(1:k−1)) f(x(k−1)|z(1:k−1)) dx(k−1)

• Simplify f(x(k)|x(k−1), z(1:k−1)) by noting that x(k) = q_{k−1}(x(k−1), v(k−1)):
– given x(k−1), x(k) is a function only of v(k−1). By assumption, v(k−1) is independent of all other quantities.
– Note that z(k−1) is a function of x(k−1), and thus a function of v(k−2), but not of v(k−1).
– Thus x(k) is conditionally independent of z(1:k−1) given x(k−1):

f(x(k)|x(k−1), z(1:k−1)) = f(x(k)|x(k−1)),

which can be calculated from f(v(k−1)) and q_{k−1}(·,·) using a change of variables (although it may not be straightforward to do so).

This gives the prior update step:

f(x(k)|z(1:k−1)) = ∫_{x(k−1)∈X} f(x(k)|x(k−1)) f(x(k−1)|z(1:k−1)) dx(k−1)
Measurement update
The measurement update incorporates the information from the latest measurement, z(k), i.e. we wish to compute f(x(k)|z(k), z(1:k−1)) = f(x(k)|z(1:k)).
• Start with Bayes' rule:

f(x(k)|z(k), z(1:k−1)) = f(z(k)|x(k), z(1:k−1)) f(x(k)|z(1:k−1)) / f(z(k)|z(1:k−1))

• Note that z(k) = h_k(x(k), w(k)), i.e. given x(k), z(k) is a function only of w(k).
• The past measurements z(1:k−1) depend on w(1:k−1), which by assumption is independent of w(k).
• Thus f(z(k)|x(k), z(1:k−1)) = f(z(k)|x(k)), which can be computed from f(w(k)) and the change of variables through h_k(x(k), ·).

Finally, we use the total probability theorem for the denominator to arrive at

f(x(k)|z(1:k)) = f(z(k)|x(k)) f(x(k)|z(1:k−1)) / ∫_{x̄(k)∈X} f(z(k)|x̄(k)) f(x̄(k)|z(1:k−1)) dx̄(k)

Extracting an estimate
Having solved for the state's conditional PDF, and having chosen a sense of optimality, it is now straightforward to extract the optimal estimate. Most commonly, one wants to compute the minimum mean squared error estimate, so that

xˆ(k) = E[x(k)|z(1:k)].

7.3 Discussion
The above optimal estimator holds for a very general class of systems: we only required independence of the noise values, and the initial state. We note that, although the estimator is easy to state, it is in general intractable to compute. Only very rarely will there be an analytical solution to these equations.
The PDFs are functions over continuous variables (or DRVs with infinite domain), that is, the PDFs are infinite dimensional. Thus, even if we could come up with a numerical scheme for solving the equations (thus avoiding the problem of searching for analytical solutions), storing the PDF in our algorithm would require an infinite amount of memory.
Some systems allow us to actually solve these equations exactly: for example, when the state space X is discrete and finite, we can capture the full conditional PDF with a finite amount of memory and easily implement the optimal estimator (even for very complex nonlinear system dynamics and noise distributions), as in the sketch below.
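For a finite state space, the conditional PDF is just a vector of probabilities and the two update equations become matrix-vector operations. A minimal sketch (assuming NumPy; the transition matrix, measurement likelihood, and measurement sequence are made up for illustration):

    import numpy as np

    # Finite state space X = {0, 1, 2}: the conditional PDF is a length-3 vector.
    f = np.array([0.5, 0.3, 0.2])             # f(x(0)), the prior over the states

    # Made-up model: T[i, j] = Prob(x(k) = i | x(k-1) = j), and
    # L[i, z] = Prob(z(k) = z | x(k) = i) for a binary measurement z in {0, 1}.
    T = np.array([[0.8, 0.1, 0.2],
                  [0.1, 0.8, 0.2],
                  [0.1, 0.1, 0.6]])
    L = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.2, 0.8]])

    for z in [1, 1, 0]:                       # a hypothetical measurement sequence
        f = T @ f                             # prior update: sum over x(k-1)
        f = L[:, z] * f                       # measurement update: Bayes' rule numerator
        f /= f.sum()                          # normalization (the denominator)

    print(f, np.argmax(f))                    # conditional PMF and, e.g., the MAP estimate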
In the next chapter we will see another situation where we can solve the conditional PDF exactly, this time for systems with CRV states. This will require imposing substantial constraints on the system dynamics.
In later chapters, we will look at approximation schemes that give up solving the conditional PDF exactly, but are then applicable to a broader class of systems.

8. The Kalman Filter as exact solution
We derive the Kalman Filter (KF) as the Bayesian state estimator that keeps track of the PDF of the system state conditioned on all past measurements, f(x(k)|z(1:k)), for linear time-varying systems with Gaussian process and measurement noise∗. The KF algorithm consists of a prior update step (based on the process model) and a measurement update step (based on the measurement model).
Exploiting the problem structure (linear system and Gaussian distributions) allows us to derive the KF as the analytic solution to the Bayesian state estimation problem. The resulting recursive equations are straightforward matrix calculations.
We follow the derivation of the Kalman filter in [0].
8.1 Model
We consider the linear, time-varying system

x(k) = A(k−1)x(k−1) + u(k−1) + v(k−1)    (8.1)
z(k) = H(k)x(k) + w(k)    (8.2)

for k = 1, 2, ..., where
• x(k): state
• u(k): known control input
• v(k): process noise
• z(k): measurement
• w(k): sensor noise

The CRVs x(0), {v(·)}, and {w(·)} are mutually independent and Gaussian distributed:
• x(0) ∼ N(x0, P0), i.e. x(0) has a Gaussian distribution with mean x0 and variance P0,
• v(k) ∼ N(0, V(k)),
• w(k) ∼ N(0, W(k)).

∗ Also known as "the beautiful interpretation"; we have seen "the useful interpretation" with recursive least squares.

8.2 Gaussian Random Variable (GRV)
Recall: the PDF of a Gaussian distributed vector CRV y = (y1, ..., yD) is given by

f(y) = 1 / ((2π)^{D/2} det(Σ)^{1/2}) · exp( −(1/2)(y − μ)^T Σ^{−1} (y − μ) )    (8.3)

where
• μ ∈ R^D is the mean vector,
• Σ ∈ R^{D×D} is the variance, Σ = Σ^T > 0.

Notice that a GRV is fully characterized by its mean and variance.

Special case
Consider the case where Σ is a diagonal matrix, Σ = diag(σ1², σ2², ..., σD²). Then

f(y) = ∏_{i=1,...,D} 1/√(2πσi²) · exp( −(yi − μi)² / (2σi²) ).

That is, the PDF simplifies to the product of D scalar Gaussian PDFs. Note that the variables are independent† if and only if Σ is diagonal.
†Spatially independent, do not confuse with temporally independent. For a vector random variable y(k), spatial independence means that all elements at a fixed time k (i.e. y1(k),…,yD(k)) are independent; whereas temporal independence refers to y(1), . . . , y(k) being independent.
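A quick numerical check of this factorization (a sketch assuming NumPy and SciPy are available; the numbers are arbitrary):

    import numpy as np
    from scipy.stats import multivariate_normal, norm

    mu = np.array([1.0, -2.0, 0.5])            # mean vector of a 3-dimensional GRV
    sigma2 = np.array([0.4, 1.5, 2.0])         # diagonal entries sigma_i^2
    y = np.array([0.8, -1.0, 1.2])             # an arbitrary evaluation point

    joint = multivariate_normal(mean=mu, cov=np.diag(sigma2)).pdf(y)
    product = np.prod(norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))

    print(joint, product)                      # the two values agree (up to round-off)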

Importance of the Gaussian property
The assumption of Gaussian random variables will allow us to derive the Kalman Filter as the analytic solution to the Bayesian state estimation problem. In particular, the following properties will be useful.
Property 1: An (affine) linear transformation of a GRV is a GRV
Let y be a GRV, and let x be defined by x = My + b with M a constant matrix and b a constant vector of appropriate dimensions. Then, x is a GRV.
We will show this for scalar random variables.
• Let y ∼ N(μ, σ²). That is,

f(y) = 1/√(2πσ²) · exp( −(y − μ)² / (2σ²) ).

Let x = my + b, with m ≠ 0 and b constants. What is fx(x)?
• Using the change of variables, we get

fx(x) = fy(y) / |dx/dy| = 1/(|m|√(2πσ²)) · exp( −(1/(2σ²)) ((x − b)/m − μ)² )
      = 1/√(2πm²σ²) · exp( −(1/(2m²σ²)) (x − b − mμ)² ).

That is, x ∼ N(b + mμ, m²σ²).

Notice that the derivation above is not the same as showing that the mean is b + mμ and the variance is m²σ², which is easy to do and applies to non-Gaussian PDFs as well:

E[x] = mE[y] + b = mμ + b
Var[x] = E[(my + b − (mμ + b))²] = E[m²(y − μ)²] = m² E[(y − E[y])²] = m²σ².

Property 2: A linear combination of two jointly GRVs is a GRV
Let x and y be two jointly Gaussian random variables, and let z be defined by z = Mx x + My y, where Mx and My are constant matrices of appropriate dimensions. Then, z is a GRV.

This property follows directly from Property 1 above:
z = Mξ with M = [Mx My] and ξ = (x,y) a GRV.
Hence, z ∼ N (μz,Σz). As before, we can calculate μz and Σz directly. Let x ∼ N (μx,Σx) and y ∼ N (μy,Σy), and we assume that x and y are independent. Then,
μz = E[z] = Mx E[x] + My E[y] = Mx μx + My μy
Σz = E[(z − μz)(z − μz)^T]
   = E[(Mx x − Mx μx + My y − My μy)(Mx x − Mx μx + My y − My μy)^T]
   = Mx E[(x − μx)(x − μx)^T] Mx^T + My E[(y − μy)(y − μy)^T] My^T
     + Mx E[(x − μx)(y − μy)^T] My^T + My E[(y − μy)(x − μx)^T] Mx^T
   = Mx Σx Mx^T + My Σy My^T,

where the two cross terms vanish because x and y are independent.
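The mean and variance formulas above can be checked by sampling; a small sketch (assuming NumPy; the matrices and dimensions are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)

    # Independent GRVs x (2-dimensional) and y (3-dimensional), and z = Mx x + My y.
    mu_x, S_x = np.array([1.0, 0.0]), np.array([[1.0, 0.3], [0.3, 0.5]])
    mu_y, S_y = np.zeros(3), np.diag([0.2, 1.0, 2.0])
    Mx = np.array([[1.0, 2.0], [0.0, 1.0]])
    My = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])

    # Closed-form mean and variance of z (Property 2)
    mu_z = Mx @ mu_x + My @ mu_y
    S_z = Mx @ S_x @ Mx.T + My @ S_y @ My.T

    # Monte Carlo check
    xs = rng.multivariate_normal(mu_x, S_x, size=200_000)
    ys = rng.multivariate_normal(mu_y, S_y, size=200_000)
    zs = xs @ Mx.T + ys @ My.T
    print(mu_z, zs.mean(axis=0))              # sample mean is close to mu_z
    print(S_z, np.cov(zs.T))                  # sample covariance is close to S_z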
Property 3: conditioning jointly GRVs gives a GRV
Let x ∈ R^{Nx} and y ∈ R^{Ny} be two jointly Gaussian random variables, so that ζ = (x, y) ∈ R^{Nx+Ny} is a GRV, with ζ ∼ N(μζ, Σζζ) and

μζ = [ μx ; μy ],   Σζζ = [ Σxx  Σxy ; Σxy^T  Σyy ].

We now wish to show that x is conditionally Gaussian, that is, that f(x|y) is a Gaussian distribution, and compute that distribution when the random variable y takes the value ȳ.

We do this by direct computation, substituting the Gaussian form of the PDFs:

f_{x|y}(x|ȳ) = f_{x,y}(x, ȳ) / f_y(ȳ) = f_ζ((x, ȳ)) / f_y(ȳ)
 = 1/((2π)^{(Nx+Ny)/2} det(Σζζ)^{1/2}) · exp( −(1/2) ([x ; ȳ] − μζ)^T Σζζ^{−1} ([x ; ȳ] − μζ) )
   · [ 1/((2π)^{Ny/2} det(Σyy)^{1/2}) · exp( −(1/2)(ȳ − μy)^T Σyy^{−1} (ȳ − μy) ) ]^{−1}
 ∝ exp( −(1/2) ([x ; ȳ] − μζ)^T Σζζ^{−1} ([x ; ȳ] − μζ) )
 = exp( −(1/2) [x − μx ; ȳ − μy]^T [ Σxx  Σxy ; Σxy^T  Σyy ]^{−1} [x − μx ; ȳ − μy] )    (8.4)

where we treat ȳ as a known constant. We thus immediately see that this result is quadratic in x and has the functional form of a Gaussian; thus x conditioned on y is a GRV. That is why we ignored the proportionality constant: it will be whatever is necessary for the PDF to integrate to one.

Next, we would like to compute the exact parameters (mean μ̄ and variance Σ̄), so that

f_{x|y}(x|ȳ) ∝ exp( −(1/2)(x − μ̄)^T Σ̄^{−1} (x − μ̄) ).    (8.5)

We note that the inverse of the matrix Σζζ is given as below, using the general block-wise matrix inversion equations:

[ Σxx  Σxy ; Σxy^T  Σyy ]^{−1}
 = [ (Σxx − Σxy Σyy^{−1} Σxy^T)^{−1},   −Σxx^{−1} Σxy (Σyy − Σxy^T Σxx^{−1} Σxy)^{−1} ;
     −(Σyy − Σxy^T Σxx^{−1} Σxy)^{−1} Σxy^T Σxx^{−1},   (Σyy − Σxy^T Σxx^{−1} Σxy)^{−1} ]

We now compare (8.4) and (8.5), and compute μ̄ and Σ̄ so that the first and second order terms in x are equal. Note that we do not care about the terms that do not contain x, as these are again simply constants.

Looking at the quadratic terms shows immediately

Σ̄ = Σxx − Σxy Σyy^{−1} Σxy^T.

Comparing first-order terms in x gives

x^T Σ̄^{−1} μ̄ = x^T Σ̄^{−1} μx + x^T Σxx^{−1} Σxy (Σyy − Σxy^T Σxx^{−1} Σxy)^{−1} (ȳ − μy),

which must hold for all x, so that (with some manipulation)

μ̄ = μx + Σxy Σyy^{−1} (ȳ − μy).    (8.6)
In summary, given jointly Gaussian RVs x and y, we know that x conditioned on y = y ̄ is GRV with mean μ ̄ and variance Σ ̄. With the mean and variance in hand, we can compute the proportionality constant for the PDF, using (8.3).
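Equation (8.6) and the expression for Σ̄ translate directly into code; a small sketch of the conditioning step (assuming NumPy; the helper name and the numerical values are made up):

    import numpy as np

    def condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, y_bar):
        """Mean and variance of x conditioned on y = y_bar, for jointly Gaussian (x, y)."""
        mu_bar = mu_x + S_xy @ np.linalg.solve(S_yy, y_bar - mu_y)   # eq. (8.6)
        S_bar = S_xx - S_xy @ np.linalg.solve(S_yy, S_xy.T)          # matching the quadratic terms
        return mu_bar, S_bar

    # Arbitrary jointly Gaussian example: x is 2-dimensional, y is scalar.
    mu_x, mu_y = np.array([0.0, 1.0]), np.array([2.0])
    S_xx = np.array([[2.0, 0.5], [0.5, 1.0]])
    S_xy = np.array([[0.8], [0.2]])
    S_yy = np.array([[1.5]])

    print(condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, y_bar=np.array([2.5])))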
8.3 Deriving the Kalman filter
We now have the tools to derive the Kalman filter as the exact solution to the Bayesian estimation problem, specifically showing that we can track the exact PDF of the state conditioned on the measurements.
We will prove this using induction, but first state some useful variables.
8.3.1 Auxiliary variables
We define the auxiliary variables xp(k) and xm(k). These are used as a short-hand, and will be used again when we look at extensions to the Kalman filter.
We define the prediction, or a priori variable xp(k) as the state x at time k, conditioned on the sequence of measurements z ̄(1:k−1). Note that we will distinguish between the random variable z(k) and the actual measurement z ̄(k) with the overbar.
The measurement update, or a posteriori variable xm(k) is defined as the state x at time k conditioned on the sequence of measurements z ̄(1:k), in other words xp(k) conditioned additionally on z ̄(k).
We define, specifically, xm(0) = x(0), and will then compute xp(k) and xm(k) through induction. Note that both xp and xm are random variables, i.e. they have a probability density function, etc. We will use the Kalman filter to compute their distributions, and then generate an estimate from these.
8.3.2 Derivation and proof of the Kalman filter
We claim that the Kalman filter recursively computes the exact PDF of the state condi- tioned on the measurement sequence. We prove this claim, and derive the filter equations,
through induction. For clarity’s sake, we will use ξ as a dummy variable in our PDFs for the state, and we will use z ̄(k) as the actual measurement observed at time k (using the overbar to distinguish from the random variable z(k)).
Induction step: Assume that, for some k, we know that xm(k−1) is normally distributed with mean xˆm(k−1) and variance Pm(k−1), and additionally that
f_{xm(k−1)}(ξ) = f_{x(k−1)|z(1:k−1)}(ξ|z̄(1:k−1))   and   xm(k−1) ∼ N(xˆm(k−1), Pm(k−1)).

Recall that, for a Gaussian distribution, the mean and variance fully capture the PDF. We will first compute the PDF of xp(k), then use this to compute xm(k).
To compute fxp(k)(ξ) = fx(k)|z(1:k−1)(ξ|z ̄(1:k−1)), we recall that
x(k) = A(k−1)x(k−1) + u(k−1) + v(k−1)
in other words, x(k) is an affine combination of x(k−1) and v(k−1). We recall that, by assumption, x(k−1) conditioned on z(1:k−1) is Gaussian, and that v(k−1) is Gaussian (and independent of all other quantities). Therefore, by Property 2 of GRVs, we conclude that xp(k) is a GRV, and we can write xp(k) ∼ N(xˆp(k), Pp(k)). What is left is to compute the mean xˆp(k) and the variance Pp(k), which is easily done from the dynamics:

xˆp(k) = E[x(k)|z̄(1:k−1)] = A(k−1)xˆm(k−1) + u(k−1)
Pp(k) = Var[x(k)|z̄(1:k−1)] = A(k−1)Pm(k−1)A(k−1)^T + V(k−1)
We next get a measurement z ̄(k), and wish to use this measurement to compute the PDF of the state x(k) conditioned on all the observed measurements z ̄(1:k). We will investigate this using the extended variable ζ(k), defined
ζ(k) = [ x(k) ; z(k) ]
We first characterize ζ(k) conditioned on z ̄(1:k−1), and then use the observation z ̄(k). We will first investigate the random variable z(k) conditioned on the observations z ̄(1:k−1).
Referring to the system model, we have z(k) = H(k)x(k) + w(k)
which is an affine combination of x(k) and the measurement noise w(k). Recall that we have just shown that x(k) is GRV when conditioned on z ̄(1:k−1) (using the initial assumption), and that w(k) is GRV by our model, and is independent of all other quantities. Therefore,
again by Property 2, we know that z(k) is GRV, and the PDF is fully defined by the mean and variance:
μz(k) := E[z(k)|z̄(1:k−1)] = H(k)E[x(k)|z̄(1:k−1)] + E[w(k)|z̄(1:k−1)] = H(k)xˆp(k)
Σzz(k) := Var[z(k)|z̄(1:k−1)] = H(k)Pp(k)H(k)^T + W(k)

We also compute the cross-covariance between x(k) and z(k) when conditioned on z̄(1:k−1):

Σxz(k) = E[(x(k) − xˆp(k))(z(k) − μz(k))^T | z̄(1:k−1)]
       = E[(x(k) − xˆp(k))(H(k)(x(k) − xˆp(k)) + w(k))^T | z̄(1:k−1)]
       = E[(x(k) − xˆp(k))(x(k) − xˆp(k))^T | z̄(1:k−1)] H(k)^T + E[(x(k) − xˆp(k))w(k)^T | z̄(1:k−1)]
       = Pp(k)H(k)^T

And we can conclude that ζ(k) ∼ N(μζ(k), Σζζ(k)) with

μζ(k) = E[ζ(k)|z̄(1:k−1)] = [ xˆp(k) ; H(k)xˆp(k) ]
Σζζ(k) = Var[ζ(k)|z̄(1:k−1)] = [ Pp(k),  Pp(k)H(k)^T ; H(k)Pp(k),  H(k)Pp(k)H(k)^T + W(k) ]
We now use the measurement z̄(k) and Property 3 from earlier to compute the distribution of the state x(k) conditioned on the measurements z̄(1:k). Specifically, we define xm(k) through its PDF, f_{xm(k)}(ξ) = f_{x(k)|z(1:k)}(ξ|z̄(1:k)). By Property 3 we immediately know that this is a Gaussian distribution, and we are left with computing the mean and variance, which we can take directly from Property 3. The mean follows as

xˆm(k) = E[x(k)|z̄(1:k)] = E[x(k)|z̄(1:k−1)] + Σxz(k)Σzz(k)^{−1}(z̄(k) − H(k)xˆp(k))
       = xˆp(k) + Pp(k)H(k)^T (H(k)Pp(k)H(k)^T + W(k))^{−1} (z̄(k) − H(k)xˆp(k))

and the variance as

Pm(k) = Var[x(k)|z̄(1:k)] = Pp(k) − Σxz(k)Σzz(k)^{−1}Σxz(k)^T
      = Pp(k) − Pp(k)H(k)^T (H(k)Pp(k)H(k)^T + W(k))^{−1} H(k)Pp(k)
Thus, we know that if x(k−1) is Gaussian with mean xˆm(k−1) and variance Pm(k−1) (conditioned on the measurements z ̄(1:k−1)), the state x(k) conditioned additionally on the measurement z ̄(k) is also Gaussian with mean and variance as computed above.
Basis step: To complete the proof by induction, we note that at time k = 0, by definition, f_{xm(0)}(ξ) = f_{x(0)}(ξ), as there are no measurements in the set z(1:0). □
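As a numerical sanity check (a sketch, assuming NumPy; the prior mean, variance, and model matrices are arbitrary), conditioning the joint Gaussian of (x(k), z(k)) on the observed z̄(k) reproduces the explicit update formulas derived above:

    import numpy as np

    # Arbitrary prior (prediction) statistics and model matrices: 2 states, 1 measurement.
    xp = np.array([1.0, -0.5])
    Pp = np.array([[1.0, 0.2], [0.2, 0.5]])
    H = np.array([[1.0, 0.0]])
    W = np.array([[0.3]])
    z_bar = np.array([1.4])

    # Joint Gaussian of (x(k), z(k)) conditioned on the past measurements, as derived above.
    mu_z = H @ xp
    S_zz = H @ Pp @ H.T + W
    S_xz = Pp @ H.T

    # Conditioning on z(k) = z_bar (Property 3) ...
    xm = xp + S_xz @ np.linalg.solve(S_zz, z_bar - mu_z)
    Pm = Pp - S_xz @ np.linalg.solve(S_zz, S_xz.T)

    # ... agrees with the explicit Kalman filter update formulas.
    K = Pp @ H.T @ np.linalg.inv(S_zz)
    print(xm, xp + K @ (z_bar - H @ xp))
    print(Pm, Pp - K @ H @ Pp)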

8.3.3 Implementation
The Kalman filter is implemented as follows:
Initialization
We initialize the filter with our knowledge of the system's initial state:

xˆm(0) = x0    (8.7)
Pm(0) = P0    (8.8)

Recursion
We apply the control input u(k−1), and at each time k ∈ {1, 2, ...} we receive a new measurement z̄(k). We first do a prediction step:

xˆp(k) = E[x(k)|z̄(1:k−1)] = A(k−1)xˆm(k−1) + u(k−1)    (8.9)
Pp(k) = Var[x(k)|z̄(1:k−1)] = A(k−1)Pm(k−1)A(k−1)^T + V(k−1)    (8.10)

Next, we use the measurement in the update step:

xˆm(k) = xˆp(k) + Pp(k)H(k)^T (H(k)Pp(k)H(k)^T + W(k))^{−1} (z̄(k) − H(k)xˆp(k))    (8.11)
Pm(k) = Pp(k) − Pp(k)H(k)^T (H(k)Pp(k)H(k)^T + W(k))^{−1} H(k)Pp(k)    (8.12)

At any time k, we can compute the full conditional PDF: x(k), conditioned on z̄(1:k), is distributed as N(xˆm(k), Pm(k)), i.e. f_{x(k)|z(1:k)}(ξ|z̄(1:k)) is the PDF of N(xˆm(k), Pm(k)).
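The recursion (8.9)-(8.12), together with the initialization (8.7)-(8.8), maps directly to code. A minimal sketch (assuming NumPy; the function and variable names are illustrative, not from the notes):

    import numpy as np

    def kf_step(xm, Pm, u, z, A, H, V, W):
        """One Kalman filter recursion: prior update (8.9)-(8.10),
        then measurement update (8.11)-(8.12)."""
        # Prior update (prediction)
        xp = A @ xm + u                               # (8.9)
        Pp = A @ Pm @ A.T + V                         # (8.10)
        # Measurement update
        S = H @ Pp @ H.T + W                          # variance of the predicted measurement
        K = Pp @ H.T @ np.linalg.inv(S)
        xm_new = xp + K @ (z - H @ xp)                # (8.11)
        Pm_new = Pp - K @ H @ Pp                      # (8.12)
        return xm_new, Pm_new

    # The filter is initialized with xm = x0 and Pm = P0, cf. (8.7)-(8.8).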

8.3.4 Alternative Equations
An alternative form of the measurement update equations (which can be derived using the matrix inversion lemma) is:

K(k) = Pp(k)H^T(k) (H(k)Pp(k)H^T(k) + W(k))^{−1}    (8.13)
xˆm(k) = xˆp(k) + K(k)(z(k) − H(k)xˆp(k))    (8.14)
Pm(k) = (I − K(k)H(k)) Pp(k)    (8.15)
      = (I − K(k)H(k)) Pp(k) (I − K(k)H(k))^T + K(k)W(k)K^T(k)    (8.16)

The matrix K(k) is called the Kalman Filter gain.

Note that implementing (8.16) is computationally more expensive than (8.15), but is less sensitive to numerical errors. Equation (8.16) is known as the "Joseph form" of the covariance update‡.
Why would we use the alternative equations? Compare the matrix inversion in (8.13) to that in (8.12), specifically the dimensions (recall that the computational cost of a matrix inverse grows as the matrix dimension cubed).
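In code, the Joseph form (8.16) is a small change to the covariance update; a sketch of the measurement update in gain form (assuming NumPy, same illustrative notation as above):

    import numpy as np

    def measurement_update_joseph(xp, Pp, z, H, W):
        """Measurement update using the KF gain (8.13)-(8.14) and the
        Joseph-form covariance update (8.16)."""
        K = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + W)          # (8.13)
        xm = xp + K @ (z - H @ xp)                              # (8.14)
        I = np.eye(Pp.shape[0])
        Pm = (I - K @ H) @ Pp @ (I - K @ H).T + K @ W @ K.T     # (8.16)
        return xm, Pm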
8.4 Extracting an estimate
We have derived the Kalman Filter as the exact solution to tracking the PDF of a system state, conditioned on all past measurements. Remarkably, this is possible using finite memory and only simple matrix manipulations. We exploited the fact that all random quantities are Gaussian, suitably independent, and that the system equations are linear.
Often (e.g. for feedback control) we need to produce an estimate xˆ(k) of the system state x(k) from the PDF f(x(k)|z(1:k)). We have shown that x(k), conditioned on z(1:k), is Gaussian: x(k) ∼ N(xˆm(k), Pm(k)).
The normal distribution is unimodal and symmetric. Therefore, the maximum a posteriori, minimum mean squared error, and minimum mean absolute error estimates coincide, and the estimate associated with the Kalman Filter is
xˆ(k) = xˆMAP(k) = xˆMMSE(k) = xˆMMAE(k) = xˆm(k)
‡ We will also see, in the next chapter, that the Joseph form is more general, as the resulting variance is correct even if we are using a sub-optimal gain matrix K.

8.5 Remarks

Implementation
If the problem data is known ahead of time (i.e. A(k), H(k), V(k), and W(k) for all k, as well as P0), all KF matrices (i.e. Pp(k), Pm(k), and K(k)) can be computed off-line. In particular, the KF gain does not depend on real-time measurement data. Of course, the output xˆm(k) does depend on real-time measurement data.

xˆp(k) = A(k−1)xˆm(k−1) + u(k−1)
xˆm(k) = xˆp(k) + K(k)(z(k) − H(k)xˆp(k))

Substituting the first equation into the second gives

xˆm(k) = (I − K(k)H(k)) A(k−1)xˆm(k−1) + (I − K(k)H(k)) u(k−1) + K(k)z(k)
       = Aˆ(k)xˆm(k−1) + Bˆ(k)u(k−1) + K(k)z(k)
The filter is linear and time varying.
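Since Pp(k), Pm(k), and K(k) do not depend on the measurement data, they can be precomputed; a sketch (assuming NumPy and, for simplicity, constant A, H, V, W):

    import numpy as np

    def precompute_gains(P0, A, H, V, W, N):
        """Compute the KF gains K(1), ..., K(N) without any measurement data."""
        Pm, gains = P0, []
        I = np.eye(P0.shape[0])
        for _ in range(N):
            Pp = A @ Pm @ A.T + V                               # prior variance
            K = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + W)      # gain
            Pm = (I - K @ H) @ Pp                               # posterior variance
            gains.append(K)
        return gains    # the online filter then only propagates the mean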
Estimation error
Estimation error: e(k) := x(k) − xˆm(k).
e(k) = x(k) − (I − K(k)H(k))(A(k−1)xˆm(k−1) + u(k−1)) − K(k)H(k)x(k) − K(k)w(k)
     = (I − K(k)H(k))(A(k−1)x(k−1) + u(k−1) + v(k−1))
       − (I − K(k)H(k))(A(k−1)xˆm(k−1) + u(k−1)) − K(k)w(k)
     = (I − K(k)H(k))A(k−1)e(k−1) + (I − K(k)H(k))v(k−1) − K(k)w(k)
• e(k) is a GRV.
• Error mean: E[e(k)] = (I − K(k)H(k))A(k−1)E[e(k−1)]. Since E[e(0)] = E[x(0)] − xˆm(0) = x0 − x0 = 0, it follows that the filter is unbiased, i.e. E[e(k)] = 0 for all k.
• Error variance: Var [e(k)] = E 􏰊(x(k) − xˆm(k))(x(k) − xˆm(k))T 􏰋 = Pm(k).
• What about the stability of the system E [e(k)] = (I − K(k)H(k))A(k−1)E [e(k−1)]? Asymptotic properties of the KF are discussed in more detail in later sections.
Remark: Positive definiteness of variance matrices
In the derivation of the KF, we assumed that x(0), v(k), and w(k) are GRVs and therefore have positive definite variance matrices (i.e. P0 > 0, V (k) > 0, W (k) > 0). Note that the KF equations also make sense when some of the variance matrices are positive semidefinite (i.e. ≥ 0), as long as the involved matrix inversions are well defined.
Examples:
• P0 = 0 means that the initial state is perfectly known.
• V ≥ 0 with some zero eigenvalue means that there is no process noise in some direction of the state space.
• W ≥ 0 means that we have a perfect measurement. For example, when we enforce a constraint (e.g. know the velocity is exactly zero).
8.6 Example
We revisit the rocketship example from the previous lecture.
The rocketship is acted upon by a propulsion force. The force command, uf , can change only at discrete times k. The rocket engine is not perfect, and the actual force is corrupted by an additive normally distributed noise vf (which also only changes at the discrete times k).
• State: position p(k) and speed s(k).
• Acceleration is proportional to the force (and we assume the rocketship has unit mass, and timestep is of unit length): s(k) = s(k−1) + uf(k−1) + vf(k−1) with vf(k−1) ∼ N (0, 0.1).
• Change in position is p(k) = p(k−1) + s(k−1) + (1/2)(uf(k−1) + vf(k−1)).
• Distance measurement to origin, corrupted by noise, is available: z(k) = p(k) + w(k) with w(k) ∼ N(0, 0.5).
• Initial distribution: (p(0), s(0)) ∼ N (0, I ).
System equations
As before, we let x(k) := (p(k), s(k)), u(k) := (uf (k)/2, uf (k)), and v(k) := (vf (k)/2, vf (k)). Combining all equations:
x(k) = [ 1  1 ; 0  1 ] x(k−1) + u(k−1) + v(k−1),   A = [ 1  1 ; 0  1 ],
z(k) = [ 1  0 ] x(k) + w(k),   H = [ 1  0 ],
E[v(k−1)] = 0,   Var[v(k−1)] = V = [ 0.025  0.05 ; 0.05  0.1 ],
E[w(k)] = 0,   Var[w(k)] = W = 0.5.
A, H, V, W, and the initial mean and variance are all we need to implement the KF.
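A minimal simulation sketch of this example (assuming NumPy and the kf_step function sketched in Section 8.3.3; the force command is made up, and this is not the class-website script):

    import numpy as np

    rng = np.random.default_rng(42)

    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])
    V = np.array([[0.025, 0.05], [0.05, 0.1]])
    W = np.array([[0.5]])

    x = rng.multivariate_normal(np.zeros(2), np.eye(2))     # true initial state (p(0), s(0))
    xm, Pm = np.zeros(2), np.eye(2)                         # filter initialization (8.7)-(8.8)

    for k in range(1, 51):
        uf = 0.1 * np.sin(0.1 * k)                          # arbitrary force command
        u = np.array([uf / 2, uf])
        vf = rng.normal(0.0, np.sqrt(0.1))                  # process noise v_f(k-1)
        x = A @ x + u + np.array([vf / 2, vf])              # true state propagation
        z = H @ x + rng.normal(0.0, np.sqrt(0.5), size=1)   # noisy distance measurement
        xm, Pm = kf_step(xm, Pm, u, z, A, H, V, W)          # estimate and its variance

    print(x, xm, np.sqrt(np.diag(Pm)))                      # true state, estimate, std. dev.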
Simulations
(The script to simulate this example is provided on the class website.)
• Simulate true state x(k) and estimate xˆm(k)
– KF provides estimates for position and velocity.
– Pm(k) captures the uncertainty in the estimates. For example, we can plot the mean plus/minus one standard deviation, xˆm^i(k) ± √(Pm^ii(k)): for any k, the probability of x^i(k) being in the interval [xˆm^i(k) − √(Pm^ii(k)), xˆm^i(k) + √(Pm^ii(k))] is 68.2%.
– Pm(k) converges to a constant matrix as k → ∞ (refer to the steady-state Kalman filter).