9. The Kalman Filter as State Observer
The Kalman Filter can be used to estimate states that are not directly accessible through a sensor measurement; that is, states that do not explicitly appear in the measurement equation for z(k). In last chapter’s example, for instance, position and velocity estimates are obtained from a position measurement only. The KF can infer information about states that are not directly measured by exploiting possible couplings of the states through the system dynamics. In this sense, the KF is sometimes referred to as a state observer.
In this chapter, we discuss observability and detectability as conditions that guarantee that all states of a linear, time-invariant system can reliably be estimated by a KF (in the sense that the error variances of all states remain bounded and do not tend to infinity as time progresses). In addition, the discussion of asymptotic properties of the KF will allow us to derive the Steady-State Kalman Filter, which is a time-invariant implementation of the KF.
9.1 Model
For this chapter, we restrict the model to linear, time-invariant systems and stationary distributions. That is, A(k) = A, H(k) = H, V(k) = V, and W(k) = W are constant:

x(k) = A x(k−1) + u(k−1) + v(k−1)
z(k) = H x(k) + w(k)

where x(k) ∈ ℝ^n, z(k) ∈ ℝ^m, and E[x(0)] = x0, Var[x(0)] = P0, E[v(k−1)] = 0, Var[v(k−1)] = V, E[w(k)] = 0, Var[w(k)] = W, and x(0), {v(·)}, and {w(·)} are independent.
9.2 Observability
We first introduce the concept of observability for a deterministic system, i.e. without noise (v(k−1) = 0, w(k) = 0). The goal is to reconstruct x(0) from measurements z(0), z(1), z(2), etc.∗
Using measurements up to time k, we have
z(0) = H x(0)
z(1) = H x(1) = HA x(0) + H u(0)
z(2) = H x(2) = HA² x(0) + H u(1) + HA u(0)
⋮
z(k) = H x(k) = HA^k x(0) + H u(k−1) + ··· + HA^{k−1} u(0),

which we rewrite as

\[
\underbrace{\begin{bmatrix} H \\ HA \\ \vdots \\ HA^k \end{bmatrix}}_{=:\,O_k} x(0)
= \begin{bmatrix} z(0) \\ z(1) \\ \vdots \\ z(k) \end{bmatrix}
- \begin{bmatrix} 0 & 0 & \cdots & 0 \\ H & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ HA^{k-1} & HA^{k-2} & \cdots & 0 \end{bmatrix}
\begin{bmatrix} u(0) \\ u(1) \\ \vdots \\ u(k) \end{bmatrix}.
\]

The right-hand side (RHS) is known – hence, we can uniquely solve for x(0) if and only if rank(O_k) = n (full column rank). In that case we can use the following least-squares approach:

x(0) = (O_k^T O_k)^{−1} O_k^T · RHS.
Note, this is not really least squares, since the system is deterministic and without uncertainty.
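This reconstruction is easy to sketch in code. Below is a minimal numpy example (the matrices, inputs, and variable names are illustrative choices, not from the course script): it simulates the deterministic system, stacks O_k, subtracts the known input contributions, and recovers x(0).

```python
import numpy as np

# Minimal sketch: recover x(0) of a deterministic LTI system from stacked
# measurements. A, H, and the inputs are illustrative example values.
A = np.array([[2.0, 1.0], [0.0, 0.5]])
H = np.array([[1.0, 0.0]])
n = A.shape[0]

rng = np.random.default_rng(0)
x0_true = rng.standard_normal(n)
u = [rng.standard_normal(n) for _ in range(n)]        # u(0), u(1), ...

# Simulate x(k) = A x(k-1) + u(k-1) and record z(k) = H x(k), k = 0..n-1.
x, z = x0_true.copy(), [H @ x0_true]
for k in range(1, n):
    x = A @ x + u[k - 1]
    z.append(H @ x)

# O = [H; HA; ...; HA^{n-1}]; subtract the known input terms (the
# block-triangular matrix above) from the stacked measurements.
O = np.vstack([H @ np.linalg.matrix_power(A, i) for i in range(n)])
rhs = np.concatenate(z)
for k in range(1, n):
    for j in range(k):
        rhs[k] -= (H @ np.linalg.matrix_power(A, k - 1 - j) @ u[j]).item()

x0_hat, *_ = np.linalg.lstsq(O, rhs, rcond=None)
print(np.allclose(x0_hat, x0_true))                   # True if rank(O) = n
```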
If, at time k, the matrix O_k is not full rank, we can wait another time step and collect an additional measurement z(k+1), which adds another m rows to the matrix; perhaps O_{k+1} is full rank. There are two possibilities: either O_k eventually becomes full rank if we wait long enough, or O_k remains rank deficient for arbitrarily large k. How could we know when to give up?
The answer is given by the Cayley-Hamilton theorem, which states that every square matrix A ∈ ℝ^{n×n} satisfies its characteristic equation. Recall that the characteristic polynomial is generated from

0 = det(A − λI)                                      (9.1)
  = λ^n + a_1 λ^{n−1} + ··· + a_{n−1} λ + a_n,       (9.2)

so that by the theorem

0 = A^n + a_1 A^{n−1} + ··· + a_{n−1} A + a_n I,

and more specifically

A^n = −a_1 A^{n−1} − ··· − a_{n−1} A − a_n I.        (9.3)

∗Note that once we know x(0), we can reconstruct x(k) for all k (for the deterministic case).
Aside: note that in (9.1), if we substitute λ = A we get the trivial det(A−A·I) = 0. However, the Cayley-Hamilton theorem says something much more profound, since we may now substitute λ = A into (9.2), replacing a scalar equation with a matrix equation.
Equation (9.3) means that A^n is a linear combination of I = A^0, A, …, A^{n−1}, and specifically in this context that the nth observation is linearly dependent on observations 0 through n − 1. Thus, if the matrix O := O_{n−1} is not full rank, we will never be able to reconstruct x(0) from the observations, no matter how many we collect.
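This consequence of Cayley-Hamilton is easy to check numerically. The following sketch (with an arbitrary example matrix) verifies that A^n equals the linear combination in (9.3), taking the coefficients a_1, …, a_n from numpy.poly.

```python
import numpy as np

# Numerical check of (9.3): A^n is a linear combination of I, A, ..., A^{n-1}
# with coefficients from the characteristic polynomial. A is an example.
A = np.array([[2.0, 1.0], [0.0, 0.5]])
n = A.shape[0]

coeffs = np.poly(A)                                   # [1, a_1, ..., a_n]
An = np.linalg.matrix_power(A, n)
combo = -sum(coeffs[i] * np.linalg.matrix_power(A, n - i)
             for i in range(1, n + 1))
print(np.allclose(An, combo))                         # True
```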
If rank(O) = n, we say that the pair (A, H) is observable.
Observability conditions†

The pair (A, H) is observable
⇔ For a deterministic LTI system (x(k) = A x(k−1) + u(k−1), z(k) = H x(k)), knowledge of z(0:n−1) and u(0:n−1) suffices to determine x(0).
⇔ rank(O) = n.
⇔ [A − λI; H] is full rank for all λ ∈ ℂ (PBH-Test).

For the PBH-Test (PBH = Popov-Belevitch-Hautus), one only needs to check those λ that are eigenvalues of A. For all other values of λ, A − λI has full rank.

†According to B. D. Anderson and J. B. Moore, Optimal Filtering, Dover Publications, 2005.
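The PBH-Test translates directly into a rank check at the eigenvalues of A. Below is a minimal sketch (the helper name is_observable_pbh is our own, and the test matrices anticipate Cases 1 and 2 of the examples in Section 9.3).

```python
import numpy as np

# PBH observability test: by the remark above, it suffices to check the
# stacked matrix [A - lambda*I; H] at the eigenvalues of A.
def is_observable_pbh(A, H):
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        M = np.vstack([A - lam * np.eye(n), H])
        if np.linalg.matrix_rank(M) < n:
            return False
    return True

A = np.array([[2.0, 1.0], [0.0, 0.5]])
print(is_observable_pbh(A, np.array([[1.0, 0.0]])))   # True
print(is_observable_pbh(A, np.array([[0.0, 1.0]])))   # False
```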
9.3 Asymptotic Properties of the Kalman Filter
For constant A, H, V, and W, the KF is still time-varying:

P_p(k) = A P_m(k−1) A^T + V
K(k) = P_p(k) H^T (H P_p(k) H^T + W)^{−1}
P_m(k) = (I − K(k)H) P_p(k).
In the following we examine the estimation error e(k) = x(k) − x̂_m(k) as k → ∞. We already know that the filter is unbiased, i.e. E[e(k)] = 0 for all k if x̂_m(0) = x0. We now consider the variance P_p(k) and combine the equations above:

P_p(k+1) = A P_p(k) A^T + V − A P_p(k) H^T (H P_p(k) H^T + W)^{−1} H P_p(k) A^T.    (9.4)
• It can be shown that observability is sufficient for guaranteeing that the variance is bounded: if (A, H) is observable, then there exists a c > 0 so that lim_{k→∞} P_p(k) < cI (see the iteration sketch after this list).
• Intuition: all states are observable through the measurement z(k); that is, the KF can obtain some information about all states. Since the KF is optimal, it will use this information, and the uncertainty in estimating the states (captured by the variance) will not grow unbounded.
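To see this numerically, one can iterate the recursion (9.4) directly. A minimal sketch, with placeholder matrices chosen so that (A, H) is observable:

```python
import numpy as np

# Iterate the variance recursion (9.4); the matrices are placeholders
# with (A, H) observable, so P_p(k) should remain bounded and converge.
def riccati_step(P, A, H, V, W):
    S = H @ P @ H.T + W                      # innovation covariance
    return A @ P @ A.T + V - A @ P @ H.T @ np.linalg.solve(S, H @ P @ A.T)

A = np.array([[2.0, 1.0], [0.0, 0.5]])
H = np.array([[1.0, 0.0]])
V, W = np.eye(2), np.eye(1)

P = np.eye(2)                                # some initial P_p(1)
for _ in range(100):
    P = riccati_step(P, A, H, V, W)
print(P)                                     # approximately P_infinity
```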
Examples

Consider the system (n = 2, m = 1)

x(k) = A x(k−1) + v(k−1),   x(0) ∼ N(0, P0),  v(k−1) ∼ N(0, qI),
z(k) = H x(k) + w(k),       w(k) ∼ N(0, 1).

(See script ObservabilityExamples.py.)

Case 1:

A = [2 1; 0 0.5],  H = [1 0]  ⇒  O = [H; HA] = [1 0; 2 1]
P0 = I,  q = 1

The observability matrix has full rank, so the variance is bounded. Furthermore, we note that the variance converges to a steady value, lim_{k→∞} P_p(k) = P∞.
Case 2:

A = [2 1; 0 0.5],  H = [0 1]  ⇒  O = [H; HA] = [0 1; 0 0.5]
P0 = I,  q = 1

The observability matrix does not have full rank and P_p(k) diverges. The state x1(k) is not observable and the corresponding error variance grows unbounded: Var[e_1(k)] = P_p^{(1,1)}(k) → ∞ as k → ∞.

Case 3:

A = [0.5 1; 0 2],  H = [0 1]  ⇒  O = [H; HA] = [0 1; 0 2]
P0 = I,  q = 1

Even though the observability matrix does not have full rank, lim_{k→∞} P_p(k) = P∞. This shows that observability is stronger than needed. Indeed, there is a weaker condition that guarantees that the KF variance does not diverge: detectability.
Case 4:

A = [0 1; −2 0],  H = [0 1]  ⇒  O = [H; HA] = [0 1; −2 0]
P0 = diag(0, 1),  q = 0

The observability matrix has full rank, and the variance remains bounded. However, the variance does not converge, and we notice that it oscillates between two values.
Case 5:

A = [0 1; −2 0],  H = [0 1]  ⇒  O = [H; HA] = [0 1; −2 0]
P0 = diag(1, 1),  q = 0

This is identical to Case 4, except for the initial variance P0. We note that now the variance does converge.
Case 6:

A = [0 1; −2 0],  H = [0 1]  ⇒  O = [H; HA] = [0 1; −2 0]
P0 = diag(0, 1),  q = 10^{−12}

This is identical to Case 4, except that we have process noise (with exceptionally small variance). We note that now the variance converges, and in fact it seems to converge to the same values as in Case 5. This indicates that convergence of the variance goes beyond the system's observability properties, and places some demands on the uncertainty in the system.
9.4 Detectability
In words, a system is detectable if all its unstable modes are observable.
Detectability conditions

The pair (A, H) is detectable
⇔ [A − λI; H] is full rank for all λ ∈ ℂ with |λ| ≥ 1 (PBH-Test).

Furthermore, if (A, H) is not observable, then there exists a state transformation T such that

T A T^{−1} = [A11 0; A21 A22],   H T^{−1} = [H1 0],   with (A11, H1) observable.

Then: (A, H) is detectable ⇔ A22 is stable (all eigenvalues have magnitude less than 1).

• The main idea behind detectability: Assume that there exists a λ with |λ| ≥ 1 such that [A − λI; H] is not full column rank. Then there exists a vector v, v ≠ 0, such that

(A − λI)v = 0, Hv = 0  ⇔  Av = λv, Hv = 0.

That is, the vector v is a natural mode of the system that does not decay and that is not seen in the output of the system.

• Detectability is weaker than observability: (A, H) observable ⇒ (A, H) detectable.

• It can be shown that if (A, H) is detectable, then P_p(k) is bounded‡. (A code sketch of the detectability test follows below.)

‡See W. W. Hager and L. L. Horowitz, “Convergence and stability properties of the discrete Riccati operator equation and the associated optimal control and filtering problems,” SIAM Journal on Control and Optimization, vol. 14, no. 2, pp. 295–312, 1976, Lemma 1.
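Here is a minimal sketch of the detectability PBH-Test (the helper name is_detectable_pbh is our own); Case 3 from Section 9.3 serves as an example of a pair that is unobservable but detectable.

```python
import numpy as np

# PBH detectability test: only eigenvalues of A with |lambda| >= 1 need
# to be checked against the stacked matrix [A - lambda*I; H].
def is_detectable_pbh(A, H):
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1:
            M = np.vstack([A - lam * np.eye(n), H])
            if np.linalg.matrix_rank(M) < n:
                return False
    return True

# Case 3: the stable mode (eigenvalue 0.5) is unobserved, so the pair is
# not observable, but it is detectable.
A = np.array([[0.5, 1.0], [0.0, 2.0]])
H = np.array([[0.0, 1.0]])
print(is_detectable_pbh(A, H))               # True
```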
9.5 The Steady-State Kalman Filter
Motivation: if the KF variance converges, then so does the KF gain: lim_{k→∞} K(k) = K∞. Using the constant gain K∞ instead of the time-varying gain K(k) simplifies the implementation of the filter (there is no need to compute or store K(k)). This filter is called the Steady-State KF.
Computation of K∞
Assume P_p(k) converges to P∞. Then, (9.4) reads

P∞ = A P∞ A^T + V − A P∞ H^T (H P∞ H^T + W)^{−1} H P∞ A^T.

• This is an algebraic equation in P∞, called the Discrete Algebraic Riccati Equation (DARE).

• Efficient methods exist for solving it (under certain assumptions on the problem parameters, see below). Python implementation: scipy.linalg.solve_discrete_are(A.T, H.T, V, W); Matlab implementation: dare(A',H',V,W).

• The steady-state KF gain then is: K∞ = P∞ H^T (H P∞ H^T + W)^{−1}. A sketch putting these pieces together follows below.
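A minimal sketch (using the Case 1 matrices from Section 9.3, with q = 1) that solves the DARE, forms K∞, and checks that the resulting error dynamics are stable:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Steady-state KF quantities for the Case 1 example system.
A = np.array([[2.0, 1.0], [0.0, 0.5]])
H = np.array([[1.0, 0.0]])
V, W = np.eye(2), np.eye(1)

P_inf = solve_discrete_are(A.T, H.T, V, W)             # DARE solution
K_inf = P_inf @ H.T @ np.linalg.inv(H @ P_inf @ H.T + W)

# Sanity check: the steady-state error dynamics (I - K_inf H) A are stable.
eigs = np.linalg.eigvals((np.eye(2) - K_inf @ H) @ A)
print(np.all(np.abs(eigs) < 1))                        # True
```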
Steady-state estimator

The steady-state KF equations with x̂(k) := x̂_m(k):

x̂(k) = (I − K∞H)A x̂(k−1) + (I − K∞H) u(k−1) + K∞ z(k)
     = Â x̂(k−1) + B̂ u(k−1) + K∞ z(k),

a linear time-invariant system.
Estimation error:
e(k) = x(k) − x̂(k) = (I − K∞H)A e(k−1) + (I − K∞H) v(k−1) − K∞ w(k).
• Stability is important: we want (I − K∞H)A to be stable (i.e. all eigenvalues inside the unit circle) for the error not to diverge.
• Mean: E[e(k)] = (I − K∞H)A E[e(k−1)]. Consider the case where E[e(0)] = x0 − x̂(0) ≠ 0 (we may not know x0 = E[x(0)]). We have: E[e(k)] → 0 as k → ∞ for any initial E[e(0)] if and only if (I − K∞H)A is stable.

What can go wrong?
• P_p(k) does not converge as k → ∞.

• P_p(k) converges, but to different solutions for different initial P_p(1). This is not desirable – which one should we use to compute K∞?

• (I − K∞H)A is unstable.
All these are addressed by the following theorem. First, we introduce the concept of stabilizability: Consider the system
x(k) = Ax(k−1) + Bu(k−1).
This system (A, B) is said to be stabilizable if there exists a matrix F such that A − BF is stable. It can be shown that stabilizability is the dual of detectability: (A, B) is stabilizable ⇔ (AT , BT ) is detectable.
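By this duality, a stabilizability check reduces to a detectability check; the following one-liner reuses is_detectable_pbh from the sketch in Section 9.4.

```python
import numpy as np

# (A, B) is stabilizable iff (A^T, B^T) is detectable; reuses the
# is_detectable_pbh helper from the Section 9.4 sketch.
def is_stabilizable(A, B):
    return is_detectable_pbh(A.T, B.T)

# E.g., B = I makes any system stabilizable (every mode can be actuated):
A = np.array([[2.0, 1.0], [0.0, 0.5]])
print(is_stabilizable(A, np.eye(2)))         # True
```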
Theorem§
Assume W > 0 and V ≥ 0, and let G be any matrix such that V = GG^T (this can always be done for symmetric positive semidefinite matrices). Remark: we can now write the system dynamics as

x(k) = A x(k−1) + u(k−1) + G v̄(k−1),

where E[v̄] = 0 and Var[v̄] = I.
Then the following two statements are equivalent:
1. (A,H) is detectable and (A,G) is stabilizable.
§Adapted from D. Simon, Optimal state estimation: Kalman, H infinity, and nonlinear approaches. John Wiley & Sons, 2006, p. 196 and B. D. Anderson and J. B. Moore, Optimal filtering. Dover Publications, 2005. See these and references therein for proofs.
2. The DARE has a unique positive semidefinite solution P∞ ≥ 0, the resulting (I − K∞H)A is stable, and

   lim_{k→∞} P_p(k) = P∞ for any initial P_p(1) ≥ 0 (and, hence, any P_m(0) = P0 ≥ 0).
Interpretation of the two conditions:
• (A,H) is detectable: can observe all unstable modes.
• (A, G) is stabilizable: noise excites unstable modes. Note that if V > 0 then (A, G) is always stabilizable.
• Examples where one of these is not satisfied will be covered in the discussion.
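Statement 2 can be illustrated numerically: with the Case 1 matrices (which satisfy both conditions, since (A, H) is observable and V > 0), the variance recursion converges to the unique DARE solution from very different initializations. A minimal sketch:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# With (A, H) detectable and (A, G) stabilizable, the recursion (9.4)
# converges to the unique DARE solution for any P_p(1) >= 0.
A = np.array([[2.0, 1.0], [0.0, 0.5]])
H = np.array([[1.0, 0.0]])
V, W = np.eye(2), np.eye(1)                  # V > 0, so stabilizability holds

P_inf = solve_discrete_are(A.T, H.T, V, W)
for P in (np.zeros((2, 2)), 100.0 * np.eye(2)):      # two initial P_p(1)
    for _ in range(200):
        S = H @ P @ H.T + W
        P = A @ P @ A.T + V - A @ P @ H.T @ np.linalg.solve(S, H @ P @ A.T)
    print(np.allclose(P, P_inf))             # True for both initializations
```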
9.6 Remarks
• The KF is the optimal state estimator (for a linear system and Gaussian distributions) irrespective of whether the system is observable, detectable, or not detectable. Recall: the KF exactly computes the conditional PDF of the state, from which we then extract an optimal estimate (e.g. MAP or MMSE). Loosely speaking then, the KF does the best possible, even if the measurements do not provide sufficient information (that is, even if the variance grows unbounded, the KF estimate is still optimal in the MMSE/MAP/etc. sense). This is why observability/detectability was not discussed when we derived the Kalman filter.
If the noise is not Gaussian, a similar argument holds – the KF is the best linear unbiased estimator, even if the system is unobservable.
• Observability and detectability are properties of the system, and not of the estimation algorithm. Hence, they cannot be altered by using a different state estimation algorithm, but only by modifying the system (for example, by placing an additional sensor).
• Observability and detectability can also be defined for time-varying or nonlinear systems. However, conditions for checking them are usually not as straightforward as for the linear time-invariant case.