10. The Extended Kalman Filter
We discuss the Extended Kalman Filter (EKF) as an extension of the KF to nonlinear systems. The EKF is derived by linearizing the nonlinear system equations about the latest state estimate.
We consider the nonlinear discrete-time system
x(k) = qk−1(x(k−1), u(k−1), v(k−1)),    E[x(0)] = x0, Var[x(0)] = P0,    (10.1)
                                        E[v(k−1)] = 0, Var[v(k−1)] = V(k−1),
z(k) = hk(x(k), w(k)),                  E[w(k)] = 0, Var[w(k)] = W(k),    (10.2)
for k = 1, 2, . . . , and where x(0), {v(·)}, and {w(·)} are mutually independent. We assume that qk−1 is continuously differentiable with respect to x(k−1) and v(k−1), and that hk is continuously differentiable with respect to x(k) and w(k).
The key idea in the derivation of the EKF is simple: in order to obtain a state estimate for the nonlinear system above, we linearize the system equations about the current state estimate, and we then apply the (standard) KF prior and measurement update equations to the linearized equations.
10.0.1 Process update
Assume we have computed xˆm(k−1) and Pm(k−1) as (approximations of) the conditional mean and variance of the state x(k−1) given the measurements z(1:k−1). Linearizing (10.1)
about x(k−1) = xˆm(k−1) and v(k−1) = E [v(k−1)] = 0 yields

x(k) ≈ qk−1(xˆm(k−1), u(k−1), 0)
       + ∂qk−1/∂x (xˆm(k−1), u(k−1), 0) · (x(k−1) − xˆm(k−1))
       + ∂qk−1/∂v (xˆm(k−1), u(k−1), 0) · v(k−1)
     = A(k−1)x(k−1) + L(k−1)v(k−1) + qk−1(xˆm(k−1), u(k−1), 0) − A(k−1)xˆm(k−1)
     =: A(k−1)x(k−1) + v̄(k−1) + ξ(k−1),

with A(k−1) := ∂qk−1/∂x (xˆm(k−1), u(k−1), 0), L(k−1) := ∂qk−1/∂v (xˆm(k−1), u(k−1), 0), v̄(k−1) := L(k−1)v(k−1), and ξ(k−1) := qk−1(xˆm(k−1), u(k−1), 0) − A(k−1)xˆm(k−1). Here, ξ(k−1) is treated as a known input, and the process noise v̄(k−1) has zero mean and variance Var[v̄(k−1)] = L(k−1)V(k−1)LT(k−1). We can now apply the KF prior update equations to the linearized process equation:
xˆp(k) = A(k−1)xˆm(k−1) + ξ(k−1)
= qk−1(xˆm(k−1), u(k−1), 0) (by substituting ξ(k−1))
Pp(k) = A(k−1)Pm(k−1)AT(k−1) + L(k−1)V(k−1)LT(k−1).
Intuition: predict the mean state estimate forward using the nonlinear process model and
update the variance according to the linearized equations.
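As a concrete illustration, here is a minimal NumPy sketch of this prior update. It is a sketch under our own conventions, not part of the notes: the model function q(x, u, v) and the finite-difference Jacobian helper are illustrative assumptions (analytic Jacobians would normally be preferred where available).

```python
import numpy as np

def numerical_jacobian(f, x0, eps=1e-6):
    """Finite-difference Jacobian of f evaluated at x0 (illustrative helper)."""
    x0 = np.asarray(x0, dtype=float)
    f0 = np.atleast_1d(f(x0))
    J = np.zeros((f0.size, x0.size))
    for i in range(x0.size):
        dx = np.zeros(x0.size)
        dx[i] = eps
        J[:, i] = (np.atleast_1d(f(x0 + dx)) - f0) / eps
    return J

def ekf_prior_update(q, x_m, P_m, u, V):
    """EKF prediction step (S1): propagate the mean through the nonlinear
    model q, and the variance through its linearization about (x_m, u, 0)."""
    v0 = np.zeros(V.shape[0])
    A = numerical_jacobian(lambda x: q(x, u, v0), x_m)   # dq/dx at (x_m, u, 0)
    L = numerical_jacobian(lambda v: q(x_m, u, v), v0)   # dq/dv at (x_m, u, 0)
    x_p = q(x_m, u, v0)                   # mean: nonlinear process model
    P_p = A @ P_m @ A.T + L @ V @ L.T     # variance: linearized model
    return x_p, P_p
```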
10.0.2 Measurement update
We linearize (10.2) about x(k) = xˆp(k) and w(k) = E [w(k)] = 0:
z(k) ≈ hk(xˆp(k), 0)
       + ∂hk/∂x (xˆp(k), 0) · (x(k) − xˆp(k))
       + ∂hk/∂w (xˆp(k), 0) · w(k)
     = H(k)x(k) + M(k)w(k) + hk(xˆp(k), 0) − H(k)xˆp(k)
     =: H(k)x(k) + w̄(k) + ζ(k),

with H(k) := ∂hk/∂x (xˆp(k), 0), M(k) := ∂hk/∂w (xˆp(k), 0), w̄(k) := M(k)w(k), and ζ(k) := hk(xˆp(k), 0) − H(k)xˆp(k). The noise w̄(k) has zero mean and variance Var[w̄(k)] = M(k)W(k)MT(k). Compared to the measurement equation that we used in the derivation of the KF, there is the additional term ζ(k), which is known. It is straightforward to extend the KF measurement update to this case (for example, by introducing the auxiliary measurement z̄(k) := z(k) − ζ(k)).
Applying the KF measurement update to the linearized measurement equation yields:

K(k) = Pp(k)HT(k) (H(k)Pp(k)HT(k) + M(k)W(k)MT(k))⁻¹
xˆm(k) = xˆp(k) + K(k) (z(k) − H(k)xˆp(k) − ζ(k))
        = xˆp(k) + K(k) (z(k) − hk(xˆp(k), 0))    (by substituting ζ(k))
Pm(k) = (I − K(k)H(k)) Pp(k).
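A matching NumPy sketch of this measurement update, reusing the numerical_jacobian helper from the prediction sketch above (again, the names and the finite-difference Jacobians are our own illustrative choices):

```python
def ekf_measurement_update(h, x_p, P_p, z, W):
    """EKF correction step (S2): gain and variance from the linearized
    measurement model, mean corrected with the nonlinear prediction h(x_p, 0)."""
    w0 = np.zeros(W.shape[0])
    H = numerical_jacobian(lambda x: h(x, w0), x_p)   # dh/dx at (x_p, 0)
    M = numerical_jacobian(lambda w: h(x_p, w), w0)   # dh/dw at (x_p, 0)
    S = H @ P_p @ H.T + M @ W @ M.T                   # innovation covariance
    K = P_p @ H.T @ np.linalg.inv(S)                  # Kalman gain
    x_m = x_p + K @ (z - h(x_p, w0))                  # z(k) - hk(x_p(k), 0)
    P_m = (np.eye(len(x_p)) - K @ H) @ P_p
    return x_m, P_m
```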
Intuition: correct for the mismatch between the actual measurement z(k) and its nonlinear prediction hk(xˆp(k), 0).

10.0.3 Summary
The discrete-time EKF equations are given by:
Initialization: xˆm(0) = x0, Pm(0) = P0.
Step 1 (S1): Prior update/Prediction step
xˆp(k) = qk−1(xˆm(k−1), u(k−1), 0)
Pp(k) = A(k−1)Pm(k−1)AT(k−1) + L(k−1)V(k−1)LT(k−1)
where
A(k−1) := ∂qk−1/∂x (xˆm(k−1), u(k−1), 0) and L(k−1) := ∂qk−1/∂v (xˆm(k−1), u(k−1), 0).
Step 2 (S2): A posteriori update/Measurement update step
K(k) = Pp(k)HT(k) (H(k)Pp(k)HT(k) + M(k)W(k)MT(k))⁻¹
xˆm(k) = xˆp(k) + K(k) (z(k) − hk(xˆp(k), 0))
Pm(k) = (I − K(k)H(k)) Pp(k)
where
H(k) := ∂hk/∂x (xˆp(k), 0) and M(k) := ∂hk/∂w (xˆp(k), 0).
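Putting S1 and S2 together, a sketch of the full recursion built from the two helpers above (the sequence handling is our own convention, not from the notes):

```python
def run_ekf(q, h, x0, P0, u_seq, z_seq, V, W):
    """Run the discrete-time EKF: initialize, then alternate S1 and S2."""
    x_m, P_m = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    history = []
    for u, z in zip(u_seq, z_seq):                            # k = 1, 2, ...
        x_p, P_p = ekf_prior_update(q, x_m, P_m, u, V)        # S1
        x_m, P_m = ekf_measurement_update(h, x_p, P_p, z, W)  # S2
        history.append((x_m.copy(), P_m.copy()))
    return history
```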
10.0.4 Remarks
• The matrices A(k−1), L(k−1), H(k), and M(k) are obtained from linearizing the system equations about the current state estimate (which depends on real-time measurements).
Hence, the EKF gains cannot be computed off-line, even if the model and noise distributions are known for all k.
• If the actual state and noise values are close to the values that we linearize about (i.e. if x(k−1) − xˆm(k−1), v(k−1), x(k) − xˆp(k), and w(k) are all close to zero), then the linearization is a good approximation of the actual nonlinear dynamics. This assumption may, however, be a poor one: in the case of Gaussian noise, for example, the above quantities are not guaranteed to be small, since the noise is actually unbounded.
• The EKF variables xˆp(k), xˆm(k), Pp(k), and Pm(k) do not capture the true conditional mean and variance of x(k) (let alone the full conditional PDF). They are only approximations of the mean and variance.
For example, in the prior update, xˆp(k) would only accurately capture the mean update if the expected value operator E [·] and qk−1 commuted; that is, if
E[qk−1(x(k−1), u(k−1), v(k−1))] = qk−1(E[x(k−1)], E[u(k−1)], E[v(k−1)]).
This is not true for a general nonlinear function qk−1, and may be a very poor assumption in the case of strong nonlinearities (it holds, however, for linear qk−1); a small numerical check is given at the end of these remarks.
Even though the EKF variables do not capture the true conditional mean and variance,
they are often still referred to as the prior/posterior mean and variance.
• Despite the fact that the EKF is a (possibly crude) approximation of the Bayesian state estimator and general convergence guarantees cannot be given, the EKF has proven to work well in many practical applications. As a rule of thumb, an EKF often works well for (mildly) nonlinear systems with unimodal distributions.
Solving the Bayesian state estimation problem for a general nonlinear system is often computationally intractable. Hence, the EKF may be seen as a computationally tractable approximation (trade-off: tractability vs. accuracy).
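As the numerical check promised above: a few lines of NumPy (our own example, not from the notes) comparing E[q(x)] with q(E[x]) for the nonlinearity q(x) = 2 arctan(x) used in the example of the next section; the choice of distribution N(4, 1) for x is ours.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(4.0, 1.0, size=1_000_000)   # x ~ N(4, 1)

q = lambda x: 2.0 * np.arctan(x)
print(np.mean(q(x)))   # Monte Carlo estimate of E[q(x)]
print(q(np.mean(x)))   # q(E[x]): noticeably larger, since arctan is concave here
```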
10.1 Example of a “nice” system with poor performance
Consider the following scalar system, with the functions

x(k) = q(x(k−1), v(k−1)) = 2 arctan(x(k−1) + v(k−1))
z(k) = h(x(k), w(k)) = x(k) + w(k)
and x(0) ∼ N (x0,1), v(k−1) ∼ N (0,0.1), and w(k) ∼ N (0,10).
We notice that the measurement is linear, and that the dynamics are smooth (infinitely differentiable):
∂q(x,v)/∂x = 2 / ((x+v)² + 1) = ∂q(x,v)/∂v.    (10.3)
The process uncertainty is small, and the measurement uncertainty large. The dynamics are illustrated below:
[Figure: x(k) = q(x(k−1), 0) plotted against x(k−1) over [−6, 6], together with the line x(k) = x(k−1); the intersections of the two curves are the equilibria of the deterministic system.]
The deterministic system (setting v(k−1) = 0) has three equilibria: at x = 0 and at x ≈ ±2.33. The equilibrium at zero is unstable; the other two are stable.
We can run the EKF algorithm on generated data many times to get a feeling for the estimator’s performance. We can compare the EKF performance for x0 = 4 and for x0 = 0.
We note, for x0 = 4: the EKF does a good job of tracking the state (note that the 1σ band applies only to one run of the EKF, the last one we did, not to all runs). The true state clusters around the stable equilibrium at x ≈ 2.33.
For x0 = 0: the EKF does not do a good job of tracking the state. Usually, it’s OK, but sometimes the true state goes to one equilibrium, and the estimator to the other. Note
• this is true even though we can measure the state,
• it doesn’t matter how long we run the filter; it doesn’t correct itself,
• there is no model mismatch.
If we look at a plot of the true state and the estimate for such a case, we see the problem: the EKF estimate initially moved in the opposite direction from the true state (perhaps
due to process noise). At the same time, the EKF is overly confident of its estimate (as can be seen by the small covariance). In this case, we would say that the EKF has diverged.
See the script EKF_UKF_Example.m.
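The referenced script is in MATLAB; the following is a minimal Python re-sketch of the experiment (our own code, using the noise levels and initial conditions from above and the analytic Jacobian (10.3)):

```python
import numpy as np

rng = np.random.default_rng(1)
V, W = 0.1, 10.0          # process / measurement noise variances
x0_mean, P0 = 0.0, 1.0    # try x0_mean = 4.0 for the well-behaved case
N = 500                   # number of time steps

# simulate the true system and the measurements
x_true = np.empty(N + 1)
x_true[0] = rng.normal(x0_mean, np.sqrt(P0))
for k in range(1, N + 1):
    x_true[k] = 2.0 * np.arctan(x_true[k - 1] + rng.normal(0.0, np.sqrt(V)))
z = x_true[1:] + rng.normal(0.0, np.sqrt(W), size=N)

# scalar EKF: h is linear (H = M = 1), and A = L = 2 / (x^2 + 1) by (10.3)
x_m, P_m, x_est = x0_mean, P0, []
for k in range(N):
    A = 2.0 / (x_m**2 + 1.0)
    x_p = 2.0 * np.arctan(x_m)        # S1: nonlinear mean prediction
    P_p = A * P_m * A + A * V * A     # S1: linearized variance (L = A)
    K = P_p / (P_p + W)               # S2: scalar Kalman gain
    x_m = x_p + K * (z[k] - x_p)      # S2: h(x_p, 0) = x_p
    P_m = (1.0 - K) * P_p
    x_est.append(x_m)
```

Running this repeatedly with x0_mean = 0 should occasionally reproduce the failure described above: the estimate locks onto one stable equilibrium while the true state settles at the other.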
10.1.1 Arbitrarily large errors under linearization
The underlying assumption we make, when using the EKF, is that we can push a random variable through an ‘approximately’ linear function and use the linearization to compute the resulting random variable’s statistics. Here we want to look at an example which shows how wrong this can be.
Consider the transformation

y = q(x) = x + 10⁻⁶ x⁴
with E[x] = 0 and Var[x] = 0.01.
We note that x is zero-mean, with very small standard deviation, and that the nonlinearity in q is extremely mild (if x is small, x⁴ is very small and 10⁻⁶x⁴ is surely negligible). We might thus feel confident that we can use a first-order approximation to compute the statistics of y, and get:
E[y] ≈ q(E[x]) = 0
Consider now the following PDF for x:
f(x) = 20 / (π (1 + 100x²)²),
which we can verify is a valid PDF and that x has the desired expectation and variance.
We can use the law of the unconscious statistician to compute the true expectation of y:

E[y] = ∫_{−∞}^{∞} q(x) f(x) dx
     = 10⁻⁶ ∫_{−∞}^{∞} x⁴ f(x) dx          (the term ∫ x f(x) dx vanishes by symmetry)
     = (20 · 10⁻⁶/π) ∫_{−∞}^{∞} x⁴/(1 + 100x²)² dx
     = (20 · 10⁻⁶/π) [ x/10000 + x/(2000000x² + 20000) − 3 arctan(10x)/200000 ]_{−∞}^{∞}
     = ∞.
In other words, our first-order estimate of the mean is infinitely wrong.
What’s going on? The distribution for x is related to the Cauchy distribution we saw earlier in the course. In this case, x has finite first and second moments, but its higher moments are undefined or infinite, which means that the higher-order terms in the series expansion of the nonlinear transformation q are also undefined or infinite.
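To see the divergence numerically, one can truncate the integral at ±T and let T grow; a small NumPy sketch (our own, not part of the notes):

```python
import numpy as np

f = lambda x: 20.0 / (np.pi * (1.0 + 100.0 * x**2) ** 2)   # the PDF above
g = lambda x: 1e-6 * x**4 * f(x)   # integrand of E[y]; the x-term drops by symmetry

for T in (1e2, 1e4, 1e6):
    x = np.linspace(0.0, T, 1_000_001)
    dx = x[1] - x[0]
    approx = 2.0 * np.sum(g(x)) * dx   # crude quadrature of 2 * int_0^T g(x) dx
    print(f"T = {T:.0e}: truncated E[y] ~ {approx:.2e}")
```

The truncated estimate grows roughly linearly in T (about 1.3e-9 · T for large T), confirming that the integral, and hence E[y], is unbounded.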