12. The Particle Filter
The Particle Filter (PF) is an approximation of the Bayesian state estimator for a general nonlinear system and general noise distributions. The basic idea is to approximately represent the state PDF by a (large) number of samples, which are called particles. Essentially, in a region where the PDF takes large values, there is a large number of particles. The basic recursive update mechanism is as follows: the particles are propagated through the process model; then, the particles are weighted according to the measurement likelihood; and, finally, a resampling step generates a new set of particles, again with equal weights.
12.1 Model
We consider the nonlinear discrete-time system
  x(k) = q_{k−1}(x(k−1), v(k−1)),  k = 1, 2, …,
  z(k) = h_k(x(k), w(k)),
where x(0), {v(·)}, and {w(·)} are independent and can be discrete or continuous random variables with known PDFs.
12.2 Monte Carlo Sampling
Monte Carlo sampling refers to the basic technique of using a large number of samples (called particles) to approximate the PDF of a random variable. It is beneficial, for example, for computing an approximation of the PDF of a function of the random variable: instead of using change of variables, the function is simply evaluated at the particles. Monte Carlo sampling is the underlying idea that is used in the PF.
We have been using Monte Carlo sampling informally throughout the course, and will now make it precise.
Discrete random variables
Consider a discrete random variable y, with a finite outcome space Y = {1, 2, 3, …, Y}, and an associated PDF f_y(y). We draw N samples from this PDF: {y^1, y^2, …, y^N}, where the samples are independent.
• For n ∈ {1, 2, …, N} and i = 1, …, Y, let
  p_i^n := 1 if y^n = i, and p_i^n := 0 otherwise;
  that is, p_i^n indicates if y^n = i (when p_i^n = 1) or not (when p_i^n = 0). In a more compact form, we write
  p_i^n = δ(y^n − i),
  where δ(·) is the Kronecker delta function: δ(x) = 1 if x = 0, and δ(x) = 0 otherwise.

• Let p̄_i denote the sample average of p_i^n over the N samples (i.e. the fraction of samples that take the value i):
  p̄_i := (1/N) Σ_{n=1}^{N} p_i^n = (1/N) Σ_{n=1}^{N} δ(y^n − i) = (# samples taking value i) / N.
  By the law of large numbers (LLN), the sample average converges* to the expected value of p_i^n; that is,
  p̄_i → E[p_i^n] = Σ_{y^n=1}^{Y} δ(y^n − i) f_y(y^n) = f_y(i)  for N → ∞,
  where we used the law of the unconscious statistician.

*Here convergence is meant in the sense of almost sure convergence, i.e. converging with probability 1.
• Conclusion: the PDF of a random variable can be approximated by repeated sampling of a distribution.
Now consider a function of y ∈ Y: x = g(y). We have already discussed in class how to construct fx(x) from fy(y) (change of variables). Now, we are interested in approximating fx(x) by a set of samples.
• Let X = g(Y). For each j ∈ X, let Y_j = {y_{j,i}} be the set of all y_{j,i} ∈ Y such that g(y_{j,i}) = j.

• Let x^n := g(y^n) and
  r_j^n := 1 if x^n = g(y^n) = j, and r_j^n := 0 otherwise.

• Let
  r̄_j := (1/N) Σ_{n=1}^{N} r_j^n = (# of x-samples taking value j) / N.
  By the LLN,
  r̄_j → E[r_j^n] for N → ∞, and E[r_j^n] = Prob(x^n = j) = Prob(y^n ∈ Y_j) = Σ_{y_{j,i} ∈ Y_j} f_y(y_{j,i}).
  But Σ_{y_{j,i} ∈ Y_j} f_y(y_{j,i}) = f_x(j), as proven in class (see section on change of variables).

• Hence,
  f_x(j) ≈ (1/N) Σ_{n=1}^{N} r_j^n = (1/N) Σ_{n=1}^{N} δ(x^n − j),
  and we can approximate f_x(x) by the samples x^n = g(y^n). Instead of solving the change of variables, we simply need to evaluate the function g(·) at all samples y^n.
Extensions:
• The above is also true for joint discrete random variables:
  f_x(ξ) ≈ (1/N) Σ_{n=1}^{N} δ(ξ − x^n),
  where now ξ and x^n are vectors, and δ(·) refers to the vector version of the Kronecker delta (that is, δ is one if all entries of its vector argument are zero, otherwise δ is zero).
• We have only considered finite X and Y, but the above holds for the infinite case as well.
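As an illustration, the following Python sketch (NumPy assumed; the outcome space, PMF, and function g below are made-up illustrative choices, not part of the derivation above) compares the Monte Carlo approximation of f_x(j) with the exact change-of-variables result:

```python
import numpy as np

# Monte Carlo approximation of the PMF of x = g(y) for a discrete random variable y.
rng = np.random.default_rng(0)

Y_outcomes = np.arange(1, 7)                      # outcome space Y = {1,...,6} (illustrative)
f_y = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.2])    # PMF f_y(y) (illustrative)
g = lambda y: y % 3                               # some function x = g(y) (illustrative)

N = 100_000
y_samples = rng.choice(Y_outcomes, size=N, p=f_y)  # draw N samples y^n from f_y
x_samples = g(y_samples)                           # evaluate g at each sample

# The fraction of x-samples taking the value j approximates f_x(j).
for j in np.unique(g(Y_outcomes)):
    f_x_mc = np.mean(x_samples == j)               # (1/N) * sum_n delta(x^n - j)
    f_x_exact = f_y[g(Y_outcomes) == j].sum()      # exact result via change of variables
    print(f"j={j}: Monte Carlo {f_x_mc:.4f}, exact {f_x_exact:.4f}")
```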
Continuous random variables
Consider the continuous random variable y with PDF fy(y). Let x = g(y). How can we approximate fx(x)?
• Let {y^n} be repeated samples from f_y(y), and let x^n := g(y^n). Fix ∆x (bin size). Define
  r_x^n := 1 if x ≤ x^n ≤ x + ∆x, and r_x^n := 0 otherwise.
  Then
  Prob(x ≤ x ≤ x + ∆x) ≈ (1/N) Σ_{n=1}^{N} r_x^n =: r̄_x.

• Since (for small ∆x and smooth f_x(x))
  Prob(x ≤ x ≤ x + ∆x) = ∫_x^{x+∆x} f_x(ξ) dξ ≈ f_x(ξ) ∆x  for x ≤ ξ ≤ x + ∆x,
  we have
  f_x(ξ) ≈ r̄_x / ∆x,  x ≤ ξ ≤ x + ∆x.

Typically, we use the informal notation
  f_x(ξ) ≈ (1/N) Σ_{n=1}^{N} δ(ξ − x^n),
with δ(·) the Dirac delta pulse. In particular, note that
  ∫_x^{x+∆x} (1/N) Σ_{n=1}^{N} δ(ξ − x^n) dξ = (1/N) Σ_{n=1}^{N} r_x^n = r̄_x ≈ Prob(x ≤ x ≤ x + ∆x).
The advantage of this notation is that it is the same for discrete and continuous random variables (where δ(·) is the Kronecker delta function or the Dirac delta pulse, respectively).
How to parse this: the number of particles near a certain point x (that is, the sample density) is an approximation of the PDF at the point x. The approximation becomes exact as N → ∞ in the sense of the LLN.
Example: We approximate a Gaussian distribution with zero mean and variance one by repeatedly drawing samples from this distribution. The graphs below show the result for different numbers of samples N and different bin sizes ∆x. The exact value of the PDF is shown as a line, and the approximated value obtained from Monte Carlo sampling is shown as bars.
[Figure: Monte Carlo approximation of the standard Gaussian PDF. Four panels plot f_x(x) against x for (Samples = 100, Bin size = 0.4), (Samples = 1000, Bin size = 0.2), (Samples = 1000, Bin size = 0.4), and (Samples = 10000, Bin size = 0.2); the exact PDF is shown as a line and the Monte Carlo approximation as bars.]
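Figures of this kind can be reproduced with a short script along the following lines (a sketch assuming Python with NumPy and Matplotlib; the sample counts and bin sizes match the figure panels):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def mc_pdf_approximation(N, bin_size, ax):
    """Approximate the standard Gaussian PDF by N Monte Carlo samples."""
    samples = rng.standard_normal(N)                  # draw N samples x^n
    bins = np.arange(-4.0, 4.0 + bin_size, bin_size)  # bins of width Delta x
    # density=True divides the counts by (N * Delta x), i.e. r_x / Delta x.
    ax.hist(samples, bins=bins, density=True, label="Monte Carlo")
    xs = np.linspace(-4, 4, 400)
    ax.plot(xs, np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi), label="exact PDF")
    ax.set_title(f"Samples = {N}, Bin size = {bin_size}")

fig, axes = plt.subplots(2, 2)
for ax, (N, dx) in zip(axes.flat, [(100, 0.4), (1000, 0.2), (1000, 0.4), (10000, 0.2)]):
    mc_pdf_approximation(N, dx, ax)
plt.show()
```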
12.3 Recall Bayesian State Estimation
The Bayesian tracking algorithm (also called Bayesian state estimator) computes f(x(k)|z(1:k)). The objective of the PF is to approximate f(x(k)|z(1:k)).
We introduce auxiliary variables, just like we did for the Kalman filter. We define new random variables, x_p(k), x_m(k), and z_m(k), and we distinguish again between the random variable z(k) and the value z̄(k) that z(k) takes (the actual measurement at time k).
Init:  x_m(0) := x(0)
For k = 1, 2, …:
  S1:  x_p(k) := q_{k−1}(x_m(k−1), v(k−1))
  S2:  z_m(k) := h_k(x_p(k), w(k));  x_m(k) is defined via its PDF
       f_{x_m(k)}(ξ) := f_{x_p(k)|z_m(k)}(ξ|z̄(k))  ∀ξ.

Analogously to the proof used for the Kalman filter, we can show that, for all ξ and k = 1, 2, …,
  f_{x_p(k)}(ξ) = f_{x(k)|z(1:k−1)}(ξ|z̄(1:k−1)),
  f_{x_m(k)}(ξ) = f_{x(k)|z(1:k)}(ξ|z̄(1:k)).
Using the short-hand notation:
  f(x_p(k)) = f(x(k)|z(1:k−1)),
  f(x_m(k)) = f(x(k)|z(1:k)).
12.4 Prior Update
Given the PDF f_{x_m(k−1)}(ξ) of x_m(k−1), we construct the PDF f_{x_p(k)}(ξ) of x_p(k). We approximate f_{x_m(k−1)}(ξ) and f_{x_p(k)}(ξ) by Monte Carlo sampling:

• Let
  f_{x_m(k−1)}(ξ) ≈ (1/N) Σ_{n=1}^{N} δ(ξ − x_m^n(k−1)),
  where {x_m^n(k−1)} are N particles that approximate the PDF of x_m(k−1). These particles are Monte Carlo samples of f_{x_m(k−1)}(ξ). We can therefore form an approximation of f_{x_p(k)}(ξ):
  f_{x_p(k)}(ξ) ≈ (1/N) Σ_{n=1}^{N} δ(ξ − x_p^n(k)),
  where
  x_p^n(k) := q_{k−1}(x_m^n(k−1), v^n(k−1)),  for n = 1, 2, …, N,
  and v^n(k−1) are Monte Carlo samples of f(v(k−1)).
• This is straightforward and intuitive. We simply propagate N particles through the process dynamics. This is like a parallel simulation.
• Note that each particle also gets its own noise sample v^n(k−1).
• Provided N is “large”, {x_m^n(k−1)} is a “good” representation of f_{x_m(k−1)}(ξ), and {v^n(k−1)} is a “good” representation of f_{v(k−1)}(v), then {x_p^n(k)} will be a “good” representation of f_{x_p(k)}(ξ).
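A minimal sketch of this prediction step (Python/NumPy assumed; passing the process model and the noise sampler in as functions is an implementation choice for illustration, not something prescribed by the notes):

```python
import numpy as np

def prior_update(particles_m, process_model, sample_process_noise, rng):
    """Propagate posterior particles x_m^n(k-1) to prior particles x_p^n(k).

    particles_m:          array of shape (N, d), the particles {x_m^n(k-1)}
    process_model:        function q_{k-1}(x, v) acting on one particle
    sample_process_noise: function returning one sample v^n(k-1) of f(v(k-1))
    """
    N = particles_m.shape[0]
    particles_p = np.empty_like(particles_m)
    for n in range(N):
        v_n = sample_process_noise(rng)          # each particle gets its own noise sample
        particles_p[n] = process_model(particles_m[n], v_n)
    return particles_p
```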
12.5 Example
We revisit an old example, slightly modified. We apply the PF prior update step and do not consider any measurements yet.
• θ(k): location of an object on a circle, θ(k) ∈ [0, 2π).
• Dynamics:
  θ(k) = mod(θ(k−1) + s(k−1), 2π),
where s(k−1) is a CRV uniformly distributed on [−s + b, s + b], and b is a CRV (not time-dependent) uniformly distributed on [−s, s]. The modulo operation mod(y, 2π) adds or subtracts multiples of 2π to the argument y such that the result is in the interval [0, 2π).
• Introducing the states x1(k) := b and x2(k) := θ(k), we can rewrite the dynamics in standard form:
x1(k) = x1(k−1)
  x2(k) = mod(x2(k−1) + x1(k−1) + v(k−1), 2π),
where x1(0) is uniformly distributed on [−s,s] and v(k−1) is uniformly distributed on [−s,s]. The distribution of x2(0) reflects our initial knowledge of the object’s location.
• Note that x(k) is now a CRV. When we discussed the example in the context of Bayesian tracking, we assumed that the object moves in discrete steps, which allowed us to implement the Bayesian tracking algorithm directly. Here, we use a PF to track the object’s location (as a first step, we consider the PF without measurements; later we will add measurements).
• Implementation and simulation results:
– Parameter values (known to the PF): the object’s initial location θ(0) = π, and s = 0.01.
– We use N = 1000 particles. Each particle x^n(k) is a vector with two elements: the first corresponding to the bias b and the second corresponding to the location θ(k).
– The histogram of the object’s location spreads out over time: because of the pro- cess noise, our knowledge of the object’s location becomes less accurate. This is amplified when increasing s.
– The bias histogram does not change. Without any measurement, there is no way to infer information about the bias, and the process dynamics do not alter x1(k).
– If we do not know θ(0) and initialize the PF with particles drawn from a uniform distribution on [0,2π), we basically have no idea where the object is.
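A sketch of the prior-update-only simulation for this example (Python/NumPy assumed; s = 0.01, N = 1000, and a known θ(0) = π follow the notes, while the number of simulated steps is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
s, N = 0.01, 1000

# Each particle is [x1, x2] = [bias b, location theta].
particles = np.column_stack([
    rng.uniform(-s, s, N),        # x1(0): bias, uniform on [-s, s]
    np.full(N, np.pi),            # x2(0): known initial location theta(0) = pi
])

for k in range(100):              # run 100 prior updates (no measurements)
    v = rng.uniform(-s, s, N)     # process noise samples v^n(k-1)
    particles[:, 1] = np.mod(particles[:, 1] + particles[:, 0] + v, 2 * np.pi)
    # particles[:, 0] is unchanged: the bias has no dynamics.

# The spread of particles[:, 1] grows with k, while particles[:, 0] keeps its
# initial uniform distribution (without measurements, nothing informs the bias).
```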
Recap: Auxiliary variables
Recall the definitions of the auxiliary random variables from before:
Init:  x_m(0) := x(0)
For k = 1, 2, …:
  S1:  x_p(k) := q_{k−1}(x_m(k−1), v(k−1))
  S2:  z_m(k) := h_k(x_p(k), w(k));  x_m(k) is defined via its PDF
       f_{x_m(k)}(ξ) := f_{x_p(k)|z_m(k)}(ξ|z̄(k))  ∀ξ.

For all ξ and k = 1, 2, …,
  f_{x_p(k)}(ξ) = f_{x(k)|z(1:k−1)}(ξ|z̄(1:k−1)),
  f_{x_m(k)}(ξ) = f_{x(k)|z(1:k)}(ξ|z̄(1:k)).

We derived the prior update of the Particle Filter (PF) as an approximation of the Bayesian state estimator using Monte Carlo sampling. Next, we address the measurement update.
12.6 Measurement Update
Given the PDF f_{x_p(k)}(ξ) of x_p(k) ∈ X (from the prior update) and after obtaining a measurement z̄(k), we construct the PDF f_{x_m(k)}(ξ) of x_m(k). Again, we approximate the PDF by Monte Carlo sampling.
12.6.1 Scaling of particles by measurement likelihood
• From Bayes’ rule, obtain

  f_{x_m(k)}(ξ) := f_{x_p(k)|z_m(k)}(ξ|z̄(k)) = f_{z_m(k)|x_p(k)}(z̄(k)|ξ) f_{x_p(k)}(ξ) / ∫_X f_{z_m(k)|x_p(k)}(z̄(k)|ζ) f_{x_p(k)}(ζ) dζ ,   (12.1)

  where the denominator is a normalization constant (it does not depend on ξ).

• We approximated f_{x_p(k)}(ξ) by Monte Carlo samples (prior update); that is,

  f_{x_p(k)}(ξ) ≈ (1/N) Σ_{n=1}^{N} δ(ξ − x_p^n(k)).

• Substituting the expression for f_{x_p(k)}(ξ) in (12.1) suggests the following form for f_{x_p(k)|z_m(k)}(ξ|z̄(k)):

  f_{x_p(k)|z_m(k)}(ξ|z̄(k)) ≈ Σ_{n=1}^{N} β_n δ(ξ − x_p^n(k)) = α Σ_{n=1}^{N} f_{z_m(k)|x_p(k)}(z̄(k)|ξ) δ(ξ − x_p^n(k)),

  where α is a normalization constant.

• Get N equations by substituting ξ = x_p^n(k) for n = 1, 2, …, N (recall that δ(·) is zero except at zero):

  β_n = α f_{z_m(k)|x_p(k)}(z̄(k)|x_p^n(k)),  n = 1, 2, …, N.

  We have an additional equation, Σ_{n=1}^{N} β_n = 1, which is required for f_{x_p(k)|z_m(k)}(ξ|z̄(k)) to be a valid PDF. Therefore,

  α = ( Σ_{n=1}^{N} f_{z_m(k)|x_p(k)}(z̄(k)|x_p^n(k)) )^{−1}.
• Intuition: treat each particle separately. At points where the prior PDF is large, there are many particles (of equal weight). The posterior has the same particles, but their weights are scaled by the measurement likelihood.
• Note that f_{z_m(k)|x_p(k)}(z̄(k)|ξ) = f_{z(k)|x(k)}(z̄(k)|ξ) for all ξ, by the definition of z_m(k).

• Summary: We obtained a representation for f_{x_m(k)}(ξ),

  f_{x_m(k)}(ξ) = f_{x_p(k)|z_m(k)}(ξ|z̄(k)) ≈ Σ_{n=1}^{N} β_n δ(ξ − x_p^n(k)).
We are not done yet – this is not in the format that we need: the particles must be of equal weight, 1/N, so that we can apply our prior step.
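A sketch of the weight computation (Python/NumPy assumed; the measurement likelihood f_{z(k)|x(k)} is passed in as a function, and the fallback for the all-zero-likelihood case is an implementation choice, not part of the derivation):

```python
import numpy as np

def measurement_update_weights(particles_p, z_meas, measurement_likelihood):
    """Scale prior particles by the measurement likelihood.

    particles_p:            array of shape (N, d), prior particles {x_p^n(k)}
    z_meas:                 actual measurement z_bar(k)
    measurement_likelihood: function returning f_{z(k)|x(k)}(z_bar(k) | x) for one particle x
    Returns the normalized weights beta_n (they sum to 1).
    """
    likelihoods = np.array([measurement_likelihood(z_meas, x) for x in particles_p])
    total = likelihoods.sum()
    if total == 0.0:
        # All particles have zero likelihood; a practical implementation must handle
        # this case (e.g. re-initialize). Here we simply fall back to equal weights.
        return np.full(len(particles_p), 1.0 / len(particles_p))
    return likelihoods / total          # beta_n = alpha * f_{z|x}(z_bar | x_p^n)
```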
Resampling
Objective: Reselect N particles, where the probability of selecting particle n is β_n. This is essentially like drawing N samples from the distribution given by Σ_{n=1}^{N} β_n δ(ξ − x_p^n(k)). We apply the basic “sampling a distribution” algorithms (see earlier chapters).
Algorithm. Repeat N times:
• Select a random number r uniformly distributed on [0, 1].
• Pick particle n such that Σ_{i=1}^{n} β_i ≥ r and Σ_{i=1}^{n−1} β_i < r.

This gives N new particles x_m^n(k), which are a subset of the old particles (because we only select from those). The new particles all have equal weight,
  f_{x_m(k)}(ξ) ≈ (1/N) Σ_{n=1}^{N} δ(ξ − x_m^n(k)),
as required. This completes the measurement update step.
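A sketch of this resampling step (Python/NumPy assumed; the cumulative-sum search mirrors the two conditions of the algorithm above):

```python
import numpy as np

def resample(particles_p, weights, rng):
    """Draw N new particles from {x_p^n(k)} with probabilities beta_n.

    For each of the N draws, pick a uniform r on [0, 1] and select the particle
    whose cumulative weight first reaches r. All resampled particles then carry
    equal weight 1/N.
    """
    N = len(weights)
    cumulative = np.cumsum(weights)           # sum_{i<=n} beta_i
    cumulative[-1] = 1.0                      # guard against round-off
    r = rng.uniform(0.0, 1.0, size=N)         # one uniform number per draw
    indices = np.searchsorted(cumulative, r)  # smallest n with cumulative[n] >= r
    return particles_p[indices].copy()
```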
12.7 Summary
The basic PF algorithm is given by:
Initialization: Draw N samples {x_m^n(0)} from f(x(0)). These are the initial particles.
Step 1 (S1): Prior update/Prediction step
To obtain the prior particles {x_p^n(k)}, simulate the particles {x_m^n(k−1)} via the process equation:
  x_p^n(k) := q_{k−1}(x_m^n(k−1), v^n(k−1)),  for n = 1, 2, …, N,
which requires N noise samples from f(v(k−1)).
Step 2 (S2): A posteriori update/Measurement update step
Scale each particle by the measurement likelihood:
  β_n = α f_{z(k)|x(k)}(z̄(k)|x_p^n(k)),  for n = 1, 2, …, N,
where α is the normalization constant chosen such that Σ_{n=1}^{N} β_n = 1.
Resample (using the algorithm above) to get the N posterior particles {x_m^n(k)}, all with equal weights.
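Putting the pieces together, one complete PF iteration can be sketched as follows (reusing the helper functions sketched in the previous sections; these function names are illustrative, not from the notes):

```python
def particle_filter_step(particles_m_prev, z_meas, process_model,
                         sample_process_noise, measurement_likelihood, rng):
    """One PF iteration: prior update (S1) followed by measurement update (S2)."""
    # S1: propagate each posterior particle x_m^n(k-1) through the process model.
    particles_p = prior_update(particles_m_prev, process_model,
                               sample_process_noise, rng)
    # S2: weight the prior particles by the measurement likelihood,
    #     then resample to obtain equally weighted posterior particles.
    weights = measurement_update_weights(particles_p, z_meas, measurement_likelihood)
    particles_m = resample(particles_p, weights, rng)
    return particles_m
```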
12.8 Example (continued)
We continue the example, now with sensor measurements.
• Recall the dynamics (x1 is the process noise bias b, and x2 is the angle θ of the object):
  x1(k) = x1(k−1)
  x2(k) = mod(x2(k−1) + x1(k−1) + v(k−1), 2π),
where x1(0) and v(k−1) are both uniformly distributed on [−s,s].
• A distance sensor at (L, 0) measures the distance to the object, corrupted by noise:
  z1(k) = ( (L − cos x2(k))^2 + (sin x2(k))^2 )^{1/2} + w(k),
with w(k) uniformly distributed on [−e,e]. We receive distance measurements only occasionally.
• A half-plane measurement indicates whether the object is in the upper or the lower half plane:
  z2(k) = 1 if x2(k) ∈ [0, π), and z2(k) = −1 if x2(k) ∈ [π, 2π).
We only receive half-plane measurements occasionally. (A sketch of the likelihood evaluation for these two sensors is given after the simulation results below.)
• Simulation results:
– With distance measurements only, the approximate state PDF has two modes corresponding to the true location and the “phantom location” (mirror image). Eventually, the PF will “select” one of them.
– The half-plane sensor allows the PF to track the right one.
– Sample impoverishment: the particles collapse onto only a few distinct particles, or even a single one (especially for small process noise; here, in particular, the bias).
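For this example, the measurement likelihoods needed in the measurement update could be evaluated as in the following sketch (Python/NumPy assumed; the values of the sensor position L and the noise bound e are illustrative, and treating the noise-free half-plane measurement via an indicator is one possible implementation choice):

```python
import numpy as np

def distance_likelihood(z1_meas, particle, L=2.0, e=0.05):
    """f_{z1|x}(z1_meas | particle): uniform noise on [-e, e] around the true distance.

    L and e are illustrative values, not taken from the notes.
    """
    theta = particle[1]
    true_dist = np.sqrt((L - np.cos(theta))**2 + np.sin(theta)**2)
    return 1.0 / (2 * e) if abs(z1_meas - true_dist) <= e else 0.0

def half_plane_likelihood(z2_meas, particle):
    """Likelihood of the (noise-free) half-plane reading given the particle's location."""
    theta = particle[1]
    predicted = 1 if 0.0 <= theta < np.pi else -1
    return 1.0 if z2_meas == predicted else 0.0
```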
12.9 Sample Impoverishment
A possible problem: all particles may converge to the same particle, in which case they are no longer a good representation of the state PDF. This has to do with the lumpiness of the approximation. In the limit N → ∞, this takes an infinite amount of time to happen; for finite N, however, it turns out to be an issue in practice.
One approach (probably the simplest) to prevent sample impoverishment is roughening.
Roughening
Perturb the particles after resampling:
  x_m^n(k) ← x_m^n(k) + ∆x^n(k),
where ∆x^n(k) is drawn from a zero-mean, finite-variance distribution.
• There are many ways to choose the variance (or, more generally, the distribution) for ∆xn(k). We present one possible way in the following.
• Let σ_i be the standard deviation of ∆x_i^n(k), where the index i represents the i-th element of the vector. Then, choose
  σ_i = K E_i N^{−1/d},
  with
  K: tuning parameter, typically K ≪ 1;
  d: dimension of the state space;
  E_i := max_{n1, n2} |x_{m,i}^{n1}(k) − x_{m,i}^{n2}(k)|, the maximum inter-sample variability;
  N^{−1/d}: spacing between nodes of the corresponding uniform, square grid.
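A sketch of roughening with this choice of σ_i (Python/NumPy assumed; Gaussian perturbations are one common choice for the zero-mean, finite-variance distribution and are an assumption here, as is the default value of K):

```python
import numpy as np

def roughen(particles_m, K=0.01, rng=None):
    """Perturb resampled particles to mitigate sample impoverishment.

    sigma_i = K * E_i * N**(-1/d), where E_i is the maximum inter-sample spread
    in state dimension i. Gaussian noise is used here as one possible
    zero-mean, finite-variance perturbation.
    """
    rng = rng or np.random.default_rng()
    N, d = particles_m.shape
    E = particles_m.max(axis=0) - particles_m.min(axis=0)   # E_i per state dimension
    sigma = K * E * N ** (-1.0 / d)
    return particles_m + rng.normal(0.0, 1.0, size=(N, d)) * sigma
```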
12.10 Remarks
• The PF is an approximation of the Bayesian state estimator. The fact that the PF can, in principle, handle general nonlinear systems and general noise distributions comes at the expense of a possibly large computational effort. In particular, the large number of particles N that may be required to reliably capture the state PDF can be prohibitive for a practical implementation.
• The PF presented in this chapter is the simplest, most basic form of particle filtering. There are many practical (numerical) issues that often require tuning and utilizing the problem structure for a satisfactory filter performance.
• This algorithm is not deterministic: for given problem data, the particle filter result depends on many random samples (e.g. the process noise samples v^n(k−1), or the resampling after the measurement update). This is different from, e.g., the Kalman filter (all forms we covered), which is deterministic.
• The particle filter estimates a PDF. To obtain a point estimate (e.g. for feedback control), one needs to extract it from the PDF (e.g. the minimum mean squared error estimate); a minimal sketch of this extraction follows the remarks below.
• The particle filter is a good example of something that suffers from the curse of dimen- sionality, because of the need for sampling the space with sufficient density. Doubling the state-space size (e.g. two coupled robots) typically makes the problem much more than twice as hard. There are approaches to mitigate this, but they will not be covered in this lecture.
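A minimal sketch of extracting a point estimate from the particle approximation (Python/NumPy assumed): with equally weighted posterior particles, the MMSE estimate E[x(k)|z(1:k)] is approximated by the particle mean.

```python
import numpy as np

def mmse_estimate(particles_m):
    """Approximate MMSE estimate: the mean of the (equally weighted) posterior particles."""
    return particles_m.mean(axis=0)
```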