COURSE PROJECT, EL2320 APPLIED ESTIMATION
Kalman Filter and Particle Filter for Real-Time Object Tracking
Lucas Rodés-Guirao (partner Diego Morín) lucasrg@kth.se
Abstract—The objective of this project is to compare the performance of the Kalman Filter and the Particle Filter for real-time object tracking. In addition, we explore possible combinations of both techniques, using a pre-filtered video sample as input. With this project, we expect to gain a deep understanding of both approaches. Advantages and drawbacks of both techniques are also highlighted. Finally, a demo of the implementation is provided.
I. INTRODUCTION
We are living in an era in which massive amounts of data are generated every day. Currently, 2.5 quintillion bytes of data are created daily, according to [1]. Within this sea of data there is room for visual content (such as images and videos), where powerful algorithms and processing techniques can provide the tools to make inferences from the data. In this regard, computer vision is an interdisciplinary field which focuses on how computers can understand digital images. In engineering terms, computer vision aims to design systems that automate tasks that could easily be done by humans but are infeasible due to the ever-increasing amount of data. Within this field we find object tracking, which has gained momentum due to the progress in supercomputing hardware and the increasing affordability of HD cameras. Object tracking has a wide range of applications, such as surveillance, sports, medical imaging, etc. [2].
In this regard, this report analyses and discusses different object tracking approaches in the context of video analysis. In particular, we study two different methods, namely (i) the Kalman Filter (KF) and (ii) the Particle Filter (PF), which have been extensively used in several object tracking problems, for instance in [3], [4], [5] and [6], [7], [8], respectively. Object tracking can be complex due to several factors, such as noise in the image frames, scene illumination changes, complex image motion, object occlusions and real-time image processing requirements. This paper addresses the last three. Additionally, we also inspect some object detection techniques involving image processing, which are essential for successful object tracking.
A. Contribution
In this paper, we provide the following contributions:
Advisor: Prof. John Folkesson, EL2320 Applied Estimation, KTH Royal Institute, HT 2016.
• We explore a combination of the KF and the PF, which will be referred to as the Kalmanized Particle Filter (KPF) throughout this paper. The initial idea was to obtain a more robust method; however, it presents some drawbacks.
• We compare all implemented methods, i.e. KF, PF and KPF, in terms of the Mean Square Error (MSE) between the estimated and the real object position.
• We propose Kernel Density Extraction (KDE) as an efficient method to solve multiple problems in object tracking, such as the tuning of the noise covariance matrix Q in the KF, and state estimation and particle weighting in the PF.
B. Outline
Section II briefly reviews important aspects of the theory used in this project (i.e. the KF and the PF). Next, Section III explains the different implementations. Furthermore, Section IV comments on the results obtained with the different approaches. Finally, Section V closes the report with the conclusions.
II. OBJECT TRACKING
Object tracking in videos is the task of locating a specific object in a sequence of frames. First, a detection algorithm is required in order to obtain possible object locations. The object tracker is then responsible for following the trajectory of the object in subsequent frames. There exist several approaches for object tracking, which can be classified into different categories [9]:
• Point tracking: The object is represented by a point and its current location is estimated based on estimates from previous frames. That is, there is a dependence between the estimates at times k and k + 1. These estimates usually consist of the pixel coordinates of the object position and its motion (e.g. velocity, acceleration). This information is stored in a vector which is referred to as the state vector.
• Kernel tracking: The object is represented by a certain shape (e.g. rectangular or elliptic) together with an RGB color histogram [10].
• Silhouette tracking: In this approach, the object region is estimated in each frame using image processing methods.
In this project we only consider the point tracking category, which can be further divided into two subcategories: (i) deterministic approaches and (ii) probabilistic approaches. We focus only on the latter, which uses randomized techniques to estimate the state vector. In particular, we briefly present the KF and the PF.
A. Kalman Filter
The KF implements a Bayes filter and was invented by Rudolph Emil Kalman in the 1950s [11]. In this approach, the belief at time step k, i.e. bel(x_k), is assumed to be Gaussian distributed, which allows it to be uniquely described by a mean vector μ_k and a covariance matrix Σ_k. It consists of two steps, namely prediction and update, each of them making important assumptions.
1) Prediction: In the prediction step, the KF uses the previous state to make a first prediction of the current state. In particular, it assumes that the state x_k is related to x_{k-1} through a linear function for all 1 ≤ k ≤ K, where K is the duration of the tracking task. This linear function is given by
    x_k = A_k x_{k-1} + B_k u_k + ε_k,    (1)
where we have used the following notation:
• x_k: n × 1 vector representing the state at time step k. It typically contains the data of interest, e.g. position, velocity, acceleration, etc.
• u_k: m × 1 control vector. It typically contains control input data, e.g. a steering angle.
• A_k: n × n state transition matrix, which establishes the relation between two consecutive states.
• B_k: n × m control matrix. It relates the control vector to the estimated state.
• ε_k: n × 1 Gaussian random vector modelling the randomness in the system. It is assumed to be zero-mean with covariance matrix R_k (often referred to as the process noise covariance matrix).
In the context of object tracking, where there is no external influence, both the control matrix and control vector can be omitted. Hence, (1) is rewritten as
    x_k = A_k x_{k-1} + ε_k.    (2)
As aforementioned, the state is fully characterized by a mean vector and a covariance matrix. Thus, the predicted state is defined by the predicted mean μ̄_k and the predicted covariance Σ̄_k. Using (2), we write the prediction equations as

    μ̄_k = A_k μ_{k-1},
    Σ̄_k = A_k Σ_{k-1} A_k^T + R_k.    (3)

The system performance depends on the values of A_k and R_k, whose impact will be studied later.

2) Update: Next, in the update step the KF uses the current measurement z_k to correct (or update) the object's state. It is assumed that the relation between the measurement z_k and the state x_k is also linear. In particular, we have

    z_k = C_k x_k + δ_k,    (4)

where z_k is the l × 1 measurement vector, C_k is the l × n transformation matrix mapping the state vector into the measurement domain, and δ_k is an l × 1 Gaussian vector representing the measurement noise. Consequently, the likelihood of the measurement is p(z_k | x_k) = N(z_k | C_k x_k, Q_k), where Q_k is the measurement noise covariance matrix.

In this step, the predicted parameters μ̄_k and Σ̄_k are corrected. For this purpose, the Kalman gain K_k is defined, which quantifies the relative importance of the measurement with respect to the prediction: the higher it is, the more relevant the measurement is for the state estimate, which occurs when the measurement is very certain. Using (4), we write the update equations as

    K_k = Σ̄_k C_k^T (C_k Σ̄_k C_k^T + Q_k)^{-1},
    μ_k = μ̄_k + K_k (z_k − C_k μ̄_k),    (5)
    Σ_k = Σ̄_k − K_k C_k Σ̄_k.
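To make the flow of (3)-(5) concrete, the following is a minimal NumPy sketch of one prediction/update cycle. The function and variable names are our own illustrative choices and are not part of the project code.

    import numpy as np

    def kf_step(mu, Sigma, z, A, C, R, Q):
        """One Kalman filter cycle: prediction (3) followed by update (5)."""
        # Prediction: propagate the mean and covariance through the motion model
        mu_bar = A @ mu
        Sigma_bar = A @ Sigma @ A.T + R
        # Kalman gain: weighs the measurement against the prediction
        S = C @ Sigma_bar @ C.T + Q              # innovation covariance
        K = Sigma_bar @ C.T @ np.linalg.inv(S)
        # Update: correct the prediction with the measurement z
        mu_new = mu_bar + K @ (z - C @ mu_bar)
        Sigma_new = Sigma_bar - K @ C @ Sigma_bar
        return mu_new, Sigma_new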
B. Particle Filter
The PF is a non-parametric implementation of the Bayes filter. In contrast to the KF, it does not make any assumption on the belief distribution, which makes it very useful for representing multi-modal posteriors. Moreover, it does not assume the motion model to be linear, making it suitable for modelling non-linear systems.

The main idea of the PF is to represent the posterior bel(x_k) by a finite number of state samples randomly drawn from this posterior. Therefore, it is a numerical approximation rather than an analytical one. These samples are known as particles, and each has an associated weighting factor. At time step k we denote a particle by x_k^[m] for m = 1, ..., M, where M is the number of particles. The particles are obtained by sampling from the so-called proposal distribution p(x_k | u_k, x_{k-1}), for instance (2). Along with each particle, we define an importance factor w_k^[m]. The importance factors are used to incorporate the influence of the measurement z_k into the posterior estimate, and they are obtained as

    w_k^[m] ∝ p(z_k | x_k^[m]),  for m = 1, ..., M.    (6)

Moreover, they are normalized so that ∑_{m=1}^{M} w_k^[m] = 1. The set of particles

    X_k = {x_k^[1], ..., x_k^[M]},

weighted by the corresponding importance factors, represents the belief at time step k. Nonetheless, after obtaining this set the PF performs its key trick, resampling: it samples, with replacement, from the current belief estimate, so that particles with higher weights are more likely to "survive". However, resampling too often increases the risk of losing diversity.
The larger M is, the more accurate the posterior approximation will be, but this comes at the expense of a higher computational cost.
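For illustration, a minimal sketch of one generic PF iteration (sample from the proposal, weight by the likelihood, normalize, resample) is given below. The callables propose and likelihood are placeholders for a concrete motion and measurement model and are not part of the project code.

    import numpy as np

    def pf_step(particles, z, propose, likelihood, rng=np.random.default_rng()):
        """One particle filter iteration on an (M, n) array of particles."""
        M = len(particles)
        # Sample each particle from the proposal distribution p(x_k | x_{k-1})
        particles = np.array([propose(x, rng) for x in particles])
        # Importance weights (6): proportional to the measurement likelihood
        w = np.array([likelihood(z, x) for x in particles])
        w = w / w.sum()                  # normalize so the weights sum to one
        # Resample with replacement: heavier particles are more likely to survive
        idx = rng.choice(M, size=M, p=w)
        return particles[idx]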
III. IMPLEMENTATION
We have selected a 1-minute recording of the Nintendo Pinball arcade game as our test video and have focused on tracking the ball. In addition, we deleted some regions from the original video in order to create occlusions, which allowed us to test the performance of our implemented methods when the measurements were very poor. Prior to any tracking algorithm, we pre-process each input video frame k in order to generate a binary matrix I_B(k) of the same size as the original picture, with '1's in pixels that are likely to contain the ball and '0's in the rest. This was done using an RGB color interval of the ball, giving the results shown in Fig. 1.
Fig. 1: The left image shows the input frame with the artificially added occlusions (cyan filled rectangles). The right image illustrates the filtered k-th frame, i.e. I_B(k), with likely regions for the ball highlighted in white. This particular result is for frame k = 315.
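As an illustration of this pre-processing step, the sketch below thresholds an RGB frame against a color interval. The interval bounds are hypothetical placeholders; the actual values used for the ball are not reported here.

    import numpy as np

    # Hypothetical RGB interval for the ball color (illustrative values only)
    BALL_LOW = np.array([180, 180, 180])
    BALL_HIGH = np.array([255, 255, 255])

    def binarize_frame(frame_rgb):
        """Return I_B(k): 1 where the pixel falls inside the ball's RGB interval."""
        mask = np.all((frame_rgb >= BALL_LOW) & (frame_rgb <= BALL_HIGH), axis=-1)
        return mask.astype(np.uint8)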
Next, we applied the 10 × 10 kernel matrix

        [ 0  0  0  0   1   1  0  0  0  0 ]
        [ 0  0  0  1   8   8  1  0  0  0 ]
        [ 0  0  1  8  27  27  8  1  0  0 ]
        [ 0  1  8 27  81  81 27  8  1  0 ]
    K = [ 1  8 27 81 253 253 81 27  8  1 ]    (7)
        [ 1  8 27 81 253 253 81 27  8  1 ]
        [ 0  1  8 27  81  81 27  8  1  0 ]
        [ 0  0  1  8  27  27  8  1  0  0 ]
        [ 0  0  0  1   8   8  1  0  0  0 ]
        [ 0  0  0  0   1   1  0  0  0  0 ]

to the binary image as

    I_K(k) = K ∗ I_B(k),    (8)

where '∗' stands for the convolution operator and we removed the outer rows and columns of I_K(k) so that it matches the dimensions of the original frame. Note that K is a circular kernel (with cubic coefficients), which is what we need when the tracked object is round. Next, I_K(k), which is illustrated in Fig. 2, was used to include the measurement information in the posterior estimation of both the KF and the PF. We shall see this later.

Fig. 2: Result after applying the kernel K to the binary image, i.e. the convolution I_K(k) = K ∗ I_B(k), where I_B(k) is the filtered image frame from Fig. 1. This particular output is for frame k = 315.
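A minimal sketch of this kernel step, assuming SciPy is available: the kernel from (7) is written out explicitly, and convolve2d with mode="same" plays the role of cropping the outer rows and columns in (8).

    import numpy as np
    from scipy.signal import convolve2d

    # The 10x10 kernel from (7), written out row by row (it is symmetric)
    K = np.array([
        [0, 0, 0, 0,   1,   1,  0,  0, 0, 0],
        [0, 0, 0, 1,   8,   8,  1,  0, 0, 0],
        [0, 0, 1, 8,  27,  27,  8,  1, 0, 0],
        [0, 1, 8, 27, 81,  81, 27,  8, 1, 0],
        [1, 8, 27, 81, 253, 253, 81, 27, 8, 1],
        [1, 8, 27, 81, 253, 253, 81, 27, 8, 1],
        [0, 1, 8, 27, 81,  81, 27,  8, 1, 0],
        [0, 0, 1, 8,  27,  27,  8,  1, 0, 0],
        [0, 0, 0, 1,   8,   8,  1,  0, 0, 0],
        [0, 0, 0, 0,   1,   1,  0,  0, 0, 0],
    ])

    def kernel_image(I_B):
        """Compute I_K(k) = K * I_B(k) with the original frame dimensions."""
        return convolve2d(I_B, K, mode="same")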
In the following, let us detail our implementations.

A. Kalman Filter

In this project, we have considered two different motion models: (i) Constant Velocity and (ii) Constant Acceleration. Moreover, for simplicity we have used stationary matrices for the state transition and the state-to-measurement mapping, i.e.

    A_k = A,    C_k = C.    (9)

In this regard, the algorithm of the Kalman Filter is summarized in Fig. 3.
kalman filter (μ_{k-1}, Σ_{k-1}, z_k)
  Step 1: Prediction
    μ̄_k ← A μ_{k-1}
    Σ̄_k ← A Σ_{k-1} A^T + R_k
  Step 2: Update
    K_k ← Σ̄_k C^T (C Σ̄_k C^T + Q_k)^{-1}
    μ_k ← μ̄_k + K_k (z_k − C μ̄_k)
    Σ_k ← Σ̄_k − K_k C Σ̄_k
  return μ_k, Σ_k

Fig. 3: Kalman Filter Algorithm
Furthermore, the measurement z_k for the KF was simply acquired by taking the coordinates of the mode of I_K(k) for each frame k.
We shall highlight that we keep the measurement noise covariance Q_k variable. We tune it using the variance of the coefficients of I_K(k): the higher the variance (which suggests a clear mode in I_K(k)), the lower we set Q_k; conversely, when the variance is low, we set a high-valued Q_k. This way, we have a mechanism to put more weight on the motion model whenever the measurements are poor, and vice versa.
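The exact mapping from this variance to Q_k is not specified above, so the sketch below uses an illustrative two-level rule with a placeholder threshold and placeholder noise levels; only the underlying idea (low variance implies a large Q_k) is taken from the text.

    import numpy as np

    def tune_Q(I_K, var_threshold=50.0, q_good=10.0, q_poor=1e4):
        """Set Q_k from the spread of I_K(k): a clear mode (high variance) means a
        reliable measurement (small Q_k); a flat response means a poor one (large Q_k).
        Threshold and levels are illustrative placeholders."""
        q = q_good if np.var(I_K) > var_threshold else q_poor
        return q * np.eye(2)   # 2x2, since we measure the two pixel coordinates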
1) Constant Velocity: For this approach we used a state vector of size 4 × 1, shown in (10). The first two entries are the pixel coordinates of the object (p_{k,1} and p_{k,2}) and the last two are the discrete velocity of the object (v_{k,1} and v_{k,2}),

    x_k = [p_{k,1}, p_{k,2}, v_{k,1}, v_{k,2}]^T.    (10)
Note that we only consider two dimensions of motion, since we are working with images, which are two-dimensional. Moreover, we have v_{k,1} = v_1 and v_{k,2} = v_2 for all k = 1, ..., K.

From physics, for a constant-velocity linear model the position p_k can be obtained from the position p_{k−Δk}, the velocity v and the time increment Δk, namely

    p_k = p_{k−Δk} + Δk v.

In our case we have Δk = 1, which means that p_k = p_{k−1} + v. Thus, the state transition matrix is given by

    A = [ 1 0 1 0
          0 1 0 1
          0 0 1 0
          0 0 0 1 ].    (11)

Since we only observe the position of the object, the measurement vector z_k has size 2 × 1, i.e.

    z_k = [z_{k,1}, z_{k,2}]^T,    (12)

and the mapping is given by the 2 × 4 matrix

    C = [ 1 0 0 0
          0 1 0 0 ].    (13)

The initial state x_0 is characterized by a mean vector μ_0 and a covariance matrix Σ_0. The former is estimated using the first two measurements z_{−1} and z_0, in particular

    μ_0 = [p_{0,1}, p_{0,2}, v_1, v_2]^T = [z_{0,1}, z_{0,2}, z_{0,1} − z_{−1,1}, z_{0,2} − z_{−1,2}]^T,

and the latter is initialized with high coefficients,

    Σ_0 = diag(10, 10, 10, 10).

Using the algorithm from Fig. 3, the object tracking task can be performed.
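A compact NumPy sketch of the constant-velocity setup described by (10)-(13); it can be fed to a prediction/update routine such as the illustrative kf_step sketched in Section II-A. The names are our own.

    import numpy as np

    # Constant-velocity model: state x = [p1, p2, v1, v2]^T, eqs. (10)-(13)
    A = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    C = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)

    def init_cv(z_prev, z0):
        """Initial mean from the first two measurements and a high-valued covariance."""
        mu0 = np.concatenate([z0, z0 - z_prev])
        Sigma0 = 10.0 * np.eye(4)
        return mu0, Sigma0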
2) Constant Acceleration: This approach considers variability in the velocity according to a constant acceleration value a. The dynamics are now given by

    p_k = p_{k−Δk} + v_{k−Δk} Δk + 0.5 a Δk²,
    v_k = v_{k−Δk} + a Δk.

In this regard, the transition matrix is now

    A = [ 1 0 1 0 0.5 0
          0 1 0 1 0   0.5
          0 0 1 0 1   0
          0 0 0 1 0   1
          0 0 0 0 1   0
          0 0 0 0 0   1 ].    (14)

The initial estimate is obtained similarly to the previous case; however, we now need the first three measurements,

    μ_0 = [p_{0,1}, p_{0,2}, v_{0,1}, v_{0,2}, a_1, a_2]^T
        = [z_{0,1}, z_{0,2}, z_{0,1} − z_{−1,1}, z_{0,2} − z_{−1,2}, (z_{0,1} − z_{−1,1}) − (z_{−1,1} − z_{−2,1}), (z_{0,2} − z_{−1,2}) − (z_{−1,2} − z_{−2,2})]^T.

The initial covariance matrix remains the same (now 6 × 6), i.e.

    Σ_0 = diag(10, 10, 10, 10, 10, 10).
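Analogously, a sketch of the constant-acceleration setup of (14), with the initial mean built from the first three measurements; again, the names are illustrative only.

    import numpy as np

    # Constant-acceleration model: state x = [p1, p2, v1, v2, a1, a2]^T, eq. (14)
    A_ca = np.array([[1, 0, 1, 0, 0.5, 0.0],
                     [0, 1, 0, 1, 0.0, 0.5],
                     [0, 0, 1, 0, 1.0, 0.0],
                     [0, 0, 0, 1, 0.0, 1.0],
                     [0, 0, 0, 0, 1.0, 0.0],
                     [0, 0, 0, 0, 0.0, 1.0]])

    def init_ca(z_m2, z_m1, z0):
        """Velocity and acceleration initialized as first and second finite
        differences of the first three measured positions."""
        v0 = z0 - z_m1
        a0 = (z0 - z_m1) - (z_m1 - z_m2)
        return np.concatenate([z0, v0, a0]), 10.0 * np.eye(6)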
B. Particle Filter
We propose a very simple implementation of the PF. In particular, we only use the process noise to explore new states, through the following motion model:

    x_k^[m] = x_{k−1}^[m] + η_k^[m],    (15)

where η_k^[m] ∼ N(0, Σ_η) is the diffusion noise for particle m and we have defined Σ_η = diag(25, 25).

Next, we assign the weights to the particles proportionally to the values of I_K(k) at their locations. That is, a higher weight is given to particles located in highly likely regions for the ball. This way, we are able to include the information given by the measurements and update the state belief. Formally, we can write this as

    w_k^[m] ∝ I_K(k)|_{x_k^[m]},    (16)

where I_K(k)|_{x_k^[m]} denotes the kernel-image value at the location of particle x_k^[m] at time step k. As aforementioned, we need to ensure normalization of the weighting factors. At each iteration k, we can estimate the state by computing the weighted sum of the particle coordinates, i.e.

    x_k^est = ∑_{m=1}^{M} w_k^[m] x_k^[m].    (17)

As for the resampling method, we have used systematic resampling, since it only requires generating one random number (instead of M random numbers, as multinomial sampling would).
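Putting the pieces together, the sketch below implements one iteration of this simple PF on an (M, 2) array of pixel-coordinate particles: diffusion (15), kernel-image weighting (16), the weighted-mean estimate (17), and systematic resampling. It is our own illustrative code, not the course implementation.

    import numpy as np

    def systematic_resample(w, rng=np.random.default_rng()):
        """Systematic resampling: one random offset, M evenly spaced pointers."""
        M = len(w)
        positions = (rng.random() + np.arange(M)) / M
        cum = np.cumsum(w)
        cum[-1] = 1.0                      # guard against floating-point round-off
        return np.searchsorted(cum, positions)

    def pf_update(particles, I_K, Sigma_eta=np.diag([25.0, 25.0]),
                  rng=np.random.default_rng()):
        """Diffuse (15), weight by the kernel image (16), estimate (17), resample."""
        M = len(particles)
        # Diffusion: add zero-mean Gaussian noise to every particle
        particles = particles + rng.multivariate_normal(np.zeros(2), Sigma_eta, size=M)
        # Weight by the kernel-image value at each (rounded, clipped) pixel location
        rows = np.clip(particles[:, 0].astype(int), 0, I_K.shape[0] - 1)
        cols = np.clip(particles[:, 1].astype(int), 0, I_K.shape[1] - 1)
        w = I_K[rows, cols].astype(float) + 1e-12   # avoid an all-zero weight vector
        w /= w.sum()
        x_est = w @ particles                       # weighted-mean estimate (17)
        return particles[systematic_resample(w, rng)], x_est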
particle filter (X_{k-1}, z_k)
  Step 1: Diffusion
    X̄_k = X_k = ∅
    for m = 1 to M do
      sample x_k^[m] ∼ p(x_k | x_{k-1}^[m])
      w_k^[m] = p(z_k | x_k^[m])
      X̄_k ← X̄_k + ⟨x_k^[m], w_k^[m]⟩
    end for
  Step 2: Resampling
    for m = 1 to M do
      draw particle i with probability ∝ w_k^[i]
      add x_k^[i] to X_k
    end for
  return X_k

Fig. 4: Particle Filter Algorithm

kalmanized particle filter (μ_{k-1}, Σ_{k-1}, z_k)
  z_k^PF ← filtered frame at time step k
  if no white region in z_k^PF then
    increase Q_k in the KF
  end if
  X_k ← particle filter (X_{k-1}, z_k^PF)
  z_k^KF ← (1/M) ∑_{x_k^[m] ∈ X_k} x_k^[m]
  μ_k, Σ_k ← kalman filter (μ_{k-1}, Σ_{k-1}, z_k^KF)
  return μ_k, Σ_k

Fig. 5: Kalmanized Particle Filter Algorithm
C. Combination
As the reader might have noticed, so far we have presented traditional approaches for object tracking, namely the KF and the PF. The latter, in our implementation, did not even include a motion model. In the following, we present our combination of KF and PF, which aims to merge both approaches into a more robust model. Hence, our goal here is not to provide a very complex model, but a simple and robust one that takes advantage of the aforementioned methods.
We first use the Particle Filter, as previously explained, to obtain an estimate of the position of the object, i.e. x_k^est as proposed in (17). We then use it as the measurement for the KF at time step k. We use the motion models previously explained (constant velocity and constant acceleration).
Whenever there are occlusions, i.e. the ball is occluded, the PF stops working properly, since it has no good measurements and the motion model alone is insufficient (it only spreads the particle cloud; remember (15)). This means that the PF cannot provide the KF with good measurements (or estimates). To overcome this, we alert the KF whenever no white regions are detected after the image processing. This is done by setting the measurement noise covariance matrix Q_k to a very high value, which ensures that the KF will rely on its motion model to estimate the position of the object rather than on the measurements, which are poor. As explained in the description of the KF method, we set Q_k using the variance of the matrix I_K(k), obtained after filtering the original image and applying the circular kernel from (7).
The algorithm for this combined approach is shown in Fig. 5.
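A minimal sketch of this combined loop (cf. Fig. 5), assuming the illustrative kf_step and pf_update functions sketched earlier are in scope; the occlusion test and the "very high" Q value are placeholders, not the project's exact choices.

    import numpy as np

    def kpf_step(mu, Sigma, particles, I_K, A, C, R, Q):
        """One KPF step: the PF position estimate is fed to the KF as its measurement."""
        if not I_K.any():            # no white region detected: measurement unreliable
            Q = 1e6 * np.eye(2)      # placeholder "very high" measurement noise
        particles, z_kf = pf_update(particles, I_K)       # PF estimate, cf. (17)
        mu, Sigma = kf_step(mu, Sigma, z_kf, A, C, R, Q)  # KF correction on top of PF
        return mu, Sigma, particles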
IV. RESULTS
In this section we show and briefly discuss the results obtained when using the methods explained so far for object tracking. As already highlighted in the previous section, we use a 1-minute video capturing a Nintendo Pinball game play. However, we restrict the analysis to the first 15 seconds, since this is enough to gain insight into the performance of the respective filters.
In the following, we attach links to YouTube videos showing how each method performed:

• KF constant velocity (video)
• KF constant acceleration (video)
• PF
• KPF constant velocity (video)
• KPF constant acceleration (video)

Table I lists the MSE of the estimates for the first 15 seconds, during which no particle deprivation happened. To compute the MSE, we used as a reference the positions obtained with image processing tools comparing pixel values, and we removed the artificially added occlusion regions. Furthermore, Fig. 6a and Fig. 6b illustrate the squared error for all frames within the 15-second sequence.

TABLE I: Results when there is no particle deprivation.

Method                        Mean square error
KF (constant velocity)        41.70
KF (constant acceleration)    48.08
PF                            110.15
KPF (constant velocity)       84.32
KPF (constant acceleration)   88.68
In addition, Table II shows the MSE results when particle deprivation happens. As expected, we observe a notable decrease in the performance of the PF and KPF methods. Additionally, Fig. 6c illustrates the time evolution of the squared error.
TABLE II: Results when particle deprivation happens.

Method                        Mean square error
KF (constant velocity)        41.70
KF (constant acceleration)    48.08
PF                            8983.22
KPF (constant velocity)       3568.71
KPF (constant acceleration)   9003.51

A. Discussion
From the results, we observe that the performance of the PF and the KPF highly depends on whether particle deprivation occurs or not.
(a) Squared error per frame for the constant velocity model; the Kalman Filter performed best.
(b) Squared error per frame for the constant acceleration model; the Kalman Filter performed best.
(c) In this execution we observe the phenomenon of particle deprivation, seen as high squared-error values over some frame intervals, which correspond to instances when the ball is occluded and its track is lost by the particle cloud.

Fig. 6: Squared error for each frame over different executions of the 15-second frame sequence (x-axis: frame number; y-axis: squared Euclidean distance; curves: Particle Filter, Kalman Filter, Combined Filter). Peaky values are usually associated with frame intervals where the ball is occluded.
If particle deprivation occurs, their performance is extremely poor. This is due to the fact that the particle cloud loses track of the ball and is not able to obtain new accurate measurements; an example of this in our pinball video can be found in this link. A possible solution is to modify the diffusion step by increasing the coefficients of the noise matrix, which ultimately allows the PF to explore more states. In this regard, we have to bear in mind that, as shown by (15), we are using a very simplistic PF with no motion model at all.
When there is no particle deprivation, the results in Table I show that the combination of PF and KF slightly improves the performance of the PF alone.
Occlusion intervals appear in Fig. 6 as peaks in the squared error. In this regard, we observe that the KF is much less affected by them. One of the keys here has been the tuning of Q_k during occlusions, as previously explained.
V. CONCLUSION
We conclude that using the KF after the PF can slightly improve the performance of the PF alone. However, this has only been shown for a very simple PF and should be verified in the future with a more complete PF. In particular, we observe that the combination, i.e. the KPF, is still very sensitive to particle deprivation when there are occlusions. Hence, it might be interesting to explore techniques such as randomly adding new particles or increasing M to reduce these effects.
In addition, we can assert that kernels can be very helpful when filtering the image in order to obtain suitable measurements. Nonetheless, further kernels should be tested in the future and a more detailed analysis of them should be carried out.
Furthermore, we find it interesting to remark that we have used a simple linear Kalman Filter and not any of its variants (EKF, UKF, iEKF). This was sufficient, most likely due to the simplicity of the ball movements in the video. In the future, it could be enriching to experiment with videos of a different nature, where such variants might be helpful.
VI. ACKNOWLEDGEMENT
We have partially reused some of the code from LAB2, in particular the functions related to re-sampling. For this, we acknowledge the course teacher, Prof. John Folkesson, and the course TAs as its authors.
REFERENCES
[1] Bringing big data to the enterprise. https://www-01.ibm.com/software/data/bigdata/what-is-big-data.html. Accessed: 2016-12-27.
[2] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey. ACM Computing Surveys (CSUR), 38(4):13, 2006.
[3] Ted J. Broida and Rama Chellappa. Estimation of object motion parameters from noisy images. IEEE Transactions on Pattern Analysis and Machine Intelligence, (1):90–99, 1986.
[4] David Beymer and Kurt Konolige. Real-time tracking of multiple people using continuous detection. In IEEE Frame Rate Workshop, page 53. Citeseer, 1999.
[5] Romer Rosales and Stan Sclaroff. 3D trajectory recovery for tracking multiple objects and trajectory guided recognition of actions. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 2. IEEE, 1999.
[6] Jun S. Liu and Rong Chen. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association, 93(443):1032–1044, 1998.
[7] Neil Shephard and Michael K. Pitt. Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94:590–599, 1999.
[8] Ba-Ngu Vo, Sumeetpal Singh, and Arnaud Doucet. Sequential Monte Carlo methods for multitarget filtering with random finite sets. IEEE Transactions on Aerospace and Electronic Systems, 41(4):1224–1245, 2005.
[9] Duc Phu Chau, François Brémond, and Monique Thonnat. Object tracking in videos: Approaches and issues. arXiv preprint arXiv:1304.5212, 2013.
[10] Dorin Comaniciu, Visvanathan Ramesh, and Peter Meer. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):564–577, 2003.
[11] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960.