ACS6124 Multisensor and Decision Systems Part I: Multisensor Systems
Lecture 3: Signal Estimation
George Konstantopoulos g.konstantopoulos@sheffield.ac.uk
Automatic Control and Systems Engineering The University of Sheffield
(lecture notes produced by Dr. Inaki Esnaola and Prof. Visakan Kadirkamanathan)
Extraction of Information from Sensor Signals
Consider noisy signal measurements from a sensor. Extracting useful information from the sensor requires removing the noise and estimating the underlying true value of the variable of interest.
Measurement model: $y_t = x_t + v_t$
Data: $Y_{1:T} = \{y_t \mid t = 1, \dots, T\}$ (measurements)
The estimation problem is to determine the most probable values $x_t$, given all observations $Y_{1:T}$.
Noise removal requires additional information about the underlying trend in the signal $x_t$ and the noise signal $v_t$.
Probability distributions of noise and observation
Measurement model:
$$y_t = x_t + v_t, \qquad t = 1, \dots, T$$
Consider a single observation $y_t$. What is $E[x_t \mid y_t]$?
Assumption: independent zero-mean noise, $E[v_t] = 0$. Further, $v_t$ is Gaussian distributed with variance $\sigma^2$.
The probability distributions $p(v_t)$ and $p(y_t \mid x)$ are:
$$p(v_t) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\frac{v_t^2}{\sigma^2}}$$
where $v_t \sim N(0, \sigma^2)$ and $p(y_t \mid x) \sim N(x, \sigma^2)$.
Minimum Mean Square Error (MMSE) Estimation
Assumption 1: xt = x is a steady state variable
Assumption 2: $v_t$ is zero-mean, independent and identically distributed (iid)
The estimation problem can be viewed as a minimum mean squared error estimation problem in which the Mean Squared Error (MSE) cost function is minimised:
$$J_{MSE}(x) = E\left[(y_t - x)^2\right]$$
The optimal (minimum mean squared error, MMSE) estimate minimises $J_{MSE}$:
$$\hat{x} = \arg\min_x J_{MSE}(x)$$
Consider the full set of observations:
$$J_{MSE}(x) \approx \frac{1}{T}\sum_{t=1}^{T}\left[y_t - x\right]^2$$
With a little algebra,
$$J_{MSE}(x) \approx \overline{(y^2)} - 2x\bar{y} + x^2, \qquad \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t, \qquad \overline{(y^2)} = \frac{1}{T}\sum_{t=1}^{T}\left[y_t\right]^2$$
Minimising $J_{MSE}(x)$ w.r.t. $x$ gives $\hat{x}_{MMSE} = \bar{y} = E[y(t) \mid Y_{1:T}]$ (mean of measurements).
The estimate has a distribution $\hat{x}_{MMSE} \sim N\!\left(\bar{y}, \frac{\sigma^2}{T}\right)$.
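The sample-mean estimator is simple to verify numerically. The following is a minimal Python sketch; the true value, noise level and sample size below are illustrative, not taken from the slides.

import numpy as np

rng = np.random.default_rng(0)
x_true, sigma, T = 2.0, 0.5, 100                  # hypothetical true value, noise level and sample size
y = x_true + sigma * rng.standard_normal(T)       # measurements y_t = x + v_t, v_t ~ N(0, sigma^2)

x_hat = y.mean()                                  # MMSE estimate: mean of the measurements
print(x_hat, sigma**2 / T)                        # estimate and the variance sigma^2 / T of the estimator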
Central Limit Theorem
Central Limit Theorem
Let $y_1, y_2, \dots, y_T$ be a sequence of independent and identically distributed (i.i.d.) random variables with mean $\mu$ and variance $\sigma^2$. If $S_T = y_1 + y_2 + \dots + y_T$ then
$$\frac{S_T - T\mu}{\sigma\sqrt{T}} \;\xrightarrow[T\to\infty]{}\; N(0, 1)$$
Remarks:
It is important to check that the independence and distribution equality conditions are satisfied.
A particularly useful result is for the average of i.i.d. random variables:
$$\frac{\frac{1}{T}S_T - \mu}{\sigma/\sqrt{T}} \;\xrightarrow[T\to\infty]{}\; N(0, 1)$$
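A short simulation can illustrate the theorem; here the $y_t$ are drawn from a deliberately non-Gaussian uniform distribution, a choice made only for this sketch.

import numpy as np

rng = np.random.default_rng(0)
T, trials = 1000, 5000
y = rng.uniform(0.0, 1.0, size=(trials, T))        # y_t ~ U[0,1], so mu = 0.5 and sigma^2 = 1/12
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)

z = (y.mean(axis=1) - mu) / (sigma / np.sqrt(T))   # standardised averages, one per trial
print(z.mean(), z.std())                           # approximately 0 and 1, as the CLT predicts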
Maximum Likelihood Estimation
Assumption 1: $y_t = x + v_t$ – Model
Assumption 2: $v_t \sim N(0, \sigma^2)$ – Noise
Assumption 3: $v_t$ are independent and identically distributed (iid) – Noise
Data: $\{y_1, y_2, \dots, y_T\} \equiv Y_{1:T}$
The maximum likelihood estimate:
$$\hat{x}_{ML} = \arg\max_x p(Y_{1:T} \mid x)$$
where $p(Y_{1:T} \mid x)$ is the likelihood function.
Noting that maximising the likelihood function will give the same optimal estimate as maximising the log-likelihood function:
$$J_{LL}(x) = \ln p(Y_{1:T} \mid x) = \ln\left(\prod_{t=1}^{T} p(y_t \mid x)\right) = \sum_{t=1}^{T} \ln\left(p(y_t \mid x)\right)$$
$$J_{LL}(x) = \sum_{t=1}^{T} \ln\left(\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\frac{[y_t - x]^2}{\sigma^2}}\right) = T\ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2}\sum_{t=1}^{T}\frac{[y_t - x]^2}{\sigma^2}$$
Maximising the log-likelihood is therefore equivalent to minimising the mean squared error cost: $\hat{x}_{ML} \equiv \hat{x}_{MMSE} = \bar{y}$
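A quick numerical check (with hypothetical data) confirms that maximising the log-likelihood over a grid of candidate values returns the sample mean.

import numpy as np

rng = np.random.default_rng(0)
sigma, T = 0.5, 50
y = 1.0 + sigma * rng.standard_normal(T)          # hypothetical measurements around x = 1.0

grid = np.linspace(-3.0, 3.0, 2001)               # candidate values of x
# log-likelihood up to an additive constant: -(1/(2 sigma^2)) * sum_t (y_t - x)^2
ll = np.array([-np.sum((y - x) ** 2) / (2 * sigma**2) for x in grid])
print(grid[ll.argmax()], y.mean())                # grid ML estimate vs. the sample mean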
Incorporating Prior Knowledge
Incorporating prior knowledge can lead to a more accurate estimator.
Consider the previous example and assume that $x \in [-\Delta, \Delta]$ rather than any real value.
The estimator derived above yields $P[\hat{x} \notin [-\Delta, \Delta]] \neq 0$. Alternatively, consider the truncated sample mean estimator:
$$\tilde{x} = \begin{cases} -\Delta & \hat{x} < -\Delta \\ \hat{x} & -\Delta \le \hat{x} \le \Delta \\ \Delta & \hat{x} > \Delta \end{cases}$$
The PDF of the random variable $\tilde{x}$ modelling the estimator is
$$p(\tilde{x}) = P[\hat{x} \le -\Delta]\,\delta(\tilde{x} + \Delta) + P[\hat{x} \ge \Delta]\,\delta(\tilde{x} - \Delta) + p(\hat{x})\left[u(\tilde{x} + \Delta) - u(\tilde{x} - \Delta)\right]$$
where $u(\cdot)$ is the step function.
Since the true $x$ lies in $[-\Delta, \Delta]$, truncation can only move the estimate closer to it, so $\mathrm{MSE}(\tilde{x}) \le \mathrm{MSE}(\hat{x})$.
We have reduced the MSE by allowing the estimator to be biased
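A minimal sketch of the truncated estimator, assuming an illustrative bound $\Delta = 1$ and made-up data:

import numpy as np

def truncated_mean(y, delta):
    """Sample mean clipped to the prior support [-delta, delta]."""
    return float(np.clip(np.mean(y), -delta, delta))

rng = np.random.default_rng(0)
y = 0.8 + 1.0 * rng.standard_normal(20)           # the true value 0.8 lies inside [-1, 1]
print(np.mean(y), truncated_mean(y, delta=1.0))   # unconstrained vs. truncated estimate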
Incorporating Prior Knowledge
The prior knowledge about the parameter $x$ could instead have been modelled as $x \sim U[-\Delta, \Delta]$: $x$ itself is then a random variable.
Consider a Bayesian estimator $x^*$ with this prior knowledge formulation.
Bayesian MSE: the distribution of the parameter can be included as
$$\mathrm{BMSE}(x^*) = E\left[(x - x^*)^2\right] = \int\!\!\int (x - x^*)^2\, p(Y_{1:T}, x)\, dY_{1:T}\, dx$$
where $Y_{1:T} = [y_1, y_2, \dots, y_T]$ is the vector of observations (a particular realization).
Compare with the MSE for the case in which $x$ is not random:
$$\mathrm{MSE}(\hat{x}) = E\left[(x - \hat{x})^2\right] = \int (x - \hat{x})^2\, p(Y_{1:T})\, dY_{1:T}$$
Can the MSE be minimized for the Bayesian case? Yes.
Bayes’ Theorem for Two Random Variables
The definition of conditional probability yields that the conditional p.d.f. of random variable $Y$ given $X$ is
$$p(y|x) = \frac{p(x, y)}{p(x)}$$
If $x$ and $y$ are independent then $p(y|x) = p(y)$ and $p(x|y) = p(x)$.
Bayes' Theorem for Two Random Variables
For random variables $x$ and $y$ the following holds
$$p(x|y) = \frac{p(y|x)\,p(x)}{p(y)}$$
or equivalently
$$p(x|y) = \frac{p(y|x)\,p(x)}{\int_{-\infty}^{\infty} p(y|x)\,p(x)\,dx}$$
Remark: Generalization to $n$ random variables is straightforward.
Optimal Bayesian Estimation
We want to determine the estimator $x^*$ that minimizes the MSE.
The joint PDF is decomposed using Bayes' Theorem
$$p(Y_{1:T}, x) = p(x|Y_{1:T})\,p(Y_{1:T})$$
and therefore
$$\mathrm{MSE}(x^*) = \int \left( \int (x - x^*)^2\, p(x|Y_{1:T})\, dx \right) p(Y_{1:T})\, dY_{1:T}$$
Since $p(Y_{1:T}) > 0$ and is not a function of $x^*$, minimizing the MSE w.r.t. $x^*$ is equivalent to minimising the term inside the parentheses.
Minimisation:
$$\frac{\partial}{\partial x^*} \int (x - x^*)^2\, p(x|Y_{1:T})\, dx = 0 \;\Longrightarrow\; x^* = E[x|Y_{1:T}]$$
The estimator $x^* = E[x|Y_{1:T}]$ is itself a random variable.
The minimizer is the mean of the posterior PDF. It is also called the minimum mean square error (MMSE) estimator.
Optimal Bayesian Estimation
In order to compute $x^* = E[x|Y_{1:T}]$ we need the conditional PDF of the parameter, i.e., the a posteriori distribution $p(x|Y_{1:T})$.
Using Bayes’ Theorem yields
$$p(x|Y_{1:T}) = \frac{p(Y_{1:T}|x)\,p(x)}{p(Y_{1:T})} = \frac{p(Y_{1:T}|x)\,p(x)}{\int p(Y_{1:T}|x)\,p(x)\,dx}$$
Assuming that $x$ and $v_t$ are independent for all $t$ and since $v_1, \dots, v_T$ are i.i.d.,
$$p(Y_{1:T}|x) = \prod_{t=1}^{T} p(y_t|x) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_t - x)^2\right) = \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - x)^2\right)$$
Optimal Bayesian Estimation
Recall that $x \sim U[-\Delta, \Delta]$. The resulting a posteriori PDF is
$$p(x|Y_{1:T}) = \begin{cases} \dfrac{\dfrac{1}{2\Delta}\dfrac{1}{(2\pi\sigma^2)^{T/2}} \exp\left(-\dfrac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - x)^2\right)}{\displaystyle\int_{-\Delta}^{\Delta} \dfrac{1}{2\Delta}\dfrac{1}{(2\pi\sigma^2)^{T/2}} \exp\left(-\dfrac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - x)^2\right) dx} & |x| \le \Delta \\[2ex] 0 & |x| > \Delta \end{cases}$$
And after some algebraic manipulation, the PDF is given by
$$p(x|Y_{1:T}) = \begin{cases} \dfrac{1}{C}\,\dfrac{1}{\sqrt{2\pi\frac{\sigma^2}{T}}} \exp\left(-\dfrac{1}{2\frac{\sigma^2}{T}}\left(x - \dfrac{1}{T}\sum_{t=1}^{T} y_t\right)^2\right) & |x| \le \Delta \\[2ex] 0 & |x| > \Delta \end{cases}$$
with
$$C = \int_{-\Delta}^{\Delta} \dfrac{1}{\sqrt{2\pi\frac{\sigma^2}{T}}} \exp\left(-\dfrac{1}{2\frac{\sigma^2}{T}}\left(x - \dfrac{1}{T}\sum_{t=1}^{T} y_t\right)^2\right) dx$$
Optimal Bayesian Estimation
The conditional PDF determines the MMSE estimator:
$$x^* = E[x|Y_{1:T}] = \int_{-\infty}^{\infty} x\, p(x|Y_{1:T})\, dx = \frac{\displaystyle\int_{-\Delta}^{\Delta} x\,\frac{1}{\sqrt{2\pi\frac{\sigma^2}{T}}} \exp\left(-\frac{1}{2\frac{\sigma^2}{T}}\left(x - \frac{1}{T}\sum_{t=1}^{T} y_t\right)^2\right) dx}{\displaystyle\int_{-\Delta}^{\Delta} \frac{1}{\sqrt{2\pi\frac{\sigma^2}{T}}} \exp\left(-\frac{1}{2\frac{\sigma^2}{T}}\left(x - \frac{1}{T}\sum_{t=1}^{T} y_t\right)^2\right) dx}$$
Remarks:
If there are no observations, $x^* = E[x] = 0$.
If $\sigma^2/T \to 0$ then $x^* = \hat{x}$.
The data balance the weight of the prior knowledge ($x = 0$) against the evidence arising from the data ($x = \hat{x}$).
As T increases the estimator relies less on the prior knowledge and more on the data.
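The ratio of integrals above can be evaluated numerically; the following is a minimal sketch, with illustrative values for $\Delta$, $\sigma$ and the data.

import numpy as np

def mmse_uniform_prior(y, delta, sigma):
    """Posterior mean of x under a U[-delta, delta] prior and Gaussian noise of variance sigma^2."""
    T = len(y)
    y_bar = np.mean(y)
    x = np.linspace(-delta, delta, 10001)
    # un-normalised posterior on [-delta, delta]: N(y_bar, sigma^2/T) restricted to the support
    w = np.exp(-(x - y_bar) ** 2 / (2.0 * sigma**2 / T))
    return np.trapz(x * w, x) / np.trapz(w, x)     # ratio of the two integrals above

rng = np.random.default_rng(0)
y = 0.7 + 0.5 * rng.standard_normal(10)
print(mmse_uniform_prior(y, delta=1.0, sigma=0.5))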
Bayesian Estimation with Gaussian Priors
A Gaussian prior is considered for the previous example. Let the a priori knowledge be given by the following PDF:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left(-\frac{1}{2\sigma_x^2}(x - \mu_x)^2\right)$$
Recall that
$$p(Y_{1:T}|x) = \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - x)^2\right)$$
The posterior probability is given by
$$p(x|Y_{1:T}) = \frac{p(Y_{1:T}|x)\,p(x)}{\int p(Y_{1:T}|x)\,p(x)\,dx} = \frac{1}{(2\pi\sigma^2_{x|Y_{1:T}})^{1/2}} \exp\left(-\frac{1}{2\sigma^2_{x|Y_{1:T}}}\left(x - \mu_{x|Y_{1:T}}\right)^2\right)$$
where
$$\mu_{x|Y_{1:T}} = \frac{\frac{1}{\sigma^2}\left(\sum_{t=1}^{T} y_t\right) + \frac{\mu_x}{\sigma_x^2}}{\frac{T}{\sigma^2} + \frac{1}{\sigma_x^2}} \qquad\text{and}\qquad \sigma^2_{x|Y_{1:T}} = \left(\frac{T}{\sigma^2} + \frac{1}{\sigma_x^2}\right)^{-1}$$
Bayesian Estimation with Gaussian Priors
The resulting posterior probability is also Gaussian distributed.
The mean and variance of the resulting posterior distribution include the observations $y_1, \dots, y_T$ and the prior distribution statistics.
The MMSE estimator is given by
$$E[x|Y_{1:T}] = \mu_{x|Y_{1:T}} = \frac{\frac{1}{\sigma^2}\left(\sum_{t=1}^{T} y_t\right) + \frac{\mu_x}{\sigma_x^2}}{\frac{T}{\sigma^2} + \frac{1}{\sigma_x^2}} = \underbrace{\frac{\sigma_x^2}{\sigma_x^2 + \frac{\sigma^2}{T}}\,\frac{1}{T}\sum_{t=1}^{T} y_t}_{\text{DATA}} \;+\; \underbrace{\frac{\frac{\sigma^2}{T}}{\sigma_x^2 + \frac{\sigma^2}{T}}\,\mu_x}_{\text{PRIOR}}$$
Interplay between prior and data:
When a large amount of data is available, $\sigma_x^2 \gg \frac{\sigma^2}{T}$, then $E[x|Y_{1:T}] \approx \frac{1}{T}\sum_{t=1}^{T} y_t$
When little data is available, $\sigma_x^2 \ll \frac{\sigma^2}{T}$, then $E[x|Y_{1:T}] \approx \mu_x$
Uncertainty reduction: the resulting variance is smaller than the initial one:
$$\sigma^2_{x|Y_{1:T}} = \left(\frac{T}{\sigma^2} + \frac{1}{\sigma_x^2}\right)^{-1} = \frac{\sigma_x^2\,\sigma^2}{T\sigma_x^2 + \sigma^2} = \underbrace{\frac{\sigma^2}{T\sigma_x^2 + \sigma^2}}_{<1}\,\sigma_x^2 = \underbrace{\frac{\sigma_x^2}{\sigma_x^2 + \frac{\sigma^2}{T}}}_{<1}\,\frac{\sigma^2}{T}$$
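The closed-form update is straightforward to implement; a minimal sketch follows, with the prior mean, prior variance and noise variance chosen purely for illustration.

import numpy as np

def gaussian_posterior(y, mu_x, var_x, var_noise):
    """Posterior mean and variance of x for a N(mu_x, var_x) prior and N(0, var_noise) noise."""
    T = len(y)
    post_var = 1.0 / (T / var_noise + 1.0 / var_x)                 # (T/sigma^2 + 1/sigma_x^2)^(-1)
    post_mean = post_var * (np.sum(y) / var_noise + mu_x / var_x)  # weighted data/prior combination
    return post_mean, post_var

rng = np.random.default_rng(0)
y = 1.5 + 0.5 * rng.standard_normal(25)
print(gaussian_posterior(y, mu_x=0.0, var_x=1.0, var_noise=0.25))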
Maximum A Posteriori (MAP) Estimate
If prior information about x is available, then an alternative to the Bayesian MMSE estimate is the maximum a posteriori (MAP) estimate.
$$\hat{x}_{MAP} = \arg\max_x p(x|Y_{1:T})$$
The MAP estimate gives the mode of the posterior distribution $p(x|Y_{1:T})$.
The MMSE estimate gives the mean of the posterior distribution $p(x|Y_{1:T})$.
When the posterior distribution is symmetric, $\hat{x}_{MAP} \equiv \hat{x}_{MMSE}$.
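A small sketch comparing the two estimates on a grid; the posterior used here is an arbitrary truncated Gaussian (in the spirit of the uniform-prior example), chosen only to make the asymmetry visible.

import numpy as np

# Hypothetical posterior evaluated on a grid: a Gaussian truncated to [-1, 1]
x = np.linspace(-1.0, 1.0, 4001)
p = np.exp(-(x - 0.6) ** 2 / (2.0 * 0.05))        # un-normalised posterior density
p /= np.trapz(p, x)                               # normalise over the support

x_map = x[p.argmax()]                             # MAP estimate: mode of the posterior
x_mmse = np.trapz(x * p, x)                       # MMSE estimate: mean of the posterior
print(x_map, x_mmse)                              # they differ slightly because truncation skews p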
Scalar Linear MMSE Estimation
Setting:
Let $x \in \mathbb{R}$ be a parameter to be estimated based on the observations $y = [y_1, \dots, y_T]$. The parameter and the observations are modelled as random variables with joint PDF $p(x, y)$, which is not known; only first and second order moments (mean and covariance matrix) are available to perform the estimation. What is the best (in the MSE sense) linear estimator?
Available information: the mean $\mu = [\mu_x \;\; \mu_y]$ and the covariance matrix $\Sigma = \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}$ with $\Sigma_{xy} = \Sigma_{yx}^\top$
Optimality criterion (Bayesian MSE):
$$\mathrm{BMSE}(\hat{x}) = E\left[\left(x - \hat{x}(y)\right)^2\right]$$
Structure of the estimator:
Definition of Linear Estimator
The class of linear estimators is given by estimators of the form
$$\hat{x} = \sum_{t=1}^{T} \alpha_t y_t + \alpha_0$$
where the estimator is determined by the choice of the weighting coefficients αt
Scalar Linear MMSE Estimation
Evaluating the BMSE with the linear estimator gives:
$$\mathrm{BMSE}(\hat{x}) = E\left[\left(x - \left(\sum_{t=1}^{T} \alpha_t y_t + \alpha_0\right)\right)^2\right]$$
Minimising with respect to $\alpha_0$,
$$\frac{\partial}{\partial \alpha_0} E\left[\left(x - \sum_{t=1}^{T}\alpha_t y_t - \alpha_0\right)^2\right] = -2\,E\left[x - \sum_{t=1}^{T}\alpha_t y_t - \alpha_0\right] = 0$$
and we obtain:
$$\alpha_0 = E[x] - \sum_{t=1}^{T} \alpha_t E[y_t]$$
Note that when all means are zero $\alpha_0 = 0$ (affine vs. linear estimator)
Scalar Linear MMSE Estimation
Substituting the optimal value of α0 in the BMSE
$$\mathrm{BMSE}(\hat{x}) = E\left[\left(\sum_{t=1}^{T}\alpha_t\,(y_t - E[y_t]) - (x - E[x])\right)^2\right]$$
Denote $\alpha = [\alpha_1, \alpha_2, \dots, \alpha_T]$ and note that
$$\mathrm{BMSE}(\hat{x}) = E\left[\left(\alpha^\top(y - E[y]) - (x - E[x])\right)^2\right] = \alpha^\top \Sigma_{yy}\alpha - \alpha^\top \Sigma_{yx} - \Sigma_{xy}\alpha + \Sigma_{xx}$$
Taking the gradient to minimise yields
$$\frac{\partial}{\partial \alpha}\mathrm{BMSE}(\hat{x}) = 2\Sigma_{yy}\alpha - 2\Sigma_{yx} = 0 \;\Rightarrow\; \alpha = \Sigma_{yy}^{-1}\Sigma_{yx}$$
Combining it with $\alpha_0$,
$$\hat{x} = \alpha^\top y + \alpha_0 = E[x] + \Sigma_{yx}^\top \Sigma_{yy}^{-1}\,y - \Sigma_{yx}^\top \Sigma_{yy}^{-1}\,E[y]$$
Scalar Linear MMSE Estimation
Linear MMSE (LMMSE) Estimator
The linear MMSE estimator of a scalar parameter $x$ given observations $y$ is given by
$$\hat{x}_{LMMSE} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$$
Note that $\mathrm{BMSE}(\hat{x}_{LMMSE}) \ge \mathrm{BMSE}(\hat{x}_{MMSE})$
Remarks:
The LMMSE estimator is suboptimal in general
Only optimal if the MMSE estimator is linear
For jointly Gaussian $x$ and $y$: $\hat{x}_{LMMSE} \equiv \hat{x}_{MMSE}$
The minimum BMSE for an LMMSE estimator is given by
$$\mathrm{BMSE}(\hat{x}_{LMMSE}) = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$$
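A minimal sketch of the formula, using made-up second-order statistics for the model $y_t = x + v_t$:

import numpy as np

def lmmse(y, mu_x, mu_y, Sigma_xy, Sigma_yy):
    """Scalar LMMSE estimate: mu_x + Sigma_xy Sigma_yy^{-1} (y - mu_y)."""
    return mu_x + Sigma_xy @ np.linalg.solve(Sigma_yy, y - mu_y)

# Hypothetical second-order statistics for y_t = x + v_t, x ~ N(0, 1), v_t ~ N(0, 0.25), T = 3
T, var_x, var_v = 3, 1.0, 0.25
mu_x, mu_y = 0.0, np.zeros(T)
Sigma_xy = var_x * np.ones(T)                           # Cov(x, y_t) = var_x for every t
Sigma_yy = var_x * np.ones((T, T)) + var_v * np.eye(T)  # Cov(y) = var_x * 1 1^T + var_v * I
print(lmmse(np.array([0.9, 1.1, 1.0]), mu_x, mu_y, Sigma_xy, Sigma_yy))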
Conditional PDF of Multivariate Gaussian
The reproducing property guarantees that, given a joint Gaussian distribution, the marginal and conditional PDFs are also Gaussian.
Conditional PDF of Multivariate Gaussian
Let $x = [x_1, \dots, x_M]$ and $y = [y_1, \dots, y_N]$ be jointly Gaussian with mean $\mu = [\mu_x \;\; \mu_y]$, block covariance matrix
$$\Sigma = \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}$$
and joint PDF
$$p(x, y) = \frac{1}{(2\pi)^{\frac{M+N}{2}}\,|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}\left([x\; y] - \mu\right)^\top \Sigma^{-1}\left([x\; y] - \mu\right)\right),$$
where $x \in \mathbb{R}^M$ and $y \in \mathbb{R}^N$. The conditional PDF $p(x|y)$ is also Gaussian, distributed as $N(E[x|y], \Sigma_{x|y})$ with
$$E[x|y] = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$$
$$\Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}.$$
Sequential LMMSE Estimation
What is Sequential LMMSE: The process of updating the LMMSE estimator as new data becomes available
Example - Setting:
Consider the observation model
$$y_t = x + v_t, \qquad t = 1, \dots, T$$
where $x \sim N(0, \sigma_x^2)$ is a parameter to be estimated and $v_t \sim N(0, \sigma^2)$.
Note that since x and yt are jointly Gaussian in this case the MMSE estimator is the LMMSE estimator
Let $\hat{x}_T = E[x|y]$ denote the LMMSE estimator based on data up to $T$, i.e., $y = [y_1, y_2, \dots, y_T]$
Recall:
$$\hat{x}_T = \frac{\frac{1}{\sigma^2}\left(\sum_{t=1}^{T} y_t\right)}{\frac{T}{\sigma^2} + \frac{1}{\sigma_x^2}} = \frac{\sigma_x^2}{T\sigma_x^2 + \sigma^2}\sum_{t=1}^{T} y_t$$
$$\mathrm{BMSE}(\hat{x}_T) = \left(\frac{T}{\sigma^2} + \frac{1}{\sigma_x^2}\right)^{-1} = \frac{\sigma^2\sigma_x^2}{T\sigma_x^2 + \sigma^2}$$
Sequential LMMSE Estimation
Update of the estimator when $y_{T+1}$ becomes available:
$$\hat{x}_{T+1} = \frac{\sigma_x^2}{(T+1)\sigma_x^2 + \sigma^2}\sum_{t=1}^{T+1} y_t = \frac{\sigma_x^2}{(T+1)\sigma_x^2 + \sigma^2}\left(\sum_{t=1}^{T} y_t + y_{T+1}\right)$$
$$= \frac{T\sigma_x^2 + \sigma^2}{(T+1)\sigma_x^2 + \sigma^2}\,\hat{x}_T + \frac{\sigma_x^2}{(T+1)\sigma_x^2 + \sigma^2}\,y_{T+1} = \hat{x}_T + \frac{\sigma_x^2}{(T+1)\sigma_x^2 + \sigma^2}\,(y_{T+1} - \hat{x}_T)$$
Define the gain factor:
$$K_{T+1} = \frac{\sigma_x^2}{(T+1)\sigma_x^2 + \sigma^2} = \frac{\mathrm{BMSE}(\hat{x}_T)}{\mathrm{BMSE}(\hat{x}_T) + \sigma^2}$$
The correction is a scaled version of the prediction error $y_{T+1} - \hat{x}_T$.
BMSE update:
$$\mathrm{BMSE}(\hat{x}_{T+1}) = (1 - K_{T+1})\,\mathrm{BMSE}(\hat{x}_T)$$
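A minimal sketch of this recursion (prior variance, noise variance and data are illustrative):

import numpy as np

def sequential_lmmse(y, var_x, var_noise):
    """Recursive LMMSE: x_hat <- x_hat + K (y_new - x_hat), with the BMSE shrinking at each step."""
    x_hat, bmse = 0.0, var_x                      # prior mean 0 and prior variance as initial BMSE
    for y_new in y:
        K = bmse / (bmse + var_noise)             # gain factor K_{T+1}
        x_hat = x_hat + K * (y_new - x_hat)       # correct by the scaled prediction error
        bmse = (1.0 - K) * bmse                   # BMSE update
    return x_hat, bmse

rng = np.random.default_rng(0)
y = 1.2 + 0.5 * rng.standard_normal(30)
print(sequential_lmmse(y, var_x=1.0, var_noise=0.25))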