ACS6124 Multisensor and Decision Systems Part I: Multisensor Systems
Lecture 5: Multisensor Estimation
George Konstantopoulos g.konstantopoulos@sheffield.ac.uk
Automatic Control and Systems Engineering The University of Sheffield
(lecture notes produced by Dr. Inaki Esnaola and Prof. Visakan Kadirkamanathan)
Multisensors for Environmental Monitoring
Monitoring of environmental hazards and pollution is carried out by combining information from multiple sensors: satellite imagery, camera-based optical sensing, stationary and drone-based chemical sensing, etc.
The sensors measure the spatial extent of the pollution through different modes. Cameras provide optical imagery of the polluted spatial region.
Chemical sensors provide pollution levels at specific spatial points.
Multisensor Types
Complementary Sensors:
Do not depend on each other directly, but can be combined to give a more complete picture of the environment.
Complementary data can often be fused by simply extending the limits of the sensors.
Competitive Sensors:
Provide independent measurements of the same information to give increased reliability and accuracy.
Inconsistencies between sensor readings must be reconciled to remove uncertainties.
Cooperative Sensors:
Provide independent measurements that, when combined, provide information that would not be available from any one sensor.
Cooperative sensor networks construct a new abstract sensor with data that does not resemble the readings from any one sensor.
Modes of Sensor Fusion
Data In-Data Out (DIDO)
Direct sensor measurements fused to form virtual sensors using signal estimation.
Example – Gyro and accelerometer data combined to estimate tilt angle.
Data In-Feature Out (DIFO)
Data from different sensors combined to extract some form of features or descriptors.
Example – Vibration signals transformed by Fourier transform into frequency components.
Feature In-Feature Out (FIFO)
Features extracted from different sensor systems are fused to form higher level features.
Example – Image shape features combined with radar range information to estimate object volumetric size.
Feature In-Decision Out (FIDO)
Input features are fused to form a decision using pattern recognition and classification.
Example – Feature vectors of health symptoms are classified into disease classes.
Decision In-Decision Out (DIDO)
Decisions derived at a lower level are fused to form high level decisions using statistical decision theory.
Example – Weighted methods (voting techniques) and inference.
Data Level / Feature Level Sensor Fusion
(Block diagrams contrasting Data Level Fusion and Feature Level Fusion architectures.)
Estimation from Signals over Time
Given the system measurement model: $y_t = x_t + v_t$
Given the signal data over time: $Y_{1:T} = [y_1, y_2, \cdots, y_T]$. Recall the following results:
If $x_t$ is constant and the measurements are independent with i.i.d. noise, the MMSE estimate is the mean of all measurements:
$$\hat{x}_{\mathrm{MMSE}} = \frac{1}{T}\sum_{t=1}^{T} y_t$$
If $x_t$ is time varying and there is correlation between measurements, the linear MMSE estimate is given by:
$$\hat{x}_{\mathrm{LMMSE}} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$$
These results extend readily to $N$ sensors. For example,
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} x + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix}$$
The MMSE estimate is simply the mean of all sensor measurements.
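A minimal Python sketch of the first result (numpy only; the constant value and noise level are made up for illustration): averaging $T$ noisy readings of a constant quantity recovers the underlying value far better than any single reading.

```python
import numpy as np

rng = np.random.default_rng(0)

x_true = 2.5                      # constant quantity being sensed (illustrative value)
T = 200                           # number of measurements over time
v = rng.normal(0.0, 0.5, size=T)  # i.i.d. zero-mean measurement noise
y = x_true + v                    # measurement model y_t = x_t + v_t

x_hat_mmse = y.mean()             # MMSE estimate: mean of all measurements

print(f"single measurement  : {y[0]:.3f}")
print(f"mean of {T} samples : {x_hat_mmse:.3f}")
```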
Vector Linear MMSE Estimation
Extension of the scalar LMMSE to a multidimensional setting. The aim is to minimise the BMSE for each element.
Setting:
Let $x = [x_1, \cdots, x_M] \in \mathbb{R}^M$ be a parameter vector to be estimated based on the observations $y = [y_1, \cdots, y_N]$. The parameter and the observations are modelled as random variables with joint PDF $P(x, y)$. Only first and second order moments are available.
The estimator for the m-th element:
$$\hat{x}_m = \mu_{x_m} + \Sigma_{x_m y}\Sigma_{yy}^{-1}(y - \mu_y) \quad \text{for } m = 1, \cdots, M$$
Combining all estimators in matrix form:
$$\hat{x} = \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \\ \vdots \\ \mu_{x_M} \end{bmatrix} + \begin{bmatrix} \Sigma_{x_1 y}\Sigma_{yy}^{-1}(y - \mu_y) \\ \Sigma_{x_2 y}\Sigma_{yy}^{-1}(y - \mu_y) \\ \vdots \\ \Sigma_{x_M y}\Sigma_{yy}^{-1}(y - \mu_y) \end{bmatrix} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$$
Error covariance matrix:
$$\Sigma_{ee} = E\big[(x - \hat{x})(x - \hat{x})^\top\big] = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \;\Rightarrow\; \mathrm{BMSE}(\hat{x}_m) = (\Sigma_{ee})_{m,m}$$
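A minimal numerical sketch of the vector LMMSE estimator and its error covariance (the moments and the observed vector below are made up for illustration, not taken from the lecture):

```python
import numpy as np

# Illustrative first- and second-order moments (M = 2 parameters, N = 3 observations)
mu_x = np.array([1.0, -0.5])
mu_y = np.array([0.2, 0.0, 0.4])
Sigma_xx = np.array([[1.0, 0.3],
                     [0.3, 0.8]])
Sigma_xy = np.array([[0.5, 0.2, 0.1],
                     [0.1, 0.4, 0.2]])
Sigma_yy = np.array([[1.2, 0.2, 0.1],
                     [0.2, 1.0, 0.3],
                     [0.1, 0.3, 1.1]])

y = np.array([0.5, -0.2, 0.9])    # an observed vector

# x_hat = mu_x + Sigma_xy Sigma_yy^{-1} (y - mu_y)
K = Sigma_xy @ np.linalg.inv(Sigma_yy)
x_hat = mu_x + K @ (y - mu_y)

# Error covariance: Sigma_ee = Sigma_xx - Sigma_xy Sigma_yy^{-1} Sigma_yx
Sigma_ee = Sigma_xx - K @ Sigma_xy.T
bmse = np.diag(Sigma_ee)          # BMSE of each element is a diagonal entry

print("x_hat =", x_hat)
print("BMSE  =", bmse)
```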
Conditional PDF of Multivariate Gaussian
The reproducing property guarantees that, given a joint Gaussian distribution, the marginal and conditional PDFs are also Gaussian.
Conditional PDF of Multivariate Gaussian
Let $x = [x_1, \cdots, x_M]$ and $y = [y_1, \cdots, y_N]$, with $x \in \mathbb{R}^M$ and $y \in \mathbb{R}^N$, be jointly Gaussian with mean $\mu = [\mu_x \; \mu_y]$ and block covariance matrix
$$\Sigma = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}$$
and joint PDF
$$p(x, y) = \frac{1}{\sqrt{(2\pi)^{M+N}|\Sigma|}} \exp\left(-\tfrac{1}{2}\,([x\;y] - \mu)^\top \Sigma^{-1} ([x\;y] - \mu)\right).$$
The conditional PDF $p(x|y)$ is also Gaussian, distributed as $\mathcal{N}(E[x|y], \Sigma_{x|y})$ with
$$E[x|y] = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$$
$$\Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}.$$
Conditional PDF of Multivariate Gaussian
Consider the general linear model with multi-sensor measurements:
$$y = Hx + v$$
where
$x$ is a vector of random variables distributed as $\mathcal{N}(\mu_x, \Sigma_{xx})$,
$H \in \mathbb{R}^{N \times M}$ is an observation matrix,
$v$ is a noise vector of random variables distributed as $\mathcal{N}(0, \Sigma_{vv})$.
Posterior PDF of Bayesian General Linear Model
For the observation model given above, the posterior PDF $p(x|y)$ is Gaussian with mean
$$E[x|y] = \mu_x + \Sigma_{xx}H^\top \left(H\Sigma_{xx}H^\top + \Sigma_{vv}\right)^{-1}(y - H\mu_x)$$
and covariance
$$\Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xx}H^\top \left(H\Sigma_{xx}H^\top + \Sigma_{vv}\right)^{-1} H\Sigma_{xx}$$
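The posterior mean and covariance can be evaluated directly. The sketch below fuses three scalar sensors observing a two-dimensional state; the prior, observation matrix, noise covariance and measurement values are all illustrative assumptions.

```python
import numpy as np

# Prior on x ~ N(mu_x, Sigma_xx)  (illustrative values)
mu_x = np.array([0.0, 1.0])
Sigma_xx = np.diag([1.0, 2.0])

# Three sensors, each a linear combination of the two states
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Sigma_vv = 0.25 * np.eye(3)       # sensor noise covariance

y = np.array([0.3, 1.4, 1.6])     # a fused measurement vector

S = H @ Sigma_xx @ H.T + Sigma_vv           # innovation covariance
G = Sigma_xx @ H.T @ np.linalg.inv(S)       # gain

x_post_mean = mu_x + G @ (y - H @ mu_x)     # E[x|y]
Sigma_post = Sigma_xx - G @ H @ Sigma_xx    # Sigma_{x|y}

print("posterior mean      :", x_post_mean)
print("posterior covariance:\n", Sigma_post)
```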
Gauss-Markov Theorem in a Bayesian Framework
Bayesian Gauss-Markov Theorem
Given a linear observation model
$$y = Hx + v$$
where $x \in \mathbb{R}^M$ is the parameter vector with mean $\mu_x$ and covariance matrix $\Sigma_{xx}$, $H \in \mathbb{R}^{N \times M}$ is the observation matrix, $y \in \mathbb{R}^N$ is the observation, and $v \in \mathbb{R}^N$ is the noise vector with zero mean and covariance $\Sigma_{vv}$, uncorrelated with $x$. Then the LMMSE estimator is given by
$$\hat{x} = \mu_x + \Sigma_{xx}H^\top \left(H\Sigma_{xx}H^\top + \Sigma_{vv}\right)^{-1}(y - H\mu_x)$$
and the error covariance matrix is
$$\Sigma_{ee} = \left(\Sigma_{xx}^{-1} + H^\top \Sigma_{vv}^{-1} H\right)^{-1}$$
Remarks:
The LMMSE estimator is suboptimal in general: it is optimal only if E[x|y] is linear in y (as in the jointly Gaussian case).
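The error covariance above is written in "information" form; it agrees with the covariance form $\Sigma_{xx} - \Sigma_{xx}H^\top(H\Sigma_{xx}H^\top + \Sigma_{vv})^{-1}H\Sigma_{xx}$ of the previous slide. A quick numerical check with arbitrary made-up matrices (a sketch, not part of the lecture material):

```python
import numpy as np

rng = np.random.default_rng(1)

M, N = 2, 4
A = rng.normal(size=(M, M)); Sigma_xx = A @ A.T + np.eye(M)   # random SPD prior covariance
B = rng.normal(size=(N, N)); Sigma_vv = B @ B.T + np.eye(N)   # random SPD noise covariance
H = rng.normal(size=(N, M))

# Covariance form (posterior covariance of the Bayesian linear model)
S = H @ Sigma_xx @ H.T + Sigma_vv
cov_form = Sigma_xx - Sigma_xx @ H.T @ np.linalg.inv(S) @ H @ Sigma_xx

# Information form (Bayesian Gauss-Markov theorem)
info_form = np.linalg.inv(np.linalg.inv(Sigma_xx) + H.T @ np.linalg.inv(Sigma_vv) @ H)

print(np.allclose(cov_form, info_form))   # True, by the matrix inversion lemma
```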
Static Transformation
Consider a set of measurements $\{y_1, y_2, \cdots, y_T\}$ where $y_t$ is an $(N \times 1)$-dimensional vector, with model $y = Hx + v$.
We wish to determine a lower-dimensional feature $z$ of dimension $(n \times 1)$ with $n < N$ that captures the most relevant information in the data.
Data dimensionality is reduced, but time is not compressed.
In general, $z = f(y)$ where $f(\cdot)$ is a mapping function.
For a linear mapping, a transformation matrix $T$ of dimension $(n \times N)$ can be used:
$$z = Ty, \qquad z \in \mathbb{R}^{n \times 1}, \; T \in \mathbb{R}^{n \times N}, \; y \in \mathbb{R}^{N \times 1}$$
It is desirable for the transformed feature $z$ to have zero mean.
Given Y1:T , the collection of T measurements from N sensors, the mean and variance can be calculated as follows:
μy = E[y]
Σyy =E[(y−μy)(y−μy)⊤]
Principal Component Analysis
The signal transformation problem is to determine an optimal linear matrix $T$ in
$$z = T(y - \mu_y)$$
Note: $E[z] = E[Ty - T\mu_y] = T\,E[y - \mu_y] = 0$.
The rows of
$$T = \begin{bmatrix} u_1^\top \\ u_2^\top \\ \vdots \\ u_n^\top \end{bmatrix}$$
make it of dimension $(n \times N)$; each $u_i$ is of dimension $(N \times 1)$ with $\|u_i\| = 1$.
Take the first component of $z$ as
$$z_1 = u_1^\top (y - \mu_y)$$
with
Mean: $E[z_1] = 0$
Variance: $E[z_1^2] = E[u_1^\top (y - \mu_y)(y - \mu_y)^\top u_1] = u_1^\top \Sigma_{yy} u_1$
Eigen Decomposition
To determine $u_1$ that maximises $u_1^\top \Sigma_{yy} u_1$ such that $\|u_1\| = 1$, we calculate the eigen decomposition of the covariance matrix $\Sigma_{yy}$:
$$\Sigma_{yy} = W \Lambda W^\top$$
where $W$ is an orthonormal eigenvector matrix and
$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_N \end{bmatrix}$$
with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge \lambda_{n+1} \ge \cdots \ge \lambda_N > 0$.
The variance of the first extracted component is:
$$E[z_1^2] = u_1^\top \Sigma_{yy} u_1 = u_1^\top W \Lambda W^\top u_1 = q_1^\top \Lambda q_1 \quad \text{where } q_1 = W^\top u_1$$
The maximum variance is achieved when $q_1 = [1, 0, \cdots, 0]^\top$, which results in $E[z_1^2] = \lambda_1$.
In general, the covariance of the features is
$$E[zz^\top] = Q^\top \Lambda Q = \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} \bar{\Lambda} & 0 \\ 0 & \tilde{\Lambda} \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix} = \bar{\Lambda}, \qquad Q = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix}, \; q_i = W^\top u_i$$
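A short sketch of this step (synthetic three-sensor data; numpy only; all numbers are illustrative), checking that projecting onto the leading eigenvector gives a feature whose variance equals $\lambda_1$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic correlated measurements from N = 3 sensors, T = 5000 samples
T = 5000
latent = rng.normal(size=T)
Y = np.vstack([latent + 0.1 * rng.normal(size=T),
               0.5 * latent + 0.2 * rng.normal(size=T),
               rng.normal(size=T)])          # (N x T)

mu_y = Y.mean(axis=1, keepdims=True)
Sigma_yy = np.cov(Y)                          # sample covariance

# Eigen decomposition Sigma_yy = W Lambda W^T, eigenvalues sorted in descending order
lam, W = np.linalg.eigh(Sigma_yy)
order = np.argsort(lam)[::-1]
lam, W = lam[order], W[:, order]

u1 = W[:, 0]                                  # first principal direction, ||u1|| = 1
z1 = u1 @ (Y - mu_y)                          # first principal component over time

print("lambda_1        :", lam[0])
print("variance of z_1 :", z1.var())          # matches lambda_1 up to sampling error
```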
Principal Components
The transformation that maximises the feature variances yields uncorrelated features, since E[zz^⊤] = Λ̄ is diagonal.
Unfortunately, principal component analysis suffers from a “scale” issue. This is where one axis dominates another due to scaling.
Scale issues can be avoided with the normalisation of each variable (zero mean, unit variance).
In the example, y2 will dominate the PCA feature signal as scaling distorts the signal variation.
Components Selection
The total variation in the features can be found by calculating $\mathrm{trace}\{\bar{\Lambda}\} = \sum_{i=1}^{n}\lambda_i$.
The total variation in the complete data is given by $\mathrm{trace}\{\Sigma_{yy}\} = \sum_{i=1}^{N}\lambda_i$.
The ratio of the total variation in the features to the total variation in the data (e.g., 90%) can be used to select the number of features:
$$\text{Ratio} = \frac{\sum_{i=1}^{n}\lambda_i}{\sum_{i=1}^{N}\lambda_i} = 1 - \frac{\sum_{i=n+1}^{N}\lambda_i}{\sum_{i=1}^{N}\lambda_i}$$
The scree test uses a plot of the eigenvalue spectrum to choose the value of n based on the knee point.
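One way to apply this in practice is to keep the smallest n whose cumulative eigenvalue ratio exceeds a threshold such as 90%; a small sketch with made-up eigenvalues:

```python
import numpy as np

lam = np.array([4.2, 2.1, 0.9, 0.4, 0.25, 0.15])   # eigenvalues, already sorted descending

ratio = np.cumsum(lam) / lam.sum()                  # cumulative variance ratio for n = 1..N
n = int(np.argmax(ratio >= 0.90)) + 1               # smallest n reaching 90% of total variation

print("cumulative ratios:", np.round(ratio, 3))
print("chosen number of components n =", n)         # n = 3 for these values
```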
Principal Component Feature Extraction
1. Compute the mean and variance of each measured variable $y_i$.
2. Normalise the data to remove scale effects: $\tilde{y}_i = \dfrac{y_i - \mu_{y_i}}{\sigma_i}$
3. Compute the covariance of the normalised data: $\Sigma_{yy} = \tilde{Y}_{1:T}\tilde{Y}_{1:T}^\top$
4. Perform the eigen decomposition of $\Sigma_{yy}$ to determine the ordered eigenvalues $\lambda_j$ and eigenvectors $u_j$.
5. Choose the number of components $n$ and the transformation matrix $T = \begin{bmatrix} u_1^\top \\ u_2^\top \\ \vdots \\ u_n^\top \end{bmatrix}$
6. Extract the features: $z_t = T\tilde{y}_t$
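Putting the six steps together, a minimal end-to-end sketch (synthetic multisensor data; numpy only; sensor scales, sample counts and n are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data Y (N sensors x T samples) with very different scales
T, N, n = 1000, 4, 2
latent = rng.normal(size=T)
Y = np.vstack([latent,
               100.0 * latent + rng.normal(size=T),
               rng.normal(size=T),
               0.01 * rng.normal(size=T)])

# Steps 1-2: per-sensor mean/variance and normalisation to zero mean, unit variance
mu = Y.mean(axis=1, keepdims=True)
sigma = Y.std(axis=1, keepdims=True)
Y_tilde = (Y - mu) / sigma

# Step 3: covariance of the normalised data (scaled by 1/T here)
Sigma_yy = (Y_tilde @ Y_tilde.T) / T

# Step 4: eigen decomposition with eigenvalues in descending order
lam, W = np.linalg.eigh(Sigma_yy)
order = np.argsort(lam)[::-1]
lam, W = lam[order], W[:, order]

# Step 5: transformation matrix from the n leading eigenvectors (rows u_i^T)
T_mat = W[:, :n].T

# Step 6: extract features z_t = T y_tilde_t for every sample
Z = T_mat @ Y_tilde            # (n x T) feature matrix

print("eigenvalues:", np.round(lam, 3))
print("feature covariance (approximately diagonal):\n", np.round(np.cov(Z), 3))
```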
Feature Extraction from Time Data
Signals from vibration sensors do not yield invariant features when static transformations are used.
Dynamic transformations assume stationarity of the signal (within a time window if quasi-stationarity is assumed).
Setting:
Consider a zero-mean signal $x$ to be estimated based on observations modelled as a wide-sense stationary (WSS) random process $y_1, y_2, \cdots, y_T$ with zero mean and acquisition model $y_t = x_t + v_t$.
The covariance matrix is Toeplitz with the following structure:
$$\Sigma_{yy} = R_{yy} = \begin{bmatrix} r_{yy}[0] & r_{yy}[1] & \cdots & r_{yy}[T-1] \\ r_{yy}[1] & r_{yy}[0] & \cdots & r_{yy}[T-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{yy}[T-1] & r_{yy}[T-2] & \cdots & r_{yy}[0] \end{bmatrix}$$
Recall the autocorrelation function: $r_{yy}[k] = E[y_t y_{t+k}]$.
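A short sketch (numpy and scipy; the test signal is an illustrative sinusoid in noise) that estimates the autocorrelation function and assembles the Toeplitz covariance matrix above:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(4)

# A simple WSS-like signal: sinusoid in noise (illustrative)
T = 512
t = np.arange(T)
y = np.sin(2 * np.pi * 0.05 * t) + 0.5 * rng.normal(size=T)
y = y - y.mean()

# Biased autocorrelation estimate r_yy[k] = (1/T) sum_t y_t y_{t+k}
r_yy = np.array([np.dot(y[:T - k], y[k:]) / T for k in range(T)])

# Toeplitz covariance matrix R_yy with r_yy[|i-j|] in entry (i, j)
R_yy = toeplitz(r_yy)

print(R_yy.shape)          # (T, T)
print(R_yy[0, :5])         # first row: r_yy[0..4]
```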
Power Spectral Analysis
An invariant feature set for stationary dynamic signals lies in the frequency domain.
The power spectral density is the Fourier transform of the autocorrelation function. The Discrete Fourier Transform (DFT) gives
$$S_f = \frac{1}{\sqrt{T}}\sum_{t=0}^{T-1} r_{yy}[t] \exp\left\{-j\frac{2\pi f t}{T}\right\}$$
Choosing the $T \times T$ DFT matrix with $(t, f)$ term
$$F_{t,f} = \frac{1}{\sqrt{T}}\exp\left\{-j\frac{2\pi f t}{T}\right\}$$
the covariance matrix can be rewritten as
$$R_{yy} = F^{\ddagger} S_{yy} F$$
where $F^{\ddagger}$ is the Hermitian transpose of $F$ and $S_{yy}$ is the diagonal matrix of power spectral magnitude values.
For dynamic sensor signals, the multisensor PCA-based feature extraction thus becomes a power spectral transformation of the sensor signal.
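Continuing the previous sketch, the power spectral density can be obtained as the DFT of the estimated autocorrelation function, following the $S_f$ expression above (numpy only; the signal and its frequency are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

# Sinusoid at normalised frequency 0.05 plus white noise (illustrative)
T = 512
t = np.arange(T)
y = np.sin(2 * np.pi * 0.05 * t) + 0.5 * rng.normal(size=T)
y = y - y.mean()

# Autocorrelation estimate and its DFT (scaled by 1/sqrt(T) as in S_f above)
r_yy = np.array([np.dot(y[:T - k], y[k:]) / T for k in range(T)])
S = np.fft.fft(r_yy) / np.sqrt(T)

freqs = np.fft.fftfreq(T)                # normalised frequencies (cycles per sample)
peak = freqs[np.argmax(np.abs(S[:T // 2]))]
print("dominant frequency ~", peak)      # close to 0.05
```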
Dynamic Transformation – Spectral Analysis
Power spectral densities have specific features associated with particular sensor signals.
Spectral analysis makes no assumption about the underlying model structure of the system, although there is a link between ARMA models and the spectra.
Feature extraction in the frequency domain (a sketch follows below):
Average energy within frequency bands
Peak frequencies, amplitudes and bandwidths
Track order amplitudes and bandwidths for rotational systems
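As an example of such frequency-domain features, the following sketch computes band energies and the dominant spectral peak from a PSD estimate. scipy.signal.welch is used here as one common PSD estimator; the test signal, sampling rate and band edges are illustrative assumptions, not taken from the lecture.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(6)

# Illustrative vibration-like signal: two tones plus broadband noise, fs = 1 kHz
fs = 1000.0
t = np.arange(0, 5.0, 1.0 / fs)
y = (np.sin(2 * np.pi * 60 * t)
     + 0.4 * np.sin(2 * np.pi * 180 * t)
     + 0.3 * rng.normal(size=t.size))

# PSD estimate
f, Pyy = welch(y, fs=fs, nperseg=1024)

# Feature 1: average energy within chosen frequency bands (band edges are assumptions)
bands = [(0, 100), (100, 250), (250, 500)]
band_energy = [Pyy[(f >= lo) & (f < hi)].mean() for lo, hi in bands]

# Feature 2: dominant peak frequency and its amplitude
i_peak = np.argmax(Pyy)
peak_freq, peak_amp = f[i_peak], Pyy[i_peak]

print("band energies :", np.round(band_energy, 4))
print("peak frequency:", peak_freq, "Hz, amplitude:", round(float(peak_amp), 4))
```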