ACS6124 Multisensor and Decision Systems Part I: Multisensor Systems
Lecture 6: Multisensor Detection
George Konstantopoulos g.konstantopoulos@sheffield.ac.uk
Automatic Control and Systems Engineering The University of Sheffield
(lecture notes produced by Dr. Inaki Esnaola and Prof. Visakan Kadirkamanathan)
Motivation – Process Plant Monitoring
Chemical process plants typically have many component subsystems, with a large number of sensors for monitoring.
Failures can be unsafe and costly.
Condition-based predictive maintenance has significant cost-reduction implications.
Operational simplicity
Plant operation mostly in steady state
Poor quality of data and models
Data sensitive to environmental conditions
High-dimensional models with lower precision
The Health States
Normal: A system’s health is said to be in the normal state if its system characteristics are consistent with known and designed operating conditions.
Anomaly: A system’s health is said to be in an anomalous state if there is a change in at least one system parameter or system characteristic from its known normal or designed operating conditions.
Fault: A system’s health is said to be faulty if there is an unacceptable change in at least one system parameter or system characteristic from its known normal or designed operating conditions.
An anomalous system can turn out to be normal or faulty, since an anomaly only indicates previously unseen system behaviour.
A faulty system is still operable, albeit with reduced functionality.
Malfunction: A system’s health is said to be malfunctioning if there is an intermittent irregularity in satisfying the required function under specified operating conditions.
Failure: A system’s health is said to have failed if there is a permanent interruption to the system performing the required function under specified operating conditions.
A malfunctioning system is still operable.
A failed system is inoperable.
The task of a health condition monitoring system is to determine which of the above states the system is in.
Functions of a Health Monitoring System
Health condition monitoring systems often focus on identifying one particular state of the system.
Novelty Detection – detects the anomaly state
Detects the deviation from the normal state
Assumptions are made to define the normal state
Knowledge of the system’s normal behaviour is required

Fault Detection – detects the faulty state
Detects unacceptable deviation in any system parameter
Determines the onset time of the fault
Assumptions are made to define unacceptable deviations

Fault Isolation – detects the faulty state and
Detects which system parameter (location) has changed
Determines the class or type of fault
Determines the onset time of the fault
Knowledge of fault classes or types is required

Fault Identification – detects the faulty state and
Detects which system parameter (location) has changed
Determines the level or size of the fault
Determines the onset time of the fault
Association between system parameters and fault classes is assumed unknown

Prognosis – predicts the failure state
Predicts the probable course of an emerging fault
The assumption is made that the anomalous event history leads to a fault
Association between fault and event history is assumed known
Novelty Detection
Problem:
Determine whether the observation y(t) ∈ Φ0, and hence what region constitutes Φ0 (the normal-state region)
The decision rule is as follows:
The system is “normal” if y(t) ∈ Φ̂0
The system is “anomalous” if y(t) ∉ Φ̂0
We need to estimate Φ̂0 based on historical (training) data:

Ỹ1:T = {ỹ(1), ỹ(2), ···, ỹ(T)}
Alternatively, decision rules can be applied directly to the measurement, as in the discordancy test.
1-D Discordancy Test
Assumptions:
Anomalous events, if any, are rare
Data records Ỹ1:T = {ỹ(1), ỹ(2), ···, ỹ(T)} are from the “normal” system state
Problem:
Given data y(t), is the system normal, i.e., y(t) ∈ Φ0, the normal region?
We define

s(t) = [y(t) − ȳ]/σy

where

ȳ = (1/T) ∑_{t=1}^{T} ỹ(t) and σy² = (1/T) ∑_{t=1}^{T} [ỹ(t) − ȳ]²
The Decision Rule

If −δ ≤ s(t) ≤ δ then the system is “normal”; otherwise it is “anomalous”.

δ is typically chosen on the basis of a specific level of confidence.
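As a minimal sketch of this decision rule, assuming roughly Gaussian normal-state data and taking the illustrative three-sigma choice δ = 3 (the function and variable names below are ours, not from the lecture):

```python
import numpy as np

def discordancy_test(y_new, y_train, delta=3.0):
    """1-D discordancy test: declare y_new anomalous when its
    standardised deviation from the training mean exceeds delta."""
    y_bar = np.mean(y_train)           # sample mean of the training records
    sigma_y = np.std(y_train)          # sample standard deviation (1/T normalisation)
    s = (y_new - y_bar) / sigma_y      # test statistic s(t)
    return "normal" if -delta <= s <= delta else "anomalous"

# Illustrative usage with synthetic "normal" training data
rng = np.random.default_rng(0)
y_train = rng.normal(loc=5.0, scale=0.5, size=1000)
print(discordancy_test(5.3, y_train))   # within 3 sigma -> normal
print(discordancy_test(8.0, y_train))   # ~6 sigma away  -> anomalous
```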
The same test applies in the multivariate case: declare the system “normal” if p(y) > η (the threshold), where

p(y) = 1/((2π)^{n/2} |S|^{1/2}) exp(−(1/2)[y − ȳ]⊤S⁻¹[y − ȳ]) = 1/((2π)^{n/2} |S|^{1/2}) exp(−(1/2)D(y))

and D(y) = [y − ȳ]⊤S⁻¹[y − ȳ] is the squared Mahalanobis distance.

Substitute into the maximum likelihood test p(y) > η and then take the ln{·}:

−(n/2) ln(2π) − (1/2) ln|S| − (1/2)D(y) > ln(η)

−D(y) > n ln(2π) + ln|S| + 2 ln(η)

D(y) < τ

where the threshold τ = −n ln(2π) − ln|S| − 2 ln(η)
Therefore, the Maximum Likelihood Novelty Detection is equivalent to the Mahalanobis Distance Discordancy Test.
Maximum Likelihood Novelty Detection
Decision Rule:
If p(y) > η then y ∈ Φ0
The discordancy test is consistent with the assumption that y ∼ N(ȳ, S), the multivariate Gaussian distribution.
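A minimal multivariate sketch of this test, assuming the training data are stored as a T×n NumPy array; the threshold η = 10⁻⁴ and all names below are illustrative choices of ours:

```python
import numpy as np

def mahalanobis_test(y, Y_train, eta=1e-4):
    """Declare y "normal" if D(y) < tau, which is equivalent to
    p(y) > eta under a Gaussian fit N(y_bar, S) to the training data."""
    T, n = Y_train.shape
    y_bar = Y_train.mean(axis=0)                   # sample mean vector
    S = np.cov(Y_train, rowvar=False, bias=True)   # sample covariance (1/T)
    diff = y - y_bar
    D = diff @ np.linalg.solve(S, diff)            # Mahalanobis distance D(y)
    tau = -n * np.log(2 * np.pi) - np.log(np.linalg.det(S)) - 2 * np.log(eta)
    return "normal" if D < tau else "anomalous"
```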
Non-Parametric Probability Density Estimation
Data from Φ0 may not be distributed according to the Gaussian distribution.
Maximum likelihood (ML) novelty detection can still be carried out with a suitable construction of the probability density of the data, i.e. using non-parametric methods.
Histogram
ti is the number of observations with values in the range associated with bin i
T is the total number of observations
The density estimate for a value y falling in bin i is then p(y) = ti/(T ∆i), where ∆i is the bin width; Φ̂0 is the region where this likelihood exceeds the threshold η

Issues
The size and number of bins need to be chosen. There is a trade-off between smooth variation and the ability to construct a detailed estimate.
Sensitivity to the bin centre values.
The curse of dimensionality – the number of bins required grows exponentially with the measurement dimension.
The approximated density is not smooth.
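As a 1-D sketch of the histogram approach (the bin count and likelihood threshold η are illustrative, and the helper names are ours):

```python
import numpy as np

def histogram_density(y_train, n_bins=20):
    """Estimate the density as t_i / (T * bin_width) from bin counts t_i."""
    counts, edges = np.histogram(y_train, bins=n_bins)
    density = counts / (len(y_train) * np.diff(edges))  # piecewise-constant p(y)
    return density, edges

def is_normal(y, density, edges, eta=0.01):
    """Likelihood test: y is in the estimated normal region if p(y) > eta."""
    i = np.searchsorted(edges, y, side="right") - 1
    p_y = density[i] if 0 <= i < len(density) else 0.0  # zero outside the range
    return p_y > eta
```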
Kernel Density Estimation
Produces a smooth probability density
Usable in high dimensions, although accuracy decreases with the dimension of y
The kernel width sets a trade-off between smoothness and accuracy

p(y) = (1/T) ∑_{t=1}^{T} K(y; ỹ(t))

A common choice of the kernel function K(y; ỹ(t)) is the Gaussian:

K(y; ỹ(t)) = 1/(√(2π) σ) exp(−(1/2) [y − ỹ(t)]²/σ²)

where σ is the kernel width. If σ is too small the estimate is noisy; if σ is too large it is over-smoothed.
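A one-dimensional sketch of this estimator; the kernel width σ = 0.3 below is a free choice of ours, illustrating the smoothness/accuracy trade-off:

```python
import numpy as np

def kde(y, y_train, sigma=0.3):
    """Kernel density estimate p(y) = (1/T) sum_t K(y; y~(t))
    with a Gaussian kernel of width sigma."""
    z = (y - y_train) / sigma                            # scaled distances to data
    K = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * sigma)
    return K.mean()                                      # average kernel contribution
```

Evaluating kde on a grid for several values of σ reproduces the behaviour above: a small σ gives a noisy estimate, a large σ an over-smoothed one.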
Data-Based Diagnosis
The data-based fault diagnosis problem is formulated as developing a system, based on data, that partitions the feature/signal space into fault classes. Therefore, pattern recognition methods are needed.
If we have a single measurement y(t)
If the fault classes in signal space have been identified, then the decision rule is:
If y(t) ∈ Φi, then the system is faulty with the fault class Φi.
To construct the fault class regions:
Use a test rig to generate data associated with each fault to be diagnosed (by injecting the various faults in the system)
Data associated with the known fault classes are available from operational data
Data-Based Fault Diagnosis
Dataset Y1:T = {[y(1), Φi(1)], [y(2), Φj(2)], ···}
where Y1:T is the labelled data set, y(t) represents the measurement and Φi(t)
represents the fault class.
Based on Y1:T, we need an algorithm (method) to partition the measurement/signal space into Φ0, Φ1, Φ2, ···

Assumption

Φi ∩ Φj = ∅ for i ≠ j and Φ0 ∪ Φ1 ∪ Φ2 ∪ ··· = U

where ∅ is the empty set, so the regions are mutually exclusive (non-overlapping), and U is the universal set/region.
The multiclass partition can be solved by a number of two-class partitions.
Classification with Known Densities
Assumption
The two classes are Gaussian distributed, where μ0, μ1 are the means of the two classes while their covariances are the same, S:

p(y|Φ0) = 1/((2π)^{n/2} |S|^{1/2}) exp(−(1/2)(y − μ0)⊤S⁻¹(y − μ0))

p(y|Φ1) = 1/((2π)^{n/2} |S|^{1/2}) exp(−(1/2)(y − μ1)⊤S⁻¹(y − μ1))

The prior probabilities of the two classes are equal.

The MAP optimal test is the (Neyman–Pearson) likelihood ratio test:

If p(y|Φ0)/p(y|Φ1) > 1, decide y ∈ Φ0

The likelihood ratio is given by

p(y|Φ0)/p(y|Φ1) = exp(−(1/2)(y − μ0)⊤S⁻¹(y − μ0)) / exp(−(1/2)(y − μ1)⊤S⁻¹(y − μ1))

since the Gaussian normalisation constants cancel.
Classification with Known Densities
The likelihood ratio is given by

p(y|Φ0)/p(y|Φ1) = exp(−(1/2)(y − μ0)⊤S⁻¹(y − μ0)) / exp(−(1/2)(y − μ1)⊤S⁻¹(y − μ1))

= exp(−(1/2)[(y − μ0)⊤S⁻¹(y − μ0) − (y − μ1)⊤S⁻¹(y − μ1)])

= exp((μ0 − μ1)⊤S⁻¹y − (1/2)μ0⊤S⁻¹μ0 + (1/2)μ1⊤S⁻¹μ1)

where the quadratic terms y⊤S⁻¹y have cancelled. If we define w = S⁻¹(μ0 − μ1) and w0 = −(1/2)μ0⊤S⁻¹μ0 + (1/2)μ1⊤S⁻¹μ1, then

p(y|Φ0)/p(y|Φ1) = exp(w⊤y + w0)

The decision rule p(y|Φ0)/p(y|Φ1) > 1 is then equivalent to w⊤y + w0 > 0.

Decision Rule: If w⊤y + w0 > 0 then y ∈ Φ0
The decision boundary is a linear function of data y.
If the prior probabilities are not equal, the threshold shifts (w0 gains the term ln[P(Φ0)/P(Φ1)]) but the boundary remains linear; the boundary becomes non-linear (quadratic) only if the class covariances differ.
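A short sketch of this rule under the equal-covariance Gaussian assumption above; the class means and covariance passed in are whatever has been identified for the two classes, and the names are ours:

```python
import numpy as np

def make_linear_rule(mu0, mu1, S):
    """Compute w = S^{-1}(mu0 - mu1) and the corresponding offset w0."""
    S_inv = np.linalg.inv(S)
    w = S_inv @ (mu0 - mu1)
    w0 = -0.5 * mu0 @ S_inv @ mu0 + 0.5 * mu1 @ S_inv @ mu1
    return w, w0

def classify(y, w, w0):
    """Decide y in Phi_0 iff w^T y + w0 > 0, else Phi_1."""
    return 0 if w @ y + w0 > 0 else 1
```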
Linear Discriminant
If the class probability densities are unknown, the data must be used directly for classification – pattern classification.
In pattern recognition, each point of the feature space is assigned to one of K decision regions, one per class.
Decision boundaries can be defined in terms of a set of discriminant functions.
The discriminant function is:
d(y(t)) = w1y1(t)+w2y2(t)+w0
Linear Discriminant
The discriminant function can be expressed as:
d(y(t)) = w⊤x(t) where w = [w0, w1, w2]⊤ and x(t) = [1, y1(t), y2(t)]⊤
The task is to then estimate the weights w based on the dataset Y1:T , for example, by minimising the classification error. This procedure is known as training the classifier.
The decision rule is:
If d(y(t)) > 0 then y(t) ∈ Φ0
If d(y(t)) ≤ 0 then y(t) ∈ Φ1

Advantages
A linear classification boundary does not overfit
No labelled historical data are retained once training is complete
Decision making incurs little computational cost (which is true for most training-based approaches)
Disadvantages
Being a linear classification boundary, it is inflexible
Integrating new labelled measurements is difficult (as no historical data are retained)
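The lecture leaves the training algorithm open; as one common choice (not the lecture's prescribed method), a least-squares fit of the weights to ±1 class targets gives a simple sketch, with all names illustrative:

```python
import numpy as np

def train_linear_discriminant(Y, targets):
    """Least-squares estimate of w = [w0, w1, w2] from labelled data.
    Y: T x 2 array of measurements; targets: +1 for Phi_0, -1 for Phi_1."""
    X = np.hstack([np.ones((len(Y), 1)), Y])          # rows are x(t) = [1, y1(t), y2(t)]
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)   # minimise ||X w - targets||^2
    return w

def discriminant(y, w):
    """d(y) = w^T x with x = [1, y1, y2]; positive means y in Phi_0."""
    return w @ np.concatenate(([1.0], y))
```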
Nearest-Neighbour Method
Find, in the n-dimensional feature space, the object in the training set that is closest to the object being classified
As the neighbour is nearby, it is likely to be similar to the object being classified, so the new object is assigned to the same class as its neighbour
1-Nearest Neighbour
Advantages
Easily implemented
Non-linear boundaries can be found
No need to train a classifier
Additional labelled measurements can be readily incorporated
Can be made more robust to noisy data and outliers by using the K nearest neighbours
Disadvantages
Need to keep all historical data
Decision making incurs high computational cost
Nearest-Neighbour Method
Historical Data (Labelled):
Y1:T = {[ỹ(t), Φi(t)]}, t = 1, 2, ..., T
A new measurement y is made, and its nearest neighbour amongst the
dataset {ỹ(t)}t=1,...,T is determined using the distance:
d(t) = ∥y−y ̃(t)∥ for all t = 1,2,…,T
For the 2-dimensional case,

∥y − ỹ(t)∥ = √([y1 − ỹ1(t)]² + [y2 − ỹ2(t)]²)

The nearest neighbour is given by the index t∗ where:

d(t∗) < d(t) for t ≠ t∗, t = 1, 2, ..., T
Decision Rule:

y ∈ Φi(t∗) where t∗ = arg min_t d(t)

Extending the approach to multi-class classification is straightforward.
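A minimal 1-NN sketch using the Euclidean distance (the array names are illustrative):

```python
import numpy as np

def nearest_neighbour_classify(y, Y_train, labels):
    """Assign y the class label of its nearest stored measurement:
    t* = argmin_t ||y - y~(t)||."""
    d = np.linalg.norm(Y_train - y, axis=1)  # distances d(t) to all stored points
    t_star = np.argmin(d)                    # index t* of the nearest neighbour
    return labels[t_star]
```

Replacing t_star with the K smallest distances and a majority vote gives the more robust K-nearest-neighbours variant mentioned above.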