
ACS6124 Multisensor and Decision Systems: I Multisensor Systems
Tutorial Answers

1.
(a) Abrupt fault: The fault signal is constant at zero before the onset of the fault and jumps to a constant fault level after it.
(b) Incipient fault: The fault signal departs gradually from zero, changing very slowly and continuously from the onset of the fault.
(c) Intermittent fault: The fault signal takes non-zero values over very short time periods, repeated regularly.
(d) ∆ represents the level of change.
i. $y(t) = x(t) + \Delta + v(t)$
ii. $y(t) = x(t) + (1 + \Delta)v(t)$
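As an illustration, below is a minimal simulation sketch of the two measurement models. The horizon, onset time, fault level ∆ = 2 and constant true signal x(t) = 1 are hypothetical choices, not values from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
T, t_fault, delta = 100, 50, 2.0   # hypothetical horizon, onset time and fault level
x = np.ones(T)                     # assumed constant true signal x(t)
v = rng.standard_normal(T)         # measurement noise v(t) ~ N(0, 1)
fault = (np.arange(T) >= t_fault)  # indicator of the fault being active

# i.  y(t) = x(t) + Delta + v(t) after the onset (level change)
y_i = x + delta * fault + v

# ii. y(t) = x(t) + (1 + Delta) v(t) after the onset (noise scaling)
y_ii = x + (1 + delta * fault) * v
```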
2. (a) In order to determine whether the system is normal or faulty based on the observation y(t), we can formulate a hypothesis test such that:
H0: x(t) = θ0, i.e. the system is normal
H1: x(t) = θ1, i.e. the system is faulty
The change detection problem is to identify which hypothesis is true. We decide in favour of H0 if P(H0 | y(t)) > P(H1 | y(t)). The MAP decision rule is therefore defined as:
$$\frac{P(H_0 \mid y(t))}{P(H_1 \mid y(t))} > 1$$
If the above criterion is not satisfied, we decide in favour of H1.
(b) Using Bayes' law,
$$P(x(t) = \theta \mid y(t)) = \frac{p(y(t) \mid x(t) = \theta)\, P(x(t) = \theta)}{p(y(t))}$$
Applying this to the normal and change hypotheses, with P(H0 | y(t)) = P(x(t) = θ0 | y(t)) and similarly for H1, the MAP decision rule becomes,
$$\frac{P(x(t) = \theta_0 \mid y(t))}{P(x(t) = \theta_1 \mid y(t))} = \frac{p(y(t) \mid x(t) = \theta_0)\, P(x(t) = \theta_0)}{p(y(t) \mid x(t) = \theta_1)\, P(x(t) = \theta_1)}$$
With $\tau = \frac{P(x(t)=\theta_0)}{P(x(t)=\theta_1)}$ being the ratio of the prior probabilities, and the likelihood expressed as,
$$p(y(t) \mid x(t) = \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y(t)-\theta}{\sigma}\right)^2\right]$$
the posterior probabilities ratio reduces to,
$$\frac{P(x(t) = \theta_0 \mid y(t))}{P(x(t) = \theta_1 \mid y(t))} = \tau \, \frac{\exp\left[-\frac{1}{2}\left(\frac{y(t)-\theta_0}{\sigma}\right)^2\right]}{\exp\left[-\frac{1}{2}\left(\frac{y(t)-\theta_1}{\sigma}\right)^2\right]} = \tau \exp\left[-\frac{1}{2}\left\{\left(\frac{y(t)-\theta_0}{\sigma}\right)^2 - \left(\frac{y(t)-\theta_1}{\sigma}\right)^2\right\}\right]$$
If we take the natural logarithm of the above and apply it to the MAP decision rule, with ∆ = θ1 − θ0, we decide in favour of H0 if
$$\ln\tau - \left(\frac{\theta_1 - \theta_0}{\sigma^2}\right)\left(y(t) - \theta_0 - \frac{\Delta}{2}\right) > 0$$
and in favour of H1 otherwise.
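For concreteness, a minimal sketch of this log-MAP rule in Python; the values of θ0, θ1, σ and the priors below are hypothetical.

```python
import numpy as np

def map_decide(y, theta0, theta1, sigma, p0=0.5, p1=0.5):
    """Return 0 (H0, normal) or 1 (H1, faulty) for observation y via the log-MAP rule."""
    tau = p0 / p1                    # prior probabilities ratio
    delta = theta1 - theta0          # level of change
    # Decide H0 if ln(tau) - (theta1 - theta0)/sigma^2 * (y - theta0 - delta/2) > 0
    log_ratio = np.log(tau) - (delta / sigma**2) * (y - theta0 - delta / 2)
    return 0 if log_ratio > 0 else 1

# Example with hypothetical parameters theta0 = 0, theta1 = 2, sigma = 1:
print(map_decide(0.3, theta0=0.0, theta1=2.0, sigma=1.0))  # -> 0 (normal)
print(map_decide(1.8, theta0=0.0, theta1=2.0, sigma=1.0))  # -> 1 (faulty)
```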
3. (a)
The ROC curve plots the detection rate against the false alarm rate for varying thresholds, and represents a performance curve of the fault detection system. The choice of the level of performance can vary from one application domain to another. For example, in target tracking applications, a constant false alarm rate may be desired; the threshold associated with this selection is obtained from the intersection of the ROC curve and the line of constant false alarm rate. An optimal performance can be deemed to be one in which the false alarm rate and the mis-detection rate are equal. The associated threshold is obtained from the intersection of the ROC curve and the line connecting the points (false alarm rate = 0, detection rate = 1) and (false alarm rate = 1, detection rate = 0).
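A sketch of reading the equal-error threshold off an empirical ROC curve; the detector scores below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
scores_normal = rng.normal(0.0, 1.0, 1000)  # synthetic detector scores under H0
scores_fault = rng.normal(2.0, 1.0, 1000)   # synthetic detector scores under H1

thresholds = np.linspace(-4, 6, 501)
far = [(scores_normal > t).mean() for t in thresholds]  # false alarm rate
dr = [(scores_fault > t).mean() for t in thresholds]    # detection rate

# Equal-performance point: false alarm rate equals mis-detection rate (1 - detection
# rate), i.e. the ROC point closest to the line joining (0, 1) and (1, 0).
i = np.argmin([abs(f - (1 - d)) for f, d in zip(far, dr)])
print(f"threshold = {thresholds[i]:.2f}, FAR = {far[i]:.3f}, DR = {dr[i]:.3f}")
```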
When the system is operating under normal conditions, s(τ) ∼ N(0, 1), a zero-mean unit-variance random variable. Hence, the mean value of g(t) is,
$$E\{g(t)\} = E\left\{\sum_{\tau=1}^{t} s(\tau)\right\} = \sum_{\tau=1}^{t} E\{s(\tau)\} = 0$$
and the variance of g(t), given that it is zero mean, is,
$$E\{g(t)^2\} = E\left\{\left(\sum_{\tau=1}^{t} s(\tau)\right)^{2}\right\} = E\left\{\sum_{\tau=1}^{t}\sum_{\nu=1}^{t} s(\tau)s(\nu)\right\} = \sum_{\tau=1}^{t} E\{s(\tau)^2\} = \sum_{\tau=1}^{t} 1 = t$$
Note that E{s(τ)s(ν)} = 0 for all τ ≠ ν.
(b) Since the system's normal operational value is θ0 = 1250 and the variance is σ0² = 4, the residual signal is s(t) = (y(t) − θ0)/σ0. The one-sided positive CUSUM test can be found in "Part I – Lecture 4: Sensor Signal Detection". Applying the one-sided CUSUM test to the data (values in parentheses are the raw cumulative sums before g(t) is reset to zero):

t   y(t)   s(t)   g(t)
1   1252    1      1
2   1246   -2      0 (-1)
3   1253    1.5    1.5
4   1248   -1      0.5
5   1246   -2      0 (-1.5)
6   1262    6      6
7   1266    8      0 (14)
8   1264    7      7

The change onset time is t = 7.
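A minimal one-sided positive CUSUM sketch that reproduces the table above; the alarm threshold h = 10 is an assumed value (the lecture's exact threshold is not given here).

```python
def cusum_positive(y, theta0=1250.0, sigma0=2.0, h=10.0):
    """One-sided (positive) CUSUM: returns the g(t) sequence and alarm times."""
    g, alarms, out = 0.0, [], []
    for t, yt in enumerate(y, start=1):
        s = (yt - theta0) / sigma0   # residual s(t)
        g = max(0.0, g + s)          # accumulate, clipped at zero
        if g > h:                    # alarm: threshold exceeded, reset the sum
            alarms.append(t)
            g = 0.0
        out.append(g)
    return out, alarms

g, alarms = cusum_positive([1252, 1246, 1253, 1248, 1246, 1262, 1266, 1264])
print(g)       # [1.0, 0.0, 1.5, 0.5, 0.0, 6.0, 0.0, 7.0]
print(alarms)  # [7]
```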
4. (a) Intuitively: We are interested in determining an unknown parameter x, the speed, that can take values in ℝ. The parameter is observed through a random process, the alteration induced by the random variations, which results in the observation Y modelled as a random variable. The distribution of the resulting observation, P_Y, is a function of x, i.e., P_x. For a given observation y, the estimate is produced as x̂(y).

Formal Definition: Consider a random observation Y, modelling the noisy speed measurement, taking values in ℝ with a family of distributions {P_x; x ∈ ℝ}. The aim of the estimator is to find a function x̂ : ℝ → ℝ which satisfies some performance criteria.
(b) We want to determine the estimator X̂ that minimizes the MSE = E[(X − X̂)²]. The joint PDF is decomposed using Bayes' theorem,
$$P_{X,Y}(x, y) = P_{X|Y}(x \mid y)\, P_Y(y), \tag{1}$$
and therefore
$$\mathrm{MSE}(\hat{X}) = \int \left( \int (x - \hat{x})^2\, P_{X|Y}(x \mid y)\, dx \right) P_Y(y)\, dy. \tag{2}$$
Since P_Y(y) > 0, minimizing the MSE is equivalent to minimizing the term inside the parentheses. The minimization yields
$$\frac{\partial}{\partial \hat{x}} \int (x - \hat{x})^2\, P_{X|Y}(x \mid y)\, dx = 0 \;\implies\; \hat{X} = E[X \mid Y], \tag{3}$$
and therefore the minimizer is the mean of the posterior PDF.
(c) The problem boils down to determining the scaling factor α. Notice that the orthogonality principle implies
$$E[(X - \hat{X})Y] = 0. \tag{4}$$
Expanding the left-hand side of (4) we get
$$E[(X - \hat{X})Y] = E[(X - \alpha Y)Y] \tag{5}$$
$$= E[XY] - E[\alpha Y^2] \tag{6}$$
$$= E[XY] - \alpha E[Y^2], \tag{7}$$
which yields
$$\alpha = \frac{E[XY]}{E[Y^2]}. \tag{8}$$
Computing the covariance term,
$$E[XY] = E[X(X+Z)] = E[X^2] = \sigma_X^2, \tag{9}$$
and the variance term,
$$E[Y^2] = E[(X+Z)(X+Z)] = E[X^2] + E[Z^2] = \sigma_X^2 + \sigma^2, \tag{10}$$
and substituting into (8) results in
$$\alpha = \frac{\sigma_X^2}{\sigma_X^2 + \sigma^2}. \tag{11}$$
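A quick Monte Carlo check of (8) and (11), with hypothetical variances σX² = 4 and σ² = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_x2, sigma_z2, N = 4.0, 1.0, 200_000   # hypothetical variances and sample count
X = rng.normal(0.0, np.sqrt(sigma_x2), N)
Z = rng.normal(0.0, np.sqrt(sigma_z2), N)
Y = X + Z

alpha_theory = sigma_x2 / (sigma_x2 + sigma_z2)   # eq. (11)
alpha_sample = np.mean(X * Y) / np.mean(Y**2)     # eq. (8) from samples
print(alpha_theory, alpha_sample)                 # both approximately 0.8
```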
5. (a) Normal: A system's health is said to be in a normal state if its system characteristics are consistent with known and designed operating conditions.
(b) Anomaly: A system's health is said to be in an anomalous state if there is a change of at least one system parameter or system characteristic from its known normal or designed operating conditions.
(c) Fault: A system's health is said to be faulty if there is an unacceptable change of at least one system parameter or system characteristic from its known normal or designed operating conditions.
(d) Malfunction: A system's health is said to be malfunctioning if there is an intermittent irregularity in satisfying the required function under specified operating conditions.
(e) Failure: A system's health is said to have failed if there is a permanent interruption to the system performing the required function under specified operating conditions.
6.
• The features should have invariant characteristics when the patient's state does not change. In addition, the features should have significantly discriminatory patterns when the patient is in different health states. Finally, the features should also result in data compression that eliminates redundancy and correlation in the raw signal data.
• EEG signals carry discriminatory information in their different frequency bands. Hence a potential set of features is obtained by constructing the power spectral density of the signals and then aggregating it into energies in the different frequency bands. Alternatively, the relationship of an auto-regressive (AR) model to power spectral patterns suggests that an AR model can be fitted to the EEG time series signals and the model parameters used as the set of features.
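A sketch of the band-energy feature idea using a plain FFT periodogram; the sampling rate, band edges and test signal below are hypothetical choices.

```python
import numpy as np

def band_energies(eeg, fs=256.0, bands=((0.5, 4), (4, 8), (8, 13), (13, 30))):
    """Aggregate the periodogram of one EEG channel into per-band energies."""
    spectrum = np.abs(np.fft.rfft(eeg))**2 / len(eeg)   # power spectral density estimate
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])

# Example on a synthetic 10 Hz (alpha-band) oscillation plus noise:
t = np.arange(0, 4, 1 / 256.0)
x = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.default_rng(3).standard_normal(t.size)
print(band_energies(x))   # energy concentrates in the 8-13 Hz band
```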
7. (a) PCA involves the following steps:
• Compute the mean and variance of each of the measured variables y_i.
• Normalise the data to remove scale effects and form the covariance matrix:
$$\tilde{y}_i = \frac{y_i - \bar{y}_i}{\sigma_i}, \qquad P = \tilde{Y}_{1:T}\,\tilde{Y}_{1:T}^{\top}$$
• Perform an eigendecomposition of P to determine the ordered eigenvalues λ_j and eigenvectors u_j.
• Choose the number of components using the eigenspectrum, the scree test or the percentage of variation.
• Choose the transformation matrix
$$T = \begin{bmatrix} u_1^{\top} \\ u_2^{\top} \\ \vdots \\ u_n^{\top} \end{bmatrix}$$
(b) The eigenvalues of the matrix C are obtained from
$$|\lambda I - C| = 0$$
which gives the solutions λ1 = 2, λ2 = 6, λ3 = 12. The eigenvectors are calculated from Cu = λu.
The percentage of variation captured by the two largest eigenvalues can be calculated as
$$\text{Ratio} = \frac{\lambda_2 + \lambda_3}{\sum_{i=1}^{3} \lambda_i} = \frac{6 + 12}{2 + 6 + 12} = 0.9 \;(90\%)$$
For the chosen eigenvalues the corresponding eigenvectors are
$$u_1 = K\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \qquad u_2 = K\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},$$
for any nonzero constant K.
(c) The feature vectors are obtained as,
$$z = \begin{bmatrix} u_1^{\top} \\ u_2^{\top} \end{bmatrix} y$$
Applying this to the measurements, the feature vectors, for K = 1, are
$$z(1) = \begin{bmatrix} 1.2044 \\ 3.3257 \end{bmatrix}, \quad z(2) = \begin{bmatrix} 4.7499 \\ -1.6140 \end{bmatrix}, \quad z(3) = \begin{bmatrix} 1.7321 \\ 1.7321 \end{bmatrix}, \quad z(4) = \begin{bmatrix} -3.0676 \\ 1.1751 \end{bmatrix}.$$
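A NumPy sketch of steps (a)-(c); the data matrix below is a random placeholder, not the tutorial's C or measurement set.

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.normal(size=(3, 20))          # placeholder: n = 3 variables, T = 20 samples

# Normalise each variable to zero mean and unit variance, then form the covariance
Y_tilde = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, keepdims=True)
P = Y_tilde @ Y_tilde.T

# Eigendecomposition, ordered by decreasing eigenvalue
lam, U = np.linalg.eigh(P)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

# Keep enough components to capture at least 90% of the variation
k = np.searchsorted(np.cumsum(lam) / lam.sum(), 0.90) + 1
T_mat = U[:, :k].T                    # transformation matrix (rows are u_j^T)
Z = T_mat @ Y_tilde                   # feature vectors z(t)
print(k, Z.shape)
```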

8.
(a) Novelty Detection: Determination of any deviation from known “normal” or designed operating conditions.
(b) Fault Detection: Determination of the presence of system faults and the time of onset of faults.
(c) Fault Diagnosis: Determination of fault type, size, location and onset time.
(d) Prognosis: Prediction of the probable course of an emerging fault to the failure state.
9. (a) A novelty detection method that can be applied when the mean and standard deviation of the normal population are known is the discordancy test. In the one-dimensional case, the discordancy test for novelty is given by,
$$y \in \Phi_0 \;\text{ if }\; d = |y - \bar{y}| < \tau\sigma$$
where ȳ is the mean of the data from the system under normal conditions, σ is the standard deviation of the data from the system under normal conditions, and Φ0 indicates the system in its normal state. τ is a threshold that is appropriately chosen, for example from the 95% confidence intervals of a normal distribution. For multiple sensors giving rise to n-dimensional data y, the discordancy test becomes the above applied separately to each dimension. The decision rule becomes,
$$y \in \Phi_0 \;\text{ if, for all } i,\; d_i = |y_i - \bar{y}_i| < \tau\sigma_i$$
(b) At the 95% confidence level, the decision boundary is two-sided at ±1.96, and values exceeding these limits are deemed an anomalous event. Applying this to the data (d(k): 0 – normal state, 1 – anomalous state):
k    r(k)   d(k)
1     1.1    0
2    -0.3    0
3    -2.0    1
4     2.1    1
5    -0.9    0
6     1.8    0
7     0.9    0
8     2.4    1
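A minimal sketch of the per-sample test that produces d(k) above, assuming the residuals r(k) are already standardised to zero mean and unit standard deviation under normal conditions.

```python
r = [1.1, -0.3, -2.0, 2.1, -0.9, 1.8, 0.9, 2.4]

# Two-sided 95% decision boundary for a standardised residual: anomalous if |r| > 1.96
d = [int(abs(rk) > 1.96) for rk in r]   # 0 = normal, 1 = anomalous
print(d)                                # [0, 0, 1, 1, 0, 0, 0, 1]
```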
(c) The discordancy test has a decision boundary that is a rectangle in the 2-D space, and can therefore include data points in its corners that have a much lower likelihood of having come from a system operating normally. We would therefore expect to miss the detection of some novel events. Discordancy tests also lack the flexibility of arbitrary decision boundaries. However, they are easier to implement, as they only need to check whether individual sensor signals exceed upper and lower limits.
10. (a) The Mahalanobis distance based novelty detection decision rule is given by,
$$y(t) \in \Phi_0 \;\text{ if }\; D(y) < \tau$$
where τ is the threshold. Maximum likelihood novelty detection is given by: if p(y) > η (the threshold), then y ∈ Φ0.
(b) If the data are distributed as a Gaussian with mean ȳ and covariance S,
$$p(y) = \frac{1}{(2\pi)^{n/2}|S|^{1/2}} \exp\left(-\frac{1}{2}[y - \bar{y}]^{\top} S^{-1} [y - \bar{y}]\right) = \frac{1}{(2\pi)^{n/2}|S|^{1/2}} \exp\left(-\frac{1}{2} D(y)\right)$$
Substituting into the maximum likelihood test p(y) > η and taking the natural logarithm,
$$-\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|S| - \frac{1}{2}D(y) > \ln(\eta)$$
which can be rearranged as
$$D(y) < -2\left(\frac{n}{2}\ln(2\pi) + \frac{1}{2}\ln|S| + \ln(\eta)\right) \equiv \tau$$
Therefore, maximum likelihood novelty detection is equivalent to the Mahalanobis distance discordancy test.
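A numerical check of this equivalence; the mean, covariance and threshold η below are arbitrary assumed values.

```python
import numpy as np

y_bar = np.array([0.0, 0.0])               # assumed normal-class mean
S = np.array([[2.0, 0.5], [0.5, 1.0]])     # assumed normal-class covariance
eta = 0.01                                 # assumed likelihood threshold
n = len(y_bar)

def mahalanobis(y):
    e = y - y_bar
    return e @ np.linalg.solve(S, e)       # D(y) = (y - y_bar)^T S^-1 (y - y_bar)

def log_likelihood(y):
    return -0.5 * (n * np.log(2 * np.pi) + np.log(np.linalg.det(S)) + mahalanobis(y))

# Equivalent Mahalanobis threshold: D(y) < tau  <=>  p(y) > eta
tau = -2 * (0.5 * n * np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(S)) + np.log(eta))
y = np.array([1.0, -0.5])
print(log_likelihood(y) > np.log(eta), mahalanobis(y) < tau)   # identical decisions
```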
(c) If the normal class data are not distributed according to a Gaussian, then the probability of missed detection of novel events will be greater. The maximum likelihood novelty detection approach has the ability to form arbitrary boundaries when the distribution is constructed using nonparametric density estimation techniques, and can therefore produce tighter regions of normality than would be possible with the discordancy test or its equivalents. Two nonparametric density estimation methods are the histogram method and the kernel density estimation method. The former will result in non-smooth decision boundaries, whereas the latter will produce smoother and more robust decision boundaries.
11. (a)
The assumptions that need to be made are that there are no regions of the feature or data space in which the fault classes overlap, and that, collectively, the fault classes cover all the possible classes to which the features or data can belong. Mathematically, with the fault classes represented as Φi, this can be expressed as,
$$\Phi_i \cap \Phi_j = \emptyset \;(i \neq j), \qquad \Phi_0 \cup \Phi_1 \cup \Phi_2 \cup \cdots = U$$
(b) The nearest neighbour method utilises a historical dataset of features and their corresponding class labels to classify data that contain only the features and an unknown class. The classification is performed by seeking the nearest neighbour to the new feature or data y within the historical data D_{1:T} = {(ỹ(t), Φ(t)) | t = 1, · · · , T} and choosing the class that this nearest neighbour belongs to. Φ(t) ∈ {Φi | i = 1, · · · , c} indicates the class label, amongst c classes, of the features ỹ(t). Mathematically, the decision rule is given by,
$$y \in \Phi_{i^*}, \;\text{ where }\; \Phi(t^*) = \Phi_{i^*} \;\text{ and }\; t^* = \arg\min_{t}\, d(t)$$
with the nearest neighbour distance for n-dimensional data defined as the Euclidean distance,
$$d(t) = \|y - \tilde{y}(t)\|, \;\text{ with }\; \|y - \tilde{y}(t)\|^2 = (y_1 - \tilde{y}_1(t))^2 + \cdots + (y_n - \tilde{y}_n(t))^2$$
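A minimal nearest neighbour classifier sketch implementing the decision rule above; the historical dataset below is a placeholder.

```python
import numpy as np

def nn_classify(y, Y_hist, labels):
    """Assign y the class label of its nearest (Euclidean) historical neighbour."""
    d = np.linalg.norm(Y_hist - y, axis=1)   # d(t) = ||y - y_tilde(t)|| for every t
    return labels[np.argmin(d)]              # class of the nearest neighbour

# Placeholder historical data: two features, labels 0 (normal) and 1 (fault)
Y_hist = np.array([[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.8, 3.3]])
labels = np.array([0, 0, 1, 1])
print(nn_classify(np.array([2.9, 3.0]), Y_hist, labels))   # -> 1
```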
(c) • Nearest neighbour methods construct nonlinear decision boundaries, whereas the linear discriminant method constructs linear boundaries. Nonlinear decision boundaries are more powerful in solving difficult classification problems.
• Nearest neighbour methods use all historical data to make decisions and hence require large data storage, whereas the linear discriminant method parametrises the boundary, so no historical data need to be stored.
• Nearest neighbour methods can incorporate new data readily into the decision-making process, whereas the linear discriminant method requires re-training with all previous data to accommodate new data in decision making.
• Nearest neighbour methods do not have a training phase, so the classifier is built very quickly, whereas the linear discriminant method needs a training process that can make the classifier construction phase slow.
• Run-time computation is slow in nearest neighbour methods, because distances to all historical data must be computed, whereas the linear discriminant method is very fast, requiring only the evaluation of the discriminant function.
12. (a) • Combine all observations in a single compound group sensor model.
• The problem formulation is equivalent to a vector Kalman filter.
• Combining all observations yields:
$$\tilde{y}_n = \tilde{H}_n x_n + \tilde{z}_n$$
where
– Group observations: $\tilde{y}_n = \begin{bmatrix} y_{1,n}^{\top} & y_{2,n}^{\top} & \cdots & y_{S,n}^{\top} \end{bmatrix}^{\top}$
– Group observation matrix: $\tilde{H}_n = \begin{bmatrix} H_{1,n}^{\top} & H_{2,n}^{\top} & \cdots & H_{S,n}^{\top} \end{bmatrix}^{\top}$
– Group observation noise: $\tilde{z}_n = \begin{bmatrix} z_{1,n}^{\top} & z_{2,n}^{\top} & \cdots & z_{S,n}^{\top} \end{bmatrix}^{\top}$
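A sketch of forming the group observation quantities for S sensors, assuming independent sensor noises; the shapes and values are illustrative.

```python
import numpy as np

def group_sensor(ys, Hs, Rs):
    """Stack per-sensor observations y_i, matrices H_i and noise covariances R_i."""
    y_tilde = np.concatenate(ys)      # [y_1^T y_2^T ... y_S^T]^T
    H_tilde = np.vstack(Hs)           # [H_1^T H_2^T ... H_S^T]^T
    R_tilde = np.zeros((len(y_tilde), len(y_tilde)))
    k = 0
    for R in Rs:                      # block-diagonal noise covariance,
        m = R.shape[0]                # assuming independent sensor noises
        R_tilde[k:k + m, k:k + m] = R
        k += m
    return y_tilde, H_tilde, R_tilde

# Two scalar sensors observing a scalar state:
ys = [np.array([1.0]), np.array([1.2])]
Hs = [np.eye(1), np.eye(1)]
Rs = [np.eye(1), 2 * np.eye(1)]
print(group_sensor(ys, Hs, Rs))
```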
(b) • The Track Fusion Center combines the estimates from different local sensor sites.
• In track-to-track fusion algorithms, local sensor sites generate local track estimates using a local Kalman filter.
• Local tracks are communicated to a Track Fusion Center that combines them to generate a Global Track Estimate.
• When the global estimate is communicated back to the local sensor sites, this is the feedback configuration.
$$\hat{x}_{n|n} = P_{n|n} \sum_{i=1}^{S} P_{i,n|n}^{-1}\, \hat{x}_{i,n|n} \quad \text{(Global Estimate)}$$
$$P_{n|n} = \left( \sum_{i=1}^{S} P_{i,n|n}^{-1} \right)^{-1} \quad \text{(Global MSE)}$$
where
– $\hat{x}_{i,n|n}$ is the local estimate at local sensor i
– $P_{i,n|n}$ is the local minimum MSE at local sensor i
• This method ignores the correlation between the different estimates.
• Weights can be tuned to take the correlation between estimates into account.
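A sketch of the global combination step above, for two hypothetical local estimates of a scalar state (ignoring cross-correlations, as noted).

```python
import numpy as np

def fuse_tracks(x_locals, P_locals):
    """Combine local track estimates into the global estimate and global MSE."""
    P_inv_sum = sum(np.linalg.inv(P) for P in P_locals)
    P_global = np.linalg.inv(P_inv_sum)            # (sum_i P_i^-1)^-1
    x_global = P_global @ sum(
        np.linalg.inv(P) @ x for x, P in zip(x_locals, P_locals)
    )
    return x_global, P_global

# Two hypothetical local estimates of a scalar state:
x1, P1 = np.array([10.2]), np.array([[1.0]])
x2, P2 = np.array([9.8]), np.array([[0.5]])
print(fuse_tracks([x1, x2], [P1, P2]))   # global estimate lies nearer the lower-MSE track
```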
(c) Advantages and Limitations of the Group Sensor Method:
• The computational complexity of computing the Kalman gain matrix, the correction, and the minimum MSE grows with the number of sensors S.
• It is a straightforward approach when the number of sensors is small, but problematic in higher dimensions.
Advantages and Limitations of Track-to-Track Fusion:
• Local estimates are available at the local sensors.
• Reduced communication requirements: a local estimate contains less information than the raw observation data.
• Robustness to sensor failure.
• The complexity of the system is increased.
• Scalability: extension to large networks is feasible.
• The computational burden can be balanced across the network.