May 2016, 7CCSMPNN
—— SOLUTIONS ——
1. Question one is compulsory
a. What is Pattern Recognition?
Syllabus: Introduction of Pattern Recognition
The assignment of a physical object or event to one of several pre-specified categories. 3 marks
b. Draw a block diagram of the key components of a pattern recognition system.
Syllabus: Introduction of Pattern Recognition
Marking scheme
1 mark for each block.
[Block diagram: input → Pre-processing → Feature Extraction → Classification → output, with Domain Knowledge informing the Pre-processing and Feature Extraction stages.]

Feature space properties:
– "Good" features: examples from the same class should have similar feature values, and examples from different classes should have different feature values.
– "Bad" features: features for which these properties do not hold.

c. Sketch a diagram to show a two-dimensional two-class dataset for each of the following cases. Use a cross to indicate class 1 and a circle to indicate class 2.
i. Linearly separable
ii. Nonlinearly separable
iii. Multi-modal
Syllabus: Introduction of Pattern Recognition
[Answer: three sketches labelled "Linearly separable", "Non-linearly separable" and "Multi-modal", each showing crosses (class 1), circles (class 2) and the decision boundary.]
Marking scheme
3 marks for each diagram. In each diagram, the samples are shown by two symbols and the hyperplane should be shown.
d. Given the Minimum Error Rate Classifier for class ωi with the discriminant function

gi(x) = (P(ωi) / ((2π)^(d/2) |Σi|^(1/2))) exp(−(1/2)(x − μi)^T Σi^(−1) (x − μi)),

determine if the following discriminant functions can be used to achieve minimum error rate classification. Explain your answer.
i. gi(x) + 5
ii. 25 × gi(x)
iii. sin(gi(x))
iv. −(1/2)(x − μi)^T Σi^(−1) (x − μi) − (d/2) ln(2π) − (1/2) ln(|Σi|) + ln(P(ωi))
where x denotes a sample, ωi denotes class i, P (ωi) is the prior probability,
μi is the mean vector, Σi is the covariance matrix, sin is the sinusoidal function and ln is the natural logarithm function.
Syllabus: Bayesian Decision Theory and Linear Discriminant Functions
i. Yes, adding a constant to discriminant functions does not change the nature of the classifier. 2 marks
ii. Yes, multiplying discriminant functions by a positive constant does not change the nature of the classifier. 2 marks
iii. No, applying a function that is not monotonically increasing to discriminant functions will change the nature of the classifier; sin is not a monotonically increasing function. 2 marks
iv. Yes, ln(gi(x)) = −(1/2)(x − μi)^T Σi^(−1) (x − μi) − (d/2) ln(2π) − (1/2) ln(|Σi|) + ln(P(ωi)). Applying a monotonically increasing function to discriminant functions does not change the nature of the classifier, and ln is a monotonically increasing function. 2 marks
Marking scheme
1 mark for yes/no answer and 1 mark for the explanation for each function.
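For illustration, a minimal Python sketch confirming numerically which transformations preserve the argmax decision; the two Gaussian classes and the test point below are arbitrary assumptions, not part of the question.

import numpy as np

# Two assumed Gaussian classes in d = 2 dimensions.
d = 2
priors = [0.5, 0.5]
means = [np.zeros(d), np.array([2.0, 2.0])]
covs = [np.eye(d), 2.0 * np.eye(d)]

def g(x, i):
    # Minimum-error-rate discriminant g_i(x) for Gaussian class i.
    diff = x - means[i]
    norm = (2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(covs[i]))
    quad = diff @ np.linalg.inv(covs[i]) @ diff
    return priors[i] / norm * np.exp(-0.5 * quad)

x = np.array([0.5, 1.0])
scores = np.array([g(x, i) for i in range(2)])

assert np.argmax(scores) == np.argmax(scores + 5)      # i.  gi(x) + 5
assert np.argmax(scores) == np.argmax(25 * scores)     # ii. 25 x gi(x)
assert np.argmax(scores) == np.argmax(np.log(scores))  # iv. ln(gi(x))
# iii. sin(gi(x)): no such guarantee, since sin is not monotonically increasing.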
2. a. Draw a block diagram of a multiclass discriminant function classifier and explain how it works for classification.
[10 marks]
Answer
Syllabus: Linear Discriminant Functions
[Block diagram: the input feature vector x feeds the discriminant functions g1(x), g2(x), ..., gc(x), whose outputs feed a max selector that produces the output class.] 6 marks
The multiclass discriminant function classifier evaluates the feature vector x by the discriminant functions g1(x), g2(x), ..., gc(x). The max operator picks the maximum output and assigns the feature vector to class ωj if gj(x) > gi(x) ∀i ≠ j. 4 marks
Marking scheme
1 mark for each block; 1 mark for “x”; 1 mark for “output”.
b. Describe the Sequential Multiclass Perceptron Learning Algorithm using pseudocode.
Algorithm 1: Sequential Multiclass Perceptron Learning Algorithm
1  Initialisation: define c classes; initialise a1 to ac to arbitrary solutions and set the learning rate η
2  while not converged do
3      foreach exemplar (y, ω) do
4          ω′ = argmax_h a_h^T y
5          if ω ≠ ω′ then
6              aω ← aω + ηy
7              aω′ ← aω′ − ηy
Marking scheme
1 mark for each line from lines 1 to 7.
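For illustration, a runnable Python sketch of Algorithm 1; the toy data, labels, stopping rule and learning rate below are illustrative assumptions, not from the question.

import numpy as np

def sequential_multiclass_perceptron(X, labels, c, eta=1.0, max_epochs=100):
    # X: (n, d) samples; labels: class indices 0..c-1.
    # Uses augmented exemplars y = [1, x] and one weight vector a_h per class.
    Y = np.hstack([np.ones((X.shape[0], 1)), X])  # augmented exemplars
    a = np.zeros((c, Y.shape[1]))                 # arbitrary initial solution
    for _ in range(max_epochs):
        converged = True
        for y, omega in zip(Y, labels):
            omega_hat = np.argmax(a @ y)          # omega' = argmax_h a_h^T y
            if omega_hat != omega:                # misclassified: update
                a[omega] += eta * y
                a[omega_hat] -= eta * y
                converged = False
        if converged:
            break
    return a

# Illustrative usage on a toy linearly separable 3-class problem.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 0.0], [5.1, 0.3], [0.0, 5.0], [0.3, 5.2]])
labels = np.array([0, 0, 1, 1, 2, 2])
a = sequential_multiclass_perceptron(X, labels, c=3)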
c. Consider the training dataset shown in Table 1.
Class  Feature vector
1      [−2, 6]
1      [−1, −4]
1      [3, −1]
2      [−3, −2]
2      [−4, −5]
Table 1: Training dataset.
Determine the class of a new feature vector x = [−2, 0] with the k-nearest-neighbour classifier using Euclidean distance and the following values of k:
i. k = 1
ii. k = 3
iii. k = 5
Syllabus: Bayesian Decision Theory and Density Estimation
Feature vector  New feature vector  Euclidean distance
[−2, 6]         [−2, 0]             √((−2−(−2))² + (6−0)²) = 6
[−1, −4]        [−2, 0]             √((−1−(−2))² + (−4−0)²) = 4.1231
[3, −1]         [−2, 0]             √((3−(−2))² + (−1−0)²) = 5.0990
[−3, −2]        [−2, 0]             √((−3−(−2))² + (−2−0)²) = 2.2361
[−4, −5]        [−2, 0]             √((−4−(−2))² + (−5−0)²) = 5.3852
When k = 1, the nearest point to the new feature vector [−2, 0] is point 4 ([−3, −2]), which is of class 2. So, it is classified as class 2. 1 mark
When k = 3, the three nearest points to the new feature vector [−2, 0] are points 2, 3 and 4 ([−1, −4], [3, −1] and [−3, −2]). As 2 out of 3 points are of class 1, the new feature vector is classified as class 1. 1 mark
When k = 5, all 5 points are covered. As 3 out of 5 points are of class 1, the new feature vector is classified as class 1. 1 mark
Marking scheme
1 mark for each correct Euclidean distance in the table; 1 mark for each correct classification.
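For illustration, a short Python sketch reproducing this k-nearest-neighbour computation (class labels follow Table 1 as reconstructed above).

import numpy as np
from collections import Counter

# Training data from Table 1: three class-1 points then two class-2 points.
X = np.array([[-2.0, 6.0], [-1.0, -4.0], [3.0, -1.0], [-3.0, -2.0], [-4.0, -5.0]])
y = np.array([1, 1, 1, 2, 2])
x_new = np.array([-2.0, 0.0])

dists = np.linalg.norm(X - x_new, axis=1)  # [6.0, 4.1231, 5.0990, 2.2361, 5.3852]
order = np.argsort(dists)                  # nearest first: points 4, 2, 3, 5, 1

for k in (1, 3, 5):
    label = Counter(y[order[:k]]).most_common(1)[0][0]  # majority vote
    print(f"k={k}: class {label}")  # k=1 -> 2, k=3 -> 1, k=5 -> 1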
3. a. Name three feature extraction techniques.
Syllabus: Feature Extraction
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Independent Component Analysis (ICA)
Marking scheme
1 mark for each correct answer.
b. Consider the dataset in Table 2.
Class  Feature vector xT
1      [1, 2]
1      [2, 1]
1      [3, 3]
2      [6, 5]
2      [7, 8]
Table 2: Training dataset.
i. Using Linear Discriminant Analysis (LDA), given a projection vector
wT = [1, 2], find the projection of the feature vectors. Is the dataset
in the projection space linearly separable? Explain your answer.
Syllabus: Feature Extraction
Feature vector xT  Projection wT x
[1, 2]             [1, 2] × [1, 2]T = 5
[2, 1]             [2, 1] × [1, 2]T = 4
[3, 3]             [3, 3] × [1, 2]T = 9
[6, 5]             [6, 5] × [1, 2]T = 16
[7, 8]             [7, 8] × [1, 2]T = 23
5 marks
It can be seen that the projected dataset is clearly separable into two clusters in the projection space, i.e., 5, 4 and 9 for class 1; 16 and 23 for class 2. So the dataset in the projection space is linearly separable. 3 marks
Marking scheme
1 mark for each correct answer in the table.
ii. Using Fisher’s method, determine which of the following projection weights is more effective in the context of Linear Discriminant Analysis (LDA). Explain your answer.
• wT = [−1, 5]
• wT = [2, −6]
[14 marks]
Syllabus: Feature Extraction
Sample mean for class 1:
m1T = (1/3)([1, 2] + [2, 1] + [3, 3]) = [2, 2].
Sample mean for class 2:
m2T = (1/2)([6, 5] + [7, 8]) = [6.5, 6.5].

For wT = [−1, 5]:
Between-class scatter: sb = |wT (m1 − m2)|² = |[−1, 5] × ([2, 2] − [6.5, 6.5])T|² = 324. 2 marks
Within-class scatter: sw = Σx∈ω1 (wT (x − m1))² + Σx∈ω2 (wT (x − m2))² = ([−1, 5] × ([1, 2] − [2, 2])T)² + ([−1, 5] × ([2, 1] − [2, 2])T)² + ([−1, 5] × ([3, 3] − [2, 2])T)² + ([−1, 5] × ([6, 5] − [6.5, 6.5])T)² + ([−1, 5] × ([7, 8] − [6.5, 6.5])T)² = 140. 2 marks
Cost: J(w) = sb/sw = 324/140 = 2.3143. 1 mark
For wT = [2, −6]:
Between-class scatter: sb = |wT (m1 − m2)|² = |[2, −6] × ([2, 2] − [6.5, 6.5])T|² = 324. 2 marks
Within-class scatter: sw = Σx∈ω1 (wT (x − m1))² + Σx∈ω2 (wT (x − m2))² = ([2, −6] × ([1, 2] − [2, 2])T)² + ([2, −6] × ([2, 1] − [2, 2])T)² + ([2, −6] × ([3, 3] − [2, 2])T)² + ([2, −6] × ([6, 5] − [6.5, 6.5])T)² + ([2, −6] × ([7, 8] − [6.5, 6.5])T)² = 184. 2 marks
Cost: J(w) = sb/sw = 324/184 = 1.7609. 1 mark
As J(w) for wT = [−1, 5] is higher, it is the more effective projection weight.
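For illustration, a short Python sketch of this Fisher-criterion comparison using the data of Table 2.

import numpy as np

# Data from Table 2.
X1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])  # class 1
X2 = np.array([[6.0, 5.0], [7.0, 8.0]])              # class 2

def fisher_J(w):
    # Fisher criterion J(w) = s_b / s_w for a candidate projection w.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)        # [2, 2] and [6.5, 6.5]
    sb = (w @ (m1 - m2)) ** 2                        # between-class scatter
    sw = np.sum(((X1 - m1) @ w) ** 2) + np.sum(((X2 - m2) @ w) ** 2)
    return sb / sw

print(fisher_J(np.array([-1.0, 5.0])))  # 324/140 = 2.3143
print(fisher_J(np.array([2.0, -6.0])))  # 324/184 = 1.7609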
4. A diagram of a 3-layer partially connected neural network is shown in Figure 1.
[Figure: input layer (x1, x2) → hidden layer → output layer producing z1, with the connection weights recovered in part a.]
Figure 1: A diagram of a 3-layer partially connected neural network.
a. Given that the output of the neural network in Figure 1 can be expressed as

z1 = f(Wkj f(Wji x + Wj0) + Wk0),

where f(·) is any activation function, determine the matrices Wji, Wkj, Wj0 and Wk0.
Syllabus: Multilayer Neural Networks
Wji = [1 0; 0.5 −3], Wj0 = [0; −2]. 2 marks
Wkj = [6 7], Wk0 = [−8]. 2 marks
b. A classifier is designed as y = sgn(z1), where sgn(z1) = 1 if z1 ≥ 0 and −1 if z1 < 0, y = 1 represents class 1 and y = −1 represents class 2. Considering a sample x = [2, −6]T and the activation function f(·) as a logarithmic sigmoid function, i.e., f(n) = 1/(1 + e^(−n)), determine which class the sample belongs to.
Answer
Syllabus: Multilayer Neural Networks
Sample: x = [2, −6]T.

y = logsig(Wji x + Wj0) = logsig([1 0; 0.5 −3] [2; −6] + [0; −2]) = [0.8808; 1.0000],

where logsig(n) = 1/(1 + e^(−n)).

z1 = logsig(Wkj y + Wk0) = logsig([6 7] [0.8808; 1.0000] + [−8]) = 0.9864. 2 marks

Classifier output: y = sgn(z1) = sgn(0.9864) = 1, which indicates that the sample x = [2, −6]T belongs to class 1. 1 mark
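For illustration, a Python sketch of this forward pass using the matrices found in part a.

import numpy as np

def logsig(n):
    # Logarithmic sigmoid activation f(n) = 1 / (1 + exp(-n)).
    return 1.0 / (1.0 + np.exp(-n))

# Weights determined in part a.
W_ji = np.array([[1.0, 0.0], [0.5, -3.0]])
W_j0 = np.array([0.0, -2.0])
W_kj = np.array([[6.0, 7.0]])
W_k0 = np.array([-8.0])

x = np.array([2.0, -6.0])        # sample from part b
y = logsig(W_ji @ x + W_j0)      # hidden layer: [0.8808, 1.0000]
z1 = logsig(W_kj @ y + W_k0)[0]  # output: 0.9864
print(1 if z1 >= 0 else -1)      # sgn(z1) = 1, i.e. class 1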
c. Consider the activation function in all the input, hidden and output units as a linear function. The neural network is trained with the stochastic backpropagation algorithm using the cost J = (1/2)∥z1 − t∥², where ∥·∥ denotes the l2 (or Euclidean) norm operator. Given the only training sample x = [1, 5]T and its target output t = 0.8, determine the updated value of w20 at the next iteration using the learning rate η = 0.01.
[12 marks]
Syllabus: Multilayer Neural Networks
The update of w20 is:
∆w20 = −η ∂J/∂w20,
where, by the chain rule,
∂J/∂w20 = (∂J/∂f(net)) (∂f(net)/∂net) (∂net/∂y2) (∂y2/∂a) (∂a/∂w20).

Since f(·) is a linear function,
J = (1/2)∥z1 − t∥²,
z1 = f(net) = net = m11 y1 + m12 y2 + m10 = 6y1 + 7y2 − 8,
y1 = w11 x1 = x1,
y2 = f(a) = a = w21 x1 + w22 x2 + w20 = 0.5x1 − 3x2 − 2.

So,
∂J/∂f(net) = (1/2) ∂∥z1 − t∥²/∂f(net) = z1 − t, 1 mark
∂f(net)/∂net = 1, 1 mark
∂net/∂y2 = m12, 1 mark
∂y2/∂a = 1 and ∂a/∂w20 = 1. 1 mark
Thus, we have ∆w20 = −η(z1 − t)m12.
Considering x = [1, 5]T, we have y1 = x1 = 1 and y2 = 0.5x1 − 3x2 − 2 = 0.5×1 − 3×5 − 2 = −16.5, and z1 = 6×1 + 7×(−16.5) − 8 = −117.5. 1 mark
Considering t = 0.8, we have ∆w20 = −η(z1 − t)m12 = −0.01 × (−117.5 − 0.8) × 7 = 8.2810. 1 mark
Applying the update rule: w20 ← w20 + ∆w20 = −2 + 8.2810 = 6.2810.
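As a quick numerical check, a Python sketch of this update with linear activations (variable names follow the solution).

# Linear-activation forward pass and the w20 gradient step, as in the solution.
eta, t = 0.01, 0.8
m11, m12, m10 = 6.0, 7.0, -8.0   # output-layer weights
w21, w22, w20 = 0.5, -3.0, -2.0  # hidden-unit weights feeding y2
x1, x2 = 1.0, 5.0                # the only training sample

y1 = x1                          # y1 = w11 * x1 with w11 = 1
y2 = w21 * x1 + w22 * x2 + w20   # -16.5
z1 = m11 * y1 + m12 * y2 + m10   # -117.5

delta_w20 = -eta * (z1 - t) * m12  # 8.281
w20 = w20 + delta_w20              # -2 + 8.281 = 6.281
print(w20)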
5. a. What are Linguistic Hedges in the context of fuzzy sets?
Syllabus: Fuzzy Inference System
Linguistic Hedges, like adverbs or adjectives, modify a fuzzy set. The definitions of the hedge functions are arbitrary; to name a few: very, somewhat, more or less, slightly. 4 marks
b. Name three categories of Linguistic Hedges in the context of fuzzy sets and briefly describe them.
Syllabus: Fuzzy Inference System
Three categories of hedges are concentration, dilation and intensification, which change the shape of membership functions. 3 marks
• Concentration: concentrates a fuzzy set by reducing the membership degree of all elements. 2 marks
• Dilation: stretches or dilates a fuzzy set by increasing the membership degree of all elements. (Sometimes referred to as dilution.) 2 marks
• Intensification: a combination of concentration and dilation. It increases the membership degree of those elements in the fuzzy set with original membership values greater than 0.5 and decreases the membership degree of those with values less than 0.5.
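For illustration, a minimal Python sketch of the three hedge categories; the exponents 2 and 0.5 and the piecewise intensification rule below are common conventions assumed here, not given in the solution.

import numpy as np

def concentrate(mu):
    # Concentration (e.g. "very"): reduces all membership degrees.
    return mu ** 2

def dilate(mu):
    # Dilation (e.g. "more or less"): increases all membership degrees.
    return mu ** 0.5

def intensify(mu):
    # Intensification: raise degrees above 0.5, lower degrees below 0.5.
    return np.where(mu <= 0.5, 2 * mu ** 2, 1 - 2 * (1 - mu) ** 2)

mu = np.array([0.1, 0.4, 0.6, 0.9])
print(concentrate(mu), dilate(mu), intensify(mu))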
c. Consider a 2-input single-output Sugeno fuzzy inference system with the following 4 rules:
Rule 1: IF x is Small and y is Small THEN z is −x + y + 1
Rule 2: IF x is Small and y is Large THEN z is −y + 3
Rule 3: IF x is Large and y is Small THEN z is −x + 3
Rule 4: IF x is Large and y is Large THEN z is x + y + 2
where the membership functions are defined as
μxSmall(x) = 1/(1 + e^x), μxLarge(x) = 1/(1 + e^(−x)),
μySmall(y) = 1/(1 + e^(y−5)), μyLarge(y) = 1/(1 + e^(−y+5)).
Given that the fuzzy “AND” operation is the “product” operation, determine the output of the Sugeno fuzzy inference system for x = 1 and y = 8.
[12 marks]
Syllabus: Fuzzy Inference System
For x = 1 and y = 8,
μxSmall(1) = 1/(1 + e^1) = 0.2689, 1 mark
μxLarge(1) = 1/(1 + e^(−1)) = 0.7311, 1 mark
μySmall(8) = 1/(1 + e^3) = 0.0474, 1 mark
μyLarge(8) = 1/(1 + e^(−3)) = 0.9526. 1 mark

Performing the fuzzy AND operation using the "product" operation, we have
w1 = μxSmall(1) × μySmall(8) = 0.0128,
w2 = μxSmall(1) × μyLarge(8) = 0.2562,
w3 = μxLarge(1) × μySmall(8) = 0.0347,
w4 = μxLarge(1) × μyLarge(8) = 0.6964.

Individual output of each rule:
z1 = −x + y + 1 = −1 + 8 + 1 = 8,
z2 = −y + 3 = −8 + 3 = −5,
z3 = −x + 3 = −1 + 3 = 2,
z4 = x + y + 2 = 1 + 8 + 2 = 11.

Performing defuzzification, the inferred output is:
z = (w1z1 + w2z2 + w3z3 + w4z4) / (w1 + w2 + w3 + w4)
  = (0.0128×8 + 0.2562×(−5) + 0.0347×2 + 0.6964×11) / (0.0128 + 0.2562 + 0.0347 + 0.6964)
  = 6.5507.
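For illustration, a Python sketch reproducing this Sugeno inference.

import numpy as np

def logsig(n):
    # 1 / (1 + e^(-n))
    return 1.0 / (1.0 + np.exp(-n))

x, y = 1.0, 8.0

# Membership degrees, using the functions defined in the question.
mu_xs, mu_xl = logsig(-x), logsig(x)            # x Small: 0.2689, x Large: 0.7311
mu_ys, mu_yl = logsig(-(y - 5)), logsig(y - 5)  # y Small: 0.0474, y Large: 0.9526

# Rule firing strengths via the product AND, in rule order 1..4.
w = np.array([mu_xs * mu_ys, mu_xs * mu_yl, mu_xl * mu_ys, mu_xl * mu_yl])
# Sugeno rule consequents z1..z4.
z = np.array([-x + y + 1, -y + 3, -x + 3, x + y + 2])

print(np.sum(w * z) / np.sum(w))  # weighted average, approx. 6.55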
6. A support vector machine (SVM) classifier is employed to classify the following samples:
Class 1: x1 = [1, 2]T.
Class 2: x2 = [7, 8]T, x3 = [10, 15]T.
a. Identify the support vectors by inspection.
Answer
Syllabus: Support Vector Machines
[Sketch: the three samples plotted in the two-dimensional feature space, with a separating hyperplane placed between x1 and x2.]
The samples are linearly separable as a linear hyperplane can be placed in between x1 and x2 to separate the samples into two classes. 3 marks
So, x1 = [1, 2]T and x2 = [7, 8]T are the support vectors. 4 marks
b. Design an SVM classifier to correctly classify all given samples.
[15 marks]
Syllabus: Support Vector Machines
Define the label for Class 1 as +1 and Class 2 as −1, so we have y1 = +1 for x1 and y2 = y3 = −1 for x2 and x3. 3 marks
The hyperplane is wT x + w0 = 0. As x1 and x2 are support vectors,
we have w = λ1 y1 x1 + λ2 y2 x2 = λ1 [1; 2] − λ2 [7; 8].
Recall that yi(wT x + w0) = 1 when x is a support vector.
For x = x1 (y1 = +1): (λ1 [1; 2] − λ2 [7; 8])T [1; 2] + w0 = 1 ⇒ 5λ1 − 23λ2 + w0 = 1.
For x = x2 (y2 = −1): (λ1 [1; 2] − λ2 [7; 8])T [7; 8] + w0 = −1 ⇒ 23λ1 − 113λ2 + w0 = −1.
As Σ(i=1 to 3) λi yi = 0, we have λ1 − λ2 = 0 ⇒ λ1 = λ2. So we have 5λ1 − 23λ2 + w0 = −18λ1 + w0 = 1 and 23λ1 − 113λ2 + w0 = −90λ1 + w0 = −1.
It follows that [−18 1; −90 1] [λ1; w0] = [1; −1]. 2 marks
It gives λ1 = λ2 = 0.0278 and w0 = 1.5. As a result, w = 0.0278 [1; 2] − 0.0278 [7; 8] = [−0.1668; −0.1668]. 2 marks
The hyperplane is
wT x + w0 = −0.1668x1 − 0.1668x2 + 1.5 = 0.
c. What is the margin given by the hyperplane obtained in Question 6.b?
Syllabus: Support Vector Machines
The hyperplane is −0.1668x1 − 0.1668x2 + 1.5 = 0, which gives the margin
2/∥w∥ = 2/√((−0.1668)² + (−0.1668)²) = 8.4785. 3 marks
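For illustration, a Python check of parts b and c, solving the same 2×2 system as the solution.

import numpy as np

x1, x2 = np.array([1.0, 2.0]), np.array([7.0, 8.0])  # the support vectors

# With lambda1 = lambda2 (from sum_i lambda_i y_i = 0), the support-vector
# conditions reduce to the 2x2 system [-18 1; -90 1][lambda1; w0] = [1; -1].
A = np.array([[-18.0, 1.0], [-90.0, 1.0]])
lam, w0 = np.linalg.solve(A, np.array([1.0, -1.0]))

w = lam * x1 - lam * x2           # w = lambda1 * (x1 - x2) = [-1/6, -1/6]
margin = 2.0 / np.linalg.norm(w)

print(lam, w0, w, margin)
# lambda1 = 1/36 = 0.0278, w0 = 1.5, w = [-0.1667, -0.1667], margin = 8.4853
# (the solution's 8.4785 reflects rounding w to -0.1668 before computing)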