—— SOLUTIONS ——

1. a. Give a brief definition of each of the following terms:
i. Minimum Error Rate classifier
ii. Minimum Loss classifier
i) Minimum Error Rate classifier: a classifier that minimises the number of misclassified exemplars
ii) Minimum Loss classifier: a classifier that minimises the costs associated with misclassifying exemplars
Marking scheme
2 marks for each correct definition.
Tests knowledge of basic terminology used in pattern recognition.
b. Write down the equation for the Likelihood Ratio Test for minimum error, and explain how this equation is used to determine the class of an exemplar.
Likelihood Ratio Test:
p(x|ω1) / p(x|ω2) > P(ω2) / P(ω1)
Exemplar, x, is predicted to be in class ω1 if the above expression is true; otherwise it is predicted to be in class ω2.
Marking scheme
Tests knowledge of Bayesian decision theory.
Table 1, below, shows samples that have been recorded from two univariate class-conditional probability distributions:
ω1 samples of x: 0.82 1.42 1.53 0.21 0.72 1.73 0.49 1.03 0.69 2.12
ω2 samples of x: 2.54 1.36 3.77 2.86 0.92 6.01
c. Apply kn nearest neighbours to the data shown in Table 1 in order to estimate the density of p(x|ω1) at location x = 1.5, using k = 3.
The closest three samples to x = 1.5 are at: 1.53, 1.42, and 1.73.
We need to make a “volume” of radius |1.5 − 1.73| = 0.23 (diameter 0.46) around x = 1.5 to encompass these three samples.
Therefore density is:
p(x = 1.5|ω1) = (k/n)/V = (3/10)/0.46 = 0.652
Marking scheme
2 marks for correct method, 2 marks for correct application. Tests ability to perform density estimation.
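As a cross-check on the calculation above (not part of the model answer), a minimal Python sketch of the kn-nearest-neighbour estimate; the function name and list layout are illustrative:

def knn_density(samples, x, k):
    # density estimate p(x) ≈ (k/n) / V, where V is the smallest
    # interval centred on x that encloses the k nearest samples
    n = len(samples)
    distances = sorted(abs(s - x) for s in samples)
    radius = distances[k - 1]      # distance to the k-th nearest sample
    volume = 2 * radius            # 1-D "volume" = diameter of the interval
    return (k / n) / volume

class1 = [0.82, 1.42, 1.53, 0.21, 0.72, 1.73, 0.49, 1.03, 0.69, 2.12]
print(knn_density(class1, 1.5, 3))   # 0.652 (to 3 d.p.)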
d. The class-conditional probability distribution for class 2 in Table 1, p(x|ω2), is believed to be a Normal distribution. Calculate the parameters of this Normal distribution using Maximum-Likelihood Estimation; use the “unbiased” estimate of the covariance.
The Maximum-Likelihood estimate of the mean is the arithmetic mean of the sample values. Therefore,
μ̂ = (2.54 + 1.36 + 3.77 + 2.86 + 0.92 + 6.01)/6 = 2.91
The Maximum-Likelihood “unbiased” estimate of the covariance for univariate data is:
σ̂² = (1/(n − 1)) Σk (xk − μ̂)²
Therefore,
σ̂² = (1/5)[(2.54 − 2.91)² + (1.36 − 2.91)² + (3.77 − 2.91)² + (2.86 − 2.91)² + (0.92 − 2.91)² + (6.01 − 2.91)²] = 3.37
Marking scheme
2 marks for correct methods, 2 marks for correct application. Tests ability to perform density estimation.
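A corresponding Python sketch for the Maximum-Likelihood estimates (again a cross-check, with illustrative variable names):

class2 = [2.54, 1.36, 3.77, 2.86, 0.92, 6.01]
n = len(class2)
mean = sum(class2) / n                                # ML estimate of the mean: 2.91
var = sum((x - mean) ** 2 for x in class2) / (n - 1)  # unbiased estimate of the variance: 3.37
print(mean, var)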

e. Using the number of samples given in Table 1 estimate the prior probabilities of each class, i.e., calculate P(ω1) and P(ω2).
P(ω1) = 10/16 = 0.625
P(ω2) = 6/16 = 0.375
Marking scheme
1 mark for each correct answer.
Tests ability to apply Bayesian decision theory.
f. Using the Likelihood Ratio Test and the results from question 1.c, question 1.d, and question 1.e determine the class in which a new sample with value x = 1.5 should be placed.
From part 1.c, p(x = 1.5|ω1) = 0.652
From part 1.d, p(x|ω2) is given by N(2.91, 3.37), so:
p(x = 1.5|ω2) = (1/√(2π × 3.37)) exp(−(1.5 − 2.91)²/(2 × 3.37)) = 0.162
From part 1.e, P(ω2)/P(ω1) = 0.375/0.625 = 0.6
The Likelihood Ratio Test states, choose ω1 if: p(x|ω1)/p(x|ω2) > P(ω2)/P(ω1)
Hence, when x = 1.5, choose ω1 if: 0.652/0.162 = 4.02 > 0.6
This is true, so we should choose class 1.
Marking scheme
1 mark for correctly calculating p(x = 1.5|ω2), 2 marks for correct application of Likelihood Ratio Test.
Tests ability to apply Bayesian decision theory.
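The same decision can be checked with a short Python sketch (assuming the values computed in parts 1.c to 1.e; names are illustrative):

import math

def normal_pdf(x, mean, var):
    # univariate Normal density N(mean, var)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

p1 = 0.652                        # p(x = 1.5|ω1), from part 1.c
p2 = normal_pdf(1.5, 2.91, 3.37)  # p(x = 1.5|ω2) ≈ 0.162, from part 1.d
prior_ratio = 0.375 / 0.625       # P(ω2)/P(ω1), from part 1.e
print(p1 / p2, 'class 1' if p1 / p2 > prior_ratio else 'class 2')  # 4.02, class 1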

g. Using the Likelihood Ratio Test with losses and the results from question 1.c and question 1.d determine in what class a new sample with value x = 1.5 should be placed, when P(ω1) = 0.4P(ω2) and the loss, λij, associated with choosing class i when the actual class is j, is as follows: λ11 = 0, λ12 = 2, λ22 = 0, λ21 = 1.
The Likelihood Ratio Test with losses states, choose ω1 if:
p(x|ω1)/p(x|ω2) > ((λ12 − λ22)/(λ21 − λ11)) × P(ω2)/P(ω1)
Therefore, we should choose ω1 if: 4.02 > 2 × 2.5 = 5
This is false, so we should choose class 2.
Marking scheme
2 marks for knowing the equation, 2 marks for correct application. Tests ability to apply Bayesian decision theory.
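The loss-adjusted threshold can be checked the same way (a sketch under the same assumptions as above):

l11, l12, l21, l22 = 0, 2, 1, 0     # losses λij from the question
prior_ratio = 1 / 0.4               # P(ω2)/P(ω1), since P(ω1) = 0.4 P(ω2)
threshold = ((l12 - l22) / (l21 - l11)) * prior_ratio
print(threshold, 4.02 > threshold)  # 5.0, False -> choose class 2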

2. a. In augmented feature space, a dichotomizer is defined using the following linear discriminant function g(x) = aTy, where aT = (−2, 1, 2, 1, 2) and y = (1, xT)T. Determine the class of the following feature vectors: x1 = (1, 1, 1, 1)T and x2 = (0, −1, 0, 1)T.
To classify a point x we calculate g(x); x is in class 1 if g(x) > 0, and in class 2 otherwise.
g(x1) = (−2, 1, 2, 1, 2)(1, 1, 1, 1, 1)T = 4 > 0. Therefore x1 is in class 1.
g(x2) = (−2, 1, 2, 1, 2)(1, 0, −1, 0, 1)T = −2 < 0. Therefore x2 is in class 2.
Marking scheme
4 marks for correct method, 1 mark each for correct application. Tests ability to apply linear discriminant functions.
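A minimal numpy sketch of this classification step (a cross-check, not part of the model answer):

import numpy as np

a = np.array([-2, 1, 2, 1, 2])                   # augmented weight vector
for x in ([1, 1, 1, 1], [0, -1, 0, 1]):
    y = np.concatenate(([1], x))                 # augmented feature vector y = (1, xT)T
    g = a @ y                                    # discriminant g(x) = aT y
    print(g, 'class 1' if g > 0 else 'class 2')  # 4 -> class 1, -2 -> class 2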
b. One method of learning the parameters of a linear discriminant function is by using the perceptron learning algorithm. Write pseudo-code for the sequential perceptron learning algorithm.
Augment the feature vectors
set ωk = 1 for each sample in class 1, and ωk = −1 for each sample in class 2.
Initialise a
For each sample, yk, in the dataset:
Calculate g(x) = aT y
If yk is misclassified (i.e., if sign(g(x)) ≠ ωk):
Update solution such that: a ← a + ηωkyk
Repeat until a stops changing.
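Before the alternative version given below, a minimal Python sketch of this first variant (assuming a numpy environment; the dataset, initial weights and learning rate are placeholders supplied by the caller):

import numpy as np

def sequential_perceptron(X, labels, a, eta=1.0):
    # X: samples as rows; labels: +1/-1 class labels ωk; a: initial augmented weights
    Y = np.hstack([np.ones((len(X), 1)), X])   # augment the feature vectors
    changed = True
    while changed:                             # repeat until a stops changing
        changed = False
        for y_k, w_k in zip(Y, labels):
            if np.sign(a @ y_k) != w_k:        # y_k is misclassified
                a = a + eta * w_k * y_k        # a <- a + η ωk yk
                changed = True
    return a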
OR:
Augment and apply sample normalisation to the feature vectors.
Initialise a
For each sample, yk, in the dataset:
Calculate g(x) = aT y
If yk is misclassified (i.e., if g(x) < 0):
Update solution such that: a ← a + ηyk
Repeat until a stops changing.
Marking scheme
Tests ability to describe a learning algorithm for defining linear discriminant functions.

Table 2:
xT class
(5, 1) 1
(5, −1) 1
(7, 0) 1
(3, 0) −1
(2, 1) −1
(1, −1) −1

c. Determine if the dataset shown in Table 2 is linearly separable.
By plotting the data it can be seen that the dataset is linearly separable.
Marking scheme
Tests knowledge of basic terminology used in pattern recognition.

d. Apply the Sequential Perceptron Learning Algorithm to determine the parameters of a linear discriminant function that will correctly classify the data shown in Table 2. Assume an initial value of a = (w0, wT)T = (−25, 5, 2)T, and use a learning rate of 1.
For the Sequential Perceptron Learning Algorithm, weights are updated such that: a ← a + ηωyk, where yk is a misclassified exemplar, and ω is the class label associated with yk. Here, η = 1.

iter aT_old yT g(x) = aTy ω aT_new = aT_old + ωyT if misclassified
0 [−25, 5, 2] [1, 5, 1] 2 1 [−25, 5, 2]
1 [−25, 5, 2] [1, 5, −1] −2 1 [−24, 10, 1]
2 [−24, 10, 1] [1, 7, 0] 46 1 [−24, 10, 1]
3 [−24, 10, 1] [1, 3, 0] 6 −1 [−25, 7, 1]
4 [−25, 7, 1] [1, 2, 1] −10 −1 [−25, 7, 1]
5 [−25, 7, 1] [1, 1, −1] −19 −1 [−25, 7, 1]
6 [−25, 7, 1] [1, 5, 1] 11 1 [−25, 7, 1]
7 [−25, 7, 1] [1, 5, −1] 9 1 [−25, 7, 1]
8 [−25, 7, 1] [1, 7, 0] 24 1 [−25, 7, 1]
Learning has converged, so the required parameters are a = (−25, 7, 1)T.
Marking scheme
Tests ability to apply a learning algorithm for defining linear discriminant functions.

e. A support vector machine (SVM) is to be used to classify the data shown in Table 2.
i. Identify the support vectors by inspection.
By inspection, the two classes can be separated by constructing a hyperplane in the region 3 ≤ x1 ≤ 5. The support vectors are:
x1 = (5, 1)T, x2 = (5, −1)T, x4 = (3, 0)T.
Marking scheme
3 marks, 1 mark for each. Tests understanding of Support Vector Machines.

ii. Design a SVM classifier to classify all given points. [6 marks]
Define the label for Class 1 as +1 and Class 2 as −1, so we have y1 = y2 = y3 = 1 and y4 = y5 = y6 = −1.
The hyperplane is wTx + w0 = 0 where w = λ1y1x1 + λ2y2x2 + λ4y4x4 = λ1(5, 1)T + λ2(5, −1)T − λ4(3, 0)T. (1 mark)
yi(wTx + w0) = 1 when x is a support vector. Hence:
For x = x1: (λ1(5, 1)T + λ2(5, −1)T − λ4(3, 0)T)T(5, 1)T + w0 = 1 ⇒ 26λ1 + 24λ2 − 15λ4 + w0 = 1.
For x = x2: (λ1(5, 1)T + λ2(5, −1)T − λ4(3, 0)T)T(5, −1)T + w0 = 1 ⇒ 24λ1 + 26λ2 − 15λ4 + w0 = 1.
For x = x4: (λ1(5, 1)T + λ2(5, −1)T − λ4(3, 0)T)T(3, 0)T + w0 = −1 ⇒ 15λ1 + 15λ2 − 9λ4 + w0 = −1. (2 marks)
Combining these with the condition Σi λiyi = 0 ⇒ λ1 + λ2 − λ4 = 0, we have the following system of linear equations:
[26 24 −15 1] [λ1]   [ 1]
[24 26 −15 1] [λ2] = [ 1]
[15 15  −9 1] [λ4]   [−1]
[ 1  1  −1 0] [w0]   [ 0]
This gives λ1 = λ2 = 0.25, λ4 = 0.5 and w0 = −4.
As a result, w = 0.25(5, 1)T + 0.25(5, −1)T − 0.5(3, 0)T = (1, 0)T.
The hyperplane is wTx + w0 = x1 − 4 = 0. (2 marks)
Marking scheme
Tests understanding of Support Vector Machines.
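The linear system in question 2.e.ii can be verified with a short numpy sketch (a check on the hand calculation, not part of the model answer):

import numpy as np

# rows: margin conditions at the three support vectors, then Σ λi yi = 0
A = np.array([[26, 24, -15, 1],
              [24, 26, -15, 1],
              [15, 15,  -9, 1],
              [ 1,  1,  -1, 0]], dtype=float)
b = np.array([1, 1, -1, 0], dtype=float)
lam1, lam2, lam4, w0 = np.linalg.solve(A, b)   # 0.25, 0.25, 0.5, -4.0

w = lam1 * np.array([5, 1]) + lam2 * np.array([5, -1]) - lam4 * np.array([3, 0])
print(w, w0)                                   # [1. 0.] -4.0, i.e. x1 - 4 = 0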
3. a. Give brief definitions of the following terms:
i. an artificial neural network
ii. action potential
iii. firing rate
i) an artificial neural network: a parallel architecture composed of many simple processing elements interconnected to achieve certain collective computational capabilities
ii) action potential: the signal outputted by a biological neuron
iii) firing rate: the number of action potentials emitted during a defined time-period.
Marking scheme
2 marks for each correct definition. Tests knowledge of basic terminology used in neural networks.

Table 3, below, shows a linearly separable dataset.
xT class
(0, 2) 1
(2, 1) 1
(−3, 1) 0
(−2, −1) 0
(0, −1) 0

b. Is the dataset shown in Table 3:
i. Univariate or multivariate?
ii. Continuous or discrete?
i) multivariate
ii) discrete
Marking scheme
1 mark for each correct answer. Tests knowledge of basic terminology used in pattern recognition.

c. Re-write the dataset shown in Table 3 using augmented vector notation.
Using augmented notation the dataset is:
xT t
(1, 0, 2) 1
(1, 2, 1) 1
(1, −3, 1) 0
(1, −2, −1) 0
(1, 0, −1) 0
Marking scheme
Tests ability to apply a method used in neural networks.

d. A linear threshold neuron has a transfer function which is a linear weighted sum of its inputs and an activation function that applies a threshold (θ) and the Heaviside function. For a linear threshold neuron with parameter values θ = −2, w1 = 0.5 and w2 = 1, calculate the output of the neuron in response to each of the samples in the dataset given in Table 3.
Using augmented notation, the output is y = H(wx) where w = [−θ, w1, w2] and x = [1, x1, x2]T. Here, w = [2, 0.5, 1]. Hence, the response to each sample in the dataset is:
xT wx y = H(wx)
(1, 0, 2) 4 1
(1, 2, 1) 4 1
(1, −3, 1) 1.5 1
(1, −2, −1) 0 1 (or 0.5)
(1, 0, −1) 1 1
Marking scheme
2 marks for correct method, 2 marks for correct application. Tests ability to apply a method used in neural networks.

e. Plot the data given in Table 3 and on the same axes plot the decision boundary defined by the linear threshold neuron with parameter values θ = −2, w1 = 0.5 and w2 = 1.
[Plot omitted: the five samples from Table 3 together with the decision boundary 0.5x1 + x2 + 2 = 0.]
Marking scheme
Tests understanding of neural networks.

f. Apply the sequential delta learning algorithm to find the parameters of a linear threshold neuron that will correctly classify the data in Table 3. Assume initial values of θ = −2, w1 = 0.5 and w2 = 1, and a learning rate of 1.
Initial weight values are w = [2, 0.5, 1].
For the Sequential Delta Learning Algorithm, weights are updated such that: w ← w + η(t − y)xT. Here, η = 1.

iter w_old xT wx y t w_new = w + (t − y)xT
0 [2, 0.5, 1] [1, 0, 2] 4 1 1 [2, 0.5, 1]
1 [2, 0.5, 1] [1, 2, 1] 4 1 1 [2, 0.5, 1]
2 [2, 0.5, 1] [1, −3, 1] 1.5 1 0 [2, 0.5, 1] − [1, −3, 1] = [1, 3.5, 0]
3 [1, 3.5, 0] [1, −2, −1] −6 0 0 [1, 3.5, 0]
4 [1, 3.5, 0] [1, 0, −1] 1 1 0 [1, 3.5, 0] − [1, 0, −1] = [0, 3.5, 1]
5 [0, 3.5, 1] [1, 0, 2] 2 1 1 [0, 3.5, 1]
6 [0, 3.5, 1] [1, 2, 1] 8 1 1 [0, 3.5, 1]
7 [0, 3.5, 1] [1, −3, 1] −9.5 0 0 [0, 3.5, 1]
8 [0, 3.5, 1] [1, −2, −1] −8 0 0 [0, 3.5, 1]
9 [0, 3.5, 1] [1, 0, −1] −1 0 0 [0, 3.5, 1]
Learning has converged, so the required parameters are w = (0, 3.5, 1).
Marking scheme
2 marks for knowing update rule, 5 marks for correct method. Tests ability to apply a learning algorithm for defining a neural network.
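A minimal numpy sketch of the sequential delta rule applied to Table 3 (a cross-check on the trace above; H(0) is taken as 1, as in part (d)):

import numpy as np

def sequential_delta(X, t, w, eta=1.0):
    X = np.hstack([np.ones((len(X), 1)), X])   # augmented vectors [1, x1, x2]
    changed = True
    while changed:                             # repeat until w stops changing
        changed = False
        for x_k, t_k in zip(X, t):
            y = 1.0 if w @ x_k >= 0 else 0.0   # Heaviside activation, H(0) = 1
            if y != t_k:
                w = w + eta * (t_k - y) * x_k  # w <- w + η (t − y) xT
                changed = True
    return w

X = np.array([[0, 2], [2, 1], [-3, 1], [-2, -1], [0, -1]])
t = np.array([1, 1, 0, 0, 0])
print(sequential_delta(X, t, np.array([2.0, 0.5, 1.0])))   # [0.  3.5 1. ]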
4. a. A diagram of a 2-2-1 fully connected multilayer neural network is shown below, where “/” denotes a linear function and f(·) = sgn(·) denotes the sign function: sgn(a) = 1 if a ≥ 0, −1 if a < 0.
[Diagram omitted: input layer (x1, x2), hidden layer (y1, y2), output layer (z1), with weights w11 = w12 = 1, w10 = 0.5; w21 = w22 = 1, w20 = −1.5; m11 = 0.7, m12 = −0.4, m10 = −1.]
Show that this neural network is able to solve the XOR problem as defined in Table 4. [10 marks]

Table 4:
xT = [x1, x2]T class
[−1, −1] 1
[−1, +1] 2
[+1, −1] 2
[+1, +1] 1

The hidden and output units compute:
y1 = sgn(w11x1 + w12x2 + w10) = sgn(x1 + x2 + 0.5)
y2 = sgn(w21x1 + w22x2 + w20) = sgn(x1 + x2 − 1.5)
z1 = sgn(m11y1 + m12y2 + m10) = sgn(0.7y1 − 0.4y2 − 1)

xT y1 y2 z1
[−1, −1] sgn(−1−1+0.5) = −1 sgn(−1−1−1.5) = −1 sgn(0.7×−1 − 0.4×−1 − 1) = −1
[−1, +1] sgn(−1+1+0.5) = +1 sgn(−1+1−1.5) = −1 sgn(0.7×+1 − 0.4×−1 − 1) = +1
[+1, −1] sgn(+1−1+0.5) = +1 sgn(+1−1−1.5) = −1 sgn(0.7×+1 − 0.4×−1 − 1) = +1
[+1, +1] sgn(+1+1+0.5) = +1 sgn(+1+1−1.5) = +1 sgn(0.7×+1 − 0.4×+1 − 1) = −1
(2 marks)
It can be seen from the table that the output z1 is −1 for class 1 and the output z1 is +1 for class 2. The output −1/+1 clearly shows that the input samples of the XOR problem are classified correctly.
Marking scheme
Tests understanding of multilayer neural networks.

b. A radial basis function (RBF) neural network is to be used to correctly classify the XOR dataset shown in Table 4.
i. There are two hidden units that both employ a Gaussian activation function with a standard deviation of σj = ρmax/√(2nH). The centres of these hidden units are given by the first and the last input patterns (i.e., the two input patterns in class 1). Compute the outputs at the hidden units for all four input patterns and list them in a table.
c1 = [−1, −1], c2 = [1, 1].
σ1 = σ2 = ρmax/√(2nH) = √((−1 − 1)² + (−1 − 1)²)/√(2 × 2) = √2
The hidden unit outputs are y1(xp) = exp(−∥xp − c1∥²/(2σ²)) and y2(xp) = exp(−∥xp − c2∥²/(2σ²)).
p x1 x2 y1(x) y2(x)
1 −1 −1 1.0000 0.1353
2 −1 +1 0.3679 0.3679
3 +1 −1 0.3679 0.3679
4 +1 +1 0.1353 1.0000
Marking scheme
2 marks for correctly calculating σ, 3 marks for correct method of calculating y1 and y2, 2 marks for correct application. Tests ability to apply multilayer neural networks.
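A numpy sketch reproducing the hidden-unit table in question 4.b.i (a cross-check, not part of the model answer):

import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
c1, c2 = X[0], X[3]                    # centres = first and last input patterns

rho_max = np.linalg.norm(c1 - c2)      # distance between the centres = √8
sigma = rho_max / np.sqrt(2 * 2)       # σ = ρmax/√(2 nH) = √2

for x in X:
    y1 = np.exp(-np.linalg.norm(x - c1) ** 2 / (2 * sigma ** 2))
    y2 = np.exp(-np.linalg.norm(x - c2) ** 2 / (2 * sigma ** 2))
    print(round(y1, 4), round(y2, 4))
# rows: (1.0, 0.1353), (0.3679, 0.3679), (0.3679, 0.3679), (0.1353, 1.0)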
ii. Show that the dataset when represented by the hidden units is linearly separable.
By plotting y2(x) against y1(x) it can be seen that the feature vectors corresponding to the two classes can be separated by a straight line, and so are linearly separable.
Marking scheme
Tests knowledge of basic terminology used in pattern recognition.

iii. The RBF network has one output unit which employs a linear activation function, with a threshold. Write down the equation that would be used to determine the weights of this output neuron using the least squares method. Hence, using the answer you obtained for question 4.b.i, write an expression in matrix form that could be used to calculate the weights of the output neuron (you do not need to calculate the pseudo-inverse).
The output of the RBF neural network is: z1(x) = Σj w1j yj(x) = wy, where w = [−θ, w1, w2] and θ is the threshold.
We can calculate the output for all input patterns, in matrix notation, as Z = wY. If the weights are correct, Z = wY = T, so:
w = TY†
where Y† is the pseudo-inverse of Y. Therefore,
(w10, w11, w12) = (0, 1, 1, 0) Y†  OR  (w10, w11, w12) = (−1, 1, 1, −1) Y†
where
Y = [1      1      1      1
     1.0000 0.3679 0.3679 0.1353
     0.1353 0.3679 0.3679 1.0000]
Marking scheme
3 marks for correct equation, 2 marks for correct application. Tests ability to apply multilayer neural networks.

5. a. Give a brief definition of each of the following terms.
i. Feature Selection
ii. Feature Extraction
iii. Dimensionality Reduction
i) Feature Selection: choosing a subset of measured features to use for classification.
ii) Feature Extraction: projecting the original feature vectors into a new feature space.
iii) Dimensionality Reduction: selecting/extracting feature vectors of lower dimensionality (length) than the original feature vectors.
Marking scheme
2 marks for each correct definition. Tests knowledge of basic terminology used in pattern recognition.

b. Write pseudo-code for applying Oja’s learning rule to find the first principal component of a multi-dimensional data set.
Initialise w and subtract the mean from all data vectors.
For each zero-mean sample, xk, in the dataset:
Calculate y = wxk
Update w such that: w ← w + ηy(xkT − yw)
Repeat until w converges or a maximum number of iterations is reached.
Marking scheme
Tests ability to describe an algorithm for performing feature extraction.

c. Consider the following 2-dimensional dataset:
x1 = (1, 2)T, x2 = (3, 5)T, x3 = (5, 4)T, x4 = (8, 7)T, x5 = (11, 7)T.
i. Calculate the equivalent zero-mean dataset.
Subtract the mean from all data vectors. The mean of the data is μ = (5.6, 5)T; hence, the zero-mean data are:
x′1 = (−4.6, −3)T, x′2 = (−2.6, 0)T, x′3 = (−0.6, −1)T, x′4 = (2.4, 2)T, x′5 = (5.4, 2)T.
Marking scheme
Tests ability to apply an algorithm for performing feature extraction.

ii. Apply two epochs of the batch version of Oja’s learning rule to the zero-mean data calculated in answer to question 5.c.i. Use a learning rate of 0.01 and an initial weight vector of [−1, 0].
For the batch version of Oja’s learning rule, weights are updated such that: w ← w + η Σp yp(xpT − ypw). Here, η = 0.01.

Epoch 1, initial w = [−1, 0]:
xT y = wx xT − yw ηy(xT − yw)
(−4.6, −3) 4.6 (0, −3) (0, −0.138)
(−2.6, 0) 2.6 (0, 0) (0, 0)
(−0.6, −1) 0.6 (0, −1) (0, −0.006)
(2.4, 2) −2.4 (0, 2) (0, −0.048)
(5.4, 2) −5.4 (0, 2) (0, −0.108)
total weight change = (0, −0.3)

Epoch 2, initial w = [−1, −0.3]:
xT y = wx xT − yw ηy(xT − yw)
(−4.6, −3) 5.5 (0.9, −1.35) (0.0495, −0.0743)
(−2.6, 0) 2.6 (0, 0.78) (0, 0.0203)
(−0.6, −1) 0.9 (0.3, −0.73) (0.0027, −0.0066)
(2.4, 2) −3 (−0.6, 1.1) (0.018, −0.033)
(5.4, 2) −6 (−0.6, 0.2) (0.036, −0.012)
total weight change = (0.1062, −0.1055)
Final w = (−0.8938, −0.4055)
Marking scheme
Tests ability to apply an algorithm for performing feature extraction.

iii. Using the answer obtained to question 5.c.ii, project the zero-mean data onto the first principal component.
Projection of the data onto the subspace spanned by the first principal component is given by yp = wx′p, with w = (−0.8938, −0.4055) from question 5.c.ii.
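Finally, a numpy sketch of the batch Oja updates in question 5.c.ii, which also performs the projection asked for in question 5.c.iii (a cross-check, not part of the model answer):

import numpy as np

X = np.array([[1, 2], [3, 5], [5, 4], [8, 7], [11, 7]], dtype=float)
Xz = X - X.mean(axis=0)                # zero-mean data (question 5.c.i)

w = np.array([-1.0, 0.0])
eta = 0.01
for epoch in range(2):                 # two epochs of the batch rule
    y = Xz @ w                         # y_p = w x_p for every sample
    dw = eta * (y[:, None] * (Xz - np.outer(y, w))).sum(axis=0)
    w = w + dw                         # apply the total weight change once per epoch
print(w)                               # [-0.8938 -0.4055] (to 4 d.p.)
print(Xz @ w)                          # projection onto the first principal component (5.c.iii)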