May 2017 7CCSMPNN
1. a. Give a brief definition of each of the following terms:
i. Minimum Error Rate classifier
ii. Minimum Loss classifier
b. Write down the equation for the Likelihood Ratio Test for minimum error, and explain how this equation is used to determine the class of an exemplar.
Table 1, below, shows samples that have been recorded from two univariate class-conditional probability distributions:
class samples of x
ω1 0.82 1.42 1.53 0.21 0.72 1.73 0.49 1.03 0.69 2.12
ω2 2.54 1.36 3.77 2.86 0.92 6.01
c. Apply k_n nearest neighbours to the data shown in Table 1 in order to estimate the density of p(x|ω1) at location x = 1.5, using k = 3.
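The k-nearest-neighbour density estimate asked for here can be sketched as below. This is a minimal sketch, assuming the usual one-dimensional form p(x) ≈ k / (n V), where V is the length of the smallest interval centred on x that contains k samples; the function name `knn_density` is hypothetical.

```python
def knn_density(samples, x, k):
    """k-nearest-neighbour density estimate p(x) ~ k / (n * V) in one dimension."""
    distances = sorted(abs(s - x) for s in samples)
    radius = distances[k - 1]   # distance from x to its k-th nearest sample
    volume = 2.0 * radius       # length of the interval [x - radius, x + radius]
    return k / (len(samples) * volume)


# Samples of class omega_1 from Table 1.
omega1 = [0.82, 1.42, 1.53, 0.21, 0.72, 1.73, 0.49, 1.03, 0.69, 2.12]
density = knn_density(omega1, x=1.5, k=3)
print(round(density, 4))
```

The third-nearest sample to x = 1.5 is 1.73, so the interval has length 0.46 under this assumption.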
d. The class-conditional probability distribution for class 2 in Table 1, p(x|ω2), is believed to be a Normal distribution. Calculate the parameters of this Normal distribution using Maximum-Likelihood Estimation; use the “unbiased” estimate of the covariance.
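The parameter estimate can be sketched as below. This is a minimal sketch, assuming that for univariate data the "unbiased" covariance reduces to the sample variance with an n − 1 denominator; the function name `gaussian_fit_unbiased` is hypothetical.

```python
def gaussian_fit_unbiased(samples):
    """Return (mean, variance) with the variance divided by n - 1."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, variance


# Samples of class omega_2 from Table 1.
omega2 = [2.54, 1.36, 3.77, 2.86, 0.92, 6.01]
mean, variance = gaussian_fit_unbiased(omega2)
print(round(mean, 4), round(variance, 4))
```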
e. Using the number of samples given in Table 1 estimate the prior probabilities of each class, i.e., calculate P(ω1) and P(ω2).
f. Using the Likelihood Ratio Test and the results from question 1.c, question 1.d, and question 1.e determine the class in which a new sample with value x = 1.5 should be placed.
g. Using the Likelihood Ratio Test with losses and the results from question 1.c and question 1.d determine in what class a new sample with value x = 1.5 should be placed, when P(ω1) = 0.4P(ω2) and the loss, λij, associated with choosing class i when the actual class is j, is as follows: λ11 = 0, λ12 = 2, λ22 = 0, λ21 = 1.
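The decision rule used in questions 1.f and 1.g can be sketched as below. This is a minimal sketch, assuming the standard two-class form: choose ω1 when p(x|ω1)/p(x|ω2) exceeds ((λ12 − λ22)/(λ21 − λ11)) · P(ω2)/P(ω1), which reduces to the minimum-error test under zero-one loss. The function name `lrt_decide` and the numbers in the usage lines are illustrative assumptions, not values from the paper.

```python
def lrt_decide(p_x_w1, p_x_w2, prior1, prior2, loss=((0.0, 1.0), (1.0, 0.0))):
    """Return 1 or 2 using the likelihood ratio test.

    loss[i][j] is the loss for choosing class i+1 when the true class is j+1.
    The default zero-one loss gives the minimum-error test.
    """
    ratio = p_x_w1 / p_x_w2
    loss_factor = (loss[0][1] - loss[1][1]) / (loss[1][0] - loss[0][0])
    threshold = loss_factor * (prior2 / prior1)
    return 1 if ratio > threshold else 2


# Illustrative likelihood values only.
print(lrt_decide(0.6, 0.2, prior1=0.5, prior2=0.5))
print(lrt_decide(0.6, 0.2, prior1=2 / 7, prior2=5 / 7, loss=((0, 2), (1, 0))))
```

With these illustrative inputs the likelihood ratio is about 3, so the first call picks class 1 (threshold 1) and the second picks class 2 (threshold 5).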
2. a. In augmented feature space, a dichotomizer is defined using the following linear discriminant function: g(x) = aT y, where aT = (−2, 1, 2, 1, 2) and y = (1, xT)T. Determine the class of the following feature vectors: x1 = (1, 1, 1, 1)T and x2 = (0, −1, 0, 1)T.
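Evaluating the discriminant in augmented feature space can be sketched as below. This is a minimal sketch, assuming the usual dichotomizer convention that g(x) > 0 assigns class 1 and g(x) < 0 assigns class 2; the function name `discriminant` is hypothetical.

```python
def discriminant(a, x):
    """Evaluate g(x) = a^T y with the augmented vector y = (1, x^T)^T."""
    y = [1.0] + list(x)
    return sum(ai * yi for ai, yi in zip(a, y))


a = [-2, 1, 2, 1, 2]
g1 = discriminant(a, (1, 1, 1, 1))
g2 = discriminant(a, (0, -1, 0, 1))
for g in (g1, g2):
    print(g, "class 1" if g > 0 else "class 2")
```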
b. One method of learning the parameters of a linear discriminant function is by using the perceptron learning algorithm. Write pseudo-code for the sequential perceptron learning algorithm.
Table 2:
xT class
(5, 1) 1
(5, −1) 1
(7, 0) 1
(2, 1) −1
(1, −1) −1
c. Determine if the dataset shown in Table 2 is linearly separable.
d. Apply the Sequential Perceptron Learning Algorithm to determine the parameters of a linear discriminant function that will correctly classify the data shown in Table 2. Assume an initial value of a = (w0, wT)T = (−25, 5, 2)T, and use a learning rate of 1.
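The sequential perceptron update can be sketched as below. This is a minimal sketch, assuming the common formulation in which a sample with label ω ∈ {+1, −1} counts as misclassified when ω · aT y ≤ 0 and the update is a ← a + η ω y. The function name `sequential_perceptron` and the `max_epochs` cap are assumptions.

```python
def sequential_perceptron(samples, labels, a, eta=1.0, max_epochs=100):
    """Sequential perceptron learning in augmented feature space."""
    a = list(a)
    for _ in range(max_epochs):
        errors = 0
        for x, label in zip(samples, labels):
            y = [1.0] + list(x)                      # augmented vector (1, x^T)^T
            g = sum(ai * yi for ai, yi in zip(a, y))
            if label * g <= 0:                       # misclassified sample
                a = [ai + eta * label * yi for ai, yi in zip(a, y)]
                errors += 1
        if errors == 0:                              # a full pass with no updates
            break
    return a


# Data from Table 2.
samples = [(5, 1), (5, -1), (7, 0), (2, 1), (1, -1)]
labels = [1, 1, 1, -1, -1]
a_final = sequential_perceptron(samples, labels, a=[-25, 5, 2], eta=1.0)
print(a_final)
```

Under these assumptions only the second sample triggers an update, after which every sample lies on the correct side of the boundary.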
e. A support vector machine (SVM) is to be used to classify the data shown in Table 2.
i. Identify the support vectors by inspection. [3 marks]
ii. Design an SVM classifier to classify all given points. [6 marks]
3. a. Give brief definitions of the following terms:
i. an artificial neural network
ii. action potential
iii. firing rate
Table 3, below, shows a linearly separable dataset.
xT class
(0, 2) 1
(2, 1) 1
(−3, 1) 0
(−2, −1) 0
(0, −1) 0
b. Is the dataset shown in Table 3:
i. Univariate or multivariate?
ii. Continuous or discrete?
c. Re-write the dataset shown in Table 3 using augmented vector notation. [2 marks]
d. A linear threshold neuron has a transfer function which is a linear weighted sum of its inputs and an activation function that applies a threshold (θ) and the Heaviside function. For a linear threshold neuron with parameter values θ = −2, w1 = 0.5 and w2 = 1 calculate the output of the neuron in response to each of the samples in the dataset given in Table 3.
e. Plot the data given in Table 3 and on the same axes plot the decision boundary defined by the linear threshold neuron with parameter values θ = −2, w1 = 0.5 and w2 = 1.
f. Apply the sequential delta learning algorithm to find the parameters of a linear threshold neuron that will correctly classify the data in Table 3. Assume initial values of θ = −2, w1 = 0.5 and w2 = 1, and a learning rate of 1.
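The sequential delta rule can be sketched as below. This is a minimal sketch, assuming augmented notation with w = (−θ, w1, w2) and x = (1, x1, x2), the update w ← w + η (t − y) x, and the convention H(0) = 1 for the Heaviside function. The function name `sequential_delta` and the `max_epochs` cap are assumptions.

```python
def sequential_delta(samples, targets, w, eta=1.0, max_epochs=100):
    """Sequential delta learning for a linear threshold neuron."""
    w = list(w)
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(samples, targets):
            xa = [1.0] + list(x)                     # augmented input (1, x1, x2)
            net = sum(wi * xi for wi, xi in zip(w, xa))
            y = 1 if net >= 0 else 0                 # Heaviside, assuming H(0) = 1
            if y != t:
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xa)]
                changed = True
        if not changed:                              # a full pass with no updates
            break
    return w


# Data from Table 3; the initial w0 = -theta = 2.
samples = [(0, 2), (2, 1), (-3, 1), (-2, -1), (0, -1)]
targets = [1, 1, 0, 0, 0]
weights = sequential_delta(samples, targets, w=[2.0, 0.5, 1.0], eta=1.0)
print(weights)
```

In this run no sample produces a net input of exactly zero, so the result does not depend on the H(0) convention.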
4. a. A diagram of a 2-2-1 fully connected multilayer neural network is shown below, where “/” denotes a linear function and f(·) = sgn(·) denotes the sign function: sgn(a) = 1 if a ≥ 0, and −1 if a < 0.
[Figure: 2-2-1 network with an input layer, a hidden layer and an output layer; surviving weight labels: m12 = −0.4, w20 = −1.5]
Show that this neural network is able to solve the XOR problem as defined in Table 4. [10 marks]
Table 4:
x = [x1, x2]T: [−1, −1], [−1, +1], [+1, −1], [+1, +1]
b. A radial basis function (RBF) neural network is to be used to correctly classify the XOR dataset shown in Table 4.
i. There are two hidden units that both employ a Gaussian activation function with a standard deviation of σ = ρmax. The centres of these hidden units are given by the first and the last input patterns (i.e., the two input patterns in class 1). Compute the outputs at the hidden units for all four input patterns and list them in a table.
ii. Show that the dataset when represented by the hidden units is linearly separable.
iii. The RBF network has one output unit which employs a linear activation function, with a threshold. Write down the equation that would be used to determine the weights of this output neuron using the least squares method. Hence, using the answer you obtained for question 4.b.i write an expression in matrix form that could be used to calculate the weights of the output neuron (you do not need to calculate the pseudo-inverse).
5. a. Give a brief definition of each of the following terms.
i. Feature Selection
ii. Feature Extraction
iii. Dimensionality Reduction
b. Write pseudo-code for applying Oja’s learning rule to find the first principal component of a multi-dimensional data set.
[4 marks]
c. Consider the following 2-dimensional dataset: x1 = (1, 2)T, x2 = (3, 5)T, x3 = (5, 4)T, x4 = (8, 7)T, x5 = (11, 7)T.
i. Calculate the equivalent zero-mean dataset. [4 marks]
ii. Apply two epochs of the batch version of Oja’s learning rule to the zero-mean data calculated in answer to question 5.c.i. Use a learning rate of 0.01 and an initial weight vector of [−1, 0].
iii. Using the answer obtained to question 5.c.ii, project the zero-mean data onto the first principal component.
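The three steps of question 5.c can be sketched as below. This is a minimal sketch, assuming the dataset reads x1 = (1, 2)T, x2 = (3, 5)T, x3 = (5, 4)T, x4 = (8, 7)T, x5 = (11, 7)T, and assuming the batch form of Oja's rule in which the terms η y (x − y w), with y = wT x, are summed over all samples and applied once per epoch. The function name `oja_batch` is hypothetical.

```python
def oja_batch(data, w, eta, epochs):
    """Batch Oja's rule: accumulate eta * y * (x - y * w) over an epoch, then update."""
    w = list(w)
    for _ in range(epochs):
        delta = [0.0] * len(w)
        for x in data:
            y = sum(wi * xi for wi, xi in zip(w, x))  # projection onto current w
            for d in range(len(w)):
                delta[d] += eta * y * (x[d] - y * w[d])
        w = [wi + di for wi, di in zip(w, delta)]     # one update per epoch
    return w


raw = [(1, 2), (3, 5), (5, 4), (8, 7), (11, 7)]
mean = [sum(col) / len(raw) for col in zip(*raw)]
centred = [[xi - mi for xi, mi in zip(x, mean)] for x in raw]

w = oja_batch(centred, [-1.0, 0.0], eta=0.01, epochs=2)
projections = [sum(wi * xi for wi, xi in zip(w, x)) for x in centred]
print(mean, w, projections)
```

Two epochs do not bring w close to convergence, so the projections only approximate the first principal component.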
6. a. Apart from clustering, name one other application for unsupervised learning in pattern recognition.
Table 6, below, shows unlabelled 2-dimensional feature vectors which are to be clustered into two classes.
i xTi
1 (1, 0)
2 (0, 2)
3 (1, 3)
4 (3, 0)
5 (3, 1)
b. Apply the k-means algorithm to the dataset shown in Table 6. Use the Euclidean distance criterion and start with initial cluster centres of μT1 = (3, 2) and μT2 = (4, 0). Determine the final cluster centres and the cluster to which each training sample is allocated.
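The k-means iteration can be sketched as below. This is a minimal sketch, assuming the standard alternation between assigning each sample to its nearest centre and recomputing each centre as the mean of its members, stopping when the assignments no longer change. The function name `k_means`, the `max_iter` cap and the zero-based cluster indices are assumptions.

```python
import math


def k_means(points, centres, max_iter=100):
    """Plain k-means with Euclidean distance; returns (centres, assignments)."""
    centres = [tuple(c) for c in centres]
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        if new_assignments == assignments:           # converged
            break
        assignments = new_assignments
        for j in range(len(centres)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:                              # keep old centre if cluster is empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, assignments


# Data from Table 6 with the initial centres from the question.
points = [(1, 0), (0, 2), (1, 3), (3, 0), (3, 1)]
centres, assignments = k_means(points, [(3, 2), (4, 0)])
print(centres, assignments)
```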
c. Apply hierarchical agglomerative clustering to the dataset shown in Table 6, in order to find two clusters. Use the Euclidean distance as the similarity metric and determine the distance between clusters using single-link clustering (where the distance between clusters is the minimum distance between constituent samples). Determine the cluster to which each training sample is allocated.
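Single-link agglomerative clustering can be sketched as below. This is a minimal sketch that repeatedly merges the two clusters with the smallest minimum pairwise Euclidean distance until the requested number of clusters remains. The function name `single_link` and the zero-based sample indices are assumptions.

```python
import math


def single_link(points, n_clusters):
    """Agglomerative clustering with single-link (minimum) inter-cluster distance."""
    clusters = [[i] for i in range(len(points))]     # start with one cluster per sample

    def link(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)                         # closest pair of clusters
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Data from Table 6; index 0 corresponds to sample 1.
points = [(1, 0), (0, 2), (1, 3), (3, 0), (3, 1)]
clusters = single_link(points, n_clusters=2)
print(clusters)
```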
d. Apply the competitive learning algorithm (without normalisation) to the dataset shown in Table 6. Perform five iterations of the algorithm with the samples chosen in the order of x1, x2, x3, x4, x5. Start with initial cluster centres of w1T = (1, 1) and w2T = (2, 2). Use a learning rate of η = 0.5. Determine the final cluster centres and the cluster to which each training sample is allocated.
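Competitive learning without normalisation can be sketched as below. This is a minimal sketch, assuming the winner for each sample is the centre nearest in Euclidean distance and only the winner is moved, using w ← w + η (x − w). The function name `competitive_learning` and the zero-based cluster indices are assumptions.

```python
import math


def competitive_learning(points, centres, eta):
    """One pass of winner-take-all updates, then assign samples to the final centres."""
    centres = [list(c) for c in centres]
    for x in points:
        winner = min(range(len(centres)), key=lambda j: math.dist(x, centres[j]))
        centres[winner] = [w + eta * (xi - w) for w, xi in zip(centres[winner], x)]
    assignments = [
        min(range(len(centres)), key=lambda j: math.dist(x, centres[j]))
        for x in points
    ]
    return centres, assignments


# Data from Table 6, presented in the order x1..x5 (five iterations).
points = [(1, 0), (0, 2), (1, 3), (3, 0), (3, 1)]
centres, assignments = competitive_learning(points, [(1, 1), (2, 2)], eta=0.5)
print(centres, assignments)
```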