May 2019 7CCSMPNN
Question 1

a. Write down Bayes’ Rule, giving an expression for the posterior, P(ωj|x), in terms of the likelihood, prior and evidence, where ωj represents category j and x is a sample.
b. Briefly describe what is meant by a “Minimum Error Rate classifier”. [2 marks]
c. Explain how Bayes’ Rule can be used to determine the class of an exemplar so as to minimise error rate.
d. Show, mathematically, that the k-nearest-neighbour classifier provides an estimate of P(ωj|x).
Table 1, below, shows samples from two classes:

Class   xT
ω1      (5, 1)
ω2      (2, 1)
ω2      (4, 2)
e. Apply the k-nearest-neighbour classifier to the data given in Table 1 to determine the class of a new sample with value xT = (4, 0). Use Euclidean distance to measure the distance between samples, and use:

i. k = 1
ii. k = 3
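As a minimal sketch, the k = 1 and k = 3 cases can be checked numerically, assuming Table 1 contains exactly the three samples listed above:

```python
from collections import Counter
import math

# Samples as recoverable from Table 1: (feature vector, class label).
data = [((5, 1), "w1"), ((2, 1), "w2"), ((4, 2), "w2")]

def knn_classify(x, data, k):
    """Classify x by majority vote among its k nearest neighbours (Euclidean)."""
    neighbours = sorted(data, key=lambda s: math.dist(x, s[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_classify((4, 0), data, k=1))  # nearest sample is (5, 1), so w1
print(knn_classify((4, 0), data, k=3))  # two of the three neighbours are w2
```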
f. Apply kn nearest neighbours to the data shown in Table 1 in order to estimate the likelihoods, or class-conditional probability densities, at location (4,0) for each class, i.e., calculate p(x|ω1) and p(x|ω2). Use Euclidean distance to measure the distance between samples, and use k=1.
g. Using Table 1 estimate the prior probabilities of each class, i.e., calculate P(ω1) and P(ω2).
h. Using the answers to sub-questions 1.f and 1.g apply Bayes’ Rule to determine the posterior for each class for the sample with feature vector (4,0), i.e., calculate P (ω1|x) and P (ω2|x). If you have failed to answer sub-questions 1.f and 1.g use the following values: p(x|ω1) = 0.2, p(x|ω2) = 0.1, P (ω1) = 0.3, and P (ω2) = 0.7.
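Using the fallback values quoted in sub-question 1.h, the posterior computation can be sketched as:

```python
# Bayes' Rule with the fallback values given in the question:
# p(x|w1) = 0.2, p(x|w2) = 0.1, P(w1) = 0.3, P(w2) = 0.7.
likelihood = {"w1": 0.2, "w2": 0.1}
prior = {"w1": 0.3, "w2": 0.7}

# Evidence p(x) is the sum of likelihood-times-prior over classes.
evidence = sum(likelihood[c] * prior[c] for c in likelihood)

# Posterior P(wj|x) = p(x|wj) P(wj) / p(x).
posterior = {c: likelihood[c] * prior[c] / evidence for c in likelihood}
print(posterior)  # P(w1|x) = 0.06/0.13, P(w2|x) = 0.07/0.13
```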
Question 2
a. Give a brief definition of each of the following terms:
i. Feature Space
ii. Decision Boundary
b. A classifier consists of three linear discriminant functions: g1(x), g2(x), and g3(x). The parameters of these three discriminant functions, in augmented notation, are: aT1 = (1, 0.5, 0.5), aT2 = (−1, 2, 2), aT3 = (2, −1, −1). Determine the class predicted by this classifier for a feature vector xT = (0, 1).
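The prediction follows from evaluating each discriminant on the augmented vector (1, x1, x2) and taking the maximum; a minimal check of the parameters given in 2.b:

```python
# Augmented-notation discriminants from sub-question 2.b: g_i(x) = a_i^T [1, x1, x2].
a = {1: (1, 0.5, 0.5), 2: (-1, 2, 2), 3: (2, -1, -1)}
x_aug = (1, 0, 1)  # augmented feature vector for x = (0, 1)

g = {i: sum(ai * xi for ai, xi in zip(a[i], x_aug)) for i in a}
predicted = max(g, key=g.get)
print(g, predicted)  # g1 = 1.5, g2 = 1.0, g3 = 1.0, so class 1 is predicted
```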
Table 2, below, shows a dataset consisting of five exemplars that come from three classes:

Class   Feature Vector, xT
1       (0, 1)
2       (0.5, 0.5)
c. Do the linear discriminant functions defined in sub-question 2.b correctly classify the data shown in Table 2? Justify your answer.
d. It is suggested that a quadratic discriminant function of the form g(x) = w0 + Σ_i wi xi + Σ_{i,j} wij xi xj could successfully classify the data. To learn such a quadratic discriminant function we can learn a linear discriminant function in an expanded feature space. Re-write Table 2 showing the new feature vectors in this expanded feature space.
e. Apply the sequential multiclass perceptron learning algorithm to find the parameters for three linear discriminant functions that will correctly classify the data in the expanded feature space defined in answer to sub-question 2.d. Assume that the initial parameters of these three discriminant functions, in augmented notation, are:
aT1 = (1, 0.5, 0.5, −0.75), aT2 = (−1, 2, 2, 1), aT3 = (2, −1, −1, 1). Use a learning rate of 1. If more than one discriminant function produces the maximum output, choose the one with the lowest index (i.e., the one that represents the smallest class label).
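The update rule used by the sequential multiclass perceptron can be sketched generically (the Table 2 data is not hard-coded here; the tie-breaking on the maximum output follows the lowest-index rule stated in the question, via np.argmax):

```python
import numpy as np

def multiclass_perceptron(samples, labels, a, eta=1.0, epochs=10):
    """Sequential multiclass perceptron learning.
    samples: augmented feature vectors y; labels: 0-based class indices;
    a: initial parameter matrix, one row of parameters per class."""
    a = np.asarray(a, dtype=float)
    for _ in range(epochs):
        converged = True
        for y, true in zip(samples, labels):
            y = np.asarray(y, dtype=float)
            g = a @ y
            # np.argmax returns the lowest index on ties, matching the rule above.
            pred = int(np.argmax(g))
            if pred != true:
                a[true] += eta * y   # reinforce the discriminant of the true class
                a[pred] -= eta * y   # penalise the wrongly selected discriminant
                converged = False
        if converged:
            break
    return a
```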
f. Write pseudo-code for the sequential Widrow-Hoff learning algorithm when applied to learning a discriminant function for a two-class problem.
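One possible rendering of the sequential Widrow-Hoff (LMS) algorithm, assuming augmented, sign-normalised samples y_k with target margins b_k:

```python
import numpy as np

def widrow_hoff(samples, margins, a, eta=0.1, epochs=100):
    """Sequential Widrow-Hoff (LMS) learning for a two-class problem.
    samples: augmented, sign-normalised feature vectors y_k;
    margins: target margin values b_k; a: initial weight vector."""
    a = np.asarray(a, dtype=float)
    for _ in range(epochs):
        for y, b in zip(np.asarray(samples, dtype=float), margins):
            # Move a^T y towards the margin b in proportion to the error.
            a += eta * (b - a @ y) * y
    return a
```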
Question 3
a. A linear threshold neuron has a threshold θ = −2 and weights w1 = −1 and w2 = 3. The activation function is the Heaviside function. Calculate the output of this neuron when the input is x = (2, 0.5)T .
b. Write pseudo-code for updating the parameters of a single linear threshold neuron using the sequential delta learning algorithm.
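A sketch covering both parts, assuming the convention output = H(w·x − θ) for the linear threshold neuron (the course may use a different sign convention for the threshold):

```python
import numpy as np

def heaviside(v):
    """Heaviside step function, with H(0) = 1 assumed here."""
    return 1 if v >= 0 else 0

# Part a: theta = -2, w = (-1, 3), x = (2, 0.5).
w, theta = np.array([-1.0, 3.0]), -2.0
x = np.array([2.0, 0.5])
print(heaviside(w @ x - theta))  # H(-0.5 - (-2)) = H(1.5) = 1

def delta_learning(samples, targets, w, theta, eta=1.0, epochs=10):
    """Sequential delta rule for a single linear threshold neuron."""
    for _ in range(epochs):
        for xk, t in zip(samples, targets):
            xk = np.asarray(xk, dtype=float)
            y = heaviside(w @ xk - theta)
            w = w + eta * (t - y) * xk       # weight update
            theta = theta - eta * (t - y)    # threshold learned as a bias term
    return w, theta
```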
c. Briefly explain what is meant by a competitive neural network, and describe two methods for implementing the competition.
d. A negative feedback network has three inputs and two output neurons, that are connected with weights

W = ( 1 1 0 )
    ( 1 1 1 )

Determine the activation of the output neurons after 4 iterations when the input is x = (1, 1, 0)T, assuming that the output neurons are updated using parameter α = 0.25, and the activations of the output neurons are initialised to be all zero.
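The standard negative feedback iteration e = x − Wᵀy, y ← y + αWe can be simulated directly; the weight matrix below is assumed to have rows (1, 1, 0) and (1, 1, 1):

```python
import numpy as np

# Assumed weight matrix (one row per output neuron, one column per input).
W = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
x = np.array([1.0, 1.0, 0.0])
alpha = 0.25

y = np.zeros(2)  # output activations initialised to zero
for _ in range(4):
    e = x - W.T @ y        # residual error fed back to the inputs
    y = y + alpha * W @ e  # output neurons integrate the residual
print(y)
```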
e. Write down the objective function used in sparse coding, and explain the role played by each of the components of the function in representing data efficiently.
f. Given a dictionary, VT, what is the best sparse code for the signal x out of the following two alternatives:

i. yT = (1, 2, 0, −1)
ii. yT = (0, 0.5, 1, 0)

where VT = (−4 3 2 −1) and x = (2, 3)T. Assume that sparsity is measured as the count of elements that are non-zero. Use a trade-off parameter of λ = 1 in order to calculate the costs.
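The two candidate codes can be compared by cost = ∥x − Vᵀy∥² + λ·(number of non-zeros), assuming a squared-error reconstruction term. Only one row of Vᵀ is legible in the question, so the first row below is a hypothetical placeholder and the printed costs are illustrative only:

```python
import numpy as np

# First row of Vt is a HYPOTHETICAL placeholder; second row is from the question.
Vt = np.array([[1.0, 0.0, 1.0, 0.0],
               [-4.0, 3.0, 2.0, -1.0]])
x = np.array([2.0, 3.0])
lam = 1.0  # trade-off parameter lambda

def sparse_cost(y):
    """Squared reconstruction error plus lambda times the L0 count of non-zeros."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((x - Vt @ y) ** 2) + lam * np.count_nonzero(y))

for y in [(1, 2, 0, -1), (0, 0.5, 1, 0)]:
    print(y, sparse_cost(y))  # the code with the lower cost is preferred
```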
Question 4
a. Figure 1 shows a decision tree for a binary classification problem where the classes are denoted as ω1 for class 1 and ω2 for class 2. The sample representation is x = (x1, x2)T.

Figure 1: A decision tree (recoverable node label: x1 < 3).
i. Given a sample x = 7 , predict its class.
ii. Sketch the decision boundary given by the decision tree; the values of x1 and x2 should be shown on the axes.
b. Figure 2 shows a 3-layer, partially connected feed-forward neural network. A linear function is used as the activation function in all of the input, hidden and output units. The training dataset is given in Table 3, where the sample representation is of the form x = (x1, x2)T, z denotes the network output and t its target. Batch backpropagation is employed to perform training using the cost J = Σ_{p=1}^{N} Jp, where Jp = (1/2)∥tp − zp∥²; N denotes the number of training samples; zp is the network output due to the p-th sample and tp denotes its target. Assume that only w11 is updated. Given the learning rate η = 0.1, determine the updated value of w11 at the next iteration.

[15 marks]

[Figure 2 appears here: input layer, hidden layer and output layer, with recoverable weight labels m11 = 5 and m22 = 7 and linear activations f(·).]
Figure 2: A diagram of the neural network.
x (input)     t (target output)
(−3, −2)T     −6
(4, 4)T

Table 3: Input samples x and their target outputs t.
Question 5
a. Draw a diagram of a Fuzzy Inference System with labels. Briefly describe each component.
b. A fuzzy inference system using the max-product inference method has 2 rules, as shown below:

Rule 1: IF x is Zero THEN y is Big
Rule 2: IF x is Nonzero THEN y is Small
where the fuzzy sets are given as:
Zero = {0.1/−5 + 0.3/−3 + 0.7/−1 + 1/0 + 0.7/1 + 0.3/3 + 0.1/5},
Nonzero = {0.9/−5 + 0.7/−3 + 0.3/−1 + 0/0 + 0.3/1 + 0.7/3 + 0.9/5},
Big = {1/0 + 1/1 + 0.9/2 + 0.5/3 + 0.2/4 + 0/5},
Small = {0/0 + 0.2/1 + 0.5/2 + 0.9/3 + 1/4 + 1/5}.
i. Identify the linguistic variables and linguistic terms.
ii. Considering input x = 1, what are the firing strengths for rules 1 and 2? Sketch the inferred output membership function for each rule. Sketch the overall inferred output membership function. The grades of membership must be shown.
iii. Based on the overall inferred output membership function obtained in Q5.b.ii, compute the defuzzified output using the Centroid method.
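Assuming max-product inference (each consequent scaled by its rule's firing strength) combined by max, and centroid defuzzification over the discrete universe {0, ..., 5}, the computation in 5.b can be sketched as:

```python
import numpy as np

# Output universe and the consequent fuzzy sets from the question.
ys = np.array([0, 1, 2, 3, 4, 5], dtype=float)
big = np.array([1, 1, 0.9, 0.5, 0.2, 0])
small = np.array([0, 0.2, 0.5, 0.9, 1, 1])

# Firing strengths at x = 1: membership of 1 in Zero and in Nonzero.
w1, w2 = 0.7, 0.3

# Max-product inference: scale each consequent, then combine rule outputs by max.
overall = np.maximum(w1 * big, w2 * small)

# Centroid defuzzification over the discrete universe.
y_star = float(np.sum(ys * overall) / np.sum(overall))
print(overall, y_star)
```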
Question 6
a. In a classification problem, the sample representation is x = (x1, x2)T. The following dataset is considered:

Class 1: x1 = (−2.5, −1.5)T, x2 = (−1, 0)T, x3 = (−2.5, 1.5)T, x4 = (0.5, −0.5)T;
Class 2: x5 = (2.5, −1.5)T, x6 = (1, 0)T, x7 = (2.5, 1.5)T, x8 = (−0.5, −1)T;

where class 1 is denoted by class label “−1” (i.e., y1 = y2 = y3 = y4 = −1) and class 2 is denoted by class label “+1” (i.e., y5 = y6 = y7 = y8 = +1). A support vector machine (SVM) hard classifier with hyperplane x1 = 0 was designed to classify the dataset.

i. Is the dataset linearly separable? Explain your answer.
ii. Determine the margin of the SVM classifier f(x).
[5 marks] [6 marks]
iii. In the context of SVM, the constraint for samples is given as
yi(wT xi + w0) ≥ 1 − ξi
where w is the weight vector of the hyperplane, w0 is the bias of the hyperplane and ξi is known as the slack variable. Identify the samples with non-zero ξi and determine their values.
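Assuming the classifier f(x) = x1 (i.e., w = (1, 0)T, w0 = 0, with the sign chosen so class 2, labelled +1, lies on the positive side), the slack variables follow from ξi = max(0, 1 − yi(wᵀxi + w0)); the dataset below is one reading of the question’s layout and should be checked against the original table:

```python
import numpy as np

# Hyperplane x1 = 0 expressed as w.x + w0 with w = (1, 0), w0 = 0 (an assumption).
w, w0 = np.array([1.0, 0.0]), 0.0

# Dataset as read from the question (ASSUMED pairing of components).
X = np.array([[-2.5, -1.5], [-1, 0], [-2.5, 1.5], [0.5, -0.5],   # class 1
              [2.5, -1.5], [1, 0], [2.5, 1.5], [-0.5, -1]])      # class 2
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# Slack variables: non-zero only for samples violating the margin y_i f(x_i) >= 1.
xi = np.maximum(0.0, 1 - y * (X @ w + w0))
print(dict(enumerate(xi, start=1)))  # under these assumptions, x4 and x8 have slack
```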
iv. In the context of SVM, the primal problem can be described as

L(w, w0, ξ, λ, μ) = (1/2)∥w∥² + C Σ_{i=1}^{N} ξi − Σ_{i=1}^{N} μi ξi − Σ_{i=1}^{N} λi (yi(wT xi + w0) − 1 + ξi)

where μ = (μ1, μ2, ..., μN) and λ = (λ1, λ2, ..., λN) are Lagrange multipliers; μi ≥ 0, λi ≥ 0 for all i = 1, 2, ..., N; and N denotes the number of training samples. What are the values of λi corresponding to the samples x1, x3, x5 and x7? Explain your answer.