
May 2018 7CCSMPNN

QUESTION 1
a. Explain how the Bayes decision rule for minimum error is used to determine the class of an exemplar.
Table 1, below, shows samples that have been recorded from three univariate class-conditional probability distributions:


class   samples of x
ω1      3.54  4.83  0.74  3.86  3.32  1.69  2.57  3.34  6.58
ω2      15.85 4.75  17.17 5.63  1.68  5.57  0.98
ω3      3.75  4.00  4.53  5.34  1.59

Table 1: Samples recorded from three univariate class-conditional distributions.
b. Apply the kn-nearest-neighbour method to the data shown in Table 1 in order to estimate the density p(x|ω1) at the location x = 4, using k = 3.
c. The class-conditional probability distribution for class 2 in Table 1, p(x|ω2), is believed to be a Normal distribution. Calculate the parameters of this Normal distribution using Maximum-Likelihood Estimation; use the “unbiased” estimate of the covariance.
d. For the data in Table 1, use Parzen-window density estimation to calculate the density p(x|ω3) at the location x = 4. Use a Gaussian window with width hn = 2. For a Gaussian Parzen window the estimate of the probability density, pn, at a value x is given by the formula:
pn(x) = (1/n) Σ_{i=1..n} (1/hn) φ((x − xi)/hn)
e. Using the number of samples given in Table 1, estimate the prior probabilities of each class, i.e., calculate P(ω1), P(ω2) and P(ω3).
f. Use the answers to sub-questions 1.b, 1.c, 1.d and 1.e to determine the class of a new sample with value x = 4.
QUESTION 2
a. Give a brief definition of each of the following terms in the context of
classification:
i. Supervised Learning
ii. Generalisation
b. A dichotomizer is defined using the following linear discriminant function: g(x) = wᵀx + w0, where wᵀ = (−1, −2) and w0 = 3. Using this linear discriminant function, determine the class predicted for the feature vector xᵀ = (2.5, 1.5).
c. Sketch the decision boundary defined by the linear discriminant function specified in sub-question 2.b.
d. Table 2 shows a linearly separable dataset.

xᵀ        Class
(2, −1)
(−1, 0)
(0, 0)
(1, 1)
(0, −1)

Table 2: A linearly separable dataset.
i. Re-write the dataset shown in Table 2 using augmented vector notation.
ii. A linear threshold neuron computes a weighted sum of its inputs and applies a threshold (θ) followed by the Heaviside function. For a linear threshold neuron with parameter values θ = −1, w1 = 3 and w2 = 0.5, calculate the output of the neuron in response to each of the samples in the dataset given in Table 2.
iii. One method for learning the parameters of a linear threshold neuron is the sequential delta learning algorithm. Give the learning rule for this algorithm, and apply the learning algorithm to find the parameters of a linear threshold neuron that will correctly classify the data in Table 2. Assume initial values of θ = −1, w1 = 3 and w2 = 0.5, and a learning rate of 1.
QUESTION 3
a. Give brief definitions of the following terms:
i. Feature selection
ii. Feature extraction
b. Principal component analysis (PCA) is a commonly used method for feature extraction in pattern recognition problems. Briefly describe what principal components are, and explain why performing PCA on a dataset before learning may increase classifier accuracy.
c. Write pseudo-code for the Karhunen-Loève Transform method for PCA.
[5 marks]
d. When performing PCA using neural networks, Oja’s update rule is often used in place of standard Hebbian learning. Give Oja’s update rule, and briefly explain why this rule is preferred.
e. Linear Discriminant Analysis (LDA) is a supervised projection method for finding linear combinations of features in a dataset. Briefly explain Fisher’s LDA method for estimating the weight parameters of a decision boundary to separate two classes in a dataset, and give the objective function.
f. Figure 1 shows a dataset containing two classes of two-dimensional data, with one class labelled with crosses and the other class labelled with circles. An axis on which the data could be projected is shown with a black arrow.
Is this projection a suitable axis for maximally discriminating between the two classes of data? Justify your answer.
QUESTION 4
a. Draw a diagram to show each of the following activation functions:
• Symmetric hard limit (signum)
• Linear function
• Symmetric sigmoid function
• Logarithmic sigmoid function
• Radial basis function
b. Referring to the activation functions listed in Question 4.a, which activation function is not suitable for training a feedforward neural network using the backpropagation algorithm? Explain your answer.
c. When the backpropagation algorithm is employed to train a neural network, the dataset is usually partitioned into three sub-datasets. Name these three sub-datasets.
d. Figure 2 shows a diagram of a feedforward neural network with two inputs and two outputs. All activation functions, indicated by “/”, are linear. In the feedforward operation, Table 3 shows the input-output relationship between the inputs (x1 and x2) and the outputs (z1 and z2). For example, referring to Table 3, when x1 = 2 and x2 = −0.5, the neural network’s outputs are z1 = 98 and z2 = 7.5. Determine the connection weights w11, w12, w21 and w22 which will produce the results in Table 3.
[12 marks]
[Figure 2: A diagram of a 3-layer neural network. The input layer (x1, x2) connects to the hidden layer through weights w11, w12, w21 and w22; the hidden layer connects to the output layer (z1, z2) through weights including m11 = 8 and m22 = 9.]

x1    x2     z1     z2
2     −0.5   98     7.5
−4    −6     −168   −246

Table 3: Inputs x1 and x2, and outputs z1 and z2.
QUESTION 5
a. Briefly explain what a “weak classifier” is.
b. Explain briefly the idea of “ensemble methods” for classification.
c. Name two methods used for designing ensemble classifiers.
d. The Adaboost algorithm is given in Table 4.
Algorithm: Adaboost
Given D = {(x1, y1), (x2, y2), …, (xn, yn)}, yi ∈ {−1, +1}, i = 1, …, n
begin: initialise kmax; W1(i) = 1/n, i = 1, …, n; k ← 0
do k ← k + 1
   ĥk = argmin_{hj∈H} Ej, where Ej = Σ_{i: yi≠hj(xi)} Wk(i)   (weighted error rate)†
   εk = overall weighted error rate of classifier ĥk (i.e., the minimum Ej)
   if εk > 0.5 then exit
   αk = ½ ln((1 − εk)/εk)
   Wk+1(i) = Wk(i) exp(−αk yi ĥk(xi)) / Zk,   i = 1, …, n   (update Wk(i))‡
until k = kmax
end
† Finds the classifier ĥk ∈ H that minimises the error εk with respect to the distribution Wk(i).
‡ Zk is a normalisation factor chosen so that Wk+1(i) is a normalised distribution.
Table 4: Adaboost Algorithm.
i. Consider a classification problem with 5 samples. The dataset is denoted as D = {(x1, −1), (x2, −1), (x3, +1), (x4, +1), (x5, +1)}. Three weak classifiers, h1(x), h2(x) and h3(x), are designed; their classification results are shown in Table 5. For example, the classification given by classifier h1(x) for the input sample x1 is +1. Determine α1 by running the first iteration of the Adaboost algorithm shown in Table 4 with kmax = 3.
[14 marks]
Classifier   x1   x2   x3   x4   x5
h1(x)        +1   −1   +1   +1   +1
h2(x)        −1   +1   −1   +1   +1
h3(x)        −1   −1   +1   −1   −1

Table 5: Classification results.
ii. As the Adaboost algorithm continues, assume that ĥ2(x) = h3(x), ĥ3(x) = h2(x), α2 = 0.8 and α3 = 0.9 are obtained. Using the result obtained in Question 5.d.i, give the formula for the final ensemble classifier.
QUESTION 6
a. In the context of linear support vector machines (SVMs), considering the linearly separable case, the design of a linear SVM classifier is formulated as the following optimisation problem:
min_w J(w) = ½∥w∥²

subject to yi(wᵀxi + w0) ≥ 1,   i = 1, 2, …, N
where w and w0 are the parameters of the linear SVM classifier; yi ∈ {−1,+1} denotes the class label of the sample xi; N is the number of samples and ∥ · ∥ denotes the Euclidean norm operator.
i. Briefly explain why J(w) = ½∥w∥² has to be minimised.
ii. Briefly explain why the constraint yi(wᵀxi + w0) ≥ 1 has to be satisfied.
iii. If the samples are nonlinearly separable, what can be done to the samples so that the linear SVM classifier becomes a nonlinear one?
b. A training dataset is given as follows:

Class 1: x1 = −3; x2 = −2; x3 = 6; x4 = 8.
Class 2: x5 = −1; x6 = 0; x7 = 1; x8 = 2.

i. Is the dataset linearly separable? Justify your answer.
ii. A feature mapping function zi = Φ(xi) = (xi, xi²)ᵀ, i = 1, 2, …, 8, is considered. It is found that the samples z1 to z8 in the feature space after mapping are linearly separable. Using z2, z5 and z8 as support vectors, design an SVM classifier to correctly classify all the given samples x1 to x8.
[11 marks]