The University of Hong Kong Department of Electrical and Electronic Engineering
ELEC6008 Pattern recognition and machine learning (2020-2021)
Written Assignment: 100 marks
(Counts towards 15% of the overall assessment)
1. Submission Format and Deadline
Solutions to the problems should be submitted in PDF format and uploaded
to the Moodle system by Apr 14, 2021 23:55 (Wed) (GMT +8:00).
2. Reminder on plagiarism:
Plagiarism is serious misconduct. Students who take part in plagiarism, whether by copying from others or by allowing others to copy their work, will receive heavy penalties. The misconduct will be reported to the University’s Disciplinary Committee for disciplinary action.
Answer ALL questions.
Q1.
(a) Let the likelihoods of the two classes $\omega_1$ and $\omega_2$ with respect to $x$ be given by
$p(x|\omega_1) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x+2)^2}{2}}$ and $p(x|\omega_2) = \frac{1}{2\sqrt{2\pi}} e^{-\frac{(x-5)^2}{8}}$. (Sub-total: 18)
The a priori probabilities for the two classes are given by
$P(\omega_1) = 0.8$ and $P(\omega_2) = 0.2$.
i) Find the Maximum Likelihood Classifier.
(5 marks)
ii) Using the Bayes rule $P(\omega_i | x) = \frac{p(x|\omega_i) P(\omega_i)}{p(x)}$, find the classifier(s) for the two classes. (If there is more than one decision boundary, you should find all of them as well.)
(5 marks)
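A hand-derived boundary for parts i) and ii) can be checked numerically. The sketch below is only an illustration and rests on an assumed reading of the problem: the class-conditional densities are taken to be Gaussians with means -2 and 5 and standard deviations 1 and 2, and the priors are 0.8 and 0.2. The helper names `gaussian` and `boundaries` are hypothetical.

```python
import math

def gaussian(x, mean, sigma):
    """1-D Gaussian density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def boundaries(prior1, prior2):
    """Solve prior1 * p(x|w1) = prior2 * p(x|w2) for x.

    Assumes p(x|w1) = N(-2, 1) and p(x|w2) = N(5, 2^2). Taking logs and
    multiplying by 8 gives 3x^2 + 26x - 9 - 8*ln(2*prior1/prior2) = 0.
    """
    a, b = 3.0, 26.0
    c = -9.0 - 8.0 * math.log(2.0 * prior1 / prior2)
    disc = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])

ml_bounds = boundaries(0.5, 0.5)      # maximum likelihood: the priors cancel
bayes_bounds = boundaries(0.8, 0.2)   # Bayes rule with the stated priors
print("ML boundaries:   ", ml_bounds)
print("Bayes boundaries:", bayes_bounds)
```

Under these assumptions the narrower density of class 1 dominates between the two roots, so class 1 is chosen inside that interval and class 2 outside it.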
Client X wants to apply the above classifier for bio-medical applications and has suggested the following loss functions for Bayes classification:
                    is actually $\omega_1$    is actually $\omega_2$
choosing $\omega_1$          0                        5
choosing $\omega_2$          1                        0

iii) Write down the 4 different values of the loss function $\lambda(\alpha_1|\omega_1)$, $\lambda(\alpha_1|\omega_2)$, $\lambda(\alpha_2|\omega_1)$ and $\lambda(\alpha_2|\omega_2)$.
(2 marks)
iv) Find the Bayes Minimum Risk Classifier using the new loss function in (iii).
(4 marks)
v) Suggest and explain whether the new classifier in (iv) is still a minimum error rate classifier.
(2 marks)
(b) Consider the following criterion function for finding a hyperplane to separate the two classes of samples, which contain $\mathbf{x}_1 = [4,1]^T$, $\mathbf{x}_2 = [3,2]^T$ (Class 1) and $\mathbf{x}_3 = [6,8]^T$, $\mathbf{x}_4 = [9,9]^T$ (Class 2),
$J_q(\mathbf{a}) = \sum_{\mathbf{y} \in Y_C} -\mathbf{a}^T \mathbf{y}$. (Sub-total: 15)
i) Gradient descent can be used to solve $J_q(\mathbf{a})$. Write down the expression in terms of $\eta(k)$, $\nabla_{\mathbf{a}} J_q(\mathbf{a})$, $\mathbf{a}(k+1)$ and $\mathbf{a}(k)$ that solves for $\mathbf{a}$ iteratively.
(2 marks)
ii) Suppose the augmented feature vector is defined as $\mathbf{y} = [1, x_1, x_2]^T$. Using (i) and (ii), find $\mathbf{a}(2)$ and $\mathbf{a}(3)$ with an initialization $\mathbf{a}(1) = [0,0,0]^T$ and a step size $\eta(k) = 1$.
(6 marks)
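The two updates can be traced with a short script. This is a sketch under assumptions the question does not state: class-2 samples are sign-normalised (negated) after augmentation, $Y_C$ is taken to be the set of samples with $\mathbf{a}^T \mathbf{y} \le 0$, and the update is the batch rule $\mathbf{a}(k+1) = \mathbf{a}(k) - \eta(k) \nabla_{\mathbf{a}} J_q(\mathbf{a}(k))$. The name `perceptron_step` is hypothetical.

```python
# Assumptions (not stated in the question): class-2 samples are negated
# after augmentation, and a sample is misclassified when a^T y <= 0.
samples = [([4, 1], +1), ([3, 2], +1), ([6, 8], -1), ([9, 9], -1)]
ys = [[sign * v for v in [1] + x] for x, sign in samples]  # augmented, sign-normalised

def perceptron_step(a, eta=1.0):
    """One batch update a(k+1) = a(k) - eta * grad J_q(a(k))."""
    misclassified = [y for y in ys if sum(ai * yi for ai, yi in zip(a, y)) <= 0]
    # grad J_q(a) is the sum of -y over the misclassified samples
    grad = [-sum(y[j] for y in misclassified) for j in range(3)]
    return [ai - eta * g for ai, g in zip(a, grad)]

a1 = [0.0, 0.0, 0.0]
a2 = perceptron_step(a1)
a3 = perceptron_step(a2)
print("a(2) =", a2)
print("a(3) =", a3)
```

A different convention for ties at $\mathbf{a}^T \mathbf{y} = 0$ or for the class signs would change the trace, so the convention used should be stated alongside any hand calculation.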
iii) Student Y suggests the soft-margin SVM should be employed rather than the perceptron, which is given as
$\min_{\mathbf{w}, w_0, \xi_i} \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \max(0, 1 - z_i(\mathbf{w}^T \mathbf{x}_i + w_0))$, $z_i = \{1, -1\}$.
Using an initialization $\tilde{\mathbf{w}}^{(1)} = [w_0^{(1)}, \mathbf{w}^{(1)T}]^T = [0,0,0]^T$, step size $\eta(k) = 0.1$ and regularization parameter $C = 10$, find $\tilde{\mathbf{w}}^{(2)}$ and $\tilde{\mathbf{w}}^{(3)}$.
(7 marks)
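One possible way to trace the two iterations is batch subgradient descent on the hinge-loss objective. The sketch below is not the only valid reading of the question: it assumes labels $z = +1$ for Class 1 and $z = -1$ for Class 2, treats a sample as active when its margin is strictly below 1, and does not penalise $w_0$. The name `svm_step` is hypothetical.

```python
# Assumptions: z = +1 for Class 1, z = -1 for Class 2, batch subgradient
# descent, and the hinge term counted as active when the margin is < 1.
X = [[4, 1], [3, 2], [6, 8], [9, 9]]
z = [1, 1, -1, -1]
C, eta = 10.0, 0.1

def svm_step(wt):
    """One subgradient step on 0.5*||w||^2 + C*sum(max(0, 1 - z_i(w^T x_i + w0)))."""
    w0, w = wt[0], wt[1:]
    g0, g = 0.0, list(w)  # the regulariser contributes w to the gradient, nothing for w0
    for xi, zi in zip(X, z):
        margin = zi * (w[0] * xi[0] + w[1] * xi[1] + w0)
        if margin < 1:  # active hinge term contributes -C * z_i * [1, x_i]
            g0 -= C * zi
            g = [gj - C * zi * xj for gj, xj in zip(g, xi)]
    return [w0 - eta * g0] + [wj - eta * gj for wj, gj in zip(w, g)]

w1 = [0.0, 0.0, 0.0]
w2 = svm_step(w1)
w3 = svm_step(w2)
print("w~(2) =", w2)
print("w~(3) =", w3)
```

Because the hinge loss is not differentiable at a margin of exactly 1, any subgradient in that case is admissible; the choice made here does not affect these two steps, since no sample sits exactly on the margin.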

Q2.
(a) Let the likelihood of a parameter $\theta$ of the density function be given as
$p(x|\theta) = \begin{cases} \frac{4}{3\sqrt{\pi}} \theta^{5/2} x^4 \exp(-\theta x^2) & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$. (Sub-total: 16)
i) Given a set of independent feature samples $\{x_1, x_2, x_3, x_4\} = \{2, 5, 7, 11\}$, determine the maximum likelihood estimate of $\theta$.
(6 marks)
Assume that the parameter $\theta$ has an a priori probability
$p(\theta) = 0.5[\delta(\theta - 2) + \delta(\theta - 3)]$,
where $\delta(\cdot)$ is the ideal unit impulse function informally defined as:
$\delta(y) = \begin{cases} \infty & y = 0 \\ 0 & \text{otherwise} \end{cases}$ and $\int_{-\infty}^{\infty} \delta(y)\,dy = 1$.
ii) Determine the posterior probability $p(\theta | x_1, x_2, x_3, x_4)$.
(6 marks)
iii) Find the Maximum A Posteriori (MAP) Estimate of $\theta$.
(4 marks)
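A hand derivation for parts i) to iii) can be sanity-checked numerically. The sketch below assumes the density depends on $\theta$ only through the factor $\theta^{5/2} \exp(-\theta x^2)$, so the log-likelihood is $\frac{5N}{2} \ln\theta - \theta \sum_i x_i^2$ up to a constant, and that the prior places mass 0.5 on each of $\theta = 2$ and $\theta = 3$. All variable names are illustrative.

```python
import math

x = [2, 5, 7, 11]
N = len(x)
sum_x2 = sum(v * v for v in x)  # 4 + 25 + 49 + 121 = 199

def loglik(theta):
    """Log-likelihood up to an additive constant that does not depend on theta."""
    return 2.5 * N * math.log(theta) - theta * sum_x2

# Setting d/dtheta loglik = 5N/(2*theta) - sum_x2 to zero gives the ML estimate.
theta_ml = 5 * N / (2 * sum_x2)

# With an impulse prior, the posterior is a pair of impulses at theta = 2 and 3,
# weighted by prior * likelihood. Work in the log domain to avoid underflow.
log_w = {t: math.log(0.5) + loglik(t) for t in (2, 3)}
peak = max(log_w.values())
weights = {t: math.exp(v - peak) for t, v in log_w.items()}
total = sum(weights.values())
posterior = {t: w / total for t, w in weights.items()}
theta_map = max(posterior, key=posterior.get)

print("theta_ML  =", theta_ml)
print("posterior =", posterior)
print("theta_MAP =", theta_map)
```

The MAP estimate can only take a value where the prior is non-zero, which is why it may differ sharply from the ML estimate here.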
(b) Consider the following independently drawn samples
X = {1,2,2,4,5,7,8,9,9} , N = 9 (Sub-total: 17)
i) Find $p(x)$ for $x = 4.5$ using the Parzen window method with a rectangular window of bandwidth $h_d = 2$.
(5 marks)
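As an illustration, the estimate can be computed directly from the definition $p(x) = \frac{1}{N h} \sum_i \varphi\left(\frac{x - x_i}{h}\right)$. The sketch assumes the rectangular window is the unit window $\varphi(u) = 1$ for $|u| \le 1/2$ and 0 otherwise; the function name `parzen_rect` is hypothetical.

```python
X = [1, 2, 2, 4, 5, 7, 8, 9, 9]

def parzen_rect(x, samples, h):
    """Parzen estimate with a unit rectangular window: phi(u) = 1 if |u| <= 1/2."""
    k = sum(1 for xi in samples if abs((x - xi) / h) <= 0.5)
    return k / (len(samples) * h)  # in 1-D the window volume is h

p = parzen_rect(4.5, X, 2)
print("p(4.5) =", p)
```

With a different window convention, for example $|u| \le 1$, the count of samples inside the window and hence the estimate would change.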
ii) Silverman’s Rule is a method for choosing the bandwidth. State the situation in which the resulting bandwidth is optimal.
(1 mark)
iii) Find $p(x)$ for $x = 4.5$ using the kNN method with $k_n = 3$.
(5 marks)
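For part iii), a minimal sketch of the kNN density estimate $p(x) = \frac{k_n}{N V}$ follows. It assumes that in one dimension the volume $V$ is the length of the interval centred at $x$ that just reaches the $k_n$-th nearest sample, that is, twice that distance. The name `knn_density` is hypothetical.

```python
X = [1, 2, 2, 4, 5, 7, 8, 9, 9]

def knn_density(x, samples, k):
    """kNN estimate p(x) = k / (N * V), with V = 2 * distance to the k-th nearest sample."""
    r = sorted(abs(x - xi) for xi in samples)[k - 1]
    return k / (len(samples) * 2 * r)

p = knn_density(4.5, X, 3)
print("p(4.5) =", p)
```

Several samples are tied at the distance of the third neighbour here; the tie does not affect the volume, only which samples are counted as neighbours.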
iv) Suppose $X_1 = [1,1,3,4,5,5]$ and $X_2 = [8,9,9,11,12,12]$ belong to class 1 and class 2 respectively. Suggest which class an arbitrary value $x = 7$ belongs to if the kNN method with $k_n = 3$ is used.
(5 marks)
v) Explain why an even $k_n$ should not be used in a two-class classification problem.
(1 mark)

Q3.
(a) Consider the following 2-class data samples.
Figure 1. A plot of the two-class samples. (Sub-total: 16)
i) Draw a neural network with the fewest units that can separate the samples.
(2 marks)
ii) Given the 10 independent samples above $x_1, x_2, \ldots, x_{10}$ and target values $t_1, t_2, \ldots, t_{10}$, write the risk function for back propagation assuming a quadratic loss in terms of $x_n$ and $t_n$, $n = 1, 2, \ldots, 10$.
(2 marks)
iii) What is the dimension of the gradient of the risk function?
(2 marks)
iv) Express the partial derivative $\partial R / \partial w_{jk}$ of the synapse $w_{jk}$ connecting the $j$-th input to the $k$-th net activation of the final layer in terms of the targets $t_n$, the output from the previous layers $z_{p,j,n}$, the activation function $f_k(u)$ and its derivative $f_k'(u)$.
v) Write down the primal formulation of the linear hard-margin SVM in the form of a constrained optimization problem.
Consider the following inequality constrained solver for vi), vii) and viii):
$\min_{\tilde{\mathbf{w}}} f(\tilde{\mathbf{w}})$ subject to $b_l \le c(\tilde{\mathbf{w}}) \le b_u$, $\tilde{\mathbf{w}} = [w_1, w_2, \ldots, w_I]^T$.
vii) What should $f(\tilde{\mathbf{w}})$ and $c(\tilde{\mathbf{w}})$ be if it is a quadratic programming solver? Define the necessary input matrices, vectors or scalars (if required).
viii) Using the results of vii), write down the input matrices, vectors or scalars (if required) if the above quadratic programming solver is used to solve a hard-margin linear SVM.
(3 marks)
(2 marks) (2 marks)
(3 marks)

(b) Consider the following data samples:
$X_1 = [[2,4]^T, [3,6]^T, [5,8]^T, [6,6]^T, [8,10]^T]^T$
$X_2 = [[7,4]^T, [8,5]^T, [9,7]^T, [10,6]^T, [11,10]^T]^T$
(Sub-total: 18)
i) Find the 1st and 2nd Principal Components (PCs) of $X = [X_1^T, X_2^T]^T$.
(4 marks)
ii) Determine the Fisher’s Linear Discriminant (FLD) $\mathbf{w}$. Normalize $\mathbf{w}$.
(4 marks)
iii) State ONE merit and ONE shortcoming for the PCA and the FLD.
(2 marks)
iv) Student A suspects that there is a hidden class among the samples and he wants to perform clustering assuming there are 3 classes. Describe the k-means procedure in words.
(3 marks)
v) Student Z raised a concern about Student A’s assumption on the number of classes. He suggests one should begin by assigning all samples to the same cluster (so there is only one cluster). Afterwards, the cluster is split into parts until no smaller clusters can be formed. Name this approach.
(1 mark)
vi) Explain the meaning of an impurity measure in a decision tree.
(2 marks)
vii) Explain why transformation and filtering techniques are used in feature extraction. Suggest an advantage of the Haar transform over a 2-D Fourier Transform.
(2 marks)
END OF ASSIGNMENT