The University of Melbourne Department of Computing and Information Systems
COMP90049 Introduction to Machine Learning November 2021
Identical examination papers: None
Exam duration: 120 minutes
Reading time: Fifteen minutes
Length: This paper has 9 pages including this cover page.
Authorised materials: Lecture slides, workshop materials, prescribed reading, your own project reports.
Calculators: Permitted
Instructions to students: The total number of marks for this paper is 120, corresponding to the number of minutes available. The mark will be scaled to compute your final exam grade.
This paper has three parts, A-C. You should attempt all the questions.
This is an open book exam. You should enter your answers in a Word document or PDF, which can include typed and/or hand-written answers. You should answer each question on a separate page, i.e., start a new page for each of Questions 1–9. Parts within questions do not need new pages. Write the question number clearly at the top of each page. You have unlimited attempts to submit your answer-file, but only your last submission is used for marking.
You must not use materials other than those authorised above. You are not permitted to communi- cate with others for the duration of the exam, other than to ask questions of the teaching staff via the Big Blue Button. Your computer, phone and/or tablet should only be used to access the authorised materials, enter or photograph your answers, and upload these files. The work you submit must be based on your own knowledge and skills, without assistance from any person or unauthorized materials.
There is an embargo on discussing the exam contents for 48 hours after the end of the exam. You must not discuss the exam with anyone during this time (this includes both classmates and non-classmates).
COMP90049 Introduction to Machine Learning Final Exam
Semester 2, 2021
Total marks: 120
Students must attempt all questions
Section A: Short Answer Questions [27 marks]
Answer each of the questions in this section as briefly as possible. Expect to answer each question in 1-3 lines, with longer responses expected for the questions with higher marks.
Question 1: [27 marks]
(a) You are developing a model to detect an extremely contagious disease. Your data consists of 4000 patients, out of which 100 are diagnosed with this disease. You achieve 96% classification accuracy. (1) Can you trust the outcome of your model? Explain why. (2) What type of error is most important in this task? (3) Name at least one appropriate evaluation metric that you would choose to evaluate your model. [2 marks]
(b) (1) Contrast Wrapper and filtering approaches in terms of how they define the best feature(s). (1-2 sentences) (2) Provide an example scenario where you would prefer using the Wrapper strategy in comparison to feature filtering. (1-2 sentences) [3 marks]
(c) You are given a dataset with three Boolean features, X = (X1, X2, X3), and a Boolean label, Y. You have trained (1) Logistic Regression (2) Naive Bayes (3) Perceptron to learn the mapping from X to Y. For each classifier, state whether the learned model parameters can be used to compute P(X1, X2, X3, Y), and justify your answer. [4 marks] (N.B. If your answer is yes, write down the formula for calculating P(X1, X2, X3, Y); otherwise, state why you cannot compute this probability.)
(d) Suppose you are using gradient descent to train a logistic regression model:

θ^(t+1) ← θ^(t) − η ∂f/∂θ^(t)   (1)

If the loss function decreases but only very slowly, (1) what could be the reason and what should you do? (1-2 sentences) (2) Describe one method for deciding when to terminate learning. (1-2 sentences) [3 marks]
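For reference, a minimal sketch of the update rule in (1) for logistic regression with cross-entropy loss; the data and learning rate below are illustrative placeholders, not part of the exam.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(theta, X, y, eta):
    """One update theta <- theta - eta * grad, where the gradient of the
    cross-entropy loss for logistic regression is X^T (sigmoid(X theta) - y) / n."""
    n = X.shape[0]
    grad = X.T @ (sigmoid(X @ theta) - y) / n
    return theta - eta * grad

# Illustrative data: 4 instances, a bias column plus one feature.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -2.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.zeros(2)
for _ in range(100):
    theta = gradient_descent_step(theta, X, y, eta=0.1)
```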
(e) Connect the machine learning algorithms on the left with all concepts on the right that apply. [4 marks] (N.B. You may copy the answers onto your answer sheet. You do not need to justify your answer.)
a. Logistic Regression
b. 5-Nearest Neighbor
c. Categorical Naive Bayes
d. Perceptron
e. Decision stump
f. Decision tree (depth: 10)

1. Generative model
2. Non-parametric model
3. Probabilistic model
4. Instance-based model
5. Linear decision boundary
6. Non-linear decision boundary
7. Parametric model
(f) Suppose you have trained a multilayer perceptron that contains three hidden layers for classification on a given dataset. The training accuracy of the model is very high but the validation accuracy is very low. (1) What problem does the model suffer from? (2) Describe one possible reason for this problem. (3) How can you change the number of layers and the number of units in the hidden layers to address the problem? [4 marks]
(g) Briefly explain why the Random Forest manipulates both instances and features for ensemble learning. [2 marks]
(h) In AdaBoost, if a sample is incorrectly classified by a base model, will the weight of the sample definitely be increased? Justify your answer. [2 marks]
(i) Suppose you want to detect anomalies on a dataset with various densities. You compute the pair- wise distance between every two data points to identify the number of neighboring points within a distance D for every data point. You identify a point as an anomaly if the number of its neighbours is smaller than a certain threshold p. This method may not be able to identify all possible anomalies. Describe two possible reasons. [3 marks]
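For reference, a minimal sketch of the neighbour-counting procedure described above, assuming Euclidean pairwise distances; the data, D, and p below are illustrative placeholders.

```python
import numpy as np

def distance_based_anomalies(X, D, p):
    """Flag a point as an anomaly if it has fewer than p neighbours
    within distance D of it (pairwise Euclidean distances)."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    neighbour_counts = (dists <= D).sum(axis=1) - 1  # exclude the point itself
    return neighbour_counts < p

# Illustrative usage: a dense cluster, a sparse cluster, and an isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.8, 5.0],
              [10.0, 0.0]])
print(distance_based_anomalies(X, D=0.5, p=2))
```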
Section B: Method Questions [71 marks]
In this section you are asked to demonstrate your conceptual understanding of the methods that we have studied in this subject.
Question 2: Naive Bayes [7 marks]
You want to build a Naive Bayes classifier with two categorical features, each with three possible values, X1 ∈ {r, g, b}, X2 ∈ {l, m, h}, and a Boolean label, Y.
(a) What is the minimum number of parameters that you have to estimate to train your NB model? [3 marks] (N.B. Write down the parameters that you have to estimate and their total number.)
(b) Explain the conditional independence assumption in Naive Bayes. [1 mark]
(c) What is the minimum number of parameters that you have to estimate if we don't assume conditional independence? [3 marks] (N.B. You don't have to enumerate all the parameters; simply write down the total number and explain in 2 sentences how you arrived at this number.)
Question 3: Optimization [7 marks]
Consider the two plots of objective functions for a given model M:
(a) For each plot, name a model and a loss function that could result in this shape. Label the axes of both plots accordingly. [4 marks]
(b) What strategy would you choose to optimize each objective function? [1 mark]
(c) Discuss one requirement/characteristic of gradient descent in the context of these two plots. [2 marks]
Question 4: Evaluation [9 marks]
Consider a binary classification task where we aim to learn a function that maps a 2-dimensional input to classes {1,−1}. Training instances belonging to class 1 and −1 are denoted by blue circles and red crosses respectively.
(a) For each of the following learning algorithms, draw the decision boundary on the given training dataset and justify your solution. [6 marks] (N.B. justify each decision boundary in 1-2 sentences. You can copy the image and draw the boundaries in your word/PDF document. Word has a draw option, or use applications such as Preview and Markup (Mac users). You may also copy the plots (approximately) onto your answer sheet, rather than annotating the exercise sheet directly if that is easier.)
(i) Logistic Regression (ii) Zero-R (iii) 1-NN
(b) Which algorithm (i-iii) results in the highest bias and which in the highest variance? Justify your answer. [3 marks]
Question 5: Neural Network [15 marks]
In the following two-class dataset, X1 and X2 are the input features of the data, and Y is the output class.
X1    X2    Y
0.5   0.2   1
0.3  -0.4   0
-0.4  0.2   0
-0.3 -0.5   1
(a) Can we train a perceptron to perfectly classify the data? Explain why. [2 marks]
(b) Assume that you have built the following multilayer perceptron (MLP) to classify the data. X0 in the input layer is the bias node, which is set to 1 (for simplicity, no bias node is added to the hidden layer). a_1^(1) and a_2^(1) are two units in the hidden layer. a_1^(2) is the output unit. The activation function g of the hidden layer and output layer is the Sigmoid function, i.e., g(x) = 1/(1 + e^(-x)). All the parameters of the MLP are initialized as 1. What is the prediction of this MLP on the data point [X1 = 0.3, X2 = -0.4, Y = 0]? [3 marks] (N.B. Show your working and provide the values of the hidden units used to obtain the output value. Round your calculations to two decimal digits.)
(c) Based on the MLP in Question (b) (no bias node is added to the hidden layer), assume that you want to use this data point [X1 = 0.3, X2 = -0.4, Y = 0] to update the parameters of the MLP using the backpropagation algorithm. The loss function is L = (1/2)(Y − a_1^(2))^2. The learning rate is set to 1. After one epoch of training on the selected data point, what are the new parameters of the network? (Show your working and provide the error of each node used to update the parameters.) [10 marks] (N.B. Round your calculations to two decimal digits.) (Hint: the derivative of the activation function is g′(x) = g(x)(1 − g(x)).)
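For reference, a minimal sketch of the forward pass and one backpropagation update for the architecture described in parts (b) and (c): two sigmoid hidden units with no hidden-layer bias, a single sigmoid output, squared-error loss, learning rate 1, and all weights initialised to 1. It illustrates the structure of the computation only; the working itself is left to the question.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.3, -0.4])        # [X0 (bias), X1, X2]
y = 0.0
W1 = np.ones((2, 3))                  # weights into the two hidden units
w2 = np.ones(2)                       # weights into the output unit
lr = 1.0

# Forward pass.
a1 = sigmoid(W1 @ x)                  # hidden activations a_1^(1), a_2^(1)
a2 = sigmoid(w2 @ a1)                 # output a_1^(2)

# Backward pass for L = 0.5 * (y - a2)^2, using g'(x) = g(x)(1 - g(x)).
delta2 = (a2 - y) * a2 * (1 - a2)     # error at the output unit
delta1 = delta2 * w2 * a1 * (1 - a1)  # errors at the hidden units
w2 = w2 - lr * delta2 * a1
W1 = W1 - lr * np.outer(delta1, x)
```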
Question 6: Decision Trees [15 marks]
In the following table, we have 8 instances with three attributes (Suburb, Area, New) and a Class Label. Each row shows an instance.
(N.B. Calculations up to two decimal points)
ID  Suburb  Area    New  Class Label
1   S1      Large   N    1
2   S2      Large   N    1
3   S3      Large   Y    1
4   S4      Large   Y    2
5   S5      Medium  Y    2
6   S6      Large   Y    3
7   S4      Large   Y    3
8   S7      Small   N    3
(a) Calculate the information gain and gain ratio of the "New" feature on the dataset. [7 marks] (N.B. Use log2 and show the results of each step to get full marks.)

(b) Does a decision tree exist which can perfectly classify the given instances? If yes, draw that decision tree; otherwise, explain why not, by referring to the data. [2 marks]
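For reference, a minimal sketch of the entropy, information gain, and gain ratio computations used in part (a), with log base 2; the feature and label lists passed in are left to the reader.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_i p_i log2 p_i over the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = H(labels) - sum_v (|S_v| / |S|) * H(labels of instances with value v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(feature_values, labels):
    """Gain ratio = IG / split information, where split information is the
    entropy of the feature's own value distribution."""
    split_info = entropy(feature_values)
    return information_gain(feature_values, labels) / split_info if split_info else 0.0
```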
(c) If we use "Area" to build a decision stump, what is the predicted label of the decision stump for each of the 8 instances in the data set? [4 marks]
(d) If we use “Suburb” to build a decision stump, what would you expect to see for the accuracy of the decision stump given an evaluation dataset that you have not seen before? Explain why the stump has good/bad accuracy. [2 marks]
Question 7: K-means [10 marks]
Consider the following two clustering results for a dataset that contains 1D data points (each point is shown as a blue dot and the value of each point is shown under the point).
(a) Use the Manhattan distance between the data point and each cluster centroid to assign a new data point X = 4 to one of the clusters. Which cluster will the point be assigned to based on clustering result 1? Which cluster will the point be assigned to based on clustering result 2? [7 marks] (N.B. Show your mathematical working.)
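For reference, a minimal sketch of the assignment step in part (a): compute the Manhattan distance from the new point to each cluster centroid and choose the closest. The centroid values below are illustrative placeholders, not those from the figure.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two points given as sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign_to_cluster(point, centroids):
    """Return the index of the centroid with the smallest Manhattan distance."""
    distances = [manhattan(point, c) for c in centroids]
    return distances.index(min(distances))

# Illustrative 1D centroids (placeholders, not taken from the exam figure).
centroids = [(2.0,), (9.0,)]
print(assign_to_cluster((4.0,), centroids))
```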
(b) Which clustering result do you think is better? Select a criterion to justify your answer. [3 marks]
Question 8: Bias and Fairness [8 marks]
Consider the following data set consisting of 8 training instances, where each instance corresponds to an article written by an author. Each article has four features: number of citations, quality of publication venue (denoted by A*, A, and B, where A* denotes the highest quality and B the worst), number of downloads since publication, and gender of the author. For the purpose of this question, we consider the gender feature as a protected attribute. Each instance has a true binary label y which indicates whether the article is deemed ground breaking (1) or not (-1). We also have access to predicted labels from a Multi-layer Perceptron classifier, ŷfull, which was trained to automatically predict the label from all available features.
[Table of 8 training instances with columns: ID, citations, quality, downloads, gender, y, ŷfull]

(a) Define in your own words the fairness criterion of Predictive Parity in the context of the above scenario. [2 marks]

(b) Is the full model (column ŷfull) fair with respect to the concept of predictive parity? [3 marks] (N.B. Show your mathematical working.)

(c) Propose a strategy to improve the fairness of the Multi-layer Perceptron model in the context of the dataset given. [3 marks]
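For reference, a minimal sketch of one way to check predictive parity: compute the precision P(y = 1 | ŷ = 1) separately for each group of the protected attribute and compare. The arrays below are illustrative placeholders; the actual values should be read from the table above.

```python
from collections import defaultdict

def precision_by_group(y_true, y_pred, groups, positive=1):
    """Precision P(y = positive | y_hat = positive) within each protected group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [true positives, predicted positives]
    for y, y_hat, g in zip(y_true, y_pred, groups):
        if y_hat == positive:
            counts[g][1] += 1
            if y == positive:
                counts[g][0] += 1
    return {g: (tp / pp if pp else None) for g, (tp, pp) in counts.items()}

# Illustrative placeholders only; substitute the y, y_hat_full and gender columns.
y_true = [1, 1, -1, 1, -1, -1, -1, 1]
y_pred = [1, -1, 1, 1, 1, -1, -1, 1]
groups = ["Male", "Female", "Female", "Female", "Male", "Male", "Female", "Male"]
print(precision_by_group(y_true, y_pred, groups))
```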
Section C: Design and Application Questions [22 marks]
In this section you are asked to demonstrate that you have gained a high-level understanding of the methods and algorithms covered in this subject, and can apply that understanding. Expect your answer to each question to be from one third of a page to one full page in length. These questions will require significantly more thought than those in Sections A–B, and should be attempted only after having completed the earlier sections.
Question 9: [22 marks]
Imagine that, after graduating from the University of Melbourne, you are hired by a job search engine company. The company has asked you to develop a tool that can assign a category to a CV (curriculum vitae, or resume). You receive thousands of CVs that are uploaded by applicants for you to consider. For each submission, after processing the CV, you have access to the applicant’s structured profile that contains the following list of features:
• Overall GPA
• Degree Title
• Degree Major
• Date of degree completion
• Name of the applicant
• Duration of past employment
• Home address of the applicant
• Gender of the applicant
All submitted CVs belong to three primary categories of interest: "Technology and Engineering", "Advertising, Arts, and Media", and "Retail and Consumer Products". You want to build a machine learning model that assigns incoming CVs to a job category. You do not have access to any labelled data to begin with.
(a) (1) State a specific machine learning algorithm that is appropriate for this task in the beginning and justify your choice. (2) Explain each step of this algorithm in the context of this task. [5 marks]
(b) Assume now you have access to an additional small set of CVs which are labelled with their correct job category. You train a multi-layer perceptron classifier to assign incoming CVs to a category. (1) Choose an appropriate evaluation strategy and justify your choice. (2) Describe each of the steps you would follow in evaluating your model under this strategy in the context of the given task and data set. [6 marks]
(c) After evaluation, you find that the performance of the model is not satisfactory. Discuss two reasons why this may be the case. [2 marks]
(d) Now you want to improve the performance of your model, by also using CVs for which the true categories are unknown. Assume you have access to an expert who can distinguish Job categories based on CVs. You want to improve the performance of the model by leveraging the expert’s knowledge efficiently. [7 marks]
(d.1) Select an appropriate machine learning algorithm and justify your choice.
(d.2) Explain the algorithm in the context of this data set.
(d.3) Justify any settings of the algorithm you may need to decide on.
(e) Discuss two problems you may anticipate in taking advantage of the full feature set in this dataset. [2 marks]
— End of Exam —