CSC 411/2515 H1
Winter 2018 Midterm Test
Duration: 120 minutes
Aids Allowed: None
Student Number:
Family Name(s):
Given Name(s):
Lecture Section: Afternoon Section / Evening Section
Do not turn this page until you have received the signal to start. In the meantime, please read the instructions below carefully.
This test consists of 6 questions on 22 pages (including this one), printed on both sides of the paper. When you receive the signal to start, please make sure that your copy of the test is complete, fill in the identification section above, and write your name on the back of the last page.
Answer each question directly on the test paper, in the space provided, and use the reverse side of the pages for rough work. If you need more space for one of your solutions, use the reverse side of a page and indicate clearly the part of your work that should be marked.
Write up your solutions carefully! If you are giving only one part of an answer, indicate clearly what you are doing. Part marks might be given for incomplete solutions where it is clearly indicated what parts are missing.
You must write the test in pen if you wish to be able to request that the test be regraded.
Marking Guide
#1: /15 #2: /20 #3: /10 #4: /15 #5: /10 #6: /20
TOTAL: /90
Good Luck!
Question 1. [15 marks]
Part (a) [3 marks]
We would like to use 1-Nearest Neighbour to classify the point p as either X or O using the training data shown below. What is the prediction if cosine distance (i.e., negative cosine similarity) is used as the distance measure? What is the prediction if Euclidean distance is used?
(A) Cosine distance: O, Euclidean distance: X
(B) Cosine distance: O, Euclidean distance: O
(C) Cosine distance: X, Euclidean distance: X
(D) Cosine distance: X, Euclidean distance: O
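For reference, a minimal Python sketch of how the two distance measures could be computed. The query point and the two training points below are made up for illustration only; they are not the points from the figure.

```python
import numpy as np

def cosine_distance(a, b):
    # Negative cosine similarity, as defined in the question.
    return -np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

p = np.array([2.0, 1.0])          # query point (illustrative only)
x_train = np.array([4.0, 2.0])    # same direction as p, but farther away
o_train = np.array([1.5, 1.5])    # different direction, but nearby

# 1-NN predicts the label of the training point with the smallest distance.
print(cosine_distance(p, x_train), cosine_distance(p, o_train))        # -1.0 vs ~-0.95
print(euclidean_distance(p, x_train), euclidean_distance(p, o_train))  # ~2.24 vs ~0.71
```

With these made-up points the two measures disagree about which neighbour is nearest, which is the kind of situation the question is probing.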
Part (b) [3 marks]
Which of the following learning curves demonstrates overfitting? Circle one of choices 1-5.
Part (c) [3 marks]
Suppose you train a logistic regression classifier and the learned hypothesis function is
hθ(x) = σ(θ0 + θ1x1 + θ2x2),
where θ0 = 6, θ1 = 0, θ2 = −1. Which of the following represents the decision boundary for hθ(x)?
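As a point of reference, a minimal Python sketch of evaluating this hypothesis with the given parameters; the test inputs are made up for illustration, and the 0.5 prediction threshold is the usual convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta0, theta1, theta2 = 6.0, 0.0, -1.0   # parameters given in the question

def h(x1, x2):
    return sigmoid(theta0 + theta1 * x1 + theta2 * x2)

# The decision boundary is the set of points where h(x1, x2) = 0.5,
# i.e. where theta0 + theta1 * x1 + theta2 * x2 = 0.
print(h(0.0, 5.0))   # > 0.5, predicted positive
print(h(0.0, 7.0))   # < 0.5, predicted negative
```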
Part (d) [3 marks]
Alice and Bob, two students from the CSC411/2515 night section, walk home after lecture, and see what appears to be an alien spaceship floating in the sky. Alice concludes that aliens are real. How can we use Bayes’ rule to explain why Bob might not reach the same conclusion?
(A) Bob’s reasoning is more like gradient descent with momentum, which can give different results than gradient descent without momentum.
(B) Alice is using MAP inference, whereas Bob is using Maximum Likelihood inference.
(C) Bob’s prior beliefs about aliens’ existence are different from Alice’s.
(D) None of the above.
Part (e) [3 marks]
Which of the following is (are) true about optimizers?
(A) We can speed up training by using an optimizer that uses a different learning rate for each weight.
(B) Dropout should not be used alongside momentum.
(C) Reducing the batch size when using Stochastic Gradient Descent always improves training.
(D) It does not make sense to use Stochastic Gradient Descent to train a linear regression model because linear regression is convex.
(E) All of the above.
Question 2. [20 marks]
Your training set is D = {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}. Assume that your model for the data is y^(i) ∼ Laplace(θ^T x^(i), 1).
The probability density function of the Laplace distribution with mean μ and scale parameter b is
f(x | μ, b) = (1/(2b)) exp(−|x − μ|/b).
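For concreteness, a minimal Python sketch of evaluating this density; the numerical values below are made up for illustration.

```python
import numpy as np

def laplace_pdf(x, mu, b=1.0):
    # f(x | mu, b) = exp(-|x - mu| / b) / (2 * b)
    return np.exp(-np.abs(x - mu) / b) / (2.0 * b)

print(laplace_pdf(0.5, mu=0.0))   # ~0.303
print(laplace_pdf(3.0, mu=0.0))   # ~0.025
```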
Part (a) [5 marks]
Write down the formula for the likelihood of the training set D.
Part (b) [15 marks]
Suppose you would like to learn the model with penalized maximum likelihood, with an L1 regularization/penalty term. Derive the Gradient Descent update for this setting. Briefly justify every step. Draw a rectangle around the formula that you derived.
Question 3. [10 marks]
Suppose we are interested in a neuron in the 4th layer of a convolutional neural network. What are two different methods of visualizing what that neuron's job is? Do not just give names of algorithms or methods; give enough detail for the reader to be able to implement the methods you are describing.
Method 1: Details:
Method 2: Details:
Question 4. [15 marks]
Bob would like to classify emails as spam or non-spam. He would like to estimate the probability that a new email e containing the keywords (w1,w2,…,wn) is spam by taking all the emails in the training set with those keywords, and then computing the proportion of those emails that are spam. Specifically, he estimates the probability using
P(spam | new email e) = (number of spam emails with keywords w1, w2, …, wn in the training set) / (number of total emails with keywords w1, w2, …, wn in the training set).
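For concreteness, a minimal Python sketch of Bob's counting estimate on a tiny, made-up training set; the emails and keywords below are illustrative only.

```python
# Each training email is (set of keywords, label); the data are made up.
train = [
    ({"win", "prize"}, "spam"),
    ({"win", "prize", "now"}, "spam"),
    ({"win", "prize"}, "non-spam"),
    ({"meeting", "notes"}, "non-spam"),
]

def bob_estimate(keywords, data):
    # Bob's estimate: among training emails containing all the keywords,
    # what fraction are spam?
    matching = [label for kw, label in data if set(keywords) <= kw]
    if not matching:
        return None   # no training email contains all of these keywords
    return sum(label == "spam" for label in matching) / len(matching)

print(bob_estimate(["win", "prize"], train))     # 2/3
print(bob_estimate(["win", "lottery"], train))   # None
```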
Part (a) [3 marks]
Explain why Bob’s plan will generally not work.
Part (b) [5 marks]
Describe the datasets for which Bob's plan might work. Be specific: state which properties are required of the datasets.
Part (c) [5 marks]
Describe how to estimate the probability that an email is spam using the Naive Bayes assumption. Your answer should include formulas.
Part (d) [2 marks]
Why does the Naive Bayes assumption address a problem with Bob’s plan?
Question 5. [10 marks]
Explain why a one-hidden-layer neural network is less likely to overfit if it has fewer units in the hidden layer. You should assume the reader is familiar with neural networks, but you should not assume that the reader is familiar with other concepts from CSC411/2515.
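For reference, a minimal Python sketch of how the number of learnable parameters in a fully connected one-hidden-layer network grows with the number of hidden units; the input and output sizes below are illustrative only.

```python
def num_parameters(n_inputs, n_hidden, n_outputs):
    # Weights and biases of the input-to-hidden plus hidden-to-output layers.
    return (n_inputs * n_hidden + n_hidden) + (n_hidden * n_outputs + n_outputs)

# Illustrative sizes only (e.g. 784-dimensional inputs, 10 output classes).
for h in (10, 100, 1000):
    print(h, num_parameters(784, h, 10))
```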
Question 6. [20 marks]
The Poisson distribution is used to model data that consists of non-negative integers. Its probability mass function is p(k) = λ^k e^(−λ) / k!. For the dataset {k1, k2, k3, …, kn}, the maximum likelihood estimate for λ is (1/n) Σ_{j=1}^{n} kj.
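For concreteness, a minimal Python sketch of the Poisson probability mass function and the single-Poisson maximum likelihood estimate; the counts below are made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    # p(k) = lam**k * exp(-lam) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

data = [2, 0, 3, 1, 4]            # made-up counts
lam_hat = sum(data) / len(data)   # MLE for a single Poisson: the sample mean
print(lam_hat)                    # 2.0
print(poisson_pmf(2, lam_hat))    # ~0.271
```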
Suppose you observe m integers in your training set. Your model assumption is that each integer is sampled from one of two different Poisson distributions. You would like to learn this model using the EM algorithm.
Part (a) [2 marks]
List all the parameters of the model.
Part (b) [4 marks]
What is the likelihood of the training set under the model?
Part (c) [7 marks]
Derive the E-step for this model. Your answer should include a mathematical justification. Draw a rectangle around the formula that you derived.
Part (d) [7 marks]
Derive the M-step for this model. Your answer should include a mathematical justification. Draw a rectangle around the formula that you derived.
Additional page for answers
Family Name(s): Given Name(s):
On this page, please write nothing except your name.
Total Marks = 90. End of Midterm Test.