CSE 404: Introduction to Machine Learning (Fall 2020)
Homework #6
Due 11/9/2020 by 11:59 pm
Note: (1) LFD refers to the textbook “Learning from Data”. (2) Please upload a soft copy of your homework on D2L.
1. (50 points) Exercise 3.6 (page 92) in LFD. Cross-entropy error measure.
(a) (25 points) More generally, if we are learning from ±1 data to predict a noisy target P(y|x) with candidate hypothesis h, show that the maximum likelihood method reduces to the task of finding h that minimizes
\[
E_{\text{in}}(w) = \sum_{n=1}^{N} \left( [y_n = +1] \, \ln\frac{1}{h(x_n)} + [y_n = -1] \, \ln\frac{1}{1 - h(x_n)} \right).
\]
Hint: Use the likelihood
\[
P(y \mid x) =
\begin{cases}
h(x) & \text{for } y = +1, \\
1 - h(x) & \text{for } y = -1,
\end{cases}
\]
and derive the maximum likelihood formulation.
(b) (25 points) For the case h(x) = θ(w^T x), argue that minimizing the in-sample error in part (a) is equivalent to minimizing
\[
E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \ln\left( 1 + e^{-y_n w^T x_n} \right).
\]
Note from the book: For two probability distributions {p, 1 − p} and {q, 1 − q} with binary outcomes, the cross-entropy (from information theory) is
\[
p \log\frac{1}{q} + (1 - p) \log\frac{1}{1 - q}.
\]
The in-sample error in part (a) corresponds to a cross-entropy error measure on the data point (x_n, y_n), with p = [y_n = +1] and q = h(x_n).
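Not part of the exercise, but a quick way to convince yourself of the claim in part (b): with h(x) = θ(w^T x) and θ(s) = 1/(1 + e^{-s}), each term of the part (a) sum equals ln(1 + e^{-y_n w^T x_n}), because 1 − θ(s) = θ(−s). Below is a minimal numerical sketch on random data; all variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 20, 3
X = rng.standard_normal((N, d))          # rows are the inputs x_n
y = rng.choice([-1.0, 1.0], size=N)      # labels y_n in {-1, +1}
w = rng.standard_normal(d)

theta = lambda s: 1.0 / (1.0 + np.exp(-s))   # logistic function
s = X @ w                                    # signals w^T x_n

# Part (a) error term with h(x) = theta(w^T x):
#   ln(1 / h(x_n)) on +1 points, ln(1 / (1 - h(x_n))) on -1 points
err_a = np.where(y == 1, -np.log(theta(s)), -np.log(1.0 - theta(s)))

# Part (b) error term: ln(1 + e^{-y_n w^T x_n})
err_b = np.log1p(np.exp(-y * s))

# True: the terms agree one by one; the two objectives differ only by the
# constant 1/N factor, which does not change the minimizer.
print(np.allclose(err_a, err_b))
```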
2. (50 points) Exercise 3.7 (page 92) in LFD. For logistic regression, show that
\[
\nabla E_{\text{in}}(w)
= -\frac{1}{N} \sum_{n=1}^{N} \frac{y_n x_n}{1 + e^{y_n w^T x_n}}
= \frac{1}{N} \sum_{n=1}^{N} \left( -y_n x_n \right) \theta\!\left( -y_n w^T x_n \right).
\]
Argue that a 'misclassified' example contributes more to the gradient than a correctly classified one.
Hint: Remember the logistic regression objective function
\[
E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \ln\left( 1 + \exp\left( -y_n w^T x_n \right) \right)
\]
and take its derivative with respect to w.
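Again not part of the exercise, but a central-difference check is a handy way to verify a gradient formula like the one above before using it in gradient descent. A minimal sketch on random data (names illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 4
X = rng.standard_normal((N, d))          # rows are the inputs x_n
y = rng.choice([-1.0, 1.0], size=N)      # labels y_n in {-1, +1}
w = rng.standard_normal(d)

def E_in(w):
    # E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n w^T x_n))
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

# Gradient claimed in Exercise 3.7: -(1/N) * sum_n y_n x_n / (1 + exp(y_n w^T x_n))
grad = -(X * (y / (1.0 + np.exp(y * (X @ w))))[:, None]).mean(axis=0)

# Central-difference approximation, one coordinate at a time
eps = 1e-6
grad_fd = np.array([(E_in(w + eps * e) - E_in(w - eps * e)) / (2 * eps)
                    for e in np.eye(d)])

print(np.allclose(grad, grad_fd, atol=1e-6))   # True
```

Such a check costs a few lines and catches sign or scaling mistakes before any optimization code is written.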