EECS E6720: Bayesian Models for Machine Learning Columbia University, Fall 2020
Homework 1: Due Sunday, October 11, 2020 by 11:59pm
Please read these instructions to ensure you receive full credit on your homework.
Submit the written portion of your homework as a single PDF file through Courseworks (less than 5MB). In addition to your PDF write-up, submit all code written by you in its original file extension through Courseworks (e.g., .m, .r, .py, etc.). Any coding language is acceptable, but your code should be your own. Do not submit Jupyter or other notebooks; submit only the original source code. Do not wrap your files in .rar, .zip, or .tar, and do not submit your write-up in .doc or any other file type. Your grade will be based on the contents of one PDF file and the original source code; additional files will be ignored. We will not run your code, so everything you are asked to show should be put in the PDF file. Show all work for full credit.
Late submission policy: Late homeworks will have 0.1% deducted from the final grade for each minute late. Your homework submission time will be based on the time of your last submission to Courseworks. Therefore, do not re-submit after midnight on the due date unless you are confident the new submission is enough of an improvement to compensate for the points lost. You can resubmit as much as you like, but each time you resubmit, be sure to upload all files you want graded! Submission time is non-negotiable and will be based on the time you submitted your last file to Courseworks. The number of points deducted will be rounded to the nearest integer.
Problem 1. (25 points)
In the first two problems, you will derive and implement an EM algorithm for the object recommendation problem discussed in class. Here, we have a data set of the form $R = \{r_{ij}\}$ restricted to a subset of pairs $(i,j) \in \Omega$, where $i$ can range from $1,\dots,N$, $j$ can range from $1,\dots,M$, and we let $r_{ij} \in \{\pm 1\}$. The goal of matrix factorization is to find a low-rank approximation of this data.
To this end, let the unknown model variables be $u_i \in \mathbb{R}^d$ and $v_j \in \mathbb{R}^d$ for $i = 1,\dots,N$ and $j = 1,\dots,M$. Let $U = \{u_i\}$ and $V = \{v_j\}$. The model and priors are,
$$r_{ij} \,|\, U, V \overset{\text{ind}}{\sim} \text{Bernoulli}\big(\Phi(u_i^T v_j/\sigma)\big) \ \text{for all } (i,j) \in \Omega, \qquad u_i \overset{\text{iid}}{\sim} N(0, cI), \qquad v_j \overset{\text{iid}}{\sim} N(0, cI).$$
The function Φ(·) is the CDF of a standard normal random variable, as discussed in class. (Hint: You will find the probit regression discussion useful for this homework.)
The goal of this homework will be to derive and implement an EM algorithm for maximizing

$$\ln p(R,U,V) = \underbrace{\int q(\phi) \ln \frac{p(R,U,V,\phi)}{q(\phi)}\, d\phi}_{\mathcal{L}(U,V)} + \int q(\phi) \ln \frac{q(\phi)}{p(\phi \,|\, R,U,V)}\, d\phi,$$
where the auxiliary variable $\phi$ is used in the probit setup similar to that discussed in class for probit regression: $\phi = \{\phi_{ij}\}$ for $(i,j) \in \Omega$, with $r_{ij} = \text{sign}(\phi_{ij})$ and $\phi_{ij} \sim \text{Normal}(u_i^T v_j, \sigma^2)$.¹

¹ Note: In the data, $r_{ij} \in \{-1, +1\}$ instead of $\{0, 1\}$. This won't impact the final derivation.
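As a concreteness check on this augmentation, here is a minimal sketch of the truncated-normal mean that arises in the probit discussion, assuming NumPy/SciPy. The helper name `expected_phi` is illustrative, and you should verify the formula against your own derivation in Problem 1(a):

```python
import numpy as np
from scipy.stats import norm

def expected_phi(mu, r, sigma=1.0):
    """E[phi_ij] under a Normal(mu, sigma^2) truncated to the side
    indicated by r in {-1, +1}: phi > 0 if r = +1, phi < 0 if r = -1.

    This is the standard truncated-normal mean; treat it as a hint to
    check your own derivation, not as the derivation itself."""
    z = mu / sigma
    # norm.cdf(r * z) is the normalizing constant of the truncation.
    return mu + r * sigma * norm.pdf(z) / norm.cdf(r * z)
```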
Problem 2 will focus on implementing the algorithm. To this end:
a) Derive $q(\phi)$ for the EM algorithm. Hint: You will be able to show $q(\phi) = \prod_{(i,j)\in\Omega} q(\phi_{ij})$.
b) Derive $\mathcal{L}(U, V)$. You can use the $E_q[\cdot]$ notation and write the relevant expectation separately.
c) Derive the $U, V$ that optimize $\mathcal{L}(U, V)$. Hint: Derive the update for one (and therefore all) $u_i$ and $v_j$.
d) Summarize the final EM algorithm in a list of steps that are easy to read, yet detailed enough for someone to be able to implement it. (Be sure to write out $\ln p(R, U, V)$.) A skeleton of one possible loop structure is sketched below.
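To make the shape of part (d) concrete, here is a minimal EM-iteration skeleton. This is a sketch under assumed update forms, not a reference implementation: it relies on the hypothetical `expected_phi` helper from the earlier sketch, and the ridge-regression-style M-step shown is one common form that you should check against your own part (c) answer:

```python
import numpy as np
# assumes expected_phi(mu, r, sigma) from the earlier sketch is in scope

def em_step(U, V, R_obs, c=1.0, sigma2=1.0):
    """One EM iteration (sketch). R_obs: list of (i, j, r), r in {-1, +1}."""
    sigma = np.sqrt(sigma2)
    d = U.shape[1]
    # E-step: posterior mean of each auxiliary variable phi_ij.
    E_phi = {(i, j): expected_phi(U[i] @ V[j], r, sigma)
             for (i, j, r) in R_obs}
    # Group observed pairs so each row update only touches its ratings.
    by_user, by_item = {}, {}
    for (i, j, r) in R_obs:
        by_user.setdefault(i, []).append(j)
        by_item.setdefault(j, []).append(i)
    # M-step for U: each u_i solves a small ridge-regression-like system.
    for i, js in by_user.items():
        A = np.eye(d) / c + sum(np.outer(V[j], V[j]) for j in js) / sigma2
        b = sum(E_phi[(i, j)] * V[j] for j in js) / sigma2
        U[i] = np.linalg.solve(A, b)
    # M-step for V is symmetric, using the updated U rows.
    for j, is_ in by_item.items():
        A = np.eye(d) / c + sum(np.outer(U[i], U[i]) for i in is_) / sigma2
        b = sum(E_phi[(i, j)] * U[i] for i in is_) / sigma2
        V[j] = np.linalg.solve(A, b)
    return U, V
```

The ordering shown (one E-step, then U, then V) is only one valid choice; the footnote in Problem 2 discusses why interleaving the updates differently also works.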
Problem 2. (25 points)
In this problem, you will implement the EM algorithm you derived in Problem 1 for finding a locally optimal solution to $\ln p(R, U, V)$. The data is provided on Courseworks along with a README file explaining the data. For your implementation, set $d = 5$, $c = 1$, and $\sigma^2 = 1$. Treat the users as $U$ and the movies as $V$ when implementing the algorithm.²
You will need to initialize your algorithm in some way. One possibility is to start by generating each $u_i$ and $v_j$ from $\text{Normal}(0,\, 0.1 I)$. Please use this method for this assignment.
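For example, with NumPy (a minimal sketch; the sizes below are placeholders, and note that covariance $0.1 I$ means standard deviation $\sqrt{0.1}$ per coordinate):

```python
import numpy as np

N, M, d = 100, 50, 5   # placeholder sizes; read the real N, M from the data
rng = np.random.default_rng()            # seed differently for Problem 2(b)
U = rng.normal(0.0, np.sqrt(0.1), size=(N, d))   # each row u_i ~ N(0, 0.1 I)
V = rng.normal(0.0, np.sqrt(0.1), size=(M, d))   # each row v_j ~ N(0, 0.1 I)
```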
a) Run your algorithm for 100 iterations and plot $\ln p(R, U, V)$ for iterations 2 through 100.
b) Rerun your algorithm for 100 iterations using 5 different random starting points. Plot the 5 different objective functions for iterations 20 through 100. Note: This is simply a repeat of Problem 2(a), only showing 5 objective functions instead of one and changing the x-axis.
c) Predict the values given in the test set and show your results as a confusion matrix of raw counts (one way to tabulate the counts is sketched below).
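A minimal sketch of the tabulation, assuming `truth` and `preds` are sequences of $\pm 1$ test labels and predictions (illustrative names, not given by the assignment). Under the probit model, a natural rule is to predict $+1$ exactly when $u_i^T v_j \ge 0$, since then $\Phi(u_i^T v_j/\sigma) \ge 1/2$:

```python
import numpy as np

def confusion_counts(truth, preds):
    """2x2 raw-count confusion matrix for labels in {-1, +1}.
    Rows index the true label (-1 then +1); columns the prediction."""
    C = np.zeros((2, 2), dtype=int)
    for t, p in zip(truth, preds):
        C[(t + 1) // 2, (p + 1) // 2] += 1   # maps -1 -> 0, +1 -> 1
    return C
```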
Problem 3. (25 points)
In this problem, you will implement the MCMC sampling approach discussed in Lecture 3 on the data from Problem 2. In this scenario, treat $r_{ij}$ as a real-valued observation as described in the notes; therefore, $\phi_{ij}$ does not appear in this problem. This is not ideal, but it can still produce useful results. Again set $d = 5$, $c = 1$, and $\sigma^2 = 1$. Initialize $U$ and $V$ to all zeros. (It's worth thinking about why this initialization wouldn't work for EM, but works for MCMC.) Since the algorithm was already derived in the notes, you don't need to rederive the Gibbs sampler.
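As a reference point, here is a minimal sketch of the per-user conditional draw in a Gibbs sampler for this Gaussian model, written from the standard conjugate-Gaussian form. Verify the exact conditional against the Lecture 3 notes before relying on it; the function name and data layout are illustrative:

```python
import numpy as np

def sample_user(i, V, ratings_i, c=1.0, sigma2=1.0, rng=None):
    """Draw u_i from its full conditional given V and user i's ratings.
    ratings_i: list of (j, r_ij) pairs observed for user i.
    Conditional (standard conjugate form): N(mu, Sigma) with
      Sigma = ( I/c + (1/sigma2) * sum_j v_j v_j^T )^{-1}
      mu    = Sigma @ ( (1/sigma2) * sum_j r_ij v_j )
    The update for each v_j is symmetric."""
    rng = rng or np.random.default_rng()
    d = V.shape[1]
    prec = np.eye(d) / c
    lin = np.zeros(d)
    for j, r in ratings_i:
        prec += np.outer(V[j], V[j]) / sigma2
        lin += r * V[j] / sigma2
    Sigma = np.linalg.inv(prec)
    return rng.multivariate_normal(Sigma @ lin, Sigma)
```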
a) Write out the log joint likelihood for the model (for any $d$, $c$, and $\sigma^2$), showing all steps in how you arrived at the final mathematical expression.
b) Run your code for 1000 iterations. In two separate figures, plot the log joint likelihood for all iterations, and the log joint likelihood for iterations 100 to 1000.
c) Using samples collected every 25 iterations (starting with iteration 100), calculate a Monte Carlo approximation to $E[r_{ij} \,|\, R]$ for each value in the test set. Predict $+1$ if this expectation is positive, and $-1$ otherwise. Show your test results in a confusion matrix as you showed them in Problem 2(c). (A sketch of this averaging step appears below.)
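A minimal sketch of the Monte Carlo prediction step, assuming you stored the sampled $(U^{(s)}, V^{(s)})$ pairs at iterations 100, 125, 150, and so on (variable names are illustrative):

```python
import numpy as np

def predict_test(samples, test_pairs):
    """samples: list of (U, V) arrays saved every 25 Gibbs iterations.
    test_pairs: list of (i, j) indices from the test set.
    Since r_ij is modeled here as N(u_i^T v_j, sigma^2), E[r_ij | R] is
    approximated by averaging u_i^T v_j over the retained samples."""
    preds = []
    for i, j in test_pairs:
        est = np.mean([U[i] @ V[j] for (U, V) in samples])
        preds.append(1 if est > 0 else -1)
    return preds
```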
² Hint: Depending on how you implement it, your code may be very fast or very slow. You may want to change the data format so you aren't looping over the list of ratings. Also, the EM algorithm will still work if you change the order of the updates: e.g., update $q(\phi)$, then update $U$, then update $q(\phi)$, then update $V$. We didn't present EM this way (we presented it as doing "E" for everything, then "M" for everything in one iteration), but it may help develop your understanding of EM to think through why this suggestion also works. Your code will work if you ignore this hint, but it may end up being slow, and this hint is probably not the only way to faster code.