Recommender Systems
https://spark.apache.org/docs/latest/ml-collaborative-filtering.html
ECA5372 Big Data and Technologies 1
Recommendation Systems
Who is using RecSys?
Recommendation Systems
The different types
• Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. https://en.wikipedia.org/wiki/Collaborative_filtering
• if a person A has the same opinion as a person B on an issue, A is more likely to have the same opinion as B on a different issue than that of a randomly chosen person.
• Content-based filtering methods are based on a description of the item and a profile of the user’s preferences.
https://en.wikipedia.org/wiki/Recommender_system
• In a content-based recommender system, keywords are used to describe the items and a user profile is built to indicate the type of item this user likes.
• These algorithms try to recommend items that are similar to those that a user liked in the past (or is examining in the present). In particular, various candidate items are compared with items previously rated by the user and the best-matching items are recommended.
Explicit versus Implicit Feedback
What’s the difference?
• Explicit feedback or preferences are given by the user to the item, for example, users giving ratings to movies.
• It is common in many real-world use cases to only have access to implicit feedback (e.g. views, clicks, purchases, likes, shares etc.).
• Essentially, instead of trying to model the matrix of ratings directly, this implicit feedback approach treats the data as numbers representing the strength in observations of user actions (such as the number of clicks, or the cumulative duration someone spent viewing a movie).
• Those numbers are then related to the level of confidence in observed user preferences, rather than explicit ratings given to items.
• The model then tries to find latent factors that can be used to predict the expected preference of a user for an item.
https://spark.apache.org/docs/latest/ml-collaborative-filtering.html#explicit-vs-implicit-feedback
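A minimal sketch of this preference/confidence split for implicit feedback (the click counts and the alpha constant here are hypothetical, not values from the slides):

```python
import numpy as np

# Hypothetical implicit feedback: counts of user actions (e.g. clicks),
# not explicit ratings. Rows are users, columns are items.
clicks = np.array([[0., 3., 0.],
                   [10., 0., 1.]])

# Binary preference: did the user interact with the item at all?
preference = (clicks > 0).astype(float)

# Confidence in that preference grows with the observed count;
# alpha is a tuning constant (an assumption here, not a fixed value).
alpha = 40.0
confidence = 1.0 + alpha * clicks

print(preference)
print(confidence)
```

The model then fits latent factors against the binary preferences, weighting each cell by its confidence rather than treating the counts as ratings.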
Matrix Multiplication
Multiplying a matrix by another matrix
(1, 2, 3) • (7, 9, 11) = 1×7 + 2×9 + 3×11 = 58
(1, 2, 3) • (8, 10, 12) = 1×8 + 2×10 + 3×12 = 64
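The two dot products above are exactly the entries of a matrix product, which a few lines of NumPy can confirm (values taken from the example):

```python
import numpy as np

A = np.array([[1, 2, 3]])        # 1x3 row
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # 3x2: columns (7, 9, 11) and (8, 10, 12)

C = A @ B                        # each entry of C is one of the dot products
print(C)                         # [[58 64]]
```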
Matrix Factorization
What is it and how does it work?
Matrix factorization is a decomposition of a matrix into a product of matrices.
• In the case of collaborative filtering, matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower-dimensionality rectangular matrices.
• One matrix can be seen as the user matrix
• rows represent users and columns are latent factors.
• The other matrix is the item matrix
• rows represent latent factors and columns are items.
• Latent factors are the features in the lower-dimensional latent space projected from the user-item interaction matrix; the factor matrices have far fewer columns than the original interaction matrix.
• The idea behind matrix factorization is to use latent factors to represent user preferences or movie topics in a much lower dimension space.
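A shape-only sketch of the decomposition described above (the dimensions are hypothetical; the point is that f is much smaller than m and n):

```python
import numpy as np

m, n, f = 4, 6, 2                   # 4 users, 6 items, f = 2 latent factors

user_matrix = np.zeros((m, f))      # rows: users, columns: latent factors
item_matrix = np.zeros((f, n))      # rows: latent factors, columns: items

# Multiplying them back yields a full m x n user-item matrix.
approx = user_matrix @ item_matrix
print(approx.shape)                 # (4, 6)
```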
Matrix Factorization
Decomposition of a matrix into a product of matrices
r ≈ p · qᵀ

• r = rating matrix: m users, n movies
• p = user matrix: m users, f latent factors or features
• q = item matrix: n movies, f latent factors or features

Rating Prediction: a rating rui can be estimated by the dot product of the user factor vector pu (the User Preference Factor Vector) and the transposed item factor vector qi (the Item Preference Factor Vector):

r̂ui ≈ pu · qiᵀ
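A numeric sketch of this prediction, using hypothetical factor vectors with f = 3:

```python
import numpy as np

p_u = np.array([0.8, 0.1, 0.5])   # user preference factor vector
q_i = np.array([1.2, 0.4, 2.0])   # item preference factor vector

# Predicted rating: the dot product of the two factor vectors.
r_hat = p_u @ q_i                 # 0.8*1.2 + 0.1*0.4 + 0.5*2.0
print(round(r_hat, 2))            # 2.0
```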
Using Matrix Factorization
for Collaborative Filtering
Objective: Predict the unknown ratings
Given a sparse matrix of user ratings on items as shown in this figure, we will decompose it (using matrix factorization) into a user matrix and an item matrix. A sparse matrix contains many empty cells with no values.
As can be observed in this figure, we are interested in predicting or estimating the ratings that are originally unknown in this sparse matrix.
We can do that by using machine learning methods (e.g. regression) to approximate the corresponding values in the user matrix and the item matrix. After matrix multiplication, the approximated values should give us a predicted rating.
Using Matrix Factorization
for Collaborative Filtering
The obvious issue is deciding how to approximate those two sets of k values in the U matrix and the V matrix. Typically, we start with random numbers and then iteratively check whether the matrix multiplication of those numbers produces a number close to the known rating in matrix X. We usually run a regression using gradient descent to minimise the prediction error. After many rounds, we obtain two sets of k values that produce a predicted rating similar to the known rating in matrix X. This process is then repeated for all known ratings in matrix X.
[Figure: a known rating of 5 in matrix X is used to approximate the k values in the corresponding row of U and the corresponding column of V.]
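The iterative procedure described above can be sketched with plain NumPy stochastic gradient descent (the toy ratings and hyperparameters are assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse ratings matrix X; 0 marks an unknown rating.
X = np.array([[5., 3., 0.],
              [4., 0., 1.],
              [0., 1., 5.]])
known = X > 0
k = 2                                  # number of latent factors

# Start from small random factor matrices U (users x k) and V (items x k).
U = 0.1 * rng.standard_normal((3, k))
V = 0.1 * rng.standard_normal((3, k))

lr = 0.01
for _ in range(5000):                  # many rounds over all known ratings
    for u, i in zip(*np.nonzero(known)):
        err = X[u, i] - U[u] @ V[i]    # prediction error on one known rating
        U[u] += lr * err * V[i]        # nudge the user factors ...
        V[i] += lr * err * U[u]        # ... and the item factors

pred = U @ V.T                         # predictions for ALL cells, known or not
```

After training, `pred` matches X closely on the known cells, and the remaining cells of `pred` are the predicted unknown ratings.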
Using Matrix Factorization
for Collaborative Filtering
[Figure: worked example continuing from the previous slide, showing approximated latent-factor values (e.g. 0.5, 0.6, 0.2 and 1.9, 6.2, 1.6) whose matrix product reconstructs the known ratings in matrix X.]
Alternating Least Squares
What is it and how does it work?
• ALS is a matrix factorization algorithm: it breaks down a matrix R into two matrices U and P such that:

R ≈ U · Pᵀ

• R (m×n) is the ratings matrix
• U (m×k) and P (n×k) are the two latent factor matrices
• Since there could be millions of users and many thousands of items, m and n are typically very large values. We usually choose the value of k to be small. This value is also called the rank.
• ALS is "alternating" because it works by keeping one of U or P fixed, with pre-filled values, while solving for the other, and then swapping their roles.
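A compact NumPy sketch of the alternating step (a dense toy matrix and hypothetical hyperparameters; Spark's `ALS` in `pyspark.ml.recommendation` is the production implementation, which also handles sparsity and distribution):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dense ratings matrix R (m users x n items); real data is sparse.
R = rng.integers(1, 6, size=(6, 5)).astype(float)
m, n, k = 6, 5, 2                      # k (the rank) is kept small
lam = 0.1                              # ridge regularization term

U = rng.standard_normal((m, k))
P = rng.standard_normal((n, k))

for _ in range(20):
    # Fix P and solve a regularized least-squares problem for U ...
    U = R @ P @ np.linalg.inv(P.T @ P + lam * np.eye(k))
    # ... then fix U and solve the analogous problem for P. Alternate.
    P = R.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))

rmse = np.sqrt(np.mean((R - U @ P.T) ** 2))
print(round(rmse, 3))                  # reconstruction error of the rank-k fit
```

Each half-step is an ordinary least-squares solve, which is why fixing one matrix makes the problem easy: with P held constant, each row of U has a closed-form solution.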
Read this: https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf
Gradient Descent
Minimizing the error
Gradient Descent minimizes a cost function. Minimizing a function means finding the deepest valley in that function. The cost function is used to monitor the error in the predictions of an ML model, so minimizing it means getting to the lowest possible error value, or equivalently increasing the accuracy of the model. In short, we increase the accuracy by iterating over a training data set while tweaking the parameters (the weights and biases) of our model. The whole point of Gradient Descent is to minimize the cost function.
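The "deepest valley" idea in one dimension, as a minimal sketch on the hypothetical cost J(w) = (w - 3)²:

```python
# Cost: J(w) = (w - 3)^2, minimized at w = 3 (the bottom of the valley).
# Gradient: dJ/dw = 2 * (w - 3); stepping against it lowers the cost.
w = 0.0        # initial guess
lr = 0.1       # learning rate (step size)

for _ in range(100):
    grad = 2 * (w - 3)
    w -= lr * grad

print(round(w, 4))   # 3.0
```

Each step moves w in the direction that decreases the cost; matrix factorization applies the same update rule, just with many parameters at once.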