Introduction to Machine Learning: Collaborative Filtering
Prof. Kutty
Announcements
• Final exam
– if you have a hard conflict email me by end of day Monday April 4
– For logistical reasons, no accommodations can be made if you reach out after this date.
MovieLens dataset
rows: users (in a galaxy far far away…)
columns: movies (from our own planet)
entries: (user id, item id, rating)
Recommendations as Matrix Completion
call this the utility (or user-item) matrix Y
How to solve for the missing ratings?
1) Matrix factorization
2) Nearest neighbor prediction
Restate problem with constraints
Given Y with empty cells
Construct a low-rank Ŷ to approximate Y; use Ŷ to predict the missing ratings
What would this mean?
e.g., if rank(Ŷ) = 2, the n × m matrix Ŷ can be written as the product of an n × 2 matrix and a 2 × m matrix, where 2 is the dimensionality of the latent factors
based on example from Aggarwal 2016
Low-Rank Factorization: interpretation
COMEDY and ACTION are the latent factors
rows are user vectors
columns are item vectors
[Figure: users and movies plotted along the latent axes, ranging from less action to more action]
Why can we do this?
Theorem: Let Ŷ ∈ R^{n×m} with rank(Ŷ) = d. Then there exist U ∈ R^{n×d} and Vᵀ ∈ R^{d×m} such that Ŷ = U Vᵀ
e.g., if rank(Ŷ) = 2
Definition: linearly independent
A set of vectors {x̄^(1), …, x̄^(k)} is said to be linearly independent if no vector x̄^(i) for i ∈ {1, …, k} can be represented as a linear combination of the remaining vectors.
In other words, these vectors are linearly dependent if for some i
x̄^(i) = Σ_{j≠i} α_j x̄^(j) for some α_j ∈ R
e.g., x̄^(1) = 2, x̄^(2) = 8, x̄^(3) = 12 are linearly dependent
Definition: rank of a matrix
• Column rank of a matrix A ∈ R^{n×m} is the size of the largest subset of columns of A that constitute a linearly independent set.
– column rank of A = row rank of A = rank(A)
– rank(A) ≤ min(m, n)
• If rank(A) = min(m, n) then A is said to be full rank
Matrix factorization
Theorem: Let A ∈ R^{n×m} and rank(A) = d. Then there exist U ∈ R^{n×d} and V ∈ R^{d×m} such that A = U V
Example:
      [ 1 2 3 5  ]
• A = [ 2 4 8 12 ]
      [ 3 6 7 13 ]
• Claim: rank(A) = 2
– linearly independent columns {ā^(1), ā^(3)}
– ā^(2) = 2 ā^(1) and ā^(4) = 2 ā^(1) + ā^(3)
      [ 1 3 ]
• U = [ 2 8 ]     V = [ 1 2 0 2 ]     A = U V
      [ 3 7 ]         [ 0 0 1 1 ]
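The claimed factorization can be checked numerically. This NumPy snippet only verifies the slide's example; the variable names are illustrative.

```python
import numpy as np

# The example matrix A from the slide (rank 2):
# column 2 = 2 * column 1, column 4 = 2 * column 1 + column 3.
A = np.array([[1, 2, 3, 5],
              [2, 4, 8, 12],
              [3, 6, 7, 13]])

U = np.array([[1, 3],
              [2, 8],
              [3, 7]])          # the two linearly independent columns of A
V = np.array([[1, 2, 0, 2],
              [0, 0, 1, 1]])    # coefficients expressing each column of A in terms of them

assert np.linalg.matrix_rank(A) == 2
assert np.array_equal(U @ V, A)
print("rank(A) =", np.linalg.matrix_rank(A), "and A = UV holds")
```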
Matrix factorization: intuition
Given Y with empty cells
Construct a low-rank Ŷ
We may think of Y as being approximated by Ŷ = U Vᵀ
U contains the relevant features of the users and V contains the relevant features of the movies
UV factorization
• Ŷ = U Vᵀ
• Let u^(a) ∈ R^d be the a-th row of U (d is the dimensionality of the latent space; in this example d = 2)
U = [ u^(1) ; … ; u^(n) ]
• Let v^(i) ∈ R^d be the i-th row of V
Vᵀ = [ v^(1) , … , v^(m) ]
UV factorization
"! = ' ( % = )0 )0
= +* ) ⋅ 0 ̅ 0
in this example
"! = ' ( %
Objective Function
Let D = {(a, i) : Y_{ai} is not empty} be the set of observed entries.
So Ŷ = U Vᵀ = [ u^(1) ; … ; u^(n) ] [ v^(1) , … , v^(m) ], i.e., Ŷ_{ai} = u^(a) · v^(i)
J(U, V) = (1/2) Σ_{(a,i)∈D} (Y_{ai} − [U Vᵀ]_{ai})² + (λ/2) Σ_{a,k} U_{ak}² + (λ/2) Σ_{i,k} V_{ik}²
        = (1/2) Σ_{(a,i)∈D} (Y_{ai} − u^(a) · v^(i))² + (λ/2) Σ_a ‖u^(a)‖² + (λ/2) Σ_i ‖v^(i)‖²
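The objective can be sketched in a few lines of NumPy. This is an illustrative sketch, not course code: `objective`, `Y_obs`, and the dict representation of the observed set D are assumed names.

```python
import numpy as np

def objective(Y_obs, U, V, lam):
    """J(U, V): squared error on observed entries plus L2 penalties on U and V.

    Y_obs: dict mapping (a, i) -> rating, i.e. the observed set D (0-indexed)
    U: n x d array of user features; V: m x d array of movie features
    """
    err = sum((y - U[a] @ V[i]) ** 2 for (a, i), y in Y_obs.items())
    return 0.5 * err + 0.5 * lam * ((U ** 2).sum() + (V ** 2).sum())

# Tiny check with the rank-1 numbers used later in the lecture:
U = np.array([[6.0], [2.0], [3.0], [3.0], [5.0]])
V = np.array([[4.0], [1.0], [5.0]])
Y_obs = {(0, 0): 5.0, (0, 2): 7.0}   # Y_11 = 5, Y_13 = 7
print(objective(Y_obs, U, V, lam=1.0))  # 507.5
```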
Idea: Minimize J(U, V) using coordinate descent
Algorithm Overview
• Initialize “movie” features v^(1), …, v^(m) to small (random) values
• Iterate until convergence:
– fix v^(1), …, v^(m); solve for u^(1), …, u^(n):
min_{u^(a)} (1/2) Σ_{(a,i)∈D} (Y_{ai} − u^(a) · v^(i))² + (λ/2) ‖u^(a)‖²
– fix u^(1), …, u^(n); solve for v^(1), …, v^(m):
min_{v^(i)} (1/2) Σ_{(a,i)∈D} (Y_{ai} − u^(a) · v^(i))² + (λ/2) ‖v^(i)‖²
Each subproblem is ridge regression!
J_{n,λ}(θ) = (λ/2) ‖θ‖² + (1/n) Σ_{i=1}^n (y^(i) − θ̄ · x̄^(i))² / 2
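Since each inner minimization is a small ridge regression, the whole alternating procedure can be sketched with NumPy's linear solver. A minimal sketch, not the course's reference implementation; `als`, `Y_obs`, and the dict layout are assumptions.

```python
import numpy as np

def als(Y_obs, n, m, d, lam=1.0, iters=20, seed=0):
    """Coordinate descent (alternating minimization) for Yhat = U V^T.

    Y_obs: dict (a, i) -> rating; n users, m movies, rank d.
    Each user/movie update is a closed-form ridge-regression solve.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n, d))
    V = rng.normal(scale=0.1, size=(m, d))   # small random initialization
    for _ in range(iters):
        # Fix V; solve for each user vector u^(a).
        for a in range(n):
            rated = [i for (u, i) in Y_obs if u == a]   # movies user a rated
            if not rated:
                continue
            Vi = V[rated]                               # |rated| x d
            y = np.array([Y_obs[(a, i)] for i in rated])
            U[a] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(d), Vi.T @ y)
        # Fix U; solve for each movie vector v^(i) symmetrically.
        for i in range(m):
            raters = [u for (u, j) in Y_obs if j == i]
            if not raters:
                continue
            Ua = U[raters]
            y = np.array([Y_obs[(u, i)] for u in raters])
            V[i] = np.linalg.solve(Ua.T @ Ua + lam * np.eye(d), Ua.T @ y)
    return U, V
```

Scanning all of `Y_obs` per row is O(|D|) each time; a real implementation would pre-index the observed entries by user and by movie.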
Example
[Worked board example: a 5 × 3 utility matrix Y with a set of observed entries D; in particular Y_11 = 5 and Y_13 = 7]
Goal: find a rank-1 Ŷ. Assume λ = 1 in the objective function. Suppose after 1 iteration U = (6, 2, 3, 3, 5)ᵀ and V = (4, 1, 5)ᵀ
Fix V; find a new u_1 ∈ R:
min_{u_1} (1/2) Σ_{(1,i)∈D} (Y_{1i} − u_1 v_i)² + (λ/2) u_1²
= min_{u_1} (1/2) (Y_11 − u_1 v_1)² + (1/2) (Y_13 − u_1 v_3)² + (λ/2) u_1²
= min_{u_1} (1/2) (5 − 4 u_1)² + (1/2) (7 − 5 u_1)² + (1/2) u_1²
Set the partial derivative of this expression to 0 and solve for u_1: u_1 ≈ 1.3
Notice that the error (Y_11 − [U Vᵀ]_11)² goes from (5 − 24)² to (5 − 5.2)²
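The single-coordinate update above has a closed form, so it can be checked by hand or in a couple of lines (plain arithmetic, no assumed APIs):

```python
# One coordinate-descent step: fix V and update u_1.
# User 1's observed ratings: Y_11 = 5 (with v_1 = 4) and Y_13 = 7 (with v_3 = 5); lambda = 1.
# Setting d/du [ (1/2)(5 - 4u)^2 + (1/2)(7 - 5u)^2 + (1/2)u^2 ] = 0 gives
#   -4(5 - 4u) - 5(7 - 5u) + u = 0  =>  (16 + 25 + 1) u = 20 + 35
u1 = (4 * 5 + 5 * 7) / (4**2 + 5**2 + 1)   # = 55/42 ≈ 1.31, matching u_1 ≈ 1.3
old_err = (5 - 6 * 4) ** 2                 # error on Y_11 before the update: (5 - 24)^2 = 361
new_err = (5 - u1 * 4) ** 2                # error after the update: tiny
print(u1, old_err, new_err)
```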
How to solve for the missing ratings?
1) Matrix factorization
2) Nearest neighbor prediction
Nearest Neighbor Prediction
Suppose user a has not rated movie i. To predict the rating:
– compute the similarity between user a and all other users in the system
– find the k ‘nearest neighbors’ (i.e., the k users most similar to user a) who have rated movie i
– compute a prediction based on these users’ ratings of i
Approach 2: Nearest Neighbor Prediction
Correlation as a measure of similarity
Users: a, b
R(a, b): set of movies rated by both users a and b
Ỹ_{a:b} = (1/|R(a,b)|) Σ_{j∈R(a,b)} Y_{aj}
average rating of user a for movies rated by both users (and similarly Ỹ_{b:a})
sim(a,b) = Σ_{j∈R(a,b)} (Y_{aj} − Ỹ_{a:b})(Y_{bj} − Ỹ_{b:a}) / √[ Σ_{j∈R(a,b)} (Y_{aj} − Ỹ_{a:b})² · Σ_{j∈R(a,b)} (Y_{bj} − Ỹ_{b:a})² ]
numerator: expected value of how much the ratings vary together, i.e., the covariance of the ratings of a and b
denominator: how much the ratings vary individually, i.e., the std devs of the ratings of a and b
Sample Pearson’s correlation coefficient
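The similarity formula above can be sketched directly in NumPy. The dict-of-ratings layout and the function name `sim` are illustrative assumptions.

```python
import numpy as np

def sim(ratings_a, ratings_b):
    """Sample Pearson correlation over the movies rated by both users.

    ratings_a, ratings_b: dicts mapping movie_id -> rating.
    """
    common = sorted(set(ratings_a) & set(ratings_b))   # R(a, b)
    ya = np.array([ratings_a[j] for j in common], float)
    yb = np.array([ratings_b[j] for j in common], float)
    ya_c, yb_c = ya - ya.mean(), yb - yb.mean()        # center by Y~_{a:b} and Y~_{b:a}
    denom = np.sqrt((ya_c**2).sum() * (yb_c**2).sum())
    if denom == 0:
        return 0.0   # convention for constant ratings (slides don't specify this case)
    return (ya_c * yb_c).sum() / denom
```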
Correlation as a measure of similarity
sim(a,b) = Σ_{j∈R(a,b)} (Y_{aj} − Ỹ_{a:b})(Y_{bj} − Ỹ_{b:a}) / √[ Σ_{j∈R(a,b)} (Y_{aj} − Ỹ_{a:b})² · Σ_{j∈R(a,b)} (Y_{bj} − Ỹ_{b:a})² ]
• measure of linear relationship between two variables
• ranges from [-1, 1]
– -1, +1 → perfect linear relationship
– 0 → no linear correlation
– absolute value is a measure of degree of correlation
Correlation (intuition)
[Figure: scatter plots of user a's ratings against another user's ratings for various correlation values]
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
Approach 2: Nearest Neighbor Prediction
sim(a, user 1), sim(a, user 2), …, sim(a, user k)
ordered in decreasing order of absolute value
Approach 2: Nearest Neighbor Prediction
Prediction
Ŷ_{ai} = Ȳ_a + [ Σ_{b∈kNN(a,i)} sim(a, b) (Y_{bi} − Ȳ_b) ] / [ Σ_{b∈kNN(a,i)} |sim(a, b)| ]
where Ȳ_a is the average rating of user a, and kNN(a, i) is the set of k “nearest neighbors”,
i.e., the k most similar users to a who have rated movie i
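The prediction rule can be sketched end-to-end as follows. This is a minimal illustration assuming ratings stored as nested dicts; `pearson`, `predict`, and the data layout are not from the slides.

```python
import numpy as np

def pearson(ra, rb):
    """Sample Pearson correlation over co-rated movies (dicts movie -> rating)."""
    common = sorted(set(ra) & set(rb))
    ya = np.array([ra[j] for j in common], float)
    yb = np.array([rb[j] for j in common], float)
    ya, yb = ya - ya.mean(), yb - yb.mean()
    d = np.sqrt((ya**2).sum() * (yb**2).sum())
    return (ya * yb).sum() / d if d else 0.0

def predict(a, i, ratings, k=2):
    """Yhat_ai = Ybar_a + sum_b sim(a,b)(Y_bi - Ybar_b) / sum_b |sim(a,b)|,
    summed over the k users most similar to a (by |sim|) who rated movie i."""
    cands = [b for b in ratings if b != a and i in ratings[b]]
    sims = {b: pearson(ratings[a], ratings[b]) for b in cands}
    knn = sorted(cands, key=lambda b: abs(sims[b]), reverse=True)[:k]
    ybar = {b: np.mean(list(ratings[b].values())) for b in ratings}
    num = sum(sims[b] * (ratings[b][i] - ybar[b]) for b in knn)
    den = sum(abs(sims[b]) for b in knn)
    return ybar[a] + num / den if den else ybar[a]

# Toy usage: predict user a's rating of movie 3 from two neighbors.
ratings = {"a": {1: 5, 2: 3},
           "b": {1: 5, 2: 3, 3: 4},
           "c": {1: 1, 2: 5, 3: 2}}
print(predict("a", 3, ratings, k=2))
```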