
Introduction to Machine Learning Collaborative Filtering
Prof. Kutty

Announcements


• Final exam
– If you have a hard conflict, email me by end of day Monday, April 4.
– For logistical reasons, no accommodations can be made if you reach out after this date.

MovieLens data: (user id, item id, rating) records
[Figure: ratings matrix with users (in a galaxy far far away…) as rows and movies (from our own planet) as columns]

Recommendations as Matrix Completion
call this the utility (or user-item) matrix Y
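The (user id, item id, rating) records above can be assembled into the utility matrix Y; a minimal sketch in NumPy, where the triples and dimensions are made up for illustration and NaN marks an empty cell:

```python
import numpy as np

# Hypothetical (user id, item id, rating) records
triples = [(0, 0, 5.0), (0, 2, 7.0), (1, 1, 3.0), (2, 2, 4.0)]

n_users, n_items = 3, 3
Y = np.full((n_users, n_items), np.nan)  # NaN = empty (unobserved) cell
for user, item, rating in triples:
    Y[user, item] = rating

print(Y)
```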

How to solve for the missing ratings?
1) Matrix factorization
2) Nearest neighbor prediction

Restate problem with constraints
Given Y with empty cells,
construct a low-rank Ŷ to approximate Y; use Ŷ to predict the missing ratings.

What would this mean?
e.g., if rank(Ŷ) = 2, the n × m matrix Y is approximated by Ŷ, the product of an n × d matrix and a d × m matrix with dimensionality d = 2
based on example from Aggarwal 2016

Low-Rank Factorization: interpretation
COMEDY and ACTION are the latent factors; rows are user vectors, columns are item vectors.
[Figure: Y approximated by a product of user vectors and item vectors, with items arranged from less action to more action]

Why can we do this?
Theorem: Let Ŷ ∈ R^(n×m) and rank(Ŷ) = d. Then there exist U ∈ R^(n×d) and V^T ∈ R^(d×m) such that Ŷ = UV^T.
e.g., if rank(Ŷ) = 2, then U has 2 columns and V^T has 2 rows.
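As a numerical check of the theorem, here is a sketch in NumPy using the small rank-2 matrix that appears later in the lecture's example; the particular U and V^T shown are one valid choice of factors, not the only one:

```python
import numpy as np

# Rank-2 example matrix (from the lecture's factorization example)
A = np.array([[1, 2, 3, 5],
              [2, 4, 8, 12],
              [3, 6, 7, 13]])

U = np.array([[1, 3],
              [2, 8],
              [3, 7]])          # two linearly independent columns of A
Vt = np.array([[1, 2, 0, 2],
               [0, 0, 1, 1]])   # coefficients expressing every column of A

print(np.linalg.matrix_rank(A))   # 2
print(np.array_equal(U @ Vt, A))  # True: A = U V^T exactly
```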

Definition: linearly independent
A set of vectors {x̄^(1), …, x̄^(k)} is said to be linearly independent if no vector x̄^(i), for i ∈ {1, …, k}, can be represented as a linear combination of the remaining vectors.
In other words, the vectors are linearly dependent if for some i
x̄^(i) = Σ_{j≠i} α_j x̄^(j) for some α_j ∈ R
Example: x̄^(1) = 2, x̄^(2) = 8, x̄^(3) = 12 are linearly dependent (e.g., 12 = 2·2 + 1·8).

Definition: rank of a matrix
• Column rank of a matrix A ∈ R^(n×m) is the size of the largest subset of columns of A that constitute a linearly independent set.
– column rank of A = row rank of A = rank(A)
– rank(A) ≤ min(m, n)
• If rank(A) = min(m, n), then A is said to be full rank.
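These rank facts can be checked numerically; a small sketch using a hypothetical matrix with one dependent row:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6],   # row 2 = 2 * row 1, so the rows are linearly dependent
              [1, 0, 1]])

print(np.linalg.matrix_rank(A))    # 2 < min(3, 3), so A is not full rank
print(np.linalg.matrix_rank(A.T))  # 2: row rank equals column rank
```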

Matrix factorization
Theorem: Let Ŷ ∈ R^(n×m) and rank(Ŷ) = d. Then there exist U ∈ R^(n×d) and V^T ∈ R^(d×m) such that Ŷ = UV^T.
Example:
• A =
  1  2  3  5
  2  4  8  12
  3  6  7  13
• Claim: rank(A) = 2
– linearly independent columns {ā^(1), ā^(3)}
– ā^(2) = 2 ā^(1), ā^(4) = 2 ā^(1) + ā^(3)
• A = UV^T with
  U =
  1  3
  2  8
  3  7
  V^T =
  1  2  0  2
  0  0  1  1

Matrix factorization: intuition
Given Y with empty cells, we may think of Y as being approximated by Ŷ = UV^T.
Construct a low-rank Ŷ: U contains the relevant features of the users and V contains the relevant features of the movies.

UV factorization
• Ŷ = UV^T
• Let ū^(i) ∈ R^d be the i-th row of U (in this example d = 2), so the rows of U are ū^(1), …, ū^(n)
• Let v̄^(j) ∈ R^d be the j-th row of V, so the columns of V^T are v̄^(1), …, v̄^(m)
• Then Ŷ_ij = [UV^T]_ij = ū^(i) · v̄^(j)

Objective Function
Let D = {(i, j) : Y_ij is not empty} be the set of observed entries.
J(U, V) = (1/2) Σ_{(i,j)∈D} (Y_ij − [UV^T]_ij)^2 + (λ/2) Σ_{i,k} U_ik^2 + (λ/2) Σ_{j,k} V_jk^2
        = (1/2) Σ_{(i,j)∈D} (Y_ij − ū^(i) · v̄^(j))^2 + (λ/2) Σ_i ||ū^(i)||^2 + (λ/2) Σ_j ||v̄^(j)||^2
Idea: minimize J(U, V) using coordinate descent.

Algorithm Overview
• Initialize "movie" features v̄^(1), …, v̄^(m) to small (random) values
• Iterate until convergence:
– fix v̄^(1), …, v̄^(m); solve for ū^(1), …, ū^(n):
  min_{ū^(i)} (1/2) Σ_{j:(i,j)∈D} (Y_ij − ū^(i) · v̄^(j))^2 + (λ/2) ||ū^(i)||^2
– fix ū^(1), …, ū^(n); solve for v̄^(1), …, v̄^(m):
  min_{v̄^(j)} (1/2) Σ_{i:(i,j)∈D} (Y_ij − ū^(i) · v̄^(j))^2 + (λ/2) ||v̄^(j)||^2
Each subproblem is ridge regression!
J_{n,λ}(θ) = (λ/2) ||θ||^2 + (1/n) Σ_{i=1}^n (y^(i) − θ̄ · x̄^(i))^2 / 2

Example
[Figure: 5 × 3 ratings matrix Y with observed entries at positions (1,1), (1,3), (2,2), (3,2), (3,3), (4,1), (5,2), (5,3); in particular Y_11 = 5 and Y_13 = 7]
Goal: find a rank-1 Ŷ. Assume λ = 1 in the objective function.
Suppose after 1 iteration U = (2, 2, 3, 3, 5)^T and V = (4, 1, 5)^T.
Fix V; find the new ū^(1) ∈ R:
min_{ū^(1)} (1/2) Σ_{(1,j)∈D} (Y_1j − ū^(1) v̄^(j))^2 + (λ/2) (ū^(1))^2
= min_{ū^(1)} (1/2) (Y_11 − ū^(1) v̄^(1))^2 + (1/2) (Y_13 − ū^(1) v̄^(3))^2 + (λ/2) (ū^(1))^2
= min_{ū^(1)} (1/2) (5 − 4ū^(1))^2 + (1/2) (7 − 5ū^(1))^2 + (λ/2) (ū^(1))^2
Set the partial derivative of this expression to 0 and solve for ū^(1): ū^(1) = 55/42 ≈ 1.3
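In the rank-1 case, the single-coordinate ridge update has a closed form: setting the derivative to 0 gives u_i = (Σ_j v_j Y_ij) / (λ + Σ_j v_j^2), summing over the observed entries in row i. A minimal sketch, assuming NaN marks a missing rating and plugging in the example's numbers:

```python
import numpy as np

def update_u(y_row, v, lam):
    """Closed-form rank-1 ridge update for one user coordinate.
    y_row: ratings in row i (NaN = missing); v: fixed movie factors."""
    mask = ~np.isnan(y_row)
    return (v[mask] @ y_row[mask]) / (lam + v[mask] @ v[mask])

# Row 1 of the example: Y_11 = 5, Y_13 = 7 observed; V = (4, 1, 5); lambda = 1
y1 = np.array([5.0, np.nan, 7.0])
v = np.array([4.0, 1.0, 5.0])
u1 = update_u(y1, v, lam=1.0)
print(round(u1, 4))  # 1.3095, i.e. 55/42
```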
Notice that the error (Y_11 − Ŷ_11)^2 goes from (5 − 2·4)^2 to (5 − 5.2)^2.

How to solve for the missing ratings?
1) Matrix factorization
2) Nearest neighbor prediction

Nearest Neighbor Prediction
Suppose user a has not rated movie i. To predict the rating:
– compute the similarity between user a and all other users in the system
– find the k 'nearest neighbors' (i.e., the k users most similar to user a) who have rated movie i
– compute a prediction based on these users' ratings of i

Correlation as a measure of similarity
Users: a, b
R(a, b): the set of movies rated by both users a and b
Ỹ_{a:b} = (1/|R(a,b)|) Σ_{j∈R(a,b)} Y_aj, the average rating of user a over the movies rated by both users
sim(a, b) = Σ_{j∈R(a,b)} (Y_aj − Ỹ_{a:b})(Y_bj − Ỹ_{b:a}) / sqrt( Σ_{j∈R(a,b)} (Y_aj − Ỹ_{a:b})^2 · Σ_{j∈R(a,b)} (Y_bj − Ỹ_{b:a})^2 )
The numerator is the expected value of how much the ratings vary together, i.e., the covariance of the ratings of a and b; the denominator is how much the ratings vary individually, i.e., the std dev of the ratings of a and b. This is the sample Pearson's correlation coefficient.

Correlation as a measure of similarity
• a measure of the linear relationship between two variables
• ranges over [-1, 1]
– -1, +1 → perfect linear relationship
– 0 → no linear correlation
– the absolute value is a measure of the degree of correlation

Correlation (intuition)
[Figure: scatter plots of user a's ratings against user b's ratings at various correlation values; see https://en.wikipedia.org/wiki/Pearson_correlation_coefficient]

Approach 2: Nearest Neighbor Prediction
Suppose user a has not rated movie i. To predict the rating:
– compute the similarity between user a and all other users in the system
– find the k 'nearest neighbors' of user a who have rated movie i
– compute a prediction based on these users' ratings of i
sim(a, user 1), sim(a, user 2), …, sim(a, user k), ordered in decreasing order of absolute value
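The correlation-based similarity can be sketched directly: restrict to R(a, b), center each user's ratings by their mean over the common movies, and take the sample Pearson correlation. The example ratings below are hypothetical, with NaN marking a missing rating:

```python
import numpy as np

def sim(ya, yb):
    """Sample Pearson correlation over the movies rated by both users."""
    common = ~np.isnan(ya) & ~np.isnan(yb)      # R(a, b)
    a, b = ya[common], yb[common]
    da, db = a - a.mean(), b - b.mean()         # center by the common-movie means
    return (da @ db) / np.sqrt((da @ da) * (db @ db))

ya = np.array([5.0, 3.0, np.nan, 1.0])
yb = np.array([4.0, 2.0, 5.0, np.nan])
print(sim(ya, yb))  # 1.0: on the two common movies the ratings move together
```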
Prediction
Ŷ_ai = Ȳ_a + [ Σ_{b∈kNN(a,i)} sim(a, b) (Y_bi − Ȳ_b) ] / [ Σ_{b∈kNN(a,i)} |sim(a, b)| ]
kNN(a, i): the k "nearest neighbors", i.e., the k users most similar to a who have rated movie i
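The prediction rule can be sketched as a direct translation of the formula; the neighbor data here (similarities, neighbor ratings, neighbor averages) is hypothetical:

```python
def predict(ybar_a, neighbors):
    """Yhat_ai = Ybar_a + sum_b sim(a,b)*(Y_bi - Ybar_b) / sum_b |sim(a,b)|.
    neighbors: list of (sim(a,b), Y_bi, Ybar_b) for the k nearest neighbors."""
    num = sum(s * (y_bi - ybar_b) for s, y_bi, ybar_b in neighbors)
    den = sum(abs(s) for s, _, _ in neighbors)
    return ybar_a + num / den

# user a averages 3.5; two neighbors who rated movie i
neighbors = [(0.9, 5.0, 4.0), (-0.5, 2.0, 3.0)]
print(predict(3.5, neighbors))  # 3.5 + (0.9*1.0 + (-0.5)*(-1.0)) / 1.4 = 4.5
```

Note that each neighbor's deviation from their own average is used, so a harsh rater's low rating can still push the prediction up if the similarity is negative.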