
Recommender Systems
COMPSCI 753 Kaiqi Zhao

Memory-based collaborative filtering
§ So far, we have learned memory-based CF
§ Use the entire user-item rating matrix
§ Find nearest neighbors based on the ratings / interpolation weights
§ Challenges:
§ Sparsity: <1% of entries are observed on real datasets
  § Each user often purchases only hundreds out of millions of items
  § The similarity computed from the sparse user rating vector $r_{i*}$ is a poor similarity measure
  § Need dimension reduction
§ Nearest-neighbor search is not scalable
  § Computation cost grows with the number of users and items, i.e., $O(nm)$
  § Need to store all the training transactions in memory

Model-based collaborative filtering
§ Learn a model to uncover user preferences: learn a user profile and an item profile with machine learning models, and predict the rating $\hat{r}_{ij}$ from them
§ Dimension reduction → handles sparse data
[Figure: the rating representation over Item 1, Item 2, Item 3 is compressed into a latent representation, e.g., a factor derived from Item 1 and Item 2]

Matrix factorization
§ MF assumes there are latent vector representations of users and items
§ The latent vectors of users and items can reconstruct the rating matrix $R$:
$$\underbrace{R}_{n\times m} \approx \underbrace{Q}_{n\times k}\,\underbrace{P^T}_{k\times m}$$
  where $Q$ holds the user profiles ($n\times k$, $k$ factors) and $P^T$ holds the item profiles ($k\times m$)

Example
§ Netflix (# factors = 3): a user-item rating matrix $R$ with some missing entries ("?") is approximated by the product $Q\cdot P^T$ of an $n\times 3$ user-factor matrix and a $3\times m$ item-factor matrix (Fig from mmds.org)

Latent space of users and items
[Figure from mmds.org: movies such as Ocean's 11, The Lion King, The Color Purple, Sense and Sensibility, Amadeus, Braveheart, Lethal Weapon, The Princess Diaries, Independence Day, and Dumb and Dumber plotted in a 2-D latent space whose axes range from "geared towards females" to "geared towards males" and from "serious" to "funny"]

Singular value decomposition (SVD)
§ $R \approx U\Lambda V^T$:
$$\underbrace{\begin{pmatrix} r_{11} & \cdots & r_{1m}\\ \vdots & \ddots & \vdots\\ r_{n1} & \cdots & r_{nm}\end{pmatrix}}_{n\times m} \approx \underbrace{\begin{pmatrix} u_{11} & \cdots & u_{1k}\\ \vdots & \ddots & \vdots\\ u_{n1} & \cdots & u_{nk}\end{pmatrix}}_{n\times k} \underbrace{\begin{pmatrix} \lambda_{11} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_{kk}\end{pmatrix}}_{k\times k} \underbrace{\begin{pmatrix} v_{11} & \cdots & v_{1m}\\ \vdots & \ddots & \vdots\\ v_{k1} & \cdots & v_{km}\end{pmatrix}}_{k\times m}$$
§ $R$ – rating matrix
§ $U$ – user preference on (latent) factors
§ $\Lambda$ – singular values, importance of the factors
§ $V$ – item quality on (latent) factors
§ $R \approx QP^T$ with $Q = U$ and $P^T = \Lambda V^T$

Singular value decomposition (SVD)
§ Properties of SVD $R = U\Lambda V^T$:
  § $U$ and $V$ are orthogonal matrices: $U^T U = I$ and $V^T V = I$
  § Each column of $U$ is an eigenvector of $RR^T$, since $RR^T = U\Lambda V^T V\Lambda U^T = U\Lambda^2 U^T$
  § Each column of $V$ is an eigenvector of $R^T R$, since $R^T R = V\Lambda U^T U\Lambda V^T = V\Lambda^2 V^T$

Matrix factorization
§ SVD is both user-based and item-based:
$$R = U\Lambda V^T \;\Rightarrow\; \begin{cases} RR^T u = \lambda u\\ R^T R v = \lambda v\end{cases}$$
§ $U$ and $V$ live in a latent space that encodes similarities:
$$RR^T = \begin{pmatrix} r_{1*}\cdot r_{1*} & \cdots & r_{1*}\cdot r_{n*}\\ \vdots & \ddots & \vdots\\ r_{n*}\cdot r_{1*} & \cdots & r_{n*}\cdot r_{n*}\end{pmatrix} \quad (n\times n \text{ user dot-product matrix})$$
$$R^T R = \begin{pmatrix} r_{*1}\cdot r_{*1} & \cdots & r_{*1}\cdot r_{*m}\\ \vdots & \ddots & \vdots\\ r_{*m}\cdot r_{*1} & \cdots & r_{*m}\cdot r_{*m}\end{pmatrix} \quad (m\times m \text{ item dot-product matrix})$$

Solving SVD as optimization
§ Minimize the sum of squared errors:
$$\min_{U,\Lambda,V} \sum_{i\in U,\, j\in I} \left(R_{ij} - [U\Lambda V^T]_{ij}\right)^2$$
§ Pros:
  § It directly optimizes the SSE and thus minimizes the RMSE
§ Cons:
  § SVD needs to run on all the entries!
  § Missing values will also be used to calculate the loss!

Latent factor model
§ A special way to solve the matrix factorization problem by counting the loss only on the available ratings!
§ In this model, we aim to predict ratings $\hat{r}_{ij} = q_i \cdot p_j$:
$$\min_{Q,P} \sum_{r_{ij}\in R} \left(r_{ij} - q_i\cdot p_j\right)^2$$
§ Additional merit – it is easy to incorporate the bias terms:
$$\hat{r}_{ij} = b_0 + b_i + b_j + q_i\cdot p_j$$

Finding latent factors
§ Our goal:
$$\min_{Q,P} \sum_{r_{ij}\in R} \left(r_{ij} - q_i\cdot p_j\right)^2$$
§ The number of factors $k$:
  § Too large? – overfitting on the limited data, i.e., the model fits the current data perfectly but may perform poorly on unseen data
  § Too small? – not precise enough in prediction
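To make the difference between plain SVD and the latent factor model concrete before turning to regularization, here is a minimal numpy sketch. The toy matrix, the choice of $k=2$, and the random initialisation are illustrative assumptions only; the point is that SVD reconstructs every entry (including the zeros standing in for missing ratings), while the LFM objective sums squared error only over the observed ratings.

```python
import numpy as np

# Hypothetical toy data: 0 marks a missing rating (an assumption for this sketch).
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])
observed = R > 0      # mask of available ratings
k = 2                 # number of latent factors (illustrative)

# Truncated SVD: R ~ Q P^T with Q = U_k and P^T = Lambda_k V_k^T.
# It is fitted to *all* entries, so the zeros that stand in for missing
# ratings are treated as real data -- the drawback noted above.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Latent factor model view: predict r_hat_ij = q_i . p_j and accumulate
# squared error only over the observed ratings.
rng = np.random.default_rng(0)
Q = rng.normal(scale=0.1, size=(R.shape[0], k))   # user profiles, n x k
P = rng.normal(scale=0.1, size=(R.shape[1], k))   # item profiles, m x k
R_hat = Q @ P.T
sse_lfm = np.sum((R[observed] - R_hat[observed]) ** 2)

print("SVD squared error on all entries :", np.sum((R - R_svd) ** 2))
print("LFM SSE on observed entries only :", sse_lfm)
```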
Regularization
§ Objective with regularization (SSE plus penalties on the lengths of the latent vectors):
$$\min_{Q,P} \sum_{r_{ij}\in R} \left(r_{ij} - q_i\cdot p_j\right)^2 + \lambda_1 \sum_{i\in U} \|q_i\|^2 + \lambda_2 \sum_{j\in I} \|p_j\|^2$$
§ $\lambda_1, \lambda_2$ are hyperparameters that balance the SSE and the regularization terms

Stochastic gradient descent
§ Our loss function:
$$L = \sum_{r_{ij}\in R} \left(r_{ij} - q_i\cdot p_j\right)^2 + \lambda_1 \sum_{i\in U} \|q_i\|^2 + \lambda_2 \sum_{j\in I} \|p_j\|^2$$
§ Stochastic gradient descent (SGD):
  § Initialize $P$, $Q$ (and all the bias terms) randomly
  § Do gradient descent on the $f$-th factor of the latent vectors $q_i$ and $p_j$:
$$q_{if}^{(t+1)} \leftarrow q_{if}^{(t)} - \eta\,\nabla_{q_{if}} L^{(t)}, \qquad \nabla_{q_{if}} L^{(t)} = -2\left(r_{ij} - q_i^{(t)}\cdot p_j^{(t)}\right) p_{jf}^{(t)} + 2\lambda_1 q_{if}^{(t)}$$
$$p_{jf}^{(t+1)} \leftarrow p_{jf}^{(t)} - \eta\,\nabla_{p_{jf}} L^{(t)}, \qquad \nabla_{p_{jf}} L^{(t)} = -2\left(r_{ij} - q_i^{(t)}\cdot p_j^{(t)}\right) q_{if}^{(t)} + 2\lambda_2 p_{jf}^{(t)}$$

Incorporating bias into the LFM
§ Consider predicting the rating of a user $i$ for an item $j$ with bias terms:
$$\hat{r}_{ij} = b_0 + b_i + b_j + q_i\cdot p_j$$
§ Our goal:
$$\min_{Q,P,b_i,b_j} \sum_{r_{ij}\in R} \left(r_{ij} - q_i\cdot p_j - b_0 - b_i - b_j\right)^2$$

Regularization
§ Objective with regularization:
$$\min_{Q,P,b_i,b_j} \sum_{r_{ij}\in R} \left(r_{ij} - q_i\cdot p_j - b_0 - b_i - b_j\right)^2 + \lambda_1 \sum_{i\in U} \|q_i\|^2 + \lambda_2 \sum_{j\in I} \|p_j\|^2 + \lambda_3 \sum_{i\in U} b_i^2 + \lambda_4 \sum_{j\in I} b_j^2$$
§ $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are hyperparameters that balance the SSE and the regularization terms

Stochastic gradient descent
§ Our loss function:
$$L = \sum_{r_{ij}\in R} \left(r_{ij} - q_i^T p_j - b_0 - b_i - b_j\right)^2 + \lambda_1 \sum_{i\in U} \|q_i\|^2 + \lambda_2 \sum_{j\in I} \|p_j\|^2 + \lambda_3 \sum_{i\in U} b_i^2 + \lambda_4 \sum_{j\in I} b_j^2$$
§ Stochastic gradient descent (SGD):
  § Initialize $P$, $Q$ and all the bias terms randomly
  § Run SGD for each rating:
    § Compute the predicted rating: $\hat{r}_{ij}^{(t)} = q_i^{(t)T} p_j^{(t)} + b_0 + b_i^{(t)} + b_j^{(t)}$
    § $q_{if}^{(t+1)} \leftarrow q_{if}^{(t)} - \eta\,\nabla_{q_{if}} L^{(t)} = q_{if}^{(t)} - \eta\left[-2\left(r_{ij} - \hat{r}_{ij}^{(t)}\right) p_{jf}^{(t)} + 2\lambda_1 q_{if}^{(t)}\right]$
    § $p_{jf}^{(t+1)} \leftarrow p_{jf}^{(t)} - \eta\left[-2\left(r_{ij} - \hat{r}_{ij}^{(t)}\right) q_{if}^{(t)} + 2\lambda_2 p_{jf}^{(t)}\right]$
    § $b_i^{(t+1)} \leftarrow b_i^{(t)} - \eta\left[-2\left(r_{ij} - \hat{r}_{ij}^{(t)}\right) + 2\lambda_3 b_i^{(t)}\right]$
    § $b_j^{(t+1)} \leftarrow b_j^{(t)} - \eta\left[-2\left(r_{ij} - \hat{r}_{ij}^{(t)}\right) + 2\lambda_4 b_j^{(t)}\right]$

A probabilistic view of the LFM
§ Probabilistic matrix factorization (Salakhutdinov & Mnih, NIPS'08)
§ Assume the latent factors and ratings are continuous random variables:
$$q_i \sim \mathcal{N}(q_i \mid 0, \lambda_1^{-1} I), \qquad p_j \sim \mathcal{N}(p_j \mid 0, \lambda_2^{-1} I), \qquad r_{ij} \sim \mathcal{N}(r_{ij} \mid q_i\cdot p_j, \sigma^2)$$
§ This is equivalent to modelling the rating as the dot product of the two latent vectors plus a Gaussian error:
$$r_{ij} = q_i\cdot p_j + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2)$$
[Figure: plate diagram with hyperparameters $\lambda_1$, $\lambda_2$, $\sigma$, latent vectors $q_i$ ($i = 1,\dots,n$) and $p_j$ ($j = 1,\dots,m$), and observed ratings $r_{ij}$]

Probabilistic matrix factorization
§ The probabilistic matrix factorization (PMF) model:
$$q_i \sim \mathcal{N}(q_i \mid 0, \lambda_1^{-1} I), \qquad p_j \sim \mathcal{N}(p_j \mid 0, \lambda_2^{-1} I), \qquad r_{ij} \sim \mathcal{N}(r_{ij} \mid q_i\cdot p_j, \sigma^2)$$
§ Incorporating bias: $r_{ij} \sim \mathcal{N}(r_{ij} \mid q_i\cdot p_j + b_0 + b_i + b_j, \sigma^2)$
§ Flexibility of the model:
  § Model different types of response, e.g., binary or count data, by replacing the Gaussian distribution of the ratings with other distributions, e.g., categorical or Poisson
  § Put constraints on the latent factors, e.g., non-negativity, by replacing the Gaussian of the latent factors with a rectified Gaussian
§ But this lecture focuses only on the formulation above

Parameter estimation of PMF
§ Maximum a posteriori (MAP) estimation
§ Posterior:
$$p(Q, P \mid R) = \frac{p(Q, P, R)}{p(R)} \propto p(R \mid P, Q)\, p(P)\, p(Q)$$
§ The log-posterior (log of a product = sum of logs):
$$\begin{aligned} \log p(R\mid P,Q)\,p(P)\,p(Q) &= \log p(R\mid P,Q) + \log p(P) + \log p(Q)\\ &= \log\prod_{ij} p(r_{ij}\mid q_i,p_j) + \log\prod_i p(q_i) + \log\prod_j p(p_j)\\ &= \sum_{ij}\log p(r_{ij}\mid q_i,p_j) + \sum_i\log p(q_i) + \sum_j\log p(p_j) \end{aligned}$$

Parameter estimation of PMF
§ Log-posterior:
$$L = \sum_{ij}\log p(r_{ij}\mid q_i,p_j) + \sum_i\log p(q_i) + \sum_j\log p(p_j)$$
§ Substituting the Gaussian distributions and ignoring constants, we have
$$L = \sum_{ij} -\frac{1}{2\sigma^2}\left(r_{ij} - q_i\cdot p_j\right)^2 + \sum_i -\frac{\lambda_1}{2}\|q_i\|^2 + \sum_j -\frac{\lambda_2}{2}\|p_j\|^2$$
§ Maximizing this log-posterior is equivalent to minimizing the loss + regularization in the LFM!
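Since MAP estimation in PMF reduces to the same regularized squared-error objective, both views can be trained with the per-rating SGD updates listed above. Below is a minimal sketch of that loop for the biased LFM; the function name, the (i, j, r_ij) triple format for observed ratings, the hyperparameter defaults, and the use of the mean rating for the global bias $b_0$ are illustrative assumptions rather than prescribed by the slides.

```python
import numpy as np

def train_biased_lfm(ratings, n_users, n_items, k=10, eta=0.01,
                     lam1=0.05, lam2=0.05, lam3=0.05, lam4=0.05, epochs=20):
    """SGD for the biased latent factor model (a sketch, not reference code).

    `ratings` is assumed to be a list of (i, j, r_ij) triples, one per
    observed entry of the rating matrix R.
    """
    rng = np.random.default_rng(0)
    Q = rng.normal(scale=0.1, size=(n_users, k))   # user latent vectors q_i
    P = rng.normal(scale=0.1, size=(n_items, k))   # item latent vectors p_j
    b_u = np.zeros(n_users)                        # user biases b_i
    b_v = np.zeros(n_items)                        # item biases b_j
    b0 = np.mean([r for _, _, r in ratings])       # global bias (kept fixed here, an assumption)

    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):  # visit ratings in random order
            i, j, r = ratings[idx]
            # predicted rating: r_hat_ij = b0 + b_i + b_j + q_i . p_j
            err = r - (b0 + b_u[i] + b_v[j] + Q[i] @ P[j])
            q_old = Q[i].copy()                    # keep the old q_i for the p_j update
            # gradient steps matching the update rules on the slides
            Q[i]   += eta * (2 * err * P[j] - 2 * lam1 * Q[i])
            P[j]   += eta * (2 * err * q_old - 2 * lam2 * P[j])
            b_u[i] += eta * (2 * err - 2 * lam3 * b_u[i])
            b_v[j] += eta * (2 * err - 2 * lam4 * b_v[j])
    return Q, P, b0, b_u, b_v

# Example usage on toy data (purely illustrative):
# ratings = [(0, 1, 5.0), (0, 3, 1.0), (2, 2, 4.0)]
# Q, P, b0, b_u, b_v = train_biased_lfm(ratings, n_users=3, n_items=4, k=2)
```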
Summary
§ Content-based methods
  § Using item attributes and text content to profile items and users
§ Collaborative filtering
  § User-based & item-based – k-nearest neighbors
  § Interpolation method
  § Matrix factorization and the latent factor model
§ Evaluation of recommender systems