
Recommender Systems
COMPSCI 753 Kaiqi Zhao

Recap: Collaborative Filtering (CF)
§ Use the similarity to weight the ratings of item $p$ from other users $u_k$:
$$\hat{r}_{ip} = \frac{\sum_{u_k \in U_p} w_{ik}\, r_{kp}}{\sum_{u_k \in U_p} w_{ik}}$$
§ $w_{ik}$ – similarity between users $i$ and $k$
§ $U_p$ – set of most similar users who rated item $p$
§ When predicting the rating, we can either consider 1) all the users or 2) only the k most similar users (see the sketch below)
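To make the weighted-average prediction above concrete, here is a minimal Python sketch (not from the lecture); the toy ratings dictionary, the similarity table, and the name predict_rating are illustrative assumptions.

```python
# Minimal sketch of user-based CF prediction, assuming toy data structures:
# ratings[user][item] = rating, similarities[(i, k)] = w_ik.
def predict_rating(i, p, ratings, similarities, k=2):
    """Predict user i's rating of item p as a similarity-weighted average of
    the ratings given to p by the k most similar users who rated it (U_p)."""
    neighbours = [u for u in ratings if u != i and p in ratings[u]]
    neighbours = sorted(neighbours,
                        key=lambda u: similarities.get((i, u), 0.0),
                        reverse=True)[:k]
    num = sum(similarities.get((i, u), 0.0) * ratings[u][p] for u in neighbours)
    den = sum(similarities.get((i, u), 0.0) for u in neighbours)
    return num / den if den else float("nan")

ratings = {0: {"a": 4}, 1: {"a": 5, "b": 3}, 2: {"b": 4}}
similarities = {(0, 1): 0.9, (0, 2): 0.2}
print(predict_rating(0, "b", ratings, similarities))   # (0.9*3 + 0.2*4) / 1.1 ≈ 3.18
```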

Modeling Bias
§ Use the similarity to weight the bias-adjusted ratings of item $p$ from other users $u_k$:
$$\hat{r}_{ip} = b_{ip} + \frac{\sum_{u_k \in U_p} w_{ik}\,(r_{kp} - b_{kp})}{\sum_{u_k \in U_p} w_{ik}}, \qquad b_{ip} = b_g + b_i + b_p$$
§ $b_g$ – average rating over all transactions
§ $b_i$ – average rating of user $i$, minus $b_g$
§ $b_p$ – average rating of item $p$, minus $b_g$
§ Problem: the similarity is based on an arbitrary metric. Can we learn the similarities?
§ Solution: view the recommendation problem as an interpolation problem for missing values and learn the interpolation weights
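A small sketch of the baseline term $b_{ip} = b_g + b_i + b_p$ from this slide; the transaction list and the helper names (transactions, user_bias, item_bias, baseline) are illustrative, not part of the lecture material.

```python
# Baseline estimate b_ip = b_g + b_i + b_p from a toy list of
# (user, item, rating) transactions (illustrative data).
transactions = [(0, "a", 4), (0, "b", 2), (1, "a", 5), (1, "b", 3), (2, "b", 4)]

b_g = sum(r for _, _, r in transactions) / len(transactions)   # global average rating

def user_bias(i):
    rs = [r for u, _, r in transactions if u == i]
    return sum(rs) / len(rs) - b_g                              # b_i = avg(user i) - b_g

def item_bias(p):
    rs = [r for _, q, r in transactions if q == p]
    return sum(rs) / len(rs) - b_g                              # b_p = avg(item p) - b_g

def baseline(i, p):
    return b_g + user_bias(i) + item_bias(p)                    # b_ip

print(baseline(2, "a"))   # 3.6 + 0.4 + 0.9 = 4.9
```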

Learning the interpolation weights
§ Learn the interpolation weights $\alpha_{ik}$ instead of using an arbitrary similarity:
$$\hat{r}_{ip} = b_{ip} + \sum_{u_k \in U_p} \alpha_{ik}\,(r_{kp} - b_{kp})$$
§ $\alpha_{ik}$ – the interpolation weight that models the impact of user $k$ on user $i$
§ Note that
§ $\alpha_{ik}$ may not sum up to 1: $\sum_{u_k \in U_p} \alpha_{ik} \neq 1$
§ $\alpha_{ik}$ may not be symmetric: $\alpha_{ik} \neq \alpha_{ki}$
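A sketch of this prediction model, assuming the interpolation weights are stored in a dictionary alpha keyed by user pairs and that a baseline function standing in for $b_{ip}$ is available; both, and the constant baseline used in the demo, are illustrative.

```python
# Prediction with interpolation weights: r_hat_ip = b_ip + sum_k alpha_ik * (r_kp - b_kp).
def predict(i, p, ratings, alpha, baseline):
    neighbours = [u for u in ratings if u != i and p in ratings[u]]   # U_p
    return baseline(i, p) + sum(
        alpha.get((i, u), 0.0) * (ratings[u][p] - baseline(u, p)) for u in neighbours
    )

ratings = {0: {"a": 4}, 1: {"a": 5, "b": 3}, 2: {"b": 4}}
alpha = {(0, 1): 0.6, (0, 2): 0.1}          # need not sum to 1, need not be symmetric
b = lambda i, p: 3.5                         # stand-in constant baseline for the demo
print(predict(0, "b", ratings, alpha, b))    # 3.5 + 0.6*(3-3.5) + 0.1*(4-3.5) = 3.25
```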

Learning the interpolation weights
§ Our prediction model:
$$\hat{r}_{ip} = b_{ip} + \sum_{u_k \in U_p} \alpha_{ik}\,(r_{kp} - b_{kp})$$
§ Question: how to learn the values of $\alpha_{ik}$?
§ Idea – minimize the difference between the predicted and the real ratings (see the sketch below):
$$SSE = \sum_{(i,p) \in R} \big(\hat{r}_{ip} - r_{ip}\big)^2$$
§ Merit – directly optimizes the RMSE metric!
§ The weight $\alpha_{ik}$ is then learned based on all the items purchased by both users.
§ The same approach can be applied to the item-based method!
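The SSE objective is a one-line computation once a prediction function exists; the sketch below assumes a list of observed (user, item, rating) triples and a stand-in predictor, both of which are illustrative.

```python
# SSE over the observed ratings R: sum of (r_hat_ip - r_ip)^2.
def sse(observed, predict_fn):
    return sum((predict_fn(i, p) - r) ** 2 for i, p, r in observed)

observed = [(0, "b", 3.0), (1, "a", 5.0)]    # toy training set R
print(sse(observed, lambda i, p: 3.5))       # (3.5-3)^2 + (3.5-5)^2 = 2.5
```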

Learning the interpolation weights
§ How to minimize a function $f(x)$?
§ The function describes a surface, e.g., a mountain
§ Find the steepest direction to go down
1. Take the derivative of the function: $\nabla_x f$
2. Gradient $\nabla_x f(x^{(t)})$ – the value of $\nabla_x f$ at point $x^{(t)}$
3. Go along the reverse direction of the gradient
4. $x^{(t+1)} \leftarrow x^{(t)} - \nabla_x f(x^{(t)})$
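A minimal sketch of this four-step recipe on a one-dimensional function $f(x) = (x-3)^2$, whose gradient is $2(x-3)$. The slide's update rule has no explicit step size; the sketch adds a learning rate eta (introduced on a later slide) so the iterates do not overshoot.

```python
# Plain gradient descent on f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
def gradient_descent(grad, x0, eta=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)     # x^(t+1) <- x^(t) - eta * grad f(x^(t))
    return x

print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # converges to ~3.0, the minimiser
```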

Gradient descent
§ View from the top and get a contour plot
§ The (negative) gradient points us toward the local minimum

Learning the interpolation weights
§ The optimization problem:
minL𝜶=minZ 𝑏*++Z𝛼*,(𝑟,+−𝑏,+)−𝑟*+

𝜶 𝜶 (*,+)∈W B”∈C#
§ Gradient descent – iterative learning method:
§ 𝜶(-4′) ← 𝜶(-) − 𝜂𝜵𝜶L – , where 𝜂 is called learning rate.
§ How to take derivative 𝜵𝜶L – w.r.t. to the matrix 𝜶
§ take partial derivative w.r.t. each 𝛼!” – 𝛻>!” L ? = @L $
§ @L $ =2∑#∈1[!,:][𝑏!# +∑7 ∈+ 𝛼 – 𝑟”# −𝑏”# −𝑟!#](𝑟”# −𝑏”#) @>!” ” # !”
Sum over all items rated by user 𝑖
@B!”
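A sketch of the partial derivative $\partial L / \partial \alpha_{ik}$ above, reusing the illustrative ratings/alpha/baseline structures from the earlier sketches. It skips items that user $k$ did not rate, since the factor $(r_{kp} - b_{kp})$ only exists when $u_k \in U_p$; as noted earlier, the weight is learned from the items purchased by both users.

```python
# Partial derivative of the SSE objective w.r.t. one weight alpha_ik,
# using illustrative data structures: ratings[user][item], alpha[(i, k)],
# and a baseline(i, p) function standing in for b_ip.
def d_alpha(i, k, ratings, alpha, baseline):
    grad = 0.0
    for p, r_ip in ratings[i].items():               # items rated by user i
        if p not in ratings.get(k, {}):              # (r_kp - b_kp) needs k's rating of p
            continue
        neighbours = [u for u in ratings if u != i and p in ratings[u]]   # U_p
        pred = baseline(i, p) + sum(
            alpha.get((i, u), 0.0) * (ratings[u][p] - baseline(u, p)) for u in neighbours
        )
        grad += 2.0 * (pred - r_ip) * (ratings[k][p] - baseline(k, p))
    return grad

ratings = {0: {"a": 4, "b": 2}, 1: {"a": 5, "b": 3}}
alpha = {(0, 1): 0.0}
b = lambda i, p: 3.5
print(d_alpha(0, 1, ratings, alpha, b))              # -3.0 for this toy example
# one full GD step on this entry: alpha[(0, 1)] -= eta * d_alpha(0, 1, ratings, alpha, b)
```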

Stochastic Gradient Descent
§ Gradient:
$$\frac{\partial L}{\partial \alpha_{ik}} = \sum_{p \in R[i,:]} 2\Big[\, b_{ip} + \sum_{u_k \in U_p} \alpha_{ik}\,(r_{kp} - b_{kp}) - r_{ip} \Big] (r_{kp} - b_{kp})$$
§ Each term of this sum is the gradient w.r.t. one rating $r_{ip}$.
§ Idea – compute the gradient w.r.t. each individual rating, denoted by $\nabla_{\alpha} L_{ip}$, and make a step to update the parameters (see the sketch below)
§ For each rating $r_{ip}$, update $\alpha_{ik}^{(t+1)} \leftarrow \alpha_{ik}^{(t)} - \eta\, \nabla_{\alpha_{ik}} L_{ip}$
§ SGD runs several passes (epochs) over the full training dataset
§ GD: $\alpha^{(t+1)} \leftarrow \alpha^{(t)} - \eta \big[\sum_{r_{ip} \in R} \nabla_{\alpha} L_{ip}\big]$
§ SGD: $\alpha^{(t+1)} \leftarrow \alpha^{(t)} - \eta\, \nabla_{\alpha} L_{ip}$
§ SGD needs more steps to converge, but each step is efficient!
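A sketch of one SGD epoch under the same illustrative data structures: it visits each observed rating $r_{ip}$ and immediately updates the weights $\alpha_{ik}$ for the neighbours $u_k \in U_p$ from that single rating's gradient, instead of accumulating the full gradient over $R$ as GD does.

```python
# One SGD pass (epoch) over all observed ratings, with per-rating updates of alpha.
# Data structures (ratings, alpha, baseline, eta) follow the earlier illustrative sketches.
def sgd_epoch(ratings, alpha, baseline, eta=0.05):
    for i, items in ratings.items():
        for p, r_ip in items.items():                # one observed rating r_ip
            neighbours = [u for u in ratings if u != i and p in ratings[u]]   # U_p
            pred = baseline(i, p) + sum(
                alpha.get((i, u), 0.0) * (ratings[u][p] - baseline(u, p)) for u in neighbours
            )
            err = pred - r_ip
            for k in neighbours:                     # alpha_ik <- alpha_ik - eta * grad_ip
                alpha[(i, k)] = alpha.get((i, k), 0.0) \
                    - eta * 2.0 * err * (ratings[k][p] - baseline(k, p))
    return alpha

ratings = {0: {"a": 4, "b": 2}, 1: {"a": 5, "b": 3}, 2: {"b": 4}}
alpha, b = {}, (lambda i, p: 3.5)
for _ in range(10):                                  # several epochs over the training data
    sgd_epoch(ratings, alpha, b)
print(alpha)
```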

SGD vs. GD
§ Convergence of GD vs. SGD
[Figure from mmds.org: value of the objective function vs. optimization step for GD and SGD. SGD may detour, but often moves faster towards the optimum.]