Recommender Systems
COMPSCI 753 Kaiqi Zhao
Evaluation of RS
§ How to get labeled data?
§ Hold-out some ratings/transactions
§ Random ratings
§ Latest ratings
§ Cold-start scenarios
1. In-matrix prediction – Interaction between existing items for existing users
2. User cold-start – Existing items for new users
3. Item cold-start – New items for existing users
4. User & Item cold-start – New items for new users
Dataset split
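The two hold-out strategies and the four evaluation scenarios can be sketched in a few lines of Python. This is only an illustration under assumptions: the toy `ratings` list, the `(user, item, rating, timestamp)` layout, and the function names `random_holdout`, `latest_holdout`, and `scenario` are hypothetical and do not come from the slides.

```python
import random

# Hypothetical toy interactions: (user, item, rating, timestamp).
ratings = [
    ("u1", "i1", 4.0, 10),
    ("u1", "i2", 3.0, 20),
    ("u2", "i1", 5.0, 15),
    ("u2", "i3", 2.0, 30),
    ("u3", "i2", 4.5, 25),
    ("u3", "i3", 1.0, 40),
]

def random_holdout(rows, test_fraction=0.2, seed=0):
    """Hold out a random subset of the ratings as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def latest_holdout(rows, test_fraction=0.2):
    """Hold out the most recent ratings (by timestamp) as the test set."""
    ordered = sorted(rows, key=lambda r: r[3])
    n_test = max(1, int(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]

def scenario(user, item, train_users, train_items):
    """Label a test interaction with one of the four scenarios."""
    known_user = user in train_users
    known_item = item in train_items
    if known_user and known_item:
        return "in-matrix"
    if known_item:
        return "user cold-start"
    if known_user:
        return "item cold-start"
    return "user & item cold-start"

train, test = latest_holdout(ratings, test_fraction=0.34)
train_users = {r[0] for r in train}
train_items = {r[1] for r in train}
labels = [scenario(u, i, train_users, train_items) for u, i, _, _ in test]
```

In this toy data the two latest ratings both involve item `i3`, which never appears in the training part, so a time-based split produces item cold-start test cases even though every user is already known.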
Evaluation of RS
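§ Rating Prediction Error
§ Mean Square Error (MSE)
$MSE = \frac{1}{|R_{test}|} \sum_{(u,i) \in R_{test}} (r_{ui} - \hat{r}_{ui})^2$
§ Root Mean Square Error (RMSE)
$RMSE = \sqrt{\frac{1}{|R_{test}|} \sum_{(u,i) \in R_{test}} (r_{ui} - \hat{r}_{ui})^2}$
User    Item    Predicted $\hat{r}$    Groundtruth $r$
$u_1$   $i_3$   3.3                    3.5
$u_2$   $i_1$   4                      1
$u_2$   $i_4$   4.9                    5
$u_3$   $i_2$   3.6                    4
$MSE = \frac{1}{4}(0.2^2 + 3^2 + 0.1^2 + 0.4^2) \approx 2.3$
$RMSE = \sqrt{MSE} \approx 1.5$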
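The worked example can be checked with a short Python sketch. The four (predicted, groundtruth) pairs are the ones from the example; the function names `mse` and `rmse` are illustrative choices, not part of the slides.

```python
import math

# (predicted, groundtruth) pairs from the worked example.
pairs = [(3.3, 3.5), (4.0, 1.0), (4.9, 5.0), (3.6, 4.0)]

def mse(pairs):
    """Mean of squared differences between groundtruth and prediction."""
    return sum((r - r_hat) ** 2 for r_hat, r in pairs) / len(pairs)

def rmse(pairs):
    """Square root of the MSE, in the same units as the ratings."""
    return math.sqrt(mse(pairs))

print(round(mse(pairs), 1), round(rmse(pairs), 1))
```

The single large error (predicting 4 for a true rating of 1) contributes 9 of the 9.21 total squared error, which shows how strongly both metrics penalise outliers.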
Netflix Competition
Evaluation of RS
§ Top-N recommendation – return N items with the best scores
§ Precision@N and Recall@N
                 Recommended            Not recommended
Preferred        True positive (tp)     False negative (fn)
Not preferred    False positive (fp)    True negative (tn)
$Precision = \frac{\#tp}{\#tp + \#fp}$
$Recall = \frac{\#tp}{\#tp + \#fn}$
For user $u$:
Top-3 recommendations for user $u$: $i_2$, $i_4$, $i_6$
Items that $u$ purchased/highly rated in test set: $i_{11}$, $i_3$, $i_4$, $i_5$
$Precision@3 = \frac{1}{1+2} = 0.33$
$Recall@3 = \frac{1}{1+3} = 0.25$
The overall Precision@N and Recall@N are averaged over all users.
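A minimal Python sketch of Precision@N and Recall@N for one user, followed by the average over users. The item identifiers, the second user `v`, and the function names are assumptions made for illustration; user `u` is set up so that exactly one of the top-3 recommendations is among four preferred items, as in the example.

```python
def precision_recall_at_n(ranked_items, relevant_items, n):
    """Return (Precision@N, Recall@N) for a single user."""
    top_n = ranked_items[:n]
    tp = sum(1 for item in top_n if item in relevant_items)
    fp = len(top_n) - tp           # recommended but not preferred
    fn = len(relevant_items) - tp  # preferred but not recommended
    precision = tp / (tp + fp) if top_n else 0.0
    recall = tp / (tp + fn) if relevant_items else 0.0
    return precision, recall

def average_over_users(ranked_by_user, relevant_by_user, n):
    """Average the per-user metrics over all users."""
    results = [
        precision_recall_at_n(ranked_by_user[u], relevant_by_user[u], n)
        for u in ranked_by_user
    ]
    mean_p = sum(p for p, _ in results) / len(results)
    mean_r = sum(r for _, r in results) / len(results)
    return mean_p, mean_r

# Hypothetical rankings (best score first) and test-set preferences.
ranked_by_user = {"u": ["i2", "i4", "i6", "i1"], "v": ["i1", "i2", "i3"]}
relevant_by_user = {"u": {"i11", "i3", "i4", "i5"}, "v": {"i1", "i2"}}

p_u, r_u = precision_recall_at_n(ranked_by_user["u"], relevant_by_user["u"], 3)
mean_p, mean_r = average_over_users(ranked_by_user, relevant_by_user, 3)
```

Averaging per user (rather than pooling all hits) gives every user equal weight regardless of how many items they interacted with; that is a design choice consistent with the statement above.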
Evaluation of RS
§ Top-N recommendation – return N items with the best scores
§ Receiver Operating Characteristic (ROC) curve
§ True positive rate (TPR): $\frac{\#tp}{\#tp + \#fn}$
§ False positive rate (FPR): $\frac{\#fp}{\#fp + \#tn}$
                 Recommended    Not recommended
Preferred        tp             fn
Not preferred    fp             tn
Vary the size of the recommendation set, i.e., $N$, to trace out the curve.
[Figure: ROC curve; axes: FPR, TPR]
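Tracing the curve by growing N one item at a time can be sketched as follows. The input format (a list of booleans, best-ranked item first) and the function name `roc_points` are assumptions for illustration, and the sketch assumes the list contains at least one preferred and one non-preferred item.

```python
def roc_points(ranked_relevance):
    """Return the (FPR, TPR) point for every cutoff N = 0, 1, ..., len(list).

    ranked_relevance[k] is True if the item at rank k + 1 is preferred.
    """
    n_pos = sum(ranked_relevance)            # tp + fn
    n_neg = len(ranked_relevance) - n_pos    # fp + tn
    tp = fp = 0
    points = [(0.0, 0.0)]                    # N = 0: nothing recommended
    for is_preferred in ranked_relevance:
        if is_preferred:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical ranking of four items.
points = roc_points([True, False, True, False])
```

Each preferred item moves the curve up by 1/(#tp + #fn) and each non-preferred item moves it right by 1/(#fp + #tn), so the curve always ends at (1, 1) once every item is recommended.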
Evaluation of RS
§ Top-N recommendation – return N items with the best scores
§ Area Under Curve (AUC)
N     TPR    FPR    Area increment
1     1/7    0/8    1/7 * 1 = 1/7
2     2/7    0/8    1/7 * 1 = 1/7
3     3/7    0/8    1/7 * 1 = 1/7
4     3/7    1/8    0
5     4/7    1/8    1/7 * (1-1/8) = 1/8
6     5/7    1/8    1/7 * (1-1/8) = 1/8
7     6/7    1/8    1/7 * (1-1/8) = 1/8
8     6/7    2/8    0
9     6/7    3/8    0
10    6/7    4/8    0
11    7/7    4/8    1/7 * (1-4/8) = 1/14
$AUC = \frac{3}{7} + \frac{3}{8} + \frac{1}{14} = \frac{7}{8}$
[Figure: ranked item list marked TP/FP with cutoff N = 3, and the ROC curve; axes: FPR, TPR]
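The area increments in the table can be reproduced with exact fractions. In this sketch the ranking encodes ranks 1 to 11 from the table (1 = preferred, 0 = not preferred); the last four entries are an assumption, inferred from the FPR denominator of 8, that the remaining items are all non-preferred. The function name `auc_by_strips` is illustrative.

```python
from fractions import Fraction

# Ranks 1..11 follow the table; ranks 12..15 are assumed non-preferred.
ranking = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]

def auc_by_strips(ranking):
    """Sum one horizontal strip per preferred item.

    Each preferred item raises TPR by 1/n_pos; the strip it adds under
    the curve extends from the current FPR to 1, so its width is 1 - FPR.
    """
    n_pos = sum(ranking)
    n_neg = len(ranking) - n_pos
    fp = 0
    area = Fraction(0)
    for is_preferred in ranking:
        if is_preferred:
            area += Fraction(1, n_pos) * (1 - Fraction(fp, n_neg))
        else:
            fp += 1
    return area

auc = auc_by_strips(ranking)
```

Using `Fraction` avoids floating-point rounding, so the result can be compared directly with the hand calculation 3/7 + 3/8 + 1/14.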
Evaluation of RS
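§ Top-N recommendation – return N items with the best scores
§ Area Under Curve (AUC)
$AUC(u) = \frac{1}{|P_u^+| \cdot |P_u^-|} \sum_{i \in P_u^+} \sum_{i' \in P_u^-} \mathbb{1}(\hat{r}_{ui} > \hat{r}_{ui'})$
o $P_u^+$ – the groundtruth set of items preferred by user $u$
o $P_u^-$ – the groundtruth set of items NOT preferred by user $u$
$|P_u^+| = 7$, $|P_u^-| = 8$
[Figure: ranked item list marked TP/FP with cutoff N = 3, and the ROC curve; axes: FPR, TPR]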
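The pairwise form of AUC counts, over all (preferred, non-preferred) item pairs, how often the preferred item gets the higher score. The sketch below is one way to compute it, under assumptions: the scores are hypothetical (15 minus the rank, so rank 1 scores highest), the preferred ranks follow the earlier ranking example with 7 preferred and 8 non-preferred items, and ties are counted as incorrect.

```python
from itertools import product

def auc_pairwise(scores, preferred):
    """Fraction of (preferred, non-preferred) pairs ranked correctly.

    scores: dict mapping item -> predicted score for one user.
    preferred: set of items the user prefers in the groundtruth.
    """
    pos = [i for i in scores if i in preferred]
    neg = [i for i in scores if i not in preferred]
    correct = sum(1 for i, j in product(pos, neg) if scores[i] > scores[j])
    return correct / (len(pos) * len(neg))

# Hypothetical scores: the item at rank k gets score 15 - k.
scores = {f"item{k}": 15 - k for k in range(1, 16)}
preferred = {f"item{k}" for k in (1, 2, 3, 5, 6, 7, 11)}

auc_u = auc_pairwise(scores, preferred)
```

With these inputs 49 of the 56 pairs are ordered correctly, giving 7/8, the same value as the area computed from the ROC curve. The double loop costs |P_u^+| times |P_u^-| comparisons, which is fine for a sketch but slow for large catalogues.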