AutoSVD++: An Efficient Hybrid Collaborative Filtering Model via Contractive Auto-encoders
Shuai Zhang, University of New South Wales, Sydney, NSW 2052, Australia, shuai.zhang@student.unsw.edu.au
Lina Yao, University of New South Wales, Sydney, NSW 2052, Australia, lina.yao@unsw.edu.au
Xiwei Xu, Data61, CSIRO, Sydney, NSW 2015, Australia, Xiwei.Xu@data61.csiro.au
ABSTRACT
Collaborative filtering (CF) has been successfully used to provide users with personalized products and services. However, dealing with the increasing sparseness of the user-item matrix remains a challenge. To tackle this issue, hybrid CF approaches, such as combining CF with content-based filtering and leveraging side information of users and items, have been extensively studied to enhance performance. However, most of these approaches depend on hand-crafted feature engineering, which is usually noise-prone and biased by the particular feature extraction and selection scheme. In this paper, we propose a new hybrid model that generalizes the contractive auto-encoder paradigm into the matrix factorization framework with good scalability and computational efficiency. It jointly models content information as effective and compact representations, and leverages implicit user feedback to make accurate recommendations. Extensive experiments conducted over three large-scale real datasets indicate that the proposed approach outperforms the compared methods for item recommendation.
CCS CONCEPTS
• Information systems → Recommender systems;
KEYWORDS
collaborative filtering; deep learning; contractive auto-encoders
1 INTRODUCTION
With the increasing amount of online information, recommender systems have been playing an increasingly indispensable role in helping people overcome information overload and in boosting sales for e-commerce companies. Among the various recommendation strategies, Collaborative Filtering (CF) has been extensively studied over the past decades due to its effectiveness and efficiency. CF learns users' preferences from usage patterns, such as user-item historical interactions, to make recommendations. However, it still has limitations in dealing with a sparse user-item matrix. Hence, hybrid methods, which combine content-based and CF-based methods, have been gaining attention as a way to tackle this problem [7].
However, most of these approaches either rely on hand-crafted feature engineering or fail to capture the non-trivial, non-linear interactions between content information and the user-item matrix. Recent advances in deep learning have demonstrated state-of-the-art performance in revolutionizing recommender systems [4]: deep models can learn complex abstractions as effective and compact representations in their higher layers, and can capture the complex relationships within data. Many research works have explored introducing deep learning into recommender systems to boost performance [2, 9, 10, 12]. For example, Salakhutdinov et al. [9] apply restricted Boltzmann machines (RBM) to model the dyadic relationships of collaborative filtering. Li et al. [6] design a model that combines marginalized denoising stacked auto-encoders with probabilistic matrix factorization.
Although these methods integrate both deep learning and CF techniques, most of them do not thoroughly make use of side information (e.g., implicit feedback), which has been proven effective in real-world recommender systems [3, 7]. In this paper, we propose a hybrid CF model, AutoSVD++, that overcomes this shortcoming by combining the contractive auto-encoder paradigm with SVD++ to enhance recommendation performance. Compared with previous work in this direction, the contributions of this paper are summarized as follows:
• Our model naturally leverages the CF and auto-encoder frameworks in a tightly coupled manner with high scalability. The proposed efficient AutoSVD++ algorithm significantly improves computational efficiency by grouping together users who share the same implicit feedback;
• By integrating the contractive auto-encoder, our model can capture the non-trivial and non-linear characteristics of item content information, and effectively learn semantic representations within a low-dimensional embedding space;
• Our model effectively makes use of implicit feedback to further improve accuracy. Our experiments demonstrate empirically that our model outperforms the compared methods for item recommendation.
2 PRELIMINARIES
Before diving into the details of our models, we first discuss the preliminaries.
2.1 Problem Definition
Given users $u = 1, \dots, N$ and items $i = 1, \dots, M$, the rating $r_{ui}$ of the rating matrix $R \in \mathbb{R}^{N \times M}$ is provided by user $u$ for item $i$, indicating the user's preference on that item; most entries are missing. Let $\hat{r}_{ui}$ denote the predicted value of $r_{ui}$, and let the set of known ratings be represented as $K = \{(u, i) \mid r_{ui} \text{ is known}\}$. The goal is to predict the ratings a user would give to items he or she has not interacted with yet.
2.2 Latent Factor Models
2.2.1 Biased SVD. Biased SVD [5] is a latent factor model. Unlike the conventional matrix factorization model, it introduces user and item bias terms:

$$\hat{r}_{ui} = \mu + b_u + b_i + V_i^T U_u \qquad (1)$$

where $\mu$ is the global average rating, $b_u$ indicates the observed deviation of user $u$, $b_i$ is the bias term for item $i$, $U_u \in \mathbb{R}^k$ and $V_i \in \mathbb{R}^k$ represent the latent preferences of user $u$ and the latent properties of item $i$ respectively, and $k$ is the dimensionality of the latent factors.
2.2.2 SVD++. SVD++ [5] is a variant of biased SVD. It extends the biased SVD model by incorporating implicit information. Generally, implicit feedback, such as browsing activity and purchase history, helps indicate a user's preferences, particularly when explicit feedback is not available. Prediction is done by the following rule:

$$\hat{r}_{ui} = \mu + b_u + b_i + V_i^T \Big( U_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) \qquad (2)$$

where $y_j \in \mathbb{R}^f$ is the implicit factor vector. The set $N(u)$ contains the items for which user $u$ provided implicit feedback; $N(u)$ can be replaced by $R(u)$, which contains all the items rated by user $u$ [7], as implicit feedback is not always available. The essence here is that users implicitly express their preferences by rating items, regardless of how they rate them. Incorporating this kind of implicit information has been shown to enhance accuracy [5]. The model is flexible enough to integrate the various kinds of implicit feedback available in practice.
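To make the two prediction rules concrete, the following NumPy sketch implements Eqs. (1) and (2). It is a minimal illustration under our own naming conventions (`mu`, `b_u`, `y`, `N_u`, etc. are placeholders, not code from the paper):

```python
import numpy as np

def predict_biased_svd(mu, b_u, b_i, U_u, V_i):
    """Eq. (1): global mean plus user/item biases plus a latent dot product."""
    return mu + b_u + b_i + V_i @ U_u

def predict_svdpp(mu, b_u, b_i, U_u, V_i, y, N_u):
    """Eq. (2): augment the user factor with normalized implicit-feedback factors."""
    implicit = np.sum(y[N_u], axis=0) / np.sqrt(len(N_u))  # |N(u)|^(-1/2) * sum_j y_j
    return mu + b_u + b_i + V_i @ (U_u + implicit)

# toy usage with k = 10 latent factors and 100 items
k, n_items = 10, 100
rng = np.random.default_rng(0)
y = rng.normal(scale=0.1, size=(n_items, k))   # implicit factor vectors y_j
U_u, V_i = rng.normal(size=k), rng.normal(size=k)
N_u = [3, 17, 42]                              # items with implicit feedback from u
r_hat = predict_svdpp(3.5, 0.1, -0.2, U_u, V_i, y, N_u)
```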
2.3 Contractive Auto-encoders

Contractive Auto-encoders (CAE) [8] are an effective unsupervised learning algorithm for generating useful feature representations. The representations learned by a CAE are robust to small perturbations around the training points, which is achieved by using the Frobenius norm of the encoder's Jacobian as a regularizer:

$$\mathcal{L}_{cae}(\theta) = \sum_{x \in D_n} \big( L(x, g(f(x))) + \lambda \, \|J_f(x)\|_F^2 \big) \qquad (3)$$

where $x \in \mathbb{R}^{d_x}$ is the input, $D_n$ is the training set, $L$ is the reconstruction error, the parameters are $\theta = \{W, W', b_h, b_y\}$, and $g(f(x))$ is the reconstruction of $x$:

$$g(f(x)) = s_g(W' s_f(W x + b_h) + b_y) \qquad (4)$$

Here $s_f$ is a nonlinear activation function, $s_g$ is the decoder's activation function, $b_h \in \mathbb{R}^{d_h}$ and $b_y \in \mathbb{R}^{d_x}$ are bias vectors, and $W \in \mathbb{R}^{d_h \times d_x}$ and $W' \in \mathbb{R}^{d_x \times d_h}$ are weight matrices. Same as [8], we use tied weights, $W' = W^T$. The network can be trained by stochastic gradient descent.
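The sketch below illustrates the CAE objective of Eqs. (3)-(4) for a sigmoid encoder with tied weights; for sigmoid units the Jacobian penalty has the closed form used in [8]. All names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cae_loss(x, W, b_h, b_y, lam):
    h = sigmoid(W @ x + b_h)           # encoder f(x), inner part of Eq. (4)
    x_rec = sigmoid(W.T @ h + b_y)     # decoder g(f(x)) with tied weights W' = W^T
    recon = np.sum((x - x_rec) ** 2)   # squared reconstruction error L
    # For sigmoid units: ||J_f(x)||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2
    penalty = np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1))
    return recon + lam * penalty

# toy usage: d_x = 8 input features, d_h = 3 hidden units
rng = np.random.default_rng(0)
W, b_h, b_y = 0.1 * rng.normal(size=(3, 8)), np.zeros(3), np.zeros(8)
loss = cae_loss(rng.random(8), W, b_h, b_y, lam=0.1)
```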
3 PROPOSED METHODOLOGY

In this section, we introduce our two proposed hybrid models, namely AutoSVD and AutoSVD++.

[Figure 1: Illustration of AutoSVD (with the implicit feedback removed) and AutoSVD++.]

3.1 AutoSVD

Suppose we have a set of items, where each item has many properties or much side information, so that its feature vector can be very high-dimensional and even redundant. Traditional latent factor models like SVD struggle to extract non-trivial and non-linear feature representations [12]. Instead, we propose to utilize a CAE to extract compact and effective feature representations:

$$cae(C_i) = s_f(W \cdot C_i + b_h) \qquad (5)$$

where $C_i \in \mathbb{R}^{d_c}$ represents the original feature vector and $cae(C_i) \in \mathbb{R}^k$ denotes the low-dimensional feature representation. To integrate the CAE into our model, the proposed hybrid model is formulated as follows:

$$\hat{r}_{ui} = \mu + b_u + b_i + (\beta \cdot cae(C_i) + \varepsilon_i)^T U_u \qquad (6)$$

Similar to [11], we divide the item latent vector $V_i$ into two parts: one is the feature vector $cae(C_i)$ extracted from item-based content information, and the other, $\varepsilon_i \in \mathbb{R}^k$ ($i = 1 \dots n$), is the latent item-based offset vector. $\beta$ is a hyper-parameter to normalize $cae(C_i)$. We could decompose the user latent vector in a similar way; however, in many real-world systems, users' profiles can be incomplete or unavailable due to privacy concerns. Therefore, it is more sensible to include only item side information.
3.2 AutoSVD++

While the combination of SVD and contractive auto-encoders is capable of learning effective and non-linear feature representations, it is still unable to produce satisfying recommendations from a sparse user-item matrix. We therefore further propose a hybrid model atop contractive auto-encoders and SVD++, which takes implicit feedback into consideration to deal with the sparsity problem; in many practical situations, recommender systems should be centered on implicit feedback [3]. As in AutoSVD, we decompose the item latent vectors into two parts. AutoSVD++ is formulated as follows:

$$\hat{r}_{ui} = \mu + b_u + b_i + (\beta \cdot cae(C_i) + \varepsilon_i)^T \Big( U_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) \qquad (7)$$

Figure 1 illustrates the structure of AutoSVD and AutoSVD++; a short sketch of both prediction rules follows.
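This minimal sketch implements Eqs. (6) and (7), assuming `cae_C_i` is the encoder output from Section 2.3; the decomposition of the item vector into the scaled CAE features and the offset is the point being illustrated:

```python
import numpy as np

def predict_autosvd(mu, b_u, b_i, U_u, eps_i, cae_C_i, beta):
    """Eq. (6): item vector = beta * cae(C_i) + latent offset eps_i."""
    V_i = beta * cae_C_i + eps_i
    return mu + b_u + b_i + V_i @ U_u

def predict_autosvdpp(mu, b_u, b_i, U_u, eps_i, cae_C_i, beta, y, N_u):
    """Eq. (7): additionally augments the user factor with implicit feedback."""
    V_i = beta * cae_C_i + eps_i
    implicit = np.sum(y[N_u], axis=0) / np.sqrt(len(N_u))
    return mu + b_u + b_i + V_i @ (U_u + implicit)
```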
3.3 Optimization

We learn the parameters by minimizing the regularized squared error loss on the training set:

$$\min_{b_*, \varepsilon_*, U_*, y_*} \sum_{(u,i) \in K} (r_{ui} - \hat{r}_{ui})^2 + \lambda \cdot f_{reg} \qquad (8)$$

where $f_{reg}$ is the regularization term used to prevent overfitting. The $f_{reg}$ for AutoSVD++ is as follows:

$$f_{reg} = b_u^2 + b_i^2 + \|\varepsilon_i\|^2 + \|U_u\|^2 + \sum_{j \in N(u)} \|y_j\|^2 \qquad (9)$$

The regularization for AutoSVD is identical to that of AutoSVD++ with the implicit factor $y_j$ removed.

In this paper, we adopt a sequential optimization approach: we first obtain the high-level feature representations from the CAE, and then integrate them into the AutoSVD and AutoSVD++ models. An alternative approach, which optimizes the CAE and AutoSVD (AutoSVD++) simultaneously, could also be applied [6]. However, the latter approach needs to recompute all item content feature vectors whenever a new item arrives, while in the sequential setting, item feature representations only need to be computed once and stored for reuse.

The model parameters are learned by stochastic gradient descent (SGD). First, SGD computes the prediction error:

$$e_{ui} \overset{def}{=} r_{ui} - \hat{r}_{ui} \qquad (10)$$

and then modifies the parameters by moving in the opposite direction of the gradient, looping over all known ratings in $K$. The update rules for AutoSVD are:

$$b_u \leftarrow b_u + \gamma_1 (e_{ui} - \lambda_1 \cdot b_u) \qquad (11)$$
$$b_i \leftarrow b_i + \gamma_1 (e_{ui} - \lambda_1 \cdot b_i) \qquad (12)$$
$$\varepsilon_i \leftarrow \varepsilon_i + \gamma_2 (e_{ui} \cdot U_u - \lambda_2 \cdot \varepsilon_i) \qquad (13)$$
$$U_u \leftarrow U_u + \gamma_2 (e_{ui} \cdot (\beta \cdot cae(C_i) + \varepsilon_i) - \lambda_2 \cdot U_u) \qquad (14)$$

The update rules for AutoSVD++ are:

$$\varepsilon_i \leftarrow \varepsilon_i + \gamma_2 \Big( e_{ui} \cdot \big( U_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \big) - \lambda_2 \cdot \varepsilon_i \Big) \qquad (15)$$
$$\forall j \in N(u): \quad y_j \leftarrow y_j + \gamma_2 \big( e_{ui} \cdot |N(u)|^{-\frac{1}{2}} \cdot (\beta \cdot cae(C_i) + \varepsilon_i) - \lambda_2 \cdot y_j \big) \qquad (16)$$

where $\gamma_1$ and $\gamma_2$ are the learning rates, and $\lambda_1$ and $\lambda_2$ are the regularization weights. The update rule for $U_u$ in AutoSVD++ is identical to Equation (14).
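For concreteness, a single SGD step for AutoSVD (Eqs. (10)-(14)) can be sketched as follows; the function signature and names are our own illustrative assumptions:

```python
import numpy as np

def sgd_step_autosvd(r_ui, mu, b_u, b_i, U_u, eps_i, cae_C_i,
                     beta, gamma1, gamma2, lam1, lam2):
    V_i = beta * cae_C_i + eps_i                     # item vector of Eq. (6)
    e_ui = r_ui - (mu + b_u + b_i + V_i @ U_u)       # Eq. (10)
    b_u   += gamma1 * (e_ui - lam1 * b_u)            # Eq. (11)
    b_i   += gamma1 * (e_ui - lam1 * b_i)            # Eq. (12)
    eps_i += gamma2 * (e_ui * U_u - lam2 * eps_i)    # Eq. (13)
    U_u   += gamma2 * (e_ui * V_i - lam2 * U_u)      # Eq. (14)
    return b_u, b_i, eps_i, U_u
```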
Although AutoSVD++ can easily incorporate implicit information, updating the parameters $y$ is very costly. To accelerate the training process, similar to [13], we devise an efficient training algorithm that significantly decreases the computation time of AutoSVD++ while preserving good performance. It is shown in Algorithm 1.

Algorithm 1: Efficient training algorithm for AutoSVD++
1: procedure UpdateParameters
2:   for all users u do
3:     p_im ← |N(u)|^{-1/2} · Σ_{j∈N(u)} y_j
4:     p_old ← p_im
5:     for all training samples of user u do
6:       update other parameters
7:       p_im ← p_im + γ_2 (e_ui · (β · cae(C_i) + ε_i) − λ_2 · p_im)
8:     end for
9:     for all items i rated by u do
10:      y_i ← y_i + |N(u)|^{-1/2} · (p_im − p_old)
11:    end for
12:  end for
13: end procedure
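The following sketch shows the core trick of Algorithm 1: rather than updating every $y_j$ for each rating, it accumulates changes to the shared implicit component $p$ and folds them back into the $y_j$ once per user. It is a sketch under assumed data structures (`user_ratings`, `eps`, `cae_C` are hypothetical names), not the paper's code:

```python
import numpy as np

def train_user_efficient(user_ratings, N_u, y, mu, b_u, b_i, U_u, eps, cae_C,
                         gamma2, lam2, beta):
    """user_ratings: list of (item, rating) pairs for a single user u."""
    norm = 1.0 / np.sqrt(len(N_u))
    p = norm * np.sum(y[N_u], axis=0)            # line 3: shared implicit part
    p_old = p.copy()                             # line 4
    for i, r_ui in user_ratings:                 # lines 5-8
        V_i = beta * cae_C[i] + eps[i]
        e_ui = r_ui - (mu + b_u + b_i[i] + V_i @ (U_u + p))
        # ... update b_u, b_i[i], eps[i], U_u here as in Eqs. (11)-(15) ...
        p += gamma2 * (e_ui * V_i - lam2 * p)    # line 7: update p once per rating
    y[N_u] += norm * (p - p_old)                 # lines 9-11: distribute the change
```

This reduces the per-user cost of the implicit updates from one pass over $N(u)$ per rating to a single pass per user, which is the source of the roughly $\bar{R}$-fold speedup reported in Section 4.2.2.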
4 EXPERIMENTS

In this section, extensive experiments are conducted on three real-world datasets to demonstrate the effectiveness of our proposed models.

4.1 Experimental Setup
4.1.1 Dataset Description. We evaluate the performance of our AutoSVD and AutoSVD++ models on three publicly accessible datasets. MovieLens (https://grouplens.org/datasets/movielens) is a movie rating dataset that has been widely used to evaluate CF algorithms; we use the two stable benchmark datasets, MovieLens-100K and MovieLens-1M. MovieTweetings [1] is a newer movie rating dataset collected from social media (Twitter). It consists of realistic, up-to-date ratings from Twitter users for recent and popular movies. Unlike the former two datasets, the rating scale of MovieTweetings is 1-10, and it is extremely sparse. The content information for MovieLens-100K consists of genres, years, countries, and languages, crawled from the IMDB website (http://www.imdb.com). For MovieLens-1M and MovieTweetings, we use genres and years as the content information. The detailed statistics of the three datasets are summarized in Table 1.

Table 1: Datasets Statistics

dataset           #items   #users   #ratings   density(%)
MovieLens-100K    1682     943      100000     6.30
MovieLens-1M      3706     6040     1000209    4.46
MovieTweetings    27851    48852    603401     0.049
4.1.2 Evaluation Metrics. We employ the widely used Root Mean Squared Error (RMSE) as the evaluation metric for measuring prediction accuracy. It is defined as

$$RMSE = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} (\hat{r}_{ui} - r_{ui})^2} \qquad (17)$$

where $|T|$ is the number of ratings in the test set, $\hat{r}_{ui}$ denotes the predicted rating, and $r_{ui}$ is the ground truth.
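A one-line sketch of Eq. (17), assuming the test set is an iterable of (u, i, r_ui) triples and `predict` is any of the prediction functions sketched above:

```python
import numpy as np

def rmse(test_set, predict):
    """predict(u, i) -> r_hat_ui; test_set: iterable of (u, i, r_ui) triples."""
    return np.sqrt(np.mean([(predict(u, i) - r) ** 2 for u, i, r in test_set]))
```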
4.2 Evaluation Results
4.2.1 Overall Comparison. Besides three baseline methods, namely NMF, PMF, and Biased SVD, four recent models closely relevant to our work are included in our comparison:

• RBM-CF [9]: a generative, probabilistic collaborative filtering model based on restricted Boltzmann machines.
• NNMF (3HL) [2]: a model that combines a three-layer feed-forward neural network with traditional matrix factorization.
• mSDA-CF [6]: a model that combines PMF with marginalized stacked denoising auto-encoders.
• U-AutoRec [10]: a CF model based on the auto-encoder paradigm. Same as [10], we set the number of hidden units to 500.

We use the following hyper-parameter configuration for AutoSVD in this experiment: γ_1 = γ_2 = 0.01, λ_1 = λ_2 = 0.1, and β = 0.1. For AutoSVD++, we set γ_1 = γ_2 = 0.007, λ_1 = 0.005, λ_2 = 0.015, and β = 0.1. For all compared models, we set the dimension of the latent factors to k = 10 where applicable. We run each experiment five times and report the average RMSE.
Table 2: Average RMSE for MovieLens-100K and MovieLens-1M from the compared models with different training-data percentages (* indicates results not reported).

ML-100K
Methods       90%     50%
NMF           0.915   0.927
PMF           0.883   0.890
U-AutoRec     0.874   0.911
RBM-CF        0.854   0.901
Biased SVD    0.876   0.889
SVD++         0.855   0.884
AutoSVD       0.848   0.877
AutoSVD++     0.864   0.875

ML-1M
Methods       90%     50%
NMF           0.958   0.997
PMF           0.952   0.977
NNMF (3HL)    0.907   *
mSDA-CF       *       0.931
Biased SVD    0.911   0.936
SVD++         0.913   0.938
AutoSVD       0.901   0.926
AutoSVD++     0.904   0.925

According to the evaluation results in Table 2 and Figure 2(a), our proposed models AutoSVD and AutoSVD++ consistently achieve better performance than the baselines and the compared recent methods. On the ML-100K dataset, AutoSVD performs slightly better than AutoSVD++, while on the other two datasets, AutoSVD++ outperforms the other approaches.

4.2.2 Scalability. Figure 2(b) shows the CPU time comparison on a log scale. Compared with traditional SVD++ and the original AutoSVD++, our efficient training algorithm achieves a significant reduction in training time. In general, the optimized AutoSVD++ runs $\bar{R}$ times faster than the original AutoSVD++, where $\bar{R}$ denotes the average number of items rated by each user [13]. Meanwhile, compared with the biased SVD model, the incorporated item terms $cae(C_i)$ and offsets $\varepsilon_i$ do not drag down training efficiency. These results show that our proposed models can easily be scaled up to larger datasets without sacrificing performance or computational cost.

[Figure 2: (a) Average RMSE comparison on the MovieTweetings dataset over training sizes from 10% to 90% (the lower the better), for Biased SVD, SVD++, AutoSVD, and AutoSVD++. (b) Comparison of the training time (log t [ms]) of one epoch on MovieLens-100K (the lower the better), for Biased SVD, SVD++, AutoSVD, Original AutoSVD++, and Efficient AutoSVD++.]
5 CONCLUSIONS AND FUTURE WORK

In this paper, we presented two efficient hybrid CF models, namely AutoSVD and AutoSVD++. They learn item content representations through a CAE, and AutoSVD++ further incorporates implicit feedback. We also devised an efficient training algorithm for AutoSVD++, which significantly speeds up the training process. We conducted a comprehensive set of experiments on three real-world datasets; the results show that our proposed models perform better than the compared recent works.

There are several extensions to our model that we are currently pursuing:

• First, we will leverage the abundant item content information, such as textual and visual information, and obtain richer feature representations through stacked contractive auto-encoders;
• Second, we can further improve the proposed model by incorporating temporal dynamics and social network information.
REFERENCES
[1] Simon Dooms, Toon De Pessemier, and Luc Martens. 2013. MovieTweetings: a Movie Rating Dataset Collected From Twitter. In Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys 2013.
[2] Gintare Karolina Dziugaite and Daniel M Roy. 2015. Neural network matrix factorization. arXiv preprint arXiv:1511.06443 (2015).
[3] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 263–272.
[4] Alexandros Karatzoglou, Balázs Hidasi, Domonkos Tikk, Oren Sar-Shalom, Haggai Roitman, and Bracha Shapira. 2016. RecSys '16 Workshop on Deep Learning for Recommender Systems (DLRS). In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 415–416.
[5] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 426–434.
[6] Sheng Li, Jaya Kawale, and Yun Fu. 2015. Deep collaborative filtering via marginalized denoising auto-encoder. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 811–820.
[7] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. 2010. Recom- mender Systems Handbook. Springer-Verlag New York, Inc., NY, USA.
[8] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. 2011. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th international conference on machine learning (ICML-11). 833–840.
[9] Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. 2007. Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th international conference on Machine learning. ACM, 791–798.
[10] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web. ACM, 111–112.
[11] Chong Wang and David M Blei. 2011. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 448–456.
[12] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative deep learning for recommender systems. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1235–1244.
[13] Diyi Yang, Tianqi Chen, Weinan Zhang, Qiuxia Lu, and Yong Yu. 2012. Local implicit feedback mining for music recommendation. In Proceedings of the sixth ACM conference on Recommender systems. ACM, 91–98.