Heterogeneous Information Network Embedding for Recommendation
Chuan Shi, Member, IEEE, Binbin Hu, Wayne Xin Zhao Member, IEEE and Philip S. Yu, Fellow, IEEE
Abstract—Due to the flexibility in modelling data heterogeneity, heterogeneous information network (HIN) has been adopted to characterize complex and heterogeneous auxiliary data in recommender systems, called HIN based recommendation. It is challenging to develop effective methods for HIN based recommendation in both extraction and exploitation of the information from HINs. Most of HIN based recommendation methods rely on path based similarity, which cannot fully mine latent structure features of users and items. In this paper, we propose a novel heterogeneous network embedding based approach for HIN based recommendation, called HERec. To embed HINs, we design a meta-path based random walk strategy to generate meaningful node sequences for network embedding. The learned node embeddings are first transformed by a set of fusion functions, and subsequently integrated into an extended matrix factorization (MF) model. The extended MF model together with fusion functions are jointly optimized for the rating prediction task. Extensive experiments on three real-world datasets demonstrate the effectiveness of the HERec model. Moreover, we show the capability of the HERec model for the cold-start problem, and reveal that the transformed embedding information from HINs can improve the recommendation performance.
Index Terms—Heterogeneous information network, Network embedding, Matrix factorization, Recommender system. 3
1
1 INTRODUCTION
IN recent years, recommender systems, which help users dis- cover items of interest from a large resource collection, have been playing an increasingly important role in various online ser- vices [1], [2]. Traditional recommendation methods (e.g., matrix factorization) mainly aim to learn an effective prediction function for characterizing user-item interaction records (e.g., user-item rat- ing matrix). With the rapid development of web services, various kinds of auxiliary data (a.k.a., side information) become available in recommender systems. Although auxiliary data is likely to contain useful information for recommendation [3], it is difficult to model and utilize these heterogeneous and complex information in recommender systems. Furthermore, it is more challenging to develop a relatively general approach to model these varying data in different systems or platforms.
As a promising direction, heterogeneous information network (HIN), consisting of multiple types of nodes and links, has been proposed as a powerful information modeling method [4]–[6]. Due to its flexility in modeling data heterogeneity, HIN has been adopted in recommender systems to characterize rich auxiliary data. In Fig. 1(a), we present an example for movie recommen- dation characterized by HINs. We can see that the HIN contains multiple types of entities connected by different types of relations. Under the HIN based representation, the recommendation problem can be considered as a similarity search task over the HIN [4]. Such a recommendation setting is called as HIN based recom- mendation [7]. HIN based recommendation has received much attention in the literature [7]–[12]. The basic idea of most existing HIN based recommendation methods is to leverage path based semantic relatedness between users and items over HINs, e.g., meta-path based similarities, for recommendation.
Although HIN based methods have achieved performance improvement to some extent, there are two major problems for these methods using meta-path based similarities. First, meta-path based similarities rely on explicit path reachability, and may not
be reliable to be used for recommendation when path connections are sparse or noisy. It is likely that some links in HINs are accidentally formed which do not convey meaningful semantics. Second, meta-path based similarities mainly characterize semantic relations defined over HINs, and may not be directly applicable to recommender systems. It is likely that the derived path based similarities have no explicit impact on the recommendation per- formance in some cases. Existing methods mainly learn a linear weighting mechanism to combine the path based similarities [11] or latent factors [7], which cannot learn the complicated map- ping mechanism of HIN information for recommendation. The two problems essentially reflect two fundamental issues for HIN based recommendation, namely effective information extraction and exploitation based on HINs for recommendation.
For the first issue, it is challenging to develop a way to effectively extract and represent useful information for HINs due to data heterogeneity. Unlike previous studies using meta- path based similarities [4], [7], our idea is to learn effective heterogeneous network representations for summarizing important structural characteristics and properties of HINs. Following [13], [14], we characterize nodes from HINs with low-dimensional vectors, i.e., embeddings. Instead of relying on explicit path con- nection, we would like to encode useful information from HINs with latent vectors. Compared with meta-path based similarity, the learned embeddings are in a more compact form that is easy to use and integrate. Also, the network embedding approach itself is more resistant to sparse and noisy data. However, most existing network embedding methods focus on homogeneous networks only consisting of a single type of nodes and links, and cannot directly deal with heterogeneous networks consisting of multiple types of nodes and links. Hence, we propose a new heteroge- neous network embedding method. Considering heterogeneous characteristics and rich semantics reflected by meta-paths, the proposed method first uses a random walk strategy guided by
arXiv:1711.10730v1 [cs.SI] 29 Nov 2017
2
e(1)
e(V) e(l)
(b1) Meta-path based (b2) Transform and fuse random walk
+
(a) An Example of HIN (b) HIN Embedding (c) Recomendation
Fig. 1. The schematic illustration of the proposed HERec approach.
meta-paths to generate node sequences. For each meta-path, we learn a unique embedding representation for a node by maximiz- ing its co-occurrence probability with neighboring nodes in the sequences sampled according to the given meta-path. We fuse the multiple embeddings w.r.t. different meta-paths as the output of HIN embedding.
After obtaining the embeddings from HINs, we study how to integrate and utilize such information in recommender systems. We don’t assume the learned embeddings are naturally applicable in recommender systems. Instead, we propose and explore three fusion functions to integrate multiple embeddings of a node into a single representation for recommendation, including simple linear fusion, personalized linear fusion and non-linear fusion. These fu- sion functions provide flexible ways to transform HIN embeddings into useful information for recommendation. Specially, we empha- size that personalization and non-linearity are two key points to consider for information transformation in our setting. Finally, we extend the classic matrix factorization framework by incorporating the fused HIN embeddings. The prediction model and the fusion function are jointly optimized for the rating prediction task.
By integrating the above two parts together, this work presents a novel HIN embedding based recommendation approach, called HERec for short. HERec first extracts useful HIN based informa- tion using the proposed HIN embedding method, and then utilizes the extracted information for recommendation using the extended matrix factorization model. We present the overall illustration for the proposed approach in Fig. 1. Extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed approach. We also verify the ability of HERec to alleviate cold- start problem and examine the impact of meta-paths on perfor- mance. The key contributions of this paper can be summarized as follows:
• We propose a heterogeneous network embedding method guided by meta-paths to uncover the semantic and structural information of heterogeneous information networks. Moreover, we propose a general embedding fusion approach to integerate different embeddings based on different meta-paths into a single representation.
• We propose a novel heterogeneous information network embedding for recommendation model, called HERec for short. HERec can effectively integrate various kinds of embedding information in HIN to enhance the recommendation performance. In addition, we design a set of three flexible fusion functions to effectively transform HIN embeddings into useful information for recommendation.
• Extensive experiments on three real-world datasets demon- strate the effectiveness of the proposed model. Moreover, we show the capability of the proposed model for the cold-start prediction problem, and reveal that the transformed embedding information from HINs can improve the recommendation performance.
The remainder of this paper is organized as follows. Section 2 introduces the related works. Section 3 describes notations used in the paper and presents some preliminary knowledge. Then, We propose the heterogeneous network embedding method and the HERec model in Section 4. Experiments and detailed analysis are reported in Section 5. Finally, we conclude the paper in Section 6.
2 RELATED WORK
In this section, we will review the related studies in three as- pects, namely recommender systems, heterogeneous information networks and network embedding.
In the literature of recommender systems, early works mainly adopt collaborative filtering (CF) methods to utilize historical interactions for recommendation [3]. Particularly, the matrix fac- torization approach [15], [16] has shown its effectiveness and efficiency in many applications, which factorizes user-item rating matrix into two low rank user-specific and item-specific matri- ces, and then utilizes the factorized matrices to make further predictions [2]. Since CF methods usually suffer from cold- start problem, many works [8], [17], [18] attempt to leverage additional information to improve recommendation performance. For example, Ma et al. [19] integrate social relations into matrix factorization in recommendation. Ling et al. [20] consider the in- formation of both ratings and reviews and propose a unified model to combine content based filtering with collaborative filtering for rating prediction task. Ye et al. [21] incorporate user preference, social influence and geographical influence in the recommendation and propose a unified POI recommendation framework. More recently, Sedhain et al. [22] explain drawbacks of three popular cold-start models [23]–[25] and further propose a learning based approach for the cold-start problem to leverage social data via randomised SVD. And many works begin to utilize deep models (e.g., convolutional neural network, auto encoder) to exploit text information [26], image information [27] and network structure information [28] for better recommendation. In addition, there are also some typical frameworks focusing on incorporating auxiliary information for recommendation. Chen et al. [29] propose a typical SVDFeature framework to efficiently solve the feature based matrix factorization. And Rendle [30] proposes factorization
…
…
Group
Director
Author
3
User
Movie
Actor
(a) Douban Movie
Type User
Book Year
Publishe r
(b) Douban Book
Compliment
User
Category
Business City
(c) Yelp
Fig. 2. Network schemas of heterogeneous information networks for the used three datasets. In our task, users and items are our major focus, denoted by large-sized circles, while the other attributes are denoted by small-sized circles.
machine, which is a generic approach to combine the generality of feature engineering.
As a newly emerging direction, heterogeneous information network [5] can naturally model complex objects and their rich re- lations in recommender systems, in which objects are of different types and links among objects represent different relations [31]. And several path based similarity measures [4], [32], [33] are proposed to evaluate the similarity of objects in heterogeneous information network. Therefore, some researchers have began to be aware of the importance of HIN based recommendation. Wang et al. [8] propose the OptRank method to alleviate the cold- start problem by utilizing heterogeneous information contained in social tagging system. Furthermore, the concept of meta-path is introduced into hybrid recommender systems [34]. Yu et al. [7] utilize meta-path based similarities as regularization terms in the matrix factorization framework. Yu et al. [9] take advantage of different types of entity relationships in heterogeneous information network and propose a personalized recommendation framework for implicit feedback dataset. Luo et al. [35] propose a col- laborative filtering based social recommendation method using heterogeneous relations. More recently, Shi et al. [10] propose the concept of weighted heterogeneous information network and design a meta-path based collaborative filtering model to flexibly integrate heterogeneous information for personalized recommen- dation. In [11], [12], [36], the similarities of users and items are both evaluated by path based similarity measures under different semantic meta-paths and a matrix factorization based on dual regularization framework is proposed for rating prediction. Most of HIN based methods rely on the path based similarity, which may not fully mine latent features of users and items on HINs for recommendation.
On the other hand, network embedding has shown its potential in structure feature extraction and has been successfully applied in many data mining tasks [37], [38], such as classification [39], clustering [40], [41] and recommendation [42]. Deepwalk [13] combines random walk and skip-gram to learn network repre- sentations. Furthermore, Grover and Leskovec [14] propose a more flexible network embedding framework based on a biased random walk procedure. In addition, LINE [43] and SDNE [44] characterize the second-order link proximity, as well as neighbor relations. Cao et al. [45] propose the GraRep model to capture higher-order graph proximity for network representations. Besides leaning network embedding from only the topology, there are also many works [46]–[48] leveraging node content information and
other available graph information for the robust representations. Unfortunately, most of network embedding methods focus on homogeneous networks, and thus they cannot directly be applied for heterogeneous networks. Recently, several works [49]–[53] at- tempt to analyze heterogeneous networks via embedding methods. Particularly, Chang et al. [49] design a deep embedding model to capture the complex interaction between the heterogeneous data in the network. Xu et al. [51] propose a EOE method to encode the intra-network and inter-network edges for the coupled heterogeneous network. Dong et al. [53] define the neighbor of nodes via meta-path and learn the heterogeneous embedding by skip-gram with negative sampling. Although these methods can learn network embeddings in various heterogeneous network, their representations of nodes and relations may not be optimum for recommendation.
To our knowledge, it is the first attempt which adopts the network embedding approach to extract useful information from heterogeneous information network and leverage such information for rating prediction. The proposed approach utilizes the flexibility of HIN for modeling complex heterogeneous context information, and meanwhile borrows the capability of network embedding for learning effective information representation. The final rat- ing prediction component further incorporates a transformation mechanism implemented by three flexible functions to utilize the learned information from network embedding.
3 PRELIMINARY
A heterogeneous information network is a special kind of infor- mation network, which either contains multiple types of objects or multiple types of links.
Definition 1. Heterogeneous information network [54]. A HIN
is denoted as G = {V, E} consisting of a object set V and a link set E. A HIN is also associated with an object type mapping function φ : V → A and a link type mapping function ψ : E → R. A and R denote the sets of predefined object and link types, where |A| + |R| > 2.
The complexity of heterogeneous information network drives us to provide the meta level (e.g., schema-level) description for understanding the object types and link types better in the network. Hence, the concept of network schema is proposed to describe the meta structure of a network.
Definition 2. Network schema [31], [55]. The network schema is
denoted as S = (A, R). It is a meta template for an information
network G = {V, E} with the object type mapping φ : V → A and the link type mapping ψ : E → R, which is a directed graph defined over object types A, with edges as relations from R.
Example 1. As shown in Fig. 1(a), we have represented the setting of movie recommender systems by HINs. We further present its corresponding network schema in Fig. 2(a), consisting of multiple types of objects, including User (U), Movie (M), Di- rector (D). There exist different types of links between objects to represent different relations. A user-user link indicates the friendship between two users, while a user-movie link indi- cates the rating relation. Similarly, we present the schematic network schemas for book and business recommender systems in Fig. 2(b) and Fig. 2(c) respectively.
In HINs, two objects can be connected via different semantic paths, which are called meta-paths.
Definition 3. Meta-path [4]. A meta-path ρ is defined on a network schema S = (A,R) and is denoted as a path in
R1 R2 Rl
the form of A −→ A −→ ··· −→ A (abbreviated
Notation
G V E S A R U I ru,i ev Nu ρ
P
e(U),e(I) ui
d
D xu , yi
γ(U), γ(I) ui α, β M(l)
TABLE 1 Notations and explanations.
Explanation
heterogeneous information network object set
link set
network schema
object type set
link type set
user set
item set
predicted rating user u gives to item i low-dimensional representation of node v neighborhood of node u
a meta-path
meta-path set
final representations of user u, item i dimension of HIN embeddings dimension of latent factors latent factors of user u, item i
latent factors for pairing HIN embedding of user u, item i parameters for integrating HIN embeddings transformation matrix w.r.t the l-th mete-path
bias vector w.r.t the l-th mete-path preference weight of user u over the l-th meta-path
parameters of fusion functions for users, items regularization parameter
learning rate
4
12 l+1 u
as A1 A2 · · · Al+1 ), which describes a composite relation R = R1 ◦R2 ◦···◦Rl between object A1 and Al+1, where ◦ denotes the composition operator on relations.
Example 2. Taking Fig. 2(a) as an example, two objects can be connected via multiple meta-paths, e.g., “User – User” (UU) and “User – Movie – User” (UMU). Different meta- paths usually convey different semantics. For example, the U U path indicates friendship between two users, while the UMU path indicates the co-watch relation between two users, i.e., they have watched the same movies. As will be seen later, the detailed meta-paths used in this work is summarized in Table 3.
Recently, HIN has become a mainstream approach to model various complex interaction systems [5]. Specially, it has been adopted in recommender systems for characterizing complex and heterogenous recommendation settings.
Definition 4. HIN based recommendation. In a recommender system, various kinds of information can be modeled by a HIN G = {V, E}. On recommendation-oriented HINs, two kinds of entities (i.e., users and items) together with the relations between them (i.e., rating relation) are our focus. Let U ⊂ V and I ⊂ V denote the sets of users and items respectively, a triplet ⟨u, i, ru,i ⟩ denotes a record that a user u assigns a rating of ru,i to an item i, and R = {⟨u, i, ru,i ⟩} denotes the set of rating records. We have U ⊂ V, I ⊂ V and R ⊂ E. Given the HIN G = {V, E}, the goal is to predict the rating score ru,i′ of u ∈ U to a non-rated item i′ ∈ I.
Several efforts have been made for HIN based recommenda- tion. Most of these works mainly leverage meta-path based simi- larities to enhance the recommendation performance [7], [9]–[11]. Next, we will present a new heterogeneous network embedding based approach to this task, which is able to effectively exploit the information reflected in HINs. The notations we will use throughout the article are summarized in Table 1.
Θ(U), Θ(I) λ
η
b(l) w(l)
4 THE PROPOSED APPROACH
In this section, we present a Heterogeneous network Embedding based approach for Recemendation, called HERec. For addressing the two issues introduced in Section 1, the proposed HERec approach consists of two major components. First, we propose a new heterogeneous network embedding method to learn the user/item embeddings from HINs. Then, we extend the classic matrix factorization framework by incorporating the learned em- beddings using a flexible set of fusion functions. We present an overall schematic illustration of the proposed approach in Fig. 1. After the construction of the HINs (Fig. 1(a)), two major steps are presented, namely HIN embedding (Fig. 1(b)) and recommenda- tion (Fig. 1(c)). Next, we will present detailed illustration of the proposed approach.
4.1 Heterogeneous Network Embedding
Inspired by the recent progress on network embedding [13], [14], we adopt the representation learning method to extract and represent useful information of HINs for recommendation. Given a HIN G = {V, E}, our goal is to learn a low-dimensional repre- sentation ev ∈ Rd (a.k.a., embedding) for each node v ∈ V. The learned embeddings are expected to highly summarize informative characteristics, which are likely to be useful in recommender systems represented on HINs. Compared with meta-path based similarity [4], [10], it is much easier to use and integrate the learned representations in subsequent procedures.
However, most of the existing network embedding methods mainly focus on homogeneous networks, which are not able to effectively model heterogeneous networks. For instance, the pioneering study deepwalk [13] uses random walk to generate node sequences, which cannot discriminate nodes and edges with different types. Hence, it requires a more principled way to traverse the HINs and generate meaningful node sequences.
User Movie Director
u4 u5
UMU
u1 m1 u2 m2 u3 m2 u4
… Filtering
… u1 u2 u3 u4
… u1 u3
m1 m2 m3
constructed using meta-paths with heterogeneous types, the final representations are learned using homogeneous neighborhood. We embed the nodes with the same type in the same space, which relaxes the challenging goal of representing all the heterogeneous objects in a unified space. Second, given a fixed-length window, a node is able to utilize more homogeneous neighbors that are more likely to be relevant than others with different types.
Example 4. As shown in Fig. 3, in order to learn effective representations for users and items, we only consider the meta-paths in which the starting type is user type or item type. In this way, we can derive some meta-paths, such as UMU, UMDMU and MUM. Take the meta-path of UMU as an instance. We can generate a sampled sequence “u1 →m1 →u2 →m2 →u3 →m2 →u4”according to Eq. 1. Once a sequence has been constructed, we further remove the nodes with a different type compared with the starting node. In this way, we finally obtain a homogeneous node sequence “u1 → u2 → u3 → u4”.
The connections between homogeneous nodes are essentially constructed via the heterogeneous neighborhood nodes. After this step, our next focus will be how to learn effective representations for homogeneous sequences.
4.1.3 Optimization Objective
Given a meta-path, we can construct the neighborhood Nu for node u based on co-occurrence in a fixed-length window. Follow- ing node2vec [14], we can learn the representations of nodes to optimize the following objective:
max logPr(Nu|f(u)), (2) f u∈V
where f : V → Rd is a function (aiming to learn) mapping each node to d-dimensional feature space, and Nu ⊂ V represents the neighborhood of node u, w.r.t. a specific meta-path. We can learn the embedding mapping function f(·) by applying stochastic gra- dient descent (SGD) to optimize this objective. A major difference between previous methods and ours lies in the construction of Nu. Our method selects homogeneous neighbors using meta-path based random walks. The whole algorithm framework is shown in Algorithm 1.
4.1.4 Embedding Fusion
Heterogeneous network embedding provides a general way to extract useful information from HINs. For our model, given a node
(l) |P|
v ∈ V, we can obtain a set of representations {ev }l=1, where P
(l)
denotes the set of meta-paths, and ev denotes the representation
of v w.r.t. the l-th meta-path. It requires a principled fusion way to transform node embeddings into a more suitable form that is useful to improve recommendation performance. Existing studies usually adopt a linear weighting mechanism to combine the information mined from HINs (e.g., meta-path based similarities), which may not be capable of deriving effective information representations for recommendation. Hence, we propose to use a general function g(·), which aims to fuse the learned node embeddings for users and items:
e(U) ← g({e(l)}), (3) uu
e(I) ← g({e(l)}), ii
5
u1 u2
m1 d1 u3 m2 d2
UMDMU
…
u1 m1 d2 m2 u3
d3
m3
MDM … m1 d2 m2 d3 m3
…
Fig. 3. An illustrative example of the proposed meta-path based random walk. We first perform random walks guided by some selected meta- paths, and then filter the node sequences not with the user type or item type.
4.1.1 Meta-path based Random Walk
To generate meaningful node sequences, the key is to design an
effective walking strategy that is able to capture the complex
semantics reflected in HINs. In the literature of HINs, meta-path
is an important concept to characterize the semantic patterns for
HINs [4]. Hence, we propose to use the meta-path based random
walk method to generate node sequences. Giving a heterogeneous
R1 Rt network G = {V,E} and a meta-path ρ : A1 −→ ···At −→
Rl
At+1 ··· −→ Al+1, the walk path is generated according to the
following distribution:
P(nt+1 = x|nt = v,ρ) 1
(1)
=
|NAt+1 (v)| 0,
,
(v, x) ∈ E and φ(x) = A
otherwise,
t+1
;
where nt is the t-th node in the walk, v has the type of At, and N(At+1)(v) is the first-order neighbor set for node v with the type of At+1. A walk will follow the pattern of a meta-path repetitively until it reaches the pre-defined length.
Example 3. We still take Fig. 1(a) as an example, which repre- sents the heterogeneous information network of movie recom- mender systems. Given a meta-path U M U , we can generate two sample walks (i.e., node sequences) by starting with the user node of Tom: (1) TomUser → The TerminatorMovie → MaryUser, and (2) TomUser → AvaterMovie → BobUser → The TerminatorM ovie → MaryU ser . Similarly, given the meta- path U M DM U , we can also generate another node sequence: TomUser → The TerminatorMovie → CameronDirector → AvaterM ovie → MaryU ser . It is intuitive to see that these meta-paths can lead to meaningful node sequences correspond- ing to different semantic relations.
4.1.2 Type Constraint and Filtering
Since our goal is to improve the recommendation performance, the main focus is to learn effective representations for users and items, while objects with other types are of less interest in our task. Hence, we only select meta-paths starting with user type or item type. Once a node sequence has been generated using the above method, it is likely to contain nodes with different types. We further remove the nodes with a type different from the starting type. In this way, the final sequence will only consist of nodes with the starting type. Applying type filtering to node sequences has two benefits. First, although node sequences are
Algorithm 1 HIN embedding algorithm for a single meta-path. Input: the heterogeneous information network G = {V, E}; the
(U) (I)
HIN embeddings eu and ei respectively, and α and β are the
tuning parameters to integrate the three terms. For Eq. 5, we need to note two points. First, e(U) and e(I) are the output of function
1: 2: 3: 4: 5: 6: 7: 8: 9:
10: 11: 12: 13: 14: 15: 16: 17:
path, denoted by e
Initialize e by standard normal distribution; paths = []; foreachv∈Vandφ(v)==At do
for i = 1 to r do path = [];
while wl > 0 do
walk to node x according to Eq. 1; if φ(x) == At then
append node x into path;
wl ← wl − 1; end if
end while
Add path to paths; end for
end for
e=SGD(paths,d,ns); return e.
•
Simple linear fusion. We assume each user has the same preference for each meta-path, and therefore, we assign each meta-path with a unified weight (i.e., average value) for each user. Moreover, we linearly transform embeddings to target space.
given meta-path ρ; the target node type A ; the dimension of tui
embedding d; the walk length wl; the neighborhood size ns;
the number of walks per node r.
Output: The embedding of target node type w.r.t the single meta-
g(·) in Eq. 3. We assume that the derived embeddings after transformation by function g(·) are applicable in MF. Second, we
(U) (I)
don’t directly pair eu with ei . Recall the proposed embedding
method indeed characterizes the relatedness between objects with
(U) (I) the same type. We incorporate new latent factors γu and γi
(U) (I)
to relax the assumption that eu and ei have to be in the same
space, which increases the flexility to the prediction model.
4.2.2 Setting the Fusion Function
Previously, we assume the function g(·) has been given in a general form. Now, we study how to set the fusion function, which transforms HIN embeddings into a form that is useful in recommender systems. We only discuss the function for fusing user embeddings, and it is similar to fuse item embeddings. We propose to use three fusion functions to integrate embeddings:
6
(U) (I) where eu and ei
1 |P|
g({e(l)}) = (M(l)e(l) + b(l)), (6)
are the final representations for a user u and an item i respectively, called HIN embeddings. Since users and items are our focus, we only learn the embeddings for users and items. At the current stage, we do not specify the form of function g(·). Instead, we believe that a good fusion function should be learned according to the specific task. Hence, we leave the formulation and optimization of the fusion function in our
recommendation model.
4.2 Integrating Matrix Factorization with Fused HIN Embedding for Recommendation
Previously, we have studied how to extract and represent useful in- formation from HINs for recommendation. With HIN embedding,
(U)
we can obtain user embeddings {eu }u∈U and item embeddings
u |P| u l=1
w(l)(M(l)e(l) + b(l)), (7) (I) uuu
{ei }i∈I, which are further specified by a function g(·) that is to learn. Now we study how to utilize the learned embeddings for recommendation.
4.2.1 Rating Predictor
We build our rating predictor based on the classic matrix factoriza- tion (MF) model [56], which factorizes the user-item rating matrix into user-specific and item-specific matrices. In MF, the rating of a user u on an item i is simply defined as follows:
r = x⊤ · y , (4) u,i u i
where xu ∈ RD and yi ∈ RD denote the latent factors corresponding to user u and item i. Since we have also obtained the representations for user u and item i, we further incorporate them into the rating predictor as below
(l) where wu
l=1
is the preference weight of user u over the l-th
r =x⊤ ·y +α·e(U)⊤ ·γ(I) +β·γ(U)⊤ ·e(I), u,i u i u i u i
(5)
4.2.3
where σ(·) is a non-linear function, i.e., sigmoid function in our work. Although we only use two non-linear trans- formations, it is flexible to extend to multiple non-linear layers, e.g., Multi-Layer Perceptrons.
Model Learning
(U) (I) (U) (I) where eu and ei are the fused embeddings, γu and γi
are user-specific and item-specific latent factors to pair with the
We blend the fusion function into matrix factorization framework for learning the parameters of the proposed model. The objective can be formulated as follows:
•
where P is the set of considered meta-paths, M(l) ∈ RD×d and b(l) ∈ RD are the transformation matrix and bias vector w.r.t. the l-th meta-path.
Personalized linear fusion. The simple linear fusion cannot model users’ personalized preference over the meta-paths. So we further assign each user with a weight vector on meta-paths, representing user’s personalized preference for each meta-path. It is more reasonable that each user has his/her personal interest preference in many real appli- cations.
•
meta-path.
Personalized non-linear fusion. Linear fusion has limited expressive power in modeling complex data relations. Hence, we use non-linear function to enhance the fusion ability.
g({e(l)}) =
|P|
|P|
g({e(l)}) = σ w(l)σM(l)e(l) + b(l)
uuu l=1
, (8)
Θ(U) u,l
←
(U) Θ(U) − η · (−α(ru,i − ru,i)γ(I) ∂eu
+ λΘΘ(U)), u,l
(I) i set Pl according to Algorithm 1;
6: end for
7: Initialize x, y, γ(U), γ(I), Θ(U), Θ(I) by standard
distribution;
8: while not convergence do
9: Randomly select a triple ⟨u, i, ru,i ⟩ ∈ R;
10: Update x , y by typical MF; ui
11: forl=1to|P(U)|do
u
+ ∥γ(U)∥ + ∥γ(I)∥ + ∥Θ(U)∥ + ∥Θ(I)∥ ), (9)
⟨u,i,ru,i ⟩∈R u2i222
factors to pair HIN embedding of users and items, γ(U) and (I)
where r is the predicted rating using Eq. 5 by the proposed u,i
model, λ is the regularization parameter, and Θ(U) and Θ(I) are the parameters of the function g(·) for users and items respec- tively. We adopt SGD to efficiently optimize the final objective. The update of original latent factors {xu} and {yi} are the same as that of standard MF in Eq. 4. The parameters of the proposed model will be updated as follows:
γ ; the parameters of the fusion function for users and items, Θ(U) and Θ(I)
γ(U) ← γ(U) − η · (−β(ru,i − ru,i)e(I) + λγγ(U)), uuiu
Θ(I) i,l
←
Θ(I) − η · (−β(ru,i − ru,i)γ(U) i,l u
∂e(I) i
(I) ∂ Θi,l
+ λΘΘ(I)), i,l
(10) (11)
normal
1: forl=1to|P(U)|do
2: Obtainusers’embeddings{eu }basedonmeta-pathPl
according to Algorithm 1; 3: end for
4: forl=1to|P(I)|do
5: Obtain items’ embeddings {e(l)} based on the meta-path
(l) (U)
7
£= (r −r)2+λ(∥x∥+∥y∥ u,i u,i u2 i2
Algorithm 2 The overall learning algorithm of HERec.
Input: the rating matrix R; the learning rate η; the adjustable parameters α,β; the regularization parameter λ; the meta-
path sets for users and items, P(U) and P(I).
Output: the latent factors for users and items, x and y; the latent
u,l i ∂Θ(U) u,l
(12) γi ← γi −η·(−α(ru,i−ru,i)eu +λγγi ), (13)
where η is the learning rate, λΘ is the regularization for param- eters Θ(U ) and Θ(I ) , and λγ is the regularization for parameters
∂e(U)
u by Eq. 14;
γ
(U) (I)
and γ . In our work, we utilize the sigmod function
(I)
(I) (U)
(I)
∂Θ(U) u,l
for non-linear transformation, and we can take advantage of the
properties of sigmod function for ease of derivative calculation. It
is worth noting that the symbol Θ denotes all the parameters in the
fusion function, and the calculation of ∂ei will be different for ∂ Θi,l
different parameters in Θ. Next, we present the detailed derivation
∂Θ(I) i,l
12: Calculate
13: Update Θ(U) by Eq. 10;
15: Update γu by Eq. 11; 16: forl=1to|P(I)|do
∂e(I)
17: Calculate i by Eq. 14;
18: Update Θ(I) by Eq. 12; i,l
19: end for
20: Update γ (I ) by Eq. 13;
21: end while
22: returnx,y,γ(U),γ(I),Θ(U),Θ(I).
u,l 14: endfor (U)
of
∂ei for personalized non-linear fusion function ∂ Θi,l
∂ei = ∂Θi,l
w(l)σ(Z )σ(Z )(1 − σ(Z ))(1 − σ(Z ))e(l), isfsfi
i
(14)
Θ = M;
Θ = b;
the above way for both users and items. We omit the derivations for linear fusion functions, since it is relatively straightforward. The whole algorithm framework is shown in Algorithm 2. In lines 1-6, we perform HIN embedding to obtain the representations of users and items. And in lines 8-19, we adopt the SGD algorithm to optimize the parameters in the fusion function and rating function.
4.2.4 Complexity Analysis
HERec contains two major parts: (1) HIN embedding. The com- plexity of deepwalk is O(d · |V |), where d is the embedding dimension and |V| is the number of nodes in the network. Therefore it takes O(d · |U|) and O(d · |I|) to learn users’ and items’ embeddings according to a single meta-path, respectively. And the total complexity of HIN embedding is O(|P|·d·(|U|+|I|))
since the number of selected meta-paths is |P|. It is worth noting
that HIN embedding can be easily trained in parallel and we
will implement it using a multi-thread mode in order to improve
the efficiency of model. (2) Matrix factorization. For each triplet
(U) (I)
5 EXPERIMENTS
In this section, we will demonstrate the effectiveness of HERec by performing experiments on three real datasets compared to the state-of-the-art recommendation methods.
5.1 Evaluation Datasets
We adopt three widely used datasets from different domains, con- sisting of Douban Movie dataset 1 from movie domain, Douban
1 http://movie.douban.com
w(l)σ(Zs)σ(Zf )(1 − σ(Zs))(1 − σ(Zf )), i
(U) (I)
⟨u,i,r ⟩, updating x , y , γu , γ takes O(D) time, where
σ(Zs)σ(Zf )(1 − σ(Zs)),
where Zs = |P| w(l)σM(l)e(l) + b(l)
Θ = w, Zf =
u,i ui i
D is the number of latent factors. And updating Θu , Θi takes O(|P| · D · d) time to learn the transformation matrices M for all meta-paths. In the proposed approach, |P| is generally small, and d and D are at most several hundreds, which makes the proposed method efficient in large datasets. Specially, SGD has very good practice performance, and we have found that it has a fast convergence rate on our datasets.
and
M(l)e(l) + b(l). The derivations of ∂ei can be calculated in
l=1i i
i ∂Θi,l
TABLE 2
Statistics of the three datasets.
8
Dataset (Density)
Relations (A-B)
Number of A
Number of B
Number of (A-B)
Ave. degrees of A
Ave. degrees of B
Douban Movie (0.63%)
User-Movie
13,367
12,677
1,068,278
79.9
84.3
User-User
2,440
2,294
4,085
1.7
1.8
User-Group
13,337
2,753
570,047
42.7
207.1
Movie-Director
10,179
2,449
11,276
1.1
4.6
Movie-Actor
11,718
6,311
33,587
2.9
5.3
Movie-Type
12,678
38
27,668
2.2
728.1
Douban Book (0.27%)
User-Book
13,024
22,347
792,026
60.8
35.4
User-User
12,748
12,748
169,150
13.3
13.3
Book-Author
21,907
10,805
21,905
1.0
2.0
Book-Publisher
21,773
1,815
21,773
1.0
11.9
Book-Year
21,192
64
21,192
1.0
331.1
Yelp (0.08%)
User-Business
16,239
14,284
198,397
12.2
13.9
User-User
10,580
10,580
158,590
15.0
15.0
User-Compliment
14,411
11
76,875
5.3
6988.6
Business-City
14,267
47
14,267
1.0
303.6
Business-Category
14,180
511
40,009
2.8
78.3
TABLE 3
The selected meta-paths for three datasets in our work.
Book dataset 2 from book domain, and Yelp dataset 3 from business domain. Douban Movie dataset includes 13,367 users and 12,677 movies with 1,068,278 movie ratings ranging from 1 to 5. Moreover, the dataset also includes social relations and attribute information of users and movies. As for the Douban Book dataset, it includes 13,024 users and 22,347 books with 792,026 ratings ranging from 1 to 5. Yelp dataset records user ratings on local businesses and contains social relations and attribute information of businesses, which includes 16,239 users and 14,282 local businesses with 198,397 ratings raging from 1 to 5. The detailed descriptions of the three datasets are shown in Table 2, and the network schemas of the three datasets are plotted in Fig. 2. Besides the domain variation, these three datasets also have different rating sparsity degrees: the Yelp dataset is very sparse, while the Douban Movie dataset is much denser.
5.2 Metrics
We use the widely used mean absolute error (MAE) and root mean square error (RMSE) to measure the quality of recommendation of different models. The metrics MAE and RMSE are defined as follows:
MAE=1|r−r|,(15) |Dtest | i,j i,j
(r −r)2, (16) i,j i,j
where r is the actual rating that user i assigns to item j, r is i,j i,j
the predicted rating from a model, and Dtest denotes the test set of rating records. From the definition, we can see that a smaller MAE or RMSE value indicates a better performance.
5.3 Methods to Compare
We consider the following methods to compare:
• PMF [56]: It is the classic probabilistic matrix factorization
model by explicitly factorizing the rating matrix into two low- dimensional matrices.
• SoMF [19]: It incorporates social network information into the recommendation model. The social relations are characterized by a social regularization term, and integrated into the basic matrix factorization model.
• FMHIN : It is the context-aware factorization machine [57], which is able to utilize various kinds of auxiliary information. In our experiments, we extract heterogeneous information as context features and incorporate them into the factorization machine for rating prediction. We implement the model with the code from the authors 4.
• HeteMF [7]: It is a MF based recommendation method, which utilizes entity similarities calculated over HINs using meta- path based algorithms.
• SemRec [10]: It is a collaborative filtering method on weighted heterogeneous information network, which is con- structed by connecting users and items with the same ratings. It flexibly integrates heterogeneous information for recommendation by weighted meta-paths and a weight ensemble method. We implement the model with the code from the authors 5.
• DSR [12]: It is a MF based recommendation method with dual similarity regularization, which imposes the constraint on users and items with high and low similarities simultaneously.
•HERecdw: It is a variant of HERec, which adopts the homo- geneous network embedding method deepwalk [13] and ignores the heterogeneity of nodes in HINs. We use the code of deepwalk
6 provided by the authors .
4 http://www.libfm.org/
5 https://github.com/zzqsmall/SemRec 6 https://github.com/phanein/deepwalk
Dataset
Meta-paths
Douban Movie
UMU, UMDMU, UMAMU, UMTMU MUM, MAM, MDM, MTM
Douban Book
UBU, UBABU, UBPBU, UBYBU BUB, BPB, BYB
Yelp
UBU, UBCiBU, UBCaBU BUB, BCiB, BCaB
RMSE=
(i,j )∈Dtest 1
|Dtest| (i,j)∈Dtest
2 http://book.douban.com
3 http://www.yelp.com/dataset-challenge
TABLE 4
Results of effectiveness experiments on three datasets. A smaller MAE or RMSE value indicates a better performance. For ease of reading the results, we also report the improvement w.r.t. the PMF model for each other method. A larger improvement ratio indicates a better performance.
9
Dataset
Training
Metrics PMF
SoMF
FMHIN
HeteMF
SemRec
DSR HERecdw
HERecmp HERec
Douban Movie
80%
MAE 0.5741 Improve
0.5817 -1.32%
0.5696 +0.78%
0.5750 -0.16%
0.5695 +0.80%
0.5681 0.5703 +1.04% +0.66%
0.5515 0.5519 +3.93% +3.86%
RMSE 0.7641 Improve
0.7680 -0.07%
0.7248 +5.55%
0.7556 +1.53%
0.7399 +3.58%
0.7225 0.7446 +5.85% +2.97%
0.7121 0.7053 +7.20% +8.09%
60%
MAE 0.5867 Improve
0.5991 -2.11%
0.5769 +1.67%
0.5894 -0.46%
0.5738 +2.19%
0.5831 0.5838 +0.61% +0.49%
0.5611 0.5587 +4.36% +4.77%
RMSE 0.7891 Improve
0.7950 -0.75%
0.7842 +0.62%
0.7785 +1.34%
0.7551 +4.30%
0.7408 0.7670 +6.12% +2.80%
0.7264 0.7148 +7.94% +9.41%
40%
MAE 0.6078 Improve
0.6328 -4.11%
0.5871 +3.40%
0.6165 -1.43%
0.5945 +2.18%
0.6170 0.6073 -1.51% +0.08%
0.5747 0.5699 +5.44% +6.23%
RMSE 0.8321 Improve
0.8479 -1.89%
0.7563 +9.10%
0.8221 +1.20%
0.7836 +5.82%
0.7850 0.8057 +5.66% +3.17%
0.7429 0.7315 +10.71% +12.09%
20%
MAE 0.7247 Improve
0.6979 +3.69%
0.6080 +16.10%
0.6896 +4.84%
0.6392 +11.79%
0.6584 0.6699 +9.14% +7.56%
0.6063 0.5900 +16.33% +18.59%
RMSE 0.9440 Improve
0.9852 -4.36%
0.7878 +16.55%
0.9357 +0.88%
0.8599 +8.91%
0.8345 0.9076 +11.60% +3.86%
0.7877 0.7660 +16.56% +18.86%
Douban Book
80%
MAE 0.5774 Improve
0.5756 +0.31%
0.5716 +1.00%
0.5740 +0.59%
0.5675 +1.71%
0.5740 0.5875 +0.59% -1.75%
0.5591 0.5502 +3.17% +4.71%
RMSE 0.7414 Improve
0.7302 +1.55%
0.7199 +2.94%
0.7360 +0.77%
0.7283 +1.81%
0.7206 0.7450 +2.84% -0.44%
0.7081 0.6811 +4.53% +8.17%
60%
MAE 0.6065 Improve
0.5903 +2.67%
0.5812 +4.17%
0.5823 +3.99%
0.5833 +3.83%
0.6020 0.6203 +0.74% -2.28%
0.5666 0.5600 +6.58% +7.67%
RMSE 0.7908 Improve
0.7518 +4.93%
0.7319 +7.45%
0.7466 +5.59%
0.7505 +5.10%
0.7552 0.7905 +4.50% +0.04%
0.7318 0.7123 +7.46% +9.93%
40%
MAE 0.6800 Improve
0.6161 +9.40%
0.6028 +11.35%
0.5982 +12.03%
0.6025 +11.40%
0.6271 0.6976 +7.78% -2.59%
0.5954 0.5774 +12.44% +15.09%
RMSE 0.9203 Improve
0.7936 +13.77%
0.7617 +17.23%
0.7779 +15.47%
0.7751 +15.78%
0.7730 0.9022 +16.01% +1.97%
0.7703 0.7400 +16.30% +19.59%
20%
MAE 1.0344 Improve
0.6327 +38.83%
0.6396 +38.17%
0.6311 +38.99%
0.6481 +37.35%
0.6300 1.0166 +39.10% +1.72%
0.6785 0.6450 +34.41% +37.65%
RMSE 1.4414 Improve
0.8236 +42.86%
0.8188 +43.19%
0.8304 +42.39%
0.8350 +42.07%
0.8200 1.3205 +43.11% +8.39%
0.8869 0.8581 +38.47% +40.47%
Yelp
90%
MAE 1.0412 Improve
1.0095 +3.04%
0.9013 +13.44%
0.9487 +8.88%
0.9043 +13.15%
0.9054 1.0388 +13.04% +0.23%
0.8822 0.8395 +15.27% +19.37%
RMSE 1.4268 Improve
1.3392 +6.14%
1.1417 +19.98%
1.2549 +12.05%
1.1637 +18.44%
1.1186 1.3581 +21.60% +4.81%
1.1309 1.0907 +20.74% +23.56%
80%
MAE 1.0791 Improve
1.0373 +3.87%
0.9038 +16.25%
0.9654 +10.54%
0.9176 +14.97%
0.9098 1.0750 +15.69% +0.38%
0.8953 0.8475 +17.03% +21.46%
RMSE 1.4816 Improve
1.3782 +6.98%
1.1497 +22.40%
1.2799 +13.61%
1.1771 +20.55%
1.1208 1.4075 +24.35% +5.00%
1.1516 1.1117 +22.27% +24.97%
70%
MAE 1.1170 Improve
1.0694 +4.26%
0.9108 +18.46%
0.9975 +10.70%
0.9407 +15.78%
0.9429 1.1196 +15.59% -0.23%
0.9043 0.8580 +19.04% +23.19%
RMSE 1.5387 Improve
1.4201 +7.71%
1.1651 +24.28%
1.3229 +14.02%
1.2108 +21.31%
1.1582 1.4632 +24.73% +4.91%
1.1639 1.1256 +24.36% +26.85%
60%
MAE 1.1778 Improve
1.1135 +5.46%
0.9435 +19.89%
1.0368 +11.97%
0.9637 +18.18%
1.0043 1.1691 +14.73% +0.74%
0.9257 0.8759 +21.40% +25.63%
RMSE 1.6167 Improve
1.4748 +8.78%
1.2039 +25.53%
1.3713 +15.18%
1.2380 +23.42%
1.2257 1.5182 +24.19% +6.09%
1.1887 1.1488 +26.47% +28.94%
•HERecmp: It can be view as a variant of HERec, which combines the heterogeneous network embedding of metap- ath2vec++ [53] and still preserves the superiority of HINs. We use the code of metapath2vec++ provided by the authors 7.
• HERec: It is the proposed recommendation model based on heterogeneous information network embedding. In the effective- ness experiments, we utilize the personalized non-linear fusion function, which is formulated as Eq. 8. And the performance of different fusion function will be reported in the later section.
The selected baselines have a comprehensive coverage of existing rating prediction methods. PMF and SoMF are classic MF based rating prediction methods; FMHIN , HeteMF, SemRec and DSR are HIN based recommendation methods using meta- path based similarities, which represent state-of-the-art HIN based methods. The proposed approach contains an HIN embedding
7 https://ericdongyx.github.io/metapath2vec/m2v.html
component, which can be replaced by other network embed- ding methods. Hence, two variants HERecdw and HERecmp are adopted as baselines.
Among these methods, HIN based methods need to specify the used meta-paths. We present the used meta-paths in Table 3. Fol- lowing [4], we only select short meta-paths of at most four steps, since long meta-paths are likely to introduce noisy semantics. We set parameters for HERec as follows: the embedding dimension number d = 64 and the tuning coefficients α = β = 1.0. Following [13], the path length for random walk is set to 40. For HERecdw and HERecmp, we set the embedding dimension number d = 64 for fairness. Following [11], [12], we set the number of latent factors for all the MF based methods to 10. Other baseline parameters either adopt the original optimal setting or are optimized by using a validation set of 10% training data.
5.4 Effectiveness Experiments
For each dataset, we split the entire rating records into a training set and a test set. For Double Movie and Book datasets, we set four training ratios as in {80%, 60%, 40%, 20%}; while for Yelp dataset, following [10], we set four larger training ratios as in {90%, 80%, 70%, 60%}, since it is too sparse. For each ratio, we randomly generate ten evaluation sets. We average the results as the final performance. The results are shown in Table 4. The major findings from the experimental results are summarized as follows:
(1) Among these baselines, HIN based methods (HeteMF, SemRec, FMH I N and DSR) perform better than traditional MF based methods (PMF and SoMF), which indicates the usefulness of the heterogeneous information. It is worthwhile to note that the FMH I N model works very well among the three HIN based baselines. An intuitive explanation is that in our datasets, most of the original features are the attribute information of users or items, which are likely to contain useful evidence to improve the recommendation performance.
(2) The proposed HERec method (corresponding to the last column) is consistently better than the baselines, ranging from PMF to DSR. Compared with other HIN based methods, HERec adopts a more principled way to leverage HINs for improving recommender systems, which provides better information extrac- tion (a new HIN embedding model) and utilization (an extended MF model). Moreover, the superiority of the proposed HERec becomes more significant with less training data. In particular, the improvement ratio of HERec over the PMF is up to 40% with 20% training data on the Douban Book dataset, which indicates a significant performance boost. As mentioned previously, the Yelp dataset is very sparse. In this case, even with 60% training data, the HERec model improves over PMF by about 26%. As a comparison, with the same training ratio (i.e., 60%), the improvement ratios over PMF are about 29% in terms of RMSE scores. These results indicate the effectiveness of the proposed approach, especially in a sparse dataset.
(3) Considering the two HERec variants HERecdw and HERecmp, it is easy to see HERecdw performs much worse than HERecmp. The major difference lies in the network em- bedding component. HERecmp adopts the recently proposed HIN embedding method metapath2vec [53], while HERecdw ignores data heterogeneity and casts HINs as homogeneous networks for learning. It indicates HIN embedding methods are important to HIN based recommendation. The major difference between the proposed embedding method and metapath2vec (adopted by HERecmp) lies in that we try to transform heterogeneous network information into homogeneous neighborhood, while metapath2vec tries to map all types of entities into the same representation space using heterogeneous sequences. Indeed, metapath2vec is a general-purpose HIN embedding method, while the proposed embedding method aims to improve the recommendation perfor- mance. Our focus is to learn the embeddings for users and items, while objects of other types are only used as the bridge to construct the homogeneous neighborhood. Based on the results (HERec > HERecmp), we argue that it is more effective to perform task-specific HIN embedding for improving the recommendation performance.
5.5 Detailed Analysis of The Proposed Approach
In this part, we perform a series of detailed analysis for the proposed approach.
5.5.1 Selection of Different Fusion Functions
HERec requires a principled fusion way to transform node em- beddings into a more suitable form that is useful to enhance rec- ommendation performance. Therefore, we will discuss the impact of different fusion functions on the recommendation performance. For convenience, we call the HERec variant with the simple linear fusion function (Eq. 6) as HERecsl, the variant with personalized linear fusion function (Eq. 7) as HERecpl, and the variant with personalized non-linear fusion function (Eq. 8) as HERecpnl. We present the performance comparison of the three variants of HERec on the three datasets in Fig. 4.
As shown in Fig. 4, we can find that overall performance ranking is as follows: HERecpnl > HERecpl > HERecsl. Among the three variants, the simple linear fusion function performs the worst, as it ignores the personalization and non-linearity. Indeed, users are likely to have varying preferences over meta-paths [10], which should be considered in meta-path based methods. The personalization factor improves the performance significantly. As a comparison, the performance improvement of HERecpnl over HERecpl is relatively small. A possible reason is that the incorpo- ration of personalized combination parameters increases the capa- bility of linear models. Nonetheless, HERecpnl still performs the best by considering personalization and non-linear transformation. In a complicated recommender setting, HIN embeddings may not be directly applicable in recommender systems, where a non-linear mapping function is preferred.
Since HERecpnl is the best variant of the proposed model, in what follows, HERec will use the personalized non-linear fusion function, i.e., HERecpnl by default.
5.5.2 Cold-start Prediction
HINs are particularly useful to improve cold-start prediction, where there are fewer rating records but heterogeneous context information is available. We consider studying the performance w.r.t. different cold-start degrees, i.e., the rating sparsity. To test it, we first categorize “cold” users into three groups according to the numbers of their rating records, i.e., (0, 5], (5, 15] and (15, 30]. It is easy to see that the case for the first group is the most difficult, since users from this group have fewest rating records. Here, we only select the baselines which use HIN based information for recommendation, including SoMF, HeteMF, SemRec, DSR and FMHIN. We present the performance comparison of different methods in Fig. 5. For convenience, we report the improvement ratios w.r.t. PMF. Overall, all the comparison methods are better than PMF (i.e., a positive y-axis value). The proposed method performs the best among all the methods, and the improvement over PMF becomes more significant for users with fewer rat- ing records. The results indicate that HIN based information is effective to improve the recommendation performance, and the proposed HERec method can effectively utilize HIN information in a more principled way.
5.5.3 Impact of Different Meta-Paths
As shown in Table 3, the proposed approach uses a selected set of meta-paths. To further analyze the impact of different meta-paths, we gradually incorporate these meta-paths into the proposed approach and check the performance change. In Fig. 6, we can observe that generally the performance improves (i.e., becoming smaller) with the incorporation of more meta-paths. Both meta-paths starting with the user type and item type are
10
0.8 0.78 0.76 0.74 0.72 0.7
20%
0.6 0.59 0.58 0.57 0.56 0.55
Fig. 4. Performance comparison of different fusion functions on three datasets.
60%
40%
20%
0
60%
40%
20%
0
SoMF 60% HeteMF
SemRec
DSR 40% FM
SoMF
HeteMF
SemRec
DSR 40% FM
FM HERec
All
SoMF HeteMF SemRec DSR
0.72
0.71
0.7
RMSE
MAE
0.554
0.553
0.552
0.7
0.69
0.68
RMSE
MAE
0.552
0.551
0.55
1.18 0.89 RMSE
MAE
1.16 0.88
1.14 0.87
1.12 0.86
1.1 0.85
(c) Yelp
40%
Training Ratio
80% 20%
0.65
60%
40%
Training Ratio
HERecpnl
0.8 0.75 0.7 0.65
HERec
HERec sl sl sl
0.9 1.2
HERec sl sl sl
HERec HERec
HERecpl 0.85
HERecpl
HERecpnl 1.15
1.1
1.05 80% 60%
0.92
HERecpl HERecpnl
90%
HERecpl HERecpl 0.9
HERecpl
HERec
pnl
90%
SoMF HeteMF SemRec DSR
HERec
HERec pnl pnl
0.6
0.55 0.82 20% 40% 60% 80% 20% 40% 60% 80% 60%
70%
Training Ratio
(c) Yelp
(0,5]
(0,5]
(5,15]
#Rating of Users
20%
0
SoMF
HeteMF 60% SemRec
DSR 40% FM
HIN
HERec 20%
0
20%
0
SoMF
HeteMF
SemRec
DSR 40% FM
Training Ratio Training Ratio
(a) Douban Movie (b) Douban Book
(5,15]
#Rating of Users
(5,15]
#Rating of Users
(5,15]
#Rating of Users
(c) Yelp
(a) Douban Movie
(b) Douban Book
(15,30] All
(0,5]
(0,5]
(5,15]
#Rating of Users
(0,5]
(0,5]
(5,15]
#Rating of Users
(15,30] All
(15,30] All
(15,30]
HIN HERec
Fig. 5. Performance comparison of different methods for cold-start prediction on three datasets. y-axis denotes the improvement ratio over PMF.
(a) Douban Movie
(b) Douban Book
Fig. 6. Performance change of HERec when gradually incorporating meta-paths.
60%
70%
Training Ratio
(15,30] All
(15,30]
HERec
HIN HERec
HIN
HIN HERec
FM HERec
All
0.88 0.86 0.84
20%
0
80%
80%
HIN
11
RMSE
MAE Improvement
RMSE Improvment
MAE RMSE
MAE
RMSE
MAE Improvement
RMSE Improvement
MAE RMSE
MAE
RMSE
MAE Improvement
RMSE Improvement
MAE RMSE
MAE
+UMU
+UMDMU
+UMTMU +UMAMU
+MDM +MUM
+MTM +MAM
+UBU
+UBPBU +UBABU
+BUB +UBYPU
+BYB +BPB
+UBU
+UBCiBU
+BUB +UBCaBU
+BCiB
+BCaB
useful to improve the performance. However, it does not always yield the improvement with more meta-paths, and the performance slightly fluctuates. The reason is that some meta-paths may contain noisy or conflict information with existing ones. Another useful observation is that the performance quickly achieves a relatively good performance with the incorporation of only a few meta- paths. This confirms previous finding [10]: a small number of high-quality meta-paths are able to lead to large performance improvement. Hence, as mentioned before, we can effectively control the model complexity by selecting just a few meta-paths.
5.5.4 Parameter Tuning
For matrix factorization based methods, an important parameter to tune is the number of latent factors. The proposed model also involves such a parameter. We vary it from 5 to 40 with a step of 5, and examine how the performance changes w.r.t the number of latent factors. We present the tuning results in Fig. 7. As we can see, using 10 latent factors yields the best performance, indicating that the number of latent factors should be set to a small number.
Next, we fix the number of latent factors as ten, and tune two other parameters α and β (Eq. 5), which are used to integrate dif- ferent terms as weights. Now we examine how they influence the model performance. For both parameters, we vary them in the set of {0.1, 0.5, 1, 2}. As shown in Fig. 8, the optimal performance is obtained near (1, 1), i.e., both α and β are around 1. The results show that the HIN embeddings from both the user and item sides are important to improve the prediction performance. Overall, the change trend is smooth, indicating that the proposed model is not very sensitive to the two parameters.
Finally, we study the performance change w.r.t. the number of iterations. As shown in Fig. 9, we can see that the proposed model has a fast convergence rate, and about 40 to 60 iterations are required for dense datasets (i.e., Douban Movie and Book), while about 20 iterations are required for sparse datasets (i.e., Yelp).
6 CONCLUSION
In this paper, we proposed a novel heterogeneous information network embedding based approach (i.e., HERec) to effectively utilizing auxiliary information in HINs for recommendation. We designed a new random walk strategy based on meta-paths to derive more meaningful node sequences for network embedding. Since embeddings based on different meta-paths contain different semantic, the learned embeddings were further integrated into an extended matrix factorization model using a set of fusion functions. Finally, the extended matrix factorization model to- gether with fusion functions were jointly optimized for the rating prediction task. HERec aimed to learn useful information rep- resentations from HINs guided by the specific recommendation task, which distinguished the proposed approach from existing HIN based recommendation methods. Extensive experiments on three real datasets demonstrated the effectiveness of HERec. We also verified the ability of HERec to alleviate cold-start problem and examine the impact of meta-paths on performance.
As future work, we will investigate into how to apply deep learning methods (e.g., convolutional neural networks, auto en- coder) to better fuse the embeddings of multiple meta-paths. In addition, we only use the meta-paths which have the same starting and ending types to effectively extract network structure features in this work. Therefore, it is interesting and natural to extend
the proposed model to learn the embeddings of any nodes with arbitrary meta-paths. As a major issue of recommender systems, we will also consider how to enhance the explainablity of the recommendation method based on the semantics of meta-paths.
REFERENCES
[1] M. B. Dias, D. Locher, M. Li, W. El-Deredy, and P. J. Lisboa, “The value of personalised recommender systems to e-business: a case study,” in Proceedings of the 2nd ACM Conference on Recommender Systems, 2008, pp. 291–294.
[2] Y. Koren and R. Bell, “Advances in collaborative filtering,” in Recom- mender systems handbook, 2015, pp. 77–118.
[3] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, in The Adaptive Web, pp. 291–324.
[4] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “Pathsim: Meta path- based top-k similarity search in heterogeneous information networks,” Proceedings of the 37th Very Large Data Bases, vol. 4, no. 11, pp. 992– 1003, 2011.
[5] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu, “A survey of heterogeneous information network analysis,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 17–37, 2017.
[6] C. Shi and P. S. Yu, “Heterogeneous information network analysis and applications,” in Data Analytics. Springer, 2017, pp. 1–227.
[7] X. Yu, X. Ren, Q. Gu, Y. Sun, and J. Han, “Collaborative filtering with entity similarity regularization in heterogeneous information networks,” Proceedings of 1st International Joint Conference on Artificial Intelli- gence Workshop on Heterogeneous Information Network Analysis, 2013.
[8] W. Feng and J. Wang, “Incorporating heterogeneous information for personalized tag recommendation in social tagging systems,” in Proceed- ings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 1276–1284.
[9] X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han, “Personalized entity recommendation: A heterogeneous informa- tion network approach,” in Proceedings of the 7th ACM International Conference on Web Search and Data Mining, 2014, pp. 283–292.
[10] C. Shi, Z. Zhang, P. Luo, P. S. Yu, Y. Yue, and B. Wu, “Semantic path based personalized recommendation on weighted heterogeneous information networks,” in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 2015, pp. 453–462.
[11] C.Shi,J.Liu,F.Zhuang,P.S.Yu,andB.Wu,“Integratingheterogeneous information via flexible regularization framework for recommendation,” Knowledge and Information System, vol. 49, no. 3, pp. 835–859, 2016.
[12] J. Zheng, J. Liu, C. Shi, F. Zhuang, J. Li, and B. Wu, “Recommendation in heterogeneous information network via dual similarity regularization,” International Journal of Data Science and Analytics, vol. 3, no. 1, pp. 35–48, 2017.
[13] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 701–710.
[14] A. Grover and J. Leskovec, “node2vec: scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855– 864.
[15] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, 2009.
[16] Y. Shi, X. Zhao, J. Wang, M. Larson, and A. Hanjalic, “Adaptive diversification of recommendation results via latent factor portfolio,” in Proceedings of the 35th international ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 175–184.
[17] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, “Lcars: a location-content- aware recommender system,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013, pp. 221–229.
[18] L.Hong,A.S.Doumith,andB.D.Davison,“Co-factorizationmachines: modeling user interests and predicting individual decisions in twitter,” in Proceedings of the 6th ACM International Conference on Web Search and Data Mining, 2013, pp. 557–566.
[19] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, “Recommender systems with social regularization,” in Proceedings of the fourth ACM International Conference on Web Search and Data Mining, 2011, pp. 287–296.
[20] G. Ling, M. R. Lyu, and I. King, “Ratings meet reviews, a combined approach to recommend,” in Proceedings of the 8th ACM Conference on Recommender Systems, 2014, pp. 105–112.
12
0.715
1.12
0.71
0.705
0.7
0.695
0.71 0.7 0.69 0.68 0.67
510203040 510203040 Dimension of Latent Factors Dimension of Latent Factors
(a) Douban Movie (b) Douban Book
1.118
1.116
1.114
510 20 30 40
Fig. 7. Performance with respect to the dimension of latent factors on three datasets.
0.74
0.72
0.8 0.75 0.7
1.25 1.2 1.15
0.7 555
0.78
0.76
0.85
0.8
0.65
1.1
252525 112112112
β 0.5
(a) Douban Movie
β 0.5 0.5 0.1 0.1
(b) Douban Book
β 0.5
0.5 α
0.1 0.1
0.5
α
α
0.1 0.1 (c) Yelp
Fig. 8. Varying parameters α and β on the three datasets.
1.35 1.3 1.25 1.2 1.15 1.1
Number of Iterations Number of Iterations Number of Iterations
(a) Douban Movie (b) Douban Book (c) Yelp
0.74 0.75 0.72 0.7
0.7
0.65
1 20 40 60 1 20 40 60 1 10 20 30
Fig. 9. Performance with respect to the number of iterations on three datasets.
[21] M. Ye, P. Yin, W.-C. Lee, and D.-L. Lee, “Exploiting geographical influence for collaborative point-of-interest recommendation,” in Pro- ceedings of the 34th international ACM SIGIR conference on Research and Development in Information Retrieval, 2011, pp. 325–334.
[22] S. Sedhain, A. K. Menon, S. Sanner, L. Xie, and D. Braziunas, “Low-rank linear cold-start recommendation from social data.” in Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017, pp. 1502–1508.
[23] Z. Gantner, L. Drumond, C. Freudenthaler, S. Rendle, and L. Schmidt- Thieme, “Learning attribute-to-feature mappings for cold-start recom- mendations,” in Proceedings of the 10th IEEE International Conference on Data Mining series, 2010, pp. 176–185.
[24] A. Krohn-Grimberghe, L. Drumond, C. Freudenthaler, and L. Schmidt- Thieme, “Multi-relational matrix factorization using bayesian personal- ized ranking for social network data,” in Proceedings of the 5th ACM International Conference on Web Search and Data Mining, 2012, pp. 173–182.
[25] S. Sedhain, S. Sanner, D. Braziunas, L. Xie, and J. Christensen, “Social collaborative filtering for cold-start recommendations,” in Proceedings of the 8th ACM Conference on Recommender systems. ACM, 2014, pp. 345–348.
[26] L. Zheng, V. Noroozi, and P. S. Yu, “Joint deep modeling of users and items using reviews for recommendation,” in Proceedings of the 10th ACM International Conference on Web Search and Data Mining, 2017, pp. 425–434.
[27] R. He and J. McAuley, “Vbpr: Visual bayesian personalized ranking from implicit feedback.” in Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016, pp. 144–150.
[28] F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W.-Y. Ma, “Collaborative knowledge base embedding for recommender systems,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 353–362.
[29] T. Chen, W. Zhang, Q. Lu, K. Chen, Z. Zheng, and Y. Yu, “Svdfeature: a toolkit for feature-based collaborative filtering,” Journal of Machine Learning Research, vol. 13, pp. 3619–3622, 2012.
[30] S. Rendle, “Factorization machines,” in Proceedings of the 10th IEEE International Conference on Data Mining series, 2010, pp. 995–1000.
[31] Y. Sun and J. Han, “Mining heterogeneous information networks: a structural analysis approach,” ACM SIGKDD Explorations Newsletter, vol. 14, no. 2, pp. 20–28, 2013.
[32] N. Lao and W. W. Cohen, “Relational retrieval using a combination of path-constrained random walks,” Machine Learning, vol. 81, no. 1, pp. 53–67, 2010.
[33] C. Shi, X. Kong, Y. Huang, P. S. Yu, and B. Wu, “Hetesim: A general framework for relevance measure in heterogeneous networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 10, pp. 2479–2492, 2014.
[34] X. Yu, X. Ren, Y. Sun, B. Sturt, U. Khandelwal, Q. Gu, B. Norick, and J. Han, “Recommendation in heterogeneous information networks with
Dimension of Latent Fctors
(c) Yelp
13
RMSE RMSE RMSE
RMSE RMSE RMSE
RMSE RMSE RMSE
implicit user feedback,” in Proceedings of the 7th ACM conference on
Recommender Systems, 2013, pp. 347–350.
[35] C. Luo, W. Pang, Z. Wang, and C. Lin, “Hete-cf: Social-based col-
laborative filtering recommendation using heterogeneous relations,” in
Proceedings of the 14th IEEE International Conference on Data Mining
series, 2014, pp. 917–922.
[36] J. Zheng, J. Liu, C. Shi, F. Zhuang, J. Li, and B. Wu, “Dual similarity
regularization for recommendation,” in Proceedings of the 20th Pacific- Asia Conference on Knowledge Discovery and Data Mining, 2016, pp. 542–554.
[37] P.D.Hoff,A.E.Raftery,andM.S.Handcock,“Latentspaceapproaches to social network analysis,” Journal of the American Statistical Associa- tion, vol. 97, no. 460, pp. 1090–1098, 2002.
[38] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: A general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intel- ligence, vol. 29, no. 1, pp. 40–51, 2007.
[39] C. Tu, W. Zhang, Z. Liu, and M. Sun, “Max-margin deepwalk: Dis- criminative learning of network representation.” in Proceedings of the 25th International Joint Conference on Artificial Intelligence, 2016, pp. 3889–3895.
[40] X. Wei, L. Xu, B. Cao, and P. S. Yu, “Cross view link prediction by learning noise-resilient representation consensus,” in Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 1611–1619.
[41] S.Cao,W.Lu,andQ.Xu,“Deepneuralnetworksforlearninggraphrep- resentations.” in Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016, pp. 1145–1152.
[42] D. Liang, J. Altosaar, L. Charlin, and D. M. Blei, “Factorization meets the item embedding: Regularizing matrix factorization with item co- occurrence,” in Proceedings of the 10th ACM Conference on Recom- mender Systems, 2016, pp. 59–66.
[43] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: Large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 1067–1077.
[44] D. Wang, P. Cui, and W. Zhu, “Structural deep network embedding,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1225–1234.
[45] S. Cao, W. Lu, and Q. Xu, “Grarep: Learning graph representations with global structural information,” in Proceedings of the 24th ACM Interna- tional on Conference on Information and Knowledge Management, 2015, pp. 891–900.
[46] S. Pan, J. Wu, X. Zhu, C. Zhang, and Y. Wang, “Tri-party deep network representation,” Network, vol. 11, no. 9, p. 12, 2016.
[47] C. Yang, Z. Liu, D. Zhao, M. Sun, and E. Y. Chang, “Network rep- resentation learning with rich text information.” in Proceedings of the 24th International Joint Conference on Artificial Intelligence, 2015, pp. 2111–2117.
[48] D. Zhang, J. Yin, X. Zhu, and C. Zhang, “Homophily, structure, and content augmented network representation learning,” in Proceedings of the 16th IEEE International Conference on Data Mining series, 2016, pp. 609–618.
[49] S. Chang, W. Han, J. Tang, G. J. Qi, C. C. Aggarwal, and T. S. Huang, “Heterogeneous network embedding via deep architectures,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 119–128.
[50] J. Tang, M. Qu, and Q. Mei, “Pte: Predictive text embedding through large-scale heterogeneous text networks,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1165–1174.
[51] L. Xu, X. Wei, J. Cao, and P. S. Yu, “Embedding of embedding (eoe): Joint embedding for coupled heterogeneous networks,” in Proceedings of the 10th ACM International Conference on Web Search and Data Mining, 2017, pp. 741–749.
[52] T. Chen and Y. Sun, “Task-guided and path-augmented heterogeneous network embedding for author identification,” in Proceedings of the 10th ACM International Conference on Web Search and Data Mining, 2017, pp. 295–304.
[53] Y. Dong, N. V. Chawla, and A. Swami, “metapath2vec: Scalable rep- resentation learning for heterogeneous networks,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 135–144.
[54] Y. Sun and J. Han, “Mining heterogeneous information networks: prin- ciples and methodologies,” Synthesis Lectures on Data Mining and Knowledge Discovery, pp. 1–159, 2012.
[55] Y. Sun, Y. Yu, and J. Han, “Ranking-based clustering of heterogeneous information networks with star network schema,” in Proceedings of the
15th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, 2009, pp. 797–806.
[56] A. Mnih and R. R. Salakhutdinov, “Probabilistic matrix factorization,”
in Advances in Neural Information Processing Systems, 2008, pp. 1257–
1264.
[57] S. Rendle, “Factorization machines with libfm,” ACM Transactions on
Intelligent Systems and Technology, vol. 3, no. 3, p. 57, 2012.
Chuan Shi received the B.S. degree from the Jilin University in 2001, the M.S. degree from the Wuhan University in 2004, and Ph.D. degree from the ICT of Chinese Academic of Sciences in 2007. He joined the Beijing University of Posts and Telecommunications as a lecturer in 2007, and is a professor and deputy director of Beijing Key Lab of Intelligent Telecommunications Soft- ware and Multimedia at present. His research interests are in data mining, machine learning, and evolutionary computing. He has published
more than 40 papers in refereed journals and conferences.
Binbin Hu received the B.S. degree from Beijing University of Posts and Telecommunications in 2016. He is currently a master student in BUPT. His research interests are in data mining and machine learning.
Wayne Xin Zhao is currently an assistant pro- fessor at the School of Information, Renmin University of China. He received the Ph.D. de- gree from Peking University in 2014. His re- search interests are web text mining and nat- ural language processing. He has published several referred papers in international confer- ences journals such as ACL, EMNLP, COLING, ECIR, CIKM, SIGIR, SIGKDD, AAAI, ACM TOIS, ACM TKDD, ACM TIST, IEEE TKDE, KAIS and WWWJ.
Philip S. Yu is a Distinguished Professor in Computer Science at the University of Illinois at Chicago and also holds the Wexler Chair in Information Technology. Dr. Yu spent most of his career at IBM, where he was manager of the Software Tools and Techniques group at the Watson Research Center. His research inter- est is on big data, including data mining, data stream, database and privacy. He has published more than 920 papers in refereed journals and conferences. He holds or has applied for more
than 250 US patents. Dr. Yu is a Fellow of the ACM and the IEEE. He is the Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He is on the steering committee of the IEEE Conference on Data Mining and ACM Conference on Information and Knowledge Management and was a member of the IEEE Data Engineering steer- ing committee. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001-2004). Dr. Yu received the B.S. Degree in E.E. from National Taiwan University, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University.
PLACE PHOTO HERE
14
PLACE PHOTO HERE
PLACE PHOTO HERE
PLACE PHOTO HERE