
Node Embeddings

Recap: Traditional ML For Graphs
Given an input graph, extract node-, link-, and graph-level features, then learn a model (SVM, neural network, etc.) that maps features to labels.


Input graph → Feature engineering (node-level, edge-level, graph-level features) → Structured features → Learning algorithm → Prediction (downstream prediction task)

Graph Representation Learning
Graph representation learning alleviates the need to do feature engineering every single time.
Input graph → Representation learning (automatically learn the features) → Structured features → Learning algorithm → Prediction (downstream prediction task)

Graph Representation Learning
Goal: Efficient task-independent feature learning for machine learning with graphs!
𝑓: 𝑢 → ℝ^𝑑, mapping each node 𝑢 to a 𝑑-dimensional vector 𝐳_𝑢 (its feature representation, or embedding)

Why Embedding?
◾ Task: Map nodes into an embedding space
▪ Similarity of embeddings between nodes indicates their similarity in the network.
▪ For example: two nodes that are close to each other in the network (e.g., connected by an edge) should be embedded near each other.
▪ Encode network information
▪ Potentially useful for many downstream predictions
Embeddings in ℝ^𝑑 can be used for many downstream tasks:
• Node classification
• Link prediction
• Graph classification
• Anomalous node detection
• Clustering

Example Node Embedding
◾ 2D embedding of nodes of Zachary’s Karate Club network:
Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

Node Embeddings: Encoder and Decoder

◾ Assume we have a graph G:
▪ V is the vertex set.
▪ A is the adjacency matrix (assume binary).
▪ For simplicity: No node features or extra information is used
V: {1, 2, 3, 4}
A =
0 1 0 1
1 0 0 1
0 0 0 1
1 1 1 0
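A minimal sketch of this setup in Python (networkx assumed; the edge list follows the adjacency matrix above):

import networkx as nx

# Example graph with V = {1, 2, 3, 4}; edges taken from the adjacency matrix above
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 4), (2, 4), (3, 4)])

# Binary adjacency matrix A, rows/columns ordered as nodes 1..4
A = nx.to_numpy_array(G, nodelist=[1, 2, 3, 4], dtype=int)
print(A)
# [[0 1 0 1]
#  [1 0 0 1]
#  [0 0 0 1]
#  [1 1 1 0]]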

Embedding Nodes
◾ Goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the graph

Embedding Nodes
Goal: similarity(𝑢, 𝑣) ≈ 𝐳_𝑣^⊤ 𝐳_𝑢
where similarity(𝑢, 𝑣) is the similarity in the original network (we still need to define it!)

Node Embeddings Summary
1. Encoder ENC maps from nodes to embeddings
2. Define a node similarity function (i.e., a measure of similarity in the original network)
3. Decoder 𝐃𝐄𝐂 maps from embeddings to the similarity score
4. Optimize the parameters of the encoder so that:
similarity(𝑢, 𝑣) ≈ 𝐃𝐄𝐂(𝐳_𝑣^⊤ 𝐳_𝑢)
(similarity of 𝑢 and 𝑣 in the original network ≈ similarity score decoded from the embeddings)

Two Key Components
◾ Encoder: maps each node in the input graph to a low-dimensional (𝑑-dimensional) embedding: ENC(𝑣) = 𝐳_𝑣
◾ Similarity function: specifies how the relationships in vector space map to the relationships in the original network: similarity(𝑢, 𝑣) ≈ 𝐳_𝑣^⊤ 𝐳_𝑢
(similarity of 𝑢 and 𝑣 in the original network ≈ dot product between their node embeddings)

“Shallow” Encoding
Simplest encoding approach: Encoder is just an embedding-lookup
ENC(𝑣) = 𝐳_𝑣 = 𝐙 ⋅ 𝑣
𝐙 ∈ ℝ^(𝑑×|𝑉|): matrix, each column is a node embedding (what we learn / optimize)
𝑣 ∈ 𝕀^|𝑉|: indicator vector, all zeroes except a one in the position indicating node 𝑣

“Shallow” Encoding
Simplest encoding approach:
Encoder is just an embedding-lookup
(Figure: the embedding matrix 𝐙; one column per node, each column is the embedding vector for a specific node, and the number of rows is the dimension/size of the embeddings.)

“Shallow” Encoding
Simplest encoding approach:
Encoder is just an embedding-lookup
Each node is assigned a unique embedding vector
(i.e., we directly optimize the embedding of each node)
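A minimal sketch of a shallow encoder as an embedding lookup (names are illustrative; numpy assumed):

import numpy as np

num_nodes, d = 4, 16                 # |V| nodes, d-dimensional embeddings
rng = np.random.default_rng(0)

# Z: embedding matrix, one column per node (these are the parameters we optimize)
Z = rng.normal(size=(d, num_nodes))

def encode(v):
    """ENC(v) = Z @ v, where v is a one-hot indicator vector selecting the node."""
    indicator = np.zeros(num_nodes)
    indicator[v] = 1.0
    return Z @ indicator             # equivalent to simply taking column v of Z

z_v = encode(2)                      # embedding of the node at index 2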

Framework Summary
◾ Encoder + Decoder Framework
▪ Shallow encoder: embedding lookup
▪ Parameters to optimize: 𝐙 which contains node embeddings 𝐳𝑢 for all nodes 𝑢 ∈ 𝑉
▪ We will cover deep encoders (GNNs)
▪ Decoder: based on node similarity.
▪ Objective: maximize 𝐳_𝑣^⊤ 𝐳_𝑢 for node pairs (𝑢, 𝑣) that are similar

How to Define Node Similarity
◾ The key design choice that distinguishes methods is how they define node similarity.
◾ Should two nodes have a similar embedding if they…
▪ are linked?
▪ share neighbors?
▪ have similar “structural roles”?

Random Walk Approaches for Node Embeddings

A Note on Node Embeddings
◾ We will now learn a node similarity definition that uses random walks, and how to optimize embeddings for such a similarity measure.
◾ Random-walk-based embedding is an unsupervised/self-supervised way of learning node embeddings.
▪ We are not utilizing node labels
▪ We are not utilizing node features
◾ These embeddings are task independent
▪ They are not trained for a specific task but can be used for any task.

Notation
◾ Vector 𝐳_𝑢: the embedding of node 𝑢 (what we aim to find).
◾ Probability 𝑃(𝑣 | 𝐳_𝑢): the (predicted) probability of visiting node 𝑣 on random walks starting from node 𝑢 (our model prediction based on 𝐳_𝑢).

Random Walk
Given a graph and a starting point, we select a neighbor of it at random, and move to this neighbor; then we select a neighbor of this point at random, and move to it, etc. The (random) sequence of points visited this way is a random walk on the graph.
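A minimal sketch of such a walk in Python (networkx assumed; the function name is illustrative):

import random
import networkx as nx

def random_walk(G, start, length):
    """Unbiased fixed-length random walk: repeatedly move to a uniformly chosen neighbor."""
    walk = [start]
    current = start
    for _ in range(length):
        neighbors = list(G.neighbors(current))
        if not neighbors:            # isolated node: the walk cannot continue
            break
        current = random.choice(neighbors)
        walk.append(current)
    return walk

# Example: one walk of length 6 on Zachary's Karate Club network
G = nx.karate_club_graph()
walk = random_walk(G, start=0, length=6)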

Random Walk Embeddings
𝐳_𝑢^⊤ 𝐳_𝑣 ≈ probability that 𝑢 and 𝑣 co-occur on a random walk over the graph

Random Walk Embeddings
1. Estimate probability of visiting node 𝒗 on a random walk starting from node 𝒖 using some random walk strategy 𝑹
2. Optimize embeddings to encode these random walk statistics:
Similarity in embedding space (here: the dot product, i.e., cos(𝜃) for unit-length embeddings) encodes random walk “similarity”

Why Random Walks
1. Expressivity: Flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information.
Idea: if a random walk starting from node 𝑢 visits 𝑣 with high probability, then 𝑢 and 𝑣 are similar (high-order, multi-hop information).
2. Efficiency: Do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks.

Unsupervised Feature Learning
◾ Intuition: Find embedding of nodes in 𝑑-dimensional space that preserves similarity.
◾ Idea: Learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space.
◾ Given a node 𝑢, how do we define nearby nodes?
▪ 𝑁𝑅 (𝑢) … neighbourhood of 𝑢 obtained by some random walk strategy 𝑅.

Feature Learning as Optimization
◾ Given 𝐺 = (𝑉, 𝐸),
◾ Our goal is to learn a mapping 𝑓: 𝑢 → ℝ^𝑑, i.e., 𝑓(𝑢) = 𝐳_𝑢.
◾ Log-likelihood objective: max_𝑓 Σ_{𝑢∈𝑉} log 𝑃(𝑁_𝑅(𝑢) | 𝐳_𝑢)
▪ 𝑁_𝑅(𝑢) is the neighborhood of node 𝑢 obtained by strategy 𝑅
◾ Given node 𝑢, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood 𝑁_𝑅(𝑢).

Random Walk Optimization
1. Run short fixed-length random walks starting from each node 𝑢 in the graph using some random walk strategy R.
2. For each node 𝑢 collect 𝑁𝑅(𝑢), the multiset* of nodes visited on random walks starting from 𝑢.
3. Optimize embeddings according to: given node 𝑢, predict its neighbors 𝑁_𝑅(𝑢).
Maximum likelihood objective: max_𝑓 Σ_{𝑢∈𝑉} log 𝑃(𝑁_𝑅(𝑢) | 𝐳_𝑢)
*𝑁_𝑅(𝑢) can have repeat elements since nodes can be visited multiple times on random walks

Random Walks: Summary
1. Run short fixed-length random walks starting from each node on the graph
2. For each node 𝑢 collect 𝑁𝑅(𝑢), the multiset of nodes visited on random walks starting from 𝑢.
3. Optimize embeddings using stochastic gradient descent, minimizing the loss ℒ = Σ_{𝑢∈𝑉} Σ_{𝑣∈𝑁_𝑅(𝑢)} −log 𝑃(𝑣 | 𝐳_𝑢), where 𝑃(𝑣 | 𝐳_𝑢) is parameterized as a softmax over dot products: 𝑃(𝑣 | 𝐳_𝑢) = exp(𝐳_𝑢^⊤ 𝐳_𝑣) / Σ_{𝑛∈𝑉} exp(𝐳_𝑢^⊤ 𝐳_𝑛). A sketch of this loss follows.
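A minimal sketch of this loss in numpy (a full implementation would approximate the softmax with negative sampling for efficiency, which is not shown here):

import numpy as np

def walk_loss(Z, neighborhoods):
    """L = sum_u sum_{v in N_R(u)} -log P(v | z_u), with a softmax over all nodes.

    Z is a (num_nodes, d) array of embeddings; neighborhoods maps each node u
    to the multiset N_R(u) of nodes visited on random walks starting from u.
    """
    loss = 0.0
    for u, N_R_u in neighborhoods.items():
        scores = Z @ Z[u]                                   # dot products z_u . z_n for all n
        scores = scores - scores.max()                      # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum())   # log-softmax over all nodes
        loss -= sum(log_probs[v] for v in N_R_u)
    return loss

# The embeddings Z are then updated by stochastic gradient descent on this loss
# (e.g., with hand-derived gradients or an autograd framework).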

How Should We Randomly Walk?
◾ So far we have described how to optimize embeddings given a random walk strategy 𝑅.
◾ What strategies should we use to run these random walks?
▪ Simplest idea: just run fixed-length, unbiased random walks starting from each node
▪ (i.e., DeepWalk from Perozzi et al., 2014)
▪ The issue is that such a notion of similarity is too constrained
◾ How can we generalize this?
Reference: Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.

Overview of node2vec
◾ Goal: Embed nodes with similar network neighborhoods close in the feature space.
◾ We frame this goal as a maximum likelihood optimization problem, independent of the downstream prediction task.
◾ Key observation: A flexible notion of the network neighborhood 𝑁_𝑅(𝑢) of node 𝑢 leads to rich node embeddings.
◾ Develop a biased 2nd-order random walk strategy 𝑅 to generate the network neighborhood 𝑁_𝑅(𝑢) of node 𝑢.
Reference: Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.

Node2vec: Biased Walks
Idea: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016).

Node2vec: Biased Walks
Two classic strategies to define a neighborhood 𝑵_𝑹(𝒖) of a given node 𝒖 (walk of length 3, so 𝑁_𝑅(𝑢) has size 3):
▪ 𝑁_BFS(𝑢) = {𝑠_1, 𝑠_2, 𝑠_3}: local, microscopic view
▪ 𝑁_DFS(𝑢) = {𝑠_4, 𝑠_5, 𝑠_6}: global, macroscopic view

BFS vs. DFS
BFS: micro-view of the neighbourhood.
DFS: macro-view of the neighbourhood.

Interpolating BFS and DFS
Biased fixed-length random walk 𝑹 that given a node 𝒖 generates neighborhood 𝑵𝑹(𝒖)
◾ Two parameters:
▪ Return parameter 𝒑:
▪ Return back to the previous node
▪ In-out parameter 𝒒:
▪ Moving outwards (DFS) vs. inwards (BFS)
▪ Intuitively, 𝑞 is the “ratio” of BFS vs. DFS

Biased Random Walks
Biased 2nd-order random walks explore network neighborhoods:
▪ The random walk just traversed edge (𝑠_1, 𝑤) and is now at 𝑤
▪ Idea: remember where the walk came from
▪ Insight: Neighbors of 𝑤 can only be: back to 𝑠_1, at the same distance to 𝑠_1 (e.g., 𝑠_2), or farther from 𝑠_1

Biased Random Walks
◾ Walker came over edge (𝐬𝟏, 𝐰) and is at 𝐰. Where to go next?
◾ 𝑝, 𝑞 model the transition probabilities:
▪ 𝑝 … return parameter
▪ 𝑞 … “walk away” parameter
▪ 1/𝑝, 1/𝑞, 1 are unnormalized probabilities: from 𝑤, the walk returns to 𝑠_1 with weight 1/𝑝, moves to a node at the same distance from 𝑠_1 with weight 1, and moves farther from 𝑠_1 (e.g., to 𝑠_3) with weight 1/𝑞.

Biased Random Walks
◾ Walker came over edge (𝒔_𝟏, 𝒘) and is at 𝒘. Where to go next?
Unnormalized transition probabilities, segmented based on distance from 𝑠_1:
target 𝑡 | unnorm. prob. | dist(𝑠_1, 𝑡)
𝑠_1 | 1/𝑝 | 0
𝑠_2 | 1 | 1
𝑠_3 | 1/𝑞 | 2
▪ BFS-like walk: low value of 𝑝
▪ DFS-like walk: low value of 𝑞
𝑁_𝑅(𝑢) are the nodes visited by the biased walk (see the sketch below)
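A minimal sketch of one biased 2nd-order step with return parameter p and in-out parameter q (an illustrative reimplementation, not the reference node2vec code; networkx assumed):

import random
import networkx as nx

def biased_step(G, prev, current, p, q):
    """Pick the next node, given that the walk just traversed edge (prev, current)."""
    neighbors = list(G.neighbors(current))
    weights = []
    for nxt in neighbors:
        if nxt == prev:                    # going back to the previous node: weight 1/p
            weights.append(1.0 / p)
        elif G.has_edge(nxt, prev):        # same distance from prev (distance 1): weight 1
            weights.append(1.0)
        else:                              # moving farther away from prev: weight 1/q
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]   # normalization is implicit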

Node2vec Algorithm
1. Compute random walk probabilities
2. Simulate 𝑟 random walks of length 𝑙 starting
from each node 𝑢
3. Optimize the node2vec objective (using
Stochastic Gradient Descent)
Properties:
1) Linear-time complexity
2) All 3 steps are individually parallelizable
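Putting the three steps together, a hedged end-to-end sketch that reuses biased_step from the sketch above and trains the embeddings with gensim's skip-gram Word2Vec (one common choice, shown as an assumption rather than the reference implementation):

import random
import networkx as nx
from gensim.models import Word2Vec

def node2vec_walks(G, r, l, p, q):
    """Step 2: simulate r biased walks of length l starting from each node."""
    walks = []
    for _ in range(r):
        for u in G.nodes():
            walk, prev, current = [u], None, u
            for _ in range(l):
                if prev is None:
                    nxt = random.choice(list(G.neighbors(current)))
                else:
                    nxt = biased_step(G, prev, current, p, q)   # defined in the sketch above
                walk.append(nxt)
                prev, current = current, nxt
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
walks = node2vec_walks(G, r=10, l=80, p=1.0, q=1.0)

# Step 3: optimize the node2vec objective over the walks (SGD happens inside Word2Vec)
model = Word2Vec(walks, vector_size=64, window=10, min_count=1, sg=1)
z_u = model.wv["0"]    # embedding of node 0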

Summary so far
◾ Core idea: Embed nodes so that distances in embedding space reflect node similarities in the original network.
◾ Different notions of node similarity:
▪ Naïve: similar if two nodes are connected
▪ Neighborhood overlap (covered in Topic 5)
▪ Random walk approaches

Summary so far
◾ No one method wins in all cases….
▪ E.g., node2vec performs better on node classification while alternative methods perform better on link prediction (Goyal and Ferrara, 2017 survey).
◾ Random walk approaches are generally more efficient.
So what method should I use..?
Choose definition of node similarity that matches your application.

Learning Outcomes
We discussed graph representation learning, a way to learn node and graph embeddings for downstream tasks, without feature engineering.
◾ Encoder-decoder framework:
▪ Encoder: embedding lookup
▪ Decoder: predict a score based on the embeddings to match node similarity
◾ Node similarity measure: (biased) random walk
▪ Examples: DeepWalk, node2vec
