Traditional Methods for Machine Learning in Graphs
Graph Properties
◾ Degree:
how many friends do I have?
◾ Weights:
how strong are the ties?
how far am I from another vertex?
◾ Connectivity:
can I reach all other vertices?
◾ Diameter:
how far apart can two vertices be?
◾ Density:
how dense are the connections?
◾ Centrality
(e.g., betweenness, closeness): am I at the center of the network?
Graph Analysis Problem
Existing Graph Analysis Problems
1. Clique identification
2. Shortest path
3. K-core decomposition
and more…
We need machine learning for graphs
Many Applications
Challenge:
finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models.
Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions).
Machine Learning Tasks on Graphs
◾ Node-level prediction
◾ Link-level prediction
◾ Graph-level prediction
[Figure: an example graph with a node-level target (node B), a link-level target (at node A), and a graph-level target]
Traditional ML Pipeline
◾ Design features for nodes/links/graphs
◾ Obtain features (vectors in ℝ^𝐷) for all training data
[Figure: node, link, and graph features extracted from an example graph]
Traditional ML Pipeline
◾ Train an ML model:
▪ Random forest
▪ Neural network, etc.
◾ Test the model:
▪ Given a new node/link/graph, obtain its features and make a prediction
Feature Design
◾ Using effective features over graphs is the key to achieving good model performance.
◾ Traditional ML pipeline uses hand-designed features.
◾ In this lecture, we overview the traditional features for:
▪ Node-level prediction
▪ Link-level prediction
▪ Graph-level prediction
◾ For simplicity, we focus on undirected graphs.
ML in Graphs
Goal: Make predictions for a set of objects
Design choices:
◾ Features: d-dimensional vectors
◾ Objects:
Nodes, edges, sets of nodes, entire graphs
◾ Objective function:
What task are we aiming to solve?
ML in Graphs
Example: Node-level prediction
◾ Given: a graph with labels on some nodes
◾ Learn a function that predicts the labels of the remaining nodes
How do we learn the function?
Node-level Tasks and Features
Node-level Tasks
Machine Learning
Node classification!
ML needs features.
Node-level Features: Overview
◾ Goal: Characterize the structure and position of a node in the network:
▪ Node degree
▪ Node centrality
▪ Clustering coefficient
▪ Graphlets
Node-level Features: Node Degree
◾ The degree 𝑘𝑣 of node 𝑣 is the number of edges (neighboring nodes) the node has.
◾ Treats all neighboring nodes equally.
▪ Example: 𝑘_𝐵 = 2 (a code sketch follows).
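As a concrete illustration, here is a minimal sketch of computing degree features with networkx; the small graph below is an assumed stand-in for the slide's figure (it does give 𝑘_𝐵 = 2).

```python
import networkx as nx

# Assumed stand-in example graph: A-C, B-C, B-D, C-D, D-E.
G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])

# Node degree k_v = number of edges touching v.
degrees = dict(G.degree())
print(degrees)  # {'A': 1, 'C': 3, 'B': 2, 'D': 3, 'E': 1}
```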
Node-level Features: Node Centrality
◾ Node degree counts the neighboring nodes without capturing their importance.
◾ Node centrality 𝑐_𝑣 takes the importance of a node in a graph into account.
◾ Different ways to model importance:
▪ Eigenvector centrality
▪ Betweenness centrality
▪ Closeness centrality
▪ ...and many others
Node Centrality (1)
◾ Eigenvector centrality:
▪ A node 𝑣 is important if surrounded by important
neighboring nodes 𝑢 ∈ 𝑁(𝑣).
▪ We model the centrality of node 𝑣 as the sum of the centralities of its neighboring nodes:
𝑐_𝑣 = (1/𝜆) Σ_{𝑢 ∈ 𝑁(𝑣)} 𝑐_𝑢
where 𝜆 is a normalization constant (it will turn out to be the largest eigenvalue of 𝑨).
▪ Notice that the above equation models centrality in a recursive manner. How do we solve it?
Node Centrality (1)
◾ Eigenvector centrality:
▪ Rewrite the recursive equation in matrix form:
𝜆𝒄 = 𝑨𝒄
where 𝑨 is the adjacency matrix (𝑨_𝑢𝑣 = 1 if 𝑢 ∈ 𝑁(𝑣)), 𝒄 is the centrality vector, and 𝜆 is an eigenvalue.
▪ We see that centrality 𝒄 is an eigenvector of the adjacency matrix 𝑨!
▪ The largest eigenvalue 𝜆_max is always positive and unique (by the Perron-Frobenius theorem).
▪ The eigenvector 𝒄𝑚𝑎𝑥 corresponding to 𝜆𝑚𝑎𝑥 is used for centrality.
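To make the recursion concrete, here is a minimal power-iteration sketch on the same assumed stand-in graph as in the degree sketch; networkx's built-in nx.eigenvector_centrality computes the same quantity.

```python
import numpy as np
import networkx as nx

G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
A = nx.to_numpy_array(G)   # adjacency matrix
c = np.ones(A.shape[0])    # initial centrality guess

# Power iteration: repeatedly applying A and renormalizing converges to the
# eigenvector of the largest eigenvalue lambda_max (Perron-Frobenius).
for _ in range(100):
    c = A @ c
    c /= np.linalg.norm(c)

print(dict(zip(G.nodes(), c.round(3))))
```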
Node Centrality (2)
◾ Betweenness centrality:
▪ A node is important if it lies on many shortest
paths between other nodes.
▪ Example:
𝑐_𝐴 = 𝑐_𝐵 = 𝑐_𝐸 = 0
𝑐_𝐶 = 3 (A-C-B, A-C-D, A-C-D-E)
𝑐_𝐷 = 3 (A-C-D-E, B-D-E, C-D-E)
Node Centrality (3)
◾ Closeness centrality:
▪ A node is important if it has small shortest path
lengths to all other nodes.
▪ Example:
𝑐_𝐴 = 1/(2+1+2+3) = 1/8 (paths A-C-B, A-C, A-C-D, A-C-D-E)
𝑐_𝐷 = 1/(2+1+1+1) = 1/5 (paths D-C-A, D-B, D-C, D-E)
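Here is a sketch of both centralities on the stand-in graph (which reproduces the numbers above). Note that networkx's betweenness sums fractions of shortest paths through a node; since all shortest paths here are unique, it matches the plain counts on the slide.

```python
import networkx as nx

G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])

# Betweenness (unnormalized): c_C = c_D = 3.0, c_A = c_B = c_E = 0.0
print(nx.betweenness_centrality(G, normalized=False))

# Closeness as defined on the slide: 1 / sum of shortest-path lengths.
closeness = {v: 1.0 / sum(nx.shortest_path_length(G, source=v).values())
             for v in G.nodes()}
print(closeness["A"], closeness["D"])  # 0.125 (= 1/8), 0.2 (= 1/5)
```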
Node Features: Clustering Coefficient
◾ Measures how connected 𝑣's neighboring nodes are:
𝑒_𝑣 = #(edges among neighboring nodes) / (𝑘_𝑣 choose 2) ∈ [0, 1]
where (𝑘_𝑣 choose 2) = #(node pairs among the 𝑘_𝑣 neighboring nodes).
◾ Examples: in the examples below (a node 𝑣 with 4 neighbors), the denominator is 6 (4 choose 2); a code sketch follows.
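A quick sketch on the stand-in graph; note its node C has only 3 neighbors, so the denominator is 3 rather than the 6 in the slide's figure.

```python
import networkx as nx

G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])

# e_C = #(edges among C's neighbors {A, B, D}) / (3 choose 2) = 1/3
print(nx.clustering(G, "C"))  # 0.333...
```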
Node Features: Graphlets
◾ Observation: Clustering coefficient counts the #(triangles) in the ego-network
3 triangles (out of 6 node triplets)
◾ We can generalize the above by counting #(pre-specified subgraphs, i.e., graphlets).
Node Features: Graphlets
◾ Goal: Describe the network structure around node 𝑢.
▪ Graphlets are small subgraphs that describe the structure of node 𝑢's network neighborhood.
◾ Analogy:
▪ Degree counts #(edges) that a node touches.
▪ Clustering coefficient counts #(triangles) that a node touches.
◾ Graphlet Degree Vector (GDV): graphlet-based features for nodes.
▪ GDV counts #(graphlets) that a node touches.
Node Features: Graphlets
◾ Considering graphlets of size 2-5 nodes we get:
◾ The vector of 73 coordinates is a signature of a node that describes the topology of the node's neighborhood.
◾ Graphlet degree vector provides a measure of a node’s local network topology:
◾ Comparing vectors of two nodes provides a more detailed measure of local topological similarity than node degrees or clustering coefficient.
Isomorphism
◾ Def: Graph Isomorphism
◾ Two graphs which contain the same number of nodes connected in the same way are said to be isomorphic.
Isomorphic
Node mapping: (e2,c2), (e1, c5), (e3,c4), (e5,c3), (e4,c1)
Non-Isomorphic
The right graph has cycles of length 3 but the left graph does not, so the graphs cannot be isomorphic.
Induced Subgraph
◾ Def: Induced subgraph
◾ A graph formed from a subset of the vertices and all of the edges connecting the vertices in that subset.
[Figure: an induced subgraph vs. a non-induced subgraph around node 𝑢]
Subgraph Isomorphism
• Let 𝐺 = (𝑉, 𝐸) be a data graph and 𝑝 = (𝑉_𝑝, 𝐸_𝑝) be a pattern graph.
• A function 𝑓: 𝑉_𝑝 → 𝑉 is called a homomorphism of 𝑝 if for each edge (𝑢, 𝑢′) ∈ 𝐸_𝑝, we have (𝑓(𝑢), 𝑓(𝑢′)) ∈ 𝐸.
• A homomorphism 𝑓 of 𝑝 is called a subgraph isomorphism of 𝑝 if 𝑓 is injective, i.e., 𝑓 never maps distinct nodes in 𝑉_𝑝 to the same node in 𝑉.
An Example
• Let 𝐺 = (𝑉, 𝐸) be a data graph and 𝑝 = (𝑉_𝑝, 𝐸_𝑝) be a pattern graph.
[Figure: pattern 𝑝 (nodes A, B, C) and data graph 𝐺 (nodes 1-5)]
Homomorphism
• A function 𝑓: 𝑉_𝑝 → 𝑉 is called a homomorphism of 𝑝 if for each edge (𝑢, 𝑢′) ∈ 𝐸_𝑝, we have (𝑓(𝑢), 𝑓(𝑢′)) ∈ 𝐸.
[Figure: a homomorphism mapping pattern 𝑝 into data graph 𝐺]
Subgraph Isomorphism
• A function 𝑓: 𝑉_𝑝 → 𝑉 is called a homomorphism of 𝑝 if for each edge (𝑢, 𝑢′) ∈ 𝐸_𝑝, we have (𝑓(𝑢), 𝑓(𝑢′)) ∈ 𝐸.
• A homomorphism 𝑓 of 𝑝 is called a subgraph isomorphism of 𝑝 if 𝑓 is injective, i.e., 𝑓 never maps distinct nodes in 𝑉_𝑝 to the same node in 𝑉.
[Figure: a subgraph isomorphism from pattern 𝑝 (nodes A, B, C) into data graph 𝐺 (nodes 1-5)]
Subgraph Isomorphism (Non-Induced)
[Figure: a non-induced subgraph isomorphism of 𝑝 in 𝐺; edges of 𝐺 beyond those required by 𝑝 may exist among the matched nodes]
[Figure: another non-induced subgraph isomorphism of 𝑝 in 𝐺]
Subgraph Isomorphism (Induced)
[Figure: an induced subgraph isomorphism of 𝑝 in 𝐺; the matched nodes of 𝐺 have exactly the edges of 𝑝 between them. A code sketch of both variants follows.]
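For concreteness, a sketch of checking both variants with networkx's GraphMatcher; the data graph and pattern below are hypothetical stand-ins for the figures. In networkx, subgraph_is_isomorphic tests the induced variant, while subgraph_is_monomorphic (available in recent versions) tests the non-induced one.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Hypothetical stand-ins: data graph G with a triangle, and triangle pattern p.
G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)])
p = nx.Graph([("A", "B"), ("B", "C"), ("C", "A")])

gm = isomorphism.GraphMatcher(G, p)
print(gm.subgraph_is_isomorphic())    # induced: True (triangle 1-2-3)
print(gm.subgraph_is_monomorphic())   # non-induced: True

# Enumerate the induced matches: dicts mapping matched nodes of G to nodes of p.
for mapping in gm.subgraph_isomorphisms_iter():
    print(mapping)
```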
Node Features: Graphlets
◾ Graphlets are rooted, connected, induced, non-isomorphic subgraphs: take some nodes and all the edges between them.
◾ There are 73 different graphlets of up to 5 nodes (Przulj et al., Bioinformatics 2004).
[Figure: the 73 graphlets with their numbered roots; node 𝑢 touches graphlets 0, 1, 2, 3, 5, 10, 11, ...]
Node Features: Graphlets
◾ Graphlet Degree Vector (GDV): A count vector of graphlets rooted at a given node.
◾ Example:
[Figure: the possible graphlets up to size 3, labeled 𝑎-𝑑, and the graphlet instances of node 𝑢]
GDV of node 𝑢 over graphlets 𝑎, 𝑏, 𝑐, 𝑑: [2, 1, 0, 2] (a code sketch follows)
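Below is a small sketch that computes the size-≤3 portion of a GDV (graphlets: edge, path with 𝑢 at an end, path with 𝑢 in the middle, triangle). The example graph is an assumption, not the slide's figure, and a full GDV would cover all 73 graphlets of up to 5 nodes.

```python
import networkx as nx
from itertools import combinations

def gdv_size3(G, u):
    """GDV restricted to graphlets with <= 3 nodes:
    [edge, path-end, path-middle, triangle]."""
    nbrs = set(G[u])
    edge = len(nbrs)            # u touches an edge once per neighbor
    tri = path_end = 0
    for n in nbrs:
        for w in G[n]:
            if w == u:
                continue
            if w in nbrs:
                tri += 1        # each triangle u-n-w is seen twice
            else:
                path_end += 1   # induced path u-n-w (no u-w edge)
    tri //= 2
    # u in the middle of an induced path n-u-w (no n-w edge)
    path_mid = sum(1 for n, w in combinations(nbrs, 2) if not G.has_edge(n, w))
    return [edge, path_end, path_mid, tri]

G = nx.Graph([("u", "v1"), ("u", "v2"), ("v1", "v2"), ("v2", "v3")])
print(gdv_size3(G, "u"))  # [2, 1, 0, 1]
```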
Node-level Feature: Summary
◾ We have introduced different ways to obtain node features.
◾ They can be categorized as:
▪ Importance-based features:
▪ Node degree
▪ Different node centrality measures
▪ Structure-based features:
▪ Node degree
▪ Clustering coefficient
▪ Graphlet count vector
Node-level Feature: Summary
◾ Importance-based features: capture the importance of a node in a graph
▪ Node degree:
▪ Simply counts the number of neighboring nodes
▪ Node centrality:
▪ Models importance of neighboring nodes in a graph
▪ Different modeling choices: eigenvector centrality, betweenness centrality, closeness centrality
◾ Useful for predicting influential nodes in a graph
▪ Example: predicting celebrity users in a social network
Node-level Feature: Summary
◾ Structure-based features: Capture topological properties of local neighborhood around a node.
▪ Node degree:
▪ Counts the number of neighboring nodes
▪ Clustering coefficient:
▪ Measures how connected neighboring nodes are
▪ Graphlet degree vector:
▪ Counts the occurrences of different graphlets
◾ Useful for predicting a particular role a node plays in a graph:
▪ Example: Predicting protein functionality in a protein-protein interaction network.
Discussion
Different ways to label the nodes of a network:
The node features defined so far would allow us to distinguish the nodes in the first example above.
However, the features defined so far would not allow us to distinguish the second node labeling above.
Link Prediction Task and Features
When can people be friends?
• In the 1960s, a study interviewed people who had recently changed employers to learn how they discovered their new jobs.
• Surprising finding: acquaintances, rather than close friends, help people find a new job.
• Triadic closure: if two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.
Refer to Networks, Crowds, and Markets by D. Easley and J. Kleinberg, and the accompanying lecture notes.
Link-level Prediction Task: Recap
◾ The task is to predict new links based on the existing links.
◾ At test time, node pairs (with no existing links) are ranked, and top 𝐾 node pairs are predicted.
◾ The key is to design features for a pair of nodes.
Link-level Prediction as a Task
Two formulations of the link prediction task:
◾ 1) Links missing at random:
▪ Remove a random set of links and
then aim to predict them
◾ 2) Links over time:
▪ Given 𝐺[𝑡0, 𝑡0′], a graph on the edges that appear up to time 𝑡0′, output a ranked list L of edges (not in 𝐺[𝑡0, 𝑡0′]) that are predicted to appear in 𝐺[𝑡1, 𝑡1′].
Link Prediction via Proximity
◾ Methodology:
▪ For each pair of nodes (x, y), compute a score c(x, y).
▪ For example, c(x, y) could be the # of common neighbors of x and y.
▪ Sort pairs (x, y) by decreasing score c(x, y).
▪ Predict the top n pairs as new links (see the sketch below).
▪ See which of these links actually appear in 𝐺[𝑡1, 𝑡1′].
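A minimal sketch of this recipe using the common-neighbors score; the snapshot graph is an assumed toy example.

```python
import networkx as nx
from itertools import combinations

# Assumed snapshot G[t0, t0'].
G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])

def predict_links(G, n=3):
    # Candidates: all non-adjacent node pairs.
    pairs = [(x, y) for x, y in combinations(G.nodes(), 2)
             if not G.has_edge(x, y)]
    # c(x, y) = # of common neighbors of x and y.
    scored = [(x, y, len(list(nx.common_neighbors(G, x, y)))) for x, y in pairs]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:n]

print(predict_links(G))  # then check these pairs against the edges of G[t1, t1']
```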
Link-level Features: Overview
◾ Distance-based features
◾ Local neighborhood overlap
◾ Global neighborhood overlap
Distance-based Features
Shortest-path distance between two nodes
◾ Example:
𝑆_𝐵𝐻 = 𝑆_𝐵𝐸 = 𝑆_𝐴𝐵 = 2
𝑆_𝐵𝐺 = 𝑆_𝐵𝐹 = 3
◾ However, this does not capture the degree of neighborhood overlap:
▪ Node pair (B, H) has 2 shared neighboring nodes, while pairs (B, E) and (A, B) only have 1 such node.
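For reference, the feature itself is a one-liner in networkx (again on the assumed stand-in graph, not the slide's figure):

```python
import networkx as nx

G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
print(nx.shortest_path_length(G, "B", "E"))  # 2 (the path B-D-E)
```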
Local Neighborhood Overlap
Captures # neighboring nodes shared between two nodes 𝒗𝟏 and 𝒗𝟐:
◾ Common neighbors: |𝑁(𝑣1) ∩ 𝑁(𝑣2)|
▪ Example: |𝑁(𝐴) ∩ 𝑁(𝐵)| = |{𝐶}| = 1
◾ Jaccard's coefficient: |𝑁(𝑣1) ∩ 𝑁(𝑣2)| / |𝑁(𝑣1) ∪ 𝑁(𝑣2)|
▪ Example: |𝑁(𝐴) ∩ 𝑁(𝐵)| / |𝑁(𝐴) ∪ 𝑁(𝐵)| = |{𝐶}| / |{𝐶, 𝐷}| = 1/2
◾ Adamic-Adar index: Σ_{𝑢 ∈ 𝑁(𝑣1) ∩ 𝑁(𝑣2)} 1/log(𝑘_𝑢)
▪ Example: 1/log(𝑘_𝐶) = 1/log 4
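A sketch of all three scores on the stand-in graph; there 𝑘_𝐶 = 3, so the Adamic-Adar value is 1/log 3 rather than the slide's 1/log 4 (networkx uses the natural log).

```python
import networkx as nx

G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
u, v = "A", "B"

print(len(list(nx.common_neighbors(G, u, v))))     # |{C}| = 1
print(next(nx.jaccard_coefficient(G, [(u, v)])))   # ('A', 'B', 0.5) = |{C}|/|{C,D}|
print(next(nx.adamic_adar_index(G, [(u, v)])))     # ('A', 'B', 1/ln 3 ~ 0.91)
```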
Global Neighborhood Overlap
◾ Limitation of local neighborhood features:
▪ Metric is always zero if the two nodes do not have
any neighbors in common.
𝑁(𝐴) ∩ 𝑁(𝐸) = ∅, so |𝑁(𝐴) ∩ 𝑁(𝐸)| = 0
▪ However, the two nodes may still potentially be connected in the future.
◾ Global neighborhood overlap metrics resolve the limitation by considering the entire graph.
Global Neighborhood Overlap
◾ Katz index: count the number of walks of all lengths between a given pair of nodes.
◾ Q: How to compute #walks between two nodes?
◾ Use powers of the graph adjacency matrix!
Intuition: Powers of Adj Matrices
◾ Computing #walks between two nodes
▪ Recall: 𝑨_𝑢𝑣 = 1 if 𝑢 ∈ 𝑁(𝑣)
▪ Let 𝑷(𝐾)_𝑢𝑣 = #walks of length 𝐾 between 𝑢 and 𝑣
▪ We will show 𝑷(𝐾) = 𝑨^𝐾
▪ 𝑷(1)_𝑢𝑣 = #walks of length 1 (direct neighborhood) between 𝑢 and 𝑣 = 𝑨_𝑢𝑣
▪ Example: 𝑷(1)_12 = 𝑨_12
Intuition: Powers of Adj Matrices
◾ How to compute 𝑷(2)?
▪ Step 1: Compute #walks of length 1 between each of 𝑢's neighbors and 𝑣
▪ Step 2: Sum up these #walks across 𝑢's neighbors
▪ 𝑷(2)_𝑢𝑣 = Σ_𝑖 𝑨_𝑢𝑖 ∗ 𝑷(1)_𝑖𝑣 = Σ_𝑖 𝑨_𝑢𝑖 ∗ 𝑨_𝑖𝑣 = 𝑨²_𝑢𝑣
[Figure: computing 𝑷(2)_12 = 𝑨²_12 by summing the #walks of length 1 between Node 1's neighbors and Node 2]
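A numpy sketch of the identity 𝑷(𝐾) = 𝑨^𝐾 on the stand-in graph:

```python
import numpy as np
import networkx as nx

G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
nodes = list(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes).astype(int)

P2 = np.linalg.matrix_power(A, 2)      # P2[u, v] = #walks of length 2
i, j = nodes.index("A"), nodes.index("B")
print(P2[i, j])                        # 1 (the single walk A-C-B)
```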
Global Neighborhood Overlap
◾ Katz index: count the number of walks of all lengths between a pair of nodes.
◾ How to compute #walks between two nodes?
◾ Use adjacency matrix powers!
▪ 𝑨_𝑢𝑣 specifies #walks of length 1 (direct neighborhood) between 𝑢 and 𝑣.
▪ 𝑨²_𝑢𝑣 specifies #walks of length 2 (neighbor of a neighbor) between 𝑢 and 𝑣.
▪ And 𝑨^𝑙_𝑢𝑣 specifies #walks of length 𝑙.
Global Neighborhood Overlap
◾ The Katz index between 𝑣1 and 𝑣2 is calculated as
𝑆_{𝑣1𝑣2} = Σ_{𝑙=1}^{∞} 𝛽^𝑙 𝑨^𝑙_{𝑣1𝑣2}
where the sum runs over all walk lengths, 𝑨^𝑙_{𝑣1𝑣2} = #walks of length 𝑙 between 𝑣1 and 𝑣2, and 0 < 𝛽 < 1 is a discount factor.
◾ The Katz index matrix is computed in closed form:
𝑺 = Σ_{𝑖=1}^{∞} 𝛽^𝑖 𝑨^𝑖 = (𝑰 − 𝛽𝑨)⁻¹ − 𝑰
by the geometric series of matrices.
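A sketch of the closed form 𝑺 = (𝑰 − 𝛽𝑨)⁻¹ − 𝑰, with an assumed 𝛽 = 0.1; the geometric series converges only when 𝛽 < 1/𝜆_max.

```python
import numpy as np
import networkx as nx

G = nx.Graph([("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
A = nx.to_numpy_array(G)
beta = 0.1                           # assumed discount factor, < 1/lambda_max

I = np.eye(A.shape[0])
S = np.linalg.inv(I - beta * A) - I  # Katz index matrix
print(S.round(4))                    # S[u, v] = Katz index of pair (u, v)
```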
Link-level Features: Summary
◾ Distance-based features:
▪ Uses the shortest-path length between two nodes but does not capture how their neighborhoods overlap.
◾ Local neighborhood overlap:
▪ Captures how many neighboring nodes are shared by two nodes.
▪ Becomes zero when no neighboring nodes are shared.
◾ Global neighborhood overlap:
▪ Uses global graph structure to score two nodes.
▪ Katz index counts #walks of all lengths between two nodes.
Learning Outcomes
◾ Traditional ML Pipeline
▪ Hand-crafted feature + ML model
◾ Hand-crafted features for graph data
▪ Node-level:
▪ Node degree, centrality, clustering coefficient, graphlets
▪ Link-level:
▪ Distance-based feature
▪ local/global neighborhood overlap
Acknowledgement: course materials, Stanford University
Classic Graph ML Tasks
Node classification: predict a property of a node
Example: categorize online users / items
Link prediction: predict whether there are missing links between two nodes
Example: knowledge graph completion
Graph classification: categorize different graphs
Example: molecule property prediction
Different Types of Tasks
Protein Folding Problem
Computationally predict a protein’s 3D structure based solely on its amino acid sequence
[Figure: a protein's 3D structure represented as a spatial graph. Image credit: DeepMind]
AlphaFold: Impact
[Figure: media coverage of AlphaFold's impact. Image credits: SingularityHub, DeepMind]
Example 2: Drug Side Effects
Polypharmacy, the use of drug combinations, is commonly used for treating complex and terminal diseases.
46% of people ages 70-79 take more than 5 drugs.
Many patients take more than 20 drugs to treat heart disease, depression, insomnia, etc.
Despite its effectiveness in many cases, polypharmacy poses a high risk of adverse side effects.
[Figure: side-effect probabilities (30%, 65%) for example drug combinations]
Zitnik et al., Modeling Polypharmacy Side Effects with Graph Convolutional Networks, Bioinformatics 2018
Link Prediction: Biomedical Graph
Nodes: Drugs & Proteins Edges: Interactions
Query: How likely will Simvastatin and Ciprofloxacin, when taken together, break down muscle tissue?
Zitnik et al., Modeling Polypharmacy Side Effects with Graph Convolutional Networks, Bioinformatics 2018
Link Prediction: Biomedical Graph
Evidence found
Acknowledgement: course materials, Stanford University