Graph Convolutional
Networks
Dr Chang Xu
School of Computer Science
Background: Grid structured data
– Natural language processing (NLP)
– Speech data
– Grid games
Background: Graph data
A lot of real-world data does not ‘live’ on grids:
– Social networks
– Citation networks
– Communication networks
– Multi-agent systems
– Molecules
– Protein interaction networks
Standard deep learning architectures like CNNs and RNNs don’t work here!
Background: Difficulties in graph data processing
On grid data (e.g., images):
– Fixed number of neighboring pixels
– Implicit spatial order of neighboring pixels
– Kernel with fixed size
– Weights with implicit order
Background: Difficulties in graph data processing
On graph data:
– Varying number of neighboring nodes (e.g., one node may have 3 neighboring nodes, another only 2)
– No implicit spatial order of neighboring nodes
– Kernel with varying size
– Weights with no well-defined order
Graph Convolutional Networks (GCN)
Extract features from the graph data.
Preliminaries: Representing graphs
Graphs consist of
– Nodes (or vertices)
– Edges connecting pairs of nodes
Formally
G = (V, E)
– V = {v_i | i = 1, …, N}: node set, containing |V| = N nodes
– E = {e_ij | v_i is connected to v_j}: edge set

[Figure: example undirected graph G with nodes a, b, c, d, e]
Preliminaries: Representing graphs
Edge representation
u Edge list
u Adjacency matrix
u Incidence matrix
1. Edge list: list of all edges
Edges: (a,b), (a,d), (a,e), (b,c), (b,d), (b,e), (c,d), (d,e)
Preliminaries: Representing graphs
2. Adjacency matrix (N = 5)

A =
      a  b  c  d  e
  a   0  1  0  1  1
  b   1  0  1  1  1
  c   0  1  0  1  0
  d   1  1  1  0  1
  e   1  1  0  1  0
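A minimal NumPy sketch (illustrative, not from the slides) of building this adjacency matrix from the edge list; the node ordering a…e is assumed for the example:

import numpy as np

nodes = ['a', 'b', 'c', 'd', 'e']
idx = {n: i for i, n in enumerate(nodes)}
edges = [('a','b'), ('a','d'), ('a','e'), ('b','c'),
         ('b','d'), ('b','e'), ('c','d'), ('d','e')]

N = len(nodes)
A = np.zeros((N, N), dtype=int)
for u, v in edges:
    A[idx[u], idx[v]] = 1   # undirected graph: set both directions
    A[idx[v], idx[u]] = 1

print(A)  # matches the 5x5 adjacency matrix shown above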
Preliminaries: Representing graphs
2. Adjacency matrix with self-connections (N = 5)

Ã =
      a  b  c  d  e
  a   1  1  0  1  1
  b   1  1  1  1  1
  c   0  1  1  1  0
  d   1  1  1  1  1
  e   1  1  0  1  1
Preliminaries: Representing graphs
2. Adjacency matrix (weighted graph, N = 5)

A =
      a     b     c     d     e
  a   w_aa  w_ab  0     w_ad  w_ae
  b   w_ba  w_bb  w_bc  w_bd  w_be
  c   0     w_cb  w_cc  w_cd  0
  d   w_da  w_db  w_dc  w_dd  w_de
  e   w_ea  w_eb  0     w_ed  w_ee
Preliminaries: Representing graphs
Degree matrix D: a diagonal matrix denoting the degree of each node (N = 5)

D =
      a  b  c  d  e
  a   3  0  0  0  0
  b   0  4  0  0  0
  c   0  0  2  0  0
  d   0  0  0  4  0
  e   0  0  0  0  3
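A small NumPy sketch (illustrative): the degree matrix can be computed directly from the adjacency matrix by summing each row.

import numpy as np

A = np.array([[0,1,0,1,1],
              [1,0,1,1,1],
              [0,1,0,1,0],
              [1,1,1,0,1],
              [1,1,0,1,0]])

degrees = A.sum(axis=1)   # node degrees: [3, 4, 2, 4, 3]
D = np.diag(degrees)      # degree matrix D
print(D)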
Preliminaries: Basic properties
Dense graph: |E| ≈ |V|^2, or |E| = O(|V|^2).
A large fraction of pairs of nodes are connected by edges.

Sparse graph: |E| ≈ |V|, or |E| = O(|V|).
Each node has only a few edges.
Preliminaries: Basic properties
Directed graph: each edge has a direction.
Undirected graph: no direction is associated with edges.
Preliminaries: Representing graphs
The adjacency matrix of a directed graph can be asymmetric (N = 5):

A =
      a  b  c  d  e
  a   0  0  0  0  0
  b   1  0  1  1  1
  c   0  0  0  1  0
  d   1  0  0  0  1
  e   1  0  0  0  0
Preliminaries: Basic properties
Connected component: a subgraph in which any two nodes are connected to each other by paths.

Theorem: The nodes of a graph G can be partitioned into connected components so that a node v is reachable from a node w if and only if they are in the same connected component.

(Example figure: a graph with 4 connected components.)
Preliminaries: Representing graphs
List of basic notations:
– G = (V, E)
– V = {v_i | i = 1, …, N}: node set, containing |V| = N nodes
– E = {e_ij | v_i is connected to v_j}: edge set
– X ∈ ℝ^{N×D}: node attribute matrix
– Adjacency matrix: A ∈ ℝ^{N×N}, A_ij ∈ {0, 1}, denoting the existence of e_ij
– I_N: identity matrix, denoting self-connections
– Adjacency matrix with self-connections: Ã = A + I_N
– Degree: number of edges connected to a node
– Degree matrix: D ∈ ℝ^{N×N} (computed from A), a diagonal matrix denoting the degree of each node; D̃ ∈ ℝ^{N×N} is computed from Ã
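A brief NumPy sketch (assumed, for illustration) of the self-connection and degree quantities defined above, on the example graph:

import numpy as np

A = np.array([[0,1,0,1,1],
              [1,0,1,1,1],
              [0,1,0,1,0],
              [1,1,1,0,1],
              [1,1,0,1,0]], dtype=float)
N = A.shape[0]

A_tilde = A + np.eye(N)                   # A_tilde = A + I_N
D_tilde = np.diag(A_tilde.sum(axis=1))    # D_tilde computed from A_tilde
print(np.diag(D_tilde))                   # [4, 5, 3, 5, 4]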
Convolution in CNN
Single CNN layer with 3×3 filter:
h_i ∈ ℝ^F are (hidden layer) activations of a pixel/node.

Update for a single pixel:
• Transform messages individually: W_i h_i
• Add everything up: Σ_i W_i h_i

Full update (for the centre pixel of a 3×3 window, with the window's pixels indexed 0…8):

h_4^{(l+1)} = σ( W_0^{(l)} h_0^{(l)} + W_1^{(l)} h_1^{(l)} + ⋯ + W_8^{(l)} h_8^{(l)} )
Convolution in GCN
– Adjacency matrix with self-connections: Ã = A + I_N
– Degree: number of edges connected to a node
– Degree matrix: D̃ ∈ ℝ^{N×N} (computed from Ã)

H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} )
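A minimal NumPy sketch of this propagation rule (an illustrative implementation, not the lecture's reference code); ReLU is assumed as the nonlinearity σ:

import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: sigma(D_tilde^{-1/2} A_tilde D_tilde^{-1/2} H W), sigma = ReLU."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                      # add self-connections
    d = A_tilde.sum(axis=1)                      # degrees from A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))       # D_tilde^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0.0)        # ReLU activation

# Example usage: 5 nodes, 3-dim input features, 2 output units
# H_next = gcn_layer(A, np.random.rand(5, 3), np.random.rand(3, 2))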
Why H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} )?
Graph convolutional networks (Spatial approach)
Update rule:

h_i^{(l+1)} = σ( h_i^{(l)} W_0^{(l)} + Σ_{j ∈ N_i} (1 / c_ij) h_j^{(l)} W_1^{(l)} )

N_i: neighbor indices;  c_ij: normalization constant (fixed/trainable)

Consider this undirected graph: [figure]
Calculate the update for the node in red: [figure]

Desirable properties:
• Weight sharing over all locations
• Invariance to permutations
• Linear complexity O(E)
• Applicable both in transductive and inductive settings

Limitations:
• Requires gating mechanism / residual connections for depth
• Only indirect support for edge features
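A small NumPy sketch of this per-node update (illustrative; the neighbor lists, weights, and the choice c_ij = |N_i| are assumptions for the example):

import numpy as np

def spatial_gcn_update(h, neighbors, W0, W1):
    """h: (N, F) node features; neighbors[i]: list of neighbor indices of node i."""
    h_next = np.zeros((h.shape[0], W0.shape[1]))
    for i, nbrs in enumerate(neighbors):
        agg = np.zeros(W1.shape[1])
        for j in nbrs:                                 # sum over neighbors N_i
            agg += (h[j] @ W1) / len(nbrs)             # c_ij = |N_i| (assumed choice)
        h_next[i] = np.maximum(h[i] @ W0 + agg, 0.0)   # self term + messages, ReLU
    return h_next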
Graph convolutional networks: A concrete model
Multiply the adjacency matrix A with the node features and focus on the updated a':

  [ a' ]     [ 0 1 0 1 1 ]   [ a ]
  [ b' ]     [ 1 0 1 1 1 ]   [ b ]
  [ c' ]  =  [ 0 1 0 1 0 ] × [ c ]
  [ d' ]     [ 1 1 1 0 1 ]   [ d ]
  [ e' ]     [ 1 1 0 1 0 ]   [ e ]

Row a of A is (0, 1, 0, 1, 1), so a' = b + d + e:
only the neighboring nodes of a are retained; unconnected nodes are masked out.
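A quick NumPy check of this masking behaviour (illustrative; one-hot rows stand in for node features so the sums are easy to read):

import numpy as np

A = np.array([[0,1,0,1,1],
              [1,0,1,1,1],
              [0,1,0,1,0],
              [1,1,1,0,1],
              [1,1,0,1,0]])
H = np.eye(5)        # row i is a one-hot "feature" standing in for node i

H_new = A @ H
print(H_new[0])      # [0, 1, 0, 1, 1]: a' = b + d + e (neighbors of a only)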
Graph convolutional networks: A concrete model
Self-loops are added to the adjacency matrix, Ã = A + I_N, so the central nodes are included in the convolution:

  [ a' ]     [ 1 1 0 1 1 ]   [ a ]
  [ b' ]     [ 1 1 1 1 1 ]   [ b ]
  [ c' ]  =  [ 0 1 1 1 0 ] × [ c ]
  [ d' ]     [ 1 1 1 1 1 ]   [ d ]
  [ e' ]     [ 1 1 0 1 1 ]   [ e ]

Row a of Ã is (1, 1, 0, 1, 1), so now a' = a + b + d + e.

H^{(l+1)} = σ( Ã H^{(l)} W^{(l)} )
Normalizing the Feature Representations

H^{(l+1)} = σ( Ã H^{(l)} W^{(l)} )

The feature representations can be normalized by node degree, by multiplying the adjacency matrix Ã with the inverse degree matrix D̃^{-1}:

H^{(l+1)} = σ( D̃^{-1} Ã H^{(l)} W^{(l)} )

  [ 4 0 0 0 0 ]^{-1}   [ 1 1 0 1 1 ]     [ 0.25  0.25  0     0.25  0.25 ]
  [ 0 5 0 0 0 ]        [ 1 1 1 1 1 ]     [ 0.20  0.20  0.20  0.20  0.20 ]
  [ 0 0 3 0 0 ]      × [ 0 1 1 1 0 ]  =  [ 0     0.33  0.33  0.33  0    ]
  [ 0 0 0 5 0 ]        [ 1 1 1 1 1 ]     [ 0.20  0.20  0.20  0.20  0.20 ]
  [ 0 0 0 0 4 ]        [ 1 1 0 1 1 ]     [ 0.25  0.25  0     0.25  0.25 ]

        D̃                   Ã                       D̃^{-1} Ã
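A short NumPy sketch reproducing the row-normalized matrix above (illustrative):

import numpy as np

A_tilde = np.array([[1,1,0,1,1],
                    [1,1,1,1,1],
                    [0,1,1,1,0],
                    [1,1,1,1,1],
                    [1,1,0,1,1]], dtype=float)
D_tilde_inv = np.diag(1.0 / A_tilde.sum(axis=1))   # D_tilde^{-1}

print(np.round(D_tilde_inv @ A_tilde, 2))
# rows: [0.25 0.25 0 0.25 0.25], [0.2 0.2 0.2 0.2 0.2], [0 0.33 0.33 0.33 0], ...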
Normalizing the Feature Representations
H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} )

A symmetric normalization:

  [ 4 0 0 0 0 ]^{-1/2}   [ 1 1 0 1 1 ]   [ 4 0 0 0 0 ]^{-1/2}     [ 0.25  0.22  0     0.22  0.25 ]
  [ 0 5 0 0 0 ]          [ 1 1 1 1 1 ]   [ 0 5 0 0 0 ]            [ 0.22  0.20  0.26  0.20  0.22 ]
  [ 0 0 3 0 0 ]        × [ 0 1 1 1 0 ] × [ 0 0 3 0 0 ]         =  [ 0     0.26  0.33  0.26  0    ]
  [ 0 0 0 5 0 ]          [ 1 1 1 1 1 ]   [ 0 0 0 5 0 ]            [ 0.22  0.20  0.26  0.20  0.22 ]
  [ 0 0 0 0 4 ]          [ 1 1 0 1 1 ]   [ 0 0 0 0 4 ]            [ 0.25  0.22  0     0.22  0.25 ]

        D̃^{-1/2}               Ã               D̃^{-1/2}                D̃^{-1/2} Ã D̃^{-1/2}
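And the symmetric version in NumPy (illustrative), which reproduces the entries 0.25, 0.22, 0.26, 0.33 shown above:

import numpy as np

A_tilde = np.array([[1,1,0,1,1],
                    [1,1,1,1,1],
                    [0,1,1,1,0],
                    [1,1,1,1,1],
                    [1,1,0,1,1]], dtype=float)
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)             # D_tilde^{-1/2}

A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
print(np.round(A_hat, 2))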
Normalizing the Feature Representations
For a single node i (using that D̃ is diagonal, so D̃^{-1/2}_ik = 0 for k ≠ i):

( D̃^{-1/2} Ã D̃^{-1/2} H )_i
= Σ_k D̃^{-1/2}_ik ( Ã D̃^{-1/2} H )_k
= D̃^{-1/2}_ii Σ_j Ã_ij ( D̃^{-1/2} H )_j
= D̃^{-1/2}_ii Σ_j Ã_ij D̃^{-1/2}_jj H_j
= Σ_j ( 1 / √( D̃_ii D̃_jj ) ) Ã_ij H_j

Each message from node j to node i is therefore normalized by the degrees of both nodes.
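A quick numeric check of this elementwise identity (illustrative):

import numpy as np

A_tilde = np.array([[1,1,0,1,1],
                    [1,1,1,1,1],
                    [0,1,1,1,0],
                    [1,1,1,1,1],
                    [1,1,0,1,1]], dtype=float)
d = A_tilde.sum(axis=1)
H = np.random.rand(5, 3)

# Matrix form: D_tilde^{-1/2} A_tilde D_tilde^{-1/2} H
lhs = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5) @ H

# Elementwise form: sum_j A_tilde_ij H_j / sqrt(D_tilde_ii D_tilde_jj)
rhs = np.array([sum(A_tilde[i, j] * H[j] / np.sqrt(d[i] * d[j])
                    for j in range(5)) for i in range(5)])

print(np.allclose(lhs, rhs))   # True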
Graph convolutional networks: A concrete model
Semi-Supervised Classification with Graph Convolutional Networks (Kipf & Welling)
Input: feature matrix X ∈ ℝ^{N×D}, preprocessed adjacency matrix Ã

[Figure: Input → Hidden layer (ReLU) → Hidden layer (ReLU) → Output]

X = H^{(0)},  Z = H^{(L)},  with  H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} )

Node classification: softmax(z_n)
  e.g. Kipf & Welling (ICLR 2017)
Graph classification: softmax( Σ_n z_n )
  e.g. Duvenaud et al. (NIPS 2015)
Link prediction: p(A_ij) = σ( z_i^T z_j )
  Kipf & Welling (NIPS BDL 2016), “Graph Auto-Encoders”
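A compact NumPy sketch of a two-layer GCN of this form for node classification (illustrative only; the layer sizes and random weights are assumptions, and no training loop is shown):

import numpy as np

def normalize(A):
    """Return D_tilde^{-1/2} A_tilde D_tilde^{-1/2} for adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W0, W1):
    A_hat = normalize(A)
    H1 = np.maximum(A_hat @ X @ W0, 0.0)    # hidden layer with ReLU
    Z = softmax(A_hat @ H1 @ W1)            # per-node class probabilities
    return Z

# Example: 5 nodes, 3 input features, 4 hidden units, 2 classes
A = np.array([[0,1,0,1,1],[1,0,1,1,1],[0,1,0,1,0],[1,1,1,0,1],[1,1,0,1,0]], dtype=float)
X = np.random.rand(5, 3)
Z = gcn_forward(A, X, np.random.rand(3, 4), np.random.rand(4, 2))
print(Z.shape)   # (5, 2)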
Spectral approach
Graph convolutional networks (Spectral approach)
Graph Laplacian: L = D − A

Normalized graph Laplacian: L′ = I_N − D^{-1/2} A D^{-1/2}
https://en.wikipedia.org/wiki/Laplacian_matrix
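In NumPy (illustrative), for the example graph:

import numpy as np

A = np.array([[0,1,0,1,1],[1,0,1,1,1],[0,1,0,1,0],[1,1,1,0,1],[1,1,0,1,0]], dtype=float)
D = np.diag(A.sum(axis=1))

L = D - A                                            # graph Laplacian
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L_norm = np.eye(5) - D_inv_sqrt @ A @ D_inv_sqrt     # normalized graph Laplacian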
Convolution theorem: Under suitable conditions, the Fourier (Laplace)
transform of a convolution of two signals is the pointwise product of
their Fourier (Laplace) transforms.
Given a graph signal f and a convolutional filter h (with trainable parameters), the convolution can be calculated via the Fourier transform:

f ∗ h = ℱ^{-1}[ f̂(ω) ⋅ ĥ(ω) ]
Fourier transform
The classic Fourier transform

f̂(ξ) = ⟨f, e^{2πiξt}⟩ = ∫ f(t) e^{−2πiξt} dt

is the expansion of f in terms of the complex exponentials e^{2πiξt};
these exponentials are eigenfunctions of the 1-d Laplace operator △:

△ e^{2πiξt} = −(2πξ)^2 e^{2πiξt}
Fourier transform on the graph
By analogy with the classic Fourier transform, the graph Fourier transform f̂ of any signal f ∈ ℝ^N (defined on all vertices of G) is the expansion of f in terms of the eigenvectors u_l of the graph Laplacian:

f̂(λ_l) = ⟨f, u_l⟩ = Σ_{i=1}^{N} f(i) u_l^*(i)

Similarly, for the filter h:

ĥ(λ_l) = ⟨h, u_l⟩ = Σ_{i=1}^{N} h(i) u_l^*(i)

In matrix form, with U = [u_0, …, u_{N−1}]:

f̂ = U^T f,   ĥ = U^T h

The inverse graph Fourier transform is then given by:

f(i) = Σ_{l=0}^{N−1} f̂(λ_l) u_l(i),   i.e.   f = U f̂
Convolution theorem: Under suitable conditions, the Fourier (Laplace)
transform of a convolution of two signals is the pointwise product of
their Fourier (Laplace) transforms.
Given a graph signal f and a convolutional filter h (with trainable parameters), the convolution can be calculated by

f ∗ h = ℱ^{-1}[ f̂(ω) ⋅ ĥ(ω) ]

Using f̂ = U^T f, ĥ = U^T h and f = U f̂, the graph convolution becomes

(f ∗ h)_G = U ( U^T f ⊙ U^T h )

where ⊙ denotes elementwise multiplication.
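An illustrative NumPy sketch of this spectral graph convolution, using the eigendecomposition of the graph Laplacian (the particular signal and filter values are made up):

import numpy as np

A = np.array([[0,1,0,1,1],[1,0,1,1,1],[0,1,0,1,0],[1,1,1,0,1],[1,1,0,1,0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A          # graph Laplacian L = D - A

lam, U = np.linalg.eigh(L)              # L = U diag(lam) U^T, U orthonormal

f = np.random.rand(5)                   # a signal on the graph
h = np.random.rand(5)                   # a filter, also defined on the vertices

f_hat = U.T @ f                         # graph Fourier transform
h_hat = U.T @ h
conv = U @ (f_hat * h_hat)              # (f * h)_G = U (U^T f ⊙ U^T h)
print(conv)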
(f ∗ h)_G = U ( U^T f ⊙ U^T h )

Version 1.0 (Spectral Networks and Locally Connected Networks on Graphs):

y = σ( U g_θ(Λ) U^T x ),   with   g_θ(Λ) =
  [ θ_1  0    0  ]
  [  0   ⋱    0  ]
  [  0   0   θ_N ]

– Requires calculating the multiplication between U, g_θ(Λ) and U^T x.

With a polynomial filter g_θ(Λ) = Σ_{k=0}^{K} θ_k Λ^k (so that U g_θ(Λ) U^T = Σ_{k=0}^{K} θ_k L^k):
– There are only K parameters to train, and usually K ≪ N. The complexity has been decreased.
– No need to conduct an eigendecomposition; the filter explicitly depends on the Laplacian matrix L.
– Spatial localization: K denotes the receptive field.
Graph convolutional network
K = 0: the convolution filter is θ_0 I (the node itself).
K = 1: the convolution filter is θ_0 I + θ_1 L (the node and its 1-hop neighbors).
K = 2: the convolution filter is θ_0 I + θ_1 L + θ_2 L^2 (the node and its neighbors up to 2 hops).
Graph convolutional network
Local Connectivity + Parameter Sharing
With Chebyshev expansion, we have

g_θ(Λ) ≈ Σ_{k=0}^{K} θ_k T_k(Λ̃)

Chebyshev polynomials: T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), with T_0(x) = 1 and T_1(x) = x

Recall: Λ̃ = 2Λ / λ_max − I_N

y = σ( U g_θ(Λ) U^T x )
  = σ( U Σ_{k=0}^{K} θ_k T_k(Λ̃) U^T x )
  = σ( Σ_{k=0}^{K} θ_k T_k( U Λ̃ U^T ) x )
  = σ( Σ_{k=0}^{K} θ_k T_k( L̃ ) x )

where L̃ = U Λ̃ U^T = 2L / λ_max − I_N
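A minimal NumPy sketch of this Chebyshev filtering (illustrative; K, the coefficients θ_k, and the input signal are assumptions). It uses the recursion T_k(L̃)x = 2 L̃ T_{k−1}(L̃)x − T_{k−2}(L̃)x, so the filter only needs matrix–vector products with L̃:

import numpy as np

def chebyshev_filter(L, x, theta):
    """Compute sum_k theta[k] * T_k(L_tilde) @ x via the Chebyshev recursion."""
    N = L.shape[0]
    lam_max = np.linalg.eigvalsh(L).max()     # in practice often approximated (e.g. by 2)
    L_tilde = 2.0 * L / lam_max - np.eye(N)   # rescale spectrum to [-1, 1]

    Tx_prev, Tx = x, L_tilde @ x              # T_0(L_tilde) x = x,  T_1(L_tilde) x = L_tilde x
    out = theta[0] * Tx_prev
    if len(theta) > 1:
        out = out + theta[1] * Tx
    for k in range(2, len(theta)):            # T_k = 2 L_tilde T_{k-1} - T_{k-2}
        Tx_prev, Tx = Tx, 2.0 * L_tilde @ Tx - Tx_prev
        out = out + theta[k] * Tx
    return out

# Example usage on the 5-node graph
A = np.array([[0,1,0,1,1],[1,0,1,1,1],[0,1,0,1,0],[1,1,1,0,1],[1,1,0,1,0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.random.rand(5)
y = np.maximum(chebyshev_filter(L, x, theta=[0.5, 0.3, 0.2]), 0.0)   # sigma = ReLU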