COMP5329 Deep Learning Week 10 – Graph Convolutional Networks
Graph Convolutional Networks (GCNs) aim to learn a function of signals/features on a graph
G = (V, E), which takes as input:
• A feature description x_i for every node i; summarized in an N × D feature matrix X (N :
number of nodes, D: number of input features)
• A representative description of the graph structure in matrix form; typically in the form
of an adjacency matrix A (or some function thereof).
Every neural network layer can then be written as a non-linear function
(1) H^{l+1} = f(H^l, A),
with H^0 = X and H^L = Z (or z for graph-level outputs), L being the number of layers.
The specific models then differ only in how f(·, ·) is chosen and parameterized.
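As a small illustration (a toy example of our own, not part of the original notes), the following NumPy snippet constructs the two inputs a GCN layer consumes: the N × D feature matrix X and the N × N adjacency matrix A.

import numpy as np

# Undirected graph with N = 4 nodes and edges (0,1), (1,2), (2,3), (3,0).
N, D = 4, 3
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[i, j] = A[j, i] = 1.0        # symmetric adjacency for an undirected graph

X = np.random.randn(N, D)           # one D-dimensional feature vector per node
# A layer f(H^l, A) maps an (N, d_l) matrix to an (N, d_{l+1}) matrix, with
# H^0 = X; concrete choices of f are given in Eqs. (2) and (3) below.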
1. Spatial Graph ConvNets
As an example, let’s consider the following very simple form of a layer-wise propagation rule:
(2) f(H^l, A) = σ(A H^l W^l),
where W^l is a weight matrix for the l-th neural network layer and σ(·) is a non-linear activation
function like the ReLU.
Multiplication with A means that, for every node, we sum up the feature vectors of all
neighboring nodes but not of the node itself (unless there are self-loops in the graph). We can
“fix” this by enforcing self-loops in the graph: we simply add the identity matrix to A.
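The following is a minimal NumPy sketch of the propagation rule in Eq. (2) with the self-loop fix; the function name simple_gcn_layer and the toy graph are illustrative choices of our own.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def simple_gcn_layer(H, A, W, add_self_loops=True):
    """Compute sigma(A H W); with add_self_loops, A is replaced by A + I."""
    if add_self_loops:
        A = A + np.eye(A.shape[0])
    return relu(A @ H @ W)

# Each output row i is the (nonlinearly transformed) sum of the feature vectors
# of node i's neighbours, plus node i itself when self-loops are enabled.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # 3-node star graph
H = np.random.randn(3, 4)
W = np.random.randn(4, 2)
H_next = simple_gcn_layer(H, A, W)   # shape (3, 2)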
Another limitation of Eq. (2) is that A is typically not normalized and therefore the multipli-
cation with A will completely change the scale of the feature vectors (we can understand that
by looking at the eigenvalues of A). Normalizing A such that all rows sum to one, i.e. D^{-1}A,
where D is the diagonal node degree matrix, gets rid of this problem. Multiplying with D^{-1}A
now corresponds to taking the average of neighboring node features. In practice, dynamics get
more interesting when we use a symmetric normalization, i.e. D^{-1/2}AD^{-1/2} (as this no longer
amounts to mere averaging of neighboring nodes). Combining these two tricks, we essentially
arrive at the propagation rule introduced in [1]:
(3) f(H^l, A) = σ(D̂^{-1/2} Â D̂^{-1/2} H^l W^l),
where Â = A + I, I is the identity matrix, and D̂ is the diagonal node degree matrix
of Â.
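Below is a hedged NumPy sketch of Eq. (3), the renormalized rule of Kipf & Welling [1]; the function name gcn_layer is our own, and a practical implementation would use sparse matrices.

import numpy as np

def gcn_layer(H, A, W, activation=lambda x: np.maximum(x, 0.0)):
    """sigma(D_hat^{-1/2} A_hat D_hat^{-1/2} H W) with A_hat = A + I."""
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1)                   # node degrees of A_hat (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D_hat^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return activation(A_norm @ H @ W)

# Row normalization D^{-1} A would simply average neighbour features; the
# symmetric version weights each neighbour j of node i by 1/sqrt(deg(i)*deg(j)).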
2. Spectral Graph ConvNets
We define the (normalized) graph Laplacian as
(4) ∆ = I − D^{-1/2} A D^{-1/2}.
The Laplacian can be interpreted as a measure of the smoothness of a signal on the graph, in
other words, the difference between the value h_i at a node and the average of the values h_j over
its neighbors.
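To make the smoothness interpretation concrete, here is a small NumPy sketch (our own toy example, assuming an undirected graph with no isolated nodes) showing that the quadratic form h^T ∆ h is small for smooth signals and large for rapidly varying ones.

import numpy as np

def normalized_laplacian(A):
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)     # 4-node path graph
Delta = normalized_laplacian(A)

h_smooth = np.sqrt(A.sum(axis=1))             # D^{1/2} 1: eigenvector with eigenvalue 0
h_rough  = np.array([1.0, -1.0, 1.0, -1.0])   # alternating signal along the path
print(h_smooth @ Delta @ h_smooth)            # ~0: maximally smooth signal
print(h_rough @ Delta @ h_rough)              # much larger: very unsmooth signal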
The following is the eigen-decomposition of the graph Laplacian,
(5) ∆ = ΦΛΦ^T,
where Φ contains as column vectors the Laplacian eigenvectors φ_1 to φ_n, each of size n × 1;
these are also called Fourier functions. The Fourier functions form an orthonormal basis, Φ =
[φ_1, · · · , φ_n] with Φ^TΦ = I. Λ is a diagonal matrix whose diagonal entries are the Laplacian
eigenvalues λ_1 to λ_n.
The Fourier transform essentially projects a function h onto the Fourier functions, and the
resulting coefficients are the coefficients of the Fourier series,
(6) F(h) = Φ^T h = ĥ.
The inverse Fourier transform gives
(7) F^{-1}(ĥ) = Φĥ = ΦΦ^T h = h.
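The following NumPy sketch ties Eqs. (5)–(7) together: it eigendecomposes the normalized Laplacian of a toy graph, projects a signal onto the resulting Fourier basis, and reconstructs it. The helper normalized_laplacian mirrors Eq. (4); all names are illustrative.

import numpy as np

def normalized_laplacian(A):
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Delta = normalized_laplacian(A)

# eigh returns ascending eigenvalues and orthonormal eigenvectors as columns
# of Phi, so Delta = Phi @ diag(lam) @ Phi.T and Phi.T @ Phi = I, as in Eq. (5).
lam, Phi = np.linalg.eigh(Delta)

h = np.random.randn(4)          # a signal on the graph (one value per node)
h_hat = Phi.T @ h               # Eq. (6): graph Fourier transform
h_rec = Phi @ h_hat             # Eq. (7): inverse transform recovers h
assert np.allclose(h_rec, h)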
The Fourier transform of the convolution of two functions is the pointwise product of their
Fourier transforms.
We define a graph spectral convolutional layer such that, given the layer activation h^l, the
activation of the next layer is:
(8) h^{l+1} = σ(w^l ∗ h^l),
where σ represents a nonlinear activation and w^l is a spatial filter. The convolution w^l ∗ h^l is
equivalent to ŵ^l(∆)h^l, where ŵ^l represents a spectral filter and ∆ is the Laplacian. Using the
eigen-decomposition, we can further write this as Φŵ^l(Λ)Φ^T h^l, where Φ is the eigenvector
matrix and Λ is the diagonal matrix of eigenvalues. This yields the final activation equation
below.
(9) h^{l+1} = σ(Φŵ^l(Λ)Φ^T h^l)
The objective is to learn the spectral filter ŵ^l by backpropagation instead of hand-crafting it.
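A minimal PyTorch sketch (our own illustration, not reference code for the course) of the spectral layer in Eq. (9), where the spectral filter is a free parameter with one coefficient per Fourier mode and is learned by backpropagation:

import torch
import torch.nn as nn

class SpectralConv(nn.Module):
    def __init__(self, phi: torch.Tensor):
        super().__init__()
        self.register_buffer("phi", phi)           # (n, n) Laplacian eigenvector matrix
        # One learnable filter coefficient per eigenvalue / Fourier mode.
        self.w_hat = nn.Parameter(torch.ones(phi.shape[0]))

    def forward(self, h):                          # h: (n,) signal on the graph
        h_hat = self.phi.t() @ h                   # Fourier transform, Eq. (6)
        filtered = self.w_hat * h_hat              # multiply in the spectral domain
        return torch.relu(self.phi @ filtered)     # back to the node domain, then sigma

# Usage: phi comes from eigendecomposing the graph Laplacian, e.g.
# lam, phi = torch.linalg.eigh(delta); layer = SpectralConv(phi).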
References
[1] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint
arXiv:1609.02907, 2016.