
Graph Convolutional Networks

Dr Chang Xu

School of Computer Science


Background: Grid structured data

- Natural language processing (NLP)
- Speech data
- Grid games


Background: Graph data
A lot of real-world data does not ‘live’ on grids

- Social networks
- Citation networks
- Communication networks
- Multi-agent systems
- Molecules
- Protein interaction networks

Standard deep learning architectures like CNNs and RNNs don't work here!


Background: Difficulties in processing graph data

On grid data (e.g. images):

- Fixed number of neighboring pixels
- Implicit spatial order of neighboring pixels
- Kernel with fixed size
- Weights with implicit order


Background: Difficulties in processing graph data

On graph data:

- Varying number of neighboring nodes
- No implicit spatial order of neighboring nodes
- Kernel with varying size
- Weights with undecided order

[Figure: example graph in which one node has 3 neighboring nodes and another has 2]


Graph Convolutional Networks (GCN)

Extract features from the graph data.


Preliminaries: Representing graphs

Graphs consist of
- Nodes (or vertices)
- Edges connecting pairs of nodes

Formally

G = (V, E)

- V = {v_i | i = 1 … N}: Node set, containing |V| = N nodes
- E = {e_ij | v_i is connected to v_j}: Edge set

[Figure: example graph G with nodes a, b, c, d, e]


Preliminaries: Representing graphs

Edge representation
- Edge list
- Adjacency matrix
- Incidence matrix

1. Edge list: list of all edges

Edges: (a,b), (a,d), (a,e), (b,c), (b,d), (b,e), (c,d), (d,e)
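As a quick illustration of how an edge list maps to the other representations, here is a minimal Python/NumPy sketch (mine, not from the slides) that builds the adjacency matrix of the example graph from this edge list; the node ordering a–e follows the figure.

import numpy as np

nodes = ["a", "b", "c", "d", "e"]
idx = {v: i for i, v in enumerate(nodes)}
edges = [("a", "b"), ("a", "d"), ("a", "e"), ("b", "c"),
         ("b", "d"), ("b", "e"), ("c", "d"), ("d", "e")]

A = np.zeros((len(nodes), len(nodes)), dtype=int)
for u, v in edges:
    A[idx[u], idx[v]] = 1
    A[idx[v], idx[u]] = 1  # undirected graph: symmetric entries

print(A)  # matches the adjacency matrix shown on the next slide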



Preliminaries: Representing graphs

2. Adjacency matrix (example graph G, N = 5)

        a  b  c  d  e
A =  a [0  1  0  1  1]
     b [1  0  1  1  1]
     c [0  1  0  1  0]
     d [1  1  1  0  1]
     e [1  1  0  1  0]


Preliminaries: Representing graphs

2. Adjacency matrix with self-connections (example graph G, N = 5)

        a  b  c  d  e
Ã =  a [1  1  0  1  1]
     b [1  1  1  1  1]
     c [0  1  1  1  0]
     d [1  1  1  1  1]
     e [1  1  0  1  1]


Preliminaries: Representing graphs

2. Adjacency matrix (weighted graph, example graph G, N = 5), where w_ij is the weight of edge e_ij

        a     b     c     d     e
W =  a [w_aa  w_ab  0     w_ad  w_ae]
     b [w_ba  w_bb  w_bc  w_bd  w_be]
     c [0     w_cb  w_cc  w_cd  0   ]
     d [w_da  w_db  w_dc  w_dd  w_de]
     e [w_ea  w_eb  0     w_ed  w_ee]


Preliminaries: Representing graphs

Degree matrix: diagonal matrix denoting the degree of each node (example graph G, N = 5)

        a  b  c  d  e
D =  a [3  0  0  0  0]
     b [0  4  0  0  0]
     c [0  0  2  0  0]
     d [0  0  0  4  0]
     e [0  0  0  0  3]
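A small NumPy check (a sketch, not part of the lecture): the degree matrix D is just the diagonal of the row sums of A, which reproduces the degrees 3, 4, 2, 4, 3 shown above.

import numpy as np

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]])

D = np.diag(A.sum(axis=1))  # diagonal matrix of node degrees
print(np.diag(D))           # [3 4 2 4 3] for nodes a, b, c, d, e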


Preliminaries: Basic properties

Dense graph: |E| ≈ |V|², or |E| = O(|V|²)
A large fraction of pairs of nodes are connected by edges.

Sparse graph: |E| ≈ |V|, or |E| = O(|V|)
Each node has only a few edges.

Preliminaries: Basic properties

Directed graph: each edge has a direction.

Undirected graph: no direction is associated with edges.

[Figures: a directed and an undirected version of the example graph on nodes a–e]


Preliminaries: Representing graphs

The adjacency matrix of a directed graph can be asymmetric (example graph, N = 5):

        a  b  c  d  e
A =  a [0  0  0  0  0]
     b [1  0  1  1  1]
     c [0  0  0  1  0]
     d [1  0  0  0  1]
     e [1  0  0  0  0]


Preliminaries: Basic properties
Connected component: a subgraph in which any two nodes are connected to each other by paths.

Theorem: The nodes of a graph G can be partitioned into connected components so that a node v is reachable from a node w if and only if v and w are in the same connected component.

[Figure: an example graph with 4 connected components]


Preliminaries: Representing graphs

- G = (V, E)
- V = {v_i | i = 1 … N}: Node set, containing |V| = N nodes
- E = {e_ij | v_i is connected to v_j}: Edge set
- X ∈ R^(N×D): Node attribute matrix
- Adjacency matrix: A ∈ R^(N×N), A_ij ∈ {0, 1}, denoting the existence of e_ij
- I_N: Identity matrix, denoting self-connections
- Adjacency matrix with self-connections: Ã = A + I_N
- Degree: number of edges connected to a node
- Degree matrix: D ∈ R^(N×N) (computed from A), diagonal matrix denoting the degree of each node; D̃ ∈ R^(N×N) is computed from Ã

List of basic notations
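To tie the notation together, here is a short NumPy sketch (mine, not lecture code) computing Ã = A + I_N and the corresponding degree matrix D̃ for the example graph.

import numpy as np

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=float)
N = A.shape[0]

A_tilde = A + np.eye(N)                 # adjacency matrix with self-connections
D_tilde = np.diag(A_tilde.sum(axis=1))  # degree matrix computed from A_tilde
print(np.diag(D_tilde))                 # [4. 5. 3. 5. 4.]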


Convolution in CNN

Update for a single pixel:
• Transform messages individually: W_i h_i
• Add everything up: Σ_i W_i h_i

Single CNN layer with a 3×3 filter: the centre pixel aggregates itself and its 8 neighbours, indexed 0 … 8.

h_i ∈ R^d are (hidden layer) activations of a pixel/node.

Full update:
h_4^(l+1) = σ( W_0^(l) h_0^(l) + W_1^(l) h_1^(l) + … + W_8^(l) h_8^(l) )


Convolution in GCN

- Adjacency matrix with self-connections: Ã = A + I_N
- Degree: number of edges connected to a node
- Degree matrix: D̃ ∈ R^(N×N) (computed from Ã)

H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )

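A minimal NumPy sketch of one propagation step with this rule (my illustration, with ReLU assumed as the nonlinearity σ and random toy inputs):

import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D~^(-1/2) A~ D~^(-1/2) H W)."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                    # add self-connections
    d_tilde = A_tilde.sum(axis=1)              # degrees including self-loops
    D_inv_sqrt = np.diag(d_tilde ** -0.5)      # D~^(-1/2)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetrically normalized adjacency
    return np.maximum(A_hat @ H @ W, 0)        # ReLU as the nonlinearity

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=float)
H0 = np.random.randn(5, 2)         # N = 5 nodes, 2 input features
W0 = np.random.randn(2, 4)         # maps 2 features to 4 hidden units
print(gcn_layer(A, H0, W0).shape)  # (5, 4)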

Why H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )?


Graph convolutional networks (Spatial approach)

Update rule:

h_i^(l+1) = σ( h_i^(l) W_0^(l) + Σ_{j ∈ N_i} (1 / c_ij) h_j^(l) W_1^(l) )

where N_i: neighbor indices of node i, and c_ij: normalization constant (fixed or trainable). (A per-node sketch of this rule is given after the lists below.)

[Figure: consider an undirected example graph and calculate the update for the node in red]

Desirable properties:
• Weight sharing over all locations
• Invariance to permutations
• Linear complexity O(E)
• Applicable in both transductive and inductive settings

Limitations:
• Requires gating mechanism / residual connections for depth
• Only indirect support for edge features
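Here is a per-node Python/NumPy sketch of the update rule above (the variable names and the choice c_ij = sqrt(d_i d_j) are my assumptions for a fixed normalization constant):

import numpy as np

def spatial_gcn_update(A, H, W0, W1):
    """h_i' = ReLU(h_i W0 + sum over neighbors j of (1 / c_ij) h_j W1)."""
    deg = A.sum(axis=1)
    H_new = np.zeros((A.shape[0], W0.shape[1]))
    for i in range(A.shape[0]):
        msg = H[i] @ W0                      # transform the node's own features
        for j in np.nonzero(A[i])[0]:        # loop over the neighbors of node i
            c_ij = np.sqrt(deg[i] * deg[j])  # fixed normalization constant
            msg += (H[j] @ W1) / c_ij        # normalized message from neighbor j
        H_new[i] = np.maximum(msg, 0)        # ReLU
    return H_new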


Graph convolutional networks: A concrete model

Aggregate neighbor features by multiplying with the adjacency matrix of the example graph G:

[a'; b'; c'; d'; e'] = A × [a; b; c; d; e]

        a  b  c  d  e
A =  a [0  1  0  1  1]
     b [1  0  1  1  1]
     c [0  1  0  1  0]
     d [1  1  1  0  1]
     e [1  1  0  1  0]

Focus on the updated a'. Row a of A is (0, 1, 0, 1, 1), so

a' = b + d + e

Only neighboring nodes of a are retained; unconnected nodes are masked out.


Graph convolutional networks: A concrete model

Ã = A + I_N: self-loops are added into the adjacency matrix, so central nodes are included in the convolution.

[a'; b'; c'; d'; e'] = Ã × [a; b; c; d; e]

        a  b  c  d  e
Ã =  a [1  1  0  1  1]
     b [1  1  1  1  1]
     c [0  1  1  1  0]
     d [1  1  1  1  1]
     e [1  1  0  1  1]

Focus on the updated a'. Row a of Ã is (1, 1, 0, 1, 1), so

a' = a + b + d + e

H^(l+1) = σ( Ã H^(l) W^(l) )
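A quick NumPy check of the two slides above (a sketch of mine, using one-hot rows as stand-in features): multiplying by A keeps only the neighbors of a, while multiplying by Ã = A + I also keeps a itself.

import numpy as np

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]])
H = np.eye(5, dtype=int)  # row i is a one-hot "feature" for node i (a..e)

print((A @ H)[0])                           # [0 1 0 1 1]  ->  a' = b + d + e
print(((A + np.eye(5, dtype=int)) @ H)[0])  # [1 1 0 1 1]  ->  a' = a + b + d + e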


Normalizing the Feature Representations

Previous rule: H^(l+1) = σ( Ã H^(l) W^(l) )

The feature representations can be normalized by node degree by transforming the adjacency matrix Ã: multiply it by the inverse degree matrix D̃^(-1):

H^(l+1) = σ( D̃^(-1) Ã H^(l) W^(l) )

For the example graph, D̃ = diag(4, 5, 3, 5, 4), so each row of Ã is divided by the degree of the corresponding node:

            [0.25  0.25  0     0.25  0.25]
            [0.20  0.20  0.20  0.20  0.20]
D̃^(-1) Ã =  [0     0.33  0.33  0.33  0   ]
            [0.20  0.20  0.20  0.20  0.20]
            [0.25  0.25  0     0.25  0.25]


Normalizing the Feature Representations

A symmetric normalization:

H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )

For the example graph, with D̃ = diag(4, 5, 3, 5, 4):

                         [0.25  0.22  0     0.22  0.25]
                         [0.22  0.20  0.26  0.20  0.22]
D̃^(-1/2) Ã D̃^(-1/2)  =   [0     0.26  0.33  0.26  0   ]
                         [0.22  0.20  0.26  0.20  0.22]
                         [0.25  0.22  0     0.22  0.25]
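The numbers on the last two slides can be reproduced with a few lines of NumPy (my sketch, values rounded to two decimals):

import numpy as np

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(5)
d = A_tilde.sum(axis=1)  # [4. 5. 3. 5. 4.]

row_norm = np.diag(1.0 / d) @ A_tilde                         # D~^(-1) A~
sym_norm = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)  # D~^(-1/2) A~ D~^(-1/2)

print(np.round(row_norm, 2))  # rows sum to 1, first row: 0.25 0.25 0 0.25 0.25
print(np.round(sym_norm, 2))  # symmetric, first row: 0.25 0.22 0 0.22 0.25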


Normalizing the Feature Representations

( D̃^(-1/2) Ã D̃^(-1/2) H )_i
  = Σ_k D̃_ik^(-1/2) ( Ã D̃^(-1/2) H )_k
  = D̃_ii^(-1/2) ( Ã D̃^(-1/2) H )_i          (D̃ is diagonal)
  = D̃_ii^(-1/2) Σ_j Ã_ij ( D̃^(-1/2) H )_j
  = D̃_ii^(-1/2) Σ_j Ã_ij D̃_jj^(-1/2) H_j
  = Σ_j ( 1 / √(D̃_ii D̃_jj) ) Ã_ij H_j

So each neighbor's message is scaled by 1/√(D̃_ii D̃_jj), i.e. by the degrees of both endpoints of the edge.


Graph convolutional networks: A concrete model

Semi-Supervised Classification with Graph Convolutional Networks (Kipf & Welling)

Input: feature matrix X ∈ R^(N×D), preprocessed adjacency matrix Ã

[Figure: Input X = H^(0) → hidden layers with ReLU → output Z = H^(L)]

Layer-wise propagation rule: H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )

With per-node outputs z_i (the rows of Z):
- Node classification: softmax(z_i), e.g. Kipf & Welling (ICLR 2017)
- Graph classification: softmax( Σ_i z_i ), e.g. Duvenaud et al. (NIPS 2015)
- Link prediction: p(A_ij) = σ( z_i^T z_j ), Kipf & Welling (NIPS BDL 2016), "Graph Auto-Encoders"

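A minimal end-to-end NumPy sketch of this two-layer model for node classification (the random weights and feature sizes are my placeholders, not values from the paper):

import numpy as np

def normalize(A):
    """Symmetrically normalized adjacency with self-connections."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def two_layer_gcn(A, X, W0, W1):
    """Z = softmax(A_hat ReLU(A_hat X W0) W1), row-wise softmax over classes."""
    A_hat = normalize(A)
    H = np.maximum(A_hat @ X @ W0, 0)            # hidden layer with ReLU
    logits = A_hat @ H @ W1
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    return np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=float)
X = np.random.randn(5, 3)   # N = 5 nodes, D = 3 features
Z = two_layer_gcn(A, X, np.random.randn(3, 8), np.random.randn(8, 2))
print(Z.sum(axis=1))        # each row sums to 1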

Spectral approach


Graph convolutional networks (Spectral approach)

Graph Laplacian: L = D − A

Normalized Graph Laplacian: L′ = I_N − D^(-1/2) A D^(-1/2)

https://en.wikipedia.org/wiki/Laplacian_matrix
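A short NumPy sketch (mine) computing both Laplacians for the example graph and checking two standard properties:

import numpy as np

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)

L = np.diag(d) - A                                                 # L = D - A
L_norm = np.eye(5) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)   # L' = I - D^(-1/2) A D^(-1/2)

print(np.allclose(L @ np.ones(5), 0))  # True: rows of L sum to zero
print(np.allclose(L_norm, L_norm.T))   # True: symmetric for an undirected graph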


Convolution theorem: Under suitable conditions, the Fourier (Laplace) transform of a convolution of two signals is the pointwise product of their Fourier (Laplace) transforms.

Fourier transform

Given a graph signal f and a convolutional filter (with trainable parameters) h, the convolution can be calculated by

f ∗ h = F^(-1)[ f̂(ω) · ĥ(ω) ]


Fourier transform

The classic Fourier transform

f̂(ξ) = ⟨f, e^(2πiξt)⟩ = ∫ f(t) e^(−2πiξt) dt

is the expansion of f in terms of the complex exponentials e^(2πiξt); these exponentials are eigenfunctions of the 1-d Laplace operator △:

△ e^(2πiξt) = −(2πξ)² e^(2πiξt)


Fourier transform on the graph
Analogously to the classic Fourier transform, the graph Fourier transform f̂ of any signal f ∈ R^N on the vertices of G is the expansion of f in terms of the eigenvectors u_l of the graph Laplacian:

f̂(λ_l) = ⟨f, u_l⟩ = Σ_{i=1}^{N} f(i) u_l*(i)

Similarly, for the filter h:

ĥ(λ_l) = ⟨h, u_l⟩ = Σ_{i=1}^{N} h(i) u_l*(i)

In matrix form (U = [u_0, …, u_{N−1}]):   f̂ = U^T f,   ĥ = U^T h

The inverse graph Fourier transform is then given by:

f(i) = Σ_{l=0}^{N−1} f̂(λ_l) u_l(i),   i.e.   f = U f̂


Convolution theorem: Under suitable conditions, the Fourier (Laplace) transform of a convolution of two signals is the pointwise product of their Fourier (Laplace) transforms.

Given a graph signal f and a convolutional filter (with trainable parameters) h, the convolution can be calculated by

f ∗ h = F^(-1)[ f̂(ω) · ĥ(ω) ]

On a graph, using f̂ = U^T f, ĥ = U^T h and f = U f̂, this gives the graph convolution

(f ∗ h)_G = U ( (U^T f) ⊙ (U^T h) )

where ⊙ denotes elementwise multiplication.
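The spectral convolution above can be sketched in a few NumPy lines (my illustration; the filter values h_hat are random placeholders standing in for trainable parameters): eigendecompose the Laplacian, transform with U^T, multiply elementwise, and transform back with U.

import numpy as np

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)   # eigenvalues and orthonormal eigenvectors of L
f = np.random.randn(5)       # a graph signal: one value per node
h_hat = np.random.randn(5)   # spectral filter coefficients, one per eigenvalue

f_hat = U.T @ f              # graph Fourier transform
out = U @ (h_hat * f_hat)    # filter in the spectral domain, then invert
print(out)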


(f ∗ h)_G = U ( (U^T f) ⊙ (U^T h) )

Version 1.0 (Spectral Networks and Locally Connected Networks on Graphs)

y = σ( U g_θ(Λ) U^T x ),   with g_θ(Λ) = diag(θ_1, …, θ_N)

- Calculate the multiplication between U, g_θ(Λ) and U^T x.

With a truncated polynomial filter g_θ(Λ) = Σ_{k=0}^{K} θ_k Λ^k, the convolution becomes y = σ( Σ_{k=0}^{K} θ_k L^k x ), since U Λ^k U^T = L^k:

- There are only K parameters to train, and usually K ≪ N. The complexity has been decreased.
- No need to conduct eigendecomposition; the filter depends explicitly on the Laplacian matrix L.
- Spatial localization. In particular, K denotes the receptive field.

Graph convolutional network

K=0, the convolution filter is θ_0 I (each node uses only its own signal)

K=1, the convolution filter is θ_0 I + θ_1 L (aggregates the 1-hop neighborhood)

K=2, the convolution filter is θ_0 I + θ_1 L + θ_2 L² (aggregates the 2-hop neighborhood)

Graph convolutional network
Local Connectivity + Parameter Sharing


With Chebyshev expansion, we have

g_θ(Λ) ≈ Σ_{k=0}^{K} θ_k T_k(Λ̃)

Chebyshev polynomials: T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), with T_0(x) = 1 and T_1(x) = x

Recall: Λ̃ = 2Λ / λ_max − I_N

Then

y = σ( U g_θ(Λ) U^T x )
  = σ( U Σ_{k=0}^{K} θ_k T_k(Λ̃) U^T x )
  = σ( Σ_{k=0}^{K} θ_k T_k( U Λ̃ U^T ) x )
  = σ( Σ_{k=0}^{K} θ_k T_k(L̃) x )

where L̃ = U Λ̃ U^T = 2L / λ_max − I_N.
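A NumPy sketch (mine) of filtering with the truncated Chebyshev expansion, using the recurrence so that no eigendecomposition of L is needed; here λ_max is computed exactly, though in practice an upper bound is often used.

import numpy as np

def cheb_filter(L, x, theta, lam_max=None):
    """y = sum_k theta_k T_k(L_tilde) x via the Chebyshev recurrence."""
    N = L.shape[0]
    if lam_max is None:
        lam_max = np.linalg.eigvalsh(L).max()   # largest eigenvalue of L
    L_tilde = 2.0 * L / lam_max - np.eye(N)     # rescale eigenvalues into [-1, 1]
    T_prev, T_curr = x, L_tilde @ x             # T_0(L~) x and T_1(L~) x
    y = theta[0] * T_prev
    if len(theta) > 1:
        y = y + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2.0 * L_tilde @ T_curr - T_prev  # T_k = 2 L~ T_{k-1} - T_{k-2}
        y = y + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return y

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
print(cheb_filter(L, np.random.randn(5), theta=[0.5, 0.3, 0.2]))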