Graph Anomaly Detection
COMP90073 Security Analytics
, CIS Semester 2, 2021
Outline
• Whygraphs?
• Extractingfeaturesfromgraphs
• Randomwalk
• GraphConvolutionalNetworks(GCNs)
COMP90073 Security Analytics © University of Melbourne 2021
All Real-world Data Does Not “Live” on Grid
Internet
IoT networks
COMP90073 Security Analytics © University of Melbourne 2021
Outliers vs. Graph Anomalies
• Anomaliesanswerthequestionofwhatisinterestingaboutanetwork • Cannotalwaysbetreatedaspointslyinginamulti-dimensionalspace
independently
• Theymayexhibitinter-dependencies
Figure. Graph-based anomaly detection (left) vs. Point-based anomaly detection (right).
COMP90073 Security Analytics © University of Melbourne 2021
Automating Botnet Detection with Graph Neural Networks [3]
Figure: An example of CAIDA networks embedded with a synthetic P2P botnet. Most of the red botnet nodes are able to reach the rest of the botnet within several hops. The botnet has a faster mixing rate than the background network.
COMP90073 Security Analytics © University of Melbourne 2021
Why Graphs?
• Inter-dependentnatureofthedata:Dataobjectsareoftenrelatedtoeach other and exhibit dependencies. Most relational data can be thought of as inter- dependent, which necessitates to account for related objects in finding anomalies.
• Powerfulrepresentation:Themultiplepathslyingbetweenobjectseffectively capture their long-range correlations. Moreover, a graph representation facilitates the representation of rich datasets enabling the incorporation of node and edge attributes/types
• Relationalnatureofproblemdomains:Thenatureofanomaliescouldexhibit themselves as relational.
• Robustmachinery:Onecouldarguethatgraphsserveasmoreadversarially robust tools.
COMP90073 Security Analytics © University of Melbourne 2021
Graph-specific Challenges
• Inter-dependentObjects:Therelationalnatureofthedatamakesit challenging to quantify the anomalousness of graph objects. While in traditional anomaly detection, the objects or data points are treated as independent and identically distributed (i.i.d.) from each other, the objects in graph data have long-range correlations.
• VarietyofDefinitions:Thedefinitionsofanomaliesingraphsaremuchmore diverse than in traditional anomaly detection, given the rich representation of graphs.
• SizeofSearchSpace:Theenumerationofpossiblesubstructuresis combinatorial which makes the problem of finding out the anomalies a much harder task. This search space is enlarged even more when the graphs are attributed as the possibilities span both the graph structure and the attribute space.
COMP90073 Security Analytics © University of Melbourne 2021
Machine Learning Lifecycle
Raw Data Feature Learning Model Algorithm
COMP90073 Security Analytics © University of Melbourne 2021
Features From Graphs
Aim: Efficient task-independent feature learning for machine learning with graphs!
COMP90073 Security Analytics © University of Melbourne 2021
Features From Graphs
Aim: Efficient task-independent feature learning for machine learning with graphs!
COMP90073 Security Analytics © University of Melbourne 2021
Features From Graphs
Aim: Efficient task-independent feature learning for machine learning with graphs!
Basic approaches to extract graph features: • Adjacencymatrix
• node:degree
• pairs:#ofcommonneighbours
• groups:clusterassignments
COMP90073 Security Analytics © University of Melbourne 2021
Features From Graphs
Aim: Efficient task-independent feature learning for machine learning with graphs!
Basic approaches to extract graph features: • Adjacencymatrix
• node:degree
• pairs:#ofcommonneighbours
• groups:clusterassignments
COMP90073 Security Analytics © University of Melbourne 2021
Node Embedding (Representation Learning)
• Idea:Mapeachnodeinanetworkintoalow-dimensionalspace
𝑢
𝑓: 𝑢 → R!
𝑢
𝑣
R!
vector representation
𝑣
COMP90073 Security Analytics © University of Melbourne 2021
Why Node Embedding?
• Encodenetworkinformationandgeneratenoderepresentation
• Distributedrepresentationsfornodes
• Similarityofembeddingsbetweennodesindicatestheirnetworksimilarity
COMP90073 Security Analytics © University of Melbourne 2021
Illustrative Example of Node Embedding
Input
Output
COMP90073 Security Analytics © University of Melbourne 2021
Why Node Embedding is Hard?
• ModernmachinelearningtoolboxesaredesignedfordatadefinedonEuclidean domains (with simple sequences or grids).
COMP90073 Security Analytics © University of Melbourne 2021
Why Node Embedding is Hard?
• Butgraphsarefarmorecomplex!
• Complextopographicalstructure(i.e.,nospatiallocalitylikegrids)
• Nofixednodeorderingorreferencepoint(i.e.,theisomorphismproblem) • Oftendynamicandhavemultimodalfeatures.
COMP90073 Security Analytics © University of Melbourne 2021
Applications of Graph Embeddings
• NodeRelatedApplications:NodeClassificationandsemi-supervised learning, Anomaly detection, Node Recommendation/Retrieval/Ranking, Clustering and community detection.
• EdgeRelatedApplications:LinkPredictionandGraphReconstruction.
• GraphRelatedApplication:GraphClassification,Visualizationandpattern
discovery, Network compression.
Input Graph Node Embedding Edge Embedding Graph Embedding
COMP90073 Security Analytics © University of Melbourne 2021
Node Embedding
• Aim:Encodenodessothatsimilarityintheembeddingspace(e.g.,dot product) approximates similarity in the original graph
𝑢
𝑣
ENC(𝑢) ENC(𝑣)
𝑍𝑢 𝑍𝑣
Original graph
Encode nodes
Embedding space
COMP90073 Security Analytics © University of Melbourne 2021
Node Embedding
𝑢
𝑣
𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝑢, 𝑣) ≈ 𝑍, 0 𝑍-
ENC(𝑢) ENC(𝑣)
𝑍𝑢 𝑍𝑣
Original graph
Encode nodes
Embedding space
COMP90073 Security Analytics © University of Melbourne 2021
Learning Node Embeddings
1. Define a node similarity function (i.e., a measure of similarity in the original graph)
– Similarityfunction:specifieshowtherelationshipsinvectorspacemap to the relationships in the original graph
2. Define an encoder (i.e., a mapping from nodes to embeddings) – Encoder:mapseachnodetoalowdimensionalvector
ENC𝑢 =𝑍”
d dimensional
3. Optimize the parameters of the encoder so that: 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝑢, 𝑣) ≈ 𝑍) 0 𝑍*
COMP90073 Security Analytics © University of Melbourne 2021
Random Walk
Assume we have a graph 𝐺:
• 𝑉isthevertexset.
• 𝑨istheadjacencymatrix(assumebinary).
• Nonodefeaturesorextrainformationisused!
General Idea of Random Walk:
• Givenagraphandastartingpoint,
• Selectaneighbourofitatrandom,
• Movetothisneighbour;
• Selectaneighbourofthispointatrandom,andmovetoit,etc.
• Estimateprobabilityofvisitingnode𝑣onarandomwalkstartingfromnode𝑢 using some random walk strategy 𝑅
𝑃+(𝑣|𝑢)
COMP90073 Security Analytics © University of Melbourne 2021
Random Walk Embedding
Aim: Build an embedding lookup, where each node is assigned to a unique d- dimensional embedding vector that preserves similarity.
• Given𝐺=(𝑉,𝐸)
• Runshortfixed-lengthrandomwalksstartingfromeachnode𝑢onthegraph
using some strategy 𝑅
• Foreachnode𝑢collect𝑁”(𝑢),thesetofnodesvisitedonrandomwalks
starting from 𝑢
• Goalistolearnamappingz:𝑢→R!
• Givennode𝑢,wewanttolearnfeaturerepresentationsthatarepredictiveof the nodes in its neighbourhood 𝑁”(𝑢)
COMP90073 Security Analytics © University of Melbourne 2021
Random Walk Embedding Optimization
Intuition: Optimize embeddings to maximize likelihood of random walk co- occurrences (using Stochastic Gradient Descent)
L = 7 7 − log(𝑃(𝑣|𝒛#)) #∈% &∈’#(#)
• Parameterize 𝑃 𝑣 𝒛# using softmax
𝑃𝑣𝒛# = exp(𝒛#B𝒛&) ∑*∈% exp(𝒛# B 𝒛*)
Predicted probability of 𝑢 and 𝑣 co-occurring on random walk
COMP90073 Security Analytics © University of Melbourne 2021
Enhancing Random Walk Optimization
L=7 7 −log #∈% &∈’#(#)
exp(𝒛# B 𝒛&) ∑*∈% exp(𝒛# B 𝒛*)
• Nestedsumovernodesresultin𝑂(𝑉+)complexity!
COMP90073 Security Analytics © University of Melbourne 2021
Enhancing Random Walk Optimization
exp(𝒛# B 𝒛&) • Nestedsumovernodesresultin𝑂(𝑉+)complexity!
L=7 7 −log #∈% &∈’#(#)
∑*∈% exp(𝒛# B 𝒛*)
COMP90073 Security Analytics © University of Melbourne 2021
Enhancing Random Walk Optimization
exp(𝒛# B 𝒛&) • Nestedsumovernodesresultin𝑂(𝑉+)complexity!
Negative sampling [7]: Instead of normalizing w.r.t. all nodes, just normalize against 𝑘 random “negative samples” 𝑛,
exp(𝒛# B 𝒛&) /
log ∑*∈%exp(𝒛#B𝒛&) ≈log𝜎𝒛#⋅𝒛& −7log𝜎𝒛#⋅𝒛* ,𝑛,~𝑃%
L=7 7 −log #∈% &∈’#(#)
Sigmoid
random distribution over all nodes
∑*∈% exp(𝒛# B 𝒛*)
,-.
COMP90073 Security Analytics © University of Melbourne 2021
Advantages of Random Walk
• Unsupervised:
– learnsfeaturesthatcapturethegraphstructureindependentofthelabels’
distribution. • Expressivity:
– Flexiblestochasticdefinitionofnodesimilarity • Efficiency:
– Doesnotneedtoconsiderallnodepairswhentraining(onlyneedto consider pairs that co-occur on random walks)
– Easytoparallelize.Severalrandomwalkers(indifferentthreads, processes, or machines) can simultaneously explore different parts of the same graph.
– Relyingoninformationobtainedfromshortrandomwalksmakeitpossible to accommodate small changes in the graph structure without the need for global re-computation.
COMP90073 Security Analytics © University of Melbourne 2021
Drawbacks of Random Walk
• O(𝑉)parametersareneeded:
– Everynodehasitsownuniqueembedding – Nosharingofparametersbetweennodes
• Inherently“transductive”:
– Cannotgenerateembeddingsfornodesthatarenotseenduringtraining
• Donotincorporatenodefeatures:
– Manygraphshavefeaturesthatwecanandshouldleverage
COMP90073 Security Analytics © University of Melbourne 2021
Features Graph Embedding
Assume we have a graph 𝐺:
• 𝑉isthevertexset.
• 𝑨istheadjacencymatrix(assumebinary). • 𝑿 ∈ R0×|%| is a matrix of node features
Aim: Generate node embeddings 𝒁 ∈ R*×3 based on local network neighbourhoods
• Capturingstructure
• Borrowingfeatureinformation
COMP90073 Security Analytics © University of Melbourne 2021
Where Should We Begin?
• Real-worldgraphs:
COMP90073 Security Analytics © University of Melbourne 2021
Naïve Approach
• Joinadjacencymatrixandfeatures
• Feedthemintoadeepneuralnet:
a bd
c
a 0101 01 b 1011 01 c 0100 11 d 1100 01
𝑨𝑿
Issues with this approach:
• Largenumberofparameters
• Notapplicabletographsofdifferentsizes • Notinvarianttonodeordering
COMP90073 Security Analytics © University of Melbourne 2021
Intuition of Graph Convolutional Neural Network (GCN)
• RevisitingCNN:
• Goalistogeneralizeconvolutionsbeyondsimplelattices • Leveragenodefeatures/attributes(e.g.,text,images)
COMP90073 Security Analytics © University of Melbourne 2021
Intuition of GCN: From Images to Graphs
Image: Single CNN layer with 3×3 filter:
Graph: Transform information at the neighbours and combine it: • Transform “messages” h, from neighbours: 𝑊,h,
• Addthemup:∑𝑊,h,
COMP90073 Security Analytics © University of Melbourne 2021
Intuition of GCN: From Images to Graphs [4]
Figure. 2D Convolution vs. Graph Convolution
COMP90073 Security Analytics © University of Melbourne 2021
Multi-layer Graph Convolutional Network (GCN)
COMP90073 Security Analytics © University of Melbourne 2021
GCN
• Idea:Node’sneighbourhooddefinesitscomputationgraph
• Learnhowtopropagateinformationacrossthegraphtocomputenodefeatures
A BD
C
Input graph
G
E
F
COMP90073 Security Analytics © University of Melbourne 2021
GCN
• Idea:Node’sneighbourhooddefinesitscomputationgraph
• Learnhowtopropagateinformationacrossthegraphtocomputenodefeatures
A BD
C
Input graph
G
E
F
A
COMP90073 Security Analytics © University of Melbourne 2021
GCN
• Idea:Node’sneighbourhooddefinesitscomputationgraph
• Learnhowtopropagateinformationacrossthegraphtocomputenodefeatures
AB
G
E
F
Input graph
BD C
A
D
COMP90073 Security Analytics © University of Melbourne 2021
GCN
• Idea:Node’sneighbourhooddefinesitscomputationgraph
• Learnhowtopropagateinformationacrossthegraphtocomputenodefeatures
A ACB
G
E
F
Input graph
D
A
BD E
G
BD C
A
COMP90073 Security Analytics © University of Melbourne 2021
Neighbourhood Aggregation
Model can be of arbitrary depth:
• Nodeshaveembeddingsateachlayer
• Layer-0embeddingofnode𝑢isitsinputfeature,𝑥#
• Layer-KembeddinggetsinformationfromnodesthatareKhopsaway
Layer 0 Layer 1 Layer 2 A
ACB
G
E
F
Input graph
D
A
BD E
G
BD C
A
COMP90073 Security Analytics © University of Melbourne 2021
Information Aggregation
• Intuition:Nodesaggregateinformationfromtheirneighboursusingneural networks
• Everynodedefinesacomputationgraphbasedonitsneighbourhood! Neural networks
A ACB
G
E
F
Input graph
D
A
BD E
G
BD C
A
COMP90073 Security Analytics © University of Melbourne 2021
Neighbourhood Aggregation Functions
• Neighbourhoodaggregation:Keydistinctionsareinhowdifferentapproaches aggregate information across the layers
A
A C?B
G
E
F
Input graph
D A
BD C
?
A
B?D E
G
COMP90073 Security Analytics © University of Melbourne 2021
Neighbourhood Aggregation Functions
• Basicapproach:Averageinformationfromneighbourandapplyaneural network
Average messages from neighbours
A ACB
G
E
F
Input graph
D
A
BD E
BD C
A
G
Apply neural network
COMP90073 Security Analytics © University of Melbourne 2021
Deep Encoder Formulation
• Basicapproach:Averageneighbourmessagesandapplyaneuralnetwork h 4& = x &
/ h/#5. /5. h&=𝜎W/ 7|𝑁(𝑣)|+B/h&
#∈'(&) z & = h 6&
,∀𝑘∈1,…,𝐾
A C D
B
A
A BD E
G
COMP90073 Security Analytics © University of Melbourne 2021
Deep Encoder Formulation
• Basicapproach:Averageneighbourmessagesandapplyaneuralnetwork
h 4& = x &
Initial 0-th layer embeddings are equal to node features
/ h/#5. /5. h&=𝜎W/ 7|𝑁(𝑣)|+B/h&
#∈'(&) z & = h 6&
,∀𝑘∈1,…,𝐾
A C D
A B E
G
B
A
D
COMP90073 Security Analytics © University of Melbourne 2021
Deep Encoder Formulation
• Basicapproach:Averageneighbourmessagesandapplyaneuralnetwork
h 4& = x & /
Initial 0-th layer embeddings are equal to node features
h/#5. /5. h&=𝜎W/ 7|𝑁(𝑣)|+B/h&
Previous layer h0 embedding of 𝑣 ,∀𝑘∈1,…,𝐾
#∈'(&) z & = h 6&
A C D
A B E
G
B
D
A
COMP90073 Security Analytics © University of Melbourne 2021
Deep Encoder Formulation
• Basicapproach:Averageneighbourmessagesandapplyaneuralnetwork
h 4& = x & /
Initial 0-th layer embeddings are equal to node features
h/#5. /5. h&=𝜎W/ 7|𝑁(𝑣)|+B/h&
Previous layer h0 embedding of 𝑣 ,∀𝑘∈1,…,𝐾
z& = h6&
#∈'(&)
Average of neighbour’s previous layer embeddings
A C D
A B E
G
B
D
A
COMP90073 Security Analytics © University of Melbourne 2021
Deep Encoder Formulation
• Basicapproach:Averageneighbourmessagesandapplyaneuralnetwork
h 4& = x &
Initial 0-th layer embeddings are equal to node features
Previous layer h0 embedding of v
h/#5. /5. h&=𝜎W/ 7|𝑁(𝑣)|+B/h&
/
z& = h6& Embedding after K
,∀𝑘∈1,…,𝐾
#∈'(&)
Average of neighbour’s previous layer embeddings
A C D
A BD E
G
B
layers of neighbourhood aggregation
A
COMP90073 Security Analytics © University of Melbourne 2021
Training Model Parameters
• Wecanfeedtheseembeddingsintoanylossfunctionandrunstochastic gradient descent to train the weight parameters
h /& = 𝜎 W / 7 h /# 5 . + B / h /& 5 . #∈'(&) |𝑁(𝑣)|
Trainable parameters
• Manyaggregationscanbeperformedefficientlyby(sparse)matrixoperations.
Let 𝐻/5. = [h./5. … h*/5.]
– Mean rule: 𝐻/ = 𝐷5.𝐴𝐻/5.
– Spectral rule: 𝐻/ = 𝐷5./+𝐴𝐷./+𝐻/5.
COMP90073 Security Analytics © University of Melbourne 2021
Inductive Capability
• Manyapplicationsettingsconstantlyencounterpreviouslyunseennodes
• InductivenodeembeddingGeneralizetoentirelyunseengraphs
COMP90073 Security Analytics © University of Melbourne 2021
Variational Graph Autoencoder (VGAE) [4]
COMP90073 Security Analytics © University of Melbourne 2021
Deep Anomaly Detection on Attributed Networks [7]
COMP90073 Security Analytics © University of Melbourne 2021
One-Class Graph Neural Networks for Anomaly Detection in Attributed Networks [5]
COMP90073 Security Analytics © University of Melbourne 2021
Summary
• Whyitisimportanttomaintaindatagraphstructure?
• Howtouserandomwalkforgeneratinggraphembedding? • Howembedevolvingfeaturedgraphs?
• Howusegraphembeddingforanomalydetection?
Next: Contrast Data Mining
COMP90073 Security Analytics © University of Melbourne 2021
References
1. , Hanghang Tong, , “Graph-based Anomaly Detection and Description: A Survey”, Data Mining and Knowledge Discovery, 2015.
2. , Rami Al-Rfou, and . “Deepwalk: Online learning of social representations.” In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701- 710. 2014.
3. , , . Rush, and . “Automating Botnet Detection with Graph Neural Networks.” arXiv preprint arXiv:2003.06344 (2020).
4. . Kipf, and . “Variational graph auto-encoders.” arXiv preprint arXiv:1611.07308, 2016. code https://github.com/tkipf/gae
5. Xuhong Wang, , , Ping Cui, and Yupu Yang, “One-Class Graph Neural Networks for Anomaly Detection in Attributed Networks”, arXiv preprint arXiv:2002.09594v2 (2020)
COMP90073 Security Analytics © University of Melbourne 2021
References
6. Kaize Ding, Jundong Li, , and . “Deep anomaly detection on attributed networks.” In Proceedings of the SIAM International Conference on Data Mining, pp. 594-602, 2019.
7. , . Zheng, and – . “A comprehensive survey of graph embedding: Problems, techniques, and applications.” IEEE Transactions on Knowledge and Data Engineering 30, no. 9 (2018): 1616-1637.
COMP90073 Security Analytics © University of Melbourne 2021