程序代写代做代考 data structure database hbase flex chain SQL COMP5338 – Advanced Data Models

COMP5338 – Advanced Data Models

Ying Zhou
School of Information Technologies

COMP5338 – Advanced Data Models
Week 7: Graph Data and Neo4j Introduction

Administrative
 The group project instruction and data set will be published this week

(week 7)
 Group can have up to 2 students

 Self-enrolled groups will be set up on Canvas
 Group name indicates the lab you will demo
 Ok to form groups across labs

 Project overview
 You are given a data set (in a few tsv files) and some target queries
 Design schema for two storage options

 MongoDB based
 Neo4j based

 Load data, set up index, run target queries and observe performance
 Describe your schema, query and performance in a report
 Demo your solution to the tutor in week 10 lab

 Individual contribution will be assessed and members may get different
marks

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-2

Outline
 Brief Review of Graphs
 Examples of Graph Data
 Modelling Graph Data
 Property Graph Model
 Cypher Query

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-3

Graphs
 A graph is just a collection of vertices and edges

 Vertex is also called Node
 Edge is also called Arc/Link

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-4

Type of Graphs
 Undirected graphs

 Edges have no orientation (direction)
 (a, b) is the same as (b, a)

 Directed graphs
 Edges have orientation (direction)
 (a, b) is not the same as (b, a)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-5

Representing Graph Data
 Data structures used to store graphs in programs

 Adjacency list
 Adjacency matrix

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-6

Adjacency List

Directed Graph

Undirected Graph

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-7

Adjacency matrix – Directed Graph

Directed Graph

Undirected Graph

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-8

Outline
 Brief Review of Graphs
 Examples of Graph Data
 Modelling Graph Data
 Introduction to Neo4j
 Cypher Query

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-9

Examples of graphs
 Social graphs

 Organization structure
 Facebook, LinkedIn, etc.

 Computer Network topologies
 Data centre layout
 Network routing tables

 Road, Rail and Airline networks

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-10

Social Graphs and extension

Page 2 of the graph database book

A small social graph

Related information can be
captured in the graph

Page 3 of the graph database book

Nodes do not have to be of
same type

Edges have different
meanings

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-11

Social Graph with Various Relationships

Page 19 of the graph database book

Multiple edges between the
same pair of nodes

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-12

Transaction information

Page 23 of the graph database book

This graph shows three
types of entities

And various types of
relationships among
those entities

User PLACED order

User MOST_RECENT
order

Order CONTAINS Item

Order PREVIOUS Order

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-13

Outline
 Brief Review of Graphs
 Examples of Graph Data
 Modelling Graph Data
 Property Graph Model
 Cypher Query

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-14

RDBMS to store graph

 Who are Bob’s friends?
SELECT p1.Person
FROM Person p1 JOIN PersonFriend pf ON pf.FriendID = p1.ID

JOIN Person p2 ON pf.PersonID = p2.ID
WHERE p2.Person = ”Bob”

Page 13 of the graph database book

1

2

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-15

RDBMS to store Graphs
 Who are friends with Bob?

SELECT p1.Person
FROM Person p1 JOIN PersonFriend pf ON pf.PersonID = p1.ID

JOIN Person p2 ON pf.FriendID = p2.ID
WHERE

p2.Person = ”Bob”

1

2

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-16

RDBMS to store Graphs
 Who are Alice’s friends-of-friends?

SELECT p1.Person AS PERSON, p2.Person AS FRIEND_OF_FRIEND
FROM PersonFriend pf1 JOIN Person p1 ON pf1.PersonID = p1.ID

JOIN PersonFriend pf2 ON pf2.PersonID = pf1.FriendID
JOIN Person p2 ON pf2.FriendID = p2.ID

WHERE p1.Person = ”Alice” AND pf2.FriendID <> p1.ID

1

2

3

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-17

MongoDB to store Graph

 Who are Alice’s friends-of-friends?
 Find out Alice’s friends ID

 db.persons.find({person:”Alice”},{friends:1})
 For each id, find out the friends ID again

 db.persons.find({_id:{$in:[2]}}, {friends:1}
 For each id, find out the actual person

 db.persons.find({_id:{$in:[1,99]}}, {person:1})

 The MongDB 3.4 and later has a new aggregation stage
called $graphLookup

{ _id: 1,
person: “Alice”,
friends:[2]

}

{ _id: 2,
person: “Bob”,
friends:[1,99]

}

persons collection

{ _id: 99,
person: “Zach”,
friends:[1]

}

 Who are Bob’s friends?
 Find out Bob’s friends’ ID

 db.persons.find({person:”Bob”},{friends:1})
 For each id, find out the actual person

 db.persons.find({_id: 1},{person:1}),
db.persons.find({_id: 99},{person:1}),

 db.persons.find({_id:{$in:[1,99]}}, {person:1})

 Who are friends with Bob?
 Find out Bob’s id

 db.persons.find({person:”Bob”})
 Find out the persons that are friends with Bob

 db.persons.find({friends: 2}, {person:1})

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-18

$graphLookup
db.persons.aggregate([
{$match:{person:“Alice”}},
{$graphLookup:{

from: “persons”,
startWith: “$friends”,
connectFromField:”friends”,
connectToFirld:”pid”,
maxDepth: 1,
as: “friendsnetwork”}}

])

{“_id” : 1,
“person” : “Alice”,
“friends” : [ 2],
“friendsnetwork” : [

{“_id” : 99.0,
“name” : “Zach”,
“friends” : [ 1, 3],
“depth” : 1 },

{“_id” : 1,
“name” : “Alice”,
“friends” : [ 2],
“depth” : 1},

{“_id” : 2,
“name” : “Bob”,
“friends” : [1, 99],
“depth” : 0}

] }

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-19

In Summary
 It is possible to store graph data in various storage systems

 Shallow traversal
 Relatively easy to implement
 Performance OK

 Deep traversal or traversal in other direction
 Complicated to implement

• Multiple joins or multiple queries or full table scan
 Less efficient
 Error prone

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-20

Outline
 Brief Review of Graphs
 Examples of Graph Data
 Modelling Graph Data
 Property Graph Model
 Cypher Query

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-21

Graph Technologies
 Graph Processing

 take data in any input format and perform graph related operations
 OLAP – OnLine Analysis Processing of graph data
 Google Pregel, Apache Giraph

 Graph Databases
manage, query, process graph data
 support high-level query language
 native storage of graph data
 OLTP – OnLine Transaction Processing possible
 OLAP – also possible

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-22

Graph Data Models
 RDF (Resource Description Framework) Model

 Express node-edge relation as “subject, predicate, object” triple
(RDF statement)

 SPARQL query language
 Examples: AllegroGraph, Apache Jena

 Property Graph Model
 Express node and edge as object like entities, both can have

properties
 Various query language
 Examples

 Apache Titan
• Support various NoSQL storage engine: BerkeleyDB, Cassandra, HBase
• Structural query language: Gremlin

 Neo4j
• Native storage manager for graph data (Index-free Adjacency)
• Declarative query language: Cypher query language

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-23

Property Graph Model
 Proposed by Neo technology
 No standard definition or specification
 Both Node and Edges can have property

 RDF model cannot express edge property in a natural and easy to
understand way

 The actual storage varies
 The query language varies

since: 1 year
common: sailing

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-24

Neo4j
 Native graph storage using property graph model
 Index-free Adjacency

 Nodes and Relationships are stored
 Supports indexes
 Replication

 Single master-multiple slaves replication
 Neo4j cluster is limited to master/slave replication

configuration
 Database engine is not distributed

 Cypher – query language

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-25

Property Graph Model as in Neo4j
 Property graph has the following characteristics

 It contains nodes and relationships
 Nodes contain properties

 Properties are stored in the form of key-value pairs
 A node can have labels (classes)

 Relationships connect nodes
 Has a direction, an optional type, a source node and a target node
 No dangling relationships (can’t delete node with a relationship)

 Properties
 Both nodes and relationships have properties
 Useful in modeling and querying based on properties of relationships

http://docs.neo4j.org/chunked/milestone/graphdb-neo4j.html

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-26

Property Graph Model Example

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-27

It models a graph with three entities: 2 person and one
movie, each with a set of properties;
It also models the relationship among them: one person
acted in the movie with a role, another person directed
the movie

Property Graph Model: Nodes
 Nodes are used often used to represent entities, e.g. objects

 It has properties
 It can have labels

 A label is a way to group similar nodes
 It acts like the ‘class’ concept in programming world

 Label is a dynamic and flexible feature
 It can be added or removed during run time
 It can be used to tag node temporarily

 E.g. :Suspend, :OnSale, etc

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-28

A node with two labels and two properties

properties

label

Property Graph Model: Relationships
 A relationship connects two nodes: source node and

target node
 The source and the target node can be the same one

 It always has a direction
 But traversal can happen in either direction

 It can have a type
 It can have properties

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-29

source node target node
property

type

Property Graph Model: Properties
 A property is a pair of property key and property value
 The property value can have the following type:

 Number: Integer and Float
 String
 Boolean
 Spatial Type: Point
 Temporal Type

 Date
 Time
 …

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-30

Property Graph Model: Paths
 A path is one or more

nodes with connecting
relationships, typically
retrieved as a query or
traversal result.

A path of length zero

A path of length one
COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-31

Outline
 Brief Review of Graphs
 Examples of Graph Data
 Modelling Graph Data
 Property Graph Model
 Cypher Query

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-32

Cypher
 Cypher is a query language specific to Neo4j
 Easy to read and understand
 It uses patterns to represents core concepts in the property

graph model
 E.g. a pattern may represents that a user node is having a

transaction with the item formula in it.
 There are basic pattern representing nodes, relationships and path

 It uses clauses to build queries; Certain clauses and
keywords are inspired by SQL
 A query may contain multiple clauses

 Functions can be used to perform aggregation and other
types of analysis

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-33

Cypher patterns: node
 A single node

 A node is described using a pair of parentheses, and is typically
given an identifier (variable)

 E.g.: (n) means a node n
 The variable’s scope is restricted in a single query statement

 Labels
 Label(s) can be attached to a node
 E.g.: (a:User) or (a:User:Admin)

 Specifying properties
 Properties are a list of name value pairs enclosed in a curly brackets
 E.g.: (a { name: “Andres”, sport: “Brazilian Ju-Jitsu” })

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

https://neo4j.com/docs/developer-manual/current/cypher/syntax/patterns/

07-34

Cypher patterns: relationships
 Relationship is expressed as a pair of dashes (–)

 Arrowhead can be added to indicate direction
 Relationship always need a source and target node.

 Basic Relationships
 Directions are not important: (a)–(b)
 Named relationship: (a)-[r]->(b)
 Named and typed relationship: (a)-[r:REL_TYPE]->(b)
 Specifying Relationship that may belong to one of a set of types:

(a)-[r:TYPE1|TYPE2]->(b)
 Typed but not named relationship: (a)-[:REL_TYPE]->(b)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-35

Relationship patterns (cont’d)
 Relationship of variable lengths

 (a)-[*2]->(b) describes a path of length 2 between node a and node b
 This is equivalent to (a)–>()–>(b)

 (a)-[*3..5]->(b) describes a path of minimum length of 3 and
maximum length of 5 between node a and node b

 Either bound can be omitted (a)-[*3..]->(b), (a)-[*..5]->(b)
 Both bounds can be omitted as well (a)-[*]->(b)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-36

Pattern Examples

 Pattern: (n)
 Matches all nodes in the graph

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-37

 Pattern: (m:Movie)
 Matches the movie node in the

graph

 Pattern: (p:{name: ’Tom Hanks’})
 Matches the person node with

name ‘Tom Hanks’ in the graph

 Pattern: (p1)-[r:DIRECTED]->(m1)
 Matches the path from person

Robert Zemeckis to movie
“Forrest Grum’

Cypher Clauses – Create
 CREATE

 Create nodes or relationships with properties

 Create a node matrix1 with the label Movie
CREATE (matrix1:Movie {title:’The Matrix’, released:1999,

tagline:’Welcome to the Real World’})

 Create a node keanu with the label Actor
CREATE (keanu:Actor {name:’Keanu Reeves’, born:1964})

 Create a relationship ACTS_IN
CREATE (keanu)-[:ACTS_IN {roles:’Neo’}]->(matrix1)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-38

We give the node an identifier so we can refer to the particular node later in the same query

The identifier “Keanu” and “matrix1” are used in the this create clause.
We did not give the relationship a name/identifier.
We need to write the three clauses in a single query statement to be able to use those variables

Cypher – Read
 MATCH … RETURN

MATCH is the main reading clause
 RETURN is a projecting clause
 They are chained to make a query

 Return all nodes:
MATCH (n) RETURN n

 Return all nodes with a given label: select * from movie
MATCH (movie:Movie) RETURN movie

 Return all actors’ name in the movie “The Matrix”

MATCH (a:Actor)-[:ACTS_IN]->(:Movie{title:”The Matrix”})
RETURN a.name

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-39

We give the Actor node an identifier “a” so we can use refer to in the RETURN sub-clause

We do not need to return the relationship so we did not give an identifier to it

We do not need to give an identifier to the Movie node too,

Cypher – Update
 MATCH … SET/REMOVE … RETURN

 Set the age property for all actor nodes
MATCH (n:Actor)
SET n.age = 2014 – n.born
RETURN n

 Remove a property
MATCH (n:Actor)
REMOVE n.age
RETURN n

 Remove a label
MATCH (n:Actor{name:”Keanu Reeves”})
REMOVE n:Actor
RETURN n

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-40

Cypher – Delete
 MATCH … DELETE

 Delete relationship
MATCH (n{name:”Keanu Reeves”})-[r:ACTS_IN]->()
DELETE r

 Delete a node and all possible relationship
MATCH (m{title:’The Matrix’})-[r]-()
DELETE m,r

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-41

More on READ: WHERE
 The WHERE sub clause can be used to specify various query

conditions

MATCH (n)
WHERE n.age <30 and n.employ>=3
RETURN n.name

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-42

Functions
 Functions may appear in various clauses

 Build-in and user-defined functions
 Build-in functions

 Prediction functions
 Scalar functions
 Aggregation functions
 List functions
Mathematical functions
 String functions
 Temporal functions
 Spatial Functions

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-43

Aggregating Functions
 GROUP BY feature in Neo4j is achieved using aggregating

functions
 E.g. count(), sum(), avg(), max(), min() and so on

 The grouping key is implied in the return clause
 None aggregate expression in the return clause is the grouping key
 RETURN n, count(*)

 n is a variable declared in a previous clause, and it is the grouping key
MATCH(n:PERSON) RETURN n.gender, COUNT(*)

 Count the number of nodes representing male and female in the graph
 A person’s gender is the grouping key

 A grouping key is not always necessary, the aggregation
function can apply to all results returned
MATCH (n:PERSON) RETURN COUNT(*)

 To count the number of Person nodes in the graph

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-44

Aggregation Examples
 To find out the earliest year a Person was born in the data

set
MATCH (n:PERSON) RETURN min (n.born)

 To find out the distribution of relationship types belonging to
nodes with certain feature

MATCH (n { name: ‘A’ })-[r]->()
RETURN type(r), count(*)

The grouping key is type(r) which is a scalar function, returns the type of
relationship in the matching results

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-45

Aggregation Examples: DISTINCT

MATCH (me:Person)–>(friend:Person)–>(friend_of_friend:Person)
WHERE me.name = ‘A’
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-46

Variable friend_of_friend refers to
the same node in both matches

More on READ: subqueries
 The WITH clause can chain different query parts together in a pipeline

style
 Used to apply conditions on aggregation result
 Used to modify (order, limiting, etc) the results before collecting them as a

list
 Examples

 Find the person who has directed 3 or more movies
MATCH (p:Person)-[r:DIRECTED]->(m:Movie)
WITH p, count(*) as movies
WHERE movies >= 3
RETURN p.name, movies

 Return the oldest 3 person as a list
MATCH (n:Person)
WITH n
ORDER by n.age DESC LIMIT 3
RETURN collect(n.name)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-47

MATCH (n:Person)
RETURN n.name
ORDER by n.age DESC LIMIT 3

Dealing with Array type
 Array literal is written in a similar way as it is in most

programming languages
 examples

 An array of integer: [1,2,3]
 An array of string: [“Sydney”, “University”]

 Both node and relationship can have property of array type
 Example: create an relationship with array property

create (Keanu)-[:ACTED_IN {roles:[‘Neo’]}]->(TheMatrix)

 Example: update an existing node with array property
MATCH (n:Person{name: “Tom Hanks”})
set n.phone=[“0123456789”,”93511234“]

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-48

Dealing with Array type (cont’d)
 Querying array property

 The IN operator: check if a value is in an
array
 Example: find out who has played ‘Neo’ in which

movie
MATCH (a:Person) -[r:ACTED_IN]->(m:Movie)
WHERE ‘Neo’ IN r.roles
RETURN a , m

 The UNWIND operator: flatten an array
into multiple rows
 Example: find all the movies released in

1999 or in 2003
UNWIND [1999,2003] as year
MATCH (m: Movie)
WHERE m.released = year
RETURN m.title, m.released

This is equivalent to
MATCH(m: Movie)
WHERE m.released IN [1999,2003]
RETURN m.title, m.released

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-49

Dealing with Array Type (cont’d)
 A relatively complex query

 Update another node
MATCH (n:Person{name: “Meg Ryan”}) set n.phone=[“0123456789”]

 Run a query to see who shares any phone number with Tom Hanks
MATCH (n:Person{name: “Tom Hanks”})
WITH n.phone as phones, n
UNWIND phones as phone
MATCH (m:Person)
WHERE phone in m.phone and n<>m
RETURN m.name

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-50

Where to find more about cypher query:
Developer’s guide: http://neo4j.com/docs/developer-manual/current/cypher/
Reference card: https://neo4j.com/docs/cypher-refcard/current/

http://neo4j.com/docs/developer-manual/current/cypher/
https://neo4j.com/docs/cypher-refcard/current/

Indexing

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-51

 Neo4j supports index on properties of labelled node
 Index has similar behaviour as those in relational systems
 Create Index

 CREATE INDEX ON :Person(name)

 Drop Index
 DROP INDEX ON :Person(name)

 Storage and query execution will be covered in week 8

References
 Ian Robinson, Jim Webber and Emil Eifrem, Graph Databases, Second Edition, O’Reilly

Media Inc., June 2015
 You can download this book from the Neo4j site, http://www.neo4j.org/learn will redirect you to

Home


 The Neo4j Document

 The Neo4j Graph Database Concept (http://neo4j.com/docs/stable/graphdb-neo4j.html)

 Noel Yuhanna, Market Overview: Graph Databases, Forrester White Paper, May, 2015
 Renzo Angeles, A Comparison of Current Graph Data Models, ICDE Workshops 2013

(DOI-10.1109/ICDEW.2012.31)
 Renzo Angeles and Claudio Gutierrez, Survey of Graph Database Models, ACM

Computing Surveys, Vol. 40, N0. 1, Article 1, February 2008 (DOI-
10.1145/1322432.1322433)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 07-52

http://www.neo4j.org/learn

Home


http://neo4j.com/docs/stable/graphdb-neo4j.html

COMP5338 – Advanced Data Models
Administrative
Outline
Graphs
Type of Graphs
Representing Graph Data
Adjacency List
Adjacency matrix – Directed Graph
Outline
Examples of graphs
Social Graphs and extension
Social Graph with Various Relationships
Transaction information
Outline
RDBMS to store graph
RDBMS to store Graphs
RDBMS to store Graphs
MongoDB to store Graph
$graphLookup
In Summary
Outline
Graph Technologies
Graph Data Models
Property Graph Model
Neo4j
Property Graph Model as in Neo4j
Property Graph Model Example
Property Graph Model: Nodes
Property Graph Model: Relationships
Property Graph Model: Properties
Property Graph Model: Paths
Outline
Cypher
Cypher patterns: node
Cypher patterns: relationships
Relationship patterns (cont’d)
Pattern Examples
Cypher Clauses – Create
Cypher – Read
Cypher – Update
Cypher – Delete
More on READ: WHERE
Functions
Aggregating Functions
Aggregation Examples
Aggregation Examples: DISTINCT
More on READ: subqueries
Dealing with Array type
Dealing with Array type (cont’d)
Dealing with Array Type (cont’d)
Indexing
References