程序代写代做代考 Java database flex algorithm SQL COMP5338

COMP5338 – Advanced Data Models

Dr. Ying Zhou
School of Information Technologies

COMP5338 – Advanced Data Models
Week 8: Neo4j Internal and Data Modeling

Outline
 Neo4j Storage

 Neo4j Query Plan and Indexing

 Neo4j – Data Modeling

 Neo4j – Graph Algorithms

Materials adapted by permission from Graph Databases (2nd
Edition) by Ian Robinson et al (O’Reilly Media Inc.). Copyright
2015 Neo Technology, Inc

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-2

Property Graph Model

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

label

node

label

properties

label
relationshiprelationship

type

Property of relationship

08-3

Index-free Adjacency
 Native storage of relationships between nodes

 Effectively a pre-computed bidirectional join
 Traversal is like pointer dereferencing

 Almost as fast as well
 Index-free Adjacency

 Each node maintains a direct link to its adjacent nodes
 Each node is effectively a micro-index to the adjacent nodes

 Cheaper than global indexes
 Query are faster, do not depends on the total size of the graph

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)
Slides 3-10 are based on Graph Database chapter 6.1 and 6.2

08-4

Neo4j Architecture

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

 Graph data is stored in store files on disk
 Nodes, relationships, properties and labels all have their own store

files.
 Separating graph and property data promotes fast traversal

 user’s view of their graph and the actual records on disk are
structurally dissimilar

Page 163 of Graph Database

08-5

Node store file
 All node data is stored in one node store file
 Physically stored in file named neostore.nodestore.db
 Each record is of a fixed size – 15 bytes (was 9 bytes in earlier version )
 Offset of stored node = node id * 15 (node id = 100, offset = 1500)
 Deleted IDs in .id file and can be reused

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

4 bytes for the first
relationship ID

4 bytes for the first
property ID

5 bytes for labels

08-6

Relationship store file
 All relationship data is stored in one relationship store file
 Physically stored in file named neostore.relationshipstore.db
 Each record is of a fixed size – 34 bytes
 Offset of stored relationship = relationship id * 34

 So, relationship id = 10, offset = 340

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-7

Other Files
 Property store contains fixed size records to store

properties for nodes and relationships
 Simple properties are stored inline
 Complex ones such as long string or array property are stored else

where
 Node label in node records references data in label store
 Relationship type in relationship record references data in

relationship type store
 Both Node ID and Property ID are of 4 bytes

 The maximum ID value is 2~32 -1
 ID is assigned and managed by the system

 The corresponding record will be stored in the computed offset
 The IDs of deleted nodes/relationships will be reused

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-8

Node structure
 Bob LIKES Alice

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-9

5 bytes pointer
to the first label
in label store

4 bytes pointer
to the first
relationship in
relationship
store

4 bytes pointer to the
first property in property
store

The record representing Node 1 contains 3 pointers plus one
inUse byte and one extra byte

Relationship structure
 Bob LIKES Alice

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-10

First node Second node

The previous and next is
determined by relationship
creation time in the database

Doubly linked list

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

1 2

10 20

1 1 1

Node file Name:Alice Name:Bob Name:Charlie

10 10 20

Relationship file

Offset: 340

0 1 20

Type:FOLLOWS Some property

Offset: 680

0 2 10

Some propertyType:FOLLOWS

person person person

Creation order:
node Alice
node Bob
node Charlie
relationship Alice –> Bob
relationship Alice –> Charlie

08-11

Alice’s previous relation

Alice’s next relation

Doubly linked list (cont’d)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

1 2

10 20

1 1 1

Node file

Name:Alice Name:Bob Name:Charlie

10 10 20

Relationship file

1 type prop
Offset: 340

0 1 20

1 type prop
Offset: 680

0 2 10

person person person

Add another node and relationship
create node Daniel
relationship Charlie –> Daniel

Danielfollows

Name: Daniel person

Offset: 714

1 type prop2 3 20

MATCH (:Person {Name: “Alice}) –[*2]->(other) RETURN other

stop here

08-12
Charlie’s previous relation

Charlie’s next relation

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

“The node and relationship stores are concerned only with
the structure of the graph, not its property data. Both
stores use fixed-sized records so that any individual
record’s location within a store file can be rapidly computed
given its ID. These are critical design decisions that
underline Neo4j’s commitment to high-performance
traversals.”

— Chapter 6, Graph Databases

08-13

Outline
 Neo4j Storage

 Neo4j Query Plan and Indexing

 Neo4j – Data Modeling

 Neo4j – Graph Algorithms

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-14

Neo4j Query Execution
 Each Neo4j Query is turned into an execution plan by a execution

planner
 Rule Strategy Planner

 Consider available indexes but does not use statistical information
 Cost Strategy Planner (default and in development)

 Use statistic information to evaluate a few alternative plans
 E.g. If there are less Movie nodes than People nodes, a query involving both may

get better performance if starting from a collection of Movie nodes
• See example in lab

 Query plan stages
 Starting point
 Expansion by matching given path in the query statement
 Row filtering, skipping, sorting, projection, etc…
 Combining operations
 Updating

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-15

Query Plan: an example

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

Query:

MATCH
(cloudAtlas {title: “Cloud Atlas”})<-[:DIRECTED]-(directors) RETURN directors.name explain profile 08-16 Obtain the property requires checking the property file Obtain all relationships of Cloud Atlas Obtain the name property Obtain all node records 1 2 3 4 Query Starting Points  Most queries start with one or a set of nodes except if a relationship ID is specified  MATCH (n1)-[r]->() WHERE id(r)= 0 RETURN r, n1

 This query will start from locating the first record in the relationship
file

 Query may start by scanning all nodes
 MATCH(n) RETURN (n)
 MATCH (cloudAtlas {title: “Cloud Atlas”})<-[:DIRECTED]-(directors) RETURN directors.name  Query may start by scanning all nodes belonging to a given label  MATCH (p:Person{name:”Tom Hanks”}) return p  Labels are implicitly indexed  Query may start by using index COMP5338 "Advanced Data Models" - 2018 (Y. Zhou) 08-17 Query starting from labelled node MATCH (n:Person) -[r]-(something) with n, count(something) as degree order by degree limit 1 return n COMP5338 "Advanced Data Models" - 2018 (Y. Zhou) 08-18 Obtain all 133 Person nodes records Obtain133 nodes + 256 relationships Memory processing Query Plan With Index  Neo4j supports index on properties of labelled node  Index has similar behaviour as those in relational systems  It can be built on single or composite properties  Create Index  CREATE INDEX ON :Person(name)  Drop Index  DROP INDEX ON :Person(name) COMP5338 "Advanced Data Models" - 2018 (Y. Zhou) Query: MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..4]-(hollywood) RETURN DISTINCT hollywood 08-19 A relatively complex query and plan MATCH (n:Person{name: "Tom Hanks"}) WITH n.phone as phones, n UNWIND phones as phone MATCH (m:Person) WHERE phone in m.phone and n<>m
RETURN m.name

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

Apply works by performing a nested loop. Every
row being produced on the left-hand side of

the Apply operator will be fed to the leaf
operator on the right-hand side, and

then Apply will yield the combined results

08-20

Obtain 132
Person nodes’s
phone property
twice!

Obtain all 133
Person nodes
records twice!

Obtain the
matching node’s
name property

Comparing Execution Plans
MATCH (m: Movie)
WHERE m.released IN [1999,2003]
RETURN m.title, m.released

UNWIND [1999,2003] as year
MATCH (m: Movie)
WHERE m.released = year
RETURN m.title, m.released

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

version 3.2Earlier version

08-21

Another example of comparison
 Question: Find out a list of person who has acted in at least

three movies and also directed at least one movie
 Cypher is powerful and flexible

 It is possible to write very different queries that produce the same
results

 The performance could have big difference
 The DB engine does not have much knowledge to rewrite the

queries as those in SQL
 Not based on relational algebra

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-22

Option 1
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) AS mc
WHERE mc > 2
MATCH (p)-[:DIRECTED]->(m2:Movie)
RETURN p.name, m2.title

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-23

15 person have
acted in at
least 3 movies

Need to check
all 15 person’s
relationships

Check the
movie node

Getting two
properties

Option 2
MATCH (m1:Movie)<-[a:ACTED_IN]-(p:Person)-[:DIRECTED]->(m2:Movie)
WITH p, count(distinct m1) as ac, m2
WHERE ac > 2
RETURN p.name, m2.title

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-24

38 Movie
nodes + 44
relationships

44 Person
nodes

Obtain the 18
Movie nodes
from the 18
relationships
as m1

Among 62
relationships
only 18 are of
DIRECTED
type

Memory
processing

Transactions
 Neo4j supports full ACID transactions

 Similar to those in RDBMS
 Uses locking to ensure consistency

 Lock Manager manages locks held by a transaction
 Logging

 Write Ahead Logging (WAL)
 Transaction Commit Protocol

 Acquire locks (Atomicity, Consistency, Isolation)
 Write Undo and Redo records to the WAL

 for each node, relationship, property changed is written to the log
 Write commit record to the log and flush to disk (Durability)
 Release locks

 Recovery – if the database server/machine crashes
 Apply log records to replay changes made by the transactions

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-25

Outline
 Neo4j Storage

 Neo4j Query Plan and Indexing

 Neo4j – Data Modeling

 Neo4j – Graph Algorithms

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-26

Graph Data Modelling
 Graph data modelling is very closely related with domain

modelling
 You need to decide

 Node or Relationship
 Node or Property
 Label/Type or Property

 Decisions are based on
 Features of entities in application domain
 Your typical queries
 Features and constraints of the underlying storage system

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

Slides 35-39 are based on Chapter 4 of Graph Databases book

08-27

Node vs. Relationship
 Nodes for Things, Relationship for Structures

 AS A reader who likes a book, I WANT to know which books other
readers who like the same book have liked, SO THAT I can find
other books to read.

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou)

MATCH (:Reader {name:’Alice’})-[:LIKES]->(:Book {title:’Dune’})
<-[:LIKES]-(:Reader)-[:LIKES]->(books:Book)
RETURN books.title

08-28

Node vs. Relationship
 Model Facts as Nodes

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-29

Node or Property
 Represent Complex Value Types as Nodes

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-30

Relationship Property or Relationship Type

 E.g. The relationship between user node and address
node can be:

 typed as HOME_ADDRESS, BILLING_ADDRESS or
 typed as generic ADDRESS and differentiated using a type property

{type:’home’}, {type:’billing’}

 We use fine-grained relationships whenever we have a
closed set of relationship types.
 Eg. there are only a finite set of address types
 If traversal would like to follow generic type ADDRESS, we may

have to use redundant relationships
 MATCH (user)-[:HOME_ADDRESS|WORK_ADDRESS|

DELIVERY_ADDRESS]->(address)
 MATCH (user)-[:ADDRESS]->(address)
 MATCH (user:User)-[r]->(address:Address)

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-31

Outline
 Neo4j Storage

 Neo4j Query Plan and Indexing

 Neo4j – Data Modeling

 Neo4j – Graph Algorithms

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-32

Graph Algorithm
 In addition to graph query and traversal, a rich set of graph

algorithms are provided by Neo4j
 Used to be part of the Neo4j server
 It is now moved out as a separate project

 The graph algorithms are implemented as Cypher
procedures and need to be installed separately

 Procedure is a mechanism to extend Neo4j
 Take arguments, perform database operation and return result
Written in Java and compiled to jar files
 Once installed, it can be called directly from Cypher

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-33

https://neo4j.com/docs/graph-algorithms/3.4/

https://neo4j.com/docs/developer-manual/current/extending-neo4j/procedures/

https://neo4j.com/docs/graph-algorithms/3.4/

References
 Ian Robinson, Jim Webber and Emil Eifrem, Graph

Databases, Second Edition, O’Reilly Media Inc.,
 You can download this book from the Neo4j site,

https://neo4j.com/graph-databases-book/?ref=home
 Chapter 4, Chapter 6

 Neo4j – Reference Manual
 https://neo4j.com/docs/developer-manual/current/

COMP5338 “Advanced Data Models” – 2018 (Y. Zhou) 08-34

https://neo4j.com/graph-databases-book/?ref=home
https://neo4j.com/docs/developer-manual/current/

COMP5338 – Advanced Data Models
Outline
Property Graph Model
Index-free Adjacency
Neo4j Architecture
Node store file
Relationship store file
Other Files
Node structure
Relationship structure
Doubly linked list
Doubly linked list (cont’d)
Slide Number 13
Outline
Neo4j Query Execution
Query Plan: an example
Query Starting Points
Query starting from labelled node
Query Plan With Index
A relatively complex query and plan
Comparing Execution Plans
Another example of comparison
Option 1
Option 2
Transactions
Outline
Graph Data Modelling
Node vs. Relationship
Node vs. Relationship
Node or Property
Relationship Property or Relationship Type
Outline
Graph Algorithm
References

Related Posts