Microsoft PowerPoint – 24- MapReduceP2_Neo4J2pdf
© 2018 A. Alawini
Map-Reduce
Neo4j: Graph Database
Abdu Alawini
University of Illinois at Urbana-Champaign
CS411: Database Systems
December 2, 2018
1
© 2018 A. Alawini
Announcements
•HW 5 is due on 12/9
•You are welcome
•PT1 Stage5 final demos
• Come at least 5 minutes before your presentation
• All team members must be present during the demo.
•PT1 report and video link are tomorrow at
Noon!
•You are very welcome
•PT2 report and slides are tomorrow at Noon!
•You are very very welcome
2
© 2018 A. Alawini
Last week!
•NoSQL Introduction
•Relational-NoSQL Trade-offs
•MongoDB
•Model and simple queries
•Join
•Aggregation
•Updates, replication and sharding
•Map-reduce
3
© 2018 A. Alawini
MapReduce dataflow
Mapper
Mapper
Mapper
Mapper
Reducer
Reducer
Reducer
Reducer
In
p
u
t
d
at
a
O
u
tp
u
t
d
at
a
“The Shuffle”
Intermediate
(key,value) pairs
4
© 2018 A. Alawini
Simple example: Word count
Mapper
(1-2)
Mapper
(3-4)
Mapper
(5-6)
Mapper
(7-8)
(1, the apple)
(2, is an apple)
(3, not an orange)
(4, because the)
(5, orange)
(6, unlike the apple)
(7, is orange)
(8, not green)
(the, 1)
(apple, 1)
(is, 1)
(apple, 1)
(an, 1)
(not, 1)
(orange, 1)
(an, 1)
(because, 1)
(the, 1)
(orange, 1)
(unlike, 1)
(apple, 1)
(the, 1)
(is, 1)
(orange, 1)
(not, 1)
(green, 1)
(apple, 3)
(an, 2)
(because, 1)
(green, 1)
(is, 2)
(not, 2)
(orange, 3)
(the, 3)
(unlike, 1)
(apple, {1, 1, 1})
(an, {1, 1})
(because, {1})
(green, {1})
(is, {1, 1})
(not, {1, 1})
(orange, {1, 1, 1})
(the, {1, 1, 1})
(unlike, {1})
Each mapper
receives some
of the KV-pairs
as input
The mappers
process the
KV-pairs
one by one
Each KV-pair output
by the mapper is
sent to the reducer
that is responsible
for it
The reducers
sort their input
by key
and group it
The reducers
process their
input one group
at a time
Key range the node
is responsible for
1 2 3 4 5
5
Reducer
(A-G)
Reducer
(H-N)
Reducer
(O-U)
Reducer
(V-Z)
© 2018 A. Alawini
Today’s lecture
•Continue with Map-reduce
•Introduction to Neo4J
•Cypher: A Graph Query Language
•Querying Nodes and Relationships using Patterns
•Manipulating Graph Data
6
© 2018 A. Alawini
Map-Reduce examples
• Distributed grep – all lines matching a pattern
• Count URL access frequency
• Reverse web-link graph
• Inverted index
7
© 2018 A. Alawini
Designing MapReduce algorithms
• Key decision: What should be done by map, and what by reduce?
• map can do something to each individual key-value pair, but
it can’t look at other key-value pairs
• Example: Filtering out key-value pairs we don’t need
• map can emit more than one intermediate key-value pair for each incoming
key-value pair
• Example: Incoming data is text, map produces (word,1) for each word
• reduce can aggregate data; it can look at multiple values, as long as map has
mapped them to the same (intermediate) key
• Example: Count the number of words, add up the total cost, …
• Need to get the intermediate format right!
• If reduce needs to look at several values together, map
must emit them using the same key!
8
© 2018 A. Alawini
Common mistakes to avoid
• Mapper and reducer should be stateless
• Don’t use static variables – after map +
reduce return, they should remember
nothing about the processed data!
• Reason: No guarantees about which
key-value pairs will be processed by
which workers!
• Don’t try to do your own I/O!
• Don’t try to write to files in the file system
• The MapReduce framework does the I/O for you
(usually):
• All the incoming data will be fed as arguments to map
and reduce
• Any data your functions produce should be output via
emit
HashMap h = new HashMap();
map(key, value) {
if (h.contains(key)) {
h.add(key,value);
emit(key, “X”);
}
} Wrong!
map(key, value) {
File foo =
new File(“xyz.txt”);
while (true) {
s = foo.readLine();
…
}
} Wrong!
9
© 2018 A. Alawini
More common mistakes to avoid
• Mapper must not map too much data to the same key
• In particular, don’t map everything to the same key!!
Otherwise the reduce worker will be overwhelmed!
• It’s okay if some reduce workers have more work than others
• Example: In WordCount, the reduce worker that works on the key
‘and’ has a lot more work than the reduce worker that works on
‘syzygy’.
map(key, value) {
emit(“FOO”, key + ” ” + value);
}
reduce(key, value[]) {
/* do some computation on
all the values */
}
Wrong!
10
© 2018 A. Alawini
MapReduce and MongoDB
•Suppose we have a collection of orders that look like
the following:
11
{
_id: ”o123”,
cust_id: “abc123”,
ord_dat: new Date(“Oct 04, 2013”),
status: ‘A’,
price: 12,
items: [ {sku: “mmm”, qty: 5, price: 2.5},
{sku:”nnn”, qty: 3, price: 1.5} ]
}
© 2018 A. Alawini
Example: MapReduce and MongoDB
•Return the total cost per customer.
12
© 2018 A. Alawini
Example 2: MapReduce and MongoDB
• Return the total and average number of each item (SKU) in
orders with status “A”.
13
© 2018 A. Alawini
Outline
Map-reduce
•Introduction to Neo4J
•Cypher: A Graph Query Language
•Querying Nodes and Relationships using Patterns
•Manipulating Graph Data
14
© 2018 A. Alawini
• Many types of data can be represented as
graphs
• Road networks, with intersections as
nodes and road segments as edges
• Computer networks, with computers as
nodes and connections as edges
• Social networks, with people/postings as
nodes and edges as relationship (e.g.
friends, likes, created, …)
• Graph databases store relationships and
connections as first-class entities:
“Property Graph Model”
Neo4j: Graph Database
15
©
M
ich
ael G
rossn
iklau
s
© 2018 A. Alawini
Labeled Property Graph
Data Model
16
h
ttp
s://n
eo4
j.com
/d
evelop
er/g
rap
h
-d
atab
ase/
Node
Relationship
NodeProperties
Relationships
Nodes
© 2018 A. Alawini
• Nodes
• have a system-assigned id
• can have key/value properties
• there is a reference node
(“starting point” into the node space)
• Relationships (Edges)
• have a system-assigned id
• are directed
(but can be traversed in either direction)
• have a type – can have key/value properties
• Key/value properties
• values always stored as strings
• support for basic types and arrays of basic types
Neo4j Nodes and Relationships
17
Node
Label
Relationship
Key: Value
….
Label
Node
Label
Node
Label
©
M
ich
ael G
rossn
iklau
s
© 2018 A. Alawini
• Performance
• Graph databases have better performance when dealing with
connected data vs. relational databases and NOSQL
• Flexibility
• No need to define schema upfront
• Agility
• graph data model is schema-free
• testable graph database’s API
• query language
The Power of Graph Databases
18
Graph DB and application
evolve in an agile fashion
© 2018 A. Alawini
Outline
Map-reduce
Introduction to Neo4J
•Cypher: A Graph Query Language
•Querying Nodes and Relationships using Patterns
•Manipulating Graph Data
19
© 2018 A. Alawini
•SQL-like syntax
•Declarative pattern-matching graph query language
•Query a graph DB to find data (Nodes, Relationships,
subgraphs) that matches a specific pattern
• uses ASCII to specify a patterns
Cypher: A Graph Query Language
20
© 2018 A. Alawini
(x:Person {name: ‘Abdu’})
<-[:KNOWS]-(y:Person {name: ‘Dave’})
-[:KNOWS]->(z:Person {name: ‘Susan’)
-[:KNOWS]->(x)
• Patterns describes path that
connects a node (or set of nodes)
• Identifiers allow us to refer to the
same node more than once when
describing a pattern
• In Cypher, graphs are described
using specification by example.
Cypher Basics
21
Cypher Pattern
Variables (Identifies)
name: Abdu
name: Susan name: Dave
© 2018 A. Alawini
Cypher Queries
•Cypher is comprised of three main concepts
• MATCH: graph pattern to match, bound to the starting points
• WHERE: filtering criteria
• RETURN: what to return
•Implemented using the Scala programming language
22
©
M
ich
ael G
rossn
iklau
s
© 2018 A. Alawini
•Nodes are surrounded with parentheses
•Nodes queries
Querying Nodes
23
MATCH (node:Label)
WHERE node.property = {value}
RETURN node.property
Source: https://neo4j.com/developer/cypher-query-language/
()
(matrix)
(:Movie)
(matrix:Movie)
(matrix:Movie {title: “The Matrix”})
(matrix:Movie {title: “The Matrix”, released:
1997})
© 2018 A. Alawini
Examples
24
• Find ”Apollo 13” movie
• Find 1990’s movies
© 2018 A. Alawini
Relationships are basically an arrow –> between two nodes.
• relationship-types -[:KNOWS]-> , <-[: LIKE]-
• a variable name -[rel:KNOWS]-> before the colon
• additional properties -[rel:KNOWS {since:2010}]->
• structural information for paths of variable length -[:KNOWS*..4]->
Querying Relationships
25
Source: https://neo4j.com/developer/cypher-query-language/
MATCH (n1:Label1)-[rel:TYPE]->(n2:Label2)
WHERE rel.property = {value}
RETURN rel.property, type(rel)
© 2018 A. Alawini
Example
26
© 2018 A. Alawini
• Nodes and relationship expressions are the building blocks for more
complex patterns.
• Patterns can be written continuously or separated with commas.
• You can refer to variables declared earlier or introduce new ones.
Types of Pattern Matching
• friend-of-a-friend
• shortest path:
• collaborative filtering
• tree navigation
Patterns
27
(user)-[:KNOWS]-(friend)-[:KNOWS]-(foaf)
(root)<-[:PARENT*]-(leaf:Category)-[:ITEM]->(data:Product)
path = shortestPath( (user)-[:KNOWS*..5]-(other) )
(user)-[:PURCHASED]->(product)<-[:PURCHASED]-()-[:PURCHASED] ->(otherProduct)
Source: https://neo4j.com/developer/cypher-query-language/
© 2018 A. Alawini
Shortest path example
28
© 2018 A. Alawini
Collaborative Filtering Example
29
© 2018 A. Alawini
Outline
Map-reduce
Introduction to Neo4J
•Cypher: A Graph Query Language
Querying Nodes and Relationships using Patterns
•Manipulating Graph Data
30
© 2018 A. Alawini
Manipulating Graph Data
31
Create statement creates nodes and relationships specified in the
pattern
© 2018 A. Alawini
FOREACH Statement
32
MATCH p =(begin)-[*]->(END )
WHERE begin.name = ‘A’ AND END .name = ‘D’
FOREACH (n IN nodes(p)| SET n.marked = TRUE )
https://neo4j.com/docs/cypher-manual/current/clauses/foreach/
© 2018 A. Alawini
Cont.
33
© 2018 A. Alawini
34
© 2018 A. Alawini
DELETE, REMOVE and SET Commands
35
© 2018 A. Alawini
•acts like a combination of MATCH or CREATE
•checks for the existence of data first before creating it
Completing patterns: Merge
36
© 2018 A. Alawini
ON CREATE, ON MATCH
37
Running the above command for the first time
Running the same command for the second time
Source: https://neo4j.com/developer/cypher-query-language/
© 2018 A. Alawini
Outline
Map-reduce
Introduction to Neo4J
Cypher: A Graph Query Language
Querying Nodes and Relationships using Patterns
Manipulating Graph Data
38