程序代写代做代考 database flex algorithm file system Microsoft PowerPoint – 24- MapReduceP2_Neo4J2pdf

Microsoft PowerPoint – 24- MapReduceP2_Neo4J2pdf

© 2018 A. Alawini

Map-Reduce
Neo4j: Graph Database

Abdu Alawini
University of Illinois at Urbana-Champaign

CS411: Database Systems

December 2, 2018

1

© 2018 A. Alawini

Announcements
•HW 5 is due on 12/9
•You are welcome 

•PT1 Stage5 final demos
• Come at least 5 minutes before your presentation
• All team members must be present during the demo.

•PT1 report and video link are tomorrow at
Noon!
•You are very welcome 

•PT2 report and slides are tomorrow at Noon!
•You are very very welcome 

2

© 2018 A. Alawini

Last week!

•NoSQL Introduction
•Relational-NoSQL Trade-offs
•MongoDB
•Model and simple queries
•Join
•Aggregation

•Updates, replication and sharding
•Map-reduce

3

© 2018 A. Alawini

MapReduce dataflow

Mapper

Mapper

Mapper

Mapper

Reducer

Reducer

Reducer

Reducer

In
p
u
t

d
at

a

O
u
tp

u
t

d
at

a

“The Shuffle”

Intermediate
(key,value) pairs

4

© 2018 A. Alawini

Simple example: Word count

Mapper
(1-2)

Mapper
(3-4)

Mapper
(5-6)

Mapper
(7-8)

(1, the apple)

(2, is an apple)

(3, not an orange)

(4, because the)

(5, orange)

(6, unlike the apple)

(7, is orange)

(8, not green)

(the, 1)

(apple, 1)

(is, 1)

(apple, 1)
(an, 1)

(not, 1)

(orange, 1)

(an, 1)

(because, 1)
(the, 1)

(orange, 1)

(unlike, 1)

(apple, 1)

(the, 1)

(is, 1)

(orange, 1)
(not, 1)
(green, 1)

(apple, 3)
(an, 2)
(because, 1)
(green, 1)

(is, 2)
(not, 2)

(orange, 3)
(the, 3)

(unlike, 1)

(apple, {1, 1, 1})
(an, {1, 1})
(because, {1})
(green, {1})

(is, {1, 1})
(not, {1, 1})

(orange, {1, 1, 1})
(the, {1, 1, 1})

(unlike, {1})

Each mapper
receives some
of the KV-pairs

as input

The mappers
process the

KV-pairs
one by one

Each KV-pair output
by the mapper is

sent to the reducer
that is responsible

for it

The reducers
sort their input

by key
and group it

The reducers
process their

input one group
at a time

Key range the node
is responsible for

1 2 3 4 5

5

Reducer
(A-G)

Reducer
(H-N)

Reducer
(O-U)

Reducer
(V-Z)

© 2018 A. Alawini

Today’s lecture

•Continue with Map-reduce
•Introduction to Neo4J
•Cypher: A Graph Query Language

•Querying Nodes and Relationships using Patterns
•Manipulating Graph Data

6

© 2018 A. Alawini

Map-Reduce examples

• Distributed grep – all lines matching a pattern

• Count URL access frequency

• Reverse web-link graph

• Inverted index

7

© 2018 A. Alawini

Designing MapReduce algorithms

• Key decision: What should be done by map, and what by reduce?
• map can do something to each individual key-value pair, but

it can’t look at other key-value pairs

• Example: Filtering out key-value pairs we don’t need
• map can emit more than one intermediate key-value pair for each incoming

key-value pair

• Example: Incoming data is text, map produces (word,1) for each word
• reduce can aggregate data; it can look at multiple values, as long as map has

mapped them to the same (intermediate) key

• Example: Count the number of words, add up the total cost, …
• Need to get the intermediate format right!

• If reduce needs to look at several values together, map
must emit them using the same key!

8

© 2018 A. Alawini

Common mistakes to avoid
• Mapper and reducer should be stateless

• Don’t use static variables – after map +
reduce return, they should remember
nothing about the processed data!

• Reason: No guarantees about which
key-value pairs will be processed by
which workers!

• Don’t try to do your own I/O!
• Don’t try to write to files in the file system
• The MapReduce framework does the I/O for you

(usually):
• All the incoming data will be fed as arguments to map

and reduce

• Any data your functions produce should be output via
emit

HashMap h = new HashMap();

map(key, value) {
if (h.contains(key)) {

h.add(key,value);
emit(key, “X”);

}
} Wrong!

map(key, value) {
File foo =

new File(“xyz.txt”);
while (true) {

s = foo.readLine();

}
} Wrong!

9

© 2018 A. Alawini

More common mistakes to avoid

• Mapper must not map too much data to the same key
• In particular, don’t map everything to the same key!!

Otherwise the reduce worker will be overwhelmed!

• It’s okay if some reduce workers have more work than others
• Example: In WordCount, the reduce worker that works on the key

‘and’ has a lot more work than the reduce worker that works on
‘syzygy’.

map(key, value) {
emit(“FOO”, key + ” ” + value);

}

reduce(key, value[]) {
/* do some computation on
all the values */

}

Wrong!

10

© 2018 A. Alawini

MapReduce and MongoDB

•Suppose we have a collection of orders that look like
the following:

11

{
_id: ”o123”,
cust_id: “abc123”,
ord_dat: new Date(“Oct 04, 2013”),
status: ‘A’,
price: 12,
items: [ {sku: “mmm”, qty: 5, price: 2.5},

{sku:”nnn”, qty: 3, price: 1.5} ]
}

© 2018 A. Alawini

Example: MapReduce and MongoDB

•Return the total cost per customer.

12

© 2018 A. Alawini

Example 2: MapReduce and MongoDB
• Return the total and average number of each item (SKU) in

orders with status “A”.

13

© 2018 A. Alawini

Outline

Map-reduce
•Introduction to Neo4J
•Cypher: A Graph Query Language

•Querying Nodes and Relationships using Patterns
•Manipulating Graph Data

14

© 2018 A. Alawini

• Many types of data can be represented as
graphs
• Road networks, with intersections as

nodes and road segments as edges

• Computer networks, with computers as
nodes and connections as edges

• Social networks, with people/postings as
nodes and edges as relationship (e.g.
friends, likes, created, …)

• Graph databases store relationships and
connections as first-class entities:
“Property Graph Model”

Neo4j: Graph Database

15

©

M

ich
ael G

rossn
iklau

s

© 2018 A. Alawini

Labeled Property Graph
Data Model

16

h
ttp

s://n
eo4

j.com
/d

evelop
er/g

rap
h
-d

atab
ase/

Node
Relationship

NodeProperties

Relationships
Nodes

© 2018 A. Alawini

• Nodes
• have a system-assigned id
• can have key/value properties
• there is a reference node

(“starting point” into the node space)

• Relationships (Edges)
• have a system-assigned id
• are directed

(but can be traversed in either direction)

• have a type – can have key/value properties
• Key/value properties

• values always stored as strings
• support for basic types and arrays of basic types

Neo4j Nodes and Relationships

17

Node

Label

Relationship

Key: Value
….

Label

Node

Label

Node

Label

©

M

ich
ael G

rossn
iklau

s

© 2018 A. Alawini

• Performance
• Graph databases have better performance when dealing with

connected data vs. relational databases and NOSQL

• Flexibility
• No need to define schema upfront

• Agility
• graph data model is schema-free
• testable graph database’s API
• query language

The Power of Graph Databases

18

Graph DB and application
evolve in an agile fashion

© 2018 A. Alawini

Outline

Map-reduce
Introduction to Neo4J
•Cypher: A Graph Query Language

•Querying Nodes and Relationships using Patterns
•Manipulating Graph Data

19

© 2018 A. Alawini

•SQL-like syntax

•Declarative pattern-matching graph query language

•Query a graph DB to find data (Nodes, Relationships,
subgraphs) that matches a specific pattern
• uses ASCII to specify a patterns

Cypher: A Graph Query Language

20

© 2018 A. Alawini

(x:Person {name: ‘Abdu’})
<-[:KNOWS]-(y:Person {name: ‘Dave’}) -[:KNOWS]->(z:Person {name: ‘Susan’)
-[:KNOWS]->(x)

• Patterns describes path that
connects a node (or set of nodes)

• Identifiers allow us to refer to the
same node more than once when
describing a pattern

• In Cypher, graphs are described
using specification by example.

Cypher Basics

21

Cypher Pattern

Variables (Identifies)

name: Abdu

name: Susan name: Dave

© 2018 A. Alawini

Cypher Queries

•Cypher is comprised of three main concepts
• MATCH: graph pattern to match, bound to the starting points
• WHERE: filtering criteria
• RETURN: what to return

•Implemented using the Scala programming language

22

©

M

ich
ael G

rossn
iklau

s

© 2018 A. Alawini

•Nodes are surrounded with parentheses

•Nodes queries

Querying Nodes

23

MATCH (node:Label)
WHERE node.property = {value}
RETURN node.property

Source: https://neo4j.com/developer/cypher-query-language/

()
(matrix)
(:Movie)
(matrix:Movie)
(matrix:Movie {title: “The Matrix”})
(matrix:Movie {title: “The Matrix”, released:
1997})

© 2018 A. Alawini

Examples

24

• Find ”Apollo 13” movie

• Find 1990’s movies

© 2018 A. Alawini

Relationships are basically an arrow –> between two nodes.

• relationship-types -[:KNOWS]-> , <-[: LIKE]- • a variable name -[rel:KNOWS]-> before the colon
• additional properties -[rel:KNOWS {since:2010}]->
• structural information for paths of variable length -[:KNOWS*..4]->

Querying Relationships

25

Source: https://neo4j.com/developer/cypher-query-language/

MATCH (n1:Label1)-[rel:TYPE]->(n2:Label2)
WHERE rel.property = {value}
RETURN rel.property, type(rel)

© 2018 A. Alawini

Example

26

© 2018 A. Alawini

• Nodes and relationship expressions are the building blocks for more
complex patterns.

• Patterns can be written continuously or separated with commas.
• You can refer to variables declared earlier or introduce new ones.
Types of Pattern Matching

• friend-of-a-friend

• shortest path:

• collaborative filtering

• tree navigation

Patterns

27

(user)-[:KNOWS]-(friend)-[:KNOWS]-(foaf)

(root)<-[:PARENT*]-(leaf:Category)-[:ITEM]->(data:Product)

path = shortestPath( (user)-[:KNOWS*..5]-(other) )

(user)-[:PURCHASED]->(product)<-[:PURCHASED]-()-[:PURCHASED] ->(otherProduct)

Source: https://neo4j.com/developer/cypher-query-language/

© 2018 A. Alawini

Shortest path example

28

© 2018 A. Alawini

Collaborative Filtering Example

29

© 2018 A. Alawini

Outline

Map-reduce
Introduction to Neo4J
•Cypher: A Graph Query Language
Querying Nodes and Relationships using Patterns
•Manipulating Graph Data

30

© 2018 A. Alawini

Manipulating Graph Data

31

Create statement creates nodes and relationships specified in the
pattern

© 2018 A. Alawini

FOREACH Statement

32

MATCH p =(begin)-[*]->(END )
WHERE begin.name = ‘A’ AND END .name = ‘D’
FOREACH (n IN nodes(p)| SET n.marked = TRUE )

https://neo4j.com/docs/cypher-manual/current/clauses/foreach/

© 2018 A. Alawini

Cont.

33

© 2018 A. Alawini
34

© 2018 A. Alawini

DELETE, REMOVE and SET Commands

35

© 2018 A. Alawini

•acts like a combination of MATCH or CREATE
•checks for the existence of data first before creating it

Completing patterns: Merge

36

© 2018 A. Alawini

ON CREATE, ON MATCH

37

Running the above command for the first time

Running the same command for the second time

Source: https://neo4j.com/developer/cypher-query-language/

© 2018 A. Alawini

Outline

Map-reduce
Introduction to Neo4J
Cypher: A Graph Query Language
Querying Nodes and Relationships using Patterns
Manipulating Graph Data

38