Discussion Session on Neo4j
Amirhossein Aleyasen
Outline
● We will learn how to load CSV files to Neo4j and create a graph from them.
● We will try 6 sample queries on the loaded data.
Load CSV to Neo4j
LOAD CSV:
LOAD CSV FROM ‘https://neo4j.com/docs/cypher-manual/3.5/csv/artists.csv’ AS line
CREATE (:Artist { name: line[1], year: toInteger(line[2])})
LOAD CSV with Header:
LOAD CSV WITH HEADERS FROM
‘https://neo4j.com/docs/cypher-manual/3.5/csv/artists-with-headers.csv’ AS line
CREATE (:Artist { name: line.Name, year: toInteger(line.Year)})
LOAD CSV from a local file
LOAD CSV WITH HEADERS FROM “file:///HW5-Dataset/business.csv” AS row
CREATE (:Business {business_id: row.business_id, city: row.city, is_open:
row.is_open,name : row.name,postal_code: row.postal_code,review_count:
row.review_count,stars: row.stars,state: row.state});
● Use file:// as protocal.
● Note that paths are relative to [NEO4j-HOME]/imports
● Use MERGE instead CREATE if many rows exist for a single node.
CREATE INDEX
We create index to ensure their quick lookup when creating relationships in the
next step.
CREATE INDEX ON :Business(business_id);
CREATE INDEX ON :Review(review_id);
CREATE INDEX ON :Review(user_id);
CREATE INDEX ON :Review(business_id);
CREATE INDEX ON :User(user_id);
Create Relationships from CSV
LOAD CSV WITH HEADERS FROM “file:///HW5-Dataset/review.csv” AS row
MATCH (business:Business {business_id: row.business_id})
MATCH (review:Review {review_id: row.review_id})
MERGE (business)-[:HAS]->(review);
Q1. Find users with more than 5 reviews.
Q1. Find users with more than 5 reviews.
MATCH (user:User)-[:REVIEWED]->(:Review)
WITH user, count(*) as reviewCount
WHERE reviewCount > 5
RETURN user, reviewCount
Q2. Find all the users with at least one 5 star
review.
Q2. Find all the users with at least one 5 star review.
MATCH (user:User)-[:REVIEWED]->(review:Review)
WITH user.user_id as user_id, review.stars as stars
where stars = “5”
RETURN distinct user_id
MATCH (user:User)-[:REVIEWED]->(review:Review{stars : “5”})
RETURN distinct user.user_id
Q3. Find all the users with at least ten 5 star
reviews.
Q3. Find all the users with at least ten 5 star reviews.
MATCH (user:User)-[:REVIEWED]->(review:Review{stars : “5”})
WITH distinct user.user_id as user_id, count(*) as fiveStarCounts
WHERE fiveStarCounts >= 10
RETURN user_id
Q4. Find all the businesses in Urbana with at
least one 1 star review.
Q4. Find all the businesses in Urbana with at least one 1 star
review.
MATCH (business:Business{city : “Urbana”})-[:HAS]->
(review:Review{stars:”1″})
RETURN distinct business.business_id
Q5. Return names of the businesses in Urbana
with at least ten 5 star review.
Q5. Return names of the businesses in Urbana with at least
ten 5 star review.
MATCH (business:Business{city : “Urbana”})-[:HAS]->
(review:Review{stars:”5″})
WITH distinct business.name as name, count(*) as c
WHERE c >= 10
RETURN name
Q6. Return names of the businesses in
Champaign that the number of their 1 star
reviews are more than the number of their 5 star
reviews.
Q6. Return names of the businesses in Champaign that the
number of their 1 star reviews are more than the number of 5
star reviews.
MATCH (review1:Review{stars:”1″})<-[HAS]-(business:Business{city : "Champaign"})-[:HAS]->(review5:Review{stars:”5″})
WITH business.name as name , count(distinct review1) as count_r1,
count(distinct review5) as count_r5
WHERE count_r1 > count_r5
RETURN name