CMPSC 431W Fall 2015 Homework 4

Assigned: Friday October 30th, 2015 Due: Mon November 9th, 2015 Total points: 100

In this homework, we will use the dataset provided by Yelp for its Dataset Challenge (http://www.yelp.com/dataset_challenge). The dataset includes information on businesses (not only restaurants), check-ins, reviews, tips, and users. For this homework we will use only the business information. The JSON file is available on ANGEL (Homework > Homework 4 – Yelp dataset), or you can download it from the link below: http://www.cse.psu.edu/~yul189/cmpsc431w/data/yelp_academic_dataset_business.json

We recommend installing MongoDB on your own machine, since we are working with a larger dataset this time. Please contact us as soon as possible if you have difficulty with the installation.

Question 1 (5pts)

After installing MongoDB, what is the command you use to start the server? What is the command you use to start the Mongo shell?

Question 2 (5pts)

Write down the command for importing the dataset into a database called yelp and a collection called business. Hint: use the mongoimport command. According to the message shown in the console, how many objects were imported?

Question 3 (10pts)

Experiment with

           db.business.find().limit(x).skip(y).pretty()

to observe what the data in the dataset looks like. Tell us what you find about the fields of the documents, in as much detail as possible.
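As a rough illustration of what x and y control in the command above, here is a minimal sketch in plain JavaScript that emulates the paging on an in-memory array. The array `docs` and the helper `page` are hypothetical stand-ins and are not part of MongoDB. The sketch relies on the fact that, for a find() cursor, the skip is applied before the limit regardless of the order in which the two methods are chained, and that a limit of 0 means "no limit". It assumes x and y are non-negative.

```javascript
// Hypothetical stand-in for a collection: ten tiny documents.
const docs = Array.from({ length: 10 }, (_, i) => ({
  _id: i,
  name: `business-${i}`,
}));

// Emulates find().limit(x).skip(y) on an array (assumes x >= 0 and y >= 0):
// skip the first y documents, then return at most x of the remainder.
// A limit of 0 is treated as "no limit", as in MongoDB.
function page(documents, x, y) {
  return x === 0 ? documents.slice(y) : documents.slice(y, y + x);
}

// Second "page" of three documents: skips ids 0-2 and yields ids 3, 4, 5.
const secondPage = page(docs, 3, 3);
console.log(secondPage.map((d) => d._id));
```

Keeping x fixed and increasing y by x each time steps through the collection one page at a time, which is a convenient way to sample documents from different parts of the dataset.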

Question 4 (70pts)

For each of the following demanding customers, find a business that meets the stated requirements. Provide the name of the business and the command you used to find it. (If there is no such result, just state so.)

  1. Adam wants to have dinner at the Chinese restaurant with the highest review.
  2. Becky wants a restaurant that is good for brunch, has valet parking, and is good for kids.
  3. Caitlin wants to find the highest-rated (i.e., by number of stars) restaurant in Shadyside, Pittsburgh.
  4. Daniel wants to find a pizza place in Charlotte that offers take-out and accepts credit cards.
  5. Elise wants to find a tanning spa in Scottsdale that is open seven days a week.
  6. Gabby wants to find a bar in Pittsburgh that has happy hour. But she doesn’t want it to be good for dancing, because she’s terrible at it.
  7. Henry has to work late on weekdays, but he urgently needs to get his car fixed. Find the automotive business in Phoenix that closes the latest on weekdays.
  8. It is Saturday at 8 p.m. in Las Vegas. Kate is craving frozen yogurt and needs to use Wi-Fi. Help her find a place.
  9. Linda wants to know the number of beauty salons in each city in the dataset, as well as the average rating of the beauty salons in each city.

Question 5 (10pts)

After helping out these demanding customers, what do you think of using NoSQL? Imagine the same dataset stored in a relational database such as MySQL. What pros and cons do you see in each approach?

Bonus (3pts max for semester final grade)

Write a report (two pages minimum) analyzing the dataset. You may discuss any interesting findings from a data-analysis point of view. Including plots is a baseline requirement.
