University of Stirling
Computing Science and Mathematics
CSCU9YQ – NoSQL Databases
Assignment – A MongoDB Database for Movie Reviews
Welcome! We hope you like movies as much as we do. You are given a dataset in JSON format containing information about a set of movies produced between the early 1900s and 2018. The dataset includes features such as genre, countries, actors and director, as well as awards and ratings from up to three movie rating agencies. A description of each movie's plot is also provided.
You are encouraged to use both the Mongo shell and Compass. For the aggregations, you may use either aggregation pipelines (preferred) or map-reduce commands.
Part 1: Load Data and Discuss Data Modelling
1. On the University MongoDB server¹, create a collection called ‘MovieData’. Using an import command, populate the collection with the dataset in the file ‘movieData.json’, available on Canvas under Units/Assignment. Report the import command you used.
2. What are the advantages of modelling this dataset in MongoDB rather than with a relational model? Justify your answer. [Max Word Count: 100]
3. Could any other type of NoSQL database be used to model this data? If so, what would be the advantages of that model compared with a document model? [Max Word Count: 100]
Part 2: Queries
For the following questions provide both the query command used and the output produced.
1. Which genre has received the largest number of awards?
2. What are the five most common movie genres in the collection?
3. List all the movies in which the UK participated as a production country and which have won at least two awards. Show only the movie title, year and number of awards. Sort them by the number of awards (descending).
4. List the movies with at least one nomination where “Salma Hayek” is one of the actresses. Show only the movie title and sort them by year (descending).
¹ To connect to the University MongoDB server from your own computer, you will need to download MongoDB Community Server version 3.4.19 from https://www.mongodb.com/download-center/community
Follow the installation guidelines. To connect to the server, use the same command we have been using in the practicals: mongo --host mqr0.cs.stir.ac.uk -u
Part 3: Queries with Discussion
For these questions provide the query command, the output produced, as well as an answer to the question asked.
1. The movies have been reviewed by up to three rating agencies: IMDb, Rotten Tomatoes and Metacritic. Find the top 10 movies according to each of these ratings and list them. Show only the title, year and rating. Discuss your results: for example, do the different ratings agree? Are some ratings biased in some way? [Max Word Count for Discussion: 100]
2. List the movies with a discrepancy of more than 40 points between the Rotten Tomatoes ‘meter’ (critic rating) and ‘userMeter’ (viewer rating). Sort them by the difference in ratings. Discuss your results: for example, can you find any patterns in the release year, countries of production, actors, etc. among these movies? [Max Word Count for Discussion: 100]
3. Recommend five films related to dogs. List only the title and year. Discuss how you planned and designed your query to best answer this question. [Max Word Count for Discussion: 100]
Submission
● Submissions should be made via Canvas.
● The official submission deadline is Friday 15th March. However, submissions will be accepted without penalty until midnight on Sunday 17th March. After that, the standard late-submission penalty of 3 marks (out of 100) per day will be applied.
● The submission consists of a single file in Word, PDF, RTF or TXT format containing the answers to the stated questions.
Marking Scheme
The assignment is worth 25 marks distributed as follows:
● Part 1: 7 marks, 1 for loading the data, 3 for each data modelling question.
● Part 2: 8 marks, 2 marks for each correct query and results.
● Part 3: 10 marks: 3 each for questions 1 and 2, and 4 for question 3. Marks will be given for the correct query and results, as well as for the insights of your discussion.