COMP5349: Cloud Computing Sem. 1/2022

School of Computer Science Dr.
Objectives
Week 6: II


The objective of this lab is to help you identify common structures in simple exploratory workloads. Many exploratory workloads are, or include, variations of the basic word counting problem and can be implemented in a similar way.
A word counting problem consists of a map stage and a reduce stage. The map stage scans the input sequentially and outputs one (word, 1) pair for every word it encounters. The reduce stage groups these pairs by key, which is the word, and sums the values for each key. This simple strategy can be used in many settings, with keys referring to different features of interest. Both questions in this week’s exercises are variations of the basic word count problem.
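The two stages described above can be sketched in plain Python. This is only an illustration of the pattern, not the framework code used in the lab; the function names `map_stage` and `reduce_stage` and the sample lines are assumptions made for this example.

```python
from collections import defaultdict


def map_stage(lines):
    """Scan the input sequentially and emit a (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def reduce_stage(pairs):
    """Group the pairs by key (the word) and sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input, invented for illustration only.
sample_lines = ["the cat sat", "the cat ran"]
word_counts = reduce_stage(map_stage(sample_lines))
print(word_counts)
```

Changing what the map stage emits as the key (a bigram, a release year) while leaving the reduce stage untouched is what turns this into a solution pattern for other counting problems.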
Question 1: Bigram counting
A bigram refers to a sequence of two adjacent elements. At word level, a bigram is a two word sequence from an input, which could be a line or a sentence. For instance, the sentence “the quick brown fox jumped over the lazy dog” has the following 8 bigrams:
• (the, quick)
• (quick, brown)
• (brown, fox)
• (fox, jumped)
• (jumped, over)
• (over, the)
• (the, lazy)
• (lazy, dog)
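As a minimal sketch, the bigrams of a single line can be produced by pairing each word with its successor. The helper name `bigrams` is an assumption for this example, and splitting on whitespace is assumed to be a sufficient tokenizer.

```python
def bigrams(line):
    """Return the word-level bigrams of one line as a list of 2-tuples."""
    words = line.split()
    # zip stops at the shorter sequence, so a line of n words yields n - 1 pairs.
    return list(zip(words, words[1:]))


sentence = "the quick brown fox jumped over the lazy dog"
pairs = bigrams(sentence)
for pair in pairs:
    print(pair)
```

A line with fewer than two words produces an empty list, so it contributes no bigrams.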
You are asked to find the top 5 bigrams in the 1984 processed.txt file, ranked by number of occurrences. The file contains the novel 1984, preprocessed by removing stop words and punctuation. Each sentence occupies one line, and bigrams are formed only within a line, never across lines. The text has been converted to lowercase.
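One possible way to approach this in plain Python is to treat each bigram as the key of a word count. The sketch below is not the expected lab solution; `top_bigrams` is a hypothetical helper, and the sample lines are invented stand-ins for the real input file.

```python
from collections import Counter


def top_bigrams(lines, n=5):
    """Count bigrams within each line and return the n most frequent."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # Pair each word with its successor; pairs never span two lines.
        counts.update(zip(words, words[1:]))
    return counts.most_common(n)


# Invented sample lines, already lowercased and stripped of punctuation.
sample_lines = [
    "big brother watching",
    "big brother said",
    "thought police big brother",
]
for bigram, count in top_bigrams(sample_lines):
    print(bigram, count)
```

To run it on the real data, an open file object can be passed as `lines`, since iterating over a text file yields one line at a time.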
31.03.2022

Question 2: Movie yearly statistics
The movies.csv file contains movie information. Each row represents one movie and has the following format: movieId,title,genres. A sample row looks like this:
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
The title field always contains both the title and the year in which the movie was released. The release year appears in parentheses after the actual title. If a movie belongs to several genres, the genres are listed in alphabetical order and separated by pipe characters (|).
We are interested in Sci-Fi movies. The number of Sci-Fi movies released varies from year to year. You are asked to find the year in which the most Sci-Fi movies were released.
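A row in this format could be taken apart as sketched below. The helper `parse_movie` and the regular expression are assumptions for this example; the `csv` module is used on the further assumption that titles containing commas are quoted in the usual CSV way.

```python
import csv
import re

# Assumed pattern: a four-digit year in parentheses at the end of the title.
YEAR_PATTERN = re.compile(r"\((\d{4})\)\s*$")


def parse_movie(line):
    """Split one row into (movie_id, title, year, list_of_genres)."""
    movie_id, title, genres = next(csv.reader([line]))
    match = YEAR_PATTERN.search(title)
    # year is None if the title does not end with a parenthesised year.
    year = int(match.group(1)) if match else None
    return movie_id, title, year, genres.split("|")


sample_row = "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy"
print(parse_movie(sample_row))
```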
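This question fits the word count pattern with the release year as the key. The following is a rough sketch under stated assumptions, not the expected lab solution: the function names are hypothetical, every row is assumed to have exactly three fields, and the sample rows are invented placeholders rather than real entries from movies.csv.

```python
import csv
import re
from collections import Counter

# Assumed pattern: a four-digit year in parentheses at the end of the title.
YEAR_PATTERN = re.compile(r"\((\d{4})\)\s*$")


def scifi_year_pairs(lines):
    """Map stage: emit a (year, 1) pair for every Sci-Fi movie."""
    for _movie_id, title, genres in csv.reader(lines):
        match = YEAR_PATTERN.search(title)
        # Rows without a parenthesised year (such as a header row) are skipped.
        if match and "Sci-Fi" in genres.split("|"):
            yield (match.group(1), 1)


def busiest_year(lines):
    """Reduce stage: sum the pairs per year and return the (year, count) maximum."""
    counts = Counter()
    for year, one in scifi_year_pairs(lines):
        counts[year] += one
    return counts.most_common(1)[0] if counts else None


# Invented rows for illustration only.
sample_rows = [
    "movieId,title,genres",
    "1,Sample Movie A (2001),Action|Sci-Fi",
    "2,Sample Movie B (2001),Sci-Fi",
    "3,Sample Movie C (1999),Drama",
    "4,Sample Movie D (1999),Adventure|Sci-Fi",
]
print(busiest_year(sample_rows))
```

If several years tie for the maximum, this sketch returns only one of them, so a complete solution may need to decide how ties should be reported.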
