COMP9313 2017s2 Assignment

Question 1. MapReduce (5 pts)

Assume that in an online shopping system, a huge log file stores the information of each transaction. Each line of the log is in the format “userID\tproduct\tprice\ttime”. Your task is to use MapReduce to find the top-5 most expensive products purchased by each user in 2016.

You only need to write down the pseudocode (Mapper, Reducer, and optionally Combiner and Partitioner) to describe the algorithm (assume only one reducer). Note that the efficiency and scalability of your solution will be evaluated.
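
The structure of such a job can be sketched in Python by simulating the map, combine, and reduce phases locally. This is a minimal sketch under stated assumptions, not a model answer: the names `mapper`, `combiner`, `reducer`, and `run` are hypothetical, the time field is assumed to start with a four-digit year, and repeated purchases of the same product are not deduplicated. The combiner keeps only each user's local top 5, so at most five records per user per split reach the single reducer.

```python
import heapq
from collections import defaultdict

K = 5  # number of products to keep per user


def mapper(line):
    """Emit (user_id, (price, product)) for transactions made in 2016."""
    user_id, product, price, time = line.rstrip("\n").split("\t")
    # Assumption: the time field starts with a four-digit year.
    if time.startswith("2016"):
        yield user_id, (float(price), product)


def top_k(values, k=K):
    """Return the k largest (price, product) pairs, most expensive first."""
    return heapq.nlargest(k, values)


def combiner(user_id, values):
    """Keep only the local top k for this user before the shuffle."""
    for value in top_k(values):
        yield user_id, value


def reducer(user_id, values):
    """Merge the local top-k lists into the final top k for this user."""
    yield user_id, [(product, price) for price, product in top_k(values)]


def run(splits):
    """Simulate the job: one mapper and combiner per split, then one reducer."""
    shuffled = defaultdict(list)
    for split in splits:
        local = defaultdict(list)
        for line in split:
            for user_id, value in mapper(line):
                local[user_id].append(value)
        for user_id, values in local.items():
            for key, value in combiner(user_id, values):
                shuffled[key].append(value)
    result = {}
    for user_id in sorted(shuffled):
        for key, top in reducer(user_id, shuffled[user_id]):
            result[key] = top
    return result
```

Taking the top k is safe to apply in a combiner because the top k of a union can always be recovered from the top k of each part.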

Question 2. MinHash (5 pts)

We want to compute the min-hash signatures for two columns, C1 and C2, using two pseudo-random permutations of the columns given by the following functions:

h1(n) = (3n + 2) mod 7

h2(n) = (2n – 1) mod 7

Here, n is the row number in the original ordering. Instead of explicitly reordering the columns for each hash function, we use the implementation discussed in class, in which we read each data item in a column once in sequential order, and update the min-hash signatures as we pass through them.

Complete the steps of the algorithm and give the resulting signatures for C1 and C2.
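
The one-pass update can be sketched in Python as follows. The contents of C1 and C2 are not reproduced in this text, so the two columns below are hypothetical, as is the choice to number rows from 0; only the two hash functions come from the question. The function name `minhash_signatures` is also an assumption made for illustration.

```python
import math


def h1(n):
    return (3 * n + 2) % 7


def h2(n):
    # Python's % returns a non-negative result, so h2(0) is 6.
    return (2 * n - 1) % 7


def minhash_signatures(columns, hash_funcs, first_row=0):
    """Return sig[i][c]: the min-hash of column c under hash function i."""
    num_rows = len(columns[0])
    sig = [[math.inf] * len(columns) for _ in hash_funcs]
    for offset in range(num_rows):
        n = first_row + offset
        hashes = [h(n) for h in hash_funcs]  # computed once per row
        for c, column in enumerate(columns):
            if column[offset] == 1:  # only rows containing a 1 update the signature
                for i, value in enumerate(hashes):
                    if value < sig[i][c]:
                        sig[i][c] = value
    return sig


# Hypothetical columns (rows 0..6), not the ones from the assignment.
C1 = [1, 0, 0, 1, 0, 1, 0]
C2 = [0, 1, 0, 1, 0, 0, 1]
signatures = minhash_signatures([C1, C2], [h1, h2])
# signatures == [[2, 4], [2, 1]]: the signature of C1 is (2, 2) and of C2 is (4, 1)
```

Each row is visited once, and a signature entry only ever decreases, which is what makes the explicit permutation unnecessary.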

Question 3. Streaming Data (5 pts)

Suppose we are maintaining a count of 1s using the DGIM method. We represent a bucket by (i, t), where i is the number of 1s in the bucket and t is the bucket timestamp (time of the most recent 1).

Consider that the current time is 200, window size is 60, and the current list of buckets is: (16, 148) (8, 162) (8, 177) (4, 183) (2, 192) (1, 197) (1, 200). At the next ten clocks, 201 through 210, the stream has 0101010101. What will the sequence of buckets be at the end of these ten inputs?
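
The bucket maintenance rule can be sketched in Python as follows. This is an illustrative sketch, not the course's reference implementation: the function names `dgim_update` and `run_stream` are hypothetical, and two conventions are assumed, namely that the oldest bucket is dropped once its timestamp is at or before the current time minus the window size, and that merging two buckets keeps the more recent of their timestamps.

```python
def dgim_update(buckets, t, bit, window):
    """Apply one stream bit at time t.

    buckets is a list of (size, timestamp) pairs, oldest first.
    A new list is returned; the input list is not modified.
    """
    buckets = list(buckets)
    # Assumption: a bucket expires once its timestamp is <= t - window.
    while buckets and buckets[0][1] <= t - window:
        buckets.pop(0)
    if bit == 1:
        buckets.append((1, t))
        size = 1
        while True:
            same = [i for i, (s, _) in enumerate(buckets) if s == size]
            if len(same) < 3:
                break
            oldest, second = same[0], same[1]
            # Merge the two oldest buckets of this size; keep the newer timestamp.
            buckets[oldest:second + 1] = [(2 * size, buckets[second][1])]
            size *= 2  # the merge may create a third bucket of the next size
    return buckets


def run_stream(buckets, start_time, bits, window):
    """Feed a string of '0'/'1' characters, one per clock tick."""
    for offset, ch in enumerate(bits):
        buckets = dgim_update(buckets, start_time + offset, int(ch), window)
    return buckets
```

Calling `run_stream` with the bucket list above, a start time of 201, the bit string "0101010101", and a window of 60 replays the ten inputs one tick at a time.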

Question 4. Collaborative Filtering (5 pts)

Consider four users u1, u2, u3 and u4, and three movies m1, m2, and m3. The ratings of movies from the users are as below:

(a) Estimate the rating of u1 to m2 using the user-user collaborative filtering method (adopt the cosine similarity measure to compute the user similarities).

(b) Estimate the rating of u1 to m2 using the item-item collaborative filtering method (adopt the cosine similarity measure to compute the item similarities).
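
Both estimates can be sketched in Python as follows. The ratings table is not reproduced in this text, so the `ratings` dictionary below is hypothetical and only illustrates the mechanics. Further assumptions: missing ratings count as 0 in the cosine similarity, all available neighbours are used rather than a top-k subset, and the prediction is the similarity-weighted average of the neighbours' ratings. All function names are chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse rating dicts (missing ratings count as 0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_average(pairs):
    """Similarity-weighted average of (similarity, rating) pairs."""
    pairs = list(pairs)
    total = sum(sim for sim, _ in pairs)
    return sum(sim * r for sim, r in pairs) / total if total else None


def predict_user_user(ratings, user, item):
    """Average the item's ratings from other users, weighted by user similarity."""
    return weighted_average(
        (cosine(ratings[user], ratings[v]), ratings[v][item])
        for v in ratings
        if v != user and item in ratings[v]
    )


def transpose(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, row in ratings.items():
        for item, r in row.items():
            by_item.setdefault(item, {})[user] = r
    return by_item


def predict_item_item(ratings, user, item):
    """Average the user's own ratings of other items, weighted by item similarity."""
    by_item = transpose(ratings)
    return weighted_average(
        (cosine(by_item[item], by_item[j]), r)
        for j, r in ratings[user].items()
        if j != item
    )


# Hypothetical ratings, not the table from the assignment.
ratings = {
    "u1": {"m1": 2, "m3": 3},
    "u2": {"m1": 5, "m2": 2},
    "u3": {"m1": 3, "m2": 3, "m3": 1},
    "u4": {"m2": 2, "m3": 2},
}
user_based = predict_user_user(ratings, "u1", "m2")
item_based = predict_item_item(ratings, "u1", "m2")
```

The two methods differ only in which vectors are compared: rows of the rating matrix for user-user, columns for item-item.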

Submission:

Deadline: Sunday 5th Nov 09:59:59 PM

Please provide your solutions to these questions in a pdf file named “answers.pdf”. Log in to any CSE server (williams or wagner), and use the give command below to submit your solutions:

$ give cs9313 assignment5 answers.pdf

Or you can submit through:

https://cgi.cse.unsw.edu.au/~give/Student/give.php

If you submit your assignment more than once, the last submission will replace the previous one. To prove successful submission, please take a screenshot as the assignment submission instructions show and keep it yourself.

Late submission penalty

Late submissions will receive zero marks for this assignment.

Plagiarism:

The work you submit must be your own work. Submission of work partially or completely derived from any other person or jointly written with any other person is not permitted. The penalties for such an offence may include negative marks, automatic failure of the course and possibly other academic discipline. Assignment submissions will be examined manually. Relevant scholarship authorities will be informed if students holding scholarships are involved in an incident of plagiarism or other misconduct. Do not provide or show your assignment work to any other person, apart from the teaching staff of this subject. If you knowingly provide or show your assignment work to another person for any reason, and work derived from it is submitted, you may be penalized, even if the work was submitted without your knowledge or consent.