King’s College London
This paper is part of an examination of the College counting towards the award of a degree. Examinations are governed by the College Regulations under the authority of the Academic Board.
Examination Period: August (Period 3) 2020
Module Code: 7CCSMBDT
Module Title: Big Data Technologies
Format of Examination: Written questions
Start time: 8 am (BST) 1 September 2020
Time Allowed: 90 minutes
Instructions: You are permitted to access any materials you wish, but this is not mandated and is not expected. You may use a calculator if you find this helpful.
Rubric: ANSWER ALL QUESTIONS.
Question 1 carries FIFTY marks.
Question 2 carries FIFTY marks.
The rubric for this paper must be followed and extra answers should not be submitted. For answers that are handwritten, write with blue/black ink on light coloured paper. Include the Module code, question number and student number on every page to be submitted. For answers that are typed, use the template provided.
Submission Deadline: 9.30am
Submission Process: Work must be submitted to the level 7 Informatics Assessments KEATS page.
Your work must be submitted as a PDF file. If you have prepared some answers on computer, and some on paper (which have then been digitised), you may upload at most two PDF files – one for computer-prepared answers, one for digitised answers. Do not duplicate answers across the two PDFs – if you do this, the computer-prepared answer will be taken. You should check that your work displays correctly after it has been uploaded.
ACADEMIC HONESTY AND INTEGRITY
Students at King’s are part of an academic community that values trust, fairness and respect and actively encourages students to act with honesty and integrity. It is a College policy that students take responsibility for their work and comply with the university’s standards and requirements. Online proctoring / invigilation will not be used for our online assessments. By submitting their answers students will be confirming that the work submitted is completely their own. Misconduct regulations remain in place during this period and students can familiarise themselves with the procedures on the College website.
Important: Students should copy out the following statement and include it with their submission for each examination:
I agree to abide by the expectations as to my conduct, as described in the academic honesty and integrity statement.
© 2020 King’s College London

Please answer Questions 1–2.

1.
a. Discuss the two main types of queries that are common in a streaming environment. For each type of query, describe one application that is based on it.
[20 marks]
b. Explain why deriving a uniform random sample from a stream is challenging, while performing the same task on a static dataset is not challenging.
[15 marks]
c. A business maintains the following information on customers: name, address, phone-number, and history of purchased products. The business wants to be able to efficiently update the history of purchased products. Discuss one type of NoSQL database that is appropriate for the task and explain why it is appropriate. Also, discuss one type of NoSQL database that is inappropriate for the business and explain why it is inappropriate.
[15 marks]

2.
a. What are the different types of failures in MapReduce? How is each type of failure addressed?
[20 marks]
b. Assume that you are given a Log file, containing the following information about visits to webpages: Date, Time, URL, IP, visit_length. The first two lines of the Log file are as follows:
2020-01-01 20:01:02 http://www.google.com 77.32.11.111 3
2020-02-02 23:01:08 http://www.google.com 77.32.11.111 7
Observe that each date is in the format YYYY-MM-DD, where Y is a digit for the year, M is a digit for the month and D is a digit for the day. You are also given a part of an mrjob program as follows:
from mrjob.job import MRJob
– – – # Omitted code
class Mrmyjob(MRJob):
    … # Write your code here
        data=line.split()
        date=data[0].strip()
        time=data[1].strip()
        url=data[2].strip()
        ip=data[3].strip()
        year=date[0:4]
        month=date[5:7]
    … # Write your code here
Complete the program so that it computes the most visited webpage in each month of 2020. Describe the input, output, and objective of any function(s) you write.
[30 marks]