King’s College London
This paper is part of an examination of the College counting towards the award of a degree. Examinations are governed by the College Regulations under the authority of the Academic Board.
Examination Period Module Code Module Title
Format of Examination: Start time:
Time Allowed Instructions:
Rubric
Summer (Period 2) 2020 7CCSMBDT
Big Data Technologies
Written questions
11am
90 minutes
You are permitted to access any materials you wish, but this is not mandated and is not expected. You may use a calculator if you find this helpful.
ANSWER ALL QUESTIONS. Question 1 carries FORTY marks. Question 2 carries SIXTY marks.
The rubric for this paper must be followed and extra answers should not be submitted. For answers that are handwritten, write with blue/black ink on light coloured paper. Include the Module code, question number and student number on every page to be submitted. For an- swers that are typed, use the template provided.
Submission Deadline: 12.30pm
Submission Process: Work must be submitted to the level 7 Informatics As-
sessments KEATS page.
Your work must be submitted as a PDF file. If you have prepared some answers on computer,
and some on paper (which have then been digitised), you may upload at most two PDF files – one for computer-prepared answers, one for digitised answers. Do not duplicate answers across the two PDFs – if you do this, the computer-prepared answer will be taken. You should check that your work displays correctly after it has been uploaded.
ACADEMIC HONESTY AND INTEGRITY
Students at King’s are part of an academic community that values trust, fairness and respect and actively encourages students to act with honesty and integrity. It is a College policy that students take responsibility for their work and comply with the university’s standards and re- quirements. Online proctoring / invigilation will not be used for our online assessments. By submitting their answers students will be confirming that the work submitted is completely their own. Misconduct regulations remain in place during this period and students can familiarise themselves with the procedures on the College website
Important: Students should copy out the following statement and include it with their submission for each examination:
I agree to abide by the expectations as to my conduct, as described in the academic honesty and integrity statement.
2020 King’s College London
Summer (Period 2) 2020 7CCSMBDT
Please answer Questions 1–2.
1. Auserwantstoperformaggregatestatisticscomputationssuchascounting certain values or computing the average of certain numbers in a very large file. The user is not willing to use a Relational Database Management System (RDBMS), due to performance issues, and is also not willing to program using the MapReduce framework (e.g., using mrjob).
a. Describe two Big Data technologies the user can use and justify why. [15 marks]
b. Describe one similarity among the two Big Data technologies of Question 1a.
[10 marks]
c. Describe two differences among the two Big Data technologies of Ques- tion 1a.
[15 marks]
Page 2
SEE NEXT PAGE
Summer (Period 2) 2020 7CCSMBDT
2.
a. Consider a stream of integers D = [7, 4, 3, 2] and the hash functions h1(x) = (2x)mod3+1 and h2(x) = (3x)mod5. We apply the FM-sketch algorithm on D using h1 and produce an estimate of the number of distinct values in D, denoted by e1. Then, we apply the same algorithm on D using h2 and produce an estimate of the number of distinct values in D, denoted by e2. Assuming the most basic type of estimate (no improvements to the estimate that is output by a single Flajolet-Martin (FM) sketch) and that the hash values (i.e., outputs of h1(x) and h2(x)) are stored using 4-bits, which estimate among e1 and e2 is the most accurate? Justify your answer, by explaining the computation of each estimate e1 and e2 in detail.
[40 marks]
b. Consider that the stream D = [7, 4, 3, 2] is held by a party A and another stream D′ = [7, 1, 5, 6] is held by a party B. Parties A and B are interested in estimating the number of distinct values in the union stream D ∪ D′. Briefly describe how this can be performed without A and B exchanging their streams.
[20 marks]
Page 3
FINAL PAGE