程序代写代做代考 algorithm database hadoop hbase graph chain clock C CS 425/ECE 428 Fall 2020 – Practice Midterm

CS 425/ECE 428 Fall 2020 – Practice Midterm
(Midterm is structured in the same way, and for same time as, regular midterm.) On-campus exam Date: Practice
Exam time For All Students: 90 minutes. Exam will be available for 6 hours once you start. Applies to both on-campus and Coursera.
Instructions:
1. Please hand in solutions that are typed (you may use your favorite word processor to start a new solutions doc, like you do with homeworks). We will not accept handwritten solutions. Figures and equations (if any) may be drawn by hand (and scanned/photographed clearly. Please ensure what you submit is clear.
2. Please submit PDF only! Please submit on Gradescope. Do not submit on Coursera.
3. Please start each problem (Q1-Q5) on a fresh page, and type your name at the top of each page (parts of each question, e.g., Q2’s a and b, don’t need to be on fresh pages).
Don’t forget to mark these pages on Gradescope.
4. This exam contains FIVE questions Q1-Q5 (pages 1-7). Please answer ALL parts of
ALL questions (no choice).
5. The exam is open-book (course material) and not proctored.
6. Calculators ok. No cellphones or other electronic devices. No web access (apart from
Course websites and Piazza).
7. Piazza during the above exam period (Oct 11 8 am US Central – Oct 13 8 am US
Central):
a. Please watch the running post (continuously updated) with midterm
clarifications.
b. On Piazza, you can only ask private questions to instructors, only about exam
questions. Please do not email questions to the staff mailing list.
c. There is a Piazza blackout on asking any questions about the course material, HWs, or MPs. During this time, please do not post anything visible to entire
class, or post anything (visible to class, or not) about course material.
8. What you CAN use: all course material provided by instructor this semester, i.e.,
slides, videos, textbook, HWs+solutions, etc.
9. What you CANNOT use: Cannot use the Web, cannot communicate with other
students (during exam, or afterwards, until we release solutions). You won’t gain any
advantage if you do these and will only waste your precious time.
10. You CANNOT communicate with other students during the exam, or afterwards.
(Note this is different from Homeworks, where you were allowed to collaborate.) 11. Answer all questions (no choice).
12. Before submitting your final doc, please make sure each question starts on a fresh
page (this helps us in Gradescope to not miss your solution!). 1

1. Short Multiple Choice or Fill in the Blanks [10 X 2 = 20 points]
There are 10 questions below. For each question, select the ONE BEST answer among the choices, or write the best short phrase to fill in the blanks. No justification required. No partial credit.
1. In an asynchronous system, which of the following is TRUE?
a. Messages are not delayed beyond a pre-fixed known latency
b. Clocks of different processes are synchronized to within a known bound
c. Failures are not allowed to occur
d. None of the above
2. A Mapreduce task is running on a cluster with 4 racks, each with 2 machines. The machines are named as S: S11, S12, S21, S22, S31, S32, S41, S42. A Map task needs to access as input a block that has replicas on machines S21, S22, S41. If the only free containers in the cluster are available at the following machines: S11 and S42, the task will be scheduled at ___________
3. A datacenter run by an upstart company called DataStones consumed about 100 kWh so far in 2015 AD. If only 80 kWh of this 100 kWh was used in running IT equipment (servers, routers, etc.), then the PUE of DataStones’ datacenter is __________________
4. A BitTorrent client C is trying to download a file with four blocks 0 through 3. C is talking to three neighboring nodes (clients) A, B, D. Client A currently is storing blocks 0, 2, 3; while B is storing blocks 0, 1, 2; while C is storing blocks 2, 3 only. C FIRST fetches the block ____________
5. In a system of three processes sending unicasts, an event e1 has a vector timestamp of (1, 2, 30) while an event e2 has a vector timestamp of (11, 21, 301). Which of the following statements is TRUE?
a. e1 happened before e2
b. e2 happened before e1
c. e1 and e2 are concurrent
d. We can’t tell what the causality relation between e1 and e2 is
6. In Hadoop, which of the following entities is responsible for scheduling decisions?
a. AM
2

b. RM
c. NM
d. M&M
e. All of the above
7. A set of Cassandra clients uses the QUORUM consistency level for both reads and writes. Then, assuming no failures:
a. Once a client C has received an ack for a write W, any subsequent reads by C will see the value written by W or subsequent writes
b. Once a client C has received an ack for a write W, any subsequent reads by any client will see the value written by W or subsequent writes
c. Both above statements are true
8. A Cassandra cluster is running a RackInferring snitch. You are reviewing a log
that contains IP addresses 111.121.213.122 and 111.122.213.122 and 111.123.213.56. These IP addresses:
a. Are the same machine
b. Are in the same rack in the same datacenter, but are different machines
c. Are in different racks but in the same datacenter
d. Are in different datacenters
9. Which of the following systems prefers availability over consistency, under partitions?
a. Cassandra
b. HBase
c. A traditional relational database (e.g., MySQL).
d. None of the above
10. A modified gossip protocol uses a spanning tree to get the multicast to half of the processes. This takes O(log(N)) time. Thereafter you have the choice of either pull or push gossip. Which will complete faster, i.e., reach a given fraction of infected processes with the same given probability, but earlier?
a. Push
b. Pull
c. Both complete at about the same speed (or time)
(Go to next page)
3

2. Lamport Timestamps [10 + 10 = 20 points]
This question has two parts.
(a) In the figure below, assign Lamport timestamps to each event. Events include
message sends and receipts (arrows) and steps (dark circles). The initial timestamp of each process is 0. You could either annotate figure or write out the sequence of timestamps for the events at each process (e.g., P0: followed by a list of 7 timestamps).
P0 P1
P2 P3
P0 P1
P2 P3
(b) Now mark vector timestamps for the same timeline.
(Go to next page)
4

3. Chord Routing [4 + 8 + 8 = 20 points]
This question has three parts.
In a Chord P2P system, there are 256 points on the ring. Six machines join, with the following peer ids: 1, 10, 100, 150, 200, 250.
(a) Indicate the successor entries of each node on the Chord ring.
(b) Show the finger table entries of machine with peer id 10.
(c) Use the Chord routing algorithm to route a message for key (file id) 2 starting from peer id 150. There is no file replication, and each peer has exactly one successor and no predecessors.
(Go to next page)
5

4. MapReduce [20 points]
You are given a social network log that captures information about all the posts on a social network during the course of a day. Each line contains tuples (a, hh:mm:ss) where a is a user id, and hh:mm:ss is the timestamp of a post made by the user. If a user has multiple posts, they will appear as separate entries. The entries are not sorted in any order.
Write a MapReduce program to find that hour of day when the maximum posts were made. Your output should be one of the integers 0-23.
You should have at least some parallelism. Ensure that your output does not contain duplicates. You can set your key and value to arbitrary objects. You cannot retain data at any of the machines from a task (Map or Reduce) for use in a later task. Chaining MapReduces is allowed, as long as you don’t over-use chaining (where parallelization could have helped instead). Each MapReduce in a chain can read the dataset, but there is no other persistent memory across the chain. Pseudocode is preferable, and it can be coarse-grained, e.g., you can say “get top k from this list by field x”, “sort this list by key x”, etc.). Be clear and concise and don’t miss any steps.
(Go to next page)
6

5. Key-Value Stores [10+10 = 20 points]
This question contains two parts.
(a) Someone tells you that Cassandra should really be using the finger tables from Chord as well. Give three reasons arguing against this proposal.
(b) Instead of using the Bloom filter along with an SSTable why not simply use a list of the keys present in that SSTable?
(END OF EXAM)
7