9/27/2020 Midterm #1 (Mod 1 and 2): AU20 CSE 3244 – Data Mgmt in Cloud (35610)
Midterm #1 (Mod 1 and 2)
Due Sep 27 at 11:59pm Points 150 Questions 26
Available Sep 21 at 12am – Sep 27 at 11:59pm 7 days Time Limit 120 Minutes
Instructions
Note the due date on Carmen. Extensions will be awarded only for extreme cases, e.g., COVID, fire, hurricane, etc.
A PDF version of the Exam is provided: exam1.2020au.pdf. You will need to refer to the PDF to answer some questions.
All answers must be provided online through Carmen. Once you begin the exam on Carmen, you will have 120 minutes to complete the exam. This exam is open book and open notes. Please do *not* collaborate with other students (past or present).
You are welcome to peruse the PDF before beginning the exam. Opening the PDF from the link above (or in the Files tab) does not count as starting the exam.
The exam is entirely multiple choice. If you feel a problem is worded vaguely, try your best. After the exam, you can send me an email to address the problem. If I agree, we will discard the question or offer extra credit. To be fair to all students, clarification questions are not allowed.
Do not share the exam (PDF or screenshots) with anyone else. Doing so would be a grave violation of the academic misconduct policy.
Attempt History
Attempt 1 (latest): 120 minutes, 92.67 out of 150
Correct answers are hidden. Score for this quiz: 92.67 out of 150
Submitted Sep 27 at 12:17am This attempt took 120 minutes.
https://osu.instructure.com/courses/86941/quizzes/420677
Question 2
Structured, relational data can comprise a part of unstructured data.
True False
Question 3
Virtualization allows multiple operating systems to run atop the same hardware.
True False
Question 1
A petabyte is 6 times larger than a gigabyte.
True False
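The size ratio in Question 1 can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a petabyte is 10^15 bytes; the constant names are illustrative and not part of the exam.

```python
# Assumption: decimal (SI) units. Binary units (GiB, PiB) would give 2**20 instead.
GIGABYTE = 10**9   # bytes
PETABYTE = 10**15  # bytes

# How many gigabytes fit in one petabyte.
ratio = PETABYTE // GIGABYTE
print(ratio)  # 1000000
```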
Question 4
Data scientists write map & reduce code that manages (1) where tasks run and (2) how data is partitioned.
Question 5
For term co-occurrence, the all-pairs algorithm discussed in class is a ____________ implementation.
Holistic Distributive Algebraic
Question 6
For term co-occurrence, the all-pairs algorithm outputs smaller objects than the stripes algorithm.
True False
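To make the shape of the emitted objects concrete, here is a rough sketch of the map side of both approaches. It assumes that two terms co-occur when they appear in the same document; the in-class definition of the neighborhood may differ, and the function names `map_pairs` and `map_stripes` are hypothetical.

```python
from collections import Counter, defaultdict

def map_pairs(tokens):
    """Pairs approach: emit one ((w, u), 1) record per co-occurring pair."""
    return [((w, u), 1)
            for i, w in enumerate(tokens)
            for j, u in enumerate(tokens) if i != j]

def map_stripes(tokens):
    """Stripes approach: emit one (w, {u: count}) record per distinct term."""
    stripes = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for j, u in enumerate(tokens):
            if i != j:
                stripes[w][u] += 1
    return [(w, dict(c)) for w, c in stripes.items()]

doc = ["a", "b", "c"]          # toy document, assumed for illustration
pairs = map_pairs(doc)
stripes = map_stripes(doc)
print(len(pairs), pairs[0])      # many small records
print(len(stripes), stripes[0])  # fewer records, each holding a map
```

In this toy run the pairs mapper emits six small tuples, while the stripes mapper emits three records that each carry an associative array.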
Question 7
The key-value pairs emitted by a map task are sent directly to reduce tasks.
True False
Question 8
Map tasks run concurrently with reduce tasks (at the same time) to ensure high throughput.
True False
Question 9
A network switch connects disks to memory within a blade
True False
Question 10
The Y-Axis on a Throughput-Capacity curve captures throughput for map tasks. The X-axis distinguishes map-reduce configurations.
True False
Question 11
11 / 11 pts
The following question references the map reduce job in the PDF. It aims to capture “love” sentiment expressed in comments about pictures posted online. Dotted lines indicate which key value pairs map to each map task. The Map Task boxes execute the code within them. There are two map tasks, each has an output.
Look closely at this example: it does not reflect a valid execution of the map stage in map reduce. Why?
The same map function must execute on each node
Data can be acquired from remote nodes
The reduce stage must be specified
A7 has an invalid first line
There is no problem with the map-reduce code
getFirstLine is not defined and may capture multiple lines
Emit is capitalized
Map functions accept an array of values as input
Question 12
3.67 / 11 pts
The following question references the map reduce job in the PDF. It aims to capture “love” sentiment expressed in comments about pictures posted online. Dotted lines indicate which key value pairs map to each map task. The Map Task boxes execute the code within them. There are two map tasks, each has an output.
What are the outputs of the Map Instance that processes key A1? (check all that apply)
(Pic:C, [0,0])
(Pic:A, “love”)
(Pic:A, 0)
(Pic:A, 1)
(Pic:B, 1)
(Pic:B, [1,1,1,1])
(Pic:A, [1,0,0,0,1,1,1])
No output, error will occur
(Pic:B, 0)
None of these answers are correct
(Pic:B, “love”)
Question 13
0 / 11 pts
The following question references the map reduce job in the PDF. It aims to capture “love” sentiment expressed in comments about pictures posted online. Dotted lines indicate which key value pairs map to each map task. The Map Task boxes execute the code within them. There are two map tasks, each has an output.
Question 14
22 / 22 pts
Is the pseudo-code for the reduce function below correct? The goal is to compute percentage of each pic’s comments that express “love” sentiment (i.e., comments including “love” over total comments).
Reduce Class (k, v[]) {
  int sum = 0;
  int count = 0;
  for each v in V {
    sum = sum + v;
    count = count + 1;
  }
  Emit (k, sum/count)
}
True False
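For comparison, the same reduce logic can be written as a runnable sketch. It assumes the map stage emits one 0/1 flag per comment (1 when the comment contains "love"); the map code lives in the PDF and is not reproduced here, so that format is an assumption, as is the function name `reduce_love_fraction`. The exam's pseudocode does not say whether `sum/count` is integer or real division; this sketch uses real division.

```python
def reduce_love_fraction(key, values):
    """Return (key, fraction of values equal to 1); values are assumed 0/1 flags."""
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    # True division keeps the fractional part; integer division would truncate it.
    fraction = total / count if count else 0.0
    return key, fraction

print(reduce_love_fraction("Pic:A", [1, 0, 0, 1]))  # ('Pic:A', 0.5)
```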
Consider the Map Instance that processes key A1. What would be the inputs to a combiner executing on the same node? (check all that apply)
(Pic:B, 0)
(Pic:C, 0)
(Pic:B, [0,0])
(Pic:A, 0)
(Pic:A, “Love”)
(Pic:A, [1,1,0])
Question 15
True or false: Given the map and reduce functions above, if the number of key-value pairs increased 40X, we would need to rewrite the code to use more mappers/reducers.
True False
Question 16
True or false: Given the map and reduce functions above, if we want to compute the share of “love” attributed to each pic (i.e., loved for picA over total loved for all pics), we would need to rewrite the map and/or reduce function.
True False
Question 17
The number of map tasks (i.e., map function invocations) is determined by the number of blades
True False
Question 19
The number of NameNodes in HDFS is determined by the number of key-value inputs
True False
Question 20
The number of reduce tasks is determined by the number of unique intermediate keys
True False
Question 18
The number of map instances is determined by the number of blades
True False
Question 21
The number of reduce tasks is determined by the number of map tasks
Question 22
Contextual information:
• In your company’s datacenter, blades have 500 GB disk space for storage and 24 GB RAM
• Blades in a rack are connected to a 20 port CISCO switch
• Each disk provides 500 MB/s peak throughput, DDR3 DRAM provides 15 GB/s peak throughput, the switch transmits at 1 GB/s. You have access to only 1 rack.
• Alternatively, a cloud data center provides 50 GB free disk storage and 12 GB RAM and a two-tier hierarchy for networking speeds. Rack switches provide 10 GB/s over 100 ports. Racks are connected with 500 MB/s over 100 ports. SSD speeds are 2 GB/s. DDR4 DRAM offers 30 GB/s.
1. Which configuration provides the maximum throughput for map tasks?
Local memory only, your company
data center memory, cloud
Local memory only, cloud
rack disk, your company
rack disk, cloud
data center disk, cloud
True False
Question 23
Contextual information:
• In your company’s datacenter, blades have 500 GB disk space for storage and 24 GB RAM
• Blades in a rack are connected to a 20 port CISCO switch
• Each disk provides 500 MB/s peak throughput, DDR3 DRAM provides 15 GB/s peak throughput, the switch transmits at 1 GB/s. You have access to only 1 rack.
• Alternatively, a cloud data center provides 50 GB free disk storage and 12 GB RAM and a two-tier hierarchy for networking speeds. Rack switches provide 10 GB/s over 100 ports. Racks are connected with 500 MB/s over 100 ports. SSD speeds are 2 GB/s. DDR4 DRAM offers 30 GB/s.
If map tasks access input data stored in-memory on machines attached to the same network switch, which configuration provides the highest throughput?
Your company Cloud
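One way to reason about a rack-memory configuration is a simple bottleneck model: data read from another machine's memory can move no faster than the slowest component on the path. The sketch below assumes that model, which may differ from the method taught in class (for example, it ignores how switch bandwidth is shared across ports). The throughput figures come from the contextual information; the dictionary and function names are hypothetical.

```python
# Peak throughputs in GB/s, taken from the contextual information above.
COMPANY = {"memory": 15.0, "disk": 0.5, "rack_switch": 1.0}
CLOUD = {"memory": 30.0, "disk": 2.0, "rack_switch": 10.0}

def rack_memory_throughput(site):
    """Assumed model: remote in-memory reads are limited by the slowest hop."""
    return min(site["memory"], site["rack_switch"])

print(rack_memory_throughput(COMPANY))  # 1.0
print(rack_memory_throughput(CLOUD))    # 10.0
```

Under this assumption the rack switch, not the DRAM, limits both configurations.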
Question 24
Contextual information:
• In your company’s datacenter, blades have 500 GB disk space for storage and 24 GB RAM
• Blades in a rack are connected to a 20 port CISCO switch
• Each disk provides 500 MB/s peak throughput, DDR3 DRAM provides 15 GB/s peak throughput, the switch transmits at 1 GB/s. You have access to only 1 rack.
• Alternatively, a cloud data center provides 50 GB free disk storage and 12 GB RAM and a two-tier hierarchy for networking speeds. Rack switches provide 10 GB/s over 100 ports. Racks are connected with 500 MB/s over 100 ports. SSD speeds are 2 GB/s. DDR4 DRAM offers 30 GB/s.
Question 25
Contextual information:
• In your company’s datacenter, blades have 500 GB disk space for storage and 24 GB RAM
• Blades in a rack are connected to a 20 port CISCO switch
• Each disk provides 500 MB/s peak throughput, DDR3 DRAM provides 15 GB/s peak throughput, the switch transmits at 1 GB/s. You have access to only 1 rack.
• Alternatively, a cloud data center provides 50 GB free disk storage and 12 GB RAM and a two-tier hierarchy for networking speeds. Rack switches provide 10 GB/s over 100 ports. Racks are connected with 500 MB/s over 100 ports. SSD speeds are 2 GB/s. DDR4 DRAM offers 30 GB/s.
4. You want to save money by buying cloud blades that use slower DDR3 RAM (like your local company). Assuming map tasks require more than 15,000 MB of storage, will this affect throughput?
Which configuration offers the greatest storage capacity for a single map task?
Rack disk, your company
Rack disk, cloud
Question 26
Contextual information:
• In your company’s datacenter, blades have 500 GB disk space for storage and 24 GB RAM
• Blades in a rack are connected to a 20 port CISCO switch
• Each disk provides 500 MB/s peak throughput, DDR3 DRAM provides 15 GB/s peak throughput, the switch transmits at 1 GB/s. You have access to only 1 rack.
• Alternatively, a cloud data center provides 50 GB free disk storage and 12 GB RAM and a two-tier hierarchy for networking speeds. Rack switches provide 10 GB/s over 100 ports. Racks are connected with 500 MB/s over 100 ports. SSD speeds are 2 GB/s. DDR4 DRAM offers 30 GB/s.
5. How much would upgrading your company’s rack-level switch to support 80 ports change throughput for rack memory and rack disk configurations?
0.5X (reduce throughput)