
DSCC 201/401 Homework Assignment #2 Due: February 22, 2021 at 9 a.m. EDT
Answers to these questions should be submitted via Blackboard. Questions 1-5 should be answered by all students (DSCC 201 and 401) and Question 6 should be answered by students registered in DSCC 401. Please upload a file containing your answers and explanations to Blackboard (Homework #2: Hardware for Data Science) as a Word document (docx), text file (txt), or PDF.
You currently lead the data science department at a major airline company. Due to the tight constraint on financial resources because of the pandemic, the executives at the company have asked you to determine any possible cost savings derived from comparing the passenger data, flight patterns, weather data, and airplane sensor data acquired from every flight in the company’s fleet over the past few years with recent data from the past few months. In order to complete this task, you would like to purchase some high performance computing infrastructure so you can analyze the data and make successful recommendations. The company has collected about 20 TB of data over the past few years, and you would like to apply machine learning algorithms to make predictions about which flight routes and patterns will be the most cost effective for the airline company to maintain. You expect to receive new data on a monthly basis and would like to develop consistent workflows to refine your models over time.

Based on the project description and requirements, you recommend purchasing hardware to build a Linux cluster. A vendor has provided you with a quote for a server with a good starting configuration. The quote (HPC Quote.pdf) has been uploaded to the Blackboard site and is available under the instructions for this homework assignment. The quote reflects the quantity of items and price for 2 servers. Please provide thorough and thoughtful explanations to the following questions.
Question 1: Your team decides to purchase the 2 servers described by the specifications provided in the quote. How many teraFLOPS of theoretical computing capacity will be provided by a Linux cluster with these servers as compute nodes? Provide a double-precision floating point (FP64) value. Please show your reasoning and how you derive your values. Hint: The Intel Xeon Platinum 8260 CPUs use the Cascade Lake microarchitecture.
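The arithmetic behind a theoretical-peak estimate can be sketched in a few lines. The core counts, clock speeds, FLOPs-per-cycle figure, and per-GPU rating below are illustrative placeholders, not values from the quote; substitute the actual specifications from the quote and from Intel's and Nvidia's documentation when working the problem.

```python
# Hedged sketch of the standard peak-FLOPS estimate.
# All numeric inputs below are hypothetical placeholders.

def cpu_peak_fp64_tflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Peak FP64 = sockets x cores x clock (GHz) x FLOPs/cycle, in teraFLOPS."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle / 1000.0

def gpu_peak_fp64_tflops(num_gpus, per_gpu_tflops):
    """GPU cards are usually rated directly in peak FP64 TFLOPS per card."""
    return num_gpus * per_gpu_tflops

# Placeholder example: 2 sockets, 16 cores each, 2.0 GHz, and
# 32 FP64 FLOPs/cycle (typical for an AVX-512 core with two FMA units).
cpu = cpu_peak_fp64_tflops(2, 16, 2.0, 32)   # per server, placeholder specs
gpu = gpu_peak_fp64_tflops(2, 7.8)           # per server, placeholder rating
total_cluster = 2 * (cpu + gpu)              # two servers in the cluster
print(cpu, gpu, total_cluster)
```

The FLOPs-per-cycle term is where the Cascade Lake hint matters: it depends on the SIMD width and the number of FMA units per core, so verify that figure for the specific CPU model rather than reusing the placeholder.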
Question 2: After further review, your team decides that it may not be essential to include the Nvidia Volta V100 GPUs in each server to provide additional acceleration for the computations. What would be the total computing capacity of the 2 servers (in teraFLOPS) if the GPU cards were removed from the systems? Provide a value based on double-precision floating point calculations (FP64). What does the NVLink in the server specification provide for the system?
Question 3: If your team purchases these 2 compute nodes, will this equipment create a complete Linux cluster solution? Does it have everything needed to connect and work together? Why or why not? Is there any additional hardware needed to make these 2 servers into a complete, fully operational Linux cluster that can be used by your whole team? Is there any additional software that may be needed to make the Linux cluster complete? Please be detailed and specific in your answer.
Question 4: In lecture, we discussed the architecture of the Sierra supercomputer. This system has 2,880 nodes each with 6 Nvidia Volta V100 GPUs. According to IBM, the total computing capacity provided by the GPUs of one node is about 42 teraFLOPS (FP64). If IBM replaced the Volta V100 GPUs with the newly announced Ampere A100 GPUs, what would be the total estimated theoretical performance (in petaFLOPS) of the Sierra supercomputer? Provide both an FP64 and FP16 estimate for performance. Would this system exceed the processing capabilities of the current top supercomputer in the world? Show how you derived your answer.
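The scaling step in this question is a multiplication across nodes and GPUs. The sketch below derives the per-GPU V100 figure from the numbers stated in the question; the A100 rating used at the end is a placeholder assumption, so look up the actual FP64 and FP16 peak figures from Nvidia's A100 datasheet before computing your answer.

```python
# Hedged sketch of the node-scaling arithmetic for Question 4.

def system_peak_pflops(nodes, gpus_per_node, per_gpu_tflops):
    """System GPU peak in petaFLOPS (1 petaFLOPS = 1000 teraFLOPS)."""
    return nodes * gpus_per_node * per_gpu_tflops / 1000.0

# Consistency check using only numbers stated in the question:
# 42 TFLOPS per node across 6 V100s implies 7 TFLOPS per GPU.
v100_fp64 = 42 / 6
sierra_gpu_peak = system_peak_pflops(2880, 6, v100_fp64)
print(sierra_gpu_peak)  # 120.96 petaFLOPS from the GPUs alone

# For the A100 estimate, swap in the A100's FP64 (and FP16) peak from
# the datasheet. The value below is a placeholder assumption, not a
# figure taken from Nvidia's documentation:
a100_fp64_placeholder = 9.7  # assumption -- verify against the datasheet
a100_estimate = system_peak_pflops(2880, 6, a100_fp64_placeholder)
```

Note that this counts only the GPU contribution, mirroring the per-node figure IBM quotes; decide explicitly whether your answer should also include the CPUs.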
Question 5: According to Slide 27 of Lecture 3, the Nvidia DGX A100 server has an advertised performance of “5 petaFLOPS AI.” What is the precision of the computations used to measure this performance? Show how you derived your answer. Hint: Read the data sheet for the A100 GPU available here: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet.pdf
Question 6: (DSCC 401 ONLY): Read the paper, “Benchmarking TPU, GPU, and CPU Platforms for Deep Learning,” by Wang, Wei, and Brooks. A copy of this paper has been uploaded to the Blackboard site and is available under the instructions for this homework assignment (Wang_Wei_Brooks.pdf). Provide detailed answers to the following questions:
A. What is ParaDnn?
B. What is a TPU? The authors of the paper had special access to Google’s new TPU v3. What is the performance of this accelerator as mentioned in the paper? How does the theoretical performance of Google’s TPU v3 compare with Nvidia’s theoretical performance of the Ampere A100 GPU?
C. Did the authors measure the performance on multi-GPU systems that use PCIe or NVLink in this study? If so, how many GPUs did they use concurrently? If not, what was the explanation?