CS525 Parallel Computing Midterm 1
Submission Instructions
1. Submission Deadline No extension will be allowed.
2. Reminder: The exam will be open book and open notes, but completely individual: students are NOT allowed to consult anyone else. To answer each question in the exam, please feel free to make reasonable and clearly stated assumptions, and write them down at the beginning of your answer. For example, you can assume that the mesh network has no wrap-around if this is not clearly stated in the question's text. Piazza will be temporarily silent except for instructors posting announcements. There will be no office hours during the exam. If you have any questions regarding the exam, please email both the Instructor AND the GTA. In this tex file, I use "%% To students: Insert your answer here." as a placeholder for each answer. Feel free to remove or modify these placeholders if necessary.
3. How to submit:
Once you have access to Gradescope, you will find “Assignments” on the sidebar.
(a) “Midterm 1 PDF submission”:
This is where you submit the PDF file that contains your answer to each question. You can generate the PDF file using either LaTeX or a word processor, e.g., Microsoft Word.
CS525 Spring Midterm 1 Parallel Computing
Problem 1 (20 points)
(a) Consider a datacenter with 16 racks. These racks are connected as a 2-D mesh with each link operating at 100 Gb/s (gigabits per second). Each rack has 32 blades connected by a complete crossbar with each link operating at 16 Gb/s. Each blade has 2 sockets connected on a bus operating at 16 Gb/s. Each socket has 20 cores; the cores are connected through a bus operating at 1 Gb/s. If all processors communicate across the bisection, how much bandwidth would each communicating pair of processors have?
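As a starting point for the arithmetic, the sketch below counts the cores in the datacenter and computes only the rack-level mesh bisection. It assumes the 16 racks form a 4 × 4 mesh without wrap-around (the question does not state this) and that half of the cores sit on each side of the bisection. The function and constant names are illustrative. The buses and the crossbar inside each rack may impose tighter limits; they are not modeled here.

```python
import math


def mesh_bisection_links(nodes: int) -> int:
    """Bisection width of a square 2-D mesh without wrap-around links."""
    side = math.isqrt(nodes)
    if side * side != nodes:
        raise ValueError("expected a perfect-square node count")
    return side


# Figures taken from the problem statement.
RACKS = 16
BLADES_PER_RACK = 32
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 20
MESH_LINK_GBPS = 100

total_cores = RACKS * BLADES_PER_RACK * SOCKETS_PER_BLADE * CORES_PER_SOCKET

# Assumption: 4 x 4 mesh, no wrap-around, so 4 links cross the bisection.
bisection_gbps = mesh_bisection_links(RACKS) * MESH_LINK_GBPS

# Assumption: every core on one side talks to one core on the other side.
pairs_across = total_cores // 2
mesh_share_gbps = bisection_gbps / pairs_across

print(total_cores, bisection_gbps, mesh_share_gbps)
```

The per-pair figure printed here is the share of the mesh bisection alone; a complete answer would compare it against the share each core gets at the socket, blade, and rack levels.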
(b) Repeat Problem (a) except in this case, the cores are connected through a crossbar in which each link operates at 1 Gb/s and the 2 sockets are connected through a bus operating at 32 Gb/s.
Problem 2 (20 points)
(a) Consider the following code:
    for (i = 0; i < 1000000; i++) do
        sum += a[i] + b[i]
    end for
Assume that a[i] and b[i] are each one word. Assume a system with 64 KB of L1 cache with a latency of 1 cycle and a DRAM latency of 100 cycles. Assume a cache line of 4 words. Assume a processor operating at 1 GHz capable of executing 1 FLOP each cycle. Assume that the memory does not support multiple outstanding reads. What is the peak FLOPS rate of this code?
(b) Repeat Problem (a) when the memory allows 4 outstanding reads.
Problem 3 (20 points)
(a) It takes 400 ns for a message of size 100 words to ping-pong between two processors. It takes 2200 ns for a message of size 1000 words to ping-pong between two processors on the same computer. What are the effective values of t_s and t_w on this computer?
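Two timings at different message sizes are enough to fit a linear cost model. The sketch below assumes the usual model in which a one-way transfer of m words costs startup time plus per-word time multiplied by m, and that a ping-pong is one round trip, i.e. two one-way transfers. Both are assumptions that should be stated in the answer; the function name and the `round_trip` flag are illustrative.

```python
def fit_ts_tw(m1, t1, m2, t2, round_trip=True):
    """Fit startup time and per-word time from two (size, time) measurements.

    Assumes one-way time = ts + tw * m. If round_trip is True, each measured
    time is treated as two one-way transfers (a ping-pong).
    """
    if m1 == m2:
        raise ValueError("the two message sizes must differ")
    scale = 2.0 if round_trip else 1.0
    one_way_1 = t1 / scale
    one_way_2 = t2 / scale
    tw = (one_way_2 - one_way_1) / (m2 - m1)  # slope: time per word
    ts = one_way_1 - tw * m1                  # intercept: startup time
    return ts, tw


# Measurements from the problem statement (words, nanoseconds).
ts, tw = fit_ts_tw(100, 400, 1000, 2200)
print(ts, tw)
```

Passing `round_trip=False` instead treats each measured time as a single one-way transfer, which doubles both fitted values.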
(b) Assume the computer from Problem (a). For a configuration of 16 processors, when 8 processors exchange messages with 8 processors on the other side of the bisection, the time for a 1000-word message is 4200 ns for a ping-pong. When using a configuration of 64 processors, this time is 8200 ns. What is the bisection of the platform as a function of the number of processors?
Problem 4 (20 points)
(a) Assume that a computer has a 1-cycle L1 cache and 100-cycle DRAM with no support for multiple outstanding reads. On this computer, when multiplying 1024 × 1024 matrices, the cache hit ratio is 90%. When multiplying 512 × 512 matrices, the cache hit ratio is 97%. What are the computation rates for these two matrix sizes?
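A common first step is the average memory access time under a blocking-miss model: hits cost the cache latency and misses cost the DRAM latency. The sketch below computes only that average, in cycles. Turning it into a computation rate additionally requires a clock frequency and a count of operations per memory access, neither of which this problem states, so those remain assumptions for the answer. The function name is illustrative.

```python
def average_access_cycles(hit_ratio, cache_cycles=1.0, dram_cycles=100.0):
    """Average cycles per memory access when misses block until DRAM responds."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_cycles + (1.0 - hit_ratio) * dram_cycles


# Hit ratios from the problem statement.
for size, hit_ratio in ((1024, 0.90), (512, 0.97)):
    cycles = average_access_cycles(hit_ratio)
    print(f"{size} x {size}: {cycles:.2f} cycles per access")
```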
(b) Consider a threaded version of the program to multiply two 1024 × 1024 matrices. The matrices are broken into 2 × 2 blocks.
• Thread 1 computes A_{1,1} × B_{1,1}.
• Thread 2 computes A_{1,2} × B_{2,1}.
• Thread 3 computes A_{1,1} × B_{1,2}.
• Thread 4 computes A_{1,2} × B_{2,2}.
• Thread 5 computes A_{2,1} × B_{1,1}.
• Thread 6 computes A_{2,2} × B_{2,1}.
• Thread 7 computes A_{2,1} × B_{1,2}.
• Thread 8 computes A_{2,2} × B_{2,2}.
Threads 1, 3, 5, and 7 compute C_{1,1}, C_{1,2}, C_{2,1}, and C_{2,2}, respectively. Assume a 75% cache hit ratio for computing these sums. How long does it take for the threaded version to execute? What is the speedup (S = T_1/T_8) of this threaded formulation?
Problem 5 (20 points)
(a) The communication pattern of your computation is a 2-D mesh. Would you prefer a 2-D mesh with links operating at 100 MB/s or a hypercube with links operating at 100 MB/s in the following cases for p = 32?
i. A best-case mapping of processes to processors.
ii. A worst-case mapping of processes to processors.
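For the best-case mapping, one standard technique is to embed the mesh in the hypercube with reflected Gray codes, so that mesh neighbors land on directly connected hypercube nodes. The sketch below assumes the p = 32 mesh is laid out as 4 × 8 without wrap-around (the question does not fix the shape) and checks the worst-case number of hypercube hops between mesh neighbors. All function names are illustrative.

```python
def gray(i: int) -> int:
    """Reflected binary Gray code of i; consecutive values differ in one bit."""
    return i ^ (i >> 1)


def mesh_to_hypercube(row: int, col: int, col_bits: int) -> int:
    """Hypercube label: Gray-coded row in the high bits, Gray-coded column in the low bits."""
    return (gray(row) << col_bits) | gray(col)


def max_dilation(rows: int, cols: int) -> int:
    """Largest hypercube distance between any two neighboring mesh nodes."""
    col_bits = (cols - 1).bit_length()
    worst = 0
    for r in range(rows):
        for c in range(cols):
            here = mesh_to_hypercube(r, c, col_bits)
            # Check the neighbor below and the neighbor to the right.
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols:
                    there = mesh_to_hypercube(nr, nc, col_bits)
                    hops = bin(here ^ there).count("1")  # Hamming distance
                    worst = max(worst, hops)
    return worst


# Assumption: a 4 x 8 mesh mapped onto a 5-dimensional (32-node) hypercube.
print(max_dilation(4, 8))
```

A result of 1 means every mesh link maps onto a single hypercube link under this mapping, which is the situation case i describes. A worst-case mapping can be explored by permuting the labels and re-running the same distance check.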
(b) Repeat Problem (a) with mesh links operating at 1 GB/s and hypercube links at 100 MB/s for p = 1024.