5/2/2021 Exam 03 Short Answer: CMPSC 450 Spring 2021
Exam 03 Short Answer
Due Apr 21 at 8:50am Points 9 Questions 3
Available Apr 21 at 8am – Apr 21 at 8:50am about 1 hour Time Limit None
This quiz was locked Apr 21 at 8:50am.
Attempt History
        Attempt     Time         Score
LATEST  Attempt 1   35 minutes   1 out of 9 *
* Some questions not yet graded
Correct answers are hidden.
Score for this quiz: 1 out of 9 * Submitted Apr 21 at 8:50am This attempt took 35 minutes.
Question 1
0 / 3 pts
Given a 2-D mesh network with 256 nodes, a latency of 1 ms, and a bandwidth of 20 GB/s, how long (in seconds) would it take to broadcast 30 GB of data to all nodes?
Your Answer:
16 * 16 * 4 * 1 = 1024 ms = 1.024 s
30 / 20 = 1.5 s
1.024 + 1.5 = 2.524 s
https://psu.instructure.com/courses/2109084/quizzes/4153103
Mesh, latency 0.001 s, bandwidth 20 GB/s, data 30 GB: 45.03 s
Hops: 30 (diameter = 2 * (sqrt(p) - 1) = 2 * (16 - 1))
Tcomm = hops * (alpha + data/bandwidth) = 30 * (0.001 + 30/20) = 45.03 s
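The hop-count model in the note above can be checked with a few lines of C. This is only a sketch of that one formula, assuming a square mesh without wraparound links and a message that is forwarded whole across each hop; the function names are illustrative and do not come from the exam.

```c
#include <assert.h>

/* Integer square root by linear search; adequate for small node counts. */
static int isqrt_floor(int p) {
    int r = 0;
    while ((r + 1) * (r + 1) <= p) {
        r++;
    }
    return r;
}

/* Diameter of a square 2-D mesh without wraparound: 2 * (sqrt(p) - 1). */
int mesh_diameter(int p) {
    int side = isqrt_floor(p);
    assert(side * side == p);   /* this model only covers square meshes */
    return 2 * (side - 1);
}

/* Tcomm = hops * (alpha + data / bandwidth).
   alpha in seconds, data in GB, bandwidth in GB/s. */
double mesh_broadcast_time(int p, double alpha, double data_gb,
                           double bandwidth_gbs) {
    assert(bandwidth_gbs > 0.0);
    return mesh_diameter(p) * (alpha + data_gb / bandwidth_gbs);
}
```

Calling mesh_broadcast_time(256, 0.001, 30.0, 20.0) evaluates 30 * (0.001 + 1.5), which matches the 45.03 s figure in the note.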
Question 2
1 / 3 pts
Below is a snippet of code that is to run in an MPI environment. Assume that the code is syntactically correct. What is the expected output from the following code if run on 4 processors?
#include <mpi.h>
#include <stdio.h>

int Assign_vals(int my_rank) {
    switch (my_rank) {
        case 0: return 4;
        case 1: return 3;
        case 2: return 5;
        case 3: return 2;
        default: return 0;  /* not reached when run on 4 processors */
    }
}

int main(void) {
    int p, my_rank, x, m, ml, q, t;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    x = Assign_vals(my_rank);

    /* Every process can print to stdout */
    printf("Proc %d > x = %d\n", my_rank, x);

    if (my_rank == 0) {
        /* Rank 0 receives one value from every other rank and keeps the largest */
        m = x;
        ml = 0;
        for (q = 1; q < p; q++) {
            MPI_Recv(&t, 1, MPI_INT, q, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (t > m) {
                m = t;
                ml = q;
            }
        }
        printf("m = %d, ml = %d\n", m, ml);
    } else {
        MPI_Send(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
} /* main */
Your Answer:
Proc 0 > x = 4
Proc 1 > x = 3
Proc 2 > x = 5
Proc 3 > x = 2
m = 3, ml = 14
/* This is a greater-than reduction
Proc 0 > x = 4
Proc 1 > x = 3
Proc 2 > x = 5
Proc 3 > x = 2
m = 5, ml = 2 */
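The rank-0 loop in the code above can be traced without an MPI installation. The sketch below is a serial stand-in, not MPI code: an array takes the place of the values that MPI_Recv would deliver, and the struct and function names are hypothetical.

```c
#include <assert.h>

/* Result of the reduction: the maximum value and the rank that owned it. */
struct max_loc {
    int m;   /* largest value seen so far */
    int ml;  /* rank that contributed it */
};

/* Serial stand-in for the rank-0 loop. vals[q] plays the role of the value
   received from rank q; vals[0] is rank 0's own x. */
struct max_loc max_reduce(const int *vals, int p) {
    assert(p >= 1);
    struct max_loc r = { vals[0], 0 };
    for (int q = 1; q < p; q++) {
        int t = vals[q];      /* corresponds to MPI_Recv from rank q */
        if (t > r.m) {        /* strict '>' keeps the lowest rank on ties */
            r.m = t;
            r.ml = q;
        }
    }
    return r;
}
```

With the values 4, 3, 5, 2 from Assign_vals, only the value from rank 2 exceeds the running maximum, so the result is m = 5 and ml = 2. In a real run the four "Proc" lines come from different processes, so their order on stdout is not guaranteed. MPI also provides MPI_Reduce with the MPI_MAXLOC operation on MPI_2INT pairs for this kind of maximum-with-location reduction.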
Question 3
0 / 3 pts
Determine the approximate execution time of running Cannon’s matrix multiplication algorithm using the following information:
Matrix size: 1000000 x 1000000.
Data type: double.
Network latency: 10 microseconds.
Network bandwidth: 10 GB/s.
Nodes: 100, with 10-core Xeon processors running at 4 GHz, each with 256 GB of RAM.
Assume the processors can complete 1 add and 1 multiply per clock and have a memory bandwidth of 40 GB/s.
Break your answer into the following parts:
1) Short description of the phases of the algorithm.
2) Evaluation of execution time of each phase of the algorithm.
3) Final answer for total execution time of the algorithm.
Your Answer:
1) Cannon algorithm is an algorithm for optimizing matrix partitioning multiplication, which is a storage efficient algorithm
2) O( (n^3)/p ) = 10^3
3)
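The export contains no worked solution for this question, so the following is only a rough sketch of how the three requested parts might be estimated, not the instructor's answer. It assumes a 10 x 10 process grid with one process per node, an initial alignment phase followed by 10 local block multiplies with shifts of A and B between them, a compute-bound local multiply (the 40 GB/s memory bandwidth is ignored), 1 GB = 10^9 bytes, and A and B blocks sent one after the other. All struct and function names are hypothetical.

```c
#include <assert.h>

/* Inputs for a rough Cannon's-algorithm estimate. */
struct cannon_params {
    double n;           /* matrix dimension */
    double elem_bytes;  /* bytes per element (8 for double) */
    int    grid;        /* sqrt(p): the process grid is grid x grid */
    double alpha;       /* network latency, seconds */
    double net_bw;      /* network bandwidth, bytes/second */
    double node_flops;  /* peak floating-point operations/second per node */
};

struct cannon_estimate {
    double block_bytes; /* size of one (n/grid) x (n/grid) block */
    double t_align;     /* phase 1: initial alignment of A and B */
    double t_compute;   /* phase 2: all local block multiplies */
    double t_shift;     /* phase 3: all shifts between multiplies */
    double t_total;
};

struct cannon_estimate cannon_time(struct cannon_params c) {
    assert(c.grid > 0 && c.net_bw > 0.0 && c.node_flops > 0.0);
    struct cannon_estimate e;
    double nb = c.n / c.grid;                           /* block dimension */
    e.block_bytes = nb * nb * c.elem_bytes;
    double t_msg = c.alpha + e.block_bytes / c.net_bw;  /* one block transfer */

    /* Assumption: alignment costs one A transfer plus one B transfer. */
    e.t_align = 2.0 * t_msg;
    /* grid steps, each a block multiply of 2 * nb^3 operations. */
    e.t_compute = c.grid * (2.0 * nb * nb * nb) / c.node_flops;
    /* Assumption: grid - 1 shifts, A and B sent back to back. */
    e.t_shift = (c.grid - 1) * 2.0 * t_msg;

    e.t_total = e.t_align + e.t_compute + e.t_shift;
    return e;
}
```

For the exam's numbers (n = 1e6, 8-byte elements, grid = 10, alpha = 10e-6 s, net_bw = 10e9 bytes/s, node_flops = 10 cores * 4e9 Hz * 2 operations per clock = 8e10), each block is 80 GB, so the three blocks a node holds (A, B, and C) take 240 GB of its 256 GB. Under these assumptions the compute phase is about 250000 s and communication adds roughly 160 s, so the estimate is dominated by computation at about 2.5e5 s in total.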