The George Washington University

ECE6105 – Introduction to High Performance Computing Homework 6
Due November 20th, end of the day

In this homework, you are required to implement your own version of MPI_Allgather and compare its performance against the MPI library's version.

1. Implement your own version of MPI_Allgather.
a. Obtain the necessary files and inspect them
1. The files are attached to the assignment on Blackboard. Download the tarball, scp it to Pyramid, and untar it.
2. You should see two files, all_gather.c and Makefile, inside the folder hw6.
  i. all_gather.c has the main function implemented for you. It also has an empty function, My_Allgather, which you are supposed to fill in.
  1. The main function runs the appropriate function, measures the time it takes, and reports the timing.
  2. The main function also performs a basic checksum validation, which is meant to catch some errors in your implementation. Note that passing the checksum validation doesn't necessarily mean your code is functioning correctly.
  ii. The Makefile creates 4 executables if you simply run make:
  1. base_debug: uses MPI_Allgather and reports the resulting array on each rank after the operation
  2. base_perf: uses MPI_Allgather; doesn't report the resulting array
  3. custom_debug: uses My_Allgather and reports the resulting array on each rank after the operation
  4. custom_perf: uses My_Allgather; doesn't report the resulting array
  
  iii. All the executables above expect an integer command-line argument: the size of the local array. For example, you should be able to run:
```
mpirun -np 8 ./custom_perf 1000
```
  to measure your implementation's performance.
b. Implement the My_Allgather function.
1. You are not supposed to make any changes to the function signature itself.
2. You are not supposed to make any changes to the main function or the Makefile.
3. Do not use any collectives (such as MPI_Gather).
4. You can use the custom_debug executable to see the resulting array on each rank. This should help you debug any issues you may have. Furthermore, make sure that you don't see any checksum errors.
5. You can use your local MPI installation until you are confident that your implementation is correct.

2. Do a weak scaling study on Pyramid.

a. Create appropriate job submission scripts and use them for this task.
b. Use 500000 elements per rank.
c. Start from a single node (a Pyramid node runs 8 ranks) and go up to 32 ranks. You may omit odd numbers of ranks, i.e., you can use 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32 ranks.

  i. This will require you to manage a lot of files, so I recommend a consistent, systematic scheme for file and job names.

d. For each number of ranks, run your implementation and the base implementation with exactly the same settings.
e. Record the timing results of every run.

3. Create a report that includes the following material and discussion:

a. A line plot showing the difference in timing between your implementation and the base implementation across the numbers of ranks.
b. Pyramid has an InfiniBand QDR network with a theoretical peak data transfer rate of 8 Gbit/s. Given your timing results, add a plot showing the efficiency of both implementations, in terms of network utilization, for different numbers of ranks.
  1. This should show how much of this bandwidth was actually utilized.
  2. You may use the aggregate data size reported by the benchmark as the data size. Note that the benchmark reports sizes in bytes, whereas network throughput is given in gigabits.
c. A brief discussion of what you could have done to improve performance if your version doesn't outperform the MPI version, or of what may cause the lower performance of your implementation.

  i. If your version outperforms the MPI version, describe what you did better. (I don't expect this to happen unless you spend significant effort; my initial naïve implementation performed 4 times worse than the MPI version.)
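For the discussion, the standard latency-bandwidth (alpha-beta) cost model offers one way to reason about where the time goes. This is a simplified textbook model, not a description of what the MPI library on Pyramid actually does. Here α is the per-message latency, β the transfer time per byte, n the total size of the gathered array in bytes, and p the number of ranks. The ring performs p - 1 steps of one n/p-byte message each. Recursive doubling (for p a power of two) performs log2 p steps, and step k exchanges 2^k n/p bytes.

```latex
\begin{aligned}
T_{\text{ring}}    &= (p-1)\left(\alpha + \tfrac{n}{p}\,\beta\right)
                    = (p-1)\,\alpha + \tfrac{p-1}{p}\, n\,\beta \\
T_{\text{rec-dbl}} &= \sum_{k=0}^{\log_2 p - 1}\left(\alpha + 2^{k}\tfrac{n}{p}\,\beta\right)
                    = \log_2 p\,\alpha + \tfrac{p-1}{p}\, n\,\beta
\end{aligned}
```

Both schedules share the same bandwidth term and differ only in the number of message start-ups. With 500000 elements per rank, n is large, so the bandwidth term typically dominates. The model ignores link contention and how ranks are mapped onto nodes, and both can matter in practice.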

4. Submit your code and report on Blackboard.
a. The command tar czvf hw6.tar.gz hw6 will create “hw6.tar.gz” from the folder “hw6”.
