We discussed in class how some libraries include an optimization step in their setup call that determines the optimal configuration for a given platform. Given four possible implementations of matrix-matrix multiplication (serial, OpenMP, Cannon-MPI, and SUMMA-MPI), determine when it is advantageous to use each. For a range of matrix sizes (NxN) and a range of available processors/nodes (P), find which method performs fastest.
You are free to use whatever C source code you find online, but you must cite your sources. Copied code without credit will result in zero points.
Your subroutines should use double precision floating point variables.
P should be tested from the minimum to the maximum of the open allocation (100 nodes).
NxN should be pushed to the maximum size.
This analysis must be run on the ACI-b nodes.