
PERFORMANCE MODELLING AND ADVANCED MPI
1. Introduction
The Message Passing Interface, MPI, is a specification for a number of methods and some data types that support a particular model of parallel programming. All data is shared using explicit messages; that is to say, we don't have any shared memory space. Each process can send to any other process, and messages must be explicitly received. If we want to synchronise our processes, we must use a barrier. Some MPI operations will have the side effect of creating a barrier.
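As a quick refresher on this model, a minimal sketch of an explicit two-process exchange followed by a barrier is shown below. This is illustrative only, not part of the lab code, and error handling is omitted:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Data is shared only via explicit messages... */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* ...and messages must be explicitly received. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Explicit synchronisation point for all processes in the communicator. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -np 2 ./a.out` (or via the cluster's Slurm scripts).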
This lab looks at benchmarking communications and using derived datatypes with MPI. Clusters usually have a job scheduler and a queueing system to ensure that a fair usage policy can be applied and that the resources are shared fairly.


Once again, as with the previous lab sessions, the cluster you'll be running on is called Kudu. The compute nodes in the cs402 partition on the cluster have dual socket 6-core processors, therefore each node has 12 cores in total.
At the end of this lab exercise you will have analysed the performance differences associated with the use of blocking and non-blocking communications; and looked at programming with derived datatypes.
This lab makes use of the Open MPI library [1] and the Slurm workload manager.
3. Exercises
3.1. MPI Blocking and Non-Blocking Communications.
Inside the directory provided for the second coursework assignment (Karman), a file named pingpong.c is given; this program will be used for the first half of the lab exercises. The ping pong program consists of two processes that continually send/receive messages between each other. Within the program, calls to MPI_Send and MPI_Recv, which are the blocking send and receive functions, are used to send and receive increasingly larger messages.
Firstly, run the program using 2 MPI processes; ensure that the two processes are located on different compute nodes. When you first execute the program, it will output the total time elapsed when sending and receiving each message size iters times.
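For reference, the timing of one message size in such a ping-pong loop typically looks like the sketch below. The variable names (iters, size, buf, rank) are assumptions based on the handout; the structure of the provided pingpong.c may differ in detail:

```c
/* Sketch of a blocking ping-pong timing loop for one message size.
 * Rank 0 sends first and waits for the reply; rank 1 does the reverse. */
double start = MPI_Wtime();
for (int i = 0; i < iters; i++) {
    if (rank == 0) {
        MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }
}
double elapsed = MPI_Wtime() - start;
/* Task 1 asks for the average: divide the total by the iteration count. */
double avg = elapsed / iters;
```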

Task 1 – Change the program so that it outputs the average time for sending and receiving each message size. Using an appropriate tool (such as an Excel spreadsheet), plot the average times for all message sizes. Observe the plot, in particular paying attention to the shape that these data points (average times) form. From the plotted figure, try to obtain ts (the message startup time/overhead) and tw (the transfer time for one byte of data) that were presented in the lectures on performance modelling.
Task 2 – If we keep increasing the size of the message being sent (by changing the value of the variable maxsize in the ping pong program), what message size causes the average time for sending and receiving a message of that size to increase "out of proportion" when compared with the increase for previous message sizes? What can such a message size tell you, based on the knowledge you have gained from lectures? (Hint: consider what you have learnt about the system buffer in MPI.)
Task 3 – Change the calls made to blocking send and receive within the ping pong program to calls to the non-blocking send and receive variants. Compare the output times with those of the program using the blocking variants.
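For Task 3, the non-blocking variants are MPI_Isend and MPI_Irecv, which return immediately and hand back an MPI_Request that must later be completed with MPI_Wait or MPI_Waitall. A sketch of the replacement is shown below; the buffer and rank variable names (sendbuf, recvbuf, other) are assumptions, not names from pingpong.c:

```c
/* Non-blocking variant: both operations are started immediately and
 * completed together with MPI_Waitall. The buffers must not be touched
 * between the start of the operation and its completion. */
MPI_Request reqs[2];
MPI_Irecv(recvbuf, size, MPI_BYTE, other, 0, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(sendbuf, size, MPI_BYTE, other, 0, MPI_COMM_WORLD, &reqs[1]);
/* ...independent computation could overlap with communication here... */
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
```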
3.2. Programming with Derived Datatypes.
In lectures, we learnt that the segment of code displayed in Listing 1 can be used to construct the derived datatype shown in Figure 1; this example consists of 3 blocks of data elements of datatype MPI_FLOAT, with each block containing 2 contiguous data elements, and the starts of successive blocks separated by 4 data elements.
MPI_Datatype floattype;
MPI_Type_vector (3, 2, 4, MPI_FLOAT, &floattype);
MPI_Type_commit (&floattype);
MPI_Send (data, 1, floattype, dest, tag, MPI_COMM_WORLD);
MPI_Type_free (&floattype);
Listing 1: MPI code used to construct the derived datatype displayed in Figure 1
Figure 1. The memory layout of data to be sent.

In C/C++, multi-dimensional arrays/matrices are stored in memory using the row-major layout. That is, the first row of the array is placed in contiguous memory, followed by subsequent rows (in order); hence, in memory, the last element of a row is followed by the first element of the subsequent row. Assuming we have a 20 × 10 matrix (20 rows, 10 columns) of MPI_DOUBLE numbers (which we will refer to as matrix a), consider a common scenario that might occur whereby a process needs to send columns of the matrix, rather than rows. In this case, think about how a process could send the elements in the order of columns to another process (namely, the process should send all of the elements in the first column, followed by the 2nd column, 3rd, etc.). Before you begin experimenting with derived datatypes, we will walk through this example with you.
One of the key things to recognise is that each column consists of 20 elements, and each element is spaced 10 elements apart in memory. Therefore, we can construct a derived datatype to deal with this using the following lines of code:
MPI_Datatype MPI_column;
MPI_Type_vector(/* count= */ 20,
/* blocklength= */ 1,
/* stride= */ 10,
MPI_DOUBLE,
&MPI_column);
Next, we can use this derived datatype to exchange columns of the matrix between processes. The first column of the matrix can be sent using the following statement:
MPI_Send(a, 1, MPI_column, …);
Sending subsequent columns requires adjusting the starting address of the first block of data. For instance, in order to send the second column we would specify the starting address like so:
MPI_Send(&(a[0][1]), 1, MPI_column, … );
Knowing this, sending all of the columns would simply require a loop that iterates over the matrix.
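Putting the pieces together, such a loop might look like the sketch below. Note that the snippet above omits the MPI_Type_commit call, which is required before the type can be used in a send; dest and tag are assumed to be defined elsewhere:

```c
/* Sketch: send every column of the 20 x 10 matrix in turn, one
 * MPI_column datatype instance per send. */
MPI_Type_commit(&MPI_column);
for (int j = 0; j < 10; j++) {
    /* &(a[0][j]) is the first element of column j; the vector type's
     * stride of 10 then picks out the remaining 19 column elements. */
    MPI_Send(&(a[0][j]), 1, MPI_column, dest, tag, MPI_COMM_WORLD);
}
MPI_Type_free(&MPI_column);
```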
Task 4 – Suppose there is a matrix a of size 48 × 48, and we need to send the elements a[4*i][4*i] (i = 0, …, 11). Write a program (with 2 MPI processes) that sends these elements from one process to another using a single transfer (MPI_Send).
Task 5 – Write a new program that can be used to send the elements a[4*i][4*j] (i, j = 0, …, 11) using a single MPI_Send. Namely, the elements:
a[0][0], a[0][4], a[0][8], …, a[0][40], a[0][44],
a[4][0], a[4][4], a[4][8], …, a[4][40], a[4][44],
…,
a[44][0], a[44][4], a[44][8], …, a[44][40], a[44][44]
If it were not for the use of derived datatypes, think about how we would have had to send the elements in Task 4 and Task 5. By comparison, we now hope that you understand how the derived datatype provides convenience, alongside a performance benefit when we are sending non-contiguous data.
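For comparison, without a derived datatype one common workaround is to pack the strided elements into a contiguous buffer by hand before sending. A sketch of this for a single column j of the earlier 20 × 10 matrix is shown below (j, dest and tag are assumed to be defined):

```c
/* Manual alternative to the derived datatype: gather the strided column
 * into a contiguous buffer, then send that buffer in one message. The
 * receiver must unpack symmetrically, and the extra copy costs time. */
double colbuf[20];
for (int i = 0; i < 20; i++)
    colbuf[i] = a[i][j];          /* gather column j element by element */
MPI_Send(colbuf, 20, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
```

A derived datatype removes both this copy loop and the temporary buffer, letting the MPI implementation handle the strided access itself.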
References
[1] Open MPI Documentation. https://www.open-mpi.org/doc/ (accessed February 08, 2021), 2020.
