FIT3143 Lab Week 6
Lecturers: ABM Russel (MU Australia) and (MU Malaysia)
FURTHER MESSAGE PASSING INTERFACE
OBJECTIVES
● Design and develop parallel algorithms for various parallel computing architectures
● Analyse and evaluate the performance of parallel algorithms
INSTRUCTIONS
● Download and set up software applications used in this unit [Refer to Lab Week 1]
● Setup eFolio (including Git) and share with tutor and partner [Refer to Lab Week 1]
DESCRIPTION:
● Become familiar with MPI
● Practice parallel algorithm design and development
● Practice buffer transfer between MPI processes
● Analyse the performance of a parallel algorithm
WHAT TO SUBMIT:
1. E-folio document containing algorithm or code description, analysis of results, screenshot of the running programs and git repository URL. E-folio template for this lab can be found in Week 06 of Moodle.
2. Code and supporting files in the Git repository.
3. This is an assessed lab. You are therefore required to submit the E-folio document, code file(s), and text files to Moodle; the submission link is available in Week 06 of Moodle. Each student makes a submission: although you work in a team of two (or three) members and the submitted files will be identical within a team, each team member is required to submit independently in Moodle.
EVALUATION CRITERIA
This lab work is part of your grading, with a maximum of 10 marks, which is then scaled to 2 percentage points of the overall unit marks.
Each of Activity 1, Activity 2, and Activity 3 is marked against the following criteria:
● Code compiles without errors and executes correctly (2 marks)
● Sufficient code comments
● Questions or instructions fully answered
● Proper tabulation of results and analysis (2 marks)
LAB ACTIVITIES (10 MARKS)
1. Sending in a ring (broadcast by ring)
Write an MPI program that takes data from process zero and sends it to all other processes by sending it in a ring. That is, process i should receive the data and send it to process i+1, until the last process is reached. The last process then sends the data back to process zero. Each MPI process prints out the received data. Use a loop to repeat the cycle until a sentinel value (a negative number, per the starter code's loop condition) is entered to exit the program.
[Figure: ring topology, Process 0 → Process 1 → ... → Process size-1 → Process 0]
The following starter code is provided. This code is incomplete. Complete the code. Compile and execute the code using at least four MPI processes. Observe and display your results.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, s_value, r_value, size;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    do {
        if (rank == 0) {
            printf("Enter a round number: ");
            fflush(stdout);
            scanf("%d", &s_value);
            // Add your code here
            printf("Process %d got %d from Process %d\n",
                   rank, r_value, size - 1);
            fflush(stdout);
        } else {
            // Add your code here
            printf("Process %d got %d from Process %d\n",
                   rank, r_value, rank - 1);
            fflush(stdout);
        }
    } while (r_value >= 0);
    MPI_Finalize();
    return 0;
}
2. Prime Search using Message Passing Interface
Let's revisit the prime search problem which you previously worked on. To recap, in Week 03, you were required to:
a) Write a serial C program to search for prime numbers which are less than an integer, n, which is provided by the user. The expected program output is a list of all prime numbers found, which is written into a single text file (e.g., primes.txt).
b) Measure the time (i.e., ts) required to search for prime numbers less than an integer, n, when n = 10,000,000. Calculate the theoretical speed up using Amdahl's law when p = 4, with p = number of processes (or threads).
c) Write a parallel version of your serial code in C utilizing POSIX Threads. Here, design and implement a parallel partitioning scheme which distributes the workload among the threads. Compare the performance of the serial program (ts) in part (a) with that of the parallel program (tp) in this part for a fixed number of processors or threads (e.g., p = 4). Calculate the actual speed up, S(p).
In this week’s lab activity, we shall continue from where we left off in Week 03. In detail:
d) Implement a parallel version of your serial code in C using Message Passing Interface (MPI).
i) The root process will prompt the user for the n value (e.g., n = 10,000,000). The specified n value will then be disseminated to the other MPI processes to calculate the prime numbers. Each process (including the root process) computes prime numbers (based on an equal or varied workload distribution per node) and writes the computed prime numbers into its own text file. The name of each text file should include the process rank (e.g., process_0.txt, process_1.txt, process_2.txt, etc.). Execute the compiled program using at least four MPI processes (i.e., p = 4) on your virtual machine or physical computer.
ii) Measure the overall time required to search for prime numbers less than an integer, n, when n = 10,000,000. Compare your results with the serial version in part (a) and calculate the actual speed up (i.e., S_actual(p) = ts / tp). Analyse and compare the actual speed up with the theoretical speed up for an increasing number of MPI processes and/or n. You can either tabulate your results or plot a chart when analysing the performance of the serial and parallel programs.
iii) In addition, write your observations comparing the actual speed up in part (d) with that in part (c): is the speed up using MPI any better than using POSIX threads?
e) Modify part (d) such that each MPI process returns the computed prime numbers to the root process.
i) The root process receives the computed prime numbers and prints them into a single text file. Note that only the root process writes the computed prime numbers to a text file; all MPI processes (including the root process) compute prime numbers based on the assigned workload distribution.
ii) Repeat steps (d)(ii) and (d)(iii) for part (e) and write your observations.
Note: You may opt to use different algorithms to search for prime numbers. However, please ensure that you implement both the serial and parallel versions of the algorithm in C based on the aforementioned specifications.
3. Executing the prime search algorithm on the CAAS Cluster
Note: Before attempting the instructions below, please refer to the available resources in Week 5 of Moodle on how to access the cluster as a service (CAAS) high-performance computing platform. We encourage you to review the CAAS tutorial video and to try out the sample source code and job scripts, which are all available in Week 5 of Moodle.
Instructions:
Based on your serial and parallel code implementations of the prime search algorithm:
i) Compile and execute the serial code on the CAAS platform. Measure the time
taken to complete the task (i.e., ts).
ii) Compile and execute the parallel code using POSIX threads on the CAAS platform. Measure the time taken to complete the task (i.e., tp) for 16 threads on a single compute node.
iii) Compile and execute the parallel code using MPI on the CAAS platform. Measure the time taken to complete the task (i.e., tp) for 16 MPI processes on a single compute node (i.e., single server).
iv) Repeat part (iii) with 32 MPI processes on two compute nodes (i.e., a two-server cluster with 16 MPI processes per server).
Include screenshots of step i) to step iv) executions on the CAAS platform and tabulate all actual speed ups.
Hints: You need to revise the code so that it no longer prompts the user for the value of n at runtime. Instead, n is now passed as a command-line argument to the application.