Performance Programming Coursework 1
Introduction
The overall aim of the Performance Programming coursework is to take a serial application and improve its performance on the compute nodes of ARCHER2. The coursework is split into two parts, with the first part focussed on optimising the application using the compiler, and the second part focussed on hand-optimising the code itself. This document outlines coursework 1, optimising the application using the compiler.
Both pieces of coursework are assessed through a written report detailing the work undertaken and performance achieved. Note that the target platform is ARCHER2 and its associated software stack. If you do not already have access to ARCHER2 please contact the course organiser.
We will be using a simple molecular dynamics code, which simulates the movement of particles over time. The starting source code is available on Learn and is called MD_2022.tar.
There are both C and Fortran versions of the code available. You should select one of these versions for use in both of the assignments for this course, and work only on that version.
Running the program
As provided, the program reads an initial state from the file input.dat and then performs 5 blocks of 100 timesteps, writing an output file after each block. The output files are in the same format as the input file, so you can use any output file as the input for a shorter performance test that performs fewer than 500 iterations. The code reports timing information for each block of 100 timesteps and for the loop over blocks, which includes file access operations.
Checking correctness
Note that optimising the code may change the floating point results slightly, so a simple diff on output files is not a useful verification test. The subdirectory Test contains a C program which, when compiled, can be used to test that two output files from the MD code are the same to within an acceptable tolerance. The syntax for this is:
diff-output file1 file2
This program will not detect the presence of NaN values in the input so you should test for these explicitly, either by extending the diff-output program, or creating a small program or script to check the output yourself.
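A minimal sketch of such a check is given below. It assumes the output file is plain text made up of whitespace-separated numeric values, which is an assumption about the file format rather than something stated here; the function name find_non_finite is hypothetical. Tokens that Python cannot parse as numbers are skipped, not reported.

```python
import math


def find_non_finite(lines):
    """Return (line_number, token) pairs for every NaN or infinite value found."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        for token in line.split():
            try:
                value = float(token)
            except ValueError:
                # Token is not parseable as a number (assumed to be a label): skip it.
                continue
            if not math.isfinite(value):
                problems.append((line_number, token))
    return problems


# Small in-memory demonstration with made-up values.
sample = ["1.0 2.0 3.0", "nan 4.0 5.0", "6.0 -inf 7.0"]
for line_number, token in find_non_finite(sample):
    print(f"line {line_number}: {token}")
```

To check a real file, the open file handle can be passed directly, for example find_non_finite(open("output.dat")), where output.dat stands in for whichever output file you want to inspect.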
In addition, very small numerical differences will be magnified over time, particularly once the particles start to collide, so the verification test is unlikely to pass for more than 200 time-steps from a common starting point. The verification test is intended as a guide rather than a definitive test of correctness so you need to give some thought to how you test for correctness. We suggest building tests using blocks of 100 iterations (timesteps) from a region of the simulation after the particles have started to collide.
Assignment
The assignment for coursework 1 is to produce a report (around 5 pages including figures) on optimising the application using the compiler. The report may contain additional appendices if you wish, though the coursework 1 assessment will be based on the main report. The report should present the results of your work investigating and improving the performance of this code using the compiler only. The source code is provided with Makefiles for the C and Fortran versions of the code, but these Makefiles may not include the optimal compiler flags and options for this application. For coursework 1, your task is to investigate improving the performance of the application on ARCHER2 using different compiler flags, and potentially different compilers, to attempt to get the best performance possible without altering the code itself.
The report should outline the compiler(s) and compiler flag(s) you have chosen, and the performance achieved with those compiler flags. This should be an iterative process, with you investigating the effect of different levels of compiler optimisation on the performance and correctness of the application. You should summarise which compiler flags you would suggest using for the application based on the experiments you have undertaken.
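One way to keep this iterative process organised is to generate the rebuild commands for each candidate flag set up front. The sketch below is illustrative only: it assumes the provided Makefile has a clean target and takes its flags from a variable that can be overridden on the make command line (named CFLAGS here as a guess), neither of which is confirmed by this document. The optimisation levels listed are GCC-style placeholders, not recommendations, and build_commands is a hypothetical helper.

```python
def build_commands(flag_sets, flags_variable="CFLAGS"):
    """Return, for each flag set, the make commands for a clean rebuild."""
    plan = []
    for flags in flag_sets:
        plan.append({
            "flags": flags,
            "commands": [
                ["make", "clean"],                      # assumed target name
                ["make", f"{flags_variable}={flags}"],  # assumed variable name
            ],
        })
    return plan


# Placeholder optimisation levels in GCC-style spelling.
candidate_flags = ["-O0", "-O1", "-O2", "-O3"]
for step in build_commands(candidate_flags):
    print(step["flags"], "->", [" ".join(cmd) for cmd in step["commands"]])
```

Each command list is in the form accepted by subprocess.run, so the same plan can drive the builds directly once the target and variable names have been checked against the real Makefile.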
Normal performance optimisation procedure would be to start with profiling the application to obtain information about performance. However, for this coursework, where we are restricting ourselves to purely optimising through the compiler, you do not need to profile the application.
You should remember that, as with all performance reports, you should also document the environment you are running your tests in (i.e. what hardware you are using, what compilers, etc.) and make sure your results are reproducible by running any benchmarking multiple times. You can report whatever statistic you wish (average, minimum, maximum) provided you state what you did in your report and are consistent. File I/O times do not need to be considered and can be omitted from timing results.
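As a rough illustration of summarising repeated runs, the sketch below computes the minimum, mean, maximum and sample standard deviation of a list of timings. The function name summarise_timings is hypothetical, and the numbers in the example are made-up placeholders, not measurements of the MD code.

```python
import statistics


def summarise_timings(times):
    """Return min, mean, max and sample standard deviation of timings in seconds."""
    if len(times) < 2:
        raise ValueError("need at least two runs to judge reproducibility")
    return {
        "runs": len(times),
        "min": min(times),
        "mean": statistics.mean(times),
        "max": max(times),
        "stdev": statistics.stdev(times),
    }


# Made-up timings (seconds) standing in for five repeated runs of one build.
example = [10.0, 10.5, 9.5, 10.0, 10.0]
summary = summarise_timings(example)
print(summary)
```

Whichever statistic is reported, recording the number of runs and the spread alongside it makes it easier to judge whether a difference between two flag sets is larger than the run-to-run variation.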
This coursework is marked on the report you submit, so the report should be a stand-alone document that includes discussion of the compiler flags chosen and the performance observed for different compiler flags. You may also experiment with different compilers on ARCHER2 to evaluate performance across compilers.
Marking scheme
The report will be marked on:
• Discussion of the experiments undertaken with the compiler and the performance achieved
with the different compiler flags and/or compilers. Discussion of what the important
optimisation flags do for the given compiler(s) (70).
• Methodology used in the assignment as demonstrated in the report. This includes general
approach, tools used etc. (20).
• Clarity, relevance and presentation of the report (10).
Coursework is due at 16:00, 2nd March 2022 (UK Time)