ASSIGNMENT 2: DISTRIBUTED MEMORY PARALLELISM WITH MPI
IN THIS ASSIGNMENT, USE MPI TO PARALLELISE YOUR OPTIMISED STENCIL CODE, AND RUN ON ALL 16 CORES OF A BLUECRYSTAL PHASE 3 NODE
ASSIGNMENT DESCRIPTION
- Start with your optimised serial Stencil code
- Use MPI “Single Program Multiple Data (SPMD)” distributed memory parallelism to run your code from 1 core up to all 16 cores of a BCp3 node
- Serial optimisations will help
• You should then produce a short report discussing your findings (3-4 pages)
- Describe the optimisations that you tried and your approach to parallelism
- Explain why your optimisations improved performance
- Include an analysis of how well your code scales from 1 to 16 cores, in increments of one core
Ballpark runtimes for a flat MPI code on 16 cores are:
- 1024 x 1024: 0.02s
- 4096 x 4096: 0.6s
- 8000 x 8000: 2.2s
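As an illustration of the SPMD approach described above, every rank runs the same program and works out which part of the grid it owns from its own rank number. The sketch below shows one common way to split the grid by rows. The helper `rows_for_rank` and the `row_range` type are hypothetical names invented for this example, not part of the starting code, and a row-wise split is only one possible decomposition.

```c
#include <assert.h>

/* Hypothetical type: half-open range [start, start + count) of grid rows
 * owned by one rank. Not part of the starting code. */
typedef struct {
    int start;
    int count;
} row_range;

/* Split nrows rows as evenly as possible across nranks ranks.
 * When nrows is not a multiple of nranks, the first (nrows % nranks)
 * ranks each take one extra row. */
row_range rows_for_rank(int nrows, int rank, int nranks)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);

    int base  = nrows / nranks;  /* rows that every rank receives */
    int extra = nrows % nranks;  /* leftover rows to hand out */

    row_range r;
    r.count = base + (rank < extra ? 1 : 0);
    r.start = rank * base + (rank < extra ? rank : extra);
    return r;
}
```

In an MPI program, `rank` and `nranks` would normally come from `MPI_Comm_rank` and `MPI_Comm_size`. Each rank would usually also allocate one extra halo row above and below its own rows to hold copies of its neighbours' boundary data.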
COURSEWORK SUBMISSION
• Your submission will be made via SAFE and should include:
1. A 3 to 4 page report in PDF form, which must include:
a. Your name and user id;
b. A description of your MPI parallelisation;
c. Comparisons of your MPI performance vs optimised serial;
d. An analysis of the scalability of your code from 1 core up to 16 cores;
2. The working code you used to generate the results in your report.
• Your code must pass the Python check script “check.py” included in the repo.
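For the MPI vs serial comparison and the scalability analysis, two quantities that are commonly reported are speedup and parallel efficiency. The sketch below assumes you have measured a runtime `t1` on 1 core and a runtime `tp` on `cores` cores for the same problem size. The function names are illustrative, not part of the starting code.

```c
#include <assert.h>

/* Speedup of a multi-core run relative to the 1-core run:
 * t1 is the 1-core runtime, tp the multi-core runtime (same units). */
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency: speedup divided by the number of cores.
 * 1.0 means perfect (linear) scaling. */
double efficiency(double t1, double tp, int cores)
{
    assert(cores > 0);
    return speedup(t1, tp) / cores;
}
```

Plotting these values for every core count from 1 to 16 makes it easy to see where scaling starts to fall away from the ideal line.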
GUIDANCE PART 1
• To achieve a good mark of 60%+:
- A well-written, 3-4 page report that clearly demonstrates you understand what you did
- Code that successfully uses MPI parallelism
GUIDANCE PART 2
• To aim for a first (70%+), you’ll need:
- An excellent 4 page report
- Code that:
• Applies further MPI techniques that improve performance beyond those we’ve described. These may include code transformations beyond those discussed in class.
• Delivers performance on 16 cores that achieves a good fraction of STREAM memory bandwidth
• With ~6 weeks allocated to the MPI assignment, 10 hours allocated to the course each week for 10 weeks, and 4 hours per week spent in lectures and labs, don’t spend more than 6 * (10 – 4) = ~36 hours on this assignment in total (twice that for the serial assignment).
• It should only take half that time to do the simple version which, along with some interesting experiments and a decent report, should be good enough to earn 60%+
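To judge what fraction of STREAM memory bandwidth a run achieves, one rough approach is to estimate the bytes moved and divide by the measured runtime. The sketch below rests on assumptions that are not stated in the assignment: every sweep over the grid reads each cell once and writes each cell once, and the number of sweeps and the bytes per cell are passed in by you. The STREAM figure itself has to be measured on the node. All names here are illustrative.

```c
#include <assert.h>

/* Estimated memory traffic rate in GB/s (1 GB = 1e9 bytes).
 * Assumption: each sweep reads every cell once and writes every cell once,
 * which gives the factor of 2. */
double achieved_gbps(long nx, long ny, long sweeps,
                     int bytes_per_cell, double seconds)
{
    assert(seconds > 0.0);
    double bytes = 2.0 * (double)nx * (double)ny
                 * (double)sweeps * (double)bytes_per_cell;
    return bytes / seconds / 1.0e9;
}

/* Fraction of the measured STREAM bandwidth that the run achieved.
 * Both arguments must use the same units, e.g. GB/s. */
double fraction_of_stream(double achieved, double stream)
{
    assert(stream > 0.0);
    return achieved / stream;
}
```

Because this model counts only one read and one write per cell, it gives a lower bound on the traffic actually generated. Treat the result as a guide rather than an exact measurement.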
SUBMISSION REQUIREMENTS
- Your report, which must be in a file called “report.pdf” (lower case r: “report.pdf”, NOT “Report.pdf”)
- Your source code, i.e. “stencil.c”
- Your makefile, called “Makefile”
- An env.sh file containing any module commands you need to load specific compilers, or anything else required to run your code properly
- Don’t modify the timing code in the starting code, as we’ll use this to automatically extract timing information from each submission
- We must be able to reproduce any runtimes you quote in your report by compiling and running the code that you submit
- Don’t zip these files up; instead, submit them as separate files in SAFE
PLAGIARISM CHECKING
- The HPC assignments are all for individuals; they are not group work
- We will check all submitted code for plagiarism using the MOSS online tool
- MOSS ignores the example code we give you
- MOSS will spot if any of you have worked together or shared code, so please don’t!
- We’ll also check all submitted reports using the TurnItIn tool, which will find any shared text
- So please don’t copy code or text from each other! You will get caught, and then both the copier and original provider will get a 0 for the whole assignment.
SUMMARY
- Remember, you’ll get marks for:
- A well-written, comprehensive report
- An MPI code that successfully explores most of the optimisations we suggest
- Have fun exploring your first parallel programs!