MPI Assignment Help - PowCoder Assignment Help

IN THIS ASSIGNMENT, USE MPI TO PARALLELISE YOUR OPTIMISED STENCIL CODE AND RUN IT ON ALL 16 CORES OF A BLUECRYSTAL PHASE 3 NODE

ASSIGNMENT 2:  DISTRIBUTED MEMORY PARALLELISM WITH MPI


ASSIGNMENT DESCRIPTION

• Start with your optimised serial stencil code
• Use MPI “Single Program Multiple Data” (SPMD) distributed-memory parallelism to run your code on anything from 1 core up to all 16 cores of a BCp3 node
• Serial optimisations will still help the parallel code
• You should then produce a short report (3-4 pages) discussing your findings:
- Describe the optimisations that you tried and your approach to parallelism
- Explain why your optimisations improved performance
- Include an analysis of how well your code scales from 1 to 16 cores, in increments of one core
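One common SPMD layout, assumed here rather than prescribed by the brief, is a 1D row decomposition: every rank runs the same stencil.c, owns a contiguous band of rows, and swaps one halo row with each neighbouring rank per sweep. The plain-C sketch below shows only the bookkeeping. `decompose_rows`, `RowBand` and `NO_NEIGHBOUR` are hypothetical names; in a real MPI code `rank` and `nprocs` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and a missing neighbour would be `MPI_PROC_NULL`.

```c
#include <assert.h>

/* Hypothetical sentinel for "no neighbour". An MPI code would use
   MPI_PROC_NULL instead, which turns a send/receive at the grid edge
   into a no-op. */
#define NO_NEIGHBOUR (-1)

/* The band of rows owned by one rank. */
typedef struct {
    int first_row; /* global index of the first owned row */
    int nrows;     /* number of owned rows, excluding halo rows */
    int up;        /* rank owning the preceding rows, or NO_NEIGHBOUR */
    int down;      /* rank owning the following rows, or NO_NEIGHBOUR */
} RowBand;

/* Split ny rows over nprocs ranks as evenly as possible:
   the first (ny % nprocs) ranks each take one extra row. */
RowBand decompose_rows(int ny, int rank, int nprocs)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    RowBand b;
    int base = ny / nprocs;
    int extra = ny % nprocs;
    b.nrows = base + (rank < extra ? 1 : 0);
    /* Rows owned by all lower-numbered ranks. */
    b.first_row = rank * base + (rank < extra ? rank : extra);
    b.up = (rank > 0) ? rank - 1 : NO_NEIGHBOUR;
    b.down = (rank < nprocs - 1) ? rank + 1 : NO_NEIGHBOUR;
    return b;
}
```

For example, 8000 rows on 16 ranks gives every rank 500 rows, while 10 rows on 4 ranks splits as 3, 3, 2, 2. Each rank would then allocate `nrows + 2` rows (one halo row on each side) and, before every stencil sweep, exchange its boundary rows with `up` and `down`, for instance with `MPI_Sendrecv`. Splitting by rows assumes the grid is stored row-major, so each halo row is contiguous in memory and can be sent without a derived datatype.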

Ballpark runtimes for a flat MPI code on 16 cores are:
- 1024 x 1024: 0.02 s
- 4096 x 4096: 0.6 s
- 8000 x 8000: 2.2 s
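When comparing your own runtimes with these figures, and with your optimised serial code, a conventional (though not mandated) way to present scaling is speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p for each core count p. The helpers below are a minimal sketch with hypothetical names; T(1) can be either your 1-core MPI runtime or your optimised serial runtime, depending on which comparison you are reporting.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup S(p) = T(1) / T(p): how many times faster p cores are than one. */
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency E(p) = S(p) / p; 1.0 means perfect linear scaling. */
double efficiency(double t1, double tp, int p)
{
    assert(p > 0);
    return speedup(t1, tp) / p;
}

/* Print one row per core count, e.g. for a 1 to 16 core scaling table.
   times[i] holds the measured runtime on (i + 1) cores, so times[0] is T(1). */
void print_scaling(const double *times, int max_cores)
{
    printf("cores  time(s)  speedup  efficiency\n");
    for (int p = 1; p <= max_cores; ++p) {
        double tp = times[p - 1];
        printf("%5d  %7.3f  %7.2f  %10.2f\n",
               p, tp, speedup(times[0], tp), efficiency(times[0], tp, p));
    }
}
```

The runtimes fed into these helpers should be the ones reported by the unmodified timing code, so that the table in your report matches what the markers measure.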

COURSEWORK SUBMISSION

• Your submission will be made via SAFE and should include:

1. A 3 to 4 page report in PDF form, which must include:

a. Your name and user ID;
b. A description of your MPI parallelisation;
c. Comparisons of your MPI performance vs optimised serial;
d. An analysis of the scalability of your code from 1 core up to 16 cores.

2. The working code you used to generate the results in your report.
• Your code must pass the Python check script “check.py” included in the repo.

GUIDANCE PART 1
• To achieve a good mark of 60%+:

A well-written, 3-4 page report that clearly demonstrates you understand what you did
Code that successfully uses MPI parallelism

GUIDANCE PART 2

• To aim for a first (70%+), you’ll need:

An excellent 4 page report
Code that:
- Applies further MPI techniques that improve performance beyond those we’ve described; these may include code transformations beyond those discussed in class
- Delivers performance on 16 cores that achieves a good fraction of STREAM memory bandwidth
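To judge whether your 16-core runtime reaches "a good fraction of STREAM memory bandwidth", one approach is to estimate the bandwidth the stencil actually sustains and divide it by a STREAM figure you have measured on the same node. The sketch below assumes a simple traffic model, not one given in the brief: each sweep reads the whole grid once and writes it once, with neighbour reads served from cache. The function names are hypothetical, and the element size, sweep count and traffic model should be adjusted to match your own code.

```c
#include <assert.h>
#include <stddef.h>

/* Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a stencil that, per
   sweep, reads the whole nx x ny grid once and writes it once.
   This two-pass traffic model is an assumption: it ignores cache misses on
   neighbour reads and any extra write-allocate traffic. */
double stencil_bandwidth_gbs(int nx, int ny, size_t elem_bytes,
                             int nsweeps, double seconds)
{
    assert(seconds > 0.0);
    double bytes_per_sweep = 2.0 * (double)nx * (double)ny * (double)elem_bytes;
    return bytes_per_sweep * nsweeps / seconds / 1e9;
}

/* Fraction of the measured STREAM bandwidth achieved (0.8 means 80%). */
double stream_fraction(double achieved_gbs, double stream_gbs)
{
    assert(stream_gbs > 0.0);
    return achieved_gbs / stream_gbs;
}
```

For instance, a hypothetical 1000 x 1000 grid of 4-byte elements swept 250 times in 2.0 s moves about 2 GB in total, i.e. 1.0 GB/s under this model; dividing by your own STREAM measurement for a BCp3 node gives the fraction to quote in the report.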

• With ~6 weeks allocated to the MPI assignment, 10 hours allocated to the course each week for 10 weeks, and 4 hours per week spent in lectures and labs, don’t spend more than 6 * (10 – 4) = ~36 hours on this assignment in total (twice that for the serial assignment).

• It should take only about half that time (~18 hours) to produce the simple version, which, along with some interesting experiments and a decent report, should be good enough to earn 60%+


SUBMISSION REQUIREMENTS

• Your report, which must be in a file called “report.pdf”
- Lower-case r: “report.pdf”, NOT “Report.pdf”

Your source code, i.e. “stencil.c”
Your makefile, called “Makefile”
An env.sh file containing any module commands you need to load specific compilers, or anything else required to run your code properly
Don’t modify the timing code in the starting code, as we’ll use this to automatically extract timing information from each submission
We must be able to reproduce any runtimes you quote in your report by compiling and running the code that you submit
Don’t zip these files up, instead submit them as separate files in SAFE

PLAGIARISM CHECKING

The HPC assignments are all individual assignments; they are not group work
We will check all submitted code for plagiarism using the MOSS online tool
- MOSS ignores the example code we give you
- MOSS will spot if any of you have worked together or shared code, so please don’t!
We’ll also check all submitted reports using the TurnItIn tool, which will find any shared text
So please don’t copy code or text from each other! You will get caught, and then both the copier and original provider will get a 0 for the whole assignment.

SUMMARY

Remember, you’ll get marks for:
- A well written, comprehensive, report
- An MPI code that successfully explores most of the optimisations we suggest
Have fun exploring your first parallel programs!
