
IN THIS ASSIGNMENT, USE MPI TO PARALLELISE YOUR OPTIMISED STENCIL CODE, AND RUN ON ALL 16 CORES OF A BLUECRYSTAL PHASE 3 NODE

ASSIGNMENT 2: DISTRIBUTED MEMORY PARALLELISM WITH MPI


ASSIGNMENT DESCRIPTION

  • Start with your optimised serial Stencil code
  • Use MPI “Single Program Multiple Data” (SPMD) distributed memory parallelism to run your code from 1 core up to all 16 cores of a BCp3 node (a minimal sketch of this pattern appears below the ballpark runtimes)
  • Serial optimisations will help

  • You should then produce a short report discussing your findings (3-4 pages):

    • Describe the optimisations that you tried and your approach to parallelism
    • Explain why your optimisations improved performance
    • Include an analysis of how well your code scales from 1 to 16 cores, in increments of one core
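
A natural way to present that scaling analysis (an illustration, not a prescribed format) is to report, for p = 1 to 16 cores, the speedup S(p) = T(1) / T(p) and the parallel efficiency E(p) = S(p) / p, where T(p) is the runtime on p cores, and to discuss where and why efficiency falls away from 1.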

Ballpark runtimes for a flat MPI code on 16 cores are:

  • 1024 x 1024: 0.02s
  • 4096 x 4096: 0.6s
  • 8000 x 8000: 2.2s
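
For orientation, below is a minimal sketch of the SPMD halo-exchange pattern, assuming a 1D row decomposition and a 5-point stencil over a float image. The problem sizes, stencil weights and identifiers (stencil_local, curr, next, etc.) are illustrative assumptions and are not taken from the starter code.

    /* Minimal SPMD sketch: 1D row decomposition with blocking halo exchange.
     * Sizes, weights and identifiers are illustrative, not the starter code's. */
    #include <mpi.h>
    #include <stdlib.h>

    /* Example 5-point stencil over interior rows 1..local_rows (weights assumed). */
    static void stencil_local(int local_rows, int ny, const float *in, float *out) {
      for (int i = 1; i <= local_rows; i++) {
        for (int j = 0; j < ny; j++) {
          float left  = (j > 0)      ? in[i * ny + j - 1] : 0.0f;
          float right = (j < ny - 1) ? in[i * ny + j + 1] : 0.0f;
          out[i * ny + j] = in[i * ny + j] * 0.6f
                          + (in[(i - 1) * ny + j] + in[(i + 1) * ny + j] + left + right) * 0.1f;
        }
      }
    }

    int main(int argc, char *argv[]) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      int nx = 1024, ny = 1024, niters = 100;       /* example sizes only */
      int base = nx / size, rem = nx % size;
      int local_rows = base + (rank < rem ? 1 : 0); /* spread the remainder over low ranks */

      /* Local block plus one halo row above and below; initialisation of the
       * local image from the global image is omitted in this sketch. */
      float *curr = calloc((size_t)(local_rows + 2) * ny, sizeof(float));
      float *next = calloc((size_t)(local_rows + 2) * ny, sizeof(float));

      int up   = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
      int down = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

      for (int iter = 0; iter < niters; iter++) {
        /* Send first interior row up, receive bottom halo from below. */
        MPI_Sendrecv(&curr[1 * ny], ny, MPI_FLOAT, up, 0,
                     &curr[(local_rows + 1) * ny], ny, MPI_FLOAT, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send last interior row down, receive top halo from above. */
        MPI_Sendrecv(&curr[local_rows * ny], ny, MPI_FLOAT, down, 1,
                     &curr[0 * ny], ny, MPI_FLOAT, up, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        stencil_local(local_rows, ny, curr, next);  /* compute local rows only */

        float *tmp = curr; curr = next; next = tmp; /* swap buffers */
      }

      /* Gather the distributed image back to rank 0 for output,
       * e.g. with MPI_Gatherv (omitted here). */
      free(curr);
      free(next);
      MPI_Finalize();
      return 0;
    }

Row-wise decomposition keeps each halo message contiguous in memory; a 2D decomposition is also possible, but the column halos then need packing or derived datatypes.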

COURSEWORK SUBMISSION

• Your submission will be made via SAFE and should include:

1. A 3 to 4 page report in PDF form, which must include:

   a. Your name and user ID;
   b. A description of your MPI parallelisation;
   c. Comparisons of your MPI performance vs optimised serial;
   d. An analysis of the scalability of your code from 1 core up to 16 cores.

2. The working code you used to generate the results in your report.
• Your code must pass the Python check script “check.py” included in the repo.

GUIDANCE PART 1
• To achieve a good mark of 60%+:

  • A well-written, 3-4 page report that clearly demonstrates you understand what you did
  • Code that successfully uses MPI parallelism

GUIDANCE PART 2

• To aim for a first (70%+), you’ll need:

  • An excellent 4-page report
  • Code that:

    • Applies further MPI techniques that improve performance beyond those we’ve described; these may include code transformations beyond those discussed in class (one such technique is sketched after this list)
    • Delivers performance on 16 cores that achieves a good fraction of STREAM memory bandwidth
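
One example of a further technique (an illustration only, not a prescribed approach) is to overlap the halo exchange with computation on the interior rows using non-blocking MPI. The fragment below assumes the same row decomposition, buffers and tags as the sketch in the assignment description, plus a hypothetical kernel stencil_rows(first, last, ny, in, out) that applies the stencil to rows first..last; it would replace the MPI_Sendrecv pair inside the iteration loop.

    /* Drop-in replacement for the MPI_Sendrecv pair in the earlier sketch:
     * post non-blocking halo exchange, compute the rows that need no halo
     * data while messages are in flight, then finish the boundary rows. */
    MPI_Request reqs[4];

    MPI_Irecv(&curr[0 * ny],                ny, MPI_FLOAT, up,   1, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&curr[(local_rows + 1) * ny], ny, MPI_FLOAT, down, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&curr[1 * ny],                ny, MPI_FLOAT, up,   0, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&curr[local_rows * ny],       ny, MPI_FLOAT, down, 1, MPI_COMM_WORLD, &reqs[3]);

    /* Interior rows 2 .. local_rows-1 do not touch the halo rows. */
    stencil_rows(2, local_rows - 1, ny, curr, next);

    /* Wait for the halos, then compute the two rows that depend on them. */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    stencil_rows(1, 1, ny, curr, next);
    stencil_rows(local_rows, local_rows, ny, curr, next);

For the STREAM comparison, a rough lower bound on the data moved per iteration of an N x N float stencil is 2 * N^2 * 4 bytes (each point read and written at least once); multiplying by the number of iterations and dividing by the runtime gives a sustained bandwidth figure to set against a STREAM measurement on the same node.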

• With ~6 weeks allocated to the MPI assignment, 10 hours per week allocated to the course over its 10 weeks, and 4 hours per week spent in lectures and labs, don’t spend more than 6 * (10 - 4) = ~36 hours on this assignment in total (roughly twice the budget of the serial assignment).

• The simple version should take only about half that time and, along with some interesting experiments and a decent report, should be enough to earn 60%+.

SUBMISSION REQUIREMENTS

  • Your report, which must be in a file called “report.pdf”
    • Lower case r: “report.pdf”, NOT “Report.pdf”

  • Your source code, i.e. “stencil.c”
  • Your makefile, called “Makefile”
  • An env.sh file containing any module commands you need to load specific compilers, or anything else required to run your code properly
  • Don’t modify the timing code in the starting code, as we’ll use this to automatically extract timing information from each submission
  • We must be able to reproduce any runtimes you quote in your report by compiling and running the code that you submit
  • Don’t zip these files up, instead submit them as separate files in SAFE

PLAGIARISM CHECKING

  • The HPC assignments are all individual work; they are not group work
  • We will check all submitted code for plagiarism using the MOSS online tool
    • MOSS ignores the example code we give you
    • MOSS will spot if any of you have worked together or shared code, so please don’t!

  • We’ll also check all submitted reports using the TurnItIn tool, which will find any shared text
  • So please don’t copy code or text from each other! You will get caught, and then both the copier and original provider will get a 0 for the whole assignment.

SUMMARY

  • Remember, you’ll get marks for:
    • A well-written, comprehensive report
    • An MPI code that successfully explores most of the optimisations we suggest
  • Have fun exploring your first parallel programs!