COMP528 Assignment Resits (2018/19)
Dr. Michael K. Bane
Overview
• 4 assignments, each worth 10% of total
your letter will indicate which (if any) assignments you are expected to resit
• Resits
questions comparable to the original, testing the same learning outcomes, etc.
you will get lots of hints and help by going back to the lab work and to previous assignments
all codes to be written in C, compiled and benchmarked on Chadwick
standards of academic integrity expected (as per the original); reports may go through "TurnItIn" for automatic checking
will be marked on the code & report, for correctness & understanding of the topics
• Submission
each assignment as a single zip file (comprising report, code, and any scripts, plus any supporting evidence you wish)
submission to SAM: 91 for resit#1, 92 for resit#2, 93 for resit#3, 94 for resit#4
DEADLINE for all submissions: 10am, Friday 9th August 2019
Assignment 1: MPI
Assignment #1 Resit
• Testing knowledge of
• parallel programming & MPI & timing via a batch system
• TASK: least squares regression – parallelisation using MPI
• https://www.mathsisfun.com/data/least-squares-regression.html
• for a set of discrete points (x[i], y[i]), find the best linear fit y = mx + b, using the given equations (see the link above) to determine m & b
• write two C codes to determine m and b for a given input set of x,y
i. A serial test code
ii. One using MPI parallelism
• use the Intel compiler and compile with no optimisation ‘-O0’
• time the section of the code (running in batch) that finds m & b, do this on various numbers of MPI processes, and discuss your findings, e.g. in terms of speed-up and parallel efficiency (and Amdahl's Law); a serial starting point is sketched below
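For orientation only, here is a minimal serial sketch of the least-squares fit, assuming the standard formulas from the mathsisfun page above; the function name, the use of double, and the tiny test set are illustrative choices, not the required implementation.

/* Minimal serial sketch (illustrative). Standard least-squares formulas:
     m = (N*sum(x*y) - sum(x)*sum(y)) / (N*sum(x*x) - sum(x)*sum(x))
     b = (sum(y) - m*sum(x)) / N                                      */
#include <stdio.h>

void least_squares(int n, const double *x, const double *y,
                   double *m, double *b) {
    double sx = 0.0, sy = 0.0, sxy = 0.0, sxx = 0.0;
    for (int i = 0; i < n; i++) {      /* independent work: parallelisable */
        sx  += x[i];
        sy  += y[i];
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
    }
    *m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    *b = (sy - *m * sx) / n;
}

int main(void) {
    /* tiny test set on the line y = 2x + 1, so we expect m=2, b=1 */
    double x[] = {0.0, 1.0, 2.0, 3.0};
    double y[] = {1.0, 3.0, 5.0, 7.0};
    double m, b;
    least_squares(4, x, y, &m, &b);
    printf("m = %f, b = %f\n", m, b);
    return 0;
}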
Assignment #1 Resit
• Remember:
• can parallelise where there is lots of independent work
• MPI is a single code with each process having its own "rank" (useful for splitting up the work?)
• MPI provides "reduction" calls, e.g. for doing a summation over processes and storing the result on the "root" process (or on all processes)
• MPI provides the MPI_Wtime timing function; the wall-clock time is the difference between two consecutive calls to MPI_Wtime
• N may not be exactly divisible by the number of MPI processes (available via the MPI_Comm_size function); the sketch below illustrates these points
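The points above can be seen together in a hedged sketch of the MPI version: the range i=1..N-1 is split so ranks differ by at most one element even when it does not divide evenly, partial sums are combined with MPI_Reduce onto rank 0, and the fitted section is timed with MPI_Wtime. The simple test data and all names are illustrative assumptions; substitute the assignment data.

/* Hedged MPI sketch (illustrative names and test data) */
#include <mpi.h>
#include <stdio.h>
#define N 100000

int main(int argc, char **argv) {
    static double x[N], y[N];
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 1; i < N; i++) {       /* simple test data: y = 2x + 1 */
        x[i] = (double)i / 1000.0;
        y[i] = 2.0 * x[i] + 1.0;
    }

    /* split i = 1 .. N-1 so block sizes differ by at most one
       when N-1 is not divisible by the number of processes */
    int lo = 1 + (rank * (N - 1)) / size;
    int hi = 1 + ((rank + 1) * (N - 1)) / size;

    double t0 = MPI_Wtime();
    double part[4] = {0.0, 0.0, 0.0, 0.0};   /* sx, sy, sxy, sxx */
    for (int i = lo; i < hi; i++) {
        part[0] += x[i];
        part[1] += y[i];
        part[2] += x[i] * y[i];
        part[3] += x[i] * x[i];
    }
    double tot[4];
    MPI_Reduce(part, tot, 4, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        int n = N - 1;                        /* points run i = 1 .. N-1 */
        double m = (n * tot[2] - tot[0] * tot[1])
                 / (n * tot[3] - tot[0] * tot[0]);
        double b = (tot[1] - m * tot[0]) / n;
        printf("m=%f b=%f time=%fs\n", m, b, MPI_Wtime() - t0);
    }
    MPI_Finalize();
    return 0;
}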
Assignment #1 Resit
• Data
• suggestion: use a small set of input data (x, y) to check you are getting the correct answer (serially, and for any number of MPI processes); once all is good, use the assignment data below. Remember to use the batch system to undertake your timings for different numbers of MPI processes
• Assignment data:
• N=100,000
• x[i] = (float)i/1000.0 for i=1 to i=99,999 (note we start at i=1 and go to N-1)
• y[i] = sin(x[i]/500.0) / cos(x[i]/499.0 + x[i]) (you will need to include math.h; see the sketch below)
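A minimal sketch of generating this data, assuming math.h (link with -lm); the double arrays and the sanity-check printf are illustrative choices.

/* Sketch: generate the assignment data (needs math.h; link with -lm) */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#define N 100000

int main(void) {
    double *x = malloc(N * sizeof(double));
    double *y = malloc(N * sizeof(double));
    for (int i = 1; i < N; i++) {            /* note: i = 1 .. N-1 */
        x[i] = (float)i / 1000.0;
        y[i] = sin(x[i] / 500.0) / cos(x[i] / 499.0 + x[i]);
    }
    printf("x[1]=%f y[1]=%f x[%d]=%f y[%d]=%f\n",
           x[1], y[1], N - 1, x[N - 1], N - 1, y[N - 1]);
    free(x); free(y);
    return 0;
}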
Assignment #1 Resit
• Code
• Submit both serial & MPI code
• Submit any scripts used
• Report: up to 3 pages
• Discussion of your approach & of your results
• Give the commands that you use to:
• Compile
• Submit and run your parallel code
• The equation of the best fit straight line
• Marking
• Correctness of codes: 50%
• Explaining/understanding parallel principles & MPI: 25%
• Discussion of results: 25%
Assignment 2: OpenMP
Assignment #2 Resit
• Testing knowledge of
• parallel programming & OpenMP & timing via a batch system
• TASK: least squares regression – parallelisation using OpenMP
• (see Assignment#1 for detailed description)
• for a set of discrete points (x[i], y[i]), find the best linear fit y = mx + b, using the given equations (see Assignment #1) to determine m & b
• use the same assignment data as described for Assignment#1 Resit
• write a C code to determine m and b for a given input set of x,y that uses
OpenMP work-sharing constructs to parallelise the work
• use the Intel compiler and compile with no optimisation ‘-O0’
• time the section of the code (running in batch) that finds m & b, do this on various numbers of OpenMP threads, and discuss your findings, e.g. in terms of speed-up and parallel efficiency (and Amdahl's Law)
Assignment #2 Resit
• Remember:
• can parallelise where there is lots of independent work
• OpenMP is a single code with fork-join parallel regions in which each thread has its own thread number; typically you parallelise at the 'for' loop level
• OpenMP provides a "reduction" clause, e.g. for doing a summation over threads and storing the result on the "master" thread
• OpenMP provides the omp_get_wtime timing function; the wall-clock time is the difference between two consecutive calls
• OpenMP loop parallelisation can have different "schedules", which may be useful for irregular work distribution between threads
• you can use compiler flags to have the compiler ignore all OpenMP directives; the sketch below illustrates the reduction and timing pattern
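Pulling the above together, a hedged sketch of the OpenMP version: a 'for' work-share with a reduction clause over the four least-squares sums, timed with omp_get_wtime. The simple test data are an illustrative assumption; substitute the assignment data.

/* Hedged OpenMP sketch (illustrative test data) */
#include <omp.h>
#include <stdio.h>
#define N 100000

int main(void) {
    static double x[N], y[N];
    for (int i = 1; i < N; i++) {        /* simple test data: y = 2x + 1 */
        x[i] = (double)i / 1000.0;
        y[i] = 2.0 * x[i] + 1.0;
    }

    double sx = 0.0, sy = 0.0, sxy = 0.0, sxx = 0.0;
    double t0 = omp_get_wtime();
    #pragma omp parallel for reduction(+:sx,sy,sxy,sxx)
    for (int i = 1; i < N; i++) {
        sx  += x[i];
        sy  += y[i];
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
    }
    double t1 = omp_get_wtime();

    int n = N - 1;
    double m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    double b = (sy - m * sx) / n;
    printf("m=%f b=%f threads=%d time=%fs\n",
           m, b, omp_get_max_threads(), t1 - t0);
    return 0;
}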
Assignment #2 Resit
• Code
• Submit OpenMP code
• Submit any scripts used
• Report: up to 3 pages
• Discussion of your approach & of your results
• Give the commands that you use to:
• Compile
• Submit and run your parallel code
• The equation of the best fit straight line
• Marking
• Correctness of code: 50%
• Explaining/understanding parallel principles & OpenMP: 25%
• Discussion of results: 25%
Assignment 3: GPU Programming
Assignment #3 Resit
• Testing knowledge of
• parallel programming of GPUs
• TASK: discretisation using a GPU
• Function f(x) = exp(x/3.1) - x*x*x*x*18.0
• You need to discretise this between x=0.0 and x=60.0 and find the minimum using 33M points
• Write a C-based code with an accelerated kernel written in either CUDA or using OpenACC directives; the code should
• time a serial run comprising setting values and then finding the minimum (i.e. all on the CPU)
• time an accelerated run with values set on the GPU, passed back to the CPU, and the minimum found on the CPU (an OpenACC sketch follows below)
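A hedged sketch of the OpenACC option (one of the two routes named above): the values are set on the GPU inside an 'acc parallel loop', copied back via copyout, and the minimum is found serially on the CPU. The exact point count, variable names, and compile command (e.g. pgcc -acc) are illustrative assumptions.

/* Hedged OpenACC sketch (illustrative) */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#define NPTS 33000000                 /* "33M points": exact count assumed */

int main(void) {
    double *f = malloc(NPTS * sizeof(double));
    double dx = 60.0 / (NPTS - 1);    /* x runs 0.0 .. 60.0 */

    /* set values on the GPU; copyout brings f[] back to the CPU */
    #pragma acc parallel loop copyout(f[0:NPTS])
    for (int i = 0; i < NPTS; i++) {
        double x = i * dx;
        f[i] = exp(x / 3.1) - x * x * x * x * 18.0;
    }

    /* find the minimum on the CPU, as the task asks */
    int imin = 0;
    for (int i = 1; i < NPTS; i++)
        if (f[i] < f[imin]) imin = i;

    printf("min f = %e at x = %f\n", f[imin], imin * dx);
    free(f);
    return 0;
}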
Assignment #3 Resit
• Reminder for CUDA
• write C + CUDA kernel in file e.g. myCode.cu (note the .cu suffix)
• compile (on login node):
module load cuda-8.0
nvcc -Xcompiler -fopenmp myCode.cu
• debug running in batch
qrsh -l gputype=tesla,h_rt=00:10:00 -pe smp 1-16 -V -cwd ./a.out
• timing run in batch (hogging all GPU & CPU cores for yourself)
qrsh -l gputype=tesla,exclusive,h_rt=00:10:00 -pe smp 16 -V -cwd ./a.out
• For OpenACC
• please see lecture notes
Assignment #3 Resit
• Code
• Submit code and any scripts used
• Report: up to 3 pages
• Discussion of your approach & of your results
• including the speed ratio of GPU to CPU
• noting whether you include GPU memory & data-transfer costs (and what effect this would have)
• Give the commands that you use to:
• Compile, submit and run your parallel code
• The value of the minimum of f(x[i]) and the value of x[i] at which this occurs
• Marking
• Correctness of code: 40%
• Explaining/understanding parallel principles & GPUs: 30%
• Discussion of results: 30%
Assignment 4: hybrid programming
Assignment #4 Resit
• Testing knowledge of
• parallel programming & hybrid MPI+OpenMP parallelism
• TASK: hybrid MPI+OpenMP parallelisation of galaxy formation
• using the C code "COMP528-assign4-resit.c" provided in Sub-Section "Resit Assignments" at https://cgi.csc.liv.ac.uk/~mkbane/COMP528/
• add MPI and OpenMP to accelerate the simulation (including, if appropriate, the initialisation); as per the original assignment, use MPI to parallelise at a coarse-grained level (dividing the number of bodies (variable "BODIES") between the number of processes), with each MPI process then using OpenMP to parallelise its own work
• use the Intel compiler and compile with optimisation flag ‘-O2’
• time the section of the code (running in batch) that simulates the movement of the galaxies, and do this on various numbers of MPI processes & OpenMP threads (a sketch of the hybrid pattern follows below)
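A hedged sketch of the hybrid pattern only, not of the provided galaxy code: MPI divides BODIES between processes at the coarse grain, and OpenMP threads then share each process's block. The loop body, the BODIES value, and the compile command (e.g. mpiicc -qopenmp -O2) are placeholders; the real work is in COMP528-assign4-resit.c.

/* Hedged hybrid MPI+OpenMP sketch (placeholder work only) */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#define BODIES 4096               /* placeholder; real value is in the code */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* coarse grain: each MPI process owns a contiguous block of bodies */
    int lo = (rank * BODIES) / size;
    int hi = ((rank + 1) * BODIES) / size;

    double t0 = MPI_Wtime();
    double local = 0.0;
    /* fine grain: OpenMP threads share this process's bodies */
    #pragma omp parallel for reduction(+:local)
    for (int i = lo; i < hi; i++)
        local += (double)i;       /* placeholder for the per-body update */

    double total;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("checksum=%f time=%fs\n", total, MPI_Wtime() - t0);

    MPI_Finalize();
    return 0;
}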
Assignment #4 Resit
• Code: submit MPI+OpenMP code & any scripts used
• Report: up to 3 pages
• Discussion of your approach & of your results
• how you determined what to parallelise & why you chose the given parallelisation method
• the results (accuracy, speed-up, parallel efficiency)
• which combination of MPI/OpenMP you found to be the fastest
• Include a paragraph on what you would need in order to scale the number of BODIES by 100 orders of magnitude (while keeping the run time about the same)
• e.g. is Barkla big enough? is CPU the only option?
• State the commands that you use to:
• Compile, submit, run & time your code to get the timing data presented
• Marking
• Code: 30%
• Explaining/understanding parallel principles used: 25%
• Discussion on scaling by 100 orders of magnitude: 20%
• Discussion of results: 25%
• Good luck
• Ask if any questions!
• Michael Bane, G14 Ashton, m.k.bane@Liverpool.ac.uk
Skype: https://join.skype.com/invite/m49PHwnmVmo2