COMP528-JAN21 University of Liverpool
COMP528-JAN21 – Lab 6
Coordinator: Fabio Papacchini
OpenMP
Refer to recent lectures for details on compiling and running OpenMP codes. The OpenMP official website
(https://www.openmp.org) has the full spec as well as worked examples for OpenMP. We are focusing on
v4.5. You are expected to be able to use the reference materials.
Login to Barkla and obtain today’s lab file/s:
cd
tar -xzf /users/papacchf/COMP528-JAN21/labs/openmp.tgz
cd intro-openmp
If you now list the directory (using “ls”) you should see:
q1a.c q1.c q2.c q3.c run-openmp.sh
As usual we will use Barkla to edit, compile, run and debug the examples. In this lab you will
• Compile and run OpenMP codes on the Barkla batch system
• Examine the scope of parallel regions
• Examine how the "schedule" clause may affect performance
In the following, please do NOT request more than 40 threads nor more than 5 minutes of wall-clock time.
Preliminary: Login to Barkla and get the files (as above) if not already done so. For all examples you
should compile using the Intel compiler (“icc”), so please load the relevant module (you do NOT need an
MPI module). All compilations in this lab should be without optimisation (“-O0”) and using the appropriate
flag to support OpenMP (see both the lectures, and/or OpenMP official website and/or “man icc” if you
are unsure). You are not provided with all the answers here; you may have to review the OpenMP lectures on
CANVAS.
We have provided a SLURM script for use with OpenMP codes. "run-openmp.sh" takes a single parameter
(the name of your source file), compiles it (without optimisation) and runs the compiled code a number of
times on the number of threads passed to SLURM via the "-c" parameter. For example:
sbatch -t 2 -c 8 ./run-openmp.sh myTest.c
will attempt to compile myTest.c and, if successful, set the appropriate environment variable and run the
executable using 8 threads across 8 cores, with a maximum wall-clock time of 2 minutes.
What is the compile flag that ensures the OpenMP directives in the code are handled correctly by the
Intel compiler?
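Although the provided script does the compilation for you, it can help to see roughly what such a script does. The sketch below is illustrative only; the variable names, repetition count, and structure are assumptions, not the actual contents of run-openmp.sh:

```bash
#!/bin/bash
# Illustrative sketch of a SLURM script like run-openmp.sh -- NOT the
# actual file. $1 is the name of the C source file to compile.
SRC=$1
EXE=${SRC%.c}.exe

# Compile without optimisation; set OMP_FLAG to the OpenMP flag you
# identified above for the Intel compiler.
icc -O0 $OMP_FLAG -o "$EXE" "$SRC" || exit 1

# SLURM exports the "-c" value as SLURM_CPUS_PER_TASK; OpenMP takes the
# team size from the OMP_NUM_THREADS environment variable.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# Run a few times so you can take the quickest timing.
for run in 1 2 3; do
    ./"$EXE"
done
```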
You are encouraged to follow all the steps in this document. You can email me your solutions
(please write “COMP528 Lab6” in the subject), and I will provide you with feedback on them.
Fabio Papacchini: Fabio. .uk 1
1 Scope of OpenMP Directives / Replication .v. Worksharing
(a) Examine code q1.c, and write down what output you would expect from running on 3
threads for each set of statements preceded by the comment
/* 1(b) – what will you see? */
(b) Using the batch script, compile and run “q1.c” in batch using 3 OpenMP threads and compare your
expectations with what happens in reality. (For now, you can ignore the output relating to work
blocks…)
(c) The second part of q1.c (lines 35 onwards) involves setting x[i] to the value computed by function
func(i) and outputting which thread is undertaking which iteration. Interactively (on the command
line), compile the code without OpenMP and write down the times that the different work
blocks take. Given the code has been compiled without support for parallelism, are the
timings as you expect?
(d) Now look at the code for WORK BLOCKS 1,2 and 3 (as commented in the code). Write down how
long you think each of these blocks will take when running on 4 threads. Compile with
OpenMP and run in batch on 4 threads to check your hypothesis. (Use the quickest set of data from
the runs.)
(e) Did you spot there is also a WORK BLOCK 4? Take a close look at the clauses on its "#pragma omp
for" directive and explain the timings you observe.
(f) Compare q1a.c with q1.c, e.g.
diff -y q1.c q1a.c | less
or if you prefer a graphical interface you can try
kdiff3 q1.c q1a.c
paying close attention to the differences for the parallel region. NB look just at the initial exploration
of “myThread” and “numThreads” bearing in mind what will be in the scope of the first OpenMP
parallel region. Before compiling & running, write down the expected output from running
each of these on 3 threads. Then run to check your understanding.
2 Schedule
(a) Examine q2.c and consider the nested loop (LINES 34–40). Estimate the comparative amount of work
for the iterations of “col” for the case of maxSize=6. Consider, for a “static” schedule (see lectures)
how much comparative work will happen on each thread when we run with a team of 1
or 2 or 3 threads? Repeat the calculation for the “static,1” schedule. Which do you think will
be fastest in this case?
(b) This example size (maxSize=6) is too small to measure these differences (can you explain
why?), but the general principle holds. Edit the code:
a. add the relevant OpenMP directive to parallelise the nested loop (without giving any schedule at
this point)
b. add another OpenMP directive towards the end of the main() function in order to output the
number of threads (use variable “numThreads”) used.
c. Run with maxSize=6 to confirm your output is correct, and then change maxSize from 6 to 9000
and comment out the calls to the "prettyPrint" function.
(c) Compile & run the code as edited in (b), using a small number of threads. Run each timing experiment a
small number (say 5) times and record the quickest of these when comparing timings between
experiments.
(d) Compare the measured relative performances with your predictions.
(e) To experiment with different schedules we can use the "runtime" option for the "schedule" clause
of the "omp for" directive. Edit your code to include this for the nested loop. Recompile and use the
OMP_SCHEDULE environment variable to try different schedules. From your estimates of the work
per iteration, which "static" schedule/s do you think will improve the run time of the code? Try it.
(f) You can also try the schedules that are more dynamic in nature.
(g) Which scheduling did you find best, and why?
3 Data Clauses
(a) Compile & run q3.c (does a summation) both without OpenMP and then with OpenMP. For the
parallel version run on 1,2,3,4 and 5 threads. What do you notice?
(b) How will you fix this?