Vectorisation Performance Programming Exercise

Introduction
In these exercises we are going to investigate vectorisation performance.

We will use the optimisation reports from the Intel compiler to see how well the compiler is vectorising the code. These are enabled with the compiler flag -qopt-report=5. For every source file that is compiled, this produces a text file called *.optrpt (where * is the name of the source file), containing a vectorisation report for every loop in the file.
To get started with the exercises, download VectorSource.tgz from learn. You should be able to unpack it with the command tar zxvf VectorSource.tgz
We are going to use the Intel compiler for these exercises so make sure you have it loaded. You can check with the command module list which will show you the modules you currently have loaded.
Auto Vectorisation
Code restructuring
This exercise explores the automatic vectorisation the Intel compiler performs on loops in a simple code, and the reports it gives about what it has done. You can choose either a C or a Fortran example to examine and experiment with; the two are slightly different. Choose one and then move into its restructure directory (i.e. VectorisationSource/C/restructure or VectorisationSource/Fortran/restructure). For this exercise we aren’t going to run the code, just compile it and investigate the compiler optimisation reports.
C
1. Go into the C directory and compile the code by typing make.
2. Read the optimisation report that has been produced (vectorisation.optrpt). We are primarily interested in the core computational kernel, the routine vector_add. Has the compiler managed to vectorise the code?
3. Add the restrict keyword to the function arguments of type float * in vector_add, re-compile and examine the optimisation report for the file again. Has it changed?
4. Now align the array allocations and re-compile. Examine the optimisation report. Is the loop fully vectorising now?
5. Finally, tell the compiler that the arrays have been aligned (this can be done in a number of different ways, see the lecture slides), re-compile and examine the optimisation reports.
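Steps 3 to 5 can be sketched as follows. This is a minimal sketch, not the exercise's actual source: the signature of vector_add, the helper alloc_aligned_floats and the 64-byte alignment are assumptions chosen for illustration. It uses the C11 aligned_alloc function and the GCC/Clang-style __builtin_assume_aligned builtin so that it compiles with many compilers. With the Intel compiler, _mm_malloc, __assume_aligned(ptr, 64) or #pragma vector aligned are alternative ways of expressing the same thing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64  /* assumed: one cache line, i.e. one 512-bit vector */

/* Hypothetical helper: allocate n floats on an ALIGNMENT-byte boundary.
   aligned_alloc expects the size to be a multiple of the alignment,
   so the request is rounded up. Release the memory with free(). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t padded = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (padded == 0)
        padded = ALIGNMENT;
    return aligned_alloc(ALIGNMENT, padded);
}

/* Hypothetical kernel (the real vector_add may look different).
   restrict promises the compiler that the three arrays do not overlap. */
void vector_add(float *restrict c, const float *restrict a,
                const float *restrict b, size_t n)
{
    /* Debug-time check that the alignment promise below is true. */
    assert((uintptr_t)a % ALIGNMENT == 0);
    assert((uintptr_t)b % ALIGNMENT == 0);
    assert((uintptr_t)c % ALIGNMENT == 0);

    /* Tell the compiler that the pointers are ALIGNMENT-byte aligned. */
    a = __builtin_assume_aligned(a, ALIGNMENT);
    b = __builtin_assume_aligned(b, ALIGNMENT);
    c = __builtin_assume_aligned(c, ALIGNMENT);

    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Allocating the arrays with alloc_aligned_floats makes the alignment true; the builtin only passes that knowledge on to the compiler. Asserting alignment for memory that is not actually aligned is undefined behaviour.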

Fortran
1. Once in the Fortran directory, compile the code by typing make.
2. Read the optimisation report that has been produced (vectorisation.optrpt). We are primarily interested in the core computational kernel, the routine vector_add. Has the compiler managed to vectorise the code?
3. Align the array allocations and re-compile. Examine the optimisation report. Is the loop vectorising now?
4. Tell the compiler that the arrays have been aligned (this can be done in a number of different ways, see the lecture slides). Re-compile and examine the optimisation reports.
5. Now try using array notation for the vector_add kernel. Re-compile and view the vectorisation reports. Has this changed anything?
Bad Compiler Advice
In this exercise we explore what happens if you tell the compiler that a loop can be vectorised when it does not meet the restrictions required for vectorisation. The code to use for this exercise is in the badadvice directory. For this exercise we are going to run the code, so we have provided you with a batch script (runbadprog.sh for the C example and runvect.sh for the Fortran one) to do this. We have also provided a makefile to build the code.
C
1. Compile the code and read the optimisation reports. Is the external function being vectorised?
2. Run the code by submitting a batch job (i.e. sbatch runbadprog.sh). Look at the output that is produced.
3. Add a compiler pragma above the loop in the simple_assign function to force the compiler to vectorise the loop. Re-compile and check the optimisation reports to make sure the loop is now being vectorised (note, we have not fixed the alignment so we will not get perfect vectorisation in this example).
4. Re-run the code and compare the output to the original code. Are there differences between the results for this run and the original run?
5. Add the -no-vec flag to the makefile, re-compile and re-run to make sure the difference in results is caused by the vectorisation.
6. What is the vectorisation doing in this example?
Fortran
1. Compile the code and read the optimisation reports. Is the vector_add_accum subroutine being vectorised?
2. Run the code by submitting a batch job (i.e. sbatch runvect.sh). Examine the output that is produced.
3. Now, force the compiler to vectorise the loop inside the vector_add_accum routine by adding this directive before the loop: !dir$ simd
4. Re-compile, check the optimisation reports to ensure it is actually vectorising that loop, and re-run the code on the KNL. Compare the output you get now to the original output. Is it the same?
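The effect seen in this exercise can be reproduced by hand. The sketch below assumes a loop with a loop-carried dependency, a[i] = a[i-1] + b[i]; the real simple_assign and vector_add_accum routines may differ. scalar_accum runs the loop in order, while forced_vector_accum imitates what a 4-wide vector unit does when it is told to ignore the dependency: it loads four old values of a before storing any new ones. Both function names and the vector width of 4 are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 4  /* assumed vector length, in elements */

/* Correct, in-order execution: each iteration sees the value
   written by the previous iteration. */
void scalar_accum(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* Emulation of a forced vectorisation of the same loop: all VLEN
   loads happen before any of the VLEN stores, so lanes 1..VLEN-1
   read stale values of a. */
void forced_vector_accum(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t i = 1;
    for (; i + VLEN <= n; i += VLEN) {
        float prev[VLEN];
        for (size_t l = 0; l < VLEN; l++)      /* "vector load" */
            prev[l] = a[i - 1 + l];
        for (size_t l = 0; l < VLEN; l++)      /* "vector add and store" */
            a[i + l] = prev[l] + b[i + l];
    }
    for (; i < n; i++)                         /* scalar remainder loop */
        a[i] = a[i - 1] + b[i];
}
```

With n = 9, a[0] = 1, every other element of a set to 0 and every element of b set to 1, scalar_accum produces 1, 2, 3, ..., 9, whereas forced_vector_accum produces 1, 2, 1, 1, 1, 2, 1, 1, 1. The compiler did what it was told; the advice was wrong.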
Explicit vectorisation
As well as relying on the compiler to identify and exploit vectorisation, it is possible to manually specify the vectorisation you require for the application. This can be done using vector intrinsics, which map directly onto specific vector instructions, or by using higher level programming techniques, such as the OpenMP simd directives.
In this exercise we will focus on using the higher level programming approaches, i.e. OpenMP simd.
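For comparison, this is roughly what the lower level intrinsics approach looks like. It is a sketch only: the function name vector_add_intrinsics is hypothetical, and it uses 128-bit SSE intrinsics from xmmintrin.h (four floats per instruction) because they are available on essentially every x86-64 machine. Code targeting the KNL would more likely use the 512-bit AVX-512 equivalents such as _mm512_add_ps. The SSE path is guarded by the __SSE__ macro so that the code falls back to a plain loop on other architectures.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical hand-vectorised version of a vector add kernel. */
void vector_add_intrinsics(float *c, const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* Main loop: process four floats per iteration. The unaligned
       load and store variants are used so that no alignment
       assumption is needed. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    /* Remainder loop (or the whole loop when SSE is not available). */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Each intrinsic corresponds to one vector instruction, which gives full control but ties the code to a particular instruction set. This is the main reason for preferring the OpenMP simd approach in these exercises.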
OpenMP simd
Revisit the first exercise you undertook, the restructure exercise, but implement the vectorisation using OpenMP simd directives rather than asserting alignment to the compiler. There is a new directory, simd, with the original code for you to vectorise using the OpenMP simd directive.
Note that whilst OpenMP simd will force the compiler to vectorise, you are still responsible for ensuring that the arrays used are aligned; otherwise you may get incorrect results or the application may crash.
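A minimal sketch of what the directive might look like on a vector add kernel is shown below. The function name vector_add_simd, its signature and the 64-byte alignment are assumptions, not the exercise's actual code. The aligned clause is a promise from the programmer, so the assertions check it in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel vectorised with OpenMP simd. The aligned clause
   states that a, b and c all start on 64-byte boundaries. */
void vector_add_simd(float *c, const float *a, const float *b, size_t n)
{
    /* Debug-time check of the alignment promise made below. */
    assert((uintptr_t)a % 64 == 0);
    assert((uintptr_t)b % 64 == 0);
    assert((uintptr_t)c % 64 == 0);

    #pragma omp simd aligned(a, b, c : 64)
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The pragma only takes effect when OpenMP support is switched on at compile time, for example with -qopenmp or -qopenmp-simd for the Intel compiler (-fopenmp or -fopenmp-simd for GCC). Without such a flag the pragma is ignored and the loop is treated like any other.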
