OPTIMISING WITH THE COMPILER
• Introduction
• Optimisation techniques
• compiler flags
• compiler hints
• code modifications
• Optimisation topics
• locals and globals
• conditionals
• data types
• register use and spilling
• loop unrolling/pipelining
• inlining
Introduction
• Unless we write assembly code, we are always using a compiler.
• Modern compilers are (quite) good at optimisation
• memory optimisations are an exception
• Usually much better to get the compiler to do the optimisation.
• avoids machine-specific coding
• compilers break codes much less often than humans
• Even modifying code can be thought of as “helping the compiler”.
Compiler flags
• A typical compiler has hundreds of flags/options.
• most are never used
• many are not related to optimisation
• Most compilers have flags for different levels of general optimisation.
• -O1, -O2, -O3, …
• When first porting code, switch optimisation off.
• turn optimisation on only when you are satisfied that the code works, and then test again.
• but don’t forget to use them!
• also don’t forget to turn off debugging, bounds checking and profiling flags…
Compiler flags (cont.)
• Note that the highest levels of optimisation may
• break your code.
• give different answers, by bending standards.
• make your code go slower.
• Always read documentation carefully.
• Isolate routines and flags which cause the problem.
• binary chop
• one routine per file may help
Compiler flags (cont.)
• Many compilers are designed for an instruction set architecture, not one machine.
• flags to target ISA versions, processor versions, cache configurations
• defaults may not be optimal, especially if cross-compiling
• Some optimisation flags may not be part of -On
• check documentation
• use sparingly (may only be beneficial in some cases)
Compiler hints
• A mechanism for giving additional information to the compiler, e.g.
• values of variables (e.g. loop trip counts)
• independence of loop iterations
• independence of index array elements
• aliasing properties
• Appear as comments (Fortran), or preprocessor pragmas (C)
• don’t affect portability
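For example, in C such hints are typically pragmas. A minimal sketch, assuming GCC (the routine and its arguments are hypothetical, and other compilers spell the hint differently, e.g. Intel's plain `#pragma ivdep`):

```c
#include <stddef.h>

/* Hypothetical routine. The pragma asserts to GCC that the loop
   iterations carry no dependences through a[idx[i]], so the
   compiler is free to vectorise the scatter. */
void scatter_add(double *a, const double *b, const int *idx, size_t n)
{
#pragma GCC ivdep
    for (size_t i = 0; i < n; i++)
        a[idx[i]] += b[i];
}
```

Without the hint the compiler must assume two entries of `idx` could be equal, which would make vectorisation unsafe; the pragma transfers that responsibility to the programmer.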
Incremental compilation
• Compilers can only work with the limited information available to them.
• Most compilers compile code in an incremental fashion
• Each source file is compiled independently of the others.
• Most compilers ignore all source files other than those specified on the
command line (or implicitly referenced via search paths, e.g. include files)
• Routines from other source files are treated as “black boxes”
• worst-case assumptions are made based on the routine prototype
• You can help by providing more information
• Information in routine prototypes
• INTENT, PURE, const, etc.
• Compiler hints
• Command line flags
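A sketch of the prototype-information point in C (hypothetical routine): `const` promises the body never writes through `in`, and the C99 `restrict` qualifier promises the two pointers do not alias, so a compiler can keep values in registers and vectorise without runtime alias checks.

```c
#include <stddef.h>

/* Even without seeing the body, a compiler that trusts this
   prototype knows 'in' is read-only and that stores through 'out'
   cannot change in[i]. */
void axpy(double *restrict out, const double *restrict in,
          double alpha, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] += alpha * in[i];
}
```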
Code modification
• When flags and hints don’t solve the problem, we will have to resort to code modification.
• Be aware that this may
• introduce bugs.
• make the code harder to read/maintain.
• only be effective on certain architectures and compiler versions.
• Try to think about
• what optimisation the compiler is failing to do
• what additional information can be provided to the compiler
• how rewriting can help
• How can we work out what the compiler has done?
• eyeball the assembly code
• use diagnostics flags
• Increasingly difficult to work out what actually occurred in the processor.
• superscalar, out-of-order, speculative execution
• Can estimate expected performance
• count flops and loads/stores, estimate cache misses
• compare actual performance with expectations
Locals and globals
• Compiler analysis is more effective with local variables
• Has to make worst-case assumptions about globals
• Globals could be modified by any called procedure (or by another thread).
• Use local variables where possible
• Automatic variables are stack-allocated: allocation is essentially free.
• In C, use file scope globals in preference to externals
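A minimal C sketch of the difference (hypothetical function names):

```c
#include <stddef.h>

double g_sum;   /* global accumulator */

/* A store to g_sum might alias x[i] (the caller could pass &g_sum),
   so the compiler may be forced to write g_sum back to memory and
   reload it on every iteration. */
void sum_global(const double *x, size_t n)
{
    g_sum = 0.0;
    for (size_t i = 0; i < n; i++)
        g_sum += x[i];
}

/* The automatic variable s cannot alias anything, so it can live in
   a register for the whole loop; its stack allocation is free. */
double sum_local(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```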
Conditionals
• Even with sophisticated branch prediction hardware, branches are bad for performance.
• Try to avoid branches in innermost loops.
• if you can’t eliminate them, at least try to get them out of the critical loops.
• Simple example:

  do i = 1, k
    if (n .eq. 0) then
      a(i) = b(i) + c
    endif
  end do

  becomes

  if (n .eq. 0) then
    do i = 1, k
      a(i) = b(i) + c
    end do
  endif

• A little harder for the compiler…

  do i = 1, k
    if (i .le. j) then
      a(i) = b(i) + c
    else
      a(i) = b(i) + c
    endif
  end do

  becomes

  do i = 1, j
    a(i) = b(i) + c
  end do
  do i = j+1, k
    a(i) = b(i) + c
  end do
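A C sketch of the same loop-splitting idea (hypothetical routines; a different operation is used in each branch for illustration, and `j <= k` is assumed):

```c
#include <stddef.h>

/* The branch is tested on every iteration. */
void with_branch(double *a, const double *b, double c,
                 size_t j, size_t k)
{
    for (size_t i = 0; i < k; i++) {
        if (i < j)
            a[i] = b[i] + c;
        else
            a[i] = b[i] - c;
    }
}

/* Split into two branch-free loops with the same overall effect. */
void split_loops(double *a, const double *b, double c,
                 size_t j, size_t k)
{
    for (size_t i = 0; i < j; i++)
        a[i] = b[i] + c;
    for (size_t i = j; i < k; i++)
        a[i] = b[i] - c;
}
```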
Data types
• Performance can be affected by choice of data types
• often a difference between 32-bit and 64-bit arithmetic (integer and floating point).
• complicated by trade-offs with memory usage and cache hit rates
• Avoid unnecessary type conversions
• e.g. int to long, float to double
• N.B. some type conversions are implicit
• However, a conversion is sometimes better than the alternative, e.g.
• use a DP reduction variable rather than increasing array precision.
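A sketch of the reduction-variable point in C (hypothetical function): the array stays in single precision for memory and cache reasons, while the accumulation is done in double.

```c
#include <stddef.h>

/* Each x[i] is implicitly converted float -> double when added; this
   costs one scalar conversion per element, but avoids doubling the
   memory traffic of the whole array. */
double sum_sp(const float *x, size_t n)
{
    double s = 0.0;   /* DP reduction variable */
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```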
Common subexpression elimination
• Compilers are generally good at Common Subexpression Elimination (CSE).
• A couple of cases where they might have trouble:
• different order of operands:

  d = a + c
  e = a + b + c
• function calls:
d = a + func(c)
e = b + func(c)
CSE including function calls.
• To extract a CSE containing a function call the compiler has to be sure of various things:
• The function always returns the same value for the same input.
• The function does not cause any side effects that would be affected by changing the number of times it is called:
• Modifying its inputs.
• Changing global data.
• Need to be very careful with function prototypes to allow the compiler to know this.
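A sketch of doing the CSE by hand in C (`func` is hypothetical; imagine its body is in another source file, so the compiler must assume it may have side effects and cannot merge the calls itself):

```c
/* Hypothetical function; defined here only so the example runs. */
static double func(double x) { return x * x + 1.0; }

/* Compiler view without more information: func must be called twice. */
void no_cse(double a, double b, double c, double *d, double *e)
{
    *d = a + func(c);
    *e = b + func(c);
}

/* Manual CSE: hoist the call once, which is safe because we (the
   programmers) know func is pure. */
void manual_cse(double a, double b, double c, double *d, double *e)
{
    double t = func(c);   /* common subexpression extracted once */
    *d = a + t;
    *e = b + t;
}
```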
Register use
• Most compilers do a reasonable job of register allocation.
• But only a limited number of registers are available.
• Can have problems in some cases:
• loops with large numbers of temporary variables
• such loops may be produced by inlining or unrolling
• array elements with complex index expressions
• can help the compiler by introducing explicit scalar temporaries; most compilers will use a register for an explicit scalar in preference to an array element, e.g.

  tmp = c[0];
  for (i=0; i<n; i++) {
    …   /* use tmp in place of c[0] inside the loop */
  }
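A runnable sketch of the scalar-temporary idea (the loop body is illustrative, not from the slides):

```c
#include <stddef.h>

/* Each store to a[i] might alias c[0], so the compiler may have to
   reload c[0] from memory on every iteration. */
void no_temp(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += c[0] * b[i];
}

/* With the explicit scalar, tmp can sit in a register for the whole
   loop (assumes a and c do not overlap). */
void with_temp(double *a, const double *b, const double *c, size_t n)
{
    double tmp = c[0];
    for (size_t i = 0; i < n; i++)
        a[i] += tmp * b[i];
}
```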