COMPILER OPTIMISATION III
Introduction
• In this session we will look at optimisations for the memory hierarchy (i.e. data caches).
Copyright By PowCoder代写 加微信 powcoder
• These transformations require knowledge of loops and array accesses
• doneatahigherlevelthanstandardIR
• Unimodular transformations
• Loop interchange and permutation
• Loop reversal
• Loop skewing
• Loop fusion and distribution
• Loop tiling
• Array padding
Programmer’s perspective:
• These optimisations are not done by all compilers.
• Whereas it is (relatively) easy for a compiler to work out whether a given transformation reduces the number of instructions required, it is much harder for it to predict cache misses.
• Compilers have limited scope to change memory layouts
• You may need to consider implementing this type of optimisation by hand.
Arrays and loops
• In a nest of more than one loop, loop order is important for exploiting spatial locality in caches.
• Recall that in Fortran, arrays are laid out by columns, whereas in C (and Java) they are laid out by rows.
Fortran C/Java
A(i,j) j A[i][j] ii
Loop interchange
• Loop interchange swaps the loops in a double loop nest
• Can be generalised to reordering loop nests of depth 3 or more
• loop permutation
• Compiler must determine that loop iterations can be run in any order
Loop interchange
for (j=0;j
a[i]+=b[i]; }}
a[i]+=b[i];
Loop skewing
• Loop skewing changes the shape of the loops iteration space.
• Generally does not itself improve locality, but can enable other transformations.
for (j=0;j