Prefetching
Mike Bailey
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Computer Graphics

prefetch.pptx
mjb – March 30, 2019

Prefetching
Prefetching is used to place a cache line in cache before it is needed, thus hiding the latency of fetching from off-chip memory.
There are two key issues here:
1. Issuing the prefetch at the right time
2. Issuing the prefetch at the right distance
The right time:
If the prefetch is issued too late, then the memory values won’t be back when the program wants to use them, and the processor has to wait anyway.
If the prefetch is issued too early, then there is a chance that the prefetched values could be evicted from cache by another need before they can be used.
The right distance:
The “prefetch distance” is how far ahead of the memory we are using right now the prefetched memory lies.
Too far, and the values sit in cache for too long, and possibly get evicted.
Too near, and the program is ready for the values before they have arrived.
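One way to make "the right distance" concrete is to express it in cache lines and to estimate a lower bound from the memory latency. The sketch below is illustrative only: the 64-byte cache line, the 4-byte float, and the helper names (elements_per_line, lines_ahead, min_distance_iters) are assumptions that do not come from the slides, and the rule of thumb "distance ≥ latency / time per iteration" is a common starting point rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Assumption for illustration: 64-byte cache lines (common, but not universal).
constexpr std::size_t CACHE_LINE_BYTES = 64;

// How many elements of type T fit in one cache line.
// With 4-byte floats this is 16, so one prefetch per 16 elements covers a line.
template <typename T>
constexpr std::size_t elements_per_line()
{
    return CACHE_LINE_BYTES / sizeof(T);
}

// How many cache lines ahead a prefetch distance of pd elements reaches.
template <typename T>
constexpr double lines_ahead(std::size_t pd)
{
    return static_cast<double>(pd * sizeof(T)) / CACHE_LINE_BYTES;
}

// Rule-of-thumb lower bound on the prefetch distance, in loop iterations:
// the prefetch must be issued at least "latency" cycles before the use,
// so round latency / cycles-per-iteration up to a whole iteration count.
// Both inputs are hypothetical numbers you would have to measure.
inline std::size_t min_distance_iters(std::size_t latency_cycles,
                                      std::size_t cycles_per_iter)
{
    assert(cycles_per_iter > 0);
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}
```

Under these assumptions a distance of 32 floats reaches two cache lines ahead. The upper bound ("too far") has no similar formula, because it depends on what else competes for the cache, so the final value is usually tuned by timing.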

Prefetching in g++
void __builtin_prefetch( void *, int rw, int locality );
#define WILL_READ_ONLY 0
#define WILL_READ_AND_WRITE 1
#define LOCALITY_NONE 0
#define LOCALITY_LOW 1
#define LOCALITY_MED 2
#define LOCALITY_HIGH 3
#define PD 32 // prefetch distance (fp words)
for( int i = 0; i < ArraySize; i++ )
{
    if( (i%16) == 0 )
    {
        __builtin_prefetch ( &a[ i+PD ], WILL_READ_AND_WRITE, LOCALITY_LOW );
        __builtin_prefetch ( &b[ i+PD ], WILL_READ_ONLY, LOCALITY_LOW );
    }
    a[ i ] = a[ i ] + b[ i ];
}

Prefetching in icc and icpc
Overall, icc and icpc seem to do a good job of prefetching without you doing anything extra, but if you want to be sure:
#pragma prefetch var:which-prefetch:#vector-iterations
There can be two memory prefetches inside a loop.
#pragma prefetch a:0:8     // vprefetch0
#pragma prefetch a:1:32    // vprefetch1
#pragma prefetch b:1:32
for( int i = 0; i < ArraySize; i++ )
{
    if( (i%16) == 0 )
    {
        __builtin_prefetch ( &a[ i+PD ], WILL_READ_AND_WRITE, LOCALITY_LOW );
        __builtin_prefetch ( &b[ i+PD ], WILL_READ_ONLY, LOCALITY_LOW );
    }
    a[ i ] = a[ i ] + b[ i ];
}

Prefetching in Visual Studio
void _m_prefetch( void * );
Loads a cache line into cache and sets the cache line state to exclusive.
#define PD 32 // prefetch distance (fp words)
for( int i = 0; i < ArraySize; i++ )
{
    if( (i%16) == 0 )
    {
        _m_prefetch( &a[ i+PD ] );
        _m_prefetch( &b[ i+PD ] );
    }
    a[ i ] = a[ i ] + b[ i ];
}

The Effects of Prefetching on SIMD Computations
Array Multiplication Example
Length of Arrays (NUM): 1,000,000
Number of pairs of floats processed per SIMD call (SIMDSIZE): 4
Prefetch Distance (PD): 32
for( int i = 0; i
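The loop for this example is cut off in this copy of the slides. As a rough stand-in, and not the author's benchmark code, the sketch below shows one way an array multiplication could be timed with and without prefetching. It is scalar rather than SIMD, the names multiply, time_multiply, and PREFETCH are hypothetical, and the bounds check on i + PD is an addition so that the prefetch address never runs past the end of the arrays.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

constexpr std::size_t PD = 32;   // prefetch distance (fp words), as in the slides

// Portability shim (an assumption of this sketch): the hint compiles away
// on compilers that do not provide __builtin_prefetch.
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr, rw, locality) __builtin_prefetch((addr), (rw), (locality))
#else
#define PREFETCH(addr, rw, locality) ((void)0)
#endif

// c[i] = a[i] * b[i]. When prefetch is true, request the data PD elements
// ahead once every 16 floats (one 64-byte cache line, assuming 4-byte floats).
void multiply(const float *a, const float *b, float *c,
              std::size_t n, bool prefetch)
{
    for (std::size_t i = 0; i < n; i++)
    {
        if (prefetch && (i % 16) == 0 && i + PD < n)
        {
            PREFETCH(&a[i + PD], 0, 1);   // rw = 0: read only, locality = 1: low
            PREFETCH(&b[i + PD], 0, 1);
        }
        c[i] = a[i] * b[i];
    }
}

// Wall-clock seconds for one pass over the arrays.
double time_multiply(const std::vector<float> &a, const std::vector<float> &b,
                     std::vector<float> &c, bool prefetch)
{
    assert(a.size() == b.size() && a.size() == c.size());
    auto start = std::chrono::steady_clock::now();
    multiply(a.data(), b.data(), c.data(), a.size(), prefetch);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

To compare the two variants, call time_multiply several times on arrays of the size of interest (for example 1,000,000 floats) and keep the best time for each setting. A single run is noisy, and hardware prefetchers may already hide most of the latency for a simple sequential access pattern like this one.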