
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Multicore Block Data Decomposition: 1D Heat Transfer Example
You have a steel bar. Each section of the bar starts out at a different temperature. There are no incoming heat sources or outgoing heat sinks (i.e., ignore boundary conditions). Ready, go! How do the temperatures change over time?
The fundamental differential equation here is:

    ρC ∂T/∂t = k ∂²T/∂x²


ρ is the density, in kg/m³
C is the specific heat capacity, measured in Joules / (kg ∙ K)
k is the coefficient of thermal conductivity, measured in Watts / (meter ∙ K) = Joules / (meter ∙ sec ∙ K)
Computer Graphics
In plain words, this all means that temperatures, left to themselves, try to even out. Hots get cooler. Cools get hotter. The greater the temperature differential, the faster the evening-out process goes.
mjb – March 15, 2020
Data Decomposition

Numerical Methods:
Changing a Derivative into Discrete Arithmetic
How fast the temperature is changing within the bar:

    ∂²T/∂x² ≈ ( T_{i+1} − 2T_i + T_{i−1} ) / (Δx)²

How much the temperature changes over time:

    ∂T/∂t
T 2T C t k(x2 )
How much the temperature changes in the time step
 k Ti1 2Ti Ti1  Ti  2 t
Multicore Block Data Decomposition: 1D Heat Transfer Example
Computer Graphics
k 2T  ( )
 C  x  
Physical properties of the material
How fast the temperature is changing within the bar
As a side note: the quantity k/(ρC) has the unlikely units of m2/sec!
mjb – March 15, 2020
1D Data Decomposition: Partitioning Strategies
[Figure: the bar's nodes (…, T_{i−1}, T_i, T_{i+1}, …) divided into contiguous blocks, one block per core: Core #1, Core #2, Core #3]
On a shared memory multicore system, the obvious approach is to allocate the data as one large global-memory block (i.e., shared).
You will actually need two such arrays, one to hold the current temperature values that you are reading from and one to hold the next temperature values that you are writing to.
1D Data Decomposition: Partitioning
#include <stdio.h>
#include <math.h>
#include <omp.h>

#define NUM_TIME_STEPS  100

#ifndef NUMN
#define NUMN    16      // total number of nodes
#endif

#ifndef NUMT
#define NUMT    4       // number of threads to use
#endif

#define NUM_NODES_PER_THREAD    ( NUMN / NUMT )

float   Temps[2][NUMN];
int     Now;    // which array holds the "current values" = 0 or 1
int     Next;   // which array is being filled = 1 or 0

void    DoAllWork( int );
Allocate as One Large Contiguous Global Array

[Figure: nodes T_{i−1}, T_i, T_{i+1} in one shared array, split into one block per core]

    omp_set_num_threads( NUMT );
    Now  = 0;
    Next = 1;

    for( int i = 0; i < NUMN; i++ )
        Temps[Now][ i ] = 0.;
    Temps[Now][NUMN/2] = 100.;

    double time0 = omp_get_wtime( );

    #pragma omp parallel default(none) shared(Temps,Now,Next)
    {
        int me = omp_get_thread_num( );
        DoAllWork( me );        // each thread calls this
    }

    double time1 = omp_get_wtime( );
    double usec = 1000000. * ( time1 - time0 );
    double megaNodesPerSecond = (float)NUM_TIME_STEPS * (float)NUMN / usec;

DoAllWork( ), I

    void DoAllWork( int me )
    {
        // what range of the global Temps array this thread is responsible for:
        int first = me * NUM_NODES_PER_THREAD;
        int last  = first + ( NUM_NODES_PER_THREAD - 1 );

        for( int step = 0; step < NUM_TIME_STEPS; step++ )
        {
            // first element on the left:
            {
                float left = 0.;
                if( me != 0 )
                    left = Temps[Now][first-1];
                float dtemp = ( ( K / (RHO*C) ) * ( left - 2.*Temps[Now][first] + Temps[Now][first+1] ) / ( DELTA*DELTA ) ) * DT;
                Temps[Next][first] = Temps[Now][first] + dtemp;
            }

            // all the nodes in between:
            for( int i = first+1; i <= last-1; i++ )
            {
                float dtemp = ( ( K / (RHO*C) ) * ( Temps[Now][i-1] - 2.*Temps[Now][ i ] + Temps[Now][i+1] ) / ( DELTA*DELTA ) ) * DT;
                Temps[Next][ i ] = Temps[Now][ i ] + dtemp;
            }

What happens if two cores are writing to the same cache line? False Sharing!

DoAllWork( ), II

            // last element on the right:
            {
                float right = 0.;
                if( me != NUMT-1 )
                    right = Temps[Now][last+1];
                float dtemp = ( ( K / (RHO*C) ) * ( Temps[Now][last-1] - 2.*Temps[Now][last] + right ) / ( DELTA*DELTA ) ) * DT;
                Temps[Next][last] = Temps[Now][last] + dtemp;
            }

            // all threads need to wait here so that all Temps[Next][*] values are filled:
            #pragma omp barrier

            // want just one thread swapping the definitions of Now and Next:
            #pragma omp single
            {
                Now  = Next;
                Next = 1 - Next;
            }   // implied barrier exists here

        } // for( int step = ...
    }

Because each core is working from left to right across the data, I am guessing that there is little cache line conflict.
Allocate as Separate Thread-Local (private) Sub-arrays

[Figure: nodes T_{i−1}, T_i, T_{i+1}, with one private sub-array per core: Core #1, Core #2, Core #3]

We could make each sub-array a thread-local (i.e., private) variable. This would put each sub-array on each thread's individual stack. The strategy is now to read from the single large global array and compute into each thread's local array. When we are done, copy each local array into the global array.

    float nextTemps[NUM_NODES_PER_THREAD];
    for( int i = 0; i < NUM_NODES_PER_THREAD; i++ )
        nextTemps[ i ] = Temps[first+i];

    // read from Temps[ ], write into nextTemps[ ]:
    for( int steps = 0; steps < NUM_TIME_STEPS; steps++ )
    {
        // all the other nodes in between:
        for( int i = 1; i < NUM_NODES_PER_THREAD-1; i++ )
        {
            float dtemp = ( ( K / (RHO*C) ) * ( Temps[first+i-1] - 2.*Temps[first+i] + Temps[first+i+1] ) / ( DELTA*DELTA ) ) * DT;
            nextTemps[ i ] = Temps[first+i] + dtemp;
        }

        // don't update the global Temps[ ] until they are no longer being used:
        #pragma omp barrier

        // update the global Temps[ ]:
        for( int i = 0; i < NUM_NODES_PER_THREAD-1; i++ )
            Temps[first+i] = nextTemps[ i ];

        // be sure all global Temps[ ] are updated:
        #pragma omp barrier

    } // for( int steps = 0; ...

Allocate as Separate Thread-Global-Heap Sub-arrays

[Figure: nodes T_{i−1}, T_i, T_{i+1}, with one heap sub-array per core: Core #1, Core #2, Core #3]

We could make each sub-array a heap (also private) variable. This would put each sub-array on the heap. The strategy is now to read from the single large global array and compute into each thread's heap array. When we are done, copy each heap array into the global array.
Allocate as Separate Thread-Global-Heap Sub-arrays

    float *nextTemps = new float [NUM_NODES_PER_THREAD];
    for( int i = 0; i < NUM_NODES_PER_THREAD; i++ )
        nextTemps[ i ] = Temps[first+i];

    // read from Temps[ ], write into nextTemps[ ]:
    for( int steps = 0; steps < NUM_TIME_STEPS; steps++ )
    {
        // all the other nodes in between:
        for( int i = 1; i < NUM_NODES_PER_THREAD-1; i++ )
        {
            float dtemp = ( ( K / (RHO*C) ) * ( Temps[first+i-1] - 2.*Temps[first+i] + Temps[first+i+1] ) / ( DELTA*DELTA ) ) * DT;
            nextTemps[ i ] = Temps[first+i] + dtemp;
        }

        // don't update the global Temps[ ] until they are no longer being used:
        #pragma omp barrier

        // update the global Temps[ ]:
        for( int i = 0; i < NUM_NODES_PER_THREAD-1; i++ )
            Temps[first+i] = nextTemps[ i ];

        // be sure all global Temps[ ] are updated:
        #pragma omp barrier

    } // for( int steps = 0; ...

[Table: performance (MegaNodes Computed Per Second) by storage strategy (Local vs. Global), number of nodes, and number of threads]

[Graph: Performance as a Function of Number of Nodes — MegaNodes Computed Per Second vs. # of Nodes to Compute, one curve per # of Threads]

[Graph: Performance as a Function of Number of Threads — MegaNodes Computed Per Second vs. # of Threads, one curve per # of Nodes]

2D Heat Transfer Equation

    ρC ∂T/∂t = k ( ∂²T/∂x² + ∂²T/∂y² )

    ∂T/∂t = ( k / (ρC) ) ( ∂²T/∂x² + ∂²T/∂y² )

    ΔT_{i,j} = ( k / (ρC) ) ∙ [ ( T_{i+1,j} − 2T_{i,j} + T_{i−1,j} ) / (Δx)² + ( T_{i,j+1} − 2T_{i,j} + T_{i,j−1} ) / (Δy)² ] ∙ Δt

3D Heat Transfer Equation

    ρC ∂T/∂t = k ( ∂²T/∂x² + ∂²T/∂y² + ∂²T/∂z² )

    ∂T/∂t = ( k / (ρC) ) ( ∂²T/∂x² + ∂²T/∂y² + ∂²T/∂z² )

    ΔT_{i,j,k} = ( k / (ρC) ) ∙ [ ( T_{i+1,j,k} − 2T_{i,j,k} + T_{i−1,j,k} ) / (Δx)² + ( T_{i,j+1,k} − 2T_{i,j,k} + T_{i,j−1,k} ) / (Δy)² + ( T_{i,j,k+1} − 2T_{i,j,k} + T_{i,j,k−1} ) / (Δz)² ] ∙ Δt