Take-home Test

What needs to be done?

• Take the sequential program in simtemp.c and write an OpenMP version and an MPI version.
• You will write a short report on these (described later).
• Run on 1, 2, 4 and 8 cores and processes.
• Use a single node to run these. This will give you faster turnaround.
• Size your data so that the 8-core version takes less than 20 seconds to run but runs long enough that you can see a difference in times across runs. We don’t need to run for long periods of time for this.
• Show a speedup on some number of processors. You may increase the problem size, if necessary.
• A hint: printf("after h init\n"); fflush(stdout);
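The hint is useful because stdout is normally buffered: if the program crashes, text already handed to printf may never be written, so the crash appears to happen earlier than it really does. Below is a minimal sketch of the hint wrapped in a helper. The name trace is hypothetical and is not part of simtemp.c.

```c
#include <stdio.h>

/* Hypothetical helper (not in simtemp.c): print a progress marker and
 * flush stdout immediately, so the marker is written even if the
 * program crashes shortly afterwards. */
int trace(const char *msg) {
    int n = printf("%s\n", msg);   /* number of characters written */
    fflush(stdout);                /* force buffered output out now */
    return n;
}
```

Calling trace("after h init"); right after the initialization loops is equivalent to the two statements in the hint. Any such calls should be removed or kept outside the timed region before measuring.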

The program (in simtemp.c)
#include <stdio.h>
#include <omp.h>

int main (void) {

    int T = 100;        // number of timesteps
    int L = 10000;      // length of the strip
    int H = 100;        // height of the strip
    double strip[H][L]; // array holding temperature of the strip.

    int l, h, t;
    double avg = 0.0;

    double exectime = -omp_get_wtime();

    // init the strip. The left and right sides are -30C, the top and
    // bottom edges are 100C.

    for (h = 0; h < H; h++) {
        strip[h][0] = -30.00;
        strip[h][L-1] = -30.00;
    }

    for (l = 1; l < L-1; l++) {
        strip[0][l] = 100.00;
        strip[H-1][l] = 100.00;
    }

    for (h = 1; h < H-1; h++)
        for (l = 1; l < L-1; l++)
            strip[h][l] = 0.0;

    for (t = 0; t < T; t++) {
        // printf("in t=%d\n",t); fflush(stdout);
        for (h = 1; h < H-1; h++)
            for (l = 1; l < L-1; l++)
                strip[h][l] = (strip[h][l] + (strip[h-1][l] + strip[h+1][l] +
                               strip[h][l-1] + strip[h][l+1])/4.0)/2.0;
    }

    for (h = 1; h < H-1; h++)
        for (l = 1; l < L-1; l++)
            avg += strip[h][l];
    // printf("after avg loop\n"); fflush(stdout);

    avg = avg / ((H-2)*(L-2));

    exectime = exectime + omp_get_wtime();
    printf("average temperature of the strip = %lf, time = %lf \n", avg, exectime);
    return 0;
}

The MPI version

• Convert simtemp.c to MPI, and time it.
• The t loop should not be parallelized.
• Leave the timing information where it is. All I/O should be outside of timing loops.
• Use reductions where possible.
• Time this for 1, 2, 4, 8 and 16 processes using a single node.
• If the strip array is distributed across processes, we need to have halo regions.
• This is because each process needs data from the adjacent process to compute the temperatures at the edges of the part of the strip it stores.

[Figure: the strip, with rows 0 to H-1 and columns 0 to L-1, divided column-wise among processes P0, P1, P2, ..., Pn-1 in blocks of 10 columns (0-9, 10-19, 20-29, ..., L-10 to L-1).]

Shared data

Patterned data is shared between adjacent processes. The red top and bottom are initialized to 100, the blue sides are initialized to -30. Neither of these is updated.

Three solutions

1. Add buffer areas to the array based on the number of cores. Start up tasks to operate on each region, i.e., use OpenMP tasks like MPI processes.
2. Use a “red-black” algorithm on the strip. First we do the red regions, then the black. This eliminates races. Make sure you create enough iterations in your parallel loop to make it run in parallel.

[Figure: first pass. Two regions of the strip, labeled "this" and "and this", are updated in parallel.]
[Figure: second pass. "Then this and this in parallel": two further regions of the strip are updated in parallel.]

3. Use a fine-grained red-black algorithm. Do all of the points corresponding to red boxes in parallel, and then all of the points corresponding to black boxes. This is the most standard on shared memory machines.

The OMP program

• Convert simtemp.c to OpenMP, and time it.
• The t loop should not be parallelized.
• Leave the timing information where it is. All I/O should be outside of timing loops.
• Use reductions where possible.
• Use SIMD and parallel for simd where appropriate.
• If the Intel compiler on scholar does not support SIMD, let me know.
• Time this for 1, 2, 4, 8 and 16 threads using a single node.
• Note that the same issues as we had with MPI and halo regions exist here.
• Updating and reading border values can lead to races, slow convergence, etc.

What to turn in

• Your code
• A report
  • Should have a table comparing OpenMP and MPI run times
  • A high level description of how you distributed your data in MPI
  • You should have some speedup
  • You can use scanned hand drawn pictures. Life is too short to do PowerPoint for a take-home exam.
• Turn it in to Blackboard as a zip file of the directory containing your code and report. The directory should be named , and the zip file should be .zip.
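As an illustration of the fine-grained red-black approach listed under "Three solutions", here is a minimal sketch of one timestep plus a reduction for the average. It is not a reference solution. The function names redblack_step and interior_average are hypothetical, and treating a point as "red" when h + l is even (a checkerboard) is an assumption about what the red and black boxes in the slides mean. The update formula is the one from simtemp.c. Because the points are visited in a different order than in the sequential sweep, the numerical result may differ slightly from the sequential program.

```c
/* One timestep of a fine-grained red-black update on an H x L strip.
 * Assumption: interior points with (h + l) even are "red", odd are
 * "black". All four neighbours of a red point are black (or fixed
 * boundary), so updating every point of one color in parallel never
 * reads a value that another thread is writing. */
void redblack_step(int H, int L, double strip[H][L]) {
    for (int color = 0; color < 2; color++) {   /* 0 = red, 1 = black */
        #pragma omp parallel for
        for (int h = 1; h < H - 1; h++) {
            /* first interior column in row h with (h + l) % 2 == color */
            int start = 1 + ((h + 1 + color) % 2);
            for (int l = start; l < L - 1; l += 2)
                strip[h][l] = (strip[h][l] +
                               (strip[h-1][l] + strip[h+1][l] +
                                strip[h][l-1] + strip[h][l+1]) / 4.0) / 2.0;
        }
    }
}

/* Average of the interior points, summed with an OpenMP reduction. */
double interior_average(int H, int L, double strip[H][L]) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int h = 1; h < H - 1; h++)
        for (int l = 1; l < L - 1; l++)
            sum += strip[h][l];
    return sum / ((double)(H - 2) * (L - 2));
}
```

In this sketch the t loop stays sequential and calls redblack_step(H, L, strip) once per timestep. With GCC the pragmas take effect when compiling with -fopenmp; without that flag they are ignored and the same code runs serially, which is convenient for checking results.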

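For the data-distribution description requested in the report, the figure in the MPI section suggests splitting the strip by columns. The sketch below shows only the index arithmetic for such a split and makes no MPI calls. The helper names owned_columns and local_width are hypothetical. In an actual MPI program, rank and nprocs would come from MPI_Comm_rank and MPI_Comm_size, and the halo columns would be exchanged with the neighbouring ranks every timestep, for example with MPI_Sendrecv.

```c
/* Hypothetical helper: contiguous block of global columns owned by
 * `rank` when L columns are split as evenly as possible over nprocs
 * ranks. On return, *lo and *hi are the first and last owned columns. */
void owned_columns(int L, int nprocs, int rank, int *lo, int *hi) {
    int base  = L / nprocs;   /* columns every rank gets */
    int extra = L % nprocs;   /* the first `extra` ranks get one more */
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0) - 1;
}

/* Hypothetical helper: width of the local array on `rank`, i.e. the
 * owned columns plus one halo column for each neighbouring rank.
 * Rank 0 has no left neighbour; the last rank has no right neighbour. */
int local_width(int L, int nprocs, int rank) {
    int lo, hi;
    owned_columns(L, nprocs, rank, &lo, &hi);
    int halos = (rank > 0) + (rank < nprocs - 1);
    return (hi - lo + 1) + halos;
}
```

With L = 10000 and 8 processes, each rank owns 1250 columns; the two end ranks hold one halo column and the interior ranks hold two. One design point to weigh: with the row-major strip[H][L] layout a column halo is strided in memory, whereas a split by rows would make each halo a contiguous row.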