
OpenMP
CMPSC 450

OpenMP
• API for shared memory programming
• Joint effort by several compiler vendors to come up with a standard
o See openmp.org, began 1997
• Not a language: compiler directives and a runtime library
• C, C++, Fortran bindings
• Chapters 6 and 7 of the textbook
• Additional resources on Canvas

OpenMP API components
• Compiler directives
o #pragma omp parallel for
o #pragma omp critical
• Runtime library routines
o omp_get_num_threads()
o omp_get_wtime()
• Environment variables
o OMP_NUM_THREADS
o OMP_PROC_BIND

OpenMP general code structure

Parallel region construct
#pragma omp parallel [clause ...]
    if (scalar_expression)
    default (shared | none)
    private (list)
    shared (list)
    reduction (operator: list)
    num_threads (integer-expression)
{
    STRUCTURED_BLOCK
}

Data scope (or data sharing) attribute clauses
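• Variables declared before the parallel region (including globals and static variables) are shared by default
• Private by default: automatic variables declared inside the parallel region, and the loop variable of a work-sharing for loop
• Reduction clause: the variable is shared, but each thread works on a private copy; the copies are combined with the reduction operator at the end of the construct
• Additional attribute clauses
o Firstprivate: private copy initialized with the variable's value from the serial part of the code
o Lastprivate: the thread that executes the sequentially last loop iteration copies its value back to the original variable, which the master (serial) thread sees after the loop
o Copyprivate: broadcasts a value from the thread that executed a single construct to the private copies of the other threads
o Copyin: initializes each thread's copy of a threadprivate variable from the master thread's value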

Hello World in OpenMP
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();  // private: each thread gets its own ID
        printf(" hello(%d) ", ID);
        printf(" world(%d) \n", ID);
    }
    return 0;
}

OpenMP fork-join model

OpenMP calculate pi
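The code on this slide estimates π with the midpoint rule. Because the antiderivative of 4/(1+x²) is 4 arctan x, the integral equals 4(π/4) = π; here N is num_steps, Δx is step, and x_i is the midpoint x computed in the loop:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx
    \approx \Delta x \sum_{i=0}^{N-1} \frac{4}{1+x_i^2},
\qquad x_i = (i + 0.5)\,\Delta x, \qquad \Delta x = \frac{1}{N}
```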
#include <stdio.h>

static long num_steps = 100000;
double step;

int main(void)
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;             // midpoint of the i-th subinterval
        sum = sum + 4.0 / (1.0 + x * x);
    }
    pi = step * sum;
    printf("pi = %0.9f\n", pi);
    return 0;
}

OpenMP calculate pi
#include <stdio.h>
#include <omp.h>

static long num_steps = 1000000000;
double step;
#define NUM_THREADS 3
// define some space between our summing variables to
// alleviate false sharing: each thread's row is SPAN doubles long,
// so the partial sums d_sum[t][0] fall on different cache lines
#define SPAN 64

int main(void)
{
    int i_numThreads;
    double pi = 0.0;
    // step will be shared (read) among all threads
    step = 1.0 / (double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    // sum will be shared among all threads;
    // each thread will write into a separate array element
    double d_sum[NUM_THREADS][SPAN];
    #pragma omp parallel
    {
        // declare some private variables
        int i, i_nthreads, i_ID;
        double x;
        // this will range from 0 to i_nthreads - 1
        i_ID = omp_get_thread_num();
        // get total number of threads since
        // we aren't guaranteed what we request
        i_nthreads = omp_get_num_threads();
        // limit writing to i_numThreads to a single thread
        if (i_ID == 0) i_numThreads = i_nthreads;
        // each thread must start its partial sum at zero
        d_sum[i_ID][0] = 0.0;
        // cyclic distribution: thread i_ID takes i_ID, i_ID + i_nthreads, ...
        for (i = i_ID; i < num_steps; i += i_nthreads) {
            x = (i + 0.5) * step;
            d_sum[i_ID][0] = d_sum[i_ID][0] + 4.0 / (1.0 + x * x);
        }
    }
    // finish the summation by summing the individual results of the threads
    for (int i = 0; i < i_numThreads; i++)
        pi += d_sum[i][0] * step;
    printf("PI = %0.9f\n", pi);
    return 0;
}

OpenMP barrier
#pragma omp parallel
{
    /* code */
    #pragma omp barrier
    /* more code */
}
• No thread continues past the barrier until every thread in the team has reached it; faster threads wait for slower threads here.

Critical region
#pragma omp parallel
{
    /* code */
    #pragma omp critical [name]
    STRUCTURED_BLOCK
    /* code */
}
• Block executed by only one thread at a time
• Name is a global identifier: critical regions with the same name exclude each other (all unnamed critical regions share one name), while differently named regions may run concurrently

Critical region, example
int x = 0;
#pragma omp parallel num_threads(8) shared(x)
{
    #pragma omp critical
    {
        x = x + 1;
    }
}
• Not a very efficient example: the threads serialize on the critical section just to increment one integer; #pragma omp atomic or a reduction(+:x) clause does the same job with less overhead

Example: summation
• Summing 'n' values (32-bit signed integers) in an array A
• Assume there's no overflow, i.e., sum < 2^31 - 1.
for (i=0; i