OpenMP
CMPSC 450
OpenMP
• API for shared memory programming
• Joint effort by several compiler vendors to come up with a standard
o See openmp.org, began 1997
• Not a language: compiler directives, library
• C, C++, Fortran bindings
• Chapters 6 and 7 of the textbook
• Additional resources on Canvas
OpenMP API components
• Compiler directives
o #pragma omp parallel for
o #pragma omp critical
• Runtime library routines
o omp_get_num_threads()
o omp_get_wtime()
• Environment variables
o OMP_NUM_THREADS
o OMP_PROC_BIND
OpenMP general code structure
Parallel region construct
#pragma omp parallel [clause ...]
    if (scalar_expression)
    default (shared | none)
    private (list)
    shared (list)
    reduction (operator: list)
    num_threads (integer-expression)
{
    STRUCTURED_BLOCK
}
Data scope (or data sharing) attribute clauses
• Most variables are shared by default
• Private: automatic variables declared inside parallel regions
• Reduction clause: the variable is shared, but each thread works on its own private copy; the copies are combined into the shared variable at the end
• Additional attribute clauses
o Firstprivate – each private copy is initialized with the variable's value from the serial part of the code
o Lastprivate – the thread that executes the final loop iteration copies its value back to the master (serial) thread
o Copyprivate
o Copyin
Hello World in OpenMP
#include <stdio.h>
int main(void) {
    #pragma omp parallel
    {
        int ID = 0;
        printf(" hello(%d) ", ID);
        printf(" world(%d) \n", ID);
    }
    return 0;
}
OpenMP fork-join model
OpenMP calculate pi
#include <stdio.h>
static long num_steps = 100000;
double step;
int main(void)
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    for (i = 0; i < num_steps; i++)
    {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
    printf("PI = %0.9f\n", pi);
    return 0;
}
OpenMP calculate pi
#include <stdio.h>
#include <omp.h>

static long num_steps = 1000000000;
double step;

#define NUM_THREADS 3
// define some space between our summing variables to
// alleviate false sharing
#define SPAN 64

int main(void)
{
    int i_numThreads;
    double pi = 0.0;

    // step will be shared (read) among all threads
    step = 1.0 / (double) num_steps;
    omp_set_num_threads(NUM_THREADS);

    // sum will be shared among all threads
    // each thread will write into a separate array element
    double d_sum[NUM_THREADS][SPAN];

    #pragma omp parallel
    {
        // declare some private variables
        int i, i_nthreads, i_ID;
        double x;

        // this will range from 0 to nThreads - 1
        i_ID = omp_get_thread_num();

        // get total number of threads since
        // we aren't guaranteed what we request
        i_nthreads = omp_get_num_threads();

        // limit writing to numThreads to a single thread
        if (i_ID == 0) i_numThreads = i_nthreads;

        // each thread zeroes its own accumulator before summing
        d_sum[i_ID][0] = 0.0;

        for (i = i_ID; i < num_steps; i += i_nthreads)
        {
            x = (i + 0.5) * step;
            d_sum[i_ID][0] = d_sum[i_ID][0] + 4.0 / (1.0 + x*x);
        }
    }

    // finish the summation by summing the individual results of the threads
    for (int i = 0; i < i_numThreads; i++)
        pi += d_sum[i][0] * step;

    printf("PI = %0.9f\n", pi);
    return 0;
}
OpenMP barrier
#pragma omp parallel
{
    /* code */
    #pragma omp barrier
    /* more code */
}
• Every thread in the team must reach the barrier before any of them continues. Faster threads wait here for slower threads.
Critical region
#pragma omp parallel
{
    /* code */
    #pragma omp critical [name]
    STRUCTURED_BLOCK
    /* code */
}
• Block executed by only one thread at a time
• Name is a global identifier
Critical region, example
int x = 0;
#pragma omp parallel num_threads(8) shared(x)
{
    #pragma omp critical
    {
        x = x + 1;
    }
}
• Not a very efficient example
Example: summation
• Summing ‘n’ values (32-bit signed integers) in an array A
• Assume there’s no overflow, sum < 2^31 – 1.
for (i=0; i