
Introduction

Introduction to OpenMP
(Originally for CS 838, Wisconsin-Madison)
Shuaiwen Leon Song
Slides are derived from online references from
Lawrence Livermore National Laboratory, the National Energy Research Scientific Computing Center (NERSC), the University of Minnesota, and OpenMP.org

Introduction to OpenMP
What is OpenMP?
Open specification for Multi-Processing
“Standard” API for defining multi-threaded shared-memory programs
openmp.org – Talks, examples, forums, etc.
computing.llnl.gov/tutorials/openMP/
portal.xsede.org/online-training
www.nersc.gov/assets/Uploads/XE62011OpenMP.pdf

High-level API
Preprocessor (compiler) directives ( ~ 80% )
Library Calls ( ~ 19% )
Environment Variables ( ~ 1% )
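
As a rough sketch of how those three layers combine (not part of the original slides; the -fopenmp build flag is a GCC-specific assumption):

  #include <stdio.h>
  #include <omp.h>                      // runtime library calls are declared here

  int main() {
    int i, a[100];

    // Directive: ask the compiler to parallelize this loop
    #pragma omp parallel for
    for( i = 0; i < 100; i++ ) {
      a[i] = i * i;
    }

    // Library call: query how many threads the runtime may use
    printf( "a[99] = %d, max threads = %d\n", a[99], omp_get_max_threads() );
    return 0;
  }

  // Environment variable (set in the shell): OMP_NUM_THREADS=4 ./a.out
  // Build flag is compiler-specific (e.g., -fopenmp for GCC)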

A Programmer’s View of OpenMP
OpenMP is a portable, threaded, shared-memory programming specification with “light” syntax
Exact behavior depends on OpenMP implementation!
Requires compiler support (C, C++ or Fortran)

OpenMP will:
Allow a programmer to view a program as serial regions and parallel regions, rather than as T concurrently-executing threads.
Hide stack management
Provide synchronization constructs

OpenMP will not:
Parallelize automatically
Guarantee speedup
Provide freedom from data races
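
To make the last point concrete, here is a small illustrative sketch (the count_even function is hypothetical): the directive parallelizes the loop, but the shared counter is only correct because the programmer added a synchronization construct; OpenMP itself would happily compile the racy version.

  #include <omp.h>

  int count_even( const int *a, int n ) {
    int i, count = 0;

    #pragma omp parallel for
    for( i = 0; i < n; i++ ) {
      if( a[i] % 2 == 0 ) {
        // An unprotected count++ here would be a data race;
        // OpenMP does not detect or prevent it for you.
        #pragma omp critical
        count++;
      }
    }
    return count;
  }

A reduction(+:count) clause on the parallel for would be the more idiomatic fix; the critical construct is used here only to make the synchronization explicit.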

Outline
Introduction
Motivating example
Parallel Programming is Hard

OpenMP Programming Model
Easier than PThreads

Microbenchmark Performance Comparison
vs. PThreads

Discussion
SPEC OMP

Current Parallel Programming
Start with a parallel algorithm
Implement, keeping in mind:
Data races
Synchronization
Threading Syntax
Test & Debug
Debug
Debug

Motivation – Threading Library

#include <stdio.h>
#include <pthread.h>

void* SayHello(void *foo) {
  printf( "Hello, world!\n" );
  return NULL;
}

int main() {
  pthread_attr_t attr;
  pthread_t threads[16];
  int tn;
  pthread_attr_init(&attr);
  pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
  for( tn=0; tn<16; tn++ ) {
    pthread_create(&threads[tn], &attr, SayHello, NULL);
  }
  for( tn=0; tn<16; tn++ ) {
    pthread_join(threads[tn], NULL);
  }
  return 0;
}

Motivation
Thread libraries are hard to use
P-Threads/Solaris threads have many library calls for initialization, synchronization, thread creation, condition variables, etc.
Programmer must code with multiple threads in mind
Synchronization between threads introduces a new dimension of program correctness

Motivation
Wouldn’t it be nice to write serial programs and somehow parallelize them “automatically”?
OpenMP can parallelize many serial programs with relatively few annotations that specify parallelism and independence
OpenMP is a small API that hides cumbersome threading calls behind simpler directives

Better Parallel Programming
Start with some algorithm
Embarrassing parallelism is helpful, but not necessary
Implement serially, ignoring:
Data Races
Synchronization
Threading Syntax
Test and Debug
Automatically (magically?) parallelize
Expect linear speedup

Motivation – OpenMP

  int main() {
    // Do this part in parallel
    printf( "Hello, World!\n" );
    return 0;
  }

Motivation – OpenMP

  int main() {
    omp_set_num_threads(16);

    // Do this part in parallel
    #pragma omp parallel
    {
      printf( "Hello, World!\n" );
    }

    return 0;
  }

OpenMP Parallel Programming
Start with a parallelizable algorithm
Embarrassing parallelism is good, loop-level parallelism is necessary
Implement serially, mostly ignoring:
Data Races
Synchronization
Threading Syntax
Test and Debug
Annotate the code with parallelization (and synchronization) directives
Hope for linear speedup
Test and Debug

LLNL OpenMP Tutorial
There are no better materials than the LLNL OpenMP tutorial. We will now go through some of the important points using it:
https://computing.llnl.gov/tutorials/openMP/

Programming Model
Because OpenMP is designed for shared-memory parallel programming, it is largely limited to single-node parallelism.
Typically, the number of processing elements (cores) on a node determines how much parallelism can be exploited.

Programming Model – Concurrent Loops
OpenMP easily parallelizes loops
Requires: no data dependencies (read/write or write/write pairs) between iterations!
Preprocessor calculates loop bounds for each thread directly from the serial source

  #pragma omp parallel for
  for( i=0; i < 25; i++ ) {
    printf( "Foo" );
  }

Motivation of using OpenMP

Programming Model – Threading
Serial regions by default, annotate to create parallel regions
Generic parallel regions
Parallelized loops
Sectioned parallel regions
Thread-like Fork/Join model
Arbitrary number of logical thread creation/destruction events

Programming Model – Threading

  int main() {
    // serial region
    printf( "Hello…" );

    // parallel region
    #pragma omp parallel
    {
      printf( "World" );
    }

    // serial again
    printf( "!" );
  }

Output (with 4 threads): Hello…WorldWorldWorldWorld!
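
A hedged variant of the example above (not in the original deck): labeling each print with omp_get_thread_num() makes the fork/join structure visible, since the parallel region body runs once per thread while the serial prints run only on the master thread.

  #include <stdio.h>
  #include <omp.h>

  int main() {
    printf( "Hello...\n" );          // serial region: master thread only (before the fork)

    #pragma omp parallel             // fork: the region body runs once per thread
    {
      printf( "World from thread %d\n", omp_get_thread_num() );
    }                                // join: implicit barrier, back to one thread

    printf( "!\n" );                 // serial again: master thread only
    return 0;
  }

With OMP_NUM_THREADS=4 this prints one "World from thread k" line per thread, in a nondeterministic order, between the two serial lines.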
Programming Model – Nested Threading
Fork/Join can be nested
Nesting complication handled “automagically” at compile-time
Independent of the number of threads actually running

Programming Model – Thread Identification
Master Thread
Thread with ID=0
Only thread that exists in sequential regions
Depending on implementation, may have special purpose inside parallel regions
Some special directives affect only the master thread (like master)

Programming Model – Data/Control Parallelism
Data parallelism
Threads perform similar functions, guided by thread identifier
Control parallelism
Threads perform differing functions
One thread for I/O, one for computation, etc…

Programming Model – Concurrent Loops
OpenMP easily parallelizes loops
No data dependencies between iterations!
Preprocessor calculates loop bounds for each thread directly from the serial source

  #pragma omp parallel for
  for( i=0; i < 25; i++ ) {
    printf( "Foo" );
  }

Programming Model – Loop Scheduling
schedule clause determines how loop iterations are divided among the thread team
static([chunk]) divides iterations statically between threads
Each thread receives [chunk] iterations, rounding as necessary to account for all iterations
Default [chunk] is ceil( # iterations / # threads )
dynamic([chunk]) allocates [chunk] iterations per thread, allocating an additional [chunk] iterations when a thread finishes
Forms a logical work queue, consisting of all loop iterations
Default [chunk] is 1
guided([chunk]) allocates dynamically, but [chunk] is exponentially reduced with each allocation

Programming Model – Loop Scheduling

  // Serial loop
  for( i=0; i<16; i++ ) {
    doIteration(i);
  }

  // Static Scheduling by hand (T threads, this thread's ID is tid)
  int chunk = 16/T;
  int base  = tid * chunk;
  int bound = (tid+1) * chunk;
  for( i=base; i<bound; i++ ) {
    doIteration(i);
  }
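
With OpenMP directives, the manual chunking above is expressed with the schedule clause instead; a minimal sketch (assuming the doIteration function from the slide, and a hypothetical wrapper run_examples):

  void doIteration( int i );   // assumed to exist, as in the slide

  void run_examples( void ) {
    // Fixed chunks of 4 iterations per assignment
    #pragma omp parallel for schedule(static, 4)
    for( int i = 0; i < 16; i++ ) {
      doIteration(i);
    }

    // Logical work queue, default chunk of 1
    #pragma omp parallel for schedule(dynamic)
    for( int i = 0; i < 16; i++ ) {
      doIteration(i);
    }

    // Chunk size shrinks as iterations are handed out
    #pragma omp parallel for schedule(guided)
    for( int i = 0; i < 16; i++ ) {
      doIteration(i);
    }
  }

Each clause maps onto the policy described above; the chunk argument is optional, and the static default of roughly #iterations/#threads matches the hand-written version.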