Information Technology
FIT3143 – LECTURE WEEK 3
PARALLEL COMPUTING ON SHARED MEMORY WITH OPENMP
1. OpenMP for shared memory parallel programming
Associated learning outcomes
• Explain the fundamental principles of parallel computing architectures and algorithms (LO1)
• Design and develop parallel algorithms for various parallel computing architectures (LO3)
1) Programming with OpenMP
– Thread serial code with basic OpenMP pragmas
– Use OpenMP synchronization pragmas to coordinate thread execution and memory access
What Is OpenMP (or Open Multi-Processing)?
❑ Compiler directives for multithreaded programming
❑ Easy to create threaded Fortran and C/C++ codes
❑ Supports data parallelism model
❑ Incremental parallelism
❑ Combines serial and parallel code in single source
❑ Incremental parallelism means you can parallelize one portion of the code and test it for better performance, then move on to another part of the code to test. Explicit threading models, by contrast, require changes to all parts of the code affected by the threading.
❑ The serial code can be "retrieved" simply by compiling without the OpenMP options turned on, assuming no drastic changes were required to get the code into a state that could be parallelized. A minimal sketch of this is shown below.
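To make this concrete, here is a minimal sketch (not taken from the lecture; the array a, the size N, and the loop body are illustrative assumptions). Compiled with OpenMP support (e.g. gcc -fopenmp), the single added pragma parallelizes the loop; compiled without it, the pragma is ignored and the serial behaviour is retrieved.

#include <stdio.h>
#define N 8

int main(void) {
    int a[N];
    int i;
    /* The pragma below is the only change made to the serial code */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = i * i;
    for (i = 0; i < N; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}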
OpenMP* Architecture
❑ Fork-join model // execution model of threads in OpenMP parallel regions
❑ Work-sharing constructs // expressed as pragmas and directives
❑ Data environment constructs // expressed as pragmas and directives
❑ Synchronization constructs // expressed as pragmas and directives
❑ Extensive Application Program Interface (API) for finer control
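As a hedged sketch (not from the slides) of how these categories appear in code, the fragment below touches each one: a parallel region (fork-join), a for work-sharing construct, a private clause (data environment), and a critical construct (synchronization).

#include <stdio.h>

int main(void) {
    int total = 0;                    /* shared by default */
    int i;
    #pragma omp parallel private(i)   /* fork-join region; private(i) is a data environment clause */
    {
        #pragma omp for               /* work-sharing: iterations split across the team */
        for (i = 0; i < 100; i++) {
            #pragma omp critical      /* synchronization: one thread updates total at a time */
            total += i;
        }
    }                                 /* implicit join at the end of the region */
    printf("total = %d\n", total);    /* prints 4950 */
    return 0;
}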
Programming Model
Fork-join parallelism:
❑ Master thread spawns a team of threads as needed
❑ Parallelism is added incrementally: the sequential program evolves into a parallel program
The application starts execution serially with the master thread. At each parallel region encountered, threads are forked off, execute concurrently, and then join together at the end of the region.
[Figure: timeline of the master thread forking into teams of threads at each parallel region and joining back afterwards]
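A small sketch of the fork-join behaviour (illustrative, not from the slides; it uses the API calls omp_get_thread_num and omp_get_num_threads, which are covered later in this lecture):

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("serial: master thread only\n");

    #pragma omp parallel                       /* fork: a team of threads starts here */
    {
        printf("thread %d of %d in the team\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                          /* join: the team synchronizes here */

    printf("serial: master thread resumes\n");
    return 0;
}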
OpenMP* Pragma Syntax
Most constructs in OpenMP* are compiler directives or pragmas. For C and C++, the pragmas take the form:

#pragma omp construct [clause [clause]…]

Parallel Regions
❑ The 'parallel' pragma defines a parallel region over a structured block of code
❑ Threads are created as the 'parallel' pragma is crossed
❑ Threads block at the end of the region
❑ Data is shared among threads unless specified otherwise

C/C++:

#pragma omp parallel
{
    /* structured block */
}

❑ The pragma operates over a single statement or a block of statements enclosed within curly braces.
❑ Variables accessed within the parallel region are all shared by default.
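To illustrate the default sharing rule, a brief sketch (illustrative, not from the slides): variables declared before the region are shared by every thread, while variables declared inside the region are private to each thread.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int shared_val = 42;                    /* declared before the region: shared by default */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();     /* declared inside the region: private to each thread */
        printf("thread %d sees shared_val = %d\n", tid, shared_val);
    }
    return 0;
}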
How Many Threads?
Set the environment variable for the number of threads:

set OMP_NUM_THREADS=4           (Windows)
export OMP_NUM_THREADS=4        (Linux/macOS shells)

There is no standard default for this variable. On many systems:
❑ # of threads = # of processors
❑ Intel compilers use this default

The order in which the system determines the number of threads is:
1. default
2. environment variable
3. API call
Each successive method (if present) overrides the previous, as sketched below.
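A sketch of this precedence (illustrative; assumes OMP_NUM_THREADS may be set in the environment before running):

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Team size comes from the implementation default, or from
       OMP_NUM_THREADS if that environment variable is set */
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            printf("before API call: %d threads\n", omp_get_num_threads());
    }

    omp_set_num_threads(2);       /* API call: overrides the environment variable */
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            printf("after API call: %d threads\n", omp_get_num_threads());
    }
    return 0;
}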
Parallel Code

#include <stdio.h>

int main() {
    int i;
    #pragma omp parallel
    {
        printf("Hello World\n");
        #pragma omp for
        for (i = 0; i < 6; i++)
            printf("Iter:%d\n", i);
    }
    printf("GoodBye World\n");
    return 0;
}

Serial Code

#include <stdio.h>

int main() {
    int i;
    printf("Hello World\n");
    for (i = 0; i < 6; i++)
        printf("Iter:%d\n", i);
    printf("GoodBye World\n");
    return 0;
}
Work-sharing Construct - “for” work-sharing pragma
❑ Splits loop iterations among the threads
❑ Must be in the parallel region
❑ Must precede the loop

#pragma omp parallel
{
    #pragma omp for
    for (i = 0; i < N; i++) {
        /* loop body: iterations are divided among the threads */
    }
}
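A complete sketch of the for work-sharing pragma (illustrative; the arrays and the size N are assumptions, not from the slides):

#include <stdio.h>
#define N 12

int main(void) {
    int a[N], b[N], c[N], i;
    for (i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    #pragma omp parallel
    {
        #pragma omp for               /* iterations 0..N-1 are split among the team */
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }                                 /* implicit barrier at the end of the for construct */

    for (i = 0; i < N; i++)
        printf("c[%d] = %d\n", i, c[i]);
    return 0;
}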
OpenMP* API
Get the thread number within a team:

int omp_get_thread_num(void);

Get the number of threads in a team:

int omp_get_num_threads(void);

Usually not needed for OpenMP codes
❑ Can lead to code not being serially consistent
❑ Does have specific uses (debugging)
❑ Must include a header file: #include <omp.h>

API calls are usually not needed: the assignment of loop iterations and other computations to threads is already built into OpenMP.
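A short debugging sketch (illustrative) using both calls; note that outside a parallel region the team consists of a single thread, so omp_get_num_threads() returns 1 there:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();        /* 0 .. team size - 1 */
        int nthreads = omp_get_num_threads();  /* size of the current team */
        printf("debug: thread %d of %d\n", tid, nthreads);
    }
    printf("serial region: %d thread(s)\n", omp_get_num_threads());  /* prints 1 */
    return 0;
}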
Summary
– Basic architecture
– Parallelism with #pragma construct
– Critical section and atomicity
– Additional constructs (reduction, sections, etc.)