
High Performance Computing
Course Notes

Shared Memory Parallel
Programming

Dr Ligang He

Computer Science, University of Warwick

OpenMP

OpenMP stands for Open specification for Multi-Processing

It is used to help compilers understand and parallelise serial code

Can be used to specify shared memory parallelism in
Fortran, C and C++ programs

OpenMP is a specification for

 a set of compiler directives,
 run-time library routines, and
 environment variables


History of OpenMP

 In the late 1980s, shared memory parallel computers emerged, each
with its own proprietary directive-driven programming environment

 These environments offered poor portability, so OpenMP emerged as
an industry standard

 OpenMP specifications include:

 OpenMP 1.0 for Fortran, 1997, OpenMP 1.0 for C/C++, 1998

 OpenMP 2.0 for Fortran, 2000, OpenMP 2.0 for C/C++ , 2002

 OpenMP 2.5 for C/C++ and Fortran, 2005

 OpenMP 3.0 for C/C++ and Fortran, 2008

 OpenMP 3.1, 2011

 OpenMP 4.0, 2013, OpenMP 4.5, 2015

 OpenMP Architecture Review Board: Compaq, HP, IBM,
Intel, SGI, SUN


An implementation of thread models

 Multiple threads running in parallel

Used for shared memory architecture

 Fork-join model

OpenMP programming model


How a new process is created

Use the fork() function

All three segments (text, data and stack) and
the program counter are duplicated


How Are Threads Created?

Only the stack segment and the program counter are duplicated


 Used to split a program into separate tasks, one per thread, that
can execute concurrently

 “Light weight process”: multiple threads exist within the context of a
single process, sharing the process’s code, global data and
other resources

 Threads usually communicate by reading and writing shared global
data

Global shared space – global data is accessed from a single global
address space (heap) shared among the threads

Local private space – each thread also has its own local private data
(stack) that is not shared

Threads


#include <omp.h>

int main() {

int var1, var2, var3;
Serial code

/*Beginning of parallel section. Fork a team of threads. Specify variable scoping*/
#pragma omp parallel private(var1, var2) shared(var3)

{
Parallel section executed by all threads

All threads join master thread and disband

}
Resume serial code
}

OpenMP code structure in C


PROGRAM HELLO

INTEGER VAR1, VAR2, VAR3

Serial code …

!Beginning of parallel section. Fork a team of threads.
!Specify variable scoping

!$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)

Parallel section executed by all threads

All threads join master thread and disband

!$OMP END PARALLEL

Resume serial code

END

OpenMP code structure in Fortran


OpenMP Directives Format

C/C++:

#pragma omp directive-name [clause, ...] newline

Fortran:

!$OMP directive-name [clause, ...]


 OpenMP directives are ignored by compilers that do not support
OpenMP; in that case, the code runs as serial code

 Compiler directives used to specify

 sections of code that can be executed in parallel

 critical sections

 Scope of variables (private or shared)

 Mainly used to parallelize loops, e.g. separate threads to handle
separate iterations of the loop

 There is also a run-time library that has several useful routines
for checking the number of threads and number of processors,
changing the number of threads, etc

OpenMP features


Multiple threads are created using the parallel construct

For C and C++

#pragma omp parallel

{

… do stuff

}

For Fortran

!$OMP PARALLEL

… do stuff

!$OMP END PARALLEL

Fork-Join Model


The number of threads in a parallel region is
determined by the following factors, in order of
precedence:

Use of the omp_set_num_threads() library
function

Setting of the OMP_NUM_THREADS
environment variable

Implementation default – the number of CPUs
on a node

Threads are numbered from 0 (master thread) to N-1

How many threads are generated


 A compiler directive specifies that a loop can be
executed in parallel

For C and C++:

#pragma omp parallel for
for (i = 0; i < N; i++) {
    /* independent loop iterations */
}

int main() {

int x;

x = 0;

#pragma omp parallel shared(x)

{

#pragma omp critical

x = x+1;

} /* end of parallel section */

}

Example of Critical Section


#include <stdio.h>

#include <omp.h>

int main (int argc, char *argv[]) {

int th_id, nthreads;

#pragma omp parallel private(th_id)

{

th_id = omp_get_thread_num();

printf("Hello World from thread %d\n", th_id);

#pragma omp barrier

if ( th_id == 0 ) {

nthreads = omp_get_num_threads();

printf("There are %d threads\n", nthreads);

}

} /* end of parallel region */

return 0;

}

Example of Barrier in OpenMP


 OpenMP Data Scope Attribute Clauses are used to
explicitly define how variables should be viewed by
threads

 These clauses are used in conjunction with several
directives (e.g. PARALLEL, DO/for) to control the
scoping of enclosed variables

 Three often encountered clauses:

 Shared

 Private

 Reduction

Data Scope Attributes in OpenMP


 private(var) creates a local copy of var for each thread

 shared(var) states that var is a global variable to be
shared among threads

Shared and private data in OpenMP


 Reduction –

 Example values of op are + and logical OR (commutative and
associative operations)

 A local copy of the variable is made for each thread

 The local values of the variable can be updated by the
threads.

 At the end of parallel region, the local values are combined
to create global value through Reduction operation

Reduction Clause

reduction (op : var)


An Example of Reduction Clause

double ZZ, res=0.0;

#pragma omp parallel for reduction (+:res) private(ZZ)

for (i=1; i<=N; i++) {
    ZZ = i;
    res = res + ZZ;
}

Can perform a variety of functions, including

Query the number of threads / the thread number

Set the number of threads to be generated

Query the number of processors in the computer

Change the number of threads

Run-Time Library Routines

 Query routines allow you to get the number of threads and the ID of
a specific thread

id = omp_get_thread_num(); // thread number

Nthreads = omp_get_num_threads(); // number of threads

 Can specify the number of threads at runtime

omp_set_num_threads(Nthreads);

Run-Time Library Routines

Controlling the execution of parallel code

Four environment variables:

OMP_SCHEDULE: how iterations of a loop are scheduled

OMP_NUM_THREADS: maximum number of threads

OMP_DYNAMIC: enable or disable dynamic adjustment of the number of threads

OMP_NESTED: enable or disable nested parallelism

Environment Variables

Lab session today

- Practice OpenMP

- Download lab instructions and code from here:
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/cs402_seminar2_openmp.pdf
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/cs402_seminar2_code.zip

- Move down to Lab 001 and 003

Assignment 1 - OpenMP

- Use OpenMP to parallelize the deqn code

- The overall objective is to achieve good speedup

- Write a report

- Explain in detail what you did with the sequential code

- Benchmark the runtime of each relevant loop and the runtime of the whole parallel program against the number of threads; present the runtimes in a graph or table; analyze the results

- Discuss the iteration scheduling in your program

- Analyze the overhead of OpenMP

- Presentation skills, spelling, punctuation and grammar

- Up to four A4 pages