Intro to Parallel Computing
Topic 5 – Introduction to OpenMP
COSC 407: Intro to Parallel Computing
Previously:
– Intro to parallel computing
• important concepts and terminology.
– Intro to POSIX Threads
• Basics, HelloWorld
• Distributing the work
• Example: Summing it all up (an array)
This topic:
• Writing programs that use OpenMP.
• Exploiting a powerful OpenMP feature:
  • Parallelizing serial for loops with only small changes to the source code.
• Using other OpenMP features:
  • Task parallelism.
  • Explicit thread synchronization.
• Addressing standard problems in shared-memory programming.
Shared Memory
§ OpenMP = Open Multi-Processing
§ An Application Program Interface (API) for multi-threaded, shared-memory parallel programming.
– Designed for systems in which each thread or process can potentially have access to all available memory.
– The system is viewed as a collection of cores or CPUs, all of which have access to main memory.
OpenMP vs Pthreads
§ OpenMP
– Higher level
– Programmer only states that a block of code is to be executed in parallel and delegates the "how to" to the compiler & runtime
– Requires compiler support (some compilers cannot compile OpenMP programs)
§ Pthreads
– Lower level
– Requires the programmer to explicitly specify the behavior of each thread
– Library of functions to be linked to a C program (can use any compiler)
Advantages of OpenMP
§ Good performance and scalability – If you do it right
§ Requires little programming effort (relatively!)
§ De-facto and mature standard
§ Portable programs (supported by a large number of compilers)
§ Allows the program to be parallelized incrementally
§ Ideally suited for multicore systems
Compiling and Running From the Command Line

Compiling:
gcc -g -Wall -fopenmp -o omp_hello omp_hello.c

Running (number of threads set with an OpenMP environment variable):
./omp_hello

Possible outputs (three different runs):

Hello from thread 0 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 3 of 4

Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 0 of 4
Hello from thread 3 of 4

Hello from thread 3 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 0 of 4
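For instance, assuming a bash-style shell, the thread count for the run above could be set with the OMP_NUM_THREADS environment variable (introduced later in this topic):

export OMP_NUM_THREADS=4    # subsequent runs will request 4 threads
./omp_hello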
Compiling and Running with Eclipse
On Eclipse for Scientific Computing:
– Make sure your MinGW is installed and properly configured.
– Create your code:
  • Create a new "C Project" > "OpenMP Empty C Project"
  • Then create a new "Source File"
– Build your project
  • I created a shortcut for that!
– Run your project
  • Default shortcut: Ctrl+F11
  • Eclipse will take care of the command-line flags.
§ Note that you must have the C compiler, OpenMP, and Eclipse installed properly (see Canvas)
Two ways to divide the work:
1. Task parallelism
  • Partition the various tasks carried out in solving the problem among the cores.
2. Data parallelism
  • Partition the data used in solving the problem among the cores.
  • Each core carries out similar operations on its part of the data.
OpenMP API
§ OpenMP specifications:
– http://www.openmp.org/specifications
§ OpenMP is based on directives
– Simple syntax
– Risk: you must understand exactly what the compiler will do, otherwise your program will not function as expected
OpenMP API has three components:
1. Compiler directives (#)
• e.g., #pragma omp parallel
2. Runtime library routines
• e.g.,omp_get_thread_num()
3. Environment variables
• e.g., setting an environment variable: set OMP_NUM_THREADS=3
• We will rarely use or refer to them (as we can set the number of threads directly in code)
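As a minimal sketch of setting the thread count directly in code (omp_set_num_threads() is a standard OpenMP runtime routine; the surrounding program is an assumed illustration, not from the slides):

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(3);    // request 3 threads for later parallel regions
    #pragma omp parallel
    printf("Hello from thread %d\n", omp_get_thread_num());
    return 0;
}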
OpenMP and Threads
§ OpenMP implements parallelism exclusively using threads
§ Remember Process vs Thread:
– Threads exist within a process.
– Execution:
  • Both threads and processes are units of execution (tasks)
  • Processes will, by default, not share memory.
  • Threads of the same process will, by default, have access to the same shared memory (the process memory)
– Threads are lightweight
Fork – Join Model
§ OpenMP uses the Fork-Join model.
(Diagram: the master thread runs a sequential region, forks a team of threads for each parallel region, and joins back to a single thread for the next sequential region, with synchronization at each join.)
§ Synchronization here means that every thread must wait until all threads have finished before proceeding to the next region.
§ The collection of threads executing the parallel block is called a team: the original thread is called the master, and the additional threads are called slaves
§ Each thread takes an ID (master always has ID 0)
OpenMP: pragmas
§ Special preprocessor instructions that are added to a system to allow behaviors that aren’t part of the basic C specification.
– Compilers that don't support the pragmas ignore them.
  • i.e., if the compiler does not support them, the program will still yield correct behavior, but without any parallelism.
§ Syntax:
#pragma omp directive [clause [clause] ...]
– Directive: specifies the required directive
• The most basic directive: #pragma omp parallel
– The parallel construct forms a team of threads and starts parallel execution.
– Clause: information to modify the directive.
• e.g.: #pragma omp parallel num_threads(10)
– Continuation: use \ in pragma
• e.g. #pragma omp parallel \
num_threads(10)
OpenMP pragmas
# pragma omp parallel
– Most basic parallel directive.
– The number of threads that run the following structured block of code is determined by the run-time system.
– There is an implicit barrier at the end of this directive
  • barriers are discussed later ....

# pragma omp parallel num_threads( thread_count )
– The num_threads clause allows the programmer to specify the number of threads that should execute the following block
– There may be system-defined limitations on the number of threads that a program can start
– The OpenMP standard doesn't guarantee that this will actually start thread_count threads
– Most current systems can start hundreds or even thousands of threads. Unless we're trying to start a lot of threads, we will almost always get the desired number of threads.
Hello World Serial Version

#include <stdio.h>
int main() {
    printf("Hello World!\n");
    return 0;
}

Output:
Hello World!
Hello World Parallel Version

#include <stdio.h>
int main() {
    #pragma omp parallel      // parallel region: sent to all threads!
    printf("Hello World!\n");
    return 0;
}

Output (4 threads):
Hello World!
Hello World!
Hello World!
Hello World!
Both Serial and Parallel in One Program

#include <stdio.h>
#include <omp.h>
int main() {
    printf("Hello Sequential!\n");   // sequential: performed only by the master thread
    #pragma omp parallel
    printf("Hello Parallel!\n");     // parallel: the SAME task is sent to all threads
    return 0;
}

Output (4 threads):
Hello Sequential!
Hello Parallel!
Hello Parallel!
Hello Parallel!
Hello Parallel!
Hello World Parallel Version (again)

Printing the thread ID:

#include <stdio.h>
#include <omp.h>
int main() {
    #pragma omp parallel
    {
        int my_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", my_id);
    }
    return 0;
}

Possible Output:
Hello World from thread 2
Hello World from thread 0
Hello World from thread 1
Hello World from thread 3

Note: use curly braces when the parallel region has more than one statement.
Hello World Parallel Version (again), cont'd

Printing the thread ID and total number of threads:

#include <stdio.h>
#include <omp.h>
int main() {
    #pragma omp parallel
    {
        int my_id = omp_get_thread_num();
        int tot = omp_get_num_threads();
        printf("Hello World from thread %d/%d\n", my_id, tot);
    }
    return 0;
}

Possible Output:
Hello World from thread 1/4
Hello World from thread 0/4
Hello World from thread 2/4
Hello World from thread 3/4
Hello World Parallel Version (again), cont'd

Specifying the required number of threads:

#include <stdio.h>
#include <omp.h>
int main() {
    #pragma omp parallel num_threads(3)
    {
        int my_id = omp_get_thread_num();
        int tot = omp_get_num_threads();
        printf("Hello World from thread %d/%d\n", my_id, tot);
    }
    return 0;
}

Possible Output:
Hello World from thread 2/3
Hello World from thread 0/3
Hello World from thread 1/3
Hello World Parallel Version (again), cont'd

Using a function in the parallel region:

#include <stdio.h>
#include <omp.h>
void Hello();
int main() {
    #pragma omp parallel num_threads(3)
    Hello();
    return 0;
}
void Hello() {
    int my_id = omp_get_thread_num();
    int count = omp_get_num_threads();
    printf("Hello World from thread %d/%d\n", my_id, count);
}

Possible Output:
Hello World from thread 2/3
Hello World from thread 0/3
Hello World from thread 1/3
Interleaved Execution

#pragma omp parallel num_threads(4)
{
    printf("T%d:A \n", omp_get_thread_num());
    printf("T%d:B \n", omp_get_thread_num());
}
printf("T%d:Done \n", omp_get_thread_num());

• Output is interleaved, BUT any thread has to print A first and then B.
• "Done" is only printed after everyone is done (synchronization).

Possible outputs (each column is one run):

Run 1    Run 2    Run 3    Run 4
T0:A     T1:A     T1:A     T2:A
T0:B     T2:A     T0:A     T2:B
T1:A     T1:B     T1:B     T1:A
T1:B     T2:B     T0:B     T1:B
T2:A     T0:A     T3:A     T0:A
T2:B     T0:B     T3:B     T0:B
T3:A     T3:A     T2:A     T3:A
T3:B     T3:B     T2:B     T3:B
T0:Done  T0:Done  T0:Done  T0:Done
Distributing Tasks
§ You can use the thread ID to assign tasks to different threads

#pragma omp parallel num_threads(4)
{
    int id = omp_get_thread_num();
    printf("T%d:A\n", id);
    printf("T%d:B\n", id);
    if (id == 2)
        printf("T2:special task\n");
}
printf("End");

• Only T2 runs the special task.
• There is no specific order for the threads, BUT the statements within the same thread are ordered.

One possible output:
T0:A
T0:B
T1:A
T1:B
T2:A
T2:B
T2:special task
T3:A
T3:B
End
Distributing Data
§ You can also use the thread ID to distribute data over different threads

int list[6] = {0,1,2,3,4,5};
#pragma omp parallel num_threads(2)
{
    int id = omp_get_thread_num();
    int my_a = id * 3;        // first index for this thread
    int my_b = id * 3 + 3;    // one past the last index
    printf("T%d will process indexes %d to %d\n", id, my_a, my_b-1);
    for (int i = my_a; i < my_b; i++)
        printf("T%d:processing list[%d]\n", id, i);
}
printf("End");
return 0;

Possible Output 1:
T0 will process indexes 0 to 2
T0:processing list[0]
T0:processing list[1]
T0:processing list[2]
T1 will process indexes 3 to 5
T1:processing list[3]
T1:processing list[4]
T1:processing list[5]
End

Possible Output 2:
T0 will process indexes 0 to 2
T1 will process indexes 3 to 5
T0:processing list[0]
T1:processing list[3]
T0:processing list[1]
T1:processing list[4]
T0:processing list[2]
T1:processing list[5]
End
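The example above hard-codes 2 threads with 3 elements each. A minimal sketch of the same idea generalized to any thread count and array size; the block-distribution arithmetic here (spreading the remainder over the first threads) is an assumed standard technique, not from the slides:

#include <stdio.h>
#include <omp.h>

int main() {
    int list[10] = {0,1,2,3,4,5,6,7,8,9};
    int n = 10;
    #pragma omp parallel num_threads(4)
    {
        int id    = omp_get_thread_num();
        int count = omp_get_num_threads();
        // Block distribution: first (n % count) threads get one extra element
        int chunk = n / count;
        int rem   = n % count;
        int my_a  = id * chunk + (id < rem ? id : rem);
        int my_b  = my_a + chunk + (id < rem ? 1 : 0);
        for (int i = my_a; i < my_b; i++)
            printf("T%d:processing list[%d]\n", id, i);
    }
    return 0;
}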
A Toy Problem
§ The aim is to compute the sum of the values in an array
– While this is a straightforward problem, it highlights the key things that need to be considered when dealing with parallelization
§ One way to do this is to divide the array of values into a series of sub-arrays
– Find the sum of each
– Then sum the totals from the sub-arrays
The Serial Algorithm
(Slide shows the serial code: a single loop accumulating the sum of the array.)
Dividing the Work!
(Diagram: one thread does all the work vs. "get some help!": divide the array into n sub-arrays and compute each sum in parallel.)
Sum Calculation: Parallel Version
Two types of tasks:
a) Computing the sums of the sub-arrays
b) Adding the partial sums to compute the total sum

There is no communication among the tasks in (a) (assuming that each thread can access its data separately), but the tasks communicate in (b) (bringing the results back together)
– Potential issues?

We want to assign a single thread to each core
– There could be more sub-arrays than physical cores....
Sum Calculation: Parallel Version, cont'd
§ Manually divide the work among threads
– Use the "thread ID" and "thread count" to:
  • Compute local values for your calculations
  • Make decisions about which thread executes code
– In this example,
  • Use the "thread count" to calculate the number of array slices
  • Use the thread ID to determine the start and end of the sub-array processed by each thread
  • Ensure no two threads perform the same calculations
§ Use a private (local) variable to hold the local results and eventually either return them or add them to a global variable (see the parallel code sketch below)
Serial Code
(Slide shows the serial summation code, using a timer to measure execution time.)
Reported timing: Time: 1.735000 ms for n = 1000000
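A minimal sketch of what such serial code could look like; the array contents and variable names are assumptions, and omp_get_wtime() (a standard OpenMP routine returning wall-clock seconds) is one way to measure the execution time:

#include <stdio.h>
#include <omp.h>

#define N 1000000

int a[N];    // the array to be summed

int main() {
    for (int i = 0; i < N; i++) a[i] = 1;   // fill with sample data

    double start = omp_get_wtime();
    long sum = 0;
    for (int i = 0; i < N; i++)             // one thread does all the work
        sum += a[i];
    double end = omp_get_wtime();

    printf("sum = %ld\n", sum);
    printf("Time: %f ms\n", (end - start) * 1000);
    return 0;
}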
Parallel Code
(Slide shows the parallel summation code.)
Reported timing: Time: 0.564000 ms for n = 1000000
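A minimal sketch of the parallel version, following the manual-division recipe above. The names, the chunk arithmetic, and the use of a critical section to safely combine the private partial sums are assumptions (mutual exclusion with critical is covered in the next topic):

#include <stdio.h>
#include <omp.h>

#define N 1000000
#define THREAD_COUNT 4

int a[N];

int main() {
    for (int i = 0; i < N; i++) a[i] = 1;   // fill with sample data

    double start = omp_get_wtime();
    long sum = 0;
    #pragma omp parallel num_threads(THREAD_COUNT)
    {
        int id    = omp_get_thread_num();
        int count = omp_get_num_threads();
        int chunk = N / count;
        int my_a  = id * chunk;                          // start of this thread's slice
        int my_b  = (id == count - 1) ? N : my_a + chunk; // last thread takes any leftover

        long my_sum = 0;                                 // private partial sum
        for (int i = my_a; i < my_b; i++)
            my_sum += a[i];

        #pragma omp critical                             // combine partial sums safely
        sum += my_sum;
    }
    double end = omp_get_wtime();

    printf("sum = %ld\n", sum);
    printf("Time: %f ms\n", (end - start) * 1000);
    return 0;
}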
What if the Compiler Doesn't Support OpenMP?

#ifdef _OPENMP
#  include <omp.h>
#endif

/* ... later, in the code: */

#ifdef _OPENMP
    int my_id = omp_get_thread_num();
    int thread_count = omp_get_num_threads();
#else
    int my_id = 0;
    int thread_count = 1;
#endif
Conclusion/Up Next
§ What we covered today (review key concepts):
– Intro to OpenMP
  • Basics, HelloWorld
  • Distributing the work
  • Example: Summing it all up
§ Up next:
– Mutual exclusion (critical, atomic, locks)
– Variable scope (shared, private, firstprivate)
– Reduction
– Synchronization (barriers, nowait)
§ Please review:
– OpenMP Resources (see the week three module)
– Additional resources on Canvas
– Run the sample code and try the challenge
  • You need to be able to run the code and understand how to approach a problem