
Microsoft PowerPoint – COMP528 HAL26 OpenMP for GPUs, perhaps.pptx

Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528

COMP528: Multi-core and
Multi-Processor Programming

26 – HAL

OpenMP for GPUs

Directives for accelerators

Programming Model

• some code on host (the CPU)

• “offload” a “kernel” to the “accelerator”
– offloading possible (in theory) via OpenMP

– can also use
• OpenACC

• OpenCL

• CUDA proprietary, just for NVIDIA GPUs

Why Directives for GPUs?

• CUDA is only for NVIDIA GPUs
– lack of portability

– programming requires explicitly launching kernels,
plus function calls to handle data transfer & memory usage

• Amount of coding
– one directive may replace several lines of CUDA

• Portability over different heterogeneous architectures
– CPU + NVIDIA GPU

– CPU + AMD GPU

– CPU + Xeon Phi (RIP)

– CPU + FPGA (apparently)

Programming GPUs

CUDA

• proprietary

• NVIDIA only GPUs

• non-portable

• performant

Directives

• portable
– in theory?

• less coding

• maybe not so performant

• (some extensions to parallelism on
CPUs, Xeon Phis, FPGAs)

COMP328/COMP528 (c) mkbane, Univ of
Liverpool

OpenMP .v. OpenACC
OpenMP

• 1998 onwards

• offloading from v4.0 (2013)

• CPU & accelerator

• FORTRAN, C, C++, …

• prescriptive
– user explicitly specifies actions to be

undertaken by the compiler

• slower uptake of new [accelerator] ideas

• but generally greater maturity for CPU

OpenACC

• 2012 onwards

• offloading always

• CPU & accelerator

• FORTRAN, C, C++, …

• descriptive
– user describes (guides) the compiler, but the

compiler decides how/if to do the
parallelism

• generally more reactive to new ideas

• maturity for GPU

https://openmpcon.org/wp-content/uploads/openmpcon2015-james-beyer-comparison.pdf

• OpenACC

• OpenMP 4.x (and 5.0…)

• OpenCL


OpenMP for Accelerators

• #pragma omp target
– defines a region of code to be off-loaded to the target (e.g. a GPU)

– Then need to say what happens within that region of code e.g.

• #pragma omp parallel for
– on CPU: creates threads & spreads iterations over threads

– within ‘target’: runs using threads of GPU

Target Clauses

• device (N)
– run on device #N

• map(A,B)
– ensure A, B vars available on target device

• map(tofrom: C)
– copy C to device, run region on device, copy C back

GPU threads != CPU threads

• OpenMP designed around CPU threads
– high cost of set-up and of synchronisation

• GPUs
– light weight threads, very low cost of switching

– “thread blocks”

• SO… OpenMP’s “teams” directive maps naturally onto GPU thread blocks


CHADWICK


OpenACC

• parallel | kernels

• copy

• copyin

• copyout

• create

• delete

OpenMP

• target / teams / parallel

• map(tofrom:…)

• map(to:…)

• map(from:…)

• map(alloc:…)

• map(release:…) / map(delete:…)

OpenMP – good for GPUs?

• syntax via examples

• what is in which version

• which compilers support which version

OpenMP versions / GPU support

• v4.0 (2013): support offloading
– Intel v15 & v16

– GCC v4.9.0

• v4.5 (2015): improved support for offloading targets
– Intel v17 onwards

– GCC v6.1 onwards

– Cray CCE 8.7 onwards


Who supports What?

• Intel make CPUs (and Xeon Phi) but not discrete [compute]
GPUs
– Intel compilers support OpenMP but not OpenACC

• NVIDIA (owners of PGI) make GPUs but not CPUs
– PGI compilers support OpenACC

& (only recently) OpenMP (but only CPU ‘target’)

• Cray no longer make chips, more of an “integrator”
– Cray compilers support OpenMP & OpenACC


• There are some options to DIY extend LLVM/clang
– IBM compilers

• GNU ?
– Possible in v10 (not available on Barkla)

– Can do a ‘chain’ of builds from v6 onwards
(if various supporting commands are also available)
https://gcc.gnu.org/wiki/Offloading


Conclusion?

• OpenMP for accelerators
– … limited support (Intel for Intel Xeon Phi)

– … clang/LLVM (hand-built or via IBM) for GPUs…
not the most straightforward

• Use OpenACC for ease!


OpenMP for Accelerators: Further Reading

– OpenMP example of Jacobi:

– OpenMP user group presentation (2020):
openmp-for-cs/slides/pdfs at openmp-ug-2020 · UoB-HPC/openmp-for-cs · GitHub

Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane