Microsoft PowerPoint – COMP528 HAL24 Directives for Accelerators.pptx

Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528

COMP528: Multi-core and
Multi-Processor Programming

24 – HAL

DIRECTIVES FOR ACCELERATORS
(AN OVERVIEW)

OpenMP for Accelerators
Further Reading

“Accelerators” ??

• GPU or GPGPU
– [general purpose] graphics processing unit

– good to accelerate vector-like operations

– GPU typically sits at far end of PCIe to the CPU

• FPGA
– field programmable gate array

– programmable (& run-time re-configurable) logic gates

– harder to program, but gives good acceleration for streaming-like operations

– FPGA usually sits at far end of PCIe to the CPU

COMP328/COMP528 (c) mkbane, univ of liverpool

Programming Model

• some code on host (the CPU)

• “offload” a “kernel” to the “accelerator”
– offloading possible (in theory) via OpenMP

– can also use
• OpenACC

• OpenCL

• CUDA (proprietary, just for NVIDIA GPUs)
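As a minimal sketch of the host/offload model above, assuming a compiler with OpenMP 4.0+ support (e.g. GCC with `-fopenmp`): the function name `vector_add` is hypothetical and only for illustration. If no accelerator is available, or OpenMP is not enabled, the loop would normally just run on the host.

```c
#include <stddef.h>

/* Hypothetical helper: add two vectors, offloading the loop if possible. */
void vector_add(const float *a, const float *b, float *c, int n)
{
    /* Everything up to here is ordinary host (CPU) code. */

    /* The target region below is the "kernel" that is offloaded.
     * map(to: ...)   copies inputs host -> device on entry,
     * map(from: ...) copies the result device -> host on exit. */
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }

    /* Back on the host: c[] now holds the result. */
}
```

A caller simply invokes `vector_add(a, b, c, n)` like any other C function; where the loop actually runs is decided by the compiler and runtime.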

Programming Model

• directives based:
– OpenMP

– OpenACC

• programming languages / extensions:
– OpenCL

– CUDA (proprietary, just for NVIDIA GPUs)

• OpenMP 4.0
– introduced “off-loading”

• running a region on another piece of kit (potentially with a different ISA)

– directives determine region and “target”

– directives also determine how to run in parallel on the target

– targets can be XeonPhi, GPU, FPGA
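The split between "which region goes to the target" and "how it runs in parallel there" can be sketched as below. This is an illustration only, assuming OpenMP 4.0+ offload support; the function name `dot` is hypothetical.

```c
/* Hypothetical helper: dot product of x and y, offloaded if possible. */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;

    /* target                 : run this region on the device
     * teams distribute       : split iterations across teams
     * parallel for           : split each team's share across its threads
     * map(...)               : data movement between host and device
     * reduction(+:sum)       : combine the partial sums safely        */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n], y[0:n]) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }

    return sum;
}
```

The explicit `map(tofrom: sum)` is included because the default treatment of scalars in target regions differs between OpenMP versions.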

• OpenACC
– open accelerators

– initially a project driven by Cray + CAPS + NVIDIA + PGI
• (NVIDIA later bought out PGI)

– directives to describe offloading, targets and how to use targets
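For comparison, a minimal OpenACC sketch of the same idea, assuming an OpenACC-capable compiler (the function name `saxpy` is chosen for illustration). A compiler without OpenACC enabled would normally ignore the directive and run the loop serially on the host.

```c
/* Hypothetical helper: y = a*x + y, with the loop offered to the accelerator. */
void saxpy(int n, float a, const float *x, float *y)
{
    /* parallel loop : the loop iterations are independent and may be
     *                 spread over the accelerator
     * copyin(x)     : x is only read, so copy host -> device
     * copy(y)       : y is read and written, so copy in and back out */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```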

OpenMP vs. OpenACC
OpenMP

• 1998 onwards

• offloading @v4 ~2013

• CPU & accelerator

• FORTRAN, C, C++, …

• prescriptive
– user explicitly specifies the actions to be undertaken by the compiler

• slower uptake of new [accelerator]
ideas but generally

• maturity for CPU

OpenACC

• 2012 onwards

• offloading always

• CPU & accelerator

• FORTRAN, C, C++, …

• descriptive
– user describes (guides) the compiler, but the compiler decides how (and whether) to parallelise

• generally more reactive to new ideas

• maturity for GPU
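The prescriptive / descriptive contrast can be sketched on the same loop. Both function names are hypothetical, and what a compiler actually does with the `kernels` region varies between implementations, so treat this as an illustration rather than a statement of behaviour.

```c
/* Prescriptive (OpenMP): the programmer spells out the work-sharing. */
void scale_openmp(double *v, int n, double s)
{
    #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
    for (int i = 0; i < n; ++i) {
        v[i] *= s;
    }
}

/* Descriptive (OpenACC): the programmer marks a region and its data;
 * the compiler analyses the loop and decides how, or if, to parallelise. */
void scale_openacc(double *v, int n, double s)
{
    #pragma acc kernels copy(v[0:n])
    for (int i = 0; i < n; ++i) {
        v[i] *= s;
    }
}
```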

https://openmpcon.org/wp-content/uploads/openmpcon2015-james-beyer-comparison.pdf

Why Directives for GPUs?

• CUDA is only for NVIDIA GPUs
– lack of portability

– programming via calling a kernel explicitly,
plus function calls to handle data transfer & usage of memory

• Amount of coding
– one directive may replace several lines of CUDA

• Portability over different heterogeneous architectures
– CPU + NVIDIA GPU

– CPU + AMD GPU

– CPU + XeonPhi (RIP)

– CPU + FPGA (apparently)
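To give a feel for the "amount of coding" point, here is a sketch of two offloaded loops sharing one device copy of an array, assuming OpenMP 4.0+ offload support. The function name `scale_then_shift` is hypothetical, and the comments give only a rough correspondence to the explicit steps a CUDA version would need.

```c
/* Hypothetical helper: x = x*scale, then x = x + shift, both on the device. */
void scale_then_shift(float *x, int n, float scale, float shift)
{
    /* One directive covers what CUDA would express with explicit calls:
     * allocate device memory (cudaMalloc), copy in (cudaMemcpy),
     * and, at the end of the block, copy back and free (cudaFree). */
    #pragma omp target data map(tofrom: x[0:n])
    {
        /* First kernel. x is already present on the device, so the
         * map clause here should not trigger another transfer. */
        #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
        for (int i = 0; i < n; ++i) {
            x[i] *= scale;
        }

        /* Second kernel reuses the same device copy of x. */
        #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
        for (int i = 0; i < n; ++i) {
            x[i] += shift;
        }
    }
    /* x[] on the host is up to date again here. */
}
```

The same source should build for any target the compiler supports, which is the portability argument made above.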

In practice…

• OpenMP
– v4.0 introduced “offloading” to a “target” device (e.g. GPU)

– not widely supported (vendor wars)

Who supports What?

• Intel make CPUs (and Xeon Phi) but not discrete [compute] GPUs (yet…)
– Intel compilers support OpenMP but not OpenACC

• NVIDIA (owners of PGI) make GPUs but not CPUs
– PGI compilers support OpenACC & (only recently) OpenMP

• Cray no longer make chips, more of an “integrator”
– Cray compilers support OpenMP & OpenACC

• OpenMP
– support for GPU is in OpenMP standard

– but it is not easy to find an implementation for a given GPU

• OpenACC
– some implementations on GPUs available to use more readily
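One way to check what a given compiler and runtime actually provide is to ask the OpenMP runtime. The sketch below uses the standard routines `omp_get_num_devices()` and `omp_is_initial_device()`; the two wrapper function names are hypothetical, and the `_OPENMP` guard is there so the code still builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of offload devices the OpenMP runtime reports (0 if none,
 * or if the code was built without OpenMP). */
int count_offload_devices(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Returns 1 if a target region actually executed on the host,
 * 0 if it executed on an accelerator. */
int ran_on_host(void)
{
    int on_host = 1;
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    return on_host;
}
```

A result of zero devices, or a target region that reports running on the host, suggests the toolchain has fallen back to the CPU for that build.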

OpenMP for Accelerators
Further Reading

• Resources will be added to CANVAS
– OpenMP example of Jacobi:

https://www.slideshare.net/jefflarkin/gtc16-s6510-targeting-gpus-with-openmp-45