The Open Computing Language (OpenCL)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Computer Graphics
opencl.pptx
mjb – March 10, 2022
• OpenCL consists of two parts: a C/C++-callable API and a C-ish programming language.
• The OpenCL programming language can run on NVIDIA GPUs, AMD GPUs, Intel CPUs, Intel GPUs, mobile devices, and (supposedly) FPGAs (Field-Programmable Gate Arrays).
• But, OpenCL is at its best on compute devices with large amounts of data parallelism, which usually implies GPU usage.
• You break your computational problem up into lots and lots of small pieces. Each piece gets farmed out to threads on the GPU.
• Each thread wakes up and is able to ask questions about where it lives in the entire collection of (thousands of) threads. From that, it can tell what it is supposed to be working on.
• OpenCL can share data, and interoperate, with OpenGL
• There is a JavaScript implementation of OpenCL, called WebCL
• There is a JavaScript implementation of OpenGL, called WebGL
• WebCL can share data, and interoperate, with WebGL
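• The GPU does not have a stack in the usual sense, and so the OpenCL C-ish programming language cannot do recursion. It can make function calls, but the compiler typically inlines them. It can use pointers (kernel arguments such as global float * are pointers), but function pointers are not allowed.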
The Khronos Group
http://www.khronos.org/opencl/ http://en.wikipedia.org/wiki/OpenCL
Who is Part of the Khronos Group?
Active OpenCL Members
Example of using OpenCL in a System-on-a-Chip: Qualcomm Node – Full Linux and OpenCL
The OpenCL Paradigm
C/C++ Program plus OpenCL
Host code -> C/C++ Compiler and Linker -> CPU binary on the Host
OpenCL code -> OpenCL Compiler and Linker -> OpenCL binary on the Device
1. Run CPU code
2. Send data to GPU
3. Run GPU kernel
4. Get data back from GPU
5. Run CPU code
6. Send data to GPU
7. Run GPU kernel
8. Get data back from GPU
9. Run CPU code
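OpenCL wants you to break the problem up into Pieces
If you were writing in C/C++, you would say:
void
ArrayMult( int n, float *a, float *b, float *c )
{
	for( int i = 0; i < n; i++ )
		c[i] = a[i] * b[i];
}
If you were writing in OpenCL, you would say:
kernel
void
ArrayMult( global const float *dA, global const float *dB, global float *dC )
{
	int gid = get_global_id( 0 );
	dC[gid] = dA[gid] * dB[gid];
}
Think of this as having an implied for-loop around it, looping through all possible values of gid.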
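The implied for-loop can be made explicit in plain C++. This is only a sketch for building intuition: ArrayMultOneElement and EmulateArrayMult are hypothetical names, not part of OpenCL, and the gid parameter stands in for the value that get_global_id( 0 ) would return. On a real device the iterations are spread across many threads and no execution order is guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ stand-in for the kernel body: it handles exactly one element.
// The gid parameter plays the role of get_global_id( 0 ).
void
ArrayMultOneElement( std::size_t gid, const float *dA, const float *dB, float *dC )
{
	dC[gid] = dA[gid] * dB[gid];
}

// Hypothetical helper that writes out the "implied for-loop" explicitly.
// Here the loop runs serially on the CPU, one gid after another.
std::vector<float>
EmulateArrayMult( const std::vector<float> &a, const std::vector<float> &b )
{
	assert( a.size( ) == b.size( ) );
	std::vector<float> c( a.size( ) );
	for( std::size_t gid = 0; gid < a.size( ); gid++ )
		ArrayMultOneElement( gid, a.data( ), b.data( ), c.data( ) );
	return c;
}
```

Calling EmulateArrayMult( a, b ) with two equal-length vectors returns their element-by-element product. Each call to ArrayMultOneElement touches only its own element, which is what lets the real device run them all in parallel.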
#include "cl.h"
// for timing
2. Allocate the Host Memory Buffers
// allocate the host memory buffers:
float * hA = new float [ NUM_ELEMENTS ];
float * hB = new float [ NUM_ELEMENTS ];
float * hC = new float [ NUM_ELEMENTS ];
// fill the host memory buffers:
for( int i = 0; i < NUM_ELEMENTS; i++ )
{
	hA[i] = hB[i] = sqrtf( (float)i );
}
// array size in bytes (will need this later):
size_t dataSize = NUM_ELEMENTS * sizeof( float );
// opencl function return status:
cl_int status;		// test against CL_SUCCESS
This could have also been done like this: float hA[ NUM_ELEMENTS ];
Global memory and the heap typically have lots more space than the stack does. So, typically, you do not want to allocate a large array like this as a local variable.
(Here, it's being done on the heap. It could also have been done in global memory.)
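The two non-stack choices for host buffers can be sketched side by side. This is an illustration, not the slides' code: the value of NUM_ELEMENTS is an assumption (its definition is not shown here), and gA and MakeHostBuffer are hypothetical names. The std::vector version keeps the data on the heap, like new float[ ], but frees it automatically.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed value; the slides do not show how NUM_ELEMENTS is defined.
const std::size_t NUM_ELEMENTS = 1024 * 1024;

// Choice 1: global memory. The array lives in static storage, not on the stack.
float gA[ NUM_ELEMENTS ];

// Choice 2: the heap, here owned by a std::vector instead of a raw new[ ].
// Fills element i with sqrt( i ), the same values the slides put in hA and hB.
std::vector<float>
MakeHostBuffer( std::size_t n )
{
	assert( n > 0 );
	std::vector<float> h( n );
	for( std::size_t i = 0; i < n; i++ )
		h[i] = std::sqrt( (float)i );
	return h;
}
```

With a vector named hA, hA.data( ) gives the raw float pointer and hA.size( ) * sizeof( float ) gives the size in bytes.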
3. Create an OpenCL Context
// create a context:
cl_context context = clCreateContext( NULL, 1, &device, NULL, NULL, &status );
The arguments, in order:
	NULL		properties
	1		one device
	&device		the device
	NULL		Callback
	NULL		Pass in user data
	&status		returned status
4. Create an OpenCL Command Queue
// create a command queue:
cl_command_queue cmdQueue = clCreateCommandQueue( context, device, 0, &status );
The arguments, in order:
	context		the context
	device		the device
	0		properties
	&status		returned status
5. Allocate the Device Memory Buffers
// allocate memory buffers on the device:
cl_mem dA = clCreateBuffer( context, CL_MEM_READ_ONLY,  dataSize, NULL, &status );
cl_mem dB = clCreateBuffer( context, CL_MEM_READ_ONLY,  dataSize, NULL, &status );
cl_mem dC = clCreateBuffer( context, CL_MEM_WRITE_ONLY, dataSize, NULL, &status );
The arguments, in order:
	context			the context
	CL_MEM_READ_ONLY	how this buffer is restricted
	dataSize		# bytes
	NULL			buffer data already allocated (none here)
	&status			returned status
The read and write terminology is with respect to the OpenCL device.
So, CL_MEM_READ_ONLY means that kernels running on the OpenCL device can only read this buffer; they cannot write to it. Other options are CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE.
6. Write the Data from the Host Buffers to the Device Buffers
// enqueue the 2 commands to write data into the device buffers:
status = clEnqueueWriteBuffer( cmdQueue, dA, CL_FALSE, 0, dataSize, hA, 0, NULL, NULL );
status = clEnqueueWriteBuffer( cmdQueue, dB, CL_FALSE, 0, dataSize, hB, 0, NULL, NULL );
The arguments, in order:
	cmdQueue	command queue
	dA		device buffer
	CL_FALSE	want to block until done?
	0		offset
	dataSize	# bytes
	hA		host buffer
	0		# events
	NULL		event wait list
	NULL		event object
Enqueuing Works Like a Conveyer Belt
Whopp-a, whopp-a
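(Figure: a conveyor belt carrying the enqueued commands toward the device. From first to run to last: Write Buffer dA, Write Buffer dB, Execute Kernel, Read Buffer dC.)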
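The conveyor-belt behavior can be imitated with a small first-in, first-out queue in plain C++. This is a toy model only: ToyCommandQueue, Enqueue, NumPending, and Finish are hypothetical names, not OpenCL calls. It assumes an in-order queue, which is what passing 0 for the command queue properties gives you. A real OpenCL runtime may start running commands before you wait on them, whereas this toy runs nothing until Finish( ) is called.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Toy stand-in for an in-order command queue; it is not the OpenCL API.
class ToyCommandQueue
{
    private:
	struct Command
	{
		std::string name;
		std::function<void( )> work;
	};
	std::queue<Command> pending;

    public:
	// Enqueuing only puts the command on the belt; it does not run it.
	void Enqueue( const std::string &name, std::function<void( )> work )
	{
		assert( work );
		Command c = { name, work };
		pending.push( c );
	}

	std::size_t NumPending( ) const
	{
		return pending.size( );
	}

	// Run every pending command in the order it was enqueued.
	// Returns the command names in the order they ran.
	std::vector<std::string> Finish( )
	{
		std::vector<std::string> ran;
		while( ! pending.empty( ) )
		{
			pending.front( ).work( );
			ran.push_back( pending.front( ).name );
			pending.pop( );
		}
		return ran;
	}
};
```

Enqueuing "Write Buffer dA", "Write Buffer dB", "Execute Kernel", and "Read Buffer dC" in that order and then calling Finish( ) runs them in exactly that order, so the kernel never starts before its input data has been written.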
The .cl File
kernel
void
ArrayMult( global const float *dA, global const float *dB, global float *dC )
{
	int gid = get_global_id( 0 );
	dC[gid] = dA[gid] * dB[gid];
}
gid = which element we are dealing with right now.
The argument to get_global_id( ) says which dimension's index we are fetching: 0 = X, 1 = Y, 2 = Z
Since this is a 1D problem, X is the only index we need to get.
OpenCL code is compiled in the Driver . . .
Application Program
OpenCL code in a separate file:
kernel void
ArrayMult( global float *A, global float *B, global float *C )
{
	int gid = get_global_id ( 0 );
	C[gid] = A[gid] * B[gid];
}
OpenCL Driver does the Compile and Link
(. . . just like OpenGL’s GLSL Shader code is compiled in the driver)
Application Program
GLSL shader code in a separate file:
void main( )
{
	vec3 newcolor = texture2D( uTexUnit, vST ).rgb;
	newcolor = mix( newcolor, vColor.rgb, uBlend );
	gl_FragColor = vec4( uLightIntensity*newcolor, 1. );
}
GLSL Driver does the Compile and Link
7. Read the Kernel Code from a File into a Character Array
"r" should work, since the .cl file is pure ASCII text, but some people report that it doesn't work unless you use "rb". One likely cause: in Windows text mode, each '\r' + '\n' pair is read as a single '\n', so fread( ) returns fewer characters than ftell( ) reported.
const char *CL_FILE_NAME = { "arraymult.cl" };
...
FILE *fp = fopen( CL_FILE_NAME, "r" );
if( fp == NULL )
{
	fprintf( stderr, "Cannot open OpenCL source file '%s'\n", CL_FILE_NAME );
	return 1;
}
// read the characters from the opencl kernel program:
fseek( fp, 0, SEEK_END );
size_t fileSize = ftell( fp );
fseek( fp, 0, SEEK_SET );
char *clProgramText = new char[ fileSize+1 ];
size_t n = fread( clProgramText, 1, fileSize, fp );
clProgramText[n] = '\0';	// terminate at the count actually read, which can be less than fileSize
fclose( fp );
Watch out for the '\r' + '\n' problem! (See the next slide.)
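One possible way to sidestep the '\r' + '\n' problem is to open the file with "rb" and drop every '\r' while reading. This is a sketch under assumptions, not necessarily the fix the slides recommend: ReadKernelSource is a hypothetical helper name, and it assumes a stray '\r' is never wanted in kernel source text.

```cpp
#include <cassert>
#include <stdio.h>
#include <string>

// Hypothetical helper: reads a whole file in binary mode and drops every '\r',
// so Windows-style "\r\n" line endings arrive as plain "\n".
// Returns false if the file cannot be opened.
bool
ReadKernelSource( const char *fileName, std::string &text )
{
	assert( fileName != NULL );
	FILE *fp = fopen( fileName, "rb" );
	if( fp == NULL )
		return false;

	text.clear( );
	char chunk[ 4096 ];
	size_t n;
	// fread( ) returns how many characters it actually delivered:
	while( ( n = fread( chunk, 1, sizeof( chunk ), fp ) ) > 0 )
	{
		for( size_t i = 0; i < n; i++ )
		{
			if( chunk[i] != '\r' )
				text.push_back( chunk[i] );
		}
	}
	fclose( fp );
	return true;
}
```

After a successful call, text.c_str( ) is a null-terminated character array holding the kernel source and text.size( ) is its length.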
A Warning about Editing on Windows and Ru