OpenCL

From Wikipedia, the free encyclopedia

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus APIs that are used to define and then control the heterogeneous platform. OpenCL provides parallel programming using both task-based and data-based parallelism.
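
In OpenCL's data-parallel model, the host launches a kernel over an index space (an NDRange) that is divided into work-groups, and each work-item locates its data through built-ins such as get_global_id. The sketch below is plain C, not OpenCL. It only mimics that indexing for a one-dimensional range by running the work-items one after another, whereas a real device runs them in parallel. The names run_ndrange_1d, vadd_kernel and struct vadd_args are hypothetical.

```c
#include <stddef.h>

/* Serial stand-in for an OpenCL 1-D NDRange (hypothetical helper).
 * A real OpenCL runtime runs work-items in parallel on a device; here each
 * work-item is one loop iteration. The id passed to the kernel is what
 * get_global_id(0) would return, assuming no global offset. */
typedef void (*kernel_fn)(size_t global_id, void *args);

void run_ndrange_1d(kernel_fn kernel, void *args,
                    size_t global_size, size_t local_size)
{
    /* OpenCL 1.0 requires global_size to be a multiple of local_size */
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; group++)       /* get_group_id(0) */
        for (size_t local = 0; local < local_size; local++)   /* get_local_id(0) */
            kernel(group * local_size + local, args);         /* get_global_id(0) */
}

/* Hypothetical data-parallel "kernel": each work-item adds one element. */
struct vadd_args { const float *a, *b; float *c; };

void vadd_kernel(size_t gid, void *p)
{
    struct vadd_args *args = p;
    args->c[gid] = args->a[gid] + args->b[gid];
}
```

For example, a global size of 8 with a local size of 4 gives two work-groups, and the work-item with group id 1 and local id 2 receives global id 6.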

The purpose of OpenCL is analogous to that of OpenGL and OpenAL, which are open industry standards for 3D graphics and computer audio respectively. OpenCL extends the use of the GPU beyond graphics to general-purpose computation (GPGPU). OpenCL is managed by the Khronos Group, a non-profit technology consortium.

History

OpenCL was initially conceived by Apple Inc., which holds the trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, Intel and Nvidia. Apple submitted this initial proposal to the Khronos Group. On June 16, 2008, the Khronos Compute Working Group was formed[1] with representatives from CPU, GPU, embedded-processor, and software companies. The group spent five months working out the technical details of the OpenCL 1.0 specification, completing them on November 18, 2008.[2] The specification was reviewed by the Khronos members and approved for public release on December 8, 2008.[3]

OpenCL is scheduled to be introduced in Mac OS X v10.6 ('Snow Leopard'). According to an Apple press release:[4]

Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.

AMD has decided to support OpenCL (and DirectX 11) instead of the now-deprecated Close to Metal in its Stream framework.[5][6] RapidMind announced its adoption of OpenCL underneath its development platform, in order to support GPUs from multiple vendors with one interface.[7] On December 9, 2008, Nvidia announced that it would add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.[8]

Example

This example computes a fast Fourier transform (FFT):[9]

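#include <CL/cl.h>  // <OpenCL/opencl.h> on Mac OS X
 
// srcA holds num_entries complex values as interleaved (real, imaginary) floats;
// num_entries is assumed to be a multiple of 1024
void fft1D_1024_batch(float *srcA, size_t num_entries, const char *fft1D_1024_kernel_src)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_context context;
    cl_command_queue queue;
    cl_mem memobjs[2];
    cl_program program;
    cl_kernel kernel;
    size_t global_work_size[1], local_work_size[1];
 
    // create a compute context with the first GPU device of the first platform
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
 
    // create a command queue
    queue = clCreateCommandQueue(context, device, 0, NULL);
 
    // allocate the buffer memory objects
    memobjs[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float)*2*num_entries, srcA, NULL);
    memobjs[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float)*2*num_entries, NULL, NULL);
 
    // create the compute program
    program = clCreateProgramWithSource(context, 1, &fft1D_1024_kernel_src, NULL, NULL);
 
    // build the compute program executable
    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
 
    // create the compute kernel
    kernel = clCreateKernel(program, "fft1D_1024", NULL);
 
    // work-item dimensions: one work-group of 64 work-items per 1024-point FFT,
    // each work-item handling 16 points
    global_work_size[0] = num_entries / 16;
    local_work_size[0] = 64;
 
    // set the args values
    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjs[0]);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);
    clSetKernelArg(kernel, 2, sizeof(float)*(local_work_size[0]+1)*16, NULL);
    clSetKernelArg(kernel, 3, sizeof(float)*(local_work_size[0]+1)*16, NULL);
 
    // execute the kernel over a 1-D N-D range and wait for it to finish;
    // the transformed data is left in memobjs[1]
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
    clFinish(queue);
}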
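
The work sizes for the launch follow from how the kernel divides the data. As a minimal sketch, assuming (as the kernel's indexing suggests) that each work-group computes one 1024-point FFT with 64 work-items holding 16 points each, the global size is num_entries / 16 and there is one work-group per transform. fft_launch_sizes and struct fft_launch are hypothetical helpers, not part of the OpenCL API.

```c
#include <stddef.h>

/* ASSUMPTION: one work-group per 1024-point FFT, 16 points per work-item,
 * matching the indexing of the fft1D_1024 kernel. */
#define FFT_LEN         1024
#define POINTS_PER_ITEM 16
#define ITEMS_PER_GROUP (FFT_LEN / POINTS_PER_ITEM) /* 64 */

struct fft_launch {
    size_t global_work_size;  /* total work-items */
    size_t local_work_size;   /* work-items per work-group */
    size_t num_groups;        /* one per 1024-point FFT */
    size_t local_bytes;       /* size of each __local scratch buffer */
};

/* num_entries: number of complex input points.
 * Returns 0 on success, -1 if num_entries is not a whole number of FFTs. */
int fft_launch_sizes(size_t num_entries, struct fft_launch *out)
{
    if (num_entries == 0 || num_entries % FFT_LEN != 0)
        return -1;
    out->local_work_size  = ITEMS_PER_GROUP;
    out->global_work_size = num_entries / POINTS_PER_ITEM;
    out->num_groups       = out->global_work_size / out->local_work_size;
    /* same expression as sizeof(float)*(local_work_size[0]+1)*16 in the host code */
    out->local_bytes      = sizeof(float) * (out->local_work_size + 1) * 16;
    return 0;
}
```

For 4096 complex points this gives 256 work-items in 4 work-groups of 64; with 4-byte floats, each __local buffer is 4160 bytes.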

The kernel that performs the actual calculation is based on "Fitting FFT onto G80 Architecture":[10]

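// This kernel computes an FFT of length 1024. The 1024-point FFT is decomposed into
// calls to a radix-16 function, another radix-16 function and then a radix-4 function.
// The helper functions it calls (globalLoads, fftRadix16Pass, twiddleFactorMul,
// localShuffle, fftRadix4Pass, globalStores) are device functions not shown here.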
 
__kernel void fft1D_1024 (__global float2 *in, __global float2 *out, 
                          __local float *sMemx, __local float *sMemy) { 
  int tid = get_local_id(0); 
  int blockIdx = get_group_id(0) * 1024 + tid; 
  float2 data[16]; 
 
  // starting index of data to/from global memory 
  in = in + blockIdx;  out = out + blockIdx; 
 
  globalLoads(data, in, 64); // coalesced global reads 
  fftRadix16Pass(data);      // in-place radix-16 pass 
  twiddleFactorMul(data, tid, 1024, 0); 
 
  // local shuffle using local memory 
  localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4))); 
  fftRadix16Pass(data);               // in-place radix-16 pass 
  twiddleFactorMul(data, tid, 64, 4); // twiddle factor multiplication 
 
  localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15))); 
 
  // four radix-4 function calls 
  fftRadix4Pass(data);
  fftRadix4Pass(data + 4); 
  fftRadix4Pass(data + 8);
  fftRadix4Pass(data + 12); 
 
  // coalesced global writes 
  globalStores(data, out, 64); 
}
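
The kernel's index arithmetic can be checked in plain C. This sketch assumes, since the helpers are not shown, that globalLoads(data, in, 64) reads in[i * 64] for i = 0 to 15, and it models local memory as 16 banks of 32-bit words serviced per half-warp, as on the G80. Under those assumptions every point of a 1024-point block is loaded by exactly one work-item. The 65-word row stride in the first localShuffle address (one word more than 64, matching the +1 in the host's local buffer size) plausibly serves to spread a half-warp's accesses over all 16 banks, whereas a stride of 64 would put them all in the same bank. Only the base address passed to localShuffle is modelled, and all names are hypothetical.

```c
#include <stdbool.h>

/* ASSUMPTIONS: 64 work-items x 16 points per work-group; globalLoads reads
 * in[i * 64]; 16 banks of 32-bit words accessed per half-warp (G80-style). */
#define ITEMS 64
#define PTS   16
#define BANKS 16

/* True if each of the 1024 points in a block is loaded by exactly one (tid, i). */
bool loads_cover_block(void)
{
    int hits[ITEMS * PTS] = {0};
    for (int tid = 0; tid < ITEMS; tid++)
        for (int i = 0; i < PTS; i++)
            hits[tid + i * ITEMS]++;   /* offset from in + blockIdx */
    for (int k = 0; k < ITEMS * PTS; k++)
        if (hits[k] != 1)
            return false;
    return true;
}

/* First localShuffle base address with a configurable row stride
 * (the kernel uses 65). */
int shuffle_addr(int tid, int stride)
{
    return (tid & 15) * stride + (tid >> 4);
}

/* Largest number of work-items within one half-warp (16 consecutive tids)
 * that hit the same bank; 1 means the access is conflict-free. */
int max_bank_conflict(int stride)
{
    int worst = 0;
    for (int base = 0; base < ITEMS; base += 16) {
        int count[BANKS] = {0};   /* reset for each half-warp */
        for (int tid = base; tid < base + 16; tid++) {
            int bank = shuffle_addr(tid, stride) % BANKS;
            if (++count[bank] > worst)
                worst = count[bank];
        }
    }
    return worst;
}
```

Here max_bank_conflict(65) returns 1 and max_bank_conflict(64) returns 16, because 65 is congruent to 1 modulo 16 while 64 is congruent to 0.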

Implementation

On December 10, 2008, AMD and Nvidia gave the first public demonstrations of OpenCL during a shared 75-minute presentation at SIGGRAPH Asia 2008. AMD showed a CPU-accelerated OpenCL demo illustrating how OpenCL scales across one or more cores, while Nvidia showed a GPU-accelerated demo.[11][12]

On March 26, 2009, at GDC 2009, AMD and Havok demonstrated the first working implementation of OpenCL, using it to accelerate Havok Cloth on an AMD Radeon HD 4000 series GPU.[13]

References

  1. Khronos Group (2008-06-16). Khronos Launches Heterogeneous Computing Initiative. Press release. http://www.khronos.org/news/press/releases/khronos_launches_heterogeneous_computing_initiative/. Retrieved on 2008-06-18.
  2. "OpenCL gets touted in Texas". Macworld. 2008-11-20. http://www.macworld.com/article/136921/2008/11/opencl.html?lsrc=top_2.
  3. Khronos Group (2008-12-08). The Khronos Group Releases OpenCL 1.0 Specification. Press release. http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/.
  4. Apple Inc. (2008-06-09). Apple Previews Mac OS X Snow Leopard to Developers. Press release. http://www.apple.com/pr/library/2008/06/09snowleopard.html. Retrieved on 2008-06-09.
  5. AMD (2008-08-06). AMD Drives Adoption of Industry Standards in GPGPU Software Development. Press release. http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~127451,00.html. Retrieved on 2008-08-14.
  6. "AMD Backs OpenCL, Microsoft DirectX 11". eWeek. 2008-08-06. http://www.eweek.com/c/a/Desktops-and-Notebooks/AMD-Backing-OpenCL-and-Microsoft-DirectX-11/. Retrieved on 2008-08-14.
  7. "RapidMind Embraces Open Source and Standards Projects". HPCWire. 2008-11-10. http://www.hpcwire.com/topic/applications/RapidMind_Embraces_Open_Source_and_Standards_Projects.html. Retrieved on 2008-11-11.
  8. Nvidia (2008-12-09). NVIDIA Adds OpenCL To Its Industry Leading GPU Computing Toolkit. Press release. http://www.nvidia.com/object/io_1228825271885.html. Retrieved on 2008-12-10.
  9. "OpenCL". SIGGRAPH 2008. 2008-08-14. http://s08.idav.ucdavis.edu/munshi-opencl.pdf. Retrieved on 2008-08-14.
  10. Vasily Volkov and Brian Kazian. "Fitting FFT onto G80 Architecture". UC Berkeley CS258 project report. May 2008. http://www.cs.berkeley.edu/~kubitron/courses/cs258-S08/projects/reports/project6_report.pdf. Retrieved on 2008-11-14.
  11. "OpenCL Demo, AMD CPU". 2008-12-10. http://www.youtube.com/watch?v=sLv_fhQlqis. Retrieved on 2009-03-28.
  12. "OpenCL Demo, NVIDIA GPU". 2008-12-10. http://www.youtube.com/watch?v=PJ1jydg8mLg. Retrieved on 2009-03-28.
  13. "AMD and Havok demo OpenCL accelerated physics". PC Perspective. 2009-03-26. http://www.pcper.com/comments.php?nid=6954. Retrieved on 2009-03-28.
