How to declare local memory in OpenCL?

醉梦人生 2020-12-15 07:53

I'm running the OpenCL kernel below with a two-dimensional global work size of 1000000 x 100 and a local work size of 1 x 100.

__kernel void myKernel(
              


        
3 answers
  •  醉梦人生
    2020-12-15 08:12

    It's relatively simple: you can pass the local arrays as arguments to your kernel:

    kernel void myKernel(const int length, const int height, local float* LP,
                         local float* LT /* , ...a bunch of other parameters... */)
    

    You then set each of these kernel arguments with a value of NULL and a size equal to the number of bytes you want to allocate for it. It should therefore be:

    clSetKernelArg(kernel, 2, length * sizeof(cl_float), NULL);  /* LP is argument index 2 */
    clSetKernelArg(kernel, 3, height * sizeof(cl_float), NULL);  /* LT is argument index 3 */
    

    Local memory is always shared by the whole work-group (as opposed to private memory, which belongs to a single work-item), so I think the bool and int should be fine as they are. If not, you can always pass those as arguments too.

    Not really related to your problem (and not necessarily relevant, since I do not know what hardware you plan to run this on), but GPUs at least don't particularly like work-group sizes that are not a multiple of a particular power of two (I think it was 32 for NVIDIA and 64 for AMD). Your local size of 100 will therefore probably be executed as work-groups of 128 items, of which the last 28 are basically wasted. So if you are running OpenCL on a GPU, it might help performance to use work-groups of size 128 directly (and change the global work size accordingly).
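
    The arithmetic behind that estimate can be sketched in plain C. This is only an illustration: the helper names round_up and idle_lanes are made up for this example, and the widths 32 and 64 are the values recalled above, not values queried from a device.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the nearest multiple of m (hypothetical helper). */
static size_t round_up(size_t n, size_t m) {
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

/* Number of hardware lanes left idle when a work-group of group_size
   items runs in batches of simd_width lanes (hypothetical helper;
   simd_width is an assumption about the device, e.g. 32 or 64). */
static size_t idle_lanes(size_t group_size, size_t simd_width) {
    return round_up(group_size, simd_width) - group_size;
}
```

    For the local size of 100 from the question, idle_lanes(100, 32) and idle_lanes(100, 64) both come out at 28. Padding the second dimension with round_up(100, 64) would give a local size of 1 x 128 and a global size of 1000000 x 128, in which case the kernel would have to ignore the work-items whose index in that dimension is 100 or above.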

    As a side note: I never understood why everyone uses the underscore variants (__kernel, __local and __global) instead of kernel, local and global. They seem much uglier to me.
