How to declare local memory in OpenCL?

醉梦人生 2020-12-15 07:53

I'm running the OpenCL kernel below with a two-dimensional global work size of 1000000 x 100 and a local work size of 1 x 100.

__kernel void myKernel(
              


        
3 Answers
  • 2020-12-15 08:09

    You could also declare your arrays like this:

    __local float LP[LENGTH];
    

    Then pass LENGTH as a preprocessor define when you build the program:

    int lp_size = 128; // this is an example; could be dynamically calculated
    char compileArgs[64];
    // snprintf bounds the write to the size of compileArgs
    snprintf(compileArgs, sizeof(compileArgs), "-DLENGTH=%d", lp_size);
    clBuildProgram(program, 0, NULL, compileArgs, NULL, NULL);
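A minimal sketch of building that option string with a bounds check, in case the size is computed at run time. The helper name `format_length_option` and its return convention are assumptions made for illustration, not part of the OpenCL API.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "-DLENGTH=<length>" into buf.
 * Returns 0 on success, -1 if length is not positive or buf is too small. */
static int format_length_option(char *buf, size_t buf_size, int length)
{
    if (buf == NULL || buf_size == 0 || length <= 0) {
        return -1;
    }
    /* snprintf returns the number of characters it wanted to write */
    int written = snprintf(buf, buf_size, "-DLENGTH=%d", length);
    if (written < 0 || (size_t)written >= buf_size) {
        return -1; /* output was truncated or an encoding error occurred */
    }
    return 0;
}
```

The resulting string would be passed as the options argument of clBuildProgram. Because LENGTH is fixed at build time, changing the array size with this approach means rebuilding the program.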
    
  • 2020-12-15 08:12

    It's relatively simple. You can pass the local arrays as arguments to your kernel:

    kernel void myKernel(const int length, const int height, local float* LP,
                         local float* LT /* , a bunch of other parameters */)
    

    You then set each of these kernel arguments with a value of NULL and a size equal to the number of bytes you want to allocate for it. LP is argument index 2 and LT is argument index 3, so it should be:

    clSetKernelArg(kernel, 2, length * sizeof(cl_float), NULL);
    clSetKernelArg(kernel, 3, height * sizeof(cl_float), NULL);
    

    Local memory is always shared by the work-group (as opposed to private memory, which belongs to a single work-item), so I think the bool and int should be fine, but if not you can always pass those as arguments too.
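Since the size given to clSetKernelArg is in bytes, it can help to compute the total request and compare it with the device limit, which the host can query through clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE. The sketch below assumes LP holds `length` floats and LT holds `height` floats, as in the calls above. The helper names are hypothetical, and the limit is passed in as a plain number rather than queried.

```c
#include <stddef.h>

/* Bytes requested for the two local buffers:
 * LP (length floats) and LT (height floats). */
static size_t local_bytes_needed(size_t length, size_t height)
{
    return length * sizeof(float) + height * sizeof(float);
}

/* Returns 1 if the request fits within the device's local memory size
 * (as reported for CL_DEVICE_LOCAL_MEM_SIZE), 0 otherwise. */
static int fits_in_local_memory(size_t bytes_needed, size_t device_local_mem_size)
{
    return bytes_needed <= device_local_mem_size;
}
```

Local variables declared inside the kernel also count against the same limit, so this check only covers the buffers passed as arguments.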

    Not really related to your problem (and not necessarily relevant, since I do not know what hardware you plan to run this on), but GPUs at least don't particularly like work-group sizes that are not a multiple of a particular power of two (I think it was 32 for NVIDIA and 64 for AMD). A work-group of 100 items will therefore probably be executed as if it had 128 items, of which the last 28 are basically wasted. So if you are running OpenCL on a GPU, it might help performance to use work-groups of size 128 directly (and change the global work size accordingly).
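The padding arithmetic can be sketched as a small rounding helper. The name `round_up_to_multiple` is made up for illustration, and the multiples 32 and 64 are the figures recalled above, not values queried from a device.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds value up to the nearest multiple of `multiple`.
 * `multiple` must be non-zero. */
static size_t round_up_to_multiple(size_t value, size_t multiple)
{
    assert(multiple != 0);
    return ((value + multiple - 1) / multiple) * multiple;
}
```

With this, a work-group dimension of 100 rounds up to 128 for a multiple of either 32 or 64. If the global work size is padded the same way, the kernel would need a bounds check (for example, comparing get_global_id against the real height) so that the extra work-items do nothing.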

    As a side note: I never understood why everyone uses the underscore variants of kernel, local and global; they seem much uglier to me.

  • 2020-12-15 08:22

    You do not have to allocate all your local memory outside the kernel, especially when it is a simple variable instead of an array.

    The reason your code does not compile is that OpenCL does not support initializing local memory at declaration. This is specified in the documentation (https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/local.html). It is also not possible in CUDA (see "Is there a way of setting default value for shared memory array?").
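A common workaround is to declare the local variable without an initializer, let one work-item assign it, and then synchronize with a barrier. The sketch below holds such a kernel as a C string, the form in which it could be handed to clCreateProgramWithSource. The kernel name `initLocalExample`, the variable `counter`, and the one-dimensional indexing are assumptions for illustration and are not taken from the original question.

```c
/* OpenCL C kernel source kept as a host-side C string.
 * Kernel and variable names are hypothetical. */
static const char *kInitLocalKernelSrc =
    "__kernel void initLocalExample(__global int *out)\n"
    "{\n"
    "    __local int counter;          /* no initializer allowed here */\n"
    "    if (get_local_id(0) == 0) {\n"
    "        counter = 0;              /* one work-item assigns the value */\n"
    "    }\n"
    "    barrier(CLK_LOCAL_MEM_FENCE); /* make the write visible to the group */\n"
    "    out[get_global_id(0)] = counter;\n"
    "}\n";
```

The barrier sits outside the `if` so that every work-item in the work-group reaches it. With a two-dimensional local size, the condition would have to test both local IDs.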


    PS: The answer from Grizzly is good enough, and it would have been better to post this as a comment, but I am restricted by the reputation policy. Sorry.
