Is there a limit to OpenCL local memory?

渐次进展 2021-01-02 04:58

Today I added four more __local variables to my kernel to dump intermediate results into. But just adding the four more variables to the kernel's signature and a

3 Answers
  • 2021-01-02 05:44

    I'm not sure this answers the question directly, but the following links are worth reading.

    A great read: OpenCL – Memory Spaces.

    Somewhat related:

    • How do I determine available device memory in OpenCL?
    • How do I use local memory in OpenCL?
    • Strange behaviour using local memory in OpenCL
  • 2021-01-02 05:49

    Of course there is, since local memory is physical rather than virtual.

    From working with virtual address spaces on CPUs, we are used to having, in theory, as much memory as we want. An allocation might fail at very large sizes when the paging file or swap partition runs out, or it might not fail at all until we actually touch more memory than can be mapped to physical RAM and disk.

    This is not the case for things like a computer's OS kernel (or lower-level parts of it) which need to access specific areas in the actual RAM.

    It is also not the case for GPU global and local memory. There is no* memory paging (remapping of perceived thread addresses to physical memory addresses), and no swapping. Specifically regarding local memory, every compute unit (= every streaming multiprocessor on an NVIDIA GPU) has a block of RAM used as local memory; these are the green slabs in the figure:

    [Figure: per-compute-unit local memory, drawn as green slabs]

    The size of each such slab is what you get with

    clGetDeviceInfo( · , CL_DEVICE_LOCAL_MEM_SIZE, · , ·).

    To illustrate, on NVIDIA Kepler GPUs, the local memory size is 16, 32, or 48 KB (the remainder of the 64 KB is used as cache for accesses to global memory). So, as of today, GPU local memory is very small relative to the global device memory.
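Since the limit is a hard, physical one, a quick sanity check is to add up the byte sizes of all `__local` buffers a kernel uses and compare the total against the device limit. The sketch below is plain C with no OpenCL calls: the helper names are hypothetical, and the limit passed in is assumed to be the value reported by `CL_DEVICE_LOCAL_MEM_SIZE`. On a real device, the amount a built kernel actually uses can also be queried with `clGetKernelWorkGroupInfo` and `CL_KERNEL_LOCAL_MEM_SIZE`.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the byte sizes of all __local buffers a kernel would use.
 * (Hypothetical helper; ignores any padding the compiler may add.) */
static size_t total_local_bytes(const size_t *sizes, size_t count)
{
    assert(sizes != NULL || count == 0);
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += sizes[i];
    return total;
}

/* Return 1 if the buffers fit within the given local memory limit
 * (assumed to come from CL_DEVICE_LOCAL_MEM_SIZE), otherwise 0. */
static int fits_in_local_memory(const size_t *sizes, size_t count,
                                size_t limit_bytes)
{
    return total_local_bytes(sizes, count) <= limit_bytes;
}
```

For instance, two 16 KB buffers fit under an assumed 48 KB limit, but adding four more 8 KB buffers raises the total to 64 KB, which does not.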


    * On NVIDIA GPUs beginning with the Pascal architecture, paging is supported, but that's not the common way of using device memory.

  • 2021-01-02 05:59

    The amount of local memory that a device offers on each of its compute units can be queried by passing the CL_DEVICE_LOCAL_MEM_SIZE parameter name to the clGetDeviceInfo function:

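    cl_ulong size = 0;  /* receives the local memory size in bytes */
    cl_int err = clGetDeviceInfo(deviceID, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(size), &size, NULL);
    /* err is CL_SUCCESS if the query succeeded */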
    

    The size returned is in bytes. Each work-group can allocate this much memory strictly for itself. Note, however, that if it does allocate the maximum, this may prevent other work-groups from being scheduled concurrently on the same compute unit.
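The scheduling effect can be estimated with simple integer division. The following is a minimal sketch in plain C, assuming the reported size is the amount available per compute unit and that local memory is the only limiting resource; real devices also limit residency by registers and work-group counts. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on the number of work-groups that could be resident on one
 * compute unit at the same time, considering local memory only.
 * cu_local_bytes:    assumed value of CL_DEVICE_LOCAL_MEM_SIZE
 * group_local_bytes: local memory used by one work-group */
static size_t max_groups_by_local_mem(size_t cu_local_bytes,
                                      size_t group_local_bytes)
{
    assert(cu_local_bytes > 0);
    if (group_local_bytes == 0)
        return SIZE_MAX;  /* local memory is not the limiting factor */
    return cu_local_bytes / group_local_bytes;  /* rounds down */
}
```

With an assumed 48 KB per compute unit, a work-group using 16 KB leaves room for up to three resident groups, while one using the full 48 KB leaves room for only one. A result of zero means the work-group does not fit at all.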
