Question
In my C OpenCL code I use clSetKernelArg to create 'variable size' __local memory for use in my kernels, which is not available in OpenCL per se. See my example:
clSetKernelArg(clKernel, ArgCounter++, sizeof(cl_mem), (void *)&d_B);
...
clSetKernelArg(clKernel, ArgCounter++, sizeof(float)*block_size*block_size, NULL);
...
kernel="
matrixMul(__global float* C,
...
__local float* A_temp,
...
)"
{...
My question now is: how do I do the same in pyopencl?
I looked through the examples that come with pyopencl, but the only thing I could find was an approach using templates, which, as far as I understood it, seems like overkill. See example.
kernel = """
__kernel void matrixMul(__global float* C,...){
...
__local float A_temp[ %(mem_size)d ];
...
}
"""
What do you recommend?
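For reference, the template approach mentioned above amounts to plain Python string formatting applied to the kernel source before it is compiled. The following is a minimal sketch, not the code from the pyopencl examples: the kernel `fill_local` and the helper `render_kernel` are hypothetical names made up for illustration, and only the `%(mem_size)d` placeholder idea comes from the question.

```python
# Hypothetical kernel source; "%(mem_size)d" is a normal Python format
# placeholder that is replaced before the source reaches the OpenCL compiler.
KERNEL_TEMPLATE = """
__kernel void fill_local(__global float *out)
{
    __local float A_temp[%(mem_size)d];
    int lid = get_local_id(0);
    A_temp[lid] = (float)lid;
    barrier(CLK_LOCAL_MEM_FENCE);
    out[get_global_id(0)] = A_temp[lid];
}
"""


def render_kernel(mem_size):
    """Return kernel source with the local array length filled in."""
    return KERNEL_TEMPLATE % {"mem_size": int(mem_size)}


# A 16x16 block of floats needs 256 elements.
source = render_kernel(16 * 16)
```

The rendered string can then be handed to `cl.Program(ctx, source).build()`. One caveat of this approach is that any literal `%` in the kernel (for example the modulo operator) has to be written as `%%` in the template.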
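Answer 1:
It is similar to C. Instead of passing NULL together with a byte count to clSetKernelArg, you pass a cl.LocalMemory object of the required size in bytes as the argument for the __local pointer. Here is an example from Enja's radix sort. Notice that the last argument is a local memory allocation.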
    def naive_scan(self, num):
        # Integer division, so the sizes stay ints under Python 3 as well.
        nhist = num // 2 // self.cta_size * 16
        global_size = (nhist,)
        local_size = (nhist,)
        extra_space = nhist // 16  # NUM_BANKS defined as 16 in RadixSort.cpp
        shared_mem_size = self.uintsz * (nhist + extra_space)
        scan_args = (self.mCountersSum,
                     self.mCounters,
                     np.uint32(nhist),
                     # size in bytes of the __local buffer seen by the kernel
                     cl.LocalMemory(2 * shared_mem_size)
                     )
        self.radix_prg.scanNaive(self.queue, global_size, local_size, *scan_args).wait()
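Since the radix sort snippet depends on the rest of its class, here is a smaller self-contained sketch of the same idea. The kernel `scale_via_local` and the helpers `local_bytes` and `run` are hypothetical names invented for this illustration. It assumes numpy and pyopencl are installed and an OpenCL device is available; both are imported inside `run` so the size calculation can be tried without them.

```python
FLOAT_BYTES = 4  # sizeof(cl_float) is 4 bytes in OpenCL

# Hypothetical kernel: the last parameter is a __local pointer whose
# size is chosen by the host at launch time.
KERNEL_SRC = """
__kernel void scale_via_local(__global const float *src,
                              __global float *dst,
                              __local float *tmp)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    tmp[lid] = src[gid];
    barrier(CLK_LOCAL_MEM_FENCE);
    dst[gid] = 2.0f * tmp[lid];
}
"""


def local_bytes(n_floats):
    """Bytes of local memory needed for n_floats floats."""
    return FLOAT_BYTES * int(n_floats)


def run(block_size=64, n_blocks=4):
    """Launch the kernel with block_size floats of local memory per work-group."""
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    src = np.arange(block_size * n_blocks, dtype=np.float32)
    dst = np.empty_like(src)
    mf = cl.mem_flags
    src_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=src)
    dst_buf = cl.Buffer(ctx, mf.WRITE_ONLY, dst.nbytes)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    # cl.LocalMemory(nbytes) plays the role of
    # clSetKernelArg(kernel, i, nbytes, NULL) in the C API.
    prg.scale_via_local(queue, src.shape, (block_size,),
                        src_buf, dst_buf,
                        cl.LocalMemory(local_bytes(block_size))).wait()
    cl.enqueue_copy(queue, dst, dst_buf)
    return src, dst
```

Calling `run()` should return the input array and a copy with every element doubled. The point to notice is that `block_size` is an ordinary Python value, so the amount of local memory can change from launch to launch without recompiling the kernel.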
Answer 2:
I am not familiar with Python and its OpenCL implementation, but local memory can also be created within the kernel with a fixed size (similar to what you did):
__kernel void matrixMul(...) {
__local float A_templ[1024];
}
Instead of 1024, a preprocessor symbol can be used and set at compile time to change the size:
#define SIZE 1024
__kernel void matrixMul(...) {
__local float A_templ[SIZE];
}
SIZE can be defined within the same source, as a compiler option for clBuildProgram (for example -D SIZE=1024), or in an additional source string passed to clCreateProgramWithSource.
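In pyopencl, the compiler-option variant could look roughly like the sketch below. The kernel `copy_via_local` and the helpers `size_options` and `build_with_size` are hypothetical names for illustration. The sketch assumes pyopencl is installed and that the caller already has a context `ctx`. The `#ifndef` fallback is an addition of this sketch, so the source also compiles when no option is given.

```python
# Hypothetical kernel source; SIZE is supplied by the build options below.
KERNEL_SRC = """
#ifndef SIZE
#define SIZE 1024   /* fallback if no -D option is given */
#endif
__kernel void copy_via_local(__global const float *src,
                             __global float *dst)
{
    __local float A_templ[SIZE];
    int lid = get_local_id(0);
    A_templ[lid] = src[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);
    dst[get_global_id(0)] = A_templ[lid];
}
"""


def size_options(size):
    """Build options that define the preprocessor symbol SIZE."""
    return ["-D", "SIZE=%d" % int(size)]


def build_with_size(ctx, size):
    """Compile KERNEL_SRC with SIZE fixed to the given value."""
    import pyopencl as cl  # imported here so the helper above works without it
    return cl.Program(ctx, KERNEL_SRC).build(options=size_options(size))
```

With this approach the size is fixed per compiled program, so a different size means another call to `build_with_size`. The work-group size used at launch must not exceed SIZE, since each work-item writes to `A_templ[lid]`.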
EDIT: Found something with Google ;-): http://linksceem.eu/joomla/files/PRACE_Winter_School/LinkSCEMM_pyOpenCL.pdf
Source: https://stackoverflow.com/questions/5912968/how-to-create-variable-sized-local-memory-in-pyopencl