Hello everyone,
I am new to OpenCL and trying to learn more about it.
What does local_work_size do in an OpenCL program, and how does it affect performance?
I am working on an image processing algorithm, and for my OpenCL kernel I set:
size_t local_item_size = 1;
size_t global_item_size = (int) (ceil((float)(D_can_width*D_can_height)/local_item_size))*local_item_size; // Process the entire lists
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,&global_item_size, &local_item_size, 0, NULL, NULL);
Then, for the same kernel, I changed this to
size_t local_item_size = 16;
keeping everything else the same,
and I got around 4-5 times faster performance.
The local work size, also known as the work-group size, is the number of work-items in each work-group.
Each work-group is executed on a single compute unit, which can process many work-items at once, not just one.
So when your groups are too small (a local size of 1 means one work-item per group), you waste some computing power and only get coarse parallelization at the compute-unit level.
But if you put too many work-items in a group, you can also lose some opportunity for parallelization, since some compute units may sit idle while others are overloaded.
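To make the grouping concrete, here is a minimal sketch in plain C (no OpenCL calls) of how the local size determines the number of work-groups and how the global size gets rounded up. The helper names `round_up_global_size` and `num_work_groups` are hypothetical, not part of the OpenCL API, and the integer round-up is offered as an alternative to the `ceil((float)...)` expression in the question, not as what the asker wrote.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest multiple of local_size that is >= n_items.
   Uses integer arithmetic only, so no float conversion is involved. */
size_t round_up_global_size(size_t n_items, size_t local_size)
{
    assert(local_size > 0);
    return (n_items + local_size - 1) / local_size * local_size;
}

/* Hypothetical helper: number of work-groups that result from splitting
   the rounded-up global size into groups of local_size work-items. */
size_t num_work_groups(size_t n_items, size_t local_size)
{
    return round_up_global_size(n_items, local_size) / local_size;
}
```

As an illustration, assuming a 640x480 image (307200 pixels), a local size of 1 yields 307200 single-item groups, while a local size of 16 yields 19200 groups of 16. When the rounded-up global size exceeds the pixel count, the kernel needs a bounds check on `get_global_id(0)` so the padding work-items do nothing.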
So you could test several values to find the best one, or just let OpenCL pick a good one for you by passing NULL as the local work size.
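If you go the testing route, one possible way to generate the values to try is sketched below in plain C. `candidate_local_sizes` is a hypothetical helper, and restricting the candidates to powers of two is an assumption made for illustration; `max_wg_size` is assumed to be the limit you would obtain from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill out[] with power-of-two local sizes 1, 2, 4, ...
   that do not exceed max_wg_size. Writes at most cap entries and returns
   how many were written. Power-of-two candidates are an assumption. */
size_t candidate_local_sizes(size_t max_wg_size, size_t *out, size_t cap)
{
    size_t count = 0;
    assert(out != NULL || cap == 0);
    for (size_t s = 1; s <= max_wg_size && count < cap; s *= 2) {
        out[count++] = s;
    }
    return count;
}
```

You would then time the kernel once per candidate, and once with NULL, recomputing the global size as a multiple of each candidate before the call to clEnqueueNDRangeKernel.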
PS: I'd be interested to know how the performance with OpenCL's choice compares to your previous values, so could you please run a test and post the results? Thanks :)
Source: https://stackoverflow.com/questions/13761191/affect-of-local-work-size-on-performance-and-why-it-is