Effect of local_work_size on performance, and why

与世无争的帅哥 submitted on 2019-12-22 12:22:21

Question


Hello everyone,
I am new to OpenCL and trying to learn more about it.

What does local_work_size do in an OpenCL program, and how does it affect performance?

I am working on an image-processing algorithm, and I launch my OpenCL kernel like this:

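size_t local_item_size = 1; // work-items per work-group
// Round the pixel count up to a multiple of local_item_size so the entire image is processed.
// Integer arithmetic avoids the precision loss of a float cast on large images.
size_t num_pixels = (size_t)D_can_width * (size_t)D_can_height;
size_t global_item_size = ((num_pixels + local_item_size - 1) / local_item_size) * local_item_size;
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_item_size, &local_item_size, 0, NULL, NULL);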

For the same kernel, when I changed this to

size_t local_item_size = 16;

and kept everything else the same,

I got around 4-5 times faster performance.


Answer 1:


The local work size, also known as the work-group size, is the number of work-items in each work-group.

Each work-group is executed on a compute unit, which can handle many work-items at once, not just one.

So when your groups are too small, you waste some computing power and only get coarse parallelism at the compute-unit level.
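
To make the "wasted computing power" point concrete, here is a rough sketch in plain C (no OpenCL calls). It assumes a simplified hardware model that is not stated above: the compute unit runs work-items in fixed-width SIMD batches (32 or 64 lanes is common on GPUs), and a batch never mixes work-items from different work-groups. The function name and the model are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * ASSUMED model (hypothetical, for illustration): work-items run in batches
 * of simd_width lanes, and each batch holds work-items from one work-group.
 * Returns the percentage of lanes doing useful work, rounded down.
 */
unsigned lane_utilization_percent(size_t local_size, size_t simd_width)
{
    assert(local_size > 0 && simd_width > 0);
    /* Batches needed to cover one work-group (ceiling division). */
    size_t batches = (local_size + simd_width - 1) / simd_width;
    /* Useful lanes divided by lanes occupied, as a percentage. */
    return (unsigned)((100 * local_size) / (batches * simd_width));
}
```

Under this model, with 32 lanes, a local size of 1 keeps about 3% of the lanes busy and a local size of 16 keeps 50% busy. Real speedups also depend on memory access and scheduling, so treat these numbers as a trend, not a prediction.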

But if you have too many work-items in a group, you can also lose opportunities for parallelism: some compute units may go unused while others are overloaded.
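
The opposite problem can be sketched the same way. This plain C example assumes, as a simplification, that each work-group is assigned to exactly one compute unit and that the global size is already a multiple of the local size. The function names are hypothetical; the number of compute units would normally come from clGetDeviceInfo with CL_DEVICE_MAX_COMPUTE_UNITS.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups that global_size work-items are split into. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/*
 * Compute units that receive no work-group at all, ASSUMING one work-group
 * maps to one compute unit (a simplification for illustration).
 */
size_t idle_compute_units(size_t global_size, size_t local_size,
                          size_t compute_units)
{
    size_t groups = work_group_count(global_size, local_size);
    return groups >= compute_units ? 0 : compute_units - groups;
}
```

For 1024 work-items on a device with 8 compute units, a local size of 256 gives only 4 groups and leaves 4 compute units idle under this model, while a local size of 16 gives 64 groups and leaves none idle.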

So you can test a range of values to find the best one, or let the OpenCL implementation pick one for you by passing NULL as the local_work_size argument of clEnqueueNDRangeKernel.
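
If you go the testing route, a small helper can generate the values to try. The sketch below is plain C with hypothetical function names. It lists power-of-two candidates up to a maximum that you supply (for a real device, the limit can be queried with clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE, or per kernel with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE), and it re-rounds the global size for each candidate, since an explicit local size must divide the global size evenly.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of local_size. */
size_t round_up_to_multiple(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

/*
 * Fill out[] with 1, 2, 4, ... up to max_local (at most capacity entries).
 * Returns how many candidates were written.
 */
size_t power_of_two_candidates(size_t max_local, size_t *out, size_t capacity)
{
    size_t count = 0;
    size_t size = 1;
    while (size <= max_local && count < capacity) {
        out[count++] = size;
        if (size > max_local / 2)
            break; /* the next doubling would exceed max_local */
        size *= 2;
    }
    return count;
}
```

For each candidate you would compute round_up_to_multiple(num_pixels, candidate), pass both sizes to clEnqueueNDRangeKernel, and time the run. This assumes the kernel ignores work-items whose global id is at or beyond the real pixel count, because the padded global size can be larger than the image.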

PS: I'd be interested to know how OpenCL's choice performs compared with your previous values, so could you please run a test and post the results? Thanks :)



Source: https://stackoverflow.com/questions/13761191/affect-of-local-work-size-on-performance-and-why-it-is
