Effect of local_work_size on performance and why it is
Hello everyone, I am new to OpenCL and trying to learn more about it. What does local_work_size do in an OpenCL program, and how does it affect performance? I am working on an image processing algorithm, and for my OpenCL kernel I set:

    size_t local_item_size = 1;
    size_t global_item_size = (int) (ceil((float)(D_can_width*D_can_height)/local_item_size))*local_item_size;
    // Process the entire lists
    ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_item_size, &local_item_size, 0, NULL, NULL);

Then, for the same kernel, I changed

    size_t local_item_size = 16;

keeping everything