Question
I'm doing an exercise on convolution over images (info here) using OpenCL. When I use images whose size is not square (r x c), barrier(CLK_LOCAL_MEM_FENCE) makes the program stop with abort trap 6.
Essentially, I fill the local memory with the proper values, wait for that fill to finish using barrier(CLK_LOCAL_MEM_FENCE), and then compute the values.
It seems that with images like these, barrier(CLK_LOCAL_MEM_FENCE) causes problems: if I comment that call out, everything works fine (which is weird, since then there is no synchronization at all). Any idea what may cause this?
EDIT: the problem occurs when the height, the width, or both are not multiples of the local work-group size (16 x 16). The global work size is always a pair of values that are multiples of 16, such as (512 x 512).
int c = get_global_id(0);
int r = get_global_id(1);
int lc = get_local_id(0);
int lr = get_local_id(1);
// this ignores indexes out of the input image.
if (c >= ImageWidth || r >= ImageHeight) return;
// fill a local array...
barrier(CLK_LOCAL_MEM_FENCE);
if (c < outputImageWidth && r < outputImageHeight)
{
    // LOCAL DATA PROCESSED
    OutputImage[r * outputImageWidth + c] = someValue;
}
Answer 1:
OpenCL requires that each work-group barrier is executed by every work-item in that work-group.
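In the code you posted, you have an early exit clause to prevent out-of-range accesses. This is a common trick when the global size is padded up to a multiple of a convenient work-group size, which OpenCL 1.x requires because the global size must be evenly divisible by the local size. Unfortunately, it breaks the above condition: work-items that return early never reach the barrier, which leads to undefined behaviour (typically either a hang or a crash).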
You will need to modify your kernel to avoid this, by either removing the early exit clause (and perhaps clamping out-of-range work-items instead, if applicable), or by restructuring the kernel so that out-of-range work-items continue at least as far as the barrier before exiting.
Answer 2:
You can fix it by reordering the code, without changing the kernel's behaviour:
int c = get_global_id(0);
int r = get_global_id(1);
int lc = get_local_id(0);
int lr = get_local_id(1);
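// fill a local array... with all the work-items (threads), e.g.:
// for (int i = lr * get_local_size(0) + lc; i < size; i += get_local_size(0) * get_local_size(1))
// ...; out-of-range work-items now reach this loop too, so clamp or guard their image reads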
barrier(CLK_LOCAL_MEM_FENCE);
// this ignores indexes out of the input image.
if (c >= ImageWidth || r >= ImageHeight) return;
if (c < outputImageWidth && r < outputImageHeight)
{
    // LOCAL DATA PROCESSED
    OutputImage[r * outputImageWidth + c] = someValue;
}
Source: https://stackoverflow.com/questions/35220650/opencl-clk-local-mem-fence-causing-abort-trap-6