OpenCL CLK_LOCAL_MEM_FENCE causing abort trap 6

醉酒当歌 submitted on 2021-01-29 22:03:23

Question


I'm doing an exercise on image convolution (info here) using OpenCL. When I use images that are not square (r x c), CLK_LOCAL_MEM_FENCE makes the program stop with abort trap 6.

What I do is essentially fill the local memory with the proper values, wait for that filling to finish using barrier(CLK_LOCAL_MEM_FENCE), and then calculate the values.

With images like the ones described above, barrier(CLK_LOCAL_MEM_FENCE) seems to cause the problem: if I comment out that call, everything works fine (which is weird, since there is then no synchronization). Any idea what may cause this?

EDIT: the problem occurs when the height, the width, or both are not a multiple of the local work size (16 x 16). The global work size is always a pair of values that are multiples of 16, like (512 x 512).

int c = get_global_id(0); 
int r = get_global_id(1); 

int lc = get_local_id(0);
int lr = get_local_id(1);

// this ignores indexes out of the input image.
if (c >= ImageWidth || r >= ImageHeight) return;

// fill a local array...

barrier(CLK_LOCAL_MEM_FENCE);

if (c < outputImageWidth && r < outputImageHeight)
{
     // LOCAL DATA PROCESSED  
     OutputImage[r* outputImageWidth +c] = someValue;
}

Answer 1:


OpenCL requires that each work-group barrier is executed by every work-item in that work-group.

In the code that you have posted, you have an early exit clause to prevent out-of-range accesses. This is a common trick for getting nice work-group sizes in OpenCL 1.X, but unfortunately this breaks the above condition, and this will lead to undefined behaviour (typically either a hang or a crash).

You will need to modify your kernel to avoid this, by either removing the early exit clause (and perhaps clamping out-of-range work-items instead, if applicable), or by restructuring the kernel so that out-of-range work-items continue at least as far as the barrier before exiting.
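As a rough illustration of the clamping option, here is a minimal CPU-side sketch in plain C. It is not the asker's kernel. The names `TILE`, `round_up`, `clamp_coord` and `fill_tile` are hypothetical, and clamping to the image edge is only an assumption about the border handling the convolution wants. The nested loops stand in for the 16 x 16 work-items of one work-group: every one of them writes a tile element, so in a real kernel none of them would have to return before the barrier.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 16  /* hypothetical name: local work size per dimension (16 x 16 in the question) */

/* Round n up to the next multiple of `multiple`, e.g. to pad the global size. */
static size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Clamp a coordinate into [0, size - 1]. In OpenCL C the built-in clamp()
   can do the same job. */
static int clamp_coord(int v, int size)
{
    assert(size > 0);
    if (v < 0) return 0;
    if (v >= size) return size - 1;
    return v;
}

/* CPU stand-in for the "fill a local array" step of one work-group.
   Each (lr, lc) pair plays the role of one work-item. Out-of-range
   work-items read the nearest valid pixel instead of exiting early. */
static void fill_tile(const float *image, int width, int height,
                      int group_c, int group_r, float tile[TILE][TILE])
{
    for (int lr = 0; lr < TILE; ++lr) {
        for (int lc = 0; lc < TILE; ++lc) {
            int c = clamp_coord(group_c * TILE + lc, width);
            int r = clamp_coord(group_r * TILE + lr, height);
            tile[lr][lc] = image[r * width + c];
        }
    }
}
```

With this shape, the range check is only needed around the final write to the output image, after the barrier. On the host side, something like `round_up(500, 16)` would give a padded global size of 512 for a 500-pixel dimension.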




Answer 2:


You can fix it by reordering the code so that the barrier comes before the early exit, without otherwise affecting the behaviour:

int c = get_global_id(0); 
int r = get_global_id(1); 

int lc = get_local_id(0);
int lr = get_local_id(1);

// fill a local array... with all the threads
// (no work-item may return before the barrier)
// e.g.: for(i=0;i<size;i+=get_local_size(0))
//        ...

barrier(CLK_LOCAL_MEM_FENCE);

// this ignores indexes out of the input image.
if (c >= ImageWidth || r >= ImageHeight) return;

if (c < outputImageWidth && r < outputImageHeight)
{
     // LOCAL DATA PROCESSED  
     OutputImage[r* outputImageWidth +c] = someValue;
}


Source: https://stackoverflow.com/questions/35220650/opencl-clk-local-mem-fence-causing-abort-trap-6
