How to accumulate vectors in OpenCL?

Posted by 时光毁灭记忆、已成空白 on 2019-12-24 09:57:49

Question


I have a set of operations running in a loop.

for(int i = 0; i < row; i++)
{
    sum += arr1[0] - arr2[0];
    sum += arr1[1] - arr2[1];
    sum += arr1[2] - arr2[2];
    sum += arr1[3] - arr2[3];

    arr1 += offset1;
    arr2 += offset2;
}

Now I'm trying to vectorize the operations like this:

for(int i = 0; i < row; i++)
{
    convert_int4(vload4(0, arr1) - vload4(0, arr2));

    arr1 += offset1;
    arr2 += offset2;
}

But how do I accumulate the resulting vector in the scalar sum without using a loop?

I'm using OpenCL 2.0.


Answer 1:


This operation is called a "reduction", and there seems to be some information on it here.

OpenCL 2.0 provides built-in work-group functions for this, one family being work_group_reduce_<op>() (for example work_group_reduce_add()), which might help you: link.

There is also a presentation that includes some code: link.
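To make the idea concrete, here is a minimal host-side sketch in plain C of what a work-group sum reduction computes. It is only a model under stated assumptions, not OpenCL code: the function name two_level_sum and its parameters are made up for illustration, and the inner addition merely stands in for what work_group_reduce_add() would do across the work-items of a group. Note that the built-in combines one value per work-item, so each work-item would first have to collapse its own vector to a scalar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a two-level reduction:
 *   level 1: every simulated work-item sums its own interleaved slice,
 *   level 2: the per-item partial sums are combined into one total
 *            (the step work_group_reduce_add() would perform on a device). */
static uint32_t two_level_sum(const uint32_t *data, size_t n, size_t group_size)
{
    assert(group_size > 0);

    uint32_t total = 0;
    for (size_t item = 0; item < group_size; item++) {
        uint32_t partial = 0;                 /* private to one work-item */
        for (size_t i = item; i < n; i += group_size) {
            partial += data[i];
        }
        total += partial;                     /* stands in for the group reduce */
    }
    return total;
}
```

Because unsigned addition is associative and wraps consistently, the result does not depend on how the elements are split between work-items, which is what makes this kind of reduction safe to parallelize.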




Answer 2:


For float2, float4 and similar types, the easiest approach could be a dot product (though converting from int to float could be expensive).

float4 v1 = (float4)(1, 2, 3, 4);
float4 v2 = (float4)(5, 6, 7, 8);

float sum = dot(v1 - v2, (float4)(1, 1, 1, 1));

This is equal to

(v1.x-v2.x)*1 + (v1.y-v2.y)*1+(v1.z-v2.z)*1+(v1.w-v2.w)*1 

If there is any hardware support for it, leaving the choice to the compiler should be fine. For larger vectors, and especially for arrays, J.H.Bonarius's answer is the way to go. As far as I know, only CPUs have such horizontal sum instructions and GPUs do not, but for the sake of portability, the dot product and work_group_reduce are the easiest ways to get readable code and possibly good performance.

The dot product performs extra multiplications, so it may not always be the best choice.
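As a sanity check on the dot-product trick, here is a small sketch in plain C rather than OpenCL C. The helpers dot4, sum_diff_via_dot and roundtrip_through_float are hypothetical names introduced for illustration; dot4 only stands in for the built-in dot() on float4. The last helper illustrates one more cost of going through float when the data is really integer: a 32-bit float has a 24-bit significand, so integers above 2^24 are not all representable.

```c
#include <stdint.h>

/* Plain C stand-in for OpenCL's dot() on float4 (hypothetical helper). */
static float dot4(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Sum of the component-wise difference, written as a dot product with ones. */
static float sum_diff_via_dot(const float v1[4], const float v2[4])
{
    const float ones[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    float diff[4];
    for (int i = 0; i < 4; i++) {
        diff[i] = v1[i] - v2[i];
    }
    return dot4(diff, ones);
}

/* Converts an integer to float and back, to show where precision is lost. */
static int32_t roundtrip_through_float(int32_t x)
{
    return (int32_t)(float)x;
}
```

With the v1 and v2 values from the answer, sum_diff_via_dot returns -16. The round trip is exact for 16777216 (2^24) but maps 16777217 back to 16777216, so large integer sums are better kept in integer vectors.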




Answer 3:


I have found a solution that seems to be the closest to what I expected.

uint sum = 0;
uint4 S = (uint4)(0); // accumulator must start at zero

for(int i = 0; i < row; i++)
{
    S += convert_uint4(vload4(0, arr1) - vload4(0, arr2));

    arr1 += offset1;
    arr2 += offset2;
}

S.s01 = S.s01 + S.s23;
sum = S.s0 + S.s1;

OpenCL 2.0 supports this through vector component selection: the two halves of the vector are added together repeatedly, as shown above, until a single element remains. This works for vectors of up to 16 elements. Larger operations can be split into several smaller vector operations. For example, to add the absolute values of the differences between two vectors of size 32, we can do the following:

uint sum = 0;
uint16 S0 = (uint16)(0), S1 = (uint16)(0); // accumulators must start at zero

for(int i = 0; i < row; i++)
{
    S0 += convert_uint16(abs(vload16(0, arr1) - vload16(0, arr2)));
    S1 += convert_uint16(abs(vload16(1, arr1) - vload16(1, arr2)));

    arr1 += offset1;
    arr2 += offset2;
}

S0 = S0 + S1;
S0.s01234567 = S0.s01234567 + S0.s89abcdef;
S0.s0123 = S0.s0123 + S0.s4567;
S0.s01 = S0.s01 + S0.s23;
sum = S0.s0 + S0.s1;
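For checking kernel results on the host, the same pattern can be modeled in plain C. This is only a sketch under assumptions: the names fold_lanes and sad_rows are made up, the element type is assumed to be an unsigned 8-bit value (the post does not say what arr1 and arr2 point to), and both arrays are assumed to share one row stride in place of offset1 and offset2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16

/* Fold the upper half of the live lanes onto the lower half until one lane
 * remains; this mirrors steps such as S0.s01234567 + S0.s89abcdef. */
static uint32_t fold_lanes(uint32_t s[LANES])
{
    for (int width = LANES / 2; width > 0; width /= 2) {
        for (int i = 0; i < width; i++) {
            s[i] += s[i + width];
        }
    }
    return s[0];
}

/* Accumulate |a - b| lane by lane over `rows` rows, then fold once at the end.
 * Assumes uint8_t data and a common `stride` (in elements) for both arrays. */
static uint32_t sad_rows(const uint8_t *a, const uint8_t *b,
                         size_t rows, size_t stride)
{
    assert(a != NULL && b != NULL);

    uint32_t s[LANES] = {0};                  /* accumulators start at zero */
    for (size_t r = 0; r < rows; r++) {
        const uint8_t *ra = a + r * stride;
        const uint8_t *rb = b + r * stride;
        for (int i = 0; i < LANES; i++) {
            int d = (int)ra[i] - (int)rb[i];  /* widen before subtracting */
            s[i] += (uint32_t)(d < 0 ? -d : d);
        }
    }
    return fold_lanes(s);
}
```

The point of the design is that the expensive across-lane fold happens once, after the loop, while the loop body only does lane-wise additions. Widening before the subtraction matters for unsigned data: subtracting two unsigned 8-bit values directly would wrap around instead of producing a negative difference.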


Source: https://stackoverflow.com/questions/42156691/how-to-accumulate-vectors-in-opencl
