Assignment with Intel intrinsics - horizontal add

Submitted by 青春壹個敷衍的年華 on 2021-02-11 15:13:06

Question


I want to sum up all elements of a big vector ary. My idea was to do it with a horizontal sum.

const int simd_width = 16/sizeof(float); 
float helper[simd_width];

//take the first 4 elements
const __m128 a4 = _mm_load_ps(ary);

for(int i=0; i<N-simd_width; i+=simd_width){
     const __m128 b4 = _mm_load_ps(ary+i+simd_width);
     //save temporary result in helper array
     _mm_store_ps(helper, _mm_hadd_ps(a4,b4)); //C
     const __m128 a4 = _mm_load_ps(helper);

}

I am looking for a way to assign the resulting vector directly to the four-float vector a4, something like _mm_store_ps(a4, _mm_hadd_ps(a4,b4)). Is there such an Intel intrinsic? (This is my first time working with SSE, so maybe the whole code snippet is wrong.)


Answer 1:


As Peter suggested, do not use horizontal sums inside the loop. Use vertical sums, and do a single horizontal sum once the loop has finished.

For example, in pseudo-code, with a SIMD width of 2:

SIMD sum = {0,0}; // two lanes, i.e. two running accumulators
for (int i = 0; i + 1 < n; i += 2)
    sum = simd_add(sum, simd_load(x+i)); // vertical (lane-wise) add
float s = horizontal_add(sum); // one horizontal add, after the loop
if (n & 1)  // n was not a multiple of 2?
    s += x[n-1]; // deal with the last element
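
For SSE, where a register holds four floats, the pseudo-code above might translate into something like the following C sketch. The names sum_floats and hsum_ps are made up for illustration. The sketch assumes unaligned loads (_mm_loadu_ps) so that x needs no particular alignment; with a 16-byte-aligned array, _mm_load_ps as in the question should work too. Note that the accumulator is an ordinary non-const __m128 variable, so the result of an intrinsic can be assigned to it directly, with no store/load round trip through a helper array.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_add_ps, _mm_loadu_ps, ... */

/* Hypothetical helper: reduce the four lanes of v to a single float.
   Called once, after the loop, so its cost does not matter much. */
static float hsum_ps(__m128 v)
{
    __m128 hi   = _mm_movehl_ps(v, v);   /* {v2, v3, v2, v3} */
    __m128 sum2 = _mm_add_ps(v, hi);     /* lane 0 = v0+v2, lane 1 = v1+v3 */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(sum2, odd));  /* lane 0 + lane 1 */
}

/* Hypothetical function: sum n floats using a vertical SSE accumulator. */
float sum_floats(const float *x, size_t n)
{
    __m128 acc = _mm_setzero_ps();       /* four running partial sums */
    size_t i = 0;

    /* Vertical adds: each lane accumulates every fourth element. */
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));

    float s = hsum_ps(acc);              /* single horizontal reduction */

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i)
        s += x[i];
    return s;
}
```

Calling sum_floats on the ten values 1.0f through 10.0f returns 55.0f. Keep in mind that summing in four lanes changes the order of the additions, so for general float data the result can differ slightly in rounding from a plain scalar loop.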


Source: https://stackoverflow.com/questions/52936345/assignment-with-intel-intrinsics-horizontal-add
