Conditional SSE/AVX add or zero elements based on compare

Posted by 此生再无相见时 on 2021-02-04 21:40:18

Question


I have the following __m128 vectors:

v_weight

v_entropy

I need to add v_entropy to v_weight only where elements in v_weight are not 0f.

Obviously _mm_add_ps() adds all elements regardless.

I can compile up to AVX, but not AVX2.

EDIT

I do know beforehand how many elements in v_weight will be 0: either none, or the last 1, 2, or 3 elements. If it's easier, how do I zero out the corresponding elements in v_entropy?


Answer 1:


The cmpeq/cmpgt instructions create a mask that is all ones or all zeros in each element. The overall process goes as follows:

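// w is v_weight, entropy is v_entropy
auto mask = _mm_cmpeq_ps(_mm_setzero_ps(), w);  // all ones where w == 0
mask = _mm_andnot_ps(mask, entropy);            // ~mask & entropy: entropy where w != 0, else 0
w = _mm_add_ps(w, mask);                        // adds entropy only to the nonzero elements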
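
As a self-contained sketch of the same masking steps, the snippet below wraps them in a function. The names add_where_nonzero and add_where_nonzero4 are hypothetical helpers introduced only for illustration, and the unaligned loads are an assumption about how the data is stored.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_cmpeq_ps, _mm_andnot_ps, _mm_add_ps

// Hypothetical helper: returns w + e in lanes where w != 0,
// and leaves w (which is 0 there) untouched in the other lanes.
static inline __m128 add_where_nonzero(__m128 w, __m128 e) {
    __m128 is_zero = _mm_cmpeq_ps(_mm_setzero_ps(), w);  // all ones where w == 0
    __m128 e_kept  = _mm_andnot_ps(is_zero, e);          // ~is_zero & e
    return _mm_add_ps(w, e_kept);
}

// Hypothetical wrapper for plain float[4] arrays (no alignment assumed).
static inline void add_where_nonzero4(const float* w, const float* e, float* out) {
    _mm_storeu_ps(out, add_where_nonzero(_mm_loadu_ps(w), _mm_loadu_ps(e)));
}
```

Because the compare treats -0.0f as equal to 0.0f, lanes holding negative zero are also skipped.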

Another option is to add unconditionally, then use blendv (SSE4.1) to select between the summed and the original value.

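auto zero = _mm_setzero_ps();
auto w2 = _mm_add_ps(e, w);          // e is v_entropy; unconditional sum
auto mask = _mm_cmpeq_ps(zero, w);   // all ones where w == 0
w = _mm_blendv_ps(w2, w, mask);      // take w (0) where mask is set, w2 elsewhere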
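
_mm_blendv_ps requires SSE4.1. Where that is not available, the same selection can be emulated with and/andnot/or, as in the sketch below. The helper names select_ps, add_or_keep and add_or_keep4 are hypothetical. Unlike blendv, which only inspects the sign bit of each mask element, this emulation assumes a full all-ones or all-zeros mask, which is what the compare instructions produce.

```cpp
#include <xmmintrin.h>  // SSE only

// Hypothetical helper: per lane, picks b where mask is all ones and a where
// mask is all zeros (same argument order as _mm_blendv_ps(a, b, mask)).
static inline __m128 select_ps(__m128 a, __m128 b, __m128 mask) {
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

// Adds unconditionally, then keeps the original w in lanes where w == 0.
static inline __m128 add_or_keep(__m128 w, __m128 e) {
    __m128 sum     = _mm_add_ps(e, w);
    __m128 is_zero = _mm_cmpeq_ps(_mm_setzero_ps(), w);
    return select_ps(sum, w, is_zero);
}

// Hypothetical wrapper for plain float[4] arrays (no alignment assumed).
static inline void add_or_keep4(const float* w, const float* e, float* out) {
    _mm_storeu_ps(out, add_or_keep(_mm_loadu_ps(w), _mm_loadu_ps(e)));
}
```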

A third option uses the fact that the result must be 0 wherever w = 0, so e can be added unconditionally and the sum masked off afterwards:

 auto m = _mm_cmpeq_ps(_mm_setzero_ps(), w);  // m = (w == 0), mask as above
 w = _mm_add_ps(w, e);                        // w += e
 w = _mm_andnot_ps(m, w);                     // w &= ~m: zero the sum again where w was 0

(I'm using cmpeq instead of cmpneq so that the same pattern also works for integers, where SSE has no not-equal compare.)
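
To illustrate the integer remark, here is a minimal sketch of the first approach on 32-bit integers using SSE2. The function names add_where_nonzero_epi32 and add_where_nonzero_i32x4 are hypothetical, and the choice of 32-bit lanes is an assumption, since the question itself only concerns floats.

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics
#include <cstdint>

// Hypothetical helper: returns w + e in lanes where w != 0, and 0 elsewhere.
// Integer SSE offers cmpeq and cmpgt but no cmpneq, hence the andnot.
static inline __m128i add_where_nonzero_epi32(__m128i w, __m128i e) {
    __m128i is_zero = _mm_cmpeq_epi32(_mm_setzero_si128(), w);  // all ones where w == 0
    __m128i e_kept  = _mm_andnot_si128(is_zero, e);             // ~is_zero & e
    return _mm_add_epi32(w, e_kept);
}

// Hypothetical wrapper for plain int32_t[4] arrays (no alignment assumed).
static inline void add_where_nonzero_i32x4(const std::int32_t* w,
                                           const std::int32_t* e,
                                           std::int32_t* out) {
    __m128i vw = _mm_loadu_si128(reinterpret_cast<const __m128i*>(w));
    __m128i ve = _mm_loadu_si128(reinterpret_cast<const __m128i*>(e));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out),
                     add_where_nonzero_epi32(vw, ve));
}
```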



Source: https://stackoverflow.com/questions/49982536/conditional-sse-avx-add-or-zero-elements-based-on-compare
