Question
I have the following __m128 vectors:
v_weight
v_entropy
I need to add v_entropy to v_weight only where the elements of v_weight are not 0.0f.
Obviously, _mm_add_ps() adds all elements regardless.
I can compile up to AVX, but not AVX2.
EDIT
I do know beforehand how many elements in v_weight will be 0 (it is always either none, or the last 1, 2, or 3 elements). If it's easier, how do I zero out the corresponding elements in v_entropy?
Answer 1:
The cmpeq/cmpgt instructions (_mm_cmpeq_ps, _mm_cmpgt_ps) produce a per-element mask in which each lane is either all ones or all zeros. The overall process goes as follows:
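auto mask = _mm_cmpeq_ps(_mm_setzero_ps(), w); // all ones in lanes where w == 0
mask = _mm_andnot_ps(mask, entropy);           // ~mask & entropy: entropy where w != 0, else 0
w = _mm_add_ps(w, mask);                       // the w == 0 lanes get 0 added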
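For reference, here is a minimal sketch of the first approach wrapped in a function, plus a variant for the EDIT's case, where the number of trailing zero lanes is known in advance, so the mask can be built from that count instead of from a compare. It assumes an x86-64 target, where SSE and SSE2 are enabled by default. The names add_where_nonzero, zero_trailing_lanes and lanes are hypothetical, chosen for illustration only.

```cpp
#include <array>
#include <cassert>
#include <immintrin.h>

// First approach: add e to w only in lanes where w != 0.0f.
inline __m128 add_where_nonzero(__m128 w, __m128 e) {
    const __m128 is_zero = _mm_cmpeq_ps(_mm_setzero_ps(), w); // all ones where w == 0
    const __m128 addend  = _mm_andnot_ps(is_zero, e);          // e where w != 0, +0.0f elsewhere
    return _mm_add_ps(w, addend);
}

// Hypothetical helper for the EDIT's case: the caller already knows that the
// last zero_tail lanes (0..3) of w are zero, so no compare on w is needed.
// Lane i of e is kept when i < 4 - zero_tail and cleared otherwise.
inline __m128 zero_trailing_lanes(__m128 e, int zero_tail) {
    assert(zero_tail >= 0 && zero_tail <= 3);
    const __m128i lane_index = _mm_setr_epi32(0, 1, 2, 3);
    const __m128i keep = _mm_cmpgt_epi32(_mm_set1_epi32(4 - zero_tail), lane_index);
    return _mm_and_ps(e, _mm_castsi128_ps(keep));
}

// Hypothetical helper: copy the four lanes out for inspection (lane 0 first).
inline std::array<float, 4> lanes(__m128 v) {
    std::array<float, 4> out{};
    _mm_storeu_ps(out.data(), v);
    return out;
}
```

With k trailing zeros known, w = _mm_add_ps(w, zero_trailing_lanes(e, k)); gives the same result as the compare-based version for those inputs, because the cleared lanes of e add 0 to lanes that are already 0.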
Another option is to add unconditionally and then use blendv (_mm_blendv_ps, SSE4.1) to select between the added and the unchanged values.
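const __m128 zero = _mm_setzero_ps();
auto w2 = _mm_add_ps(e, w);         // w + e in every lane
auto mask = _mm_cmpeq_ps(zero, w);  // all ones in lanes where w == 0
w = _mm_blendv_ps(w2, w, mask);     // keep w where the mask is set, else take w2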
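Because _mm_blendv_ps is an SSE4.1 intrinsic, the blendv version only compiles when SSE4.1 is enabled (which any AVX target implies). Here is a minimal sketch as a standalone function. The __attribute__((target("sse4.1"))) line is a GCC/Clang-specific assumption that enables SSE4.1 for this one function; with MSVC, or when building with -msse4.1 or -mavx, it can be dropped. The names add_where_nonzero_blend and lanes are hypothetical, and the CPU running it must support SSE4.1.

```cpp
#include <array>
#include <cassert>
#include <immintrin.h>

// Second approach: compute w + e everywhere, then choose per lane.
__attribute__((target("sse4.1")))
static __m128 add_where_nonzero_blend(__m128 w, __m128 e) {
    const __m128 zero = _mm_setzero_ps();
    const __m128 sum  = _mm_add_ps(e, w);      // candidate result in every lane
    const __m128 mask = _mm_cmpeq_ps(zero, w); // all ones where w == 0
    // _mm_blendv_ps(a, b, mask) returns b in lanes whose mask sign bit is set,
    // else a, so lanes with w == 0 keep w and the others take w + e.
    return _mm_blendv_ps(sum, w, mask);
}

// Hypothetical helper: copy the four lanes out for inspection (lane 0 first).
static std::array<float, 4> lanes(__m128 v) {
    std::array<float, 4> out{};
    _mm_storeu_ps(out.data(), v);
    return out;
}
```

Unlike the andnot version, this computes the sum in all four lanes and discards it where w == 0, so it needs one extra select instruction but no masking of e.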
A third option uses the fact that the desired result is 0 wherever w == 0, so you can add unconditionally and then clear those lanes:
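auto m = _mm_cmpeq_ps(w, _mm_setzero_ps()); // m = (w == 0): make mask as above
w = _mm_add_ps(w, e);                       // w += e: add
w = _mm_andnot_ps(m, w);                    // w &= ~m: revert adding where w == 0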
(I'm using cmpeq instead of cmpneq so the same approach also works for integers: SSE2 provides integer equality and greater-than compares, but no not-equal compare.)
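To illustrate the note about integers, here is a minimal sketch of the third approach applied to four 32-bit integer lanes with SSE2 intrinsics, assuming an x86-64 target. The names add_where_nonzero_epi32 and lanes are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <immintrin.h>

// Third approach on integers: add everywhere, then clear lanes where w was 0.
inline __m128i add_where_nonzero_epi32(__m128i w, __m128i e) {
    const __m128i m = _mm_cmpeq_epi32(w, _mm_setzero_si128()); // m = (w == 0)
    w = _mm_add_epi32(w, e);                                   // w += e
    return _mm_andnot_si128(m, w);                             // w &= ~m
}

// Hypothetical helper: copy the four lanes out for inspection (lane 0 first).
inline std::array<std::int32_t, 4> lanes(__m128i v) {
    std::array<std::int32_t, 4> out{};
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}
```

Clearing the w == 0 lanes restores their original value, since that value was 0 to begin with.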
Source: https://stackoverflow.com/questions/49982536/conditional-sse-avx-add-or-zero-elements-based-on-compare