Fastest way to do horizontal SSE vector sum (or other reduction)

傲寒 2020-11-21 07:21

Given a vector of three (or four) floats, what is the fastest way to sum them?

Is SSE (movaps, shuffle, add, movd) always faster than x87? Are the horizontal-add instructions …
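For reference, here is a minimal sketch of the SSE-only shuffle-and-add sequence the question alludes to, written with C intrinsics rather than raw assembly. It assumes an x86 target with SSE (baseline on x86-64). The function name hsum_ps_sse1 is hypothetical, and the compiler, not the source, decides the exact instructions emitted (for example, whether a movaps copy appears).

```c
#include <xmmintrin.h>  /* SSE1 intrinsics: __m128, _mm_movehl_ps, _mm_shuffle_ps, _mm_add_ss, ... */

/* Hypothetical helper: horizontal sum of the four floats in v using only SSE1.
   Fold the high pair onto the low pair, then fold element 1 onto element 0. */
static inline float hsum_ps_sse1(__m128 v)
{
    /* (v2, v3, v2, v3): copy the high pair into the low pair */
    __m128 hi = _mm_movehl_ps(v, v);
    /* element 0 = v0+v2, element 1 = v1+v3 */
    __m128 sum2 = _mm_add_ps(v, hi);
    /* broadcast element 1 (v1+v3) into every lane, including lane 0 */
    __m128 odd = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    /* scalar add in lane 0 only: (v0+v2) + (v1+v3) */
    __m128 total = _mm_add_ss(sum2, odd);
    /* read lane 0 back as a scalar float */
    return _mm_cvtss_f32(total);
}
```

To sum only three floats, load them with the fourth lane set to zero, e.g. _mm_setr_ps(x, y, z, 0.0f), so the padding does not change the result.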

4 answers
  •  攒了一身酷
    2020-11-21 07:48

    You can do it in two HADDPS instructions in SSE3:

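    // SSE3 intrinsic, declared in <pmmintrin.h>
    v = _mm_hadd_ps(v, v);  // (v0+v1, v2+v3, v0+v1, v2+v3)
    v = _mm_hadd_ps(v, v);  // v0+v1+v2+v3 in every element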

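    This puts the sum in all four elements; _mm_cvtss_f32(v) reads it back as a scalar float. For a three-float vector, the fourth element must be zero so it does not change the total.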
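A self-contained version of the two-HADDPS approach might look like the sketch below. It assumes an x86 compiler with SSE3 support. The function name hsum_ps_sse3 is hypothetical, and the GCC/Clang target("sse3") attribute is used only so the sketch compiles without -msse3 (with MSVC, or when building with -msse3, it is unnecessary). The code must still run on an SSE3-capable CPU.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics (_mm_hadd_ps); also pulls in the SSE/SSE2 headers */

/* Hypothetical helper: horizontal sum of the four floats in v using two HADDPS.
   For a three-float vector, pass (x, y, z, 0.0f) so the padding lane adds nothing. */
#if defined(__GNUC__)
__attribute__((target("sse3")))  /* GCC/Clang only: enable SSE3 intrinsics in this function without -msse3 */
#endif
static float hsum_ps_sse3(__m128 v)
{
    v = _mm_hadd_ps(v, v);    /* (v0+v1, v2+v3, v0+v1, v2+v3) */
    v = _mm_hadd_ps(v, v);    /* v0+v1+v2+v3 in every lane */
    return _mm_cvtss_f32(v);  /* extract lane 0 as a scalar float */
}
```

HADDPS is compact, but on many x86 cores it decodes into more than one uop (shuffles plus an add), so whether it beats a hand-written shuffle-and-add sequence is worth measuring on the target CPU rather than assuming.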
