adding the components of an SSE register

风流意气都作罢 submitted on 2019-12-05 10:47:39

Question


I want to add the four components of an SSE register to get a single float. This is how I do it now:

float a[4];
_mm_storeu_ps(a, foo128);
float x = a[0] + a[1] + a[2] + a[3];

Is there an SSE instruction that directly achieves this?


Answer 1:


You could probably use the HADDPS SSE3 instruction, or its compiler intrinsic _mm_hadd_ps.

For example, see http://msdn.microsoft.com/en-us/library/yd9wecaa(v=vs.80).aspx

If you have two registers v1 and v2 :

v = _mm_hadd_ps(v1, v2);
v = _mm_hadd_ps(v, v);

Now, v[0] contains the sum of v1's components, and v[1] contains the sum of v2's components.
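
As a rough sketch of the two-step horizontal add above, the snippet below wraps it in a function and extracts both sums as plain floats. The helper name hsum2_sse3 and the output parameters are hypothetical, chosen only for illustration. It assumes an x86 target with SSE3; GCC and Clang need SSE3 enabled, which the attribute below does for this one function (passing -msse3 should work too).

```c
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps (also pulls in the SSE headers) */

/* GCC and Clang only compile SSE3 intrinsics in code built for SSE3. */
#if defined(__GNUC__)
__attribute__((target("sse3")))
#endif
/* hsum2_sse3 is a hypothetical helper name, not part of any library. */
void hsum2_sse3(__m128 v1, __m128 v2, float *sum1, float *sum2)
{
    /* [v1.0+v1.1, v1.2+v1.3, v2.0+v2.1, v2.2+v2.3] */
    __m128 v = _mm_hadd_ps(v1, v2);

    /* [sum(v1), sum(v2), sum(v1), sum(v2)] */
    v = _mm_hadd_ps(v, v);

    /* Element 0 holds the sum of v1's components. */
    *sum1 = _mm_cvtss_f32(v);

    /* Move element 1 into lane 0 to read the sum of v2's components. */
    *sum2 = _mm_cvtss_f32(_mm_shuffle_ps(v, v, 1));
}
```

If only one register needs summing, passing the same register for both arguments gives the same total in both outputs.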




Answer 2:


If you want your code to work on pre-SSE3 CPUs (which do not support _mm_hadd_ps), you can use the following code. It uses more instructions, but decodes to fewer micro-ops on most CPUs.

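 /* lanes 0 and 1 of temp hold foo128[0]+foo128[2] and foo128[1]+foo128[3] */
 __m128 temp = _mm_add_ps(_mm_movehl_ps(foo128, foo128), foo128);
 float x;
 /* _mm_shuffle_ps takes two source registers; selector 1 moves lane 1 into lane 0 */
 _mm_store_ss(&x, _mm_add_ss(temp, _mm_shuffle_ps(temp, temp, 1)));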
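
The same SSE-only reduction can be sketched as a small function that returns the float directly. The name hsum_sse is hypothetical, and reading lane 0 with _mm_cvtss_f32 instead of _mm_store_ss is just one possible choice.

```c
#include <xmmintrin.h>  /* SSE: _mm_add_ps, _mm_movehl_ps, _mm_shuffle_ps */

/* hsum_sse is a hypothetical helper name used only for illustration. */
float hsum_sse(__m128 v)
{
    /* _mm_movehl_ps(v, v) yields [v2, v3, v2, v3], so after the add
       lanes 0 and 1 hold v0+v2 and v1+v3 (the upper lanes are ignored). */
    __m128 temp = _mm_add_ps(_mm_movehl_ps(v, v), v);

    /* Bring lane 1 down to lane 0 and add the two partial sums. */
    __m128 sum = _mm_add_ss(temp, _mm_shuffle_ps(temp, temp, 1));

    /* Return lane 0 as a scalar float. */
    return _mm_cvtss_f32(sum);
}
```

This computes (v0+v2)+(v1+v3), so for inputs where rounding matters the result may differ slightly from the left-to-right scalar sum a[0] + a[1] + a[2] + a[3].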



Answer 3:


Well, I don't know about any such function, but it can be done using _mm_hadd_ps() two times.



Source: https://stackoverflow.com/questions/8536032/adding-the-components-of-an-sse-register
