adding the components of an SSE register

Posted by 烂漫一生 on 2019-12-04 00:29:44

You could probably use the SSE3 instruction HADDPS, or its compiler intrinsic _mm_hadd_ps.

For example, see http://msdn.microsoft.com/en-us/library/yd9wecaa(v=vs.80).aspx

If you have two registers v1 and v2:

v = _mm_hadd_ps(v1, v2);
v = _mm_hadd_ps(v, v);

Now, v[0] contains the sum of v1's components, and v[1] contains the sum of v2's components.
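As a minimal sketch of the two-step HADDPS reduction described above, the function below wraps it in a helper and extracts both sums as scalars. The name hsum_pair_sse3 and the output-pointer interface are assumptions made for illustration, not part of the original answer. The target attribute is a GCC/Clang convenience so the file builds without -msse3; other compilers may need their own SSE3 switch.

```c
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps (also pulls in the SSE/SSE2 headers) */

/* Hypothetical helper: writes the sum of v1's four lanes to *sum1
 * and the sum of v2's four lanes to *sum2. */
#if defined(__GNUC__)
__attribute__((target("sse3")))  /* assumption: GCC or Clang; allows building without -msse3 */
#endif
static inline void hsum_pair_sse3(__m128 v1, __m128 v2, float *sum1, float *sum2)
{
    /* First pass: [v1_0+v1_1, v1_2+v1_3, v2_0+v2_1, v2_2+v2_3] */
    __m128 v = _mm_hadd_ps(v1, v2);

    /* Second pass: [sum(v1), sum(v2), sum(v1), sum(v2)] */
    v = _mm_hadd_ps(v, v);

    /* Lane 0 holds sum(v1); broadcast lane 1 to read sum(v2). */
    *sum1 = _mm_cvtss_f32(v);
    *sum2 = _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)));
}
```

If only one register needs summing, passing the same register as both arguments works too, and both outputs then hold the same total.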

If you want your code to work on pre-SSE3 CPUs (which do not support _mm_hadd_ps), you might use the following code. It uses more instructions, but decodes to fewer micro-ops on most CPUs.

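 // foo128 = [f0, f1, f2, f3]; the low two lanes of temp become [f0+f2, f1+f3]
 __m128 temp = _mm_add_ps(_mm_movehl_ps(foo128, foo128), foo128);
 float x;
 // Move temp[1] into lane 0, add it to temp[0], and store the scalar total in x
 _mm_store_ss(&x, _mm_add_ss(temp, _mm_shuffle_ps(temp, temp, 1)));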
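The SSE-only reduction above can be sketched as a small reusable function. The name hsum_sse is hypothetical, and the sketch assumes a target where SSE is enabled by default (for example x86-64). Note that _mm_shuffle_ps takes two source registers plus an immediate selector, so the same register is passed twice here.

```c
#include <xmmintrin.h>  /* SSE: _mm_movehl_ps, _mm_add_ps, _mm_add_ss, _mm_shuffle_ps */

/* Hypothetical helper: returns v[0] + v[1] + v[2] + v[3] using SSE1 only. */
static inline float hsum_sse(__m128 v)
{
    /* Copy the high half onto the low half: hi = [v2, v3, v2, v3] */
    __m128 hi = _mm_movehl_ps(v, v);

    /* Pairwise partial sums in the low two lanes: [v0+v2, v1+v3, ...] */
    __m128 temp = _mm_add_ps(hi, v);

    /* Selector 1 puts temp[1] into lane 0 of the shuffled register. */
    __m128 odd = _mm_shuffle_ps(temp, temp, 1);

    /* Scalar add of lane 0: (v0+v2) + (v1+v3) */
    return _mm_cvtss_f32(_mm_add_ss(temp, odd));
}
```

Returning the value with _mm_cvtss_f32 instead of storing through _mm_store_ss is a small design choice that lets the compiler keep the result in a register.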

Well, I don't know of a single function that does this, but it can be done by calling _mm_hadd_ps() twice.
