Adding the components of an SSE register
Question

I want to add the four components of an SSE register to get a single float. This is how I do it now:

    float a[4];
    _mm_storeu_ps(a, foo128);
    float x = a[0] + a[1] + a[2] + a[3];

Is there an SSE instruction that directly achieves this?

Answer 1:

You can use the SSE3 instruction HADDPS, or its compiler intrinsic _mm_hadd_ps. For example, see http://msdn.microsoft.com/en-us/library/yd9wecaa(v=vs.80).aspx

If you have two registers v1 and v2:

    v = _mm_hadd_ps(v1, v2);
    v = _mm_hadd_ps(v, v);

After these two steps, element 0 of v holds the sum of v1's components and element 1 holds the sum of v2's components.
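To make the lane arithmetic concrete, here is a minimal sketch in C of both cases: the single-register sum the question asks for, and the two-register form from the answer. The helper names hsum_ps and hsum2_ps and the HSUM_TARGET macro are hypothetical, introduced only for this illustration. The sketch assumes an x86 CPU with SSE3; with GCC or Clang the target attribute stands in for compiling with -msse3.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics: _mm_hadd_ps */

/* Assumption: on GCC/Clang, enable SSE3 per function instead of via -msse3. */
#if defined(__GNUC__) || defined(__clang__)
#define HSUM_TARGET __attribute__((target("sse3")))
#else
#define HSUM_TARGET
#endif

/* Hypothetical helper: returns v0 + v1 + v2 + v3 for a single register. */
HSUM_TARGET static float hsum_ps(__m128 v)
{
    v = _mm_hadd_ps(v, v);    /* [v0+v1, v2+v3, v0+v1, v2+v3] */
    v = _mm_hadd_ps(v, v);    /* every lane now holds v0+v1+v2+v3 */
    return _mm_cvtss_f32(v);  /* extract lane 0 as a scalar float */
}

/* Hypothetical helper: sums two registers at once, as in the answer. */
HSUM_TARGET static void hsum2_ps(__m128 v1, __m128 v2, float *s1, float *s2)
{
    float out[4];
    __m128 v = _mm_hadd_ps(v1, v2);  /* [a0+a1, a2+a3, b0+b1, b2+b3] */
    v = _mm_hadd_ps(v, v);           /* [sum(v1), sum(v2), sum(v1), sum(v2)] */
    _mm_storeu_ps(out, v);
    *s1 = out[0];                    /* sum of v1's components */
    *s2 = out[1];                    /* sum of v2's components */
}
```

For the register in the question, float x = hsum_ps(foo128); would replace the store-and-add sequence. Note that the additions are paired as (v0+v1)+(v2+v3), so the rounding can differ slightly from the left-to-right scalar sum.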