In SSE there is a function _mm_cvtepi32_ps(__m128i input)
which takes input vector of 32 bits wide signed integers (int32_t
) and converts them into
With Paul R's solution and with my previous solution the difference between the rounded floating point and the original integer is less than or equal to 0.75 ULP (Unit in the Last Place). In these methods at two places rounding may occur: in _mm_cvtepi32_ps and in _mm_add_ps. This leads to results that are not as accurate as possible for some inputs.
For example, with Paul R's method 0x2000003=33554435 is converted to 33554432.0, but 33554436.0 also exists as a float, which would have been better here. My previous solution suffers from similar inaccuracies. Such inaccurate results may also occur with compiler generated code, see here.
Following the approach of gcc (see Peter Cordes' answer to that other SO question), an accurate conversion within 0.5 ULP is obtained:
inline __m128 _mm_cvtepu32_ps(const __m128i v)
{
__m128i msk_lo = _mm_set1_epi32(0xFFFF);
__m128 cnst65536f= _mm_set1_ps(65536.0f);
__m128i v_lo = _mm_and_si128(v,msk_lo); /* extract the 16 lowest significant bits of v */
__m128i v_hi = _mm_srli_epi32(v,16); /* 16 most significant bits of v */
__m128 v_lo_flt = _mm_cvtepi32_ps(v_lo); /* No rounding */
__m128 v_hi_flt = _mm_cvtepi32_ps(v_hi); /* No rounding */
v_hi_flt = _mm_mul_ps(cnst65536f,v_hi_flt); /* No rounding */
return _mm_add_ps(v_hi_flt,v_lo_flt); /* Rounding may occur here, mul and add may fuse to fma for haswell and newer */
} /* _mm_add_ps is guaranteed to give results with an error of at most 0.5 ULP */
Note that other high bits/low bits partitions are possible as long as _mm_cvt_ps can convert both pieces to floats without rounding. For example, a partition with 20 high bits and 12 low bits will work equally well.