Is performing complex multiplication and division beneficial through SSE instructions? I know that addition and subtraction perform better when using SSE. Can someone tell me ho
The algorithm in the Intel optimization reference does not handle overflows and NaNs in the input properly.
A single NaN in the real or imaginary part of the number will incorrectly spread to the other part.
As several operations with infinity (e.g. infinity * 0) end in NaN, overflows can cause NaNs to appear in your otherwise well-behaved data.
If overflows and NaNs are rare, a simple way to avoid this is to just check for NaN in the result and recompute it with the compiler's IEEE-compliant implementation:
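#include <complex.h>
#include <xmmintrin.h>

float complex a[2], b[2];
/* _mm_store_ps requires a 16-byte aligned destination */
_Alignas(16) float complex dest[2];
__m128 res = simd_fast_multiply(a, b);
/* store unconditionally; this can execute in parallel with the check,
 * making it almost free if there is no NaN in the data */
_mm_store_ps((float *)dest, res);
/* check for NaN: NaN is the only value that compares unequal to itself */
__m128 n = _mm_cmpneq_ps(res, res);
int have_nan = _mm_movemask_ps(n);
if (have_nan != 0) {
    /* do it again unvectorized */
    dest[0] = a[0] * b[0];
    dest[1] = a[1] * b[1];
}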