Complex Mul and Div using SSE Instructions

情歌与酒 2021-02-08 14:17

Is it beneficial to perform complex multiplication and division with SSE instructions? I know that addition and subtraction perform better when using SSE. Can someone tell me ho

3 Answers
  •  旧巷少年郎
    2021-02-08 14:32

    The algorithm in the Intel optimization reference does not handle overflows and NaNs in the input properly.

    A single NaN in the real or imaginary part of the number will incorrectly spread to the other part.

    Because several operations on infinity (e.g. infinity * 0) produce NaN, overflows can introduce NaNs into otherwise well-behaved data.
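To make the failure mode concrete, here is a minimal scalar sketch of the textbook product formula. The names `naive_cmul` and `demo_nan_from_infinity` are hypothetical and not taken from the Intel reference; the sketch assumes floats are evaluated in single precision, as is the default on x86-64.

```c
#include <assert.h>
#include <math.h>

/* Textbook product (ar + ai*i) * (br + bi*i) on plain floats.
 * Hypothetical helper for illustration only. */
void naive_cmul(float ar, float ai, float br, float bi,
                float *re, float *im)
{
    float rr = ar * br;  /* each product is rounded to float */
    float ii = ai * bi;
    float ri = ar * bi;
    float ir = ai * br;
    *re = rr - ii;
    *im = ri + ir;
}

/* Two ways a NaN can appear without any NaN in the input. */
void demo_nan_from_infinity(void)
{
    float re, im;

    /* (inf + 0i) * (2 + 0i): the imaginary part computes inf * 0. */
    naive_cmul(INFINITY, 0.0f, 2.0f, 0.0f, &re, &im);
    assert(isinf(re) && isnan(im));

    /* Finite inputs whose products overflow: the real part is inf - inf. */
    naive_cmul(1e30f, 1e30f, 1e30f, 1e30f, &re, &im);
    assert(isnan(re) && isinf(im));
}
```

Calling `demo_nan_from_infinity()` exercises both cases. The second one starts from finite values only, which is how an overflow can put a NaN into data that contained none.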

    If overflows and NaNs are rare, a simple way to avoid this is to check the result for NaN and, if one is found, recompute it with the compiler's IEEE-compliant implementation:

    float complex a[2], b[2];
    _Alignas(16) float complex dest[2];  /* _mm_store_ps needs 16-byte alignment */
    __m128 res = simd_fast_multiply(a, b);

    /* store unconditionally; this can execute in parallel with the check,
     * making it almost free if there is no NaN in the data */
    _mm_store_ps((float *)dest, res);
    
    /* check for NaN */
    __m128 n = _mm_cmpneq_ps(res, res);
    int have_nan = _mm_movemask_ps(n);
    if (have_nan != 0) {
        /* do it again unvectorized */
        dest[0] = a[0] * b[0];
        dest[1] = a[1] * b[1];
    }
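For readers who want something that compiles, the following is a sketch of the same check-and-recompute pattern. The body of `simd_fast_multiply` is an assumption (the answer never shows it): it is a plain SSE version of the textbook formula, not the routine from the Intel reference. `make_cf`, `has_nan`, `complex_mul2` and `demo_complex_mul2` are likewise hypothetical names. Unaligned loads and stores are used so that the caller does not have to align its buffers.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: build a float complex from its parts without doing
 * arithmetic on I, which could itself create NaNs for infinite parts. */
float complex make_cf(float re, float im)
{
    float parts[2] = { re, im };
    float complex z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Assumed stand-in for the answer's simd_fast_multiply: multiplies two
 * pairs of complex floats, with no special handling of NaN or infinity. */
__m128 simd_fast_multiply(const float complex *a, const float complex *b)
{
    __m128 va = _mm_loadu_ps((const float *)a);  /* a0.re a0.im a1.re a1.im */
    __m128 vb = _mm_loadu_ps((const float *)b);
    __m128 b_re = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 2, 0, 0));
    __m128 b_im = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 3, 1, 1));
    __m128 a_sw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 t1 = _mm_mul_ps(va, b_re);    /* a.re*b.re  a.im*b.re */
    __m128 t2 = _mm_mul_ps(a_sw, b_im);  /* a.im*b.im  a.re*b.im */
    /* Negate t2 in the real lanes (0 and 2), then add. */
    __m128 sign = _mm_set_ps(0.0f, -0.0f, 0.0f, -0.0f);
    return _mm_add_ps(t1, _mm_xor_ps(t2, sign));
}

/* A lane compares unequal to itself only if it holds a NaN. */
int has_nan(__m128 v)
{
    return _mm_movemask_ps(_mm_cmpneq_ps(v, v)) != 0;
}

/* Fast path first; redo both products with the compiler's own complex
 * multiplication only when the fast result contains a NaN. */
void complex_mul2(float complex *dest,
                  const float complex *a, const float complex *b)
{
    __m128 res = simd_fast_multiply(a, b);
    _mm_storeu_ps((float *)dest, res);
    if (has_nan(res)) {
        dest[0] = a[0] * b[0];
        dest[1] = a[1] * b[1];
    }
}

void demo_complex_mul2(void)
{
    float complex a[2] = { make_cf(1.0f, 2.0f), make_cf(3.0f, -1.0f) };
    float complex b[2] = { make_cf(3.0f, 4.0f), make_cf(2.0f, 5.0f) };
    float complex dest[2];
    complex_mul2(dest, a, b);
    assert(crealf(dest[0]) == -5.0f && cimagf(dest[0]) == 10.0f);
    assert(crealf(dest[1]) == 11.0f && cimagf(dest[1]) == 13.0f);
}
```

What the fallback returns for non-finite inputs depends on the compiler and its flags. Options such as `-ffast-math` typically disable the IEEE-compliant path, so the recompute step only helps when the scalar multiplication really is the compliant one.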
    
