Complex Mul and Div using SSE Instructions

情歌与酒 2021-02-08 14:17

Is it beneficial to perform complex multiplication and division with SSE instructions? I know that addition and subtraction perform better when using SSE. Can someone tell me ho

3 Answers
  •  难免孤独
    2021-02-08 14:46

    Just for completeness, the Intel® 64 and IA-32 Architectures Optimization Reference Manual that can be downloaded here contains assembly for complex multiply (Example 6-9) and complex divide (Example 6-10).

    Here, for example, is the multiply code:

    // Multiplication of (ak + i bk) * (ck + i dk)
    // a + i b can be stored as a data structure
    // Lane lists in the comments run from the highest lane to the lowest.
    movsldup xmm0, src1       ; load real parts into the destination: a1, a1, a0, a0
    movaps   xmm1, src2       ; load the 2nd pair of complex values: d1, c1, d0, c0
    mulps    xmm0, xmm1       ; temporary results: a1d1, a1c1, a0d0, a0c0
    shufps   xmm1, xmm1, 0xb1 ; swap the real and imaginary parts: c1, d1, c0, d0
    movshdup xmm2, src1       ; load imaginary parts into the destination: b1, b1, b0, b0
    mulps    xmm2, xmm1       ; temporary results: b1c1, b1d1, b0c0, b0d0
    addsubps xmm0, xmm2       ; b1c1+a1d1, a1c1-b1d1, b0c0+a0d0, a0c0-b0d0
    

    The assembly maps directly to GCC's x86 built-ins (just prefix each instruction name with __builtin_ia32_).
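
As a rough illustration of that mapping, here is a minimal sketch of the same multiply sequence in C. It uses the portable intrinsics from `<pmmintrin.h>` instead of the raw `__builtin_ia32_` names. The function name `cmul2_sse3` and the interleaved `{re0, im0, re1, im1}` layout are assumptions made for this example, not something taken from the Intel manual. The sketch also uses unaligned loads and stores, whereas the manual's `movaps` requires 16-byte-aligned data. The `target("sse3")` attribute is a GCC/Clang convenience; with other compilers, or if you prefer, enable SSE3 on the command line (for example `-msse3`).

```c
#include <assert.h>
#include <pmmintrin.h> /* SSE3: _mm_moveldup_ps, _mm_movehdup_ps, _mm_addsub_ps */

/* Hypothetical helper: multiplies two pairs of single-precision complex
 * numbers. x, y and out each hold {re0, im0, re1, im1}.
 * Lane lists in the comments run from the lowest lane to the highest,
 * with x = a + i*b and y = c + i*d. */
#if defined(__GNUC__)
__attribute__((target("sse3")))
#endif
static void cmul2_sse3(const float *x, const float *y, float *out)
{
    assert(x && y && out);

    __m128 ab = _mm_loadu_ps(x);               /* a0, b0, a1, b1 */
    __m128 cd = _mm_loadu_ps(y);               /* c0, d0, c1, d1 */

    __m128 re = _mm_moveldup_ps(ab);           /* a0, a0, a1, a1 (movsldup) */
    __m128 im = _mm_movehdup_ps(ab);           /* b0, b0, b1, b1 (movshdup) */

    __m128 t0 = _mm_mul_ps(re, cd);            /* a0c0, a0d0, a1c1, a1d1 */
    __m128 dc = _mm_shuffle_ps(cd, cd, 0xB1);  /* d0, c0, d1, c1 (shufps) */
    __m128 t1 = _mm_mul_ps(im, dc);            /* b0d0, b0c0, b1d1, b1c1 */

    /* addsubps subtracts in the even lanes and adds in the odd lanes:
     * a0c0-b0d0, a0d0+b0c0, a1c1-b1d1, a1d1+b1c1 */
    _mm_storeu_ps(out, _mm_addsub_ps(t0, t1));
}
```

For instance, with `x = {1, 2, 3, 4}` and `y = {5, 6, 7, 8}`, that is (1+2i)(5+6i) and (3+4i)(7+8i), the output array holds `{-7, 16, -11, 52}`.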
