How to multiply two quaternions with minimal instructions?

Posted by 天涯浪子 on 2019-11-30 05:13:44

Never mind. If I compile the code with gcc -msse3 -O1 -S instead, I get the following:

    .text
    .align 4,0x90
    .globl __Z13_mm_cross4_psU8__vectorfS_
__Z13_mm_cross4_psU8__vectorfS_:
LFB644:
    movaps  %xmm0, %xmm5
    movaps  %xmm1, %xmm3
    movaps  %xmm0, %xmm2
    shufps  $27, %xmm0, %xmm5
    movaps  %xmm5, %xmm4
    shufps  $17, %xmm1, %xmm3
    shufps  $187, %xmm1, %xmm1
    mulps   %xmm3, %xmm2
    mulps   %xmm1, %xmm4
    mulps   %xmm5, %xmm3
    mulps   %xmm1, %xmm0
    hsubps  %xmm4, %xmm2
    haddps  %xmm3, %xmm0
    movaps  %xmm2, %xmm1
    shufps  $177, %xmm0, %xmm1
    shufps  $228, %xmm2, %xmm0
    addsubps        %xmm1, %xmm0
    shufps  $156, %xmm0, %xmm0
    ret

That's only 18 instructions now (not counting the ret), which is what I expected in the first place. Oops.
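For anyone who wants to check a routine like this, here is a rough sketch. The intrinsics are my reading of the shuffle immediates in the listing (27, 17, 187, 177, 228, 156), not the poster's actual source, which isn't shown here. The names `Quat`, `quat_mul_ref`, `quat_mul_sse3`, `quat_mul`, `nearly_equal` and `self_check` are made up for illustration, and the (x, y, z, w) layout with the scalar part last is an assumption. The SIMD path is only compiled when `-msse3` (or a superset) is passed, as in the gcc command above; otherwise the scalar reference is used.

```cpp
#include <cassert>
#include <cmath>
#ifdef __SSE3__
#include <pmmintrin.h>  // SSE3: _mm_hadd_ps, _mm_hsub_ps, _mm_addsub_ps
#endif

// Assumed layout: vector part first, scalar part last (x, y, z, w).
struct Quat { float x, y, z, w; };

// Scalar reference: Hamilton product p * q.
inline Quat quat_mul_ref(const Quat& p, const Quat& q) {
    return {
        p.w * q.x + p.x * q.w + p.y * q.z - p.z * q.y,
        p.w * q.y - p.x * q.z + p.y * q.w + p.z * q.x,
        p.w * q.z + p.x * q.y - p.y * q.x + p.z * q.w,
        p.w * q.w - p.x * q.x - p.y * q.y - p.z * q.z
    };
}

#ifdef __SSE3__
// Hypothetical reconstruction of the listing: xyzw = p, abcd = q.
inline __m128 quat_mul_sse3(__m128 xyzw, __m128 abcd) {
    __m128 wzyx = _mm_shuffle_ps(xyzw, xyzw, _MM_SHUFFLE(0, 1, 2, 3));  // $27
    __m128 baba = _mm_shuffle_ps(abcd, abcd, _MM_SHUFFLE(0, 1, 0, 1));  // $17
    __m128 dcdc = _mm_shuffle_ps(abcd, abcd, _MM_SHUFFLE(2, 3, 2, 3));  // $187
    // diff = (xb - ya, zb - wa, wd - zc, yd - xc)
    __m128 diff = _mm_hsub_ps(_mm_mul_ps(xyzw, baba), _mm_mul_ps(wzyx, dcdc));
    // sum  = (xd + yc, zd + wc, wb + za, yb + xa)
    __m128 sum = _mm_hadd_ps(_mm_mul_ps(xyzw, dcdc), _mm_mul_ps(wzyx, baba));
    // lo = (xd + yc, zd + wc, wd - zc, yd - xc)
    __m128 lo = _mm_shuffle_ps(sum, diff, _MM_SHUFFLE(3, 2, 1, 0));     // $228
    // hi = (zb - wa, xb - ya, yb + xa, wb + za)
    __m128 hi = _mm_shuffle_ps(diff, sum, _MM_SHUFFLE(2, 3, 0, 1));     // $177
    // addsub subtracts in lanes 0 and 2, adds in lanes 1 and 3: (X, Z, W, Y)
    __m128 xzwy = _mm_addsub_ps(lo, hi);
    // Reorder lanes to (X, Y, Z, W).
    return _mm_shuffle_ps(xzwy, xzwy, _MM_SHUFFLE(2, 1, 3, 0));         // $156
}
#endif

// Uses the SSE3 path when it was compiled in, the scalar reference otherwise.
inline Quat quat_mul(const Quat& p, const Quat& q) {
#ifdef __SSE3__
    float out[4];
    __m128 r = quat_mul_sse3(_mm_setr_ps(p.x, p.y, p.z, p.w),
                             _mm_setr_ps(q.x, q.y, q.z, q.w));
    _mm_storeu_ps(out, r);
    return {out[0], out[1], out[2], out[3]};
#else
    return quat_mul_ref(p, q);
#endif
}

// Component-wise comparison with an absolute tolerance.
inline bool nearly_equal(const Quat& a, const Quat& b, float eps = 1e-5f) {
    return std::fabs(a.x - b.x) <= eps && std::fabs(a.y - b.y) <= eps &&
           std::fabs(a.z - b.z) <= eps && std::fabs(a.w - b.w) <= eps;
}

// Small sanity check: i * j must equal k, and both paths must agree.
inline void self_check() {
    const Quat i{1, 0, 0, 0}, j{0, 1, 0, 0}, k{0, 0, 1, 0};
    assert(nearly_equal(quat_mul(i, j), k));
    const Quat p{0.5f, -1.0f, 2.0f, 0.25f}, q{1.5f, 0.75f, -0.5f, 2.0f};
    assert(nearly_equal(quat_mul(p, q), quat_mul_ref(p, q)));
}
```

Comparing against the scalar version on a handful of inputs is a cheap way to catch a wrong shuffle immediate or a flipped sign before counting instructions.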

You may be interested in Agner Fog's C++ vector class library. It provides Quaternion4f and Quaternion4d classes (including the * and *= operators, of course), implemented with the SSE2 and AVX instruction sets, respectively. The library is open source, so you can dig into the code and find a good implementation example to build your function on.

Later on, you may want to consult Agner Fog's "Optimizing subroutines in assembly language" manual and either write an optimized, pure-assembly implementation of the function or, once you are aware of some of the low-level tricks, try to rework the intrinsics approach in C.
