How to multiply two quaternions with minimal instructions?

不知归路 2020-12-29 12:54

After some thought, I came up with the following code for multiplying two quaternions using SSE:

#include 

        
2 answers
  • 2020-12-29 13:10

    Never mind. If I compile the code with gcc -msse3 -O1 -S instead, I get the following:

        .text
        .align 4,0x90
        .globl __Z13_mm_cross4_psU8__vectorfS_
    __Z13_mm_cross4_psU8__vectorfS_:
    LFB644:
        movaps  %xmm0, %xmm5
        movaps  %xmm1, %xmm3
        movaps  %xmm0, %xmm2
        shufps  $27, %xmm0, %xmm5
        movaps  %xmm5, %xmm4
        shufps  $17, %xmm1, %xmm3
        shufps  $187, %xmm1, %xmm1
        mulps   %xmm3, %xmm2
        mulps   %xmm1, %xmm4
        mulps   %xmm5, %xmm3
        mulps   %xmm1, %xmm0
        hsubps  %xmm4, %xmm2
        haddps  %xmm3, %xmm0
        movaps  %xmm2, %xmm1
        shufps  $177, %xmm0, %xmm1
        shufps  $228, %xmm2, %xmm0
        addsubps        %xmm1, %xmm0
        shufps  $156, %xmm0, %xmm0
        ret
    

    That's only 18 instructions now (not counting the ret), which is what I expected in the first place. Oops.
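
    When shaving instructions off a routine like this, it helps to have a plain scalar product to compare results against. The following is only a minimal sketch: the {x, y, z, w} storage order is an assumption (the post does not state its layout), and Quat and quat_mul_ref are hypothetical names, not taken from the original code.

```cpp
#include <array>

// Assumption: a quaternion is stored as {x, y, z, w}, with w the scalar part.
using Quat = std::array<float, 4>;

// Hamilton product a * b, written out term by term.
// Vector part: a.w * b.v + b.w * a.v + cross(a.v, b.v)
// Scalar part: a.w * b.w - dot(a.v, b.v)
Quat quat_mul_ref(const Quat& a, const Quat& b) {
    return {
        a[3] * b[0] + a[0] * b[3] + a[1] * b[2] - a[2] * b[1],  // x
        a[3] * b[1] - a[0] * b[2] + a[1] * b[3] + a[2] * b[0],  // y
        a[3] * b[2] + a[0] * b[1] - a[1] * b[0] + a[2] * b[3],  // z
        a[3] * b[3] - a[0] * b[0] - a[1] * b[1] - a[2] * b[2],  // w
    };
}
```

    Feeding the same inputs to this and to the SSE routine, then comparing the four components, is a cheap way to catch a wrong shuffle constant or a flipped sign.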

  • 2020-12-29 13:25

    You may be interested in Agner Fog's C++ vector class library. It provides Quaternion4f and Quaternion4d classes (including the * and *= operators, of course), implemented with the SSE2 and AVX instruction sets respectively. The library is open source, so you can dig into the code and find a good implementation example to build your function on.

    Later on, you could consult his "Optimizing subroutines in assembly language" manual and write an optimized, pure-assembly implementation of the function or, once you are aware of some of the low-level tricks, try to redesign the intrinsics approach in C.
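
    As a starting point for an intrinsics version, here is a rough sketch of one common way to lay the product out: broadcast each component of a, multiply by a permuted copy of b, and flip signs with XOR masks. This is not the code from the question and not the vector class library's implementation. The {x, y, z, w} order and the names Quat, quat_mul_simd and QUAT_USE_SSE are assumptions made for illustration. It uses SSE1 intrinsics only and makes no claim to being instruction-minimal.

```cpp
#include <array>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE1 intrinsics only
#define QUAT_USE_SSE 1  // hypothetical switch, local to this sketch
#endif

// Assumption: {x, y, z, w} storage order, with w the scalar part.
using Quat = std::array<float, 4>;

// Computes a * b as the sum of four terms:
//   a.w * ( b.x,  b.y,  b.z,  b.w)
// + a.x * ( b.w, -b.z,  b.y, -b.x)
// + a.y * ( b.z,  b.w, -b.x, -b.y)
// + a.z * (-b.y,  b.x,  b.w, -b.z)
Quat quat_mul_simd(const Quat& a, const Quat& b) {
    Quat out;
#ifdef QUAT_USE_SSE
    const __m128 va = _mm_loadu_ps(a.data());
    const __m128 vb = _mm_loadu_ps(b.data());

    // Broadcast each component of a across all four lanes.
    const __m128 ax = _mm_shuffle_ps(va, va, _MM_SHUFFLE(0, 0, 0, 0));
    const __m128 ay = _mm_shuffle_ps(va, va, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 az = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 2, 2, 2));
    const __m128 aw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 3, 3, 3));

    // Permuted copies of b. _MM_SHUFFLE lists source lanes from lane 3 down to lane 0.
    const __m128 b_wzyx = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(0, 1, 2, 3));
    const __m128 b_zwxy = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(1, 0, 3, 2));
    const __m128 b_yxwz = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));

    // XOR with -0.0f flips a lane's sign bit. _mm_set_ps takes lane 3 first.
    const __m128 sx = _mm_set_ps(-0.0f, 0.0f, -0.0f, 0.0f);  // lanes 0..3: (+, -, +, -)
    const __m128 sy = _mm_set_ps(-0.0f, -0.0f, 0.0f, 0.0f);  // lanes 0..3: (+, +, -, -)
    const __m128 sz = _mm_set_ps(-0.0f, 0.0f, 0.0f, -0.0f);  // lanes 0..3: (-, +, +, -)

    __m128 r = _mm_mul_ps(aw, vb);
    r = _mm_add_ps(r, _mm_xor_ps(_mm_mul_ps(ax, b_wzyx), sx));
    r = _mm_add_ps(r, _mm_xor_ps(_mm_mul_ps(ay, b_zwxy), sy));
    r = _mm_add_ps(r, _mm_xor_ps(_mm_mul_ps(az, b_yxwz), sz));
    _mm_storeu_ps(out.data(), r);
#else
    // Portable fallback using the same four-term layout, for non-x86 builds.
    static const int perm[3][4] = {{3, 2, 1, 0}, {2, 3, 0, 1}, {1, 0, 3, 2}};
    static const float sign[3][4] = {{1.0f, -1.0f, 1.0f, -1.0f},
                                     {1.0f, 1.0f, -1.0f, -1.0f},
                                     {-1.0f, 1.0f, 1.0f, -1.0f}};
    for (int lane = 0; lane < 4; ++lane) {
        out[lane] = a[3] * b[lane];
        for (int k = 0; k < 3; ++k)
            out[lane] += a[k] * sign[k][lane] * b[perm[k][lane]];
    }
#endif
    return out;
}
```

    This layout spends seven shuffles, four multiplies, three XORs and three adds, so it is a readable baseline to measure against rather than a minimal one. Whether SSE3 instructions such as the haddps/hsubps/addsubps seen in the listing above actually win is something to check on the target CPU.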
