After some thought, I came up with the following code for multiplying two quaternions using SSE:
#include
You may be interested in Agner Fog's C++ vector class library. It provides Quaternion4f and Quaternion4d classes (including the * and *= operators, of course), implemented using the SSE2 and AVX instruction sets respectively. The library is open source, so you can dig into the code and find a good implementation example to build your function on.
Later on, you may consult Agner Fog's "Optimizing subroutines in assembly language" manual and write an optimized, pure assembly implementation of the function or, once you know some of the low-level tricks, rework the intrinsics approach in C.
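To give an idea of what an intrinsics version can look like, here is a minimal sketch of the Hamilton product using plain SSE. It is not the vector class library's implementation and not the code from the question. The `Quat` struct, its (w, x, y, z) storage order, and the function names `mul_scalar` and `mul_sse` are assumptions made up for illustration. Each component of `a` is broadcast, multiplied by a shuffled copy of `b`, given the right signs, and the four partial products are summed.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_shuffle_ps, ...)

// Hypothetical layout for this sketch: v[0]=w, v[1]=x, v[2]=y, v[3]=z.
struct Quat { float v[4]; };

// Scalar reference: the Hamilton product written out term by term.
Quat mul_scalar(const Quat& a, const Quat& b) {
    const float aw = a.v[0], ax = a.v[1], ay = a.v[2], az = a.v[3];
    const float bw = b.v[0], bx = b.v[1], by = b.v[2], bz = b.v[3];
    Quat r;
    r.v[0] = aw * bw - ax * bx - ay * by - az * bz;
    r.v[1] = aw * bx + ax * bw + ay * bz - az * by;
    r.v[2] = aw * by - ax * bz + ay * bw + az * bx;
    r.v[3] = aw * bz + ax * by - ay * bx + az * bw;
    return r;
}

// SSE sketch: one broadcast of a's component per row of the product,
// one shuffle of b per row, then sign fix-up and accumulation.
Quat mul_sse(const Quat& a, const Quat& b) {
    const __m128 va = _mm_loadu_ps(a.v);  // lanes: (aw, ax, ay, az)
    const __m128 vb = _mm_loadu_ps(b.v);  // lanes: (bw, bx, by, bz)

    // Broadcast each component of a to all four lanes.
    const __m128 aw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(0, 0, 0, 0));
    const __m128 ax = _mm_shuffle_ps(va, va, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 ay = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 2, 2, 2));
    const __m128 az = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 3, 3, 3));

    // Permutations of b matching each row of the product.
    const __m128 b1 = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));  // (bx, bw, bz, by)
    const __m128 b2 = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(1, 0, 3, 2));  // (by, bz, bw, bx)
    const __m128 b3 = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(0, 1, 2, 3));  // (bz, by, bx, bw)

    // Per-lane signs. Multiplying by +/-1 keeps the sketch readable;
    // XOR with a sign-bit mask is a common cheaper alternative.
    const __m128 s1 = _mm_setr_ps(-1.0f,  1.0f, -1.0f,  1.0f);
    const __m128 s2 = _mm_setr_ps(-1.0f,  1.0f,  1.0f, -1.0f);
    const __m128 s3 = _mm_setr_ps(-1.0f, -1.0f,  1.0f,  1.0f);

    __m128 sum = _mm_mul_ps(aw, vb);
    sum = _mm_add_ps(sum, _mm_mul_ps(_mm_mul_ps(ax, b1), s1));
    sum = _mm_add_ps(sum, _mm_mul_ps(_mm_mul_ps(ay, b2), s2));
    sum = _mm_add_ps(sum, _mm_mul_ps(_mm_mul_ps(az, b3), s3));

    Quat r;
    _mm_storeu_ps(r.v, sum);
    return r;
}
```

Keeping the scalar version around is handy for checking the shuffles: feed both functions the same inputs and compare the four components. Unaligned loads and stores are used here so the struct needs no special alignment; with 16-byte aligned storage you could switch to `_mm_load_ps` and `_mm_store_ps`.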