How to multiply two quaternions with minimal instructions?

After some thought, I came up with the following code for multiplying two quaternions using SSE:

#include


                      
2 Answers

眼角桃花, 2020-12-29 13:10

Never mind. If I compile the code with gcc -msse3 -O1 -S instead, I get the following:

    .text
    .align 4,0x90
    .globl __Z13_mm_cross4_psU8__vectorfS_
__Z13_mm_cross4_psU8__vectorfS_:
LFB644:
    movaps  %xmm0, %xmm5
    movaps  %xmm1, %xmm3
    movaps  %xmm0, %xmm2
    shufps  $27, %xmm0, %xmm5
    movaps  %xmm5, %xmm4
    shufps  $17, %xmm1, %xmm3
    shufps  $187, %xmm1, %xmm1
    mulps   %xmm3, %xmm2
    mulps   %xmm1, %xmm4
    mulps   %xmm5, %xmm3
    mulps   %xmm1, %xmm0
    hsubps  %xmm4, %xmm2
    haddps  %xmm3, %xmm0
    movaps  %xmm2, %xmm1
    shufps  $177, %xmm0, %xmm1
    shufps  $228, %xmm2, %xmm0
    addsubps        %xmm1, %xmm0
    shufps  $156, %xmm0, %xmm0
    ret


That's only 18 instructions now. That's what I expected in the beginning. Oops.
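When comparing builds like this, a plain scalar version is useful as a reference to confirm that the SIMD routine still returns the right product. The following is a minimal sketch, not the code from the question: the `quat` type, the `quat_mul_ref` name, and the (w, x, y, z) component order are all assumptions made for illustration.

```c
/* Hypothetical quaternion type; the (w, x, y, z) order is an assumption. */
typedef struct { float w, x, y, z; } quat;

/* Scalar Hamilton product r = a * b, written out term by term.
 * Intended only as a reference to compare a SIMD version against. */
static quat quat_mul_ref(quat a, quat b)
{
    quat r;
    r.w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
    r.x = a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y;
    r.y = a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x;
    r.z = a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w;
    return r;
}
```

For example, multiplying 1 + 2i + 3j + 4k by 5 + 6i + 7j + 8k with this function gives -60 + 12i + 30j + 24k.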
                                                                        
                                                        
            
            
              
                
伪装坚强ぢ, 2020-12-29 13:25

You may be interested in Agner Fog's C++ vector class library. It provides Quaternion4f and Quaternion4d classes (including the * and *= operators, of course), implemented with the SSE2 and AVX instruction sets respectively. The library is open source, so you can dig into the code and find a good implementation example to build your function on.

Later on, you may consult Agner Fog's "Optimizing subroutines in assembly language" manual and write an optimized, pure assembly implementation of the function or, once you know some of the low-level tricks, try to redesign the intrinsics approach in C.
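As a starting point for the intrinsics route, here is a minimal sketch of a quaternion product using only SSE1 intrinsics. It is not the library's implementation and not the code from the question, and it is not tuned for instruction count. The function name `quat_mul_sse` and the lane layout (x, y, z, w), with w in the highest lane, are assumptions.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: Hamilton product a * b.
 * Assumed layout: lane 0 = x, lane 1 = y, lane 2 = z, lane 3 = w. */
static __m128 quat_mul_sse(__m128 a, __m128 b)
{
    /* Broadcast each component of a across all four lanes. */
    __m128 ax = _mm_shuffle_ps(a, a, _MM_SHUFFLE(0, 0, 0, 0));
    __m128 ay = _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 az = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 2, 2, 2));
    __m128 aw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 3, 3, 3));

    /* Permute b so each lane holds the factor that pairs with ax, ay, az. */
    __m128 b_wzyx = _mm_shuffle_ps(b, b, _MM_SHUFFLE(0, 1, 2, 3));
    __m128 b_zwxy = _mm_shuffle_ps(b, b, _MM_SHUFFLE(1, 0, 3, 2));
    __m128 b_yxwz = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 3, 0, 1));

    /* Per-lane signs of the ax, ay and az terms, in (x, y, z, w) lane order. */
    const __m128 sx = _mm_setr_ps( 1.0f, -1.0f,  1.0f, -1.0f);
    const __m128 sy = _mm_setr_ps( 1.0f,  1.0f, -1.0f, -1.0f);
    const __m128 sz = _mm_setr_ps(-1.0f,  1.0f,  1.0f, -1.0f);

    /* The aw term needs no permutation or sign change. */
    __m128 r = _mm_mul_ps(aw, b);
    r = _mm_add_ps(r, _mm_mul_ps(_mm_mul_ps(ax, b_wzyx), sx));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_mul_ps(ay, b_zwxy), sy));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_mul_ps(az, b_yxwz), sz));
    return r;
}
```

The sign vectors are applied with plain multiplies to keep the sketch readable. Replacing them with sign-bit XORs, or with SSE3 instructions such as `addsubps`, is the kind of redesign that can bring the instruction count down.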
                                                                        
                                                        
            
            
              
                