Can GCC be coerced to generate efficient constructors for memory-aligned objects?

庸人自扰 2021-01-31 17:29
I'm optimizing a constructor that is called in one of our app's innermost loops. The class in question is about 100 bytes wide, consists of a bunch of ints,

      
      
        
1 answer

有刺的猬 (original poster) 2021-01-31 18:05
              

            
            
                        
Here's how I would do it. Don't declare any constructor; instead, declare a constant Frobozz that holds the default values:

const Frobozz DefaultFrobozz =
  {
  0, 1, -1, 0,        // int na,nb,nc,nd;
  false, true, false, // bool ba,bb,bc;
  'a', 'b', 'c',      // char ca,cb,cc;
  -1, 1.0             // float fa,fb;
  } ;


Then in OversimplifiedExample:

Frobozz params (DefaultFrobozz) ;
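
For reference, the two fragments above can be put together as a minimal, self-contained sketch. The field list is taken from the comments in the initialiser; the helper names MakeParams and CheckDefaultsIntact are hypothetical and only illustrate how a caller might start from the defaults and override a field. The float defaults are written with an f suffix here, which is a cosmetic choice and not part of the original.

```cpp
#include <cassert>

// Plain aggregate: no user-declared constructors, so brace initialisation
// works and the implicit copy constructor is trivial.
struct Frobozz {
    int   na, nb, nc, nd;
    bool  ba, bb, bc;
    char  ca, cb, cc;
    float fa, fb;
};

// One shared, read-only instance holding the default values.
const Frobozz DefaultFrobozz = {
    0, 1, -1, 0,          // int na,nb,nc,nd;
    false, true, false,   // bool ba,bb,bc;
    'a', 'b', 'c',        // char ca,cb,cc;
    -1.0f, 1.0f           // float fa,fb;
};

// Hypothetical caller: copy the defaults, then override what differs.
Frobozz MakeParams(int nb_override) {
    Frobozz params(DefaultFrobozz);  // trivial copy; the compiler is free to emit a block move
    params.nb = nb_override;
    return params;
}

// Hypothetical debug check that the shared defaults were not modified.
void CheckDefaultsIntact() {
    assert(DefaultFrobozz.nb == 1 && DefaultFrobozz.cc == 'c');
}
```

Because Frobozz has no constructor of its own, the copy is just a copy of its bytes, and that is what lets the optimiser pick whichever block-copy sequence it considers best for the target.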


With gcc -O3 (version 4.5.2), the initialisation of params reduces to:

leal    -72(%ebp), %edi
movl    $_DefaultFrobozz, %esi
movl    $16, %ecx
rep movsl


which is about as good as it gets in a 32-bit environment.

Warning: I tried this with the 64-bit g++ version 4.7.0 20110827 (experimental), and it generated an explicit sequence of 64-bit copies instead of a block move. The processor doesn't allow rep movsq, but I would expect rep movsl to be faster than a sequence of 64-bit loads and stores. Perhaps not. (But the -Os switch -- optimise for space -- does use a rep movsl instruction.) Anyway, try this and let us know what happens.

Edited to add: I was wrong about the processor not allowing rep movsq. Intel's documentation says "The MOVS, MOVSB, MOVSW, and MOVSD instructions can be preceded by the REP prefix", but it seems that this is just a documentation glitch. In any case, if I make Frobozz big enough, then the 64-bit compiler generates rep movsq instructions; so it probably knows what it's doing.
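
Since the instruction sequence clearly varies with compiler version, target and object size, it may be worth pinning down the properties the technique relies on. The following is only a sketch, assuming a C++11 compiler: alignas(16), AlignedFrobozz, DefaultAligned and ResetToDefaults are illustrative choices, not something from the question. Whether the extra alignment changes the generated code is not guaranteed; compiling with g++ -O3 -S and reading the output is the only way to tell.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Same fields as Frobozz, with an (assumed) 16-byte alignment request.
struct alignas(16) AlignedFrobozz {
    int   na, nb, nc, nd;
    bool  ba, bb, bc;
    char  ca, cb, cc;
    float fa, fb;
};

// Compile-time guards: if someone later adds a constructor or a non-trivial
// member, the copy stops being a plain byte copy and the build fails here.
static_assert(std::is_trivially_copyable<AlignedFrobozz>::value,
              "AlignedFrobozz must stay trivially copyable");
static_assert(alignof(AlignedFrobozz) == 16,
              "unexpected alignment");
static_assert(sizeof(AlignedFrobozz) % 16 == 0,
              "size should be a multiple of the alignment");

const AlignedFrobozz DefaultAligned = {
    0, 1, -1, 0,
    false, true, false,
    'a', 'b', 'c',
    -1.0f, 1.0f
};

// Explicit block copy of the defaults into an existing object.
inline void ResetToDefaults(AlignedFrobozz& dst) {
    // Debug-only check that the destination honours the requested alignment.
    assert(reinterpret_cast<std::uintptr_t>(&dst) % alignof(AlignedFrobozz) == 0);
    std::memcpy(&dst, &DefaultAligned, sizeof dst);
}
```

The std::memcpy call is legal only because the type is trivially copyable, which is exactly what the first static_assert enforces.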
    
             
                                                        
            
            
              
                