C++: Structs slower to access than basic variables?


I found some code that had "optimization" like this:

void somefunc(SomeStruct param){
    float x = param.x; // param.x and x are both floats. supposedly this


                      
9 Answers

有刺的猬
2021-02-13 04:35

There are good and valid reasons to do that kind of optimization when pointers are used: reading all inputs into locals first frees the compiler from possible aliasing issues that would otherwise prevent it from producing optimal code (nowadays there is also restrict, which is a C99 keyword; C++ compilers commonly offer it as the __restrict extension).
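
To make the aliasing point concrete, here is a minimal sketch. The names Params, accumulate_direct and accumulate_cached are hypothetical and not from the original code. Because total is a float* and p->scale is a float, the compiler has to assume the two may refer to the same memory, so the first version must reload p->scale on every iteration, while the second reads it once.

```cpp
// Hypothetical struct, for illustration only.
struct Params {
    float scale;
    float offset;
};

// Reads p->scale through the pointer on every iteration.
// The store to *total may modify p->scale, so the load cannot be hoisted.
void accumulate_direct(float* total, const Params* p, int n) {
    for (int i = 0; i < n; ++i) {
        *total += p->scale;
    }
}

// Copies the input into a local first. The local cannot alias *total,
// so the compiler is free to keep it in a register for the whole loop.
void accumulate_cached(float* total, const Params* p, int n) {
    const float scale = p->scale;
    for (int i = 0; i < n; ++i) {
        *total += scale;
    }
}
```

When total does not point into the struct, both functions compute the same value. They only differ when the pointers really do alias (for example, passing &q.scale as total and &q as p), which is exactly the case the compiler has to stay correct for in the first version.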

For non-pointer types, there is in theory some overhead, because every member is accessed through the struct's address (a base pointer plus an offset). That overhead might be noticeable within an inner loop and is negligible otherwise.
In practice, however, a modern compiler will almost always (unless there is a complex inheritance hierarchy) produce the exact same binary code.

I asked myself the exact same question about two years ago and ran a very extensive test using gcc 4.4. My finding was that unless you deliberately go out of your way to trip up the compiler, there is absolutely no difference in the generated code.
                                                                        
                                                        
            
            
              
                
时光取名叫无心
2021-02-13 04:37

The real answer is given by Piotr. This one is just for fun.

I have tested it. This code:

float somefunc(SomeStruct param, float &sum){
    float x = param.x;
    float y = param.y;
    float z = param.z;
    float xyz = x * y * z;
    sum = x + y + z;
    return xyz;
}


And this code:

float somefunc(SomeStruct param, float &sum){
    float xyz = param.x * param.y * param.z;
    sum = param.x + param.y + param.z;
    return xyz;
}


Generate identical assembly code when compiled with g++ -O2. They do generate different code with optimization turned off, though. Here is the difference:

<   movl    -32(%rbp), %eax
<   movl    %eax, -4(%rbp)
<   movl    -28(%rbp), %eax
<   movl    %eax, -8(%rbp)
<   movl    -24(%rbp), %eax
<   movl    %eax, -12(%rbp)
<   movss   -4(%rbp), %xmm0
<   mulss   -8(%rbp), %xmm0
<   mulss   -12(%rbp), %xmm0
<   movss   %xmm0, -16(%rbp)
<   movss   -4(%rbp), %xmm0
<   addss   -8(%rbp), %xmm0
<   addss   -12(%rbp), %xmm0
---
>   movss   -32(%rbp), %xmm1
>   movss   -28(%rbp), %xmm0
>   mulss   %xmm1, %xmm0
>   movss   -24(%rbp), %xmm1
>   mulss   %xmm1, %xmm0
>   movss   %xmm0, -4(%rbp)
>   movss   -32(%rbp), %xmm1
>   movss   -28(%rbp), %xmm0
>   addss   %xmm1, %xmm0
>   movss   -24(%rbp), %xmm1
>   addss   %xmm1, %xmm0


The lines marked < correspond to the version with "optimization" variables. It seems to me that the "optimized" version is even slower than the one with no extra variables. This is to be expected, though, as x, y and z are allocated on the stack, exactly like the param. What's the point of allocating more stack variables to duplicate existing ones?

If whoever did that "optimization" knew the language better, they would probably have declared those variables as register (a keyword that was deprecated in C++11 and removed in C++17), but even that leaves the "optimized" version slightly slower and longer, at least on G++/x86-64.
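
For anyone who wants to repeat the comparison, a self-contained sketch might look like the following. The definition of SomeStruct is an assumption (the question only says that x is a float), and the two functions are renamed somefunc_locals and somefunc_direct here so that they can live in one file.

```cpp
// Assumed layout: the original post does not show the struct definition.
struct SomeStruct {
    float x;
    float y;
    float z;
};

// Version with the "optimization": members copied into locals first.
float somefunc_locals(SomeStruct param, float& sum) {
    float x = param.x;
    float y = param.y;
    float z = param.z;
    float xyz = x * y * z;
    sum = x + y + z;
    return xyz;
}

// Version that reads the members directly.
float somefunc_direct(SomeStruct param, float& sum) {
    float xyz = param.x * param.y * param.z;
    sum = param.x + param.y + param.z;
    return xyz;
}

// To inspect the generated code, compile to assembly and compare the
// two function bodies, once with and once without optimization:
//   g++ -O2 -S file.cpp
//   g++ -O0 -S file.cpp
```

The exact instructions will depend on the compiler version and target, so the listing above should be read as one data point rather than a general result.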
                                                                        
                                                        
            
            
              
                
我在风中等你
2021-02-13 04:38

I'm no compiler guru, so take this with a grain of salt. I'm guessing that the original author of the code assumed that copying the values from the struct into local variables would make the compiler "place" those variables in floating-point registers, which are available on some platforms (e.g., x86). If there aren't enough registers to go around, they'd be put on the stack.

That being said, unless this code is in the middle of an intensive computation or loop, I'd strive for clarity rather than speed. It's pretty rare that anyone is going to notice a timing difference of a few instructions.
                                                                        
                                                        
            
            
              
                