Empty loop is slower than a non-empty one in C

Happy的楠姐 2020-12-23 19:00

While trying to find out how long a line of C code takes to execute, I noticed this weird thing:

int main (char argc, char * argv[]) {
    time_t begin, end;

4 answers

有刺的猬 (original poster) 2020-12-23 19:43

This answer assumes that you've already understood and addressed the excellent points regarding undefined behavior sharth makes in his answer.  He also points out tricks that the compiler may play on your code.  You should take steps to make sure the compiler doesn't recognize the entire loop as useless.  For example, changing the iterator declaration to volatile uint64_t i; will prevent removal of the loop, and volatile int A; will ensure that the second loop actually does more work than the first.  But even if you do all of that, you may still discover that:

Code later in a program may well execute more quickly than earlier code.

The clock() library function could have caused an icache miss after reading the timer, and before returning.  This would cause some extra time in the first measured interval.  (For later calls, the code is already in cache).  However this effect will be tiny, certainly too small for clock() to measure, even if it was a page fault all the way to disk.  Random context switches can add to either time interval.
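To get a feel for what clock() can and cannot resolve, one option is to measure the smallest step it reports on your machine. The following is only a rough sketch: clock_step_seconds is a hypothetical helper name, not part of any library, and a context switch during the spin can inflate the result.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: spin until clock() reports a new value and
 * return the size of that step in seconds.
 * Returns -1.0 if clock() is unavailable on this platform. */
double clock_step_seconds(void)
{
    clock_t start = clock();
    clock_t now;

    if (start == (clock_t)-1) {
        return -1.0;            /* processor time not available */
    }

    /* Busy-wait: clock() counts CPU time, so spinning advances it. */
    do {
        now = clock();
    } while (now == start);

    assert(now > start);
    return (double)(now - start) / CLOCKS_PER_SEC;
}
```

Any effect much shorter than the value this returns, such as a single cache miss, will not show up in a clock() measurement.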

More importantly, you have an i5 CPU, which has dynamic clocking.  When your program begins execution, the clock rate is most likely low, because the CPU has been idle.  Just running the program makes the CPU no longer idle, so after a short delay the clock speed will increase.  The ratio between idle and TurboBoosted CPU clock frequency can be significant.
(On my ultrabook's Haswell i5-4200U, the former multiplier is 8, and the latter is 26, so startup code runs only about 30% as fast as later code, since 8/26 is roughly 0.31!  "Calibrated" loops for implementing delays are a terrible idea on modern computers!)

Including a warmup phase (running a benchmark repeatedly, and throwing the first result away) for more precise timing is not only for managed frameworks with JIT compilers!
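As a minimal sketch of that advice, the code below times the same loop several times and leaves the first run out of the average. It uses the volatile declarations mentioned earlier so the compiler cannot delete the loop. The function names time_loop and benchmark_discarding_warmup, the iteration count, and the run count are all illustrative assumptions, not taken from the original question.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Time one pass of a counting loop, in seconds of CPU time.
 * volatile keeps the compiler from removing the loop or the store to A. */
double time_loop(uint64_t iterations)
{
    volatile uint64_t i;
    volatile int A = 0;

    clock_t begin = clock();
    for (i = 0; i < iterations; i++) {
        A = 1;
    }
    clock_t end = clock();

    return (double)(end - begin) / CLOCKS_PER_SEC;
}

/* Run the loop `runs` times, store every timing in out[0..runs-1],
 * and return the mean of all runs except the first (the warm-up). */
double benchmark_discarding_warmup(uint64_t iterations, int runs, double *out)
{
    double sum = 0.0;

    assert(runs >= 2);          /* need one warm-up run plus one measured run */
    for (int r = 0; r < runs; r++) {
        out[r] = time_loop(iterations);
        if (r > 0) {
            sum += out[r];      /* skip out[0], the warm-up run */
        }
    }
    return sum / (runs - 1);
}
```

Calling benchmark_discarding_warmup(100000000, 5, timings) with a five-element double array and printing the entries lets you compare the first timing against the later ones directly. Whether the first run is visibly slower depends on the CPU's power state when the program starts.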