Why does changing 0.1f to 0 slow down performance by 10x?

Asked by 我在风中等你 on 2020-11-22 04:30 · 5 answers · 962 views

Why does this bit of code,

const float x[16] = {  1.1,   1.2,   1.3,     1.4,   1.5,   1.6,   1.7,   1.8,
                       1.9,   2.0,   2.1,     2.2,   2.3,   2.4,   2.5,   2.6};
const float z[16] = {1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                     1.923, 2.034, 2.145, 2.256, 2.367, 2.478, 2.589, 2.690};
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}

for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] += 0.1f; // <--
        y[i] -= 0.1f; // <--
    }
}

run more than 10 times faster than the following bit (identical except where noted)?

        y[i] += 0; // <--
        y[i] -= 0; // <--

5 Answers

  •  旧时难觅i  ·  2020-11-22 05:03

    Dan Neely's comment ought to be expanded into an answer:

    It is not the zero constant 0.0f that is denormalized or that causes the slowdown; it is the values that approach zero on each iteration of the loop. As they come closer and closer to zero, they need more and more precision to represent, and they become denormalized. These are the y[i] values. (They approach zero because x[i]/z[i] is less than 1.0 for all i.)
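
    A quick way to watch this decay happen (a minimal C sketch of mine, not part of the quoted answer; the 1.5f starting value and the 0.5f factor are arbitrary stand-ins for y[i] and the x[i]/z[i] ratio): repeatedly scale a float by a fraction and ask fpclassify from <math.h> when it stops being a normal number.

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            float y = 1.5f;
            for (int i = 1; i <= 200; i++) {
                y *= 0.5f;  /* stand-in for y[i] *= x[i]; y[i] /= z[i]; with a ratio below 1 */
                if (fpclassify(y) == FP_SUBNORMAL) {
                    printf("denormal after %d halvings: %g\n", i, y);
                    break;
                }
            }
            return 0;
        }

    For IEEE 754 single precision this fires at i == 127, once the value drops below the smallest normal float (about 1.18e-38).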

    The crucial difference between the slow and fast versions of the code is the statement y[i] = y[i] + 0.1f;. As soon as this line is executed on each iteration of the loop, the extra precision in the float is lost, and the denormalization needed to represent that precision is no longer needed. Afterwards, floating point operations on y[i] remain fast because the values aren't denormalized.

    Why is the extra precision lost when you add 0.1f? Because floating point numbers only have so many significant digits. Say you have enough storage for three significant digits; then 0.00001 = 1e-5, and 0.00001 + 0.1 = 0.1, at least for this example float format, because there is no room to store the least significant bit of 0.10001.

    In short, y[i]=y[i]+0.1f; y[i]=y[i]-0.1f; isn't the no-op you might think it is.
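
    A small, self-contained C demo of that point (mine, not from the original answer): start from a denormal value, add and subtract 0.1f, and the tiny value is flushed away entirely. The volatile qualifier is only there to keep the compiler from folding the arithmetic at compile time.

        #include <stdio.h>

        int main(void)
        {
            volatile float y = 1e-41f;   /* denormal: well below FLT_MIN (~1.18e-38f) */
            printf("before: %g\n", y);   /* roughly 1e-41 */
            y = y + 0.1f;                /* 1e-41 is far below half an ulp of 0.1f, so the sum rounds to exactly 0.1f */
            y = y - 0.1f;                /* 0.1f - 0.1f is exactly 0.0f */
            printf("after:  %g\n", y);   /* prints 0 -- the denormal is gone */
            return 0;
        }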

    Mysticial said this as well: the content of the floats matters, not just the assembly code.

    EDIT: To put a finer point on this, not every floating point operation takes the same amount of time to run, even if the machine opcode is the same. For some operands/inputs, the same instruction will take more time to run. This is especially true for denormal numbers.
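
    A rough way to observe that operand-dependent timing (my sketch, not from the answer; the spin helper, the 0.9f factor, and the iteration count are arbitrary choices): run the same multiply-add loop once with a normal operand and once with a denormal one. On CPUs that handle denormals with a microcode assist (many x86 parts) the second call is typically much slower; with flush-to-zero/denormals-are-zero enabled, or on hardware with full-speed denormal support, the gap can shrink or vanish.

        #include <stdio.h>
        #include <time.h>

        /* volatile forces seed to be reloaded every iteration,
           so the multiply cannot be hoisted out of the loop */
        static float spin(volatile float seed)
        {
            float acc = 0.0f;
            for (long i = 0; i < 100000000L; i++)
                acc += seed * 0.9f;  /* a denormal seed makes this a denormal multiply each time */
            return acc;
        }

        int main(void)
        {
            clock_t t0 = clock();
            float a = spin(1.0f);    /* normal operand */
            clock_t t1 = clock();
            float b = spin(1e-41f);  /* denormal operand */
            clock_t t2 = clock();

            printf("normal:   %.2fs (acc=%g)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, a);
            printf("denormal: %.2fs (acc=%g)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, b);
            return 0;
        }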
