Two very similar functions involving sin() exhibit vastly different performance — why?

Consider the following two programs that perform the same computations in two different ways:

// v1.c
#include 
#include 
int main(v


                      
2 answers

时光说笑 (2021-02-14 03:27)

Ignore the loop structure altogether, and only think about the sequence of calls to sin.  v1 does the following:

x <-- sin(x)
x <-- sin(x)
x <-- sin(x)
...


that is, each computation of sin() cannot begin until the result of the previous call is available; it must wait for the entirety of the previous computation.  This means that for N calls to sin (here N = 8192 × 100000 = 819200000), the total time required is roughly N times the latency of a single sin evaluation.

In v2, by contrast, you do the following:

x[0] <-- sin(x[0])
x[1] <-- sin(x[1])
x[2] <-- sin(x[2])
...


Notice that each call to sin does not depend on the previous call.  Effectively, the calls to sin within one pass are all independent, and the processor can begin each one as soon as the necessary register and ALU resources are available (without waiting for the previous computation to be completed).  Thus, the time required is governed by the throughput of the sin function, not its latency, and so v2 can finish in significantly less time.



I should also note that DeadMG is right that v1 and v2 are formally equivalent, and in a perfect world the compiler would optimize both of them into a single chain of 100000 sin evaluations (or simply evaluate the result at compile time).  Sadly, we live in an imperfect world.
                                                                        
                                                        
            
            
              
                
庸人自扰 (2021-02-14 03:36)

In the first example, it runs 100000 loops of sin, 8192 times. 

In the second example, it runs 8192 loops of sin, 100000 times.

Other than that and storing the result differently, I don't see any difference. 

However, what does make a difference is that the input changes on each iteration in the second case. So I suspect that the sin value, at certain points in the loop, gets much easier to calculate, and that can make a big difference. Calculating sin is not entirely trivial: as I understand it, it is a series calculation that loops until an exit condition is hit.
                                                                        
                                                        
            
            
              
                