Speed up nested for loop with elements exponentiation

遇见更好的自我 2021-01-12 16:23

I'm working on a large codebase and need to speed up one specific part of it. I've created an MWE (minimal working example), shown below:

import numpy as np


1 answer

Answer by 说谎 (OP), 2021-01-12 17:05

You want to gradually convert this over from using lists and loops to using arrays and broadcasting, grabbing the easiest and/or most time-critical parts first until it's fast enough.
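Here is a minimal sketch of what broadcasting buys you. It uses made-up data, and `centres` and `points` are hypothetical names, not from the question. Subtracting an array from a scalar replaces one inner loop. Giving one operand a trailing axis of length 1 replaces both loops.

```python
import numpy as np

# Hypothetical data: three "centres" and five sample points.
centres = np.array([1.0, 2.0, 3.0])
points = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Pure-Python version: one subtraction per (centre, point) pair.
loop_result = [[c - p for p in points] for c in centres]

# Scalar minus array: the inner loop becomes one vectorized operation.
row0 = centres[0] - points            # shape (5,)

# Column (3, 1) minus row (5,) broadcasts to a (3, 5) grid of all pairs,
# so neither loop is needed.
grid = centres[:, None] - points      # shape (3, 5)

print(grid.shape)
print(np.allclose(grid, loop_result))
```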

The first step is to stop redoing that zip(*list2) on every pass through the outer loop (especially if this is Python 2.x, where zip builds a brand-new list each time). While we're at it, we might as well store it in an array, and do the same with list1. You can still iterate over them for now. So:

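array1 = np.array(list1)
# On Python 3, zip() is lazy; wrap it in list() so NumPy sees the rows.
array2 = np.array(list(zip(*list2)))
# …
for elem in array1:
    # …
    for elem2 in array2:
        ...  # inner-loop body unchanged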


This won't speed things up much (on my machine, it takes us from 14.1 seconds to 12.9), but it gives us somewhere to start working.
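On Python 3, `np.array(zip(*list2))` needs a `list()` around the `zip`. Without it, `zip` returns a lazy iterator and NumPy wraps the iterator itself in a 0-d object array instead of building rows of numbers. The small sketch below uses a hypothetical two-row `list2`. It assumes the first row holds x values and the second holds y values, which matches the `array2[:, 0]` / `array2[:, 1]` indexing used later.

```python
import numpy as np

# Hypothetical stand-in for the question's list2: x values, then y values.
list2 = [[0.5, 1.5, 2.5], [3.0, 4.0, 5.0]]

# Materialize the zip so NumPy gets a list of (x, y) tuples.
array2 = np.array(list(zip(*list2)))   # shape (3, 2): one (x, y) row per point

# Equivalent and more direct: build the (2, 3) array and transpose it.
array2_t = np.array(list2).T

print(array2.shape)
print(np.array_equal(array2, array2_t))
```

Either form works. The transpose avoids creating the intermediate Python tuples.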

You should also remove the double calculation of sum(list3):

sum_list3 = sum(list3)
sum_list3 = sum_list3 if sum_list3>0. else 1e-06


Meanwhile, it's a bit odd that you want value <= 0 to go to 1e-6, but 0 < value < 1e-6 to be left alone. Is that really intentional? If not, you can fix that, and simplify the code at the same time, by just doing this:

sum_list3 = max(sum(list3), 1e-06)
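The two rules only disagree for small positive sums. Here is a minimal sketch comparing them; the function names are hypothetical.

```python
def clamp_original(total):
    # The question's rule: only totals <= 0 are replaced by 1e-06.
    return total if total > 0. else 1e-06

def clamp_max(total):
    # The suggested rule: any total below 1e-06 is raised to 1e-06.
    return max(total, 1e-06)

for total in (-1.0, 0.0, 5e-07, 0.5):
    print(total, clamp_original(total), clamp_max(total))
```

Only the 5e-07 case differs: the original rule keeps it, while `max` raises it to 1e-06.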


Now, let's broadcast the A and B calculations:

# Broadcast over all rows of array2 at once.
A = np.exp(-0.5*((elem[0]-array2[:, 0])/elem[3])**2)
B = np.exp(-0.5*((elem[1]-array2[:, 1])/elem[3])**2)
array3 = A*B

# Sum the elements of array3 and append the result to list4.
sum_list3 = max(array3.sum(), 1e-06)

list4.append(sum_list3)


And this gets us down from 12.9 seconds to 0.12. You could go a step further by also broadcasting over array1, and replacing list4 with a pre-allocated array, and so forth, but this is probably already fast enough.
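The "step further" mentioned above means broadcasting over array1 as well. The sketch below assumes, as the per-row code does, that columns 0, 1 and 3 of array1 hold the x centre, y centre and width, and that array2 has shape (M, 2). The function names and the random test data are hypothetical. It checks the fully vectorized result against the per-row version.

```python
import numpy as np

def per_row_broadcast(array1, array2):
    """The answer's approach: loop over array1, broadcast over array2."""
    list4 = []
    for elem in array1:
        A = np.exp(-0.5*((elem[0] - array2[:, 0])/elem[3])**2)
        B = np.exp(-0.5*((elem[1] - array2[:, 1])/elem[3])**2)
        list4.append(max((A*B).sum(), 1e-06))
    return list4

def fully_broadcast(array1, array2):
    """Hypothetical fully vectorized version with no Python-level loops."""
    # Keep the columns 2-D, shape (N, 1), so they broadcast against (M,).
    x0 = array1[:, 0:1]
    y0 = array1[:, 1:2]
    width = array1[:, 3:4]
    # (N, 1) - (M,) -> (N, M): every array1 row paired with every array2 row.
    A = np.exp(-0.5*((x0 - array2[:, 0])/width)**2)
    B = np.exp(-0.5*((y0 - array2[:, 1])/width)**2)
    # Sum over array2 (axis 1), then clamp element-wise. The result is one
    # value per array1 row in a single (N,) array instead of a growing list.
    return np.maximum((A*B).sum(axis=1), 1e-06)

rng = np.random.default_rng(0)
array1 = rng.uniform(0., 10., size=(200, 10))   # hypothetical: N rows, >= 4 columns
array2 = rng.uniform(0., 10., size=(1000, 2))   # hypothetical: M (x, y) points

expected = np.array(per_row_broadcast(array1, array2))
result = fully_broadcast(array1, array2)
print(result.shape)
print(np.allclose(result, expected))
```

The trade-off is memory: A and B become N-by-M arrays. For very large inputs, processing array1 in chunks of rows keeps most of the speed without the full intermediate.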