Optimized method for calculating cosine distance in Python

前端未结

关注

 8  909

被撕碎了的回忆 2021-02-14 21:27

I wrote a method to calculate the cosine distance between two arrays:

def cosine_distance(a, b):
    if len(a) != len(b):
        return False
    numerator = 0


      
      
        
          8条回答        

        
                    
            
            
                         
                
              
              
                
                   广开言路
                                             
                
                
                (楼主)
            
              
              
                2021-02-14 22:14
              

            
            
                        
(I originally thought) you're not going to speed it up a lot without breaking out to C (like numpy or scipy) or changing what you compute. But here's how I'd try that, anyway:

from itertools import imap
from math import sqrt
from operator import mul

def cosine_distance(a, b):
    assert len(a) == len(b)
    return 1 - (sum(imap(mul, a, b))
                / sqrt(sum(imap(mul, a, a))
                       * sum(imap(mul, b, b))))


It's roughly twice as fast in Python 2.6 with 500k-element arrays. (After changing map to imap, following Jarret Hardie.)

Here's a tweaked version of the original poster's revised code:

from itertools import izip

def cosine_distance(a, b):
    assert len(a) == len(b)
    ab_sum, a_sum, b_sum = 0, 0, 0
    for ai, bi in izip(a, b):
        ab_sum += ai * bi
        a_sum += ai * ai
        b_sum += bi * bi
    return 1 - ab_sum / sqrt(a_sum * b_sum)


It's ugly, but it does come out faster. . .

Edit: And try Psyco! It speeds up the final version by another factor of 4. How could I forget?
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它8个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复