Optimized method for calculating cosine distance in Python

被撕碎了的回忆 2021-02-14 21:27

I wrote a method to calculate the cosine distance between two arrays:

def cosine_distance(a, b):
    if len(a) != len(b):
        return False
    numerator = 0
        
8 Answers
  •  广开言路
    2021-02-14 22:14

    I originally thought you wouldn't speed this up much without dropping into C (via numpy or scipy, say) or changing what you compute. Here's how I'd try it in pure Python anyway:

    # Python 2 only: itertools.imap is the lazy counterpart of map.
    from itertools import imap
    from math import sqrt
    from operator import mul
    
    def cosine_distance(a, b):
        assert len(a) == len(b)
        # 1 - (a . b) / sqrt((a . a) * (b . b))
        return 1 - (sum(imap(mul, a, b))
                    / sqrt(sum(imap(mul, a, a))
                           * sum(imap(mul, b, b))))
    

    This is roughly twice as fast as the original in Python 2.6 with 500k-element arrays (after changing map to imap, following Jarret Hardie's suggestion).
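On Python 3, `itertools.imap` no longer exists because the built-in `map` is already lazy. A minimal sketch of the same approach under that assumption follows; the name `cosine_distance_py3` and the `ValueError` in place of `assert` are illustrative choices, not part of the original answer.

```python
from math import sqrt
from operator import mul


def cosine_distance_py3(a, b):
    """Cosine distance, 1 - cos(angle), of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # map() is lazy in Python 3, so no intermediate lists are built.
    dot = sum(map(mul, a, b))
    norm_sq_product = sum(map(mul, a, a)) * sum(map(mul, b, b))
    return 1 - dot / sqrt(norm_sq_product)
```

Like the Python 2 version, this sketch does not guard against an all-zero input vector.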

    Here's a tweaked version of the original poster's revised code:

    from itertools import izip  # Python 2 only: lazy zip
    from math import sqrt
    
    def cosine_distance(a, b):
        assert len(a) == len(b)
        # Accumulate all three sums in a single pass over the pairs.
        ab_sum, a_sum, b_sum = 0, 0, 0
        for ai, bi in izip(a, b):
            ab_sum += ai * bi
            a_sum += ai * ai
            b_sum += bi * bi
        return 1 - ab_sum / sqrt(a_sum * b_sum)
    

    It's ugly, but it does come out faster.
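Which variant wins can depend on the interpreter version and the input, so it is worth measuring rather than assuming. Below is a rough `timeit` sketch with Python 3 ports of both variants; the function names, the 100k-element size, and the fixed seed are illustrative assumptions, not taken from the answer.

```python
import random
import timeit
from math import sqrt
from operator import mul


def cosine_map(a, b):
    # Python 3 port of the map-based version (map is already lazy).
    return 1 - (sum(map(mul, a, b))
                / sqrt(sum(map(mul, a, a)) * sum(map(mul, b, b))))


def cosine_loop(a, b):
    # Python 3 port of the single-pass loop (zip replaces izip).
    ab_sum = a_sum = b_sum = 0.0
    for ai, bi in zip(a, b):
        ab_sum += ai * bi
        a_sum += ai * ai
        b_sum += bi * bi
    return 1 - ab_sum / sqrt(a_sum * b_sum)


def compare(n=100_000, repeat=5):
    """Return best-of-`repeat` seconds per call for each variant."""
    rng = random.Random(0)  # fixed seed so runs are comparable
    a = [rng.random() for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    return {
        f.__name__: min(timeit.repeat(lambda: f(a, b), number=1, repeat=repeat))
        for f in (cosine_map, cosine_loop)
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

Taking the minimum over several repeats reduces noise from other processes. The absolute numbers will differ from the Python 2.6 figures quoted in the answer.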

    Edit: And try Psyco! It speeds up the final version by another factor of 4. How could I forget?
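Psyco only ever targeted Python 2 and is no longer maintained, so on a current interpreter the practical route is the one the answer mentions first: pushing the arithmetic into C via numpy. Here is a minimal sketch, assuming NumPy is installed; the function name `cosine_distance_np` is hypothetical.

```python
import numpy as np


def cosine_distance_np(a, b):
    """Cosine distance of two equal-length 1-D sequences using NumPy."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    # np.dot runs the multiply-accumulate loops in compiled code.
    return 1.0 - np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))
```

If SciPy is available, `scipy.spatial.distance.cosine(u, v)` computes the same quantity for 1-D inputs.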
