Optimized method for calculating cosine distance in Python

被撕碎了的回忆 2021-02-14 21:27

I wrote a method to calculate the cosine distance between two arrays:

def cosine_distance(a, b):
    if len(a) != len(b):
        return False
    numerator = 0
         


        
8 answers
  • 2021-02-14 22:04
    from math import sqrt

    def cd(a, b):
        if len(a) != len(b):
            raise ValueError("a and b must be the same length")
        rn = range(len(a))
        adb = sum(a[k] * b[k] for k in rn)        # dot product of a and b
        nma = sqrt(sum(a[k] * a[k] for k in rn))  # Euclidean norm of a
        nmb = sqrt(sum(b[k] * b[k] for k in rn))  # Euclidean norm of b

        # Cosine distance is 1 minus the cosine similarity
        return 1 - adb / (nma * nmb)
    
  • 2021-02-14 22:14

    I originally thought you weren't going to speed it up much without dropping down to C (through numpy or scipy, say) or changing what you compute. But here's how I'd try it anyway:

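    from math import sqrt
    from operator import mul
    try:
        from itertools import imap  # Python 2
    except ImportError:
        imap = map  # Python 3: the built-in map is already lazy

    def cosine_distance(a, b):
        assert len(a) == len(b)
        # Dot product divided by the product of the two Euclidean norms
        return 1 - (sum(imap(mul, a, b))
                    / sqrt(sum(imap(mul, a, a))
                           * sum(imap(mul, b, b))))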
    

    It's roughly twice as fast in Python 2.6 with 500k-element arrays. (After changing map to imap, following Jarret Hardie.)

    Here's a tweaked version of the original poster's revised code:

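    from math import sqrt
    try:
        from itertools import izip  # Python 2
    except ImportError:
        izip = zip  # Python 3: the built-in zip is already lazy

    def cosine_distance(a, b):
        assert len(a) == len(b)
        ab_sum, a_sum, b_sum = 0, 0, 0
        # Single pass: accumulate the dot product and both squared norms
        for ai, bi in izip(a, b):
            ab_sum += ai * bi
            a_sum += ai * ai
            b_sum += bi * bi
        return 1 - ab_sum / sqrt(a_sum * b_sum)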
    

    It's ugly, but it does come out faster...

    Edit: And try Psyco! It speeds up the final version by another factor of 4. How could I forget?
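
If you want to check the speed comparison on your own machine, something like the following might do. It is only a rough sketch, assuming Python 3 (so the built-in `map` and `zip` stand in for `imap` and `izip`); the names `cosine_distance_map`, `cosine_distance_loop` and `benchmark` are made up for this example, and the timings you get will depend on your interpreter and hardware.

```python
import random
import timeit
from math import sqrt
from operator import mul


def cosine_distance_map(a, b):
    """Generator-style version: three lazy passes over the inputs."""
    assert len(a) == len(b)
    return 1 - (sum(map(mul, a, b))
                / sqrt(sum(map(mul, a, a)) * sum(map(mul, b, b))))


def cosine_distance_loop(a, b):
    """Single-pass version: accumulate all three sums in one loop."""
    assert len(a) == len(b)
    ab_sum = a_sum = b_sum = 0.0
    for ai, bi in zip(a, b):
        ab_sum += ai * bi
        a_sum += ai * ai
        b_sum += bi * bi
    return 1 - ab_sum / sqrt(a_sum * b_sum)


def benchmark(n=100000, repeats=5):
    """Return the best wall-clock time in seconds for each version."""
    rng = random.Random(0)  # fixed seed so repeated runs use the same data
    a = [rng.random() for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    results = {}
    for fn in (cosine_distance_map, cosine_distance_loop):
        timer = timeit.Timer(lambda: fn(a, b))
        # Take the minimum over several runs to reduce noise
        results[fn.__name__] = min(timer.repeat(repeat=repeats, number=1))
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print("%-22s %.4f s" % (name, seconds))
```

Taking the minimum of several runs is the usual way to get a stable number out of `timeit`; raise `n` toward the 500k elements mentioned above if you want to compare at that size.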
