I wrote a method to calculate the cosine distance between two arrays:
def cosine_distance(a, b):
if len(a) != len(b):
return False
numerator = 0
Your updated solution still has two square roots. You can reduce this to one by replacing the sqrt line with:
result = 1 - numerator / (sqrt(denoma*denomb))
A multiply is typically quite a bit quicker than a sqrt. It might not seem much as it is only called once in the function, but it sounds like you are calculating a lot of cosine distances, so the improvement will add up.
Your code looks like it should be ripe for vector optimizations. So if cross-platofrm support is not an issue and you want to speed it even further, you could code the cosine distance code in C and make sure your compiler is aggressively vectorizing the resulting code (even Pentium II is capable of some floating point vectorisation)