I have learned that NumPy is slow at individual element access on a very large matrix. The following part of the code takes about 7-8 minutes to run. Size of the Matrix is abo
Can you post the Distance() function? If it's a common distance function, scipy.spatial.distance.cdist can calculate the distance matrix very quickly:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html#scipy.spatial.distance.cdist
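As a minimal sketch of what that could look like, assuming Distance() is plain Euclidean distance (the question does not show it) and using made-up sample points, cdist computes every pairwise distance between two sets of points in a single call instead of a Python loop over matrix elements:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical sample data: two small sets of 2-D points
a = np.array([[0.0, 0.0], [1.0, 2.0]])
b = np.array([[-1.0, 0.5], [3.1, 2.1], [0.0, 0.0]])

# cdist returns a (len(a), len(b)) matrix; entry [i, j] is the
# distance between a[i] and b[j]. "euclidean" is the default metric.
d = cdist(a, b, metric="euclidean")
print(d.shape)
```

If your Distance() is something else, cdist also accepts other metric names (for example "cityblock" or "cosine") or a Python callable, although a callable will be much slower than the built-in metrics.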
Edit:
You can indeed use pdist when all the points come from a single set; here is an example:
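from scipy.spatial.distance import pdist, squareform

# Four 2-D points, one (x, y) pair per entry
coordinates = [(0.0, 0), (1.0, 2.0), (-1.0, 0.5), (3.1, 2.1)]
# pdist returns the condensed pairwise distances; squareform expands them into an n x n matrix
dist = squareform(pdist(coordinates))
print(dist)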
output:
[[ 0.          2.23606798  1.11803399  3.74432905]
 [ 2.23606798  0.          2.5         2.1023796 ]
 [ 1.11803399  2.5         0.          4.40113622]
 [ 3.74432905  2.1023796   4.40113622  0.        ]]
If you want to mask some data:
dist[dist > 3.0] = 0
print(dist)
output:
[[ 0.          2.23606798  1.11803399  0.        ]
 [ 2.23606798  0.          2.5         2.1023796 ]
 [ 1.11803399  2.5         0.          0.        ]
 [ 0.          2.1023796   0.          0.        ]]
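Note that overwriting with 0 makes a masked pair indistinguishable from the genuine zeros on the diagonal. If that matters for your use case (an assumption on my part, since the question does not say), one possible alternative is a NumPy masked array, which hides the entries without changing the underlying values. A small sketch using the same points:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

coordinates = [(0.0, 0), (1.0, 2.0), (-1.0, 0.5), (3.1, 2.1)]
dist = squareform(pdist(coordinates))

# Mask (rather than overwrite) every distance above the threshold;
# dist itself keeps its original values
masked = np.ma.masked_greater(dist, 3.0)

print(masked.count())  # number of entries that are still unmasked
print(masked.max())    # largest distance among the unmasked entries
```

Reductions such as max(), min() and mean() on the masked array skip the masked entries, so you do not have to special-case the zeros afterwards.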