distance matrix of curves in python

Posted by ↘锁芯ラ on 2019-11-28 21:40:15
jojo

Your question might also be related to this one

This is a hard problem. One option is to implement the Euclidean distance yourself, drop scipy entirely, and run the code under PyPy to benefit from its JIT compiler. Most likely, though, this will not gain you much.
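As a rough illustration of the pure-Python route, here is a minimal sketch of a Hausdorff distance that uses only the standard library, so it could run under PyPy. The function names (`sq_dist`, `directed_hausdorff`, `hausdorff`) and the representation of a curve as a list of coordinate tuples are assumptions made for this example, not taken from the question.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point in `b`."""
    worst = 0.0
    for p in a:
        # Brute force: compare p against every point of b.
        nearest = min(sq_dist(p, q) for q in b)
        if nearest > worst:
            worst = nearest
    # Work on squared distances and take a single square root at the end.
    return math.sqrt(worst)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


curve_a = [(0, 0), (1, 0)]
curve_b = [(0, 0), (4, 0)]
print(hausdorff(curve_a, curve_b))  # 3.0
```

The inner `min` still visits every pair of points, so this only changes who executes the loop, not how much work there is.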

Personally, I would recommend writing the routine in C.

The problem is less the implementation than the way you approach it. You chose a brute-force approach: calculating the Euclidean distance for each distinct pair of points in each possible pair of metric-space subsets. This is computationally demanding:

  • Assume you have 500 curves and each of them has 75 points. With the brute-force approach you end up calculating the Euclidean distance 500 * 499 * 75 * 75 = 1 403 437 500 times. It is hardly surprising that this approach takes forever to run.
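The count above can be reproduced with a few lines of Python. The helper name `brute_force_evaluations` and its `share_block` flag are made up for this sketch. The flag models one possible saving, assumed here rather than stated in the answer: both directed distances of a curve pair can be read from the same 75 x 75 block of point distances, so each unordered pair only needs to be evaluated once.

```python
def brute_force_evaluations(n_curves, points_per_curve, share_block=False):
    """Count point-to-point distance evaluations for a full curve distance matrix.

    share_block=False: every ordered pair of curves gets its own pass.
    share_block=True: each unordered pair is evaluated once and both directed
    distances are read from the same block of point distances.
    """
    ordered_pairs = n_curves * (n_curves - 1)
    pairs = ordered_pairs // 2 if share_block else ordered_pairs
    return pairs * points_per_curve * points_per_curve


print(brute_force_evaluations(500, 75))        # 1403437500
print(brute_force_evaluations(500, 75, True))  # 701718750
```

Even the halved figure grows quadratically in both the number of curves and the points per curve, which is why a smarter algorithm matters more than a faster loop.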

I'm not an expert on this, but I know that the Hausdorff distance is used extensively in image processing. I would suggest browsing the literature for speed-optimized algorithms. A starting point might be this, or this paper. The Voronoi diagram is also often mentioned in combination with the Hausdorff distance.
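One optimization that is commonly described for the Hausdorff distance is early termination of the inner loop: as soon as a point is found to be closer than the current running maximum, it cannot change the result and the scan can stop. The sketch below shows that idea in plain Python. It is not necessarily the method of the papers referred to above, and the function names are assumptions for this example.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def directed_hausdorff_early_break(a, b):
    """Directed Hausdorff distance from `a` to `b` with an early-exit inner loop."""
    cmax = 0.0  # largest nearest-neighbour (squared) distance found so far
    for p in a:
        cmin = float("inf")
        for q in b:
            d = sq_dist(p, q)
            if d < cmin:
                cmin = d
                if cmin <= cmax:
                    # p already has a neighbour no farther than cmax,
                    # so p cannot raise the maximum: stop scanning b.
                    break
        if cmin > cmax:
            cmax = cmin
    return math.sqrt(cmax)


def hausdorff_early_break(a, b):
    """Symmetric Hausdorff distance built from the two directed passes."""
    return max(directed_hausdorff_early_break(a, b),
               directed_hausdorff_early_break(b, a))


print(hausdorff_early_break([(0, 0), (1, 0)], [(0, 0), (4, 0)]))  # 3.0
```

The worst case is still quadratic, but for curves that lie close together most inner loops stop after a few comparisons. Visiting the points in random order tends to make the early exit trigger sooner.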

I hope these links might help you with this problem.

Akos Gulyban

I recently replied to a similar question here: Hausdorff distance between 3D grids

I hope this helps. I faced 25 x 25,000 points in a pairwise comparison (25 x 25 x 25,000 points in total), and my code runs from 1 min up to 3-4 hours (depending on the number of points). I don't see many options mathematically for gaining speed.

Alternatives are to use a different programming language (C / C++) or to move this calculation to the GPU (CUDA). I am playing with the CUDA approach right now.

Edit on 03/12/2015:

I was able to speed up this comparison by doing parallel CPU-based computation. That was the quickest way to go. I used the nice example from the pp package (Parallel Python) and ran it on three different computer and Python combinations. Unfortunately I had memory errors all the time with the 32-bit Python 2.7, so I installed WinPython 2.7 64-bit and some experimental 64-bit numpy packages.

So for me this effort was quite helpful, and it was not as complicated as CUDA. Good luck!
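To give an idea of what the parallel CPU approach can look like, here is a minimal sketch that uses the standard library's `multiprocessing.Pool` rather than the pp package mentioned above. The names `hausdorff`, `_pair_job` and `distance_matrix`, the `workers` parameter and the chunk size of 64 are all assumptions for this example. Each worker process receives one pair of curves and returns one matrix entry.

```python
import math
from itertools import combinations
from multiprocessing import Pool


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def hausdorff(a, b):
    """Brute-force symmetric Hausdorff distance between two point lists."""
    def directed(u, v):
        return max(min(sq_dist(p, q) for q in v) for p in u)
    return math.sqrt(max(directed(a, b), directed(b, a)))


def _pair_job(job):
    """Worker: compute one matrix entry. Must live at module level to be picklable."""
    i, j, a, b = job
    return i, j, hausdorff(a, b)


def distance_matrix(curves, workers=None):
    """Symmetric matrix of pairwise Hausdorff distances.

    workers=None lets Pool use all available cores; workers=1 runs serially.
    """
    n = len(curves)
    jobs = [(i, j, curves[i], curves[j]) for i, j in combinations(range(n), 2)]
    if workers == 1:
        results = list(map(_pair_job, jobs))
    else:
        with Pool(processes=workers) as pool:
            results = pool.map(_pair_job, jobs, chunksize=64)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j, d in results:
        matrix[i][j] = matrix[j][i] = d  # the distance is symmetric
    return matrix
```

On platforms that start worker processes by spawning a fresh interpreter (Windows, for example), the call to `distance_matrix` has to sit under an `if __name__ == "__main__":` guard. Only the upper triangle is computed, and sending the curves to the workers costs pickling time, so the gain depends on how expensive each pair is.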

There are several methods you can try:

  1. Using numpy-MKL, a build of numpy that uses Intel's high-performance Math Kernel Library, instead of plain numpy;
  2. Using Bottleneck for array functions;
  3. Using Cython for computation.