scipy.spatial.ckdtree running slowly

后端 未结 2 843
遥遥无期
遥遥无期 2021-01-21 02:54

I\'ve been using spatial.cKDTree in scipy to calculate distances between points. It has always run very quickly (~1 s) for my typical data sets (findin

相关标签:
2条回答
  • 2021-01-21 03:13

    In the next release of SciPy, balanced kd-trees will be created with introselect instead of quickselect, which is much faster on structured datasets. If you use cKDTree on a structured data set such as an image or a grid, you can look forward to a major boost in performance. It is already available if you build SciPy from its master branch on GitHub.

    0 讨论(0)
  • 2021-01-21 03:33

    In 0.16.x I added options to build the cKDTree with median or sliding midpoint rules, as well as choosing whether to recompute the bounding hyperrectangle at each node in the kd-tree. The defaults are based on experiences about the performance of scipy.spatial.cKDTree and sklearn.neighbors.KDTree. In some contrived cases (data that are highly streched along a dimension) it can have negative impact, but usually it should be faster. Experiment with bulding the cKDTree with balanced_tree=False and/or compact_nodes=False. Setting both to False gives you the same behavior as 0.15.x. Unfortunately it is difficult to set defaults that make everyone happy because the performance depends on the data.

    Also note that with balanced_tree=True we compute medians by quickselect when the kd-tree is constructed. If the data for some reason is pre-sorted, it will be very slow. In this case it will help to shuffle the rows of the input data. Or you can set balanced_tree=False to avoid the partial quicksorts.

    There is also a new option to multithread the nearest-neighbor query. Try to call query with n_jobs=-1 and see if it helps for you.

    Update June 2020: SciPy 1.5.0 will use a new algorithm (introselect based partial sort, from C++ STL) which solves the problems reported here.

    0 讨论(0)
提交回复
热议问题