scipy.spatial.ckdtree running slowly

后端未结

关注

 2  844

遥遥无期 2021-01-21 02:54

I\'ve been using spatial.cKDTree in scipy to calculate distances between points. It has always run very quickly (~1 s) for my typical data sets (findin

2条回答

离开以前 (楼主)

2021-01-21 03:33

In 0.16.x I added options to build the cKDTree with median or sliding midpoint rules, as well as choosing whether to recompute the bounding hyperrectangle at each node in the kd-tree. The defaults are based on experiences about the performance of scipy.spatial.cKDTree and sklearn.neighbors.KDTree. In some contrived cases (data that are highly streched along a dimension) it can have negative impact, but usually it should be faster. Experiment with bulding the cKDTree with balanced_tree=False and/or compact_nodes=False. Setting both to False gives you the same behavior as 0.15.x. Unfortunately it is difficult to set defaults that make everyone happy because the performance depends on the data.

Also note that with balanced_tree=True we compute medians by quickselect when the kd-tree is constructed. If the data for some reason is pre-sorted, it will be very slow. In this case it will help to shuffle the rows of the input data. Or you can set balanced_tree=False to avoid the partial quicksorts.

There is also a new option to multithread the nearest-neighbor query. Try to call query with n_jobs=-1 and see if it helps for you.

Update June 2020: SciPy 1.5.0 will use a new algorithm (introselect based partial sort, from C++ STL) which solves the problems reported here.

0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...