Comparing search time between a k-d tree and brute force

Question

This is a graph of execution time versus dimension for the k-d tree and brute-force searches that I wrote. The number of points was fixed at 1M (1,000,000), and the time was measured over 1,000 queries. The k-d tree's time increases sharply with dimension, but the brute-force time does not. I wonder why these results came out this way and how they can be improved.

Answer 1:

Some ideas:

The performance may depend a lot on the characteristics of the data. For example, are the data points evenly distributed, clustered or otherwise arranged?
Also, what is the kind of query you are performing? One explanation would be that you are using a window-query that returns the whole point set, or large parts of it. In that case, brute force will always be faster.
Is there maybe a flaw in the KD-Tree implementation?

It is generally known that k-d trees do not scale well with high dimensionality. In machine learning, for example, dimensionality is often reduced to around 10 to 20. However, unless you run the brute-force search on a GPU, the k-d tree should still be faster.
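To see the scaling effect without timing noise, one option is to count how many tree nodes a nearest-neighbor query touches as the dimension grows. The following is a minimal pure-Python sketch, not the asker's implementation: the names `build`, `nearest`, `brute_force` and `visited_fraction` are made up for illustration, and it assumes uniformly distributed random points and a 1-nearest-neighbor query, which may differ from the original benchmark.

```python
import random

def build(points, depth=0):
    """Build a k-d tree as nested tuples (point, left, right), splitting at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, depth=0, best=None, stats=None):
    """Return (squared distance, point) of the nearest neighbor.

    If stats is a one-element list, stats[0] is incremented per visited node.
    """
    if node is None:
        return best
    point, left, right = node
    if stats is not None:
        stats[0] += 1
    d = sq_dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best, stats)
    # Descend into the far side only if the splitting plane is closer
    # than the best match found so far; this is the pruning step.
    if diff * diff < best[0]:
        best = nearest(far, query, depth + 1, best, stats)
    return best

def brute_force(points, query):
    """Linear scan over all points; returns (squared distance, point)."""
    return min((sq_dist(p, query), p) for p in points)

def visited_fraction(dim, n_points=2000, n_queries=20, seed=0):
    """Average fraction of tree nodes visited per query (assumed uniform data)."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n_points)]
    tree = build(points)
    visited = 0
    for _ in range(n_queries):
        query = tuple(rng.random() for _ in range(dim))
        stats = [0]
        nearest(tree, query, stats=stats)
        visited += stats[0]
    return visited / (n_queries * n_points)

if __name__ == "__main__":
    for dim in (2, 4, 8, 16):
        print(dim, round(visited_fraction(dim), 3))
```

If the visited fraction approaches 1 as the dimension grows, the tree is doing the same number of distance computations as the linear scan plus traversal overhead, which would be consistent with the graph in the question. Comparing the result of `nearest` against `brute_force` on the same query is also a cheap way to check an implementation for flaws.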

If you are looking for structures that scale better with high dimensions (insertion / window query), have a look at R*-trees or the PH-Tree (the latter is self-advertisement and currently limited to 60 dimensions, but a high-dimensional version will be released this week). For k-nearest-neighbor search, have a look at cover trees or ball trees. If you are using Java, you can have a look at implementations in my repo. I also implemented an R*-tree here.

Source: https://stackoverflow.com/questions/50551877/comparison-search-time-between-k-d-tree-and-brute-force

Tags

algorithm

brute-force

kdtree