How to find the closest 2 points in a 100-dimensional space with 500,000 points?

无人及你 2021-01-31 11:26

I have a database with 500,000 points in a 100-dimensional space, and I want to find the two points that are closest to each other. How can I do that?

Update: The space is Euclidean, sorry. And thanks.

5 answers
  •  温柔的废话
    2021-01-31 11:57

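    Run PCA on your data to reduce the vectors from 100 dimensions to, say, 20. Then build a k-d tree (a k-dimensional tree, the usual index for nearest-neighbor search), query each point's nearest neighbors by Euclidean distance, and keep the pair with the smallest distance. Because PCA discards part of the variance, the result is approximate unless you re-check the candidate pairs in the original 100 dimensions.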
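
    The PCA plus k-d tree idea can be sketched roughly as follows. This is a sketch under assumptions, not a tuned implementation: it assumes NumPy and SciPy are installed and that the data fits in memory as one array. The function name `closest_pair_pca_kdtree` and the parameters `n_components` and `n_candidates` are made up for illustration. Projecting onto principal axes can only shrink distances, so the true closest pair stays close after the reduction, but other pairs may look close as well. The sketch therefore re-checks every candidate in the original space. The answer is still approximate.

```python
import numpy as np
from scipy.spatial import cKDTree


def closest_pair_pca_kdtree(points, n_components=20, n_candidates=5):
    """Approximate closest pair: PCA, k-d tree candidates, exact re-check.

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    points = np.asarray(points, dtype=np.float64)
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # PCA via the eigenvectors of the (unnormalized) covariance matrix.
    # eigh returns eigenvalues in ascending order, so the last columns
    # are the directions with the most variance.
    centered = points - points.mean(axis=0)
    _, vecs = np.linalg.eigh(centered.T @ centered)
    reduced = centered @ vecs[:, -n_components:]

    # Ask for one extra neighbor because each point also finds itself.
    k = min(n_candidates + 1, n)
    _, idx = cKDTree(reduced).query(reduced, k=k)

    # Re-check every candidate pair with the true 100-dimensional distance.
    rows = np.arange(n)
    best_dist, best_pair = np.inf, None
    for col in range(k):
        j = idx[:, col]
        dist = np.linalg.norm(points - points[j], axis=1)
        dist[j == rows] = np.inf  # ignore a point paired with itself
        i = int(np.argmin(dist))
        if dist[i] < best_dist:
            best_dist, best_pair = float(dist[i]), (i, int(j[i]))
    return best_pair, best_dist
```

    With the 500,000 x 100 data loaded as an array `X` (a hypothetical name), `closest_pair_pca_kdtree(X)` returns the index pair and its distance. Raising `n_candidates` lowers the chance of missing the true pair at the cost of more re-checks.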

    In general, when the number of dimensions is very large, tree indexes lose most of their advantage, so you fall back on either a brute-force approach (parallelized or distributed, e.g. with MapReduce) or a clustering-based approach.
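
    A brute-force pass can be sketched as below, again only as an illustration. It assumes NumPy and in-memory data. The function name `closest_pair_bruteforce` and the `chunk` parameter are hypothetical. Each block of rows is compared against all later points using the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so memory stays bounded. The blocks are independent of each other, which is what would let them be spread over several workers. For 500,000 points this is still on the order of 10^11 pair comparisons, so expect a long run on a single machine.

```python
import numpy as np


def closest_pair_bruteforce(points, chunk=1024):
    """Exact closest pair by blockwise comparison of all pairs.

    Hypothetical helper for illustration; `chunk` bounds the memory used.
    """
    points = np.asarray(points, dtype=np.float64)
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    sq_norms = np.einsum("ij,ij->i", points, points)  # |p|^2 for each row
    best_d2, best_pair = np.inf, None

    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        block = points[start:stop]
        rest = points[start:]  # earlier points were covered by earlier blocks

        # Squared distances between the block rows and all remaining points.
        d2 = (sq_norms[start:stop, None] + sq_norms[None, start:]
              - 2.0 * (block @ rest.T))

        # Column r of row r is the point itself; exclude it.
        r = np.arange(stop - start)
        d2[r, r] = np.inf

        row, col = np.unravel_index(np.argmin(d2), d2.shape)
        if d2[row, col] < best_d2:
            best_d2 = d2[row, col]
            best_pair = (start + int(row), start + int(col))

    # The expanded formula loses precision for very close points,
    # so recompute the winning distance directly.
    i, j = best_pair
    return best_pair, float(np.linalg.norm(points[i] - points[j]))
```

    Calling `closest_pair_bruteforce(X)` on the same hypothetical array `X` gives the exact pair, which also makes it a handy check for an approximate method on a small sample of the data.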
