How to get the K most distant points, given their coordinates?

无人及你 2021-02-20 12:41

We have a boring CSV with 10000 rows of ages (float), titles (enum/int), scores (float), ...

  • We have N columns each with int/float values in a table.
5 answers
  •  长发绾君心
    2021-02-20 13:16

    From past experience with a very similar problem, a simple approach works very well: compute the mean Euclidean distance over all pairs within each group of K points, then take the group with the largest mean. As someone noted above, it is probably hard to avoid looping over all combinations of K points (not just over all pairs). A possible implementation is as follows:

    import itertools
    import numpy as np
    from scipy.spatial.distance import pdist

    Npoints = 3  # or 4 or 5...
    # making up some data (a plain ndarray; np.matrix is no longer recommended):
    data = np.array([[3,2,4,3,4],[23,25,30,21,27],[6,7,8,7,9],[5,5,6,6,7],[0,1,2,0,2],[3,9,1,6,5],[0,0,12,2,7]])
    # finding row indices of all combinations of Npoints rows:
    c = [list(x) for x in itertools.combinations(range(len(data)), Npoints)]

    distances = []
    for i in c:
        # pdist returns all pairwise Euclidean distances among the selected rows, in condensed (flat) form
        distances.append(np.mean(pdist(data[i, :])))

    ind = distances.index(max(distances))  # index of the largest mean distance
    rows = c[ind]  # row indices of the points in question
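
    A possible variation on the loop above, sketched under the assumption that the full N x N distance matrix fits in memory: compute all pairwise distances once with `pdist` and `squareform`, then look up each combination's distances instead of recomputing them. The function name `most_distant_subset` is made up for illustration. It is still a brute-force search over all combinations, so it only suits small N and K.

```python
import itertools

import numpy as np
from scipy.spatial.distance import pdist, squareform


def most_distant_subset(points, k):
    """Brute-force search for the k rows with the largest mean pairwise
    Euclidean distance. Hypothetical helper, not part of any library."""
    points = np.asarray(points, dtype=float)
    # Symmetric N x N distance matrix, computed once up front.
    dist = squareform(pdist(points))
    # Upper-triangle indices of a k x k block, so each pair is counted once.
    iu = np.triu_indices(k, k=1)
    best_rows, best_mean = None, -np.inf
    for combo in itertools.combinations(range(len(points)), k):
        # Pairwise distances among the rows of this combination.
        block = dist[np.ix_(combo, combo)]
        mean_dist = block[iu].mean()
        if mean_dist > best_mean:
            best_rows, best_mean = combo, mean_dist
    return list(best_rows), best_mean


# Same made-up data as in the answer above.
data = np.array([[3, 2, 4, 3, 4], [23, 25, 30, 21, 27], [6, 7, 8, 7, 9],
                 [5, 5, 6, 6, 7], [0, 1, 2, 0, 2], [3, 9, 1, 6, 5],
                 [0, 0, 12, 2, 7]])
rows, mean_dist = most_distant_subset(data, 3)
print(rows, mean_dist)
```

    On the sample data, the selected rows should match those found by the original loop, since both maximise the same mean pairwise distance.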
    
