How to get the K most distant points, given their coordinates?

无人及你 2021-02-20 12:41

We have a boring CSV with 10000 rows of ages (float), titles (enum/int), scores (float), ....

  • We have N columns each with int/float values in a table.
5 answers
  •  盖世英雄少女心
    2021-02-20 13:31

    Assuming you read your CSV file with N (10000) rows and D dimensions (or features) into an N*D matrix X, you can calculate the distance between every pair of points and store it in a distance matrix as follows:

    import numpy as np

    X = np.asarray(X)  # convert to a NumPy array of shape (N, D)
    n = X.shape[0]
    distance_matrix = np.zeros((n, n))
    # Fill only the upper triangle: the distance from A to B equals the distance from B to A.
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance; other distance measures can also be used.
            distance_matrix[i][j] = np.linalg.norm(X[i] - X[j])

    # This can be used to fill in the lower triangle as well, which is not really required in your case:
    # distance_matrix = distance_matrix + distance_matrix.T - np.diag(np.diag(distance_matrix))
    K = 5  # number of most distant pairs that you want to pick

    # Row and column indices of the K largest distances, i.e. the K most distant pairs of points.
    indexes = np.unravel_index(np.argsort(distance_matrix.ravel())[-K:], distance_matrix.shape)
    print(indexes)
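
The double loop above makes roughly N*N/2 calls to np.linalg.norm, which is slow in pure Python for N = 10000. As a rough sketch of the same idea without explicit loops, the version below computes all squared distances at once and uses np.argpartition to pick the K largest without sorting every pair. The function name k_most_distant_pairs and the sample array are made up for illustration and are not part of the answer above. Like the loop version, it holds a full N*N float matrix in memory (about 800 MB for 10000 rows), so treat it as a starting point rather than a tuned solution.

```python
import numpy as np

def k_most_distant_pairs(X, k):
    """Return up to k most distant pairs of rows of X as (i, j, distance), farthest first."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # All squared distances at once: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.einsum("ij,ij->i", X, X)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by rounding
    # Keep each unordered pair once: strict upper triangle (i < j).
    rows, cols = np.triu_indices(n, k=1)
    pair_d2 = d2[rows, cols]
    k = min(k, pair_d2.size)
    if k < 1:
        return []
    # argpartition moves the k largest values to the end without a full sort.
    top = np.argpartition(pair_d2, -k)[-k:]
    top = top[np.argsort(pair_d2[top])[::-1]]  # order farthest first
    return [(int(rows[t]), int(cols[t]), float(np.sqrt(pair_d2[t]))) for t in top]

# Hypothetical sample data: four points in two dimensions.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [10.0, 10.0]])
pairs = k_most_distant_pairs(X, 2)
```

Each returned tuple holds the two row indices and their Euclidean distance. Because columns such as ages and scores live on different scales, it may be worth standardizing them before measuring distances; whether that is appropriate depends on your data.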
    
