Sklearn : Mean Distance from Centroid of each cluster

难免孤独 2021-01-04 22:52

How can I find the mean distance from the centroid to all the data points in each cluster? I am able to find the Euclidean distance of each point (in my dataset) from the ce

4 answers
  • 2021-01-04 23:45

    alphaleonis gave a nice answer. For the general case of n dimensions, here are the changes needed to his answer:

    import numpy as np

    def k_mean_distance(data, centroid_matrix, i_centroid, cluster_labels):
        # Calculate Euclidean distance for each data point assigned to centroid
        distances = [np.linalg.norm(x - centroid_matrix[i_centroid]) for x in data[cluster_labels == i_centroid]]
        # return the mean value
        return np.mean(distances)
    
    # Assumed here: emb_matrix is the data array, centroid_matrix holds the
    # cluster centers, and kmeans_clusters holds the label of each data point
    c_mean_distances = []
    for i in range(len(centroid_matrix)):
        mean_distance = k_mean_distance(emb_matrix, centroid_matrix, i, kmeans_clusters)
        c_mean_distances.append(mean_distance)
    
  • 2021-01-04 23:47

    Here's one way. You can substitute another distance measure in k_mean_distance() if you want a metric other than Euclidean.

    Calculate the distance between each data point and the center of its assigned cluster, then return the mean value.

    Function for distance calculation:

    def k_mean_distance(data, cx, cy, i_centroid, cluster_labels):
        # Calculate Euclidean distance for each data point assigned to centroid 
        distances = [np.sqrt((x-cx)**2+(y-cy)**2) for (x, y) in data[cluster_labels == i_centroid]]
        # return the mean value
        return np.mean(distances)
    

    And for each centroid, use the function to get the mean distance:

    total_distance = []
    for i, (cx, cy) in enumerate(centroids):
        # Function from above
        mean_distance = k_mean_distance(data, cx, cy, i, cluster_labels)
        total_distance.append(mean_distance)
    

    So, in the context of your question:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    
    def k_mean_distance(data, cx, cy, i_centroid, cluster_labels):
        # Euclidean distance of each point in cluster i_centroid to (cx, cy)
        distances = [np.sqrt((x-cx)**2+(y-cy)**2) for (x, y) in data[cluster_labels == i_centroid]]
        return np.mean(distances)
    
    # array_convt is the asker's input array
    t_data=PCA(n_components=2).fit_transform(array_convt)
    k_means=KMeans()
    clusters=k_means.fit_predict(t_data)
    centroids = k_means.cluster_centers_
    
    c_mean_distances = []
    for i, (cx, cy) in enumerate(centroids):
        mean_distance = k_mean_distance(t_data, cx, cy, i, clusters)
        c_mean_distances.append(mean_distance)
    

    If you plot the results with plt.plot(c_mean_distances), you should see the mean distance plotted against the cluster index.
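
To illustrate the earlier remark about swapping in another metric, here is a minimal sketch that passes the norm order through to np.linalg.norm instead of hard-coding the Euclidean formula. The function name k_mean_distance_norm and the toy arrays are assumptions made for illustration, not part of the original answer. It takes the centroid as a single vector, so it is not limited to 2-D data.

```python
import numpy as np

def k_mean_distance_norm(data, centroid, i_centroid, cluster_labels, ord=2):
    # Hypothetical variant of k_mean_distance: `ord` selects the vector norm
    # (1 = Manhattan, 2 = Euclidean, np.inf = Chebyshev)
    points = data[cluster_labels == i_centroid]
    distances = np.linalg.norm(points - centroid, ord=ord, axis=1)
    return distances.mean()

# Toy data (assumed for illustration): two points in cluster 0, one in cluster 1
data = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]])
labels = np.array([0, 0, 1])
centroid0 = np.array([0.0, 0.0])

euclidean = k_mean_distance_norm(data, centroid0, 0, labels, ord=2)
manhattan = k_mean_distance_norm(data, centroid0, 0, labels, ord=1)
chebyshev = k_mean_distance_norm(data, centroid0, 0, labels, ord=np.inf)
print(euclidean, manhattan, chebyshev)
```

Note that KMeans itself still assigns points using Euclidean distance; changing the norm here only changes how the spread of each cluster is summarized afterwards.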

  • 2021-01-04 23:49

    You can use the following attribute of KMeans:

    cluster_centers_ : array, [n_clusters, n_features]

    For every point, use predict(X) to find which cluster it belongs to (it returns the cluster index), then calculate the distance from the point to that cluster's center in cluster_centers_.
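
A minimal sketch of this approach, assuming a small toy array X and a 2-cluster model (both made up for illustration). It uses only predict() and cluster_centers_ from scikit-learn's KMeans, and indexes the centers by the predicted label so that each point is compared with its own centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (an assumption for illustration): two well-separated groups
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.predict(X)          # cluster index for each point
centers = kmeans.cluster_centers_   # shape (n_clusters, n_features)

# Distance from each point to the centroid of the cluster predict() returned
distances = np.linalg.norm(X - centers[labels], axis=1)

# Mean distance per cluster
mean_distances = [distances[labels == k].mean() for k in range(len(centers))]
print(mean_distances)
```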

  • 2021-01-04 23:50

    Compute all the distances into a NumPy array.

    Then call the array's mean() method to get the mean.
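
One way this could look, as a sketch rather than a definitive implementation: all point-to-centroid distances go into a single array, and np.bincount groups them by cluster so that one mean per cluster comes out. The helper name mean_distance_per_cluster and the sample arrays are assumptions; labels and centers stand in for what a fitted KMeans would provide.

```python
import numpy as np

def mean_distance_per_cluster(X, labels, centers):
    # Hypothetical helper: distance from every point to its own centroid
    distances = np.linalg.norm(X - centers[labels], axis=1)
    # Sum of distances and number of points in each cluster
    sums = np.bincount(labels, weights=distances, minlength=len(centers))
    counts = np.bincount(labels, minlength=len(centers))
    # Empty clusters get NaN instead of triggering a division by zero
    return np.divide(sums, counts, out=np.full(len(centers), np.nan), where=counts > 0)

# Sample data (assumed for illustration)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 1.0], [10.0, 2.0]])

result = mean_distance_per_cluster(X, labels, centers)
print(result)
```

Calling distances.mean() on the full array would instead give a single mean over all points, which is a different quantity from the per-cluster means asked about in the question.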
