Sklearn : Mean Distance from Centroid of each cluster

前端未结


How can I find the mean distance from the centroid to all the data points in each cluster? I am able to find the Euclidean distance of each point (in my dataset) from the ce


4 Answers

甜味超标

2021-01-04 23:45

alphaleonis gave a nice answer. For the general case of n dimensions, here are the changes needed to his answer:

import numpy as np

def k_mean_distance(data, centroid_matrix, i_centroid, cluster_labels):
    # Euclidean distance from each point assigned to cluster i_centroid to that cluster's centroid
    distances = [np.linalg.norm(x - centroid_matrix[i_centroid]) for x in data[cluster_labels == i_centroid]]
    # Return the mean value
    return np.mean(distances)

# emb_matrix: data array, centroid_matrix: cluster centers, kmeans_clusters: label per data point
c_mean_distances = []
for i in range(len(centroid_matrix)):
    mean_distance = k_mean_distance(emb_matrix, centroid_matrix, i, kmeans_clusters)
    c_mean_distances.append(mean_distance)
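
As a vectorized alternative to the loop above, here is a minimal sketch that relies on KMeans.transform(), which returns the distance from every sample to every cluster center. The synthetic data and the names X, km, own_dists, and mean_dist_per_cluster are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic n-dimensional data (assumption: two well-separated blobs in 5-D)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(10.0, 1.0, (50, 5))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_

# transform() gives an (n_samples, n_clusters) matrix of distances to each center
all_dists = km.transform(X)
# Keep, for every sample, only the distance to its own assigned center
own_dists = all_dists[np.arange(len(X)), labels]

# Mean distance to the centroid, one value per cluster
mean_dist_per_cluster = np.array(
    [own_dists[labels == k].mean() for k in range(km.n_clusters)]
)
print(mean_dist_per_cluster)
```

This works for any number of features because no coordinate is unpacked by name.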


一向

2021-01-04 23:47

Here's one way. If you want a distance metric other than Euclidean, substitute it inside k_mean_distance().

For each cluster, calculate the distance between each data point assigned to it and the cluster center, then return the mean value.

Function for distance calculation:

import numpy as np

def k_mean_distance(data, cx, cy, i_centroid, cluster_labels):
    # Calculate Euclidean distance for each data point assigned to centroid
    distances = [np.sqrt((x-cx)**2+(y-cy)**2) for (x, y) in data[cluster_labels == i_centroid]]
    # Return the mean value
    return np.mean(distances)

And for each centroid, use the function to get the mean distance:

total_distance = []
for i, (cx, cy) in enumerate(centroids):
    # Function from above
    mean_distance = k_mean_distance(data, cx, cy, i, cluster_labels)
    total_distance.append(mean_distance)
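
To illustrate swapping in another distance metric, here is a hedged sketch that delegates the distance computation to scipy.spatial.distance.cdist, which accepts a metric name. The function name k_mean_distance_metric, its signature, and the toy data are hypothetical and chosen only for this example.

```python
import numpy as np
from scipy.spatial.distance import cdist

def k_mean_distance_metric(data, centroid, i_centroid, cluster_labels, metric="euclidean"):
    # Select the points assigned to this cluster
    points = data[cluster_labels == i_centroid]
    # cdist expects 2-D inputs, so the centroid is reshaped to (1, n_features)
    distances = cdist(points, np.asarray(centroid).reshape(1, -1), metric=metric)
    return distances.mean()

# Toy data (assumption): two clusters with hand-picked centroids
data = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [10.0, 14.0]])
cluster_labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.0, 1.0], [10.0, 12.0]])

for metric in ("euclidean", "cityblock"):
    means = [
        k_mean_distance_metric(data, c, i, cluster_labels, metric)
        for i, c in enumerate(centroids)
    ]
    print(metric, means)
```

Note that KMeans itself still assigns points using Euclidean distance, so a different metric here only changes how the spread of each cluster is summarized.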

So, in the context of your question:

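import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def k_mean_distance(data, cx, cy, i_centroid, cluster_labels):
    # Mean Euclidean distance from the points in cluster i_centroid to its centroid (cx, cy)
    distances = [np.sqrt((x-cx)**2+(y-cy)**2) for (x, y) in data[cluster_labels == i_centroid]]
    return np.mean(distances)

# array_convt: the input feature array (defined elsewhere)
t_data = PCA(n_components=2).fit_transform(array_convt)
k_means = KMeans()
clusters = k_means.fit_predict(t_data)
centroids = k_means.cluster_centers_

c_mean_distances = []
for i, (cx, cy) in enumerate(centroids):
    mean_distance = k_mean_distance(t_data, cx, cy, i, clusters)
    c_mean_distances.append(mean_distance)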

You can then plot the results with plt.plot(c_mean_distances) (with matplotlib.pyplot imported as plt) to see the mean distance for each cluster index.


南方客

2021-01-04 23:49

You can use the following attribute of KMeans:

cluster_centers_ : array, [n_clusters, n_features]

For every point, use predict(X) to find which cluster it belongs to (it returns the cluster index), then calculate the distance from the point to that cluster's center.
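
The approach described here could be sketched as follows. The synthetic data and the names X, km, dists, and mean_per_cluster are assumptions for illustration; only predict() and cluster_centers_ come from the answer.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data (assumption: two well-separated blobs)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(10.0, 1.0, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# predict() returns the index of the closest cluster center for each point
labels = km.predict(X)
# Index cluster_centers_ with the labels to line up each point with its own center
dists = np.linalg.norm(X - km.cluster_centers_[labels], axis=1)

# Average the distances within each cluster
mean_per_cluster = np.array([dists[labels == k].mean() for k in range(2)])
print(mean_per_cluster)
```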

不知归路

2021-01-04 23:50

Compute all the distances into a NumPy array.

Then use nparray.mean() to get the mean.
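
A minimal sketch of this idea, assuming the points, labels, and centroids are already available as arrays. The toy values and the use of np.bincount for the per-cluster mean are illustrative choices, not something the answer prescribes.

```python
import numpy as np

# Toy inputs (assumption): four 2-D points, their cluster labels, and the centroids
data = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 14.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.0, 0.0], [10.0, 12.0]])

# All point-to-own-centroid distances in one NumPy array
distances = np.linalg.norm(data - centroids[labels], axis=1)

# Mean over every point, regardless of cluster
overall_mean = distances.mean()

# Mean per cluster: sum of distances per label divided by the count per label
per_cluster_mean = np.bincount(labels, weights=distances) / np.bincount(labels)
print(overall_mean, per_cluster_mean)
```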
