Question:
Let's say I'm examining up to 10 clusters. With scipy, I usually generate the 'elbow' plot as follows:
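```python
from scipy import cluster
from matplotlib import pyplot

# kmeans returns a (codebook, distortion) pair for each k
cluster_array = [cluster.vq.kmeans(my_matrix, i) for i in range(1, 10)]
pyplot.plot([var for (cent, var) in cluster_array])
pyplot.show()
```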
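For reference, here is a self-contained version of the scipy approach. `my_matrix` is a hypothetical stand-in (three synthetic 2-D blobs), since the question does not show its data. Note that scipy's `distortion` is the mean *non-squared* Euclidean distance to the nearest centroid, so its curve is not on the same scale as the sum-of-squares measures sklearn reports.

```python
import numpy as np
from scipy.cluster.vq import kmeans

rng = np.random.default_rng(0)
# Hypothetical my_matrix: three well-separated 2-D blobs of 50 points each
blob_centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
my_matrix = np.vstack([c + rng.normal(size=(50, 2)) for c in blob_centers])

Ks = range(1, 10)
# kmeans returns (codebook, distortion); keep only the distortion,
# the mean (non-squared) distance from each point to its nearest centroid
distortions = [kmeans(my_matrix, k)[1] for k in Ks]
```

Plotting `Ks` against `distortions` with `pyplot.plot` then gives the same elbow plot as in the question.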
I have since become motivated to use sklearn for clustering, but I'm not sure how to create the array needed to plot as in the scipy case. My best guess was:
```python
from sklearn.cluster import KMeans
km = [KMeans(n_clusters=i) for i range(1,10)]
cluster_array = [km[i].fit(my_matrix)]
```
That unfortunately resulted in an invalid command error. What is the best sklearn way to go about this?
Thank you
Answer 1:
You had some syntax problems in the code. They should be fixed now:
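```python
from sklearn.cluster import KMeans

Ks = range(1, 10)
km = [KMeans(n_clusters=i) for i in Ks]
# fit() returns the fitted estimator, so score() can be chained onto it
score = [model.fit(my_matrix).score(my_matrix) for model in km]
```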
The `fit` method just returns the estimator itself (`self`). So in the original line `cluster_array = [km[i].fit(my_matrix)]`, `cluster_array` would end up holding fitted `KMeans` objects (the same objects already in `km`) rather than any number you could plot.
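A minimal sketch of what that means in practice. `my_matrix` here is hypothetical random data, and `n_init=10` and `random_state=0` are not in the answer; they are passed explicitly so the result does not depend on the library version's default or on a random seed. The list built from `fit` contains the very same objects as `km`, and the value to plot has to be read from each fitted model afterwards.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for my_matrix: 150 random 2-D points
rng = np.random.default_rng(0)
my_matrix = rng.normal(size=(150, 2))

Ks = range(1, 10)
km = [KMeans(n_clusters=k, n_init=10, random_state=0) for k in Ks]

# fit() returns the estimator itself, so this list holds the same
# (now fitted) KMeans objects as km, not a measure of fit
cluster_array = [model.fit(my_matrix) for model in km]

# The number to plot must be read off each fitted model afterwards
inertias = [model.inertia_ for model in cluster_array]
```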
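You can use the `score` method to get an estimate of how well the clustering fits. Note that `score` returns the negative of the K-means objective (the inertia), so it rises toward zero as the number of clusters grows. To see the score for each number of clusters, simply run `plot(Ks, score)`.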
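To check the sign convention, the sketch below (hypothetical random data; `n_init` and `random_state` are added for repeatability) fits one model per `k` and compares `score` with `inertia_`. Negating the scores gives the usual downward-sloping elbow curve.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
my_matrix = rng.normal(size=(200, 3))  # hypothetical data

Ks = range(1, 10)
models = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(my_matrix) for k in Ks]

scores = [m.score(my_matrix) for m in models]  # negative K-means objective
inertias = [m.inertia_ for m in models]        # K-means objective on training data

# Flip the sign so the curve decreases with k, like scipy's distortion plot
elbow = [-s for s in scores]
```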
Answer 2:
You can use the `inertia_` attribute of the `KMeans` class.
Assuming X is your dataset:
```python
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt

# X = <your_data>  (X must already hold your dataset)
distortions = []
for k in range(2, 20):
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(X)
    distortions.append(kmeans.inertia_)  # within-cluster sum of squared distances

fig = plt.figure(figsize=(15, 5))
plt.plot(range(2, 20), distortions)
plt.grid(True)
plt.title('Elbow curve')
plt.show()
```
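To make clear what `inertia_` measures, this sketch recomputes it by hand as the sum of squared distances from each sample to its closest centroid. `X` is hypothetical random data standing in for `<your_data>`, and `n_clusters=4` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))  # hypothetical stand-in for <your_data>

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Squared distance from every sample to every centroid: shape (n_samples, n_clusters)
sq_dists = ((X[:, None, :] - kmeans.cluster_centers_[None, :, :]) ** 2).sum(axis=2)

# Keep each sample's closest centroid and add everything up
manual_inertia = sq_dists.min(axis=1).sum()
```

Because this is a sum of squared distances, it generally grows with the size of the dataset, unlike scipy's mean non-squared distortion.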