Sklearn kmeans equivalent of elbow method

Anonymous (unverified), submitted on 2019-12-03 08:54:24

Question:

Let's say I'm examining up to 10 clusters. With scipy, I usually generate the 'elbow' plot as follows:

from scipy import cluster
from matplotlib import pyplot

cluster_array = [cluster.vq.kmeans(my_matrix, i) for i in range(1, 10)]
pyplot.plot([var for (cent, var) in cluster_array])
pyplot.show()
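As a side note on the scipy snippet above: the var it unpacks is what scipy.cluster.vq.kmeans calls the distortion, namely the mean (non-squared) Euclidean distance from each observation to its nearest centroid. The sketch below checks this on hypothetical toy data (three 2-D blobs standing in for my_matrix) by recomputing the distances with scipy.cluster.vq.vq. The seed=0 argument is only there to make the run repeatable and assumes a SciPy version whose kmeans accepts that keyword.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hypothetical toy data standing in for my_matrix: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
my_matrix = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

# Each entry is a (codebook, distortion) pair, as in the question's loop.
# seed=0 is an assumption added only to make the example repeatable.
cluster_array = [kmeans(my_matrix, k, seed=0) for k in range(1, 10)]

# Recompute the distortion for k = 3 by hand: vq assigns every observation
# to its nearest centroid and returns the Euclidean distance to that centroid.
codebook, distortion = cluster_array[2]
_, dists = vq(my_matrix, codebook)
print(distortion, dists.mean())  # the two values should agree
```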

I have since become motivated to use sklearn for clustering, but I'm not sure how to create the array needed for the plot as in the scipy case. My best guess was:

from sklearn.cluster import KMeans

km = [KMeans(n_clusters=i) for i range(1,10)]
cluster_array = [km[i].fit(my_matrix)]

That unfortunately resulted in an invalid syntax error. What is the best sklearn way to go about this?

Thank you

Answer 1:

You had some syntax problems in the code. They should be fixed now:

Ks = range(1, 10)
km = [KMeans(n_clusters=i) for i in Ks]
score = [km[i].fit(my_matrix).score(my_matrix) for i in range(len(km))]

The fit method just returns the fitted estimator itself (self). In this line from the original code

cluster_array = [km[i].fit(my_matrix)] 

the cluster_array would only hold KMeans objects (the same ones stored in km), not values you can plot.
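A quick way to see the point about fit is to check object identity. This sketch uses hypothetical random data in place of my_matrix and fixes n_init and random_state only to make the run repeatable; neither setting comes from the original answer.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data standing in for my_matrix.
rng = np.random.default_rng(0)
my_matrix = rng.normal(size=(100, 2))

Ks = range(1, 10)
km = [KMeans(n_clusters=k, n_init=10, random_state=0) for k in Ks]

# fit() returns the estimator itself, so this list holds the very same objects as km.
cluster_array = [model.fit(my_matrix) for model in km]
print(all(fitted is model for fitted, model in zip(cluster_array, km)))

# The numbers worth plotting live in fitted attributes such as inertia_.
inertias = [model.inertia_ for model in cluster_array]
```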

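You can use the score method to estimate how well the clustering fits: it returns the opposite (negative) of the K-means objective on the data you pass in, so values closer to zero mean a tighter fit. To see the score for each number of clusters, simply run plot(Ks, score).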
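Putting the corrected loop together, here is a minimal, self-contained sketch on hypothetical toy data (three 2-D blobs standing in for my_matrix). It also records inertia_ for each model, since on the training data score() should be exactly its negative, so either list traces the same elbow, one mirrored relative to the other. The n_init and random_state values are assumptions added for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data standing in for my_matrix: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
my_matrix = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

Ks = range(1, 10)
score, inertia = [], []
for k in Ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(my_matrix)
    # score() returns the opposite of the K-means objective on the data passed in,
    # so on the training data it should equal -inertia_ (up to rounding).
    score.append(model.score(my_matrix))
    inertia.append(model.inertia_)
```

Plotting is left out here; plot(Ks, score) draws the curve, and with this toy data the bend should appear around k = 3.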



Answer 2:

You can use the inertia_ attribute of the KMeans class, which holds the sum of squared distances of the samples to their closest cluster center.

Assuming X is your dataset:

from sklearn.cluster import KMeans
from matplotlib import pyplot as plt

X = ...  # <your_data>

distortions = []
for k in range(2, 20):
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(X)
    distortions.append(kmeans.inertia_)

fig = plt.figure(figsize=(15, 5))
plt.plot(range(2, 20), distortions)
plt.grid(True)
plt.title('Elbow curve')
plt.show()
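To make inertia_ concrete, the sketch below fits a single model on hypothetical toy data and recomputes inertia_ by hand from transform(), which returns each sample's Euclidean distance to every cluster center. The same distances also give the closest sklearn analogue of the value scipy's kmeans returns in the question (the mean, non-squared distance to the nearest center), in case you want curves comparable to old scipy plots. The choice of k = 3, n_init, random_state and the data are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data (three 2-D blobs); substitute your own X.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# transform() returns an (n_samples, n_clusters) array of Euclidean
# distances from each sample to each cluster center.
dist_to_centers = kmeans.transform(X)
nearest = dist_to_centers.min(axis=1)

# inertia_ is the sum of squared distances to the closest center.
manual_inertia = np.sum(nearest ** 2)
print(manual_inertia, kmeans.inertia_)  # should agree up to floating-point noise

# scipy-style distortion: mean (non-squared) distance to the closest center.
scipy_style_distortion = nearest.mean()
print(scipy_style_distortion)
```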

