How does sklearn.cluster.KMeans handle an init ndarray parameter with missing centroids (fewer centroids than n_clusters)?

Submitted by 蹲街弑〆低调 on 2021-01-28 12:12:49

Question


With sklearn's KMeans in Python (see documentation), I was wondering what happens internally when an ndarray of shape (n, n_features) is passed to the init parameter and n < n_clusters:

  1. Does it drop the given centroids and just start a kmeans++ initialization, which is the default choice for the init parameter? (PDF paper kmeans++) (How does Kmeans++ work)
  2. Does it keep the given centroids and fill in the remaining ones using kmeans++?
  3. Does it keep the given centroids and fill in the remaining ones with random values?

I did not expect this method to give no warning in this case, which is why I would like to know how it handles it.


Answer 1:


If you give it a mismatching init, it will adjust the number of clusters to match, as you can see from the source. This is not documented and I would consider it a bug. I'll propose a fix.
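
To check what your installed version actually does, a small probe like the one below may help. It is only a sketch: the data is random, the function name probe_mismatched_init is made up for illustration, and it assumes that a version which rejects the mismatch does so by raising ValueError during fit. It makes no claim about which release behaves which way.

```python
import numpy as np
from sklearn.cluster import KMeans


def probe_mismatched_init(n_clusters=3):
    """Fit KMeans with fewer init centroids than n_clusters and report the outcome.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    rng = np.random.RandomState(0)
    X = rng.rand(60, 2)      # 60 random points with 2 features
    init = X[:2].copy()      # only 2 centroids, although n_clusters is 3

    km = KMeans(n_clusters=n_clusters, init=init, n_init=1)
    try:
        km.fit(X)
    except ValueError as exc:
        # Assumption: a version that rejects the mismatch raises ValueError here.
        return ("error", str(exc))
    # Otherwise, report how many centroids the fitted model ended up with.
    return ("fitted", km.cluster_centers_.shape[0])


outcome = probe_mismatched_init()
print(outcome)
```

If the result is ("fitted", 2), the number of clusters was silently adjusted to match init, as described above. If it is ("error", ...), the mismatch is being rejected instead.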



Source: https://stackoverflow.com/questions/30169378/how-does-sklearn-cluster-kmeans-handle-an-init-ndarray-parameter-with-missing-ce
