sklearn agglomerative clustering: dynamically updating the number of clusters

Submitted by 爷,独闯天下 on 2019-12-30 11:28:18

Question


The documentation for sklearn.cluster.AgglomerativeClustering mentions that,

when varying the number of clusters and using caching, it may be advantageous to compute the full tree.

This seems to imply that it is possible to first compute the full tree, and then quickly update the number of desired clusters as necessary, without recomputing the tree (with caching).

However, this procedure for changing the number of clusters does not seem to be documented. I would like to do this but am unsure how to proceed.

Update: To clarify, the fit method does not take number of clusters as an input: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering.fit


Answer 1:


Set a caching directory with the parameter memory='mycachedir' and set compute_full_tree=True. When you then rerun fit with different values of n_clusters, it will use the cached tree rather than recomputing it each time. Here is an example of how to do this with sklearn's grid search API:

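from sklearn.cluster import AgglomerativeClustering
# sklearn.grid_search was removed in scikit-learn 0.20; use model_selection.
from sklearn.model_selection import GridSearchCV

# Cache the tree on disk. With compute_full_tree=True the cached tree does
# not depend on n_clusters, so later fits on the same data can reuse it.
ac = AgglomerativeClustering(memory='mycachedir',
                             compute_full_tree=True)
# X (features) and y (reference labels) are assumed to be defined already.
# Caveat: AgglomerativeClustering does not implement predict, which this
# scorer relies on, so the search may raise or return NaN scores.
classifier = GridSearchCV(ac,
                          {'n_clusters': range(2, 6)},  # key must be a string
                          scoring='adjusted_rand_score',
                          n_jobs=-1, verbose=2)
classifier.fit(X, y)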
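
The grid search above cross-validates, so each training fold builds and caches its own tree. To see the caching at work on a single dataset, a plain loop may be simpler. The following is a minimal sketch and not part of the original answer: the synthetic data from make_blobs, the temporary cache directory, and the names scores and best_k are assumptions made for illustration.

```python
import tempfile

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data for illustration only (not from the original question).
X, y = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=0)

# Hypothetical cache location; any writable directory would do.
cache_dir = tempfile.mkdtemp()

# compute_full_tree=True makes the cached tree independent of n_clusters.
ac = AgglomerativeClustering(memory=cache_dir, compute_full_tree=True)

scores = {}
for k in range(2, 6):
    # Only n_clusters changes, so refitting on the same X can reuse the
    # cached tree and just cut it at a different level.
    ac.set_params(n_clusters=k)
    labels = ac.fit_predict(X)
    scores[k] = adjusted_rand_score(y, labels)

# Number of clusters with the highest adjusted Rand index.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The first fit builds the tree and writes it to the cache directory; later iterations should be noticeably faster on large inputs, although the gain depends on data size and disk speed.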


Source: https://stackoverflow.com/questions/36490241/sklearn-agglomerative-clustering-dynamically-updating-the-number-of-clusters
