The documentation for sklearn.cluster.AgglomerativeClustering mentions that, when varying the number of clusters and using caching, it may be advantageous to compute the full tree. You set a caching directory with the parameter memory='mycachedir', and if you also set compute_full_tree=True, then when you rerun fit with different values of n_clusters, it will reuse the cached tree rather than recomputing it each time. To give you an example of how to do this with sklearn's grid search API:
from sklearn.cluster import AgglomerativeClustering
from sklearn.model_selection import GridSearchCV

ac = AgglomerativeClustering(memory='mycachedir',
                             compute_full_tree=True)
classifier = GridSearchCV(ac,
                          {'n_clusters': range(2, 6)},
                          scoring='adjusted_rand_score',
                          n_jobs=-1, verbose=2)
classifier.fit(X, y)  # X: data, y: reference labels for the scorer
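If you don't need the grid search machinery, here is a minimal sketch of the caching behaviour on its own: the same estimator is refit with different n_clusters, so only the first fit builds the full tree and later fits reuse it from the cache directory. The random data, the labels y, and the directory name 'mycachedir' are just placeholders for your own inputs.

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Placeholder data; swap in your own X and reference labels y.
X = np.random.RandomState(0).rand(100, 5)
y = np.random.RandomState(1).randint(0, 3, size=100)

# The full tree is computed and cached on the first fit;
# subsequent fits with a different n_clusters reuse it.
ac = AgglomerativeClustering(memory='mycachedir', compute_full_tree=True)

for k in range(2, 6):
    ac.set_params(n_clusters=k)
    labels = ac.fit_predict(X)
    print(k, adjusted_rand_score(y, labels))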