sklearn agglomerative clustering: dynamically updating the number of clusters

Happy的楠姐 2021-01-15 10:57

The documentation for sklearn.cluster.AgglomerativeClustering mentions that,

when varying the number of clusters and using caching, it may be advant

2 answers
  •  广开言路
    2021-01-15 11:31

    You set a caching directory with the parameter memory='mycachedir'. If you also set compute_full_tree=True, then rerunning fit with different values of n_clusters will use the cached tree rather than recomputing it each time. Here is an example of how to do this with scikit-learn's GridSearchCV API:

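    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics import adjusted_rand_score
    # sklearn.grid_search was removed; GridSearchCV now lives in model_selection
    from sklearn.model_selection import GridSearchCV

    # AgglomerativeClustering has no predict(), so score via fit_predict instead
    def ari_scorer(estimator, X, y):
        return adjusted_rand_score(y, estimator.fit_predict(X))

    ac = AgglomerativeClustering(memory='mycachedir',
                                 compute_full_tree=True)
    classifier = GridSearchCV(ac,
                              {'n_clusters': range(2, 6)},  # key must be a string
                              scoring=ari_scorer,
                              n_jobs=-1, verbose=2)
    classifier.fit(X, y)  # X: feature matrix, y: reference labels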
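
If cross-validation is not needed, the same caching idea can be sketched with a plain loop over n_clusters. This is a minimal illustration, not the answerer's method. The synthetic blobs from make_blobs, the temporary cache directory, and the names scores and best_k are all assumptions made for the example. Because compute_full_tree=True and the input data is unchanged, fits after the first should be able to reuse the tree stored in the cache directory.

```python
import tempfile

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Assumption: well-separated synthetic blobs stand in for the asker's X and y.
X, y = make_blobs(
    n_samples=200,
    centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
    cluster_std=0.3,
    random_state=0,
)

scores = {}
# Assumption: a throwaway cache directory; use a persistent path to keep the cache.
with tempfile.TemporaryDirectory() as cache_dir:
    for k in range(2, 6):
        model = AgglomerativeClustering(
            n_clusters=k,
            memory=cache_dir,        # directory where the computed tree is cached
            compute_full_tree=True,  # build the whole tree so it can be reused
        )
        labels = model.fit_predict(X)
        # Compare the clustering against the known blob labels.
        scores[k] = adjusted_rand_score(y, labels)

# Pick the number of clusters with the highest adjusted Rand index.
best_k = max(scores, key=scores.get)
print(scores)
print("best n_clusters:", best_k)
```

The loop only changes n_clusters between fits, which is the situation the documentation note about caching describes; changing X or the linkage settings would produce a different tree and a new cache entry.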
    
