I want to cluster 2D points (latitude/longitude) on a map. There are 400K points, so the input matrix is 400K x 2.
When I run scikit-learn's AgglomerativeClustering I run out of memory, even though the machine has about 500 GB of RAM.
class sklearn.cluster.AgglomerativeClustering(n_clusters=2, affinity='euclidean', memory=Memory(cachedir=None), connectivity=None, n_components=None, compute_full_tree='auto', linkage='ward', pooling_func=<function mean at 0x2b8085912398>)
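A rough back-of-the-envelope estimate may explain the failure. It assumes the unconstrained algorithm stores every pairwise distance as a 64-bit float in condensed form (n*(n-1)/2 values). Whether scikit-learn actually materializes this depends on the version and arguments, so treat it as a sketch. The helper name condensed_distance_bytes is made up for illustration.

```python
def condensed_distance_bytes(n_points: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold all n*(n-1)/2 pairwise distances.

    Assumption: one float64 (8 bytes) per distance, nothing else counted.
    """
    n_pairs = n_points * (n_points - 1) // 2
    return n_pairs * bytes_per_value


n = 400_000
total = condensed_distance_bytes(n)
print(f"{n} points -> {total / 1e9:.0f} GB for pairwise distances")
```

Under that assumption the distances alone come to roughly 640 GB, which is already more than the 500 GB available.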
I also tried the memory=Memory(cachedir) option with no success. Does anybody have a suggestion (another library, or a change to the scikit-learn code) so that I can run the clustering algorithm on this data?
I have run the algorithm successfully on small datasets.
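For reference, here is a minimal sketch of a small-scale run on synthetic points, not my real data. The sparse k-nearest-neighbours connectivity graph is an assumption on my part: it is one variation that may be worth testing, since it restricts merges to neighbouring points, but I have not verified that it fits in memory at 400K points. The values n_neighbors=10 and n_clusters=20 are arbitrary.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

# Synthetic stand-in for the real latitude/longitude data.
rng = np.random.default_rng(0)
points = np.column_stack([
    rng.uniform(-90, 90, 2000),    # latitude
    rng.uniform(-180, 180, 2000),  # longitude
])

# Sparse graph linking each point to its 10 nearest neighbours (arbitrary choice).
connectivity = kneighbors_graph(points, n_neighbors=10, include_self=False)

# Ward linkage, with merges restricted to connected points.
model = AgglomerativeClustering(
    n_clusters=20,  # arbitrary, for illustration only
    linkage="ward",
    connectivity=connectivity,
)
labels = model.fit_predict(points)
print(labels.shape, len(np.unique(labels)))
```

Note that this treats latitude/longitude as plain Euclidean coordinates, as in the call above.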
Source: https://stackoverflow.com/questions/32293459/memory-efficient-agglomerative-clustering-with-linkage-in-python