Memory Efficient Agglomerative Clustering with Linkage in Python

Submitted by 落爺英雄遲暮 on 2019-12-10 10:18:05

Question


I want to cluster 2D points (latitude/longitude) on a map. There are 400K points, so the input matrix is 400K x 2.

When I run scikit-learn's Agglomerative Clustering I run out of memory, even though the machine has about 500 GB of memory.
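For scale, here is a rough back-of-the-envelope estimate of why this might happen. It assumes (I have not confirmed this in the scikit-learn source) that unconstrained agglomerative clustering ends up materializing every pairwise distance as a 64-bit float in a condensed n*(n-1)/2 array. The helper name `condensed_matrix_bytes` is just something I made up for illustration.

```python
def condensed_matrix_bytes(n_points: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store one value per unordered pair of points.

    Assumption: distances are stored as float64 (8 bytes) in a condensed
    array of length n*(n-1)/2, with no other working memory counted.
    """
    n_pairs = n_points * (n_points - 1) // 2
    return n_pairs * bytes_per_value


n = 400_000
needed = condensed_matrix_bytes(n)
print(f"{needed / 1e9:.0f} GB for {n} points")  # prints: 640 GB for 400000 points
```

If that assumption holds, the distance array alone would need roughly 640 GB, which is already more than the 500 GB available, before counting any temporary copies.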

class sklearn.cluster.AgglomerativeClustering(n_clusters=2, affinity='euclidean', memory=Memory(cachedir=None), connectivity=None, n_components=None, compute_full_tree='auto', linkage='ward', pooling_func=<function mean at 0x2b8085912398>)

I also tried the memory=Memory(cachedir) option with no success. Does anybody have a suggestion (another library or change in the scikit code) so that I can run the clustering algorithm on the data?

I have run the algorithm successfully on small datasets.

Source: https://stackoverflow.com/questions/32293459/memory-efficient-agglomerative-clustering-with-linkage-in-python
