Question
I have a distance/dissimilarity matrix (30K rows × 30K columns) that is calculated in a loop and stored on disk.
I would like to do clustering over the matrix. I import and cluster it as below:
Mydata <- read.csv("Mydata.csv")
Mydata <- as.dist(Mydata)
Results <- hclust(Mydata)
But when I convert the matrix to a dist object, I get a RAM limitation error. How can I handle this? Can I run the hclust algorithm in a loop, on chunks? I mean, can I divide the distance matrix into chunks and run them in a loop?
Answer 1:
You may try the following:
Mydata <- read.csv("Mydata.csv")  # read.csv() returns a data frame
Mydata <- as.matrix(Mydata)       # coerce to a numeric matrix before building the dist object
Mydata <- as.dist(Mydata)         # as.dist() keeps only the lower triangle
Results <- hclust(Mydata)         # hierarchical clustering on the dist object
Read the following to track what is happening with memory in your R session: http://adv-r.had.co.nz/memory.html
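As a rough back-of-the-envelope check (the arithmetic here is mine, not from the original answer): a 30,000 × 30,000 matrix of doubles needs 30000^2 × 8 bytes ≈ 6.7 GiB, and the dist object (lower triangle only) still needs about half of that, before counting the temporary copies that as.matrix()/as.dist() make. Base R's object.size() lets you verify this on a small example:
m <- matrix(runif(1000 * 1000), nrow = 1000)  # small stand-in; as.dist() uses the lower triangle
d <- as.dist(m)                               # lower triangle only, roughly half the memory
format(object.size(m), units = "MB")          # ~7.6 MB for the full 1000 x 1000 matrix
format(object.size(d), units = "MB")          # ~3.8 MB for the dist object
30000^2 * 8 / 1024^3                          # ~6.7 GiB for the full-size matrix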
This might be helpful in general: https://cran.r-project.org/web/packages/fastcluster/. See also this related question: hclust() in R on large datasets.
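A minimal sketch of the fastcluster route, assuming the package is installed; fastcluster::hclust is a drop-in replacement for stats::hclust, and hclust.vector() can skip the distance matrix entirely if you still have the raw observations (the matrix X below is an assumption, not something from the question):
library(fastcluster)  # install.packages("fastcluster") first; loading it masks stats::hclust

Results <- fastcluster::hclust(Mydata, method = "complete")  # same interface as stats::hclust

# If the raw observations are available as a numeric matrix X (hypothetical),
# hclust.vector() computes distances on the fly and never stores all 30K x 30K of them
# (supported methods: "single", "ward", "centroid", "median"):
# Results <- fastcluster::hclust.vector(X, method = "ward")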
It also depends on your OS, but you may be able to raise the RAM limit (or simply run this code on someone else's computer with more RAM, store the result with saveRDS, and then read it back on your own computer with readRDS).
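A minimal sketch of that save-and-transfer idea (file names are placeholders):
# On the machine with enough RAM:
saveRDS(Results, "Results.rds")    # serialize the fitted hclust object to disk

# Back on your own machine:
Results <- readRDS("Results.rds")  # the tree itself is tiny compared to the dist matrix
plot(Results)                      # plotting/cutting the dendrogram needs little RAM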
Source: https://stackoverflow.com/questions/53032431/is-it-possible-to-run-a-clustering-algorithm-with-chunked-distance-matrices