Using R to cluster based on Euclidean distance and complete linkage: too many vectors?

Submitted by 这一生的挚爱 on 2019-12-02 04:02:32

You're running out of RAM. That's it. You can't allocate a vector that exceeds your available memory. Move to a computer with more memory, or maybe try using bigmemory (I've never tried it).
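To see why the allocation fails, here is a rough sketch of how much memory the distance object alone needs. It assumes dist stores one 8-byte double per pair of rows, which is n * (n - 1) / 2 values for n rows. The helper name dist_size_gib and the 50,000-row figure are hypothetical, chosen only for illustration; hclust needs working memory on top of this.

```r
# Approximate size of a dist object for n rows:
# n * (n - 1) / 2 pairwise distances, 8 bytes each, reported in GiB.
# dist_size_gib is a hypothetical helper, not part of base R.
dist_size_gib <- function(n) {
  n_pairs <- n * (n - 1) / 2
  n_pairs * 8 / 1024^3
}

dist_size_gib(7)      # 7 rows: 21 distances, a negligible amount of memory
dist_size_gib(50000)  # hypothetical 50,000 rows: roughly 9.3 GiB
```

The cost grows with the square of the number of rows, so the number of rows passed to dist matters far more than the number of columns.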

In case anybody was wondering, the answer to my second question is below. I was calling as.matrix on a matrix, and it was screwing up the data. The following code works now!

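# Read the tab-separated expression table; the first column holds the row names
exprs <- as.matrix(read.table("small_RMA_table.txt", header = TRUE, sep = "\t", row.names = 1))
# Euclidean distances between the rows of exprs
eucl_dist <- dist(exprs, method = "euclidean")
# Complete-linkage hierarchical clustering on those distances
hie_clust <- hclust(eucl_dist, method = "complete")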

Do you want to cluster on columns (detect similarities between treatments) or on rows (detect similarities between genes)? It sounds like you want the former, given that you're expecting 7 dendrogram branches for 7 treatments.

If so, then you need to transpose your dataset. dist computes distances between rows, not columns, so without the transpose you are clustering the genes rather than the treatments.

Once you've done the transpose, your clustering should take no time at all, and minimal memory.
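A minimal sketch of the transposed version, assuming genes are in rows and the 7 treatments are in columns. The random matrix below is a hypothetical stand-in for the real expression table, and the gene and treatment names are made up for illustration.

```r
set.seed(1)

# Hypothetical stand-in for the expression table:
# 1000 genes (rows) by 7 treatments (columns).
exprs <- matrix(rnorm(1000 * 7), nrow = 1000, ncol = 7,
                dimnames = list(paste0("gene", 1:1000),
                                paste0("treatment", 1:7)))

# dist() compares rows, so transpose to make the treatments the rows.
eucl_dist <- dist(t(exprs), method = "euclidean")
hie_clust <- hclust(eucl_dist, method = "complete")

# One dendrogram leaf per treatment.
length(hie_clust$labels)

# plot(hie_clust) would draw the dendrogram.
```

With the transpose, the distance object holds only 21 pairwise distances, however many genes there are.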
