I am trying to figure out how to read in a counts matrix into R, and then cluster based on euclidean distance and a complete linkage metric. The original matrix has 56,000 rows
In case anybody was wondering, the answer to my second question is below. I was calling as.matrix
on a matrix, and it was screwing up the data. The following code works now!
exprs <- as.matrix(read.table("small_RMA_table.txt", header=TRUE, sep = "\t", row.names = 1, as.is=TRUE))
eucl_dist=dist(exprs,method = 'euclidean')
hie_clust=hclust(eucl_dist,method = 'complete')
plot(hie_clust)
You're running out of RAM. That's it. You can't allocate a vector that exceeds your memory space. Move to a computer with more memory or maybe, try use bigmemory
(I've never tried it).
https://support.bioconductor.org/p/53848/
Do you want to cluster on columns (detect similarities between treatments) or on rows (detect similarities between genes)? It sounds like you want the former, given that you're expecting 7 dendrogram branches for 7 treatments.
If so, then you need to transpose your dataset. dist
computes a distance matrix for rows, not columns, which is not what you want.
Once you've done the transpose, your clustering should take no time at all, and minimal memory.