Using R to cluster based on Euclidean distance and complete linkage: too many vectors?

I am trying to figure out how to read a counts matrix into R and then cluster it based on Euclidean distance and complete linkage. The original matrix has 56,000 rows.

3 Answers

不要未来只要你来

2021-01-24 01:50
In case anybody was wondering, the answer to my second question is below. I was calling as.matrix on a matrix, and it was screwing up the data. The following code works now!
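```
# row.names = 1 turns the first column (gene IDs) into row names,
# so only the data columns end up in the matrix.
exprs <- as.matrix(read.table("small_RMA_table.txt", header = TRUE, sep = "\t",
                              row.names = 1, as.is = TRUE))
# Pairwise Euclidean distances between rows
eucl_dist <- dist(exprs, method = "euclidean")
# Agglomerative clustering with complete linkage
hie_clust <- hclust(eucl_dist, method = "complete")
plot(hie_clust)
```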
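
To check the same pipeline without the original file, here is a minimal sketch. It writes a hypothetical four-gene table (`toy_file` and its values are made up, standing in for small_RMA_table.txt) to a temporary file, reads it the same way, confirms the result is a numeric matrix before calling dist, and cuts the complete-linkage tree into two groups.

```r
# Hypothetical toy table: first column is gene IDs, the rest are numeric values.
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "gene\tT1\tT2\tT3",
  "g1\t1.0\t2.0\t3.0",
  "g2\t1.1\t2.1\t2.9",
  "g3\t8.0\t7.5\t9.0",
  "g4\t8.2\t7.4\t9.1"
), toy_file)

# Same read as above: row.names = 1 keeps the IDs out of the numeric data.
exprs <- as.matrix(read.table(toy_file, header = TRUE, sep = "\t",
                              row.names = 1, as.is = TRUE))

# Sanity check before clustering: dist() needs a numeric matrix.
stopifnot(is.matrix(exprs), is.numeric(exprs))

eucl_dist <- dist(exprs, method = "euclidean")
hie_clust <- hclust(eucl_dist, method = "complete")

# g1/g2 and g3/g4 are close pairs, so two clusters should separate them.
print(cutree(hie_clust, k = 2))
```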
忘掉有多难

2021-01-24 02:16

You're running out of RAM. That's it. R can't allocate a vector larger than the memory available to it. Move to a computer with more memory, or maybe try the bigmemory package (I've never tried it).

https://support.bioconductor.org/p/53848/
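A back-of-the-envelope estimate shows why: dist stores one double (8 bytes) for every pair of rows, which is n(n - 1)/2 values. The sketch below uses a hypothetical helper, `dist_size_gb`, to estimate that for 56,000 rows; it counts only the dist object itself and ignores any extra working memory hclust or R's copying may need.

```r
# Hypothetical helper: GB (1 GB = 1e9 bytes) needed for the distance vector
# that dist() stores for n rows, at 8 bytes per double.
dist_size_gb <- function(n) {
  n_pairs <- n * (n - 1) / 2   # number of pairwise distances
  n_pairs * 8 / 1e9
}

# The original 56,000-row matrix: about 12.5 GB for the dist object alone.
print(sprintf("%.1f GB", dist_size_gb(56000)))

# Cross-check the pair count on a small random matrix that does fit in memory.
small <- matrix(rnorm(1000 * 7), nrow = 1000)
stopifnot(length(dist(small)) == 1000 * 999 / 2)
```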

名媛妹妹

2021-01-24 02:16

Do you want to cluster on columns (detect similarities between treatments) or on rows (detect similarities between genes)? It sounds like you want the former, given that you're expecting 7 dendrogram branches for 7 treatments.

If so, you need to transpose your dataset with t() first. dist computes distances between rows, not columns, so as things stand it compares genes rather than treatments, which is not what you want.

Once you've done the transpose, clustering should take almost no time and minimal memory: with 7 rows, dist only has 7 × 6 / 2 = 21 pairwise distances to compute.
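A minimal sketch of the transposed version follows. The data are simulated with rnorm as a stand-in for the real expression matrix (56,000 genes by 7 treatments), and the `gene_`/`treatment_` names are hypothetical.

```r
set.seed(1)
# Simulated stand-in: 56,000 genes (rows) x 7 treatments (columns).
n_genes <- 56000
exprs <- matrix(rnorm(n_genes * 7), nrow = n_genes,
                dimnames = list(paste0("gene_", seq_len(n_genes)),
                                paste0("treatment_", 1:7)))

# dist() compares ROWS, so transpose to put the treatments in rows.
treat_dist <- dist(t(exprs), method = "euclidean")
treat_clust <- hclust(treat_dist, method = "complete")

print(length(treat_dist))   # 7 * 6 / 2 = 21 pairwise distances
print(treat_clust$labels)   # the 7 treatment names become the dendrogram leaves
# plot(treat_clust)         # dendrogram with one leaf per treatment
```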
