Using R to cluster based on Euclidean distance and a complete linkage metric, too many vectors?

渐次进展 2021-01-24 01:37

I am trying to figure out how to read a counts matrix into R and then cluster it using Euclidean distance and complete linkage. The original matrix has 56,000 rows

3 Answers
  • In case anybody was wondering, the answer to my second question is below. I was calling as.matrix on a matrix, and it was screwing up the data. The following code works now!

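    # Read the tab-delimited table; the first column supplies the row names
    exprs <- as.matrix(read.table("small_RMA_table.txt", header = TRUE, sep = "\t", row.names = 1, as.is = TRUE))
    # Euclidean distances between rows, then complete-linkage clustering
    eucl_dist <- dist(exprs, method = "euclidean")
    hie_clust <- hclust(eucl_dist, method = "complete")
    plot(hie_clust)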
    
  • 2021-01-24 02:16

    You're running out of RAM. That's it. You can't allocate a vector that exceeds your available memory. Move to a computer with more memory, or maybe try using bigmemory (I've never tried it).

    https://support.bioconductor.org/p/53848/
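
    To put a rough number on this: dist keeps one distance per pair of rows, so the size of its result can be estimated before running it. The sketch below is only a back-of-the-envelope estimate. dist_bytes is a hypothetical helper name, and it assumes 8-byte doubles and ignores any extra working memory that dist or hclust need.

    ```r
    # Hypothetical helper: approximate size in bytes of the object returned by
    # dist() for n rows. dist() stores the lower triangle only: n * (n - 1) / 2 values.
    dist_bytes <- function(n) {
      n <- as.numeric(n)            # avoid integer overflow for large n
      n_pairs <- n * (n - 1) / 2    # number of pairwise distances
      n_pairs * 8                   # assumption: each distance is an 8-byte double
    }

    rows_gib <- dist_bytes(56000) / 2^30   # the 56,000-row matrix from the question
    round(rows_gib, 2)
    ```

    For 56,000 rows this comes to roughly 11.7 GiB for the distance object alone, which is consistent with the allocation failing on a machine with less free memory than that.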

  • 2021-01-24 02:16

    Do you want to cluster on columns (detect similarities between treatments) or on rows (detect similarities between genes)? It sounds like you want the former, given that you're expecting 7 dendrogram branches for 7 treatments.

    If so, then you need to transpose your dataset. dist computes a distance matrix for rows, not columns, which is not what you want.

    Once you've done the transpose, your clustering should take no time at all, and minimal memory.
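
    A minimal sketch of the transpose, using random numbers as a stand-in for the real table. The 1,000 x 7 shape and the gene/trt names are made up for illustration; only the 7 treatments come from the question.

    ```r
    set.seed(1)  # reproducible stand-in data

    # Hypothetical stand-in for the real table: 1,000 genes (rows) x 7 treatments (columns)
    exprs <- matrix(rnorm(1000 * 7), nrow = 1000,
                    dimnames = list(paste0("gene", 1:1000), paste0("trt", 1:7)))

    # dist() measures distances between rows, so transpose to compare treatments
    col_dist <- dist(t(exprs), method = "euclidean")
    col_clust <- hclust(col_dist, method = "complete")

    length(col_dist)    # number of pairwise distances between the 7 treatments
    col_clust$labels    # the treatment names that become the dendrogram leaves
    ```

    With 7 columns, the distance object holds only 7 * 6 / 2 = 21 values, and plot(col_clust) then draws a dendrogram with one leaf per treatment.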
