How do I manipulate/access elements of an instance of “dist” class using core R?

傲寒 2021-02-02 10:50

A basic/common class in R is called "dist", and is a relatively efficient representation of a symmetric distance matrix. Unlike a "matrix" object,

12 answers
  • 2021-02-02 11:01

    This response is really just an extended follow-up to Christian A's earlier response. It is warranted because some readers of the question (myself included) may query the dist object as if it were symmetric (not just (7,13) as below, but also (13,7)). I don't have edit privileges, and the earlier answer was correct as long as the user treats the dist object as a dist object and not as a sparse matrix, which is why I am posting a separate response rather than an edit. Vote up Christian A for doing the heavy lifting if this answer is useful. The original answer, with my edits pasted in:

    distdex<-function(i,j,n) #given row, column, and n, return index
        n*(i-1) - i*(i-1)/2 + j-i
    
    rowcol<-function(ix,n) { #given index, return row and column
        nr=ceiling(n-(1+sqrt(1+4*(n^2-n-2*ix)))/2)
        nc=n-(2*n-nr+1)*nr/2+ix+nr
        cbind(nr,nc)
    }
    #A little test harness to show it works:
    
    dist(rnorm(20))->testd
    as.matrix(testd)[7,13]   #row<col
    distdex(7,13,20) # =105
    testd[105]   #same as above
    

    But...

    distdex(13,7,20) # =156
    testd[156]   #the wrong answer
    

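    Christian A's function only works if i < j; for i == j or i > j it returns the wrong index. Modifying distdex to return 0 when i == j and to swap i and j when i > j solves the problem. (Note that 0 is returned as an index, so testd[0] gives an empty vector rather than a distance of 0; the diagonal is simply not stored.)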

    distdex2<-function(i,j,n){ #given row, column, and n, return index
      if(i==j){0
      }else if(i > j){
        n*(j-1) - j*(j-1)/2 + i-j
      }else{
        n*(i-1) - i*(i-1)/2 + j-i  
      }
    }
    
    as.matrix(testd)[7,13]   #row<col
    distdex2(7,13,20) # =105
    testd[105]   #same as above
    distdex2(13,7,20) # =105
    testd[105]   #the same answer
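distdex2 uses if(), so it handles one pair at a time. A minimal vectorized sketch of the same idea, assuming you want an actual distance of 0 (not an index) on the diagonal: pmin()/pmax() put each pair in i < j order before applying the formula. dist_get is a hypothetical name, not part of either answer.

```r
# Hypothetical helper (not part of either answer): look up d(i, j) in a "dist"
# object for any order of i and j, vectorized over i and j.
dist_get <- function(d, i, j) {
  n <- attr(d, "Size")                      # number of observations
  lo <- pmin(i, j)                          # smaller index of each pair
  hi <- pmax(i, j)                          # larger index of each pair
  idx <- n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo   # same formula as distdex
  out <- numeric(length(idx))               # diagonal entries stay 0
  off <- lo != hi                           # the diagonal is not stored
  out[off] <- as.vector(d)[idx[off]]
  out
}

set.seed(1)
testd <- dist(rnorm(20))
dist_get(testd, c(7, 13, 7), c(13, 7, 7))   # first two values equal, third is 0
```

Because both orders map to the same stored element, the result matches as.matrix(testd)[i, j] for every pair without building the matrix.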
    
  • 2021-02-02 11:02

    as.matrix(d) will turn the dist object d into a matrix, while as.dist(m) will turn the matrix m back into a dist object. Note that the latter doesn't actually check that m is a valid distance matrix; it just extracts the lower triangular part.
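A small round trip illustrating both points, using three points chosen so the distances come out as whole numbers (5, 10, 5). An edit made only to the upper triangle is silently dropped by as.dist(); an edit to the lower triangle survives.

```r
# Round trip between "dist" and a full matrix (base R 'stats' functions only).
x <- matrix(c(0, 0, 3, 4, 6, 8), ncol = 2, byrow = TRUE)  # three 2-D points
d <- dist(x)          # stored as pairs (2,1), (3,1), (3,2): 5, 10, 5
m <- as.matrix(d)     # full symmetric 3 x 3 matrix with a zero diagonal
m[1, 3] <- 99         # edit only the UPPER triangle...
d2 <- as.dist(m)      # ...as.dist() reads only the lower triangle
as.vector(d2)         # 5 10 5: the edit is gone
m[3, 1] <- 99         # edit the LOWER triangle instead
d3 <- as.dist(m)
as.vector(d3)         # 5 99 5: this edit is kept
```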

  • 2021-02-02 11:04

    You could do this:

    d <- function(distance, selection){
      eval(parse(text = paste("as.matrix(distance)[",
                   selection, "]")))
    }
    
    `d<-` <- function(distance, selection, value){
      m <- as.matrix(distance)   # work on a full-matrix copy (there is no `as.matrix<-`)
      eval(parse(text = paste("m[", selection, "] <- value")))
      as.dist(m)                 # keeps only the lower triangle
    }
    

    Which would allow you to do this:

     mat <- matrix(1:12, nrow=4)
     mat.d <- dist(mat)
     mat.d
            1   2   3
        2 1.7        
        3 3.5 1.7    
        4 5.2 3.5 1.7
    
     d(mat.d, "3, 2")
        [1] 1.7
     d(mat.d, "3, 2") <- 200
     mat.d
              1     2     3
        2   1.7            
        3   3.5 200.0      
        4   5.2   3.5   1.7
    

    However, any changes you make to the diagonal or upper triangle are ignored. That may or may not be the right thing to do. If it isn't, you'll need to add some kind of sanity check or appropriate handling for those cases. And probably others.
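One way to add that kind of sanity check is a setter that takes numeric positions instead of a string for eval(parse()). A minimal sketch, assuming single (i, j) positions: (i, j) and (j, i) are mapped to the same stored cell with the ?dist index formula, and the diagonal is refused because a dist object does not store it. dist_set is a hypothetical name.

```r
# Hypothetical helper (not from the answer above): set one distance by
# numeric position without eval(parse()). (i, j) and (j, i) address the same
# stored cell, and the diagonal is rejected because "dist" does not store it.
dist_set <- function(d, i, j, value) {
  n <- attr(d, "Size")
  stopifnot(length(i) == 1, length(j) == 1, i >= 1, j >= 1, i <= n, j <= n)
  if (i == j) stop("the diagonal of a 'dist' object is not stored")
  lo <- min(i, j)
  hi <- max(i, j)
  d[n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo] <- value  # attributes are kept
  d
}

mat <- matrix(1:12, nrow = 4)
mat.d <- dist(mat)
mat.d <- dist_set(mat.d, 2, 3, 200)   # same cell as (3, 2)
mat.d
```

Writing into the stored vector directly also avoids building the full matrix and converting back, which matters for large objects.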

  • 2021-02-02 11:04

    Here's my practical solution for getting values from a dist object by name. Want the distances from item 9 to every item as a vector?

    as.matrix(mat1)["9", ]   # exact label match; grepl("9", labels(mat1)) would also match "19", "29", ...
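For a large dist object, as.matrix() allocates the full n x n matrix just to read one row. A sketch of a by-name lookup that computes the stored positions with the ?dist index formula instead; dist_row is a hypothetical helper, and it assumes the labels are unique.

```r
# Hypothetical helper (not from the answer above): distances from the item
# with a given label to every item, read straight from the stored vector so
# the full n x n matrix is never built. Assumes labels are unique.
dist_row <- function(d, label) {
  n <- attr(d, "Size")
  labs <- attr(d, "Labels")
  if (is.null(labs)) labs <- as.character(seq_len(n))  # same default as as.matrix()
  k <- match(label, labs)                  # exact match, unlike grepl()
  if (is.na(k)) stop("label not found: ", label)
  j <- seq_len(n)
  lo <- pmin(k, j)
  hi <- pmax(k, j)
  keep <- j != k                           # the self-distance is not stored...
  out <- numeric(n)                        # ...so it stays 0
  out[keep] <- d[(n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo)[keep]]
  names(out) <- labs
  out
}

mat <- matrix(1:12, nrow = 4)
dist_row(dist(mat), "3")
```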
    
  • 2021-02-02 11:08

    Dist objects seem to be treated much like plain vectors: as far as I can see, a dist object is just a numeric vector with attributes. So to get the values out:

    x = as.vector(distobject)
    

    See ?dist for a formula to extract the distance between a specific pair of objects using their indices.
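To make "a vector with attributes" concrete, this sketch checks that the stored values are the lower triangle of as.matrix() read column by column, and applies the ?dist formula (valid for 1 <= i < j <= n) to one pair. The data are arbitrary random numbers.

```r
# Illustration: a "dist" object stores the lower triangle of the full matrix,
# column by column, plus attributes such as "Size".
set.seed(42)
x <- matrix(rnorm(10), nrow = 5)
d <- dist(x)
v <- as.vector(d)                 # plain numeric vector, attributes dropped
n <- attr(d, "Size")              # number of observations (5 here)
length(v) == n * (n - 1) / 2      # TRUE: 10 stored distances
m <- as.matrix(d)
all(v == m[lower.tri(m)])         # TRUE: lower.tri() indexing is column-major too
i <- 2; j <- 4                    # the ?dist formula needs i < j
v[n * (i - 1) - i * (i - 1) / 2 + j - i] == m[i, j]   # TRUE
```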

  • 2021-02-02 11:14

    You can inspect the attributes of any object with str().

    For a "dist" object built from some of my data (dist1), it looks like this:

    > str(dist1)
    Class 'dist'  atomic [1:4560] 7.3 7.43 7.97 7.74 7.55 ...
      ..- attr(*, "Size")= int 96
      ..- attr(*, "Labels")= chr [1:96] "1" "2" "3" "4" ...
      ..- attr(*, "Diag")= logi FALSE
      ..- attr(*, "Upper")= logi FALSE
      ..- attr(*, "method")= chr "euclidean"
      ..- attr(*, "call")= language dist(x = dist1) 
    

    You can see that for this particular data set, the "Labels" attribute is a character vector of length 96 holding the numbers 1 to 96 as strings.

    You could change that character vector directly:

    > attr(dist1,"Labels") <- your.labels
    

    "your.labels" should be an ID vector (for example a character or factor column) with one entry per observation, presumably from the original data from which the "dist" object was made.
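A short sketch of two ways to attach labels: set them afterwards with attr(), as above, or give the input matrix row names before calling dist(), which records them as "Labels". Here ids is a made-up example vector standing in for IDs from your own data.

```r
# Two ways to get labelled distances; 'ids' is a made-up example vector.
x <- matrix(c(1, 2, 4, 7, 11, 16), ncol = 2)   # three observations
ids <- c("s1", "s2", "s3")
d1 <- dist(x)                        # no row names, so no "Labels" attribute
stopifnot(length(ids) == attr(d1, "Size"))   # one label per observation
attr(d1, "Labels") <- ids            # label after the fact
rownames(x) <- ids
d2 <- dist(x)                        # row names become "Labels" automatically
str(d2)
as.matrix(d1)["s2", "s3"]            # labels become dimnames, so lookup by name works
```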
