How do I manipulate/access elements of an instance of “dist” class using core R?

傲寒 2021-02-02 10:50

A basic/common class in R is called "dist", and is a relatively efficient representation of a symmetric distance matrix. Unlike a "matrix" object,

12 answers
  •  闹比i (original poster)
    2021-02-02 11:14

    There do not seem to be tools in the stats package for this. Thanks to @flodel for an alternative implementation in a non-core package.

    I dug into the definition of the "dist" class in the core R source. It is old-school S3, and the dist.R source file contains no accessor tools of the kind I'm asking about in this question.

    The documentation of the dist() function does point out, usefully, that (and I quote):

    The lower triangle of the distance matrix stored by columns in a vector, say do. If n is the number of observations, i.e., n <- attr(do, "Size"), then for i < j ≤ n, the dissimilarity between (row) i and j is:

    do[n*(i-1) - i*(i-1)/2 + j-i]

    The length of the vector is n*(n-1)/2, i.e., of order n^2.

    (end quote)
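
    As a quick sanity check of the quoted formula, the sketch below compares it against as.matrix() for every pair i < j on a small example. The helper name dist_index and the toy data are assumptions made purely for illustration; neither is part of the stats package.

```r
# Toy data: 5 observations in 2 dimensions (values are arbitrary).
set.seed(1)
pts <- matrix(rnorm(10), nrow = 5)
d <- dist(pts)
m <- as.matrix(d)            # full symmetric matrix, used only for comparison
n <- attr(d, "Size")

# Hypothetical helper: position of the pair (i, j), with i < j <= n,
# in the vector underlying a "dist" object (formula quoted from ?dist).
dist_index <- function(n, i, j) n * (i - 1) - i * (i - 1) / 2 + j - i

# All pairs with i < j, one pair per row.
pairs <- t(combn(n, 2))
idx <- dist_index(n, pairs[, 1], pairs[, 2])

# unclass() so that plain vector indexing is used on the stored values.
from_formula <- unclass(d)[idx]
from_matrix  <- m[pairs]     # matrix indexing by (row, column) pairs

all_match <- isTRUE(all.equal(from_formula, from_matrix))
```

    Because the formula is plain arithmetic, it also works on whole vectors of i and j at once, which is one way around the one-value-at-a-time limitation.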

    I took advantage of this in the following example code for a user-defined "dist" accessor. Note that this example can only return one value at a time.

    ################################################################################
    # Define dist accessor
    ################################################################################
    setOldClass("dist")
    getDistIndex <- function(x, i, j){
        n <- attr(x, "Size")
        # is.character() is safer than comparing class(), which can have length > 1
        if( is.character(i) ){ i <- which(i[1] == attr(x, "Labels")) }
        if( is.character(j) ){ j <- which(j[1] == attr(x, "Labels")) }
        # switch indices (symmetric) if i is bigger than j
        if( i > j ){
            i0 <- i
            i  <- j
            j  <- i0
        }
        # for i < j <= n
        return( n*(i-1) - i*(i-1)/2 + j-i )
    }
    # Define the accessor
    "[.dist" <- function(x, i, j, ...){
        x[[getDistIndex(x, i, j)]]
    }
    ################################################################################
    

    And this seems to work fine, as expected. However, I'm having trouble getting the replacement function to work.

    ################################################################################
    # Define the replacement function
    ################################################################################
    "[.dist<-" <- function(x, i, j, value){
        x[[get.dist.index(x, i, j)]] <- value
        return(x)
    }
    ################################################################################
    

    A test-run of this new assignment operator

    dist1["5", "3"] <- 7000
    

    Returns:

    "R> Error in dist1["5", "3"] <- 7000 : incorrect number of subscripts on matrix"

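    The error most likely comes from the method name: R looks up the S3 replacement method for [ under the name `[<-.dist`, so a function called `[.dist<-` is never dispatched, and the call falls through to the default `[<-`, which rejects two subscripts on a plain vector. The body also calls get.dist.index rather than getDistIndex. A minimal sketch with both points changed, keeping the same single-pair semantics (the toy object dist1 and the extra input checks are assumptions added for illustration):

```r
# Position of the pair (i, j) in the underlying vector.
# i and j may be positions or labels; only the first element is used.
getDistIndex <- function(x, i, j) {
  n <- attr(x, "Size")
  labs <- attr(x, "Labels")
  if (is.character(i)) i <- match(i[1], labs)
  if (is.character(j)) j <- match(j[1], labs)
  stopifnot(!is.na(i), !is.na(j), i != j)   # the diagonal is not stored
  lo <- min(i, j)
  hi <- max(i, j)
  n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

# S3 replacement method: the name must be `[<-.dist`.
"[<-.dist" <- function(x, i, j, value) {
  cl <- oldClass(x)
  x <- unclass(x)            # work on the bare vector; attributes are kept
  x[[getDistIndex(x, i, j)]] <- value
  class(x) <- cl
  x
}

# Toy object with labels "1" to "5" (made up for this example).
pts <- matrix(1:10, nrow = 5, dimnames = list(as.character(1:5), NULL))
dist1 <- dist(pts)
dist1["5", "3"] <- 7000
as.matrix(dist1)["5", "3"]
```

    Since the matrix is symmetric, as.matrix(dist1)["3", "5"] should show the same replaced value.
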
    As asked, I think @flodel answered the question better, but I still thought this "answer" might be useful.

    I also found some nice S4 examples of square-bracket accessor and replacement definitions in the Matrix package, which could be adapted from this current example pretty easily.
