How do I manipulate/access elements of an instance of “dist” class using core R?


A basic/common class in R is called "dist", and is a relatively efficient representation of a symmetric distance matrix. Unlike a "matrix" object, …


12 answers

闹比i

2021-02-02 11:14
There do not seem to be tools in the stats package for this. Thanks to @flodel for an alternative implementation in a non-core package.

I dug into the definition of the "dist" class in the core R source. It is old-school S3, and the dist.R source file contains no tools like the ones I'm asking about in this question.

The documentation of the dist() function does point out, usefully, that (and I quote):

The lower triangle of the distance matrix stored by columns in a vector, say do. If n is the number of observations, i.e., n <- attr(do, "Size"), then for i < j ≤ n, the dissimilarity between (row) i and j is:

do[n*(i-1) - i*(i-1)/2 + j-i]

The length of the vector is n*(n-1)/2, i.e., of order n^2.

(end quote)
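A short derivation of the quoted formula, assuming the column-wise storage the documentation describes: column k of the lower triangle (k = 1, …, n − 1) holds the n − k entries for rows k + 1, …, n, so the pair (i, j) with i < j is entry j − i of column i, preceded by every entry of columns 1, …, i − 1.

$$
\text{index}(i, j) = \sum_{k=1}^{i-1} (n - k) + (j - i) = n(i-1) - \frac{i(i-1)}{2} + j - i,
\qquad
\text{length} = \sum_{k=1}^{n-1} (n - k) = \frac{n(n-1)}{2}.
$$

For example, with n = 5 the pair (2, 4) sits at position 5·1 − 1 + 2 = 6.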

I took advantage of this in the following example code for a do-it-yourself "dist" accessor. Note that this example can only return one value at a time.
```
################################################################################
# Define dist accessor
################################################################################
setOldClass("dist")
getDistIndex <- function(x, i, j){
    n <- attr(x, "Size")
    # translate a label (first one only) into its position
    if( is.character(i) ){ i <- which(i[1] == attr(x, "Labels")) }
    if( is.character(j) ){ j <- which(j[1] == attr(x, "Labels")) }
    # switch indices (symmetric) if i is bigger than j
    if( i > j ){
        i0 <- i
        i  <- j
        j  <- i0
    }
    # for i < j <= n
    return( n*(i-1) - i*(i-1)/2 + j-i )
}
# Define the accessor
"[.dist" <- function(x, i, j, ...){
    x[[getDistIndex(x, i, j)]]
}
################################################################################
```
And this seems to work fine, as expected. However, I'm having trouble getting the replacement function to work.
```
################################################################################
# Define the replacement function
################################################################################
"[.dist<-" <- function(x, i, j, value){
    x[[getDistIndex(x, i, j)]] <- value
    return(x)
}
################################################################################
```
A test run of this new assignment operator:
```
dist1["5", "3"] <- 7000
```
Returns:

"R> Error in dist1["5", "3"] <- 7000 : incorrect number of subscripts on matrix"

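The error most likely comes from the method's name rather than its body: for S3 dispatch, R looks up a replacement method for `[` under the name `[<-.dist`, so a function called `[.dist<-` is never called and the default `[<-` complains about two subscripts on a plain vector. A minimal sketch of both methods under that naming, assuming scalar indices as in the original and a hypothetical reworked `getDistIndex()` (label lookup via `match()`, diagonal rejected since it is not stored):

```r
# Sketch only: S3 extraction and replacement methods for "dist".
# S3 replacement methods for `[` are looked up as `[<-.<class>`.

# Hypothetical helper: map one (i, j) pair, given as positions or
# labels, to its index in the underlying dist vector.
getDistIndex <- function(x, i, j) {
  n <- attr(x, "Size")
  labs <- attr(x, "Labels")
  if (is.character(i)) i <- match(i, labs)
  if (is.character(j)) j <- match(j, labs)
  if (anyNA(c(i, j))) stop("unknown label")
  if (i == j) stop("the diagonal is not stored in a 'dist' object")
  lo <- min(i, j)  # the ?dist formula assumes row index < column index
  hi <- max(i, j)
  n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

`[.dist` <- function(x, i, j, ...) {
  if (missing(j)) return(unclass(x)[i])  # d[k]: plain vector indexing
  unclass(x)[[getDistIndex(x, i, j)]]
}

`[<-.dist` <- function(x, i, j, value) {
  k <- getDistIndex(x, i, j)
  cls <- oldClass(x)
  x <- unclass(x)   # work on the bare vector so this method is not re-entered
  x[[k]] <- value   # subassignment keeps Size, Labels and the other attributes
  class(x) <- cls
  x
}

# Usage: five 1-D points labelled "1".."5"
pts <- matrix(c(0, 3, 7, 12, 20), ncol = 1,
              dimnames = list(as.character(1:5), NULL))
dist1 <- dist(pts)
dist1["5", "3"]          # 13
dist1["5", "3"] <- 7000
dist1[3, 5]              # 7000
```

Restoring the class after assigning into the unclassed vector is one design choice among several; it keeps the method from relying on how default `[[<-` treats classed objects.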
As asked, I think @flodel answered the question better, but I still thought this "answer" might be useful.

I also found some nice S4 examples of square-bracket accessor and replacement definitions in the Matrix package, which could be adapted to this "dist" case fairly easily.
孤城傲影

2021-02-02 11:15
I don't have a straight answer to your question, but if you are using the Euclidean distance, have a look at the rdist function from the fields package. Its implementation (in Fortran) is faster than dist, and the output is of class matrix. At the very least, it shows that some developers have chosen to move away from the dist class, perhaps for exactly the reason you mention. If you are concerned that storing a symmetric matrix as a full matrix wastes memory, you could convert it to a triangular matrix.
```
library("fields")
points <- matrix(runif(1000*100), nrow=1000, ncol=100)

system.time(dist1 <- dist(points))
#    user  system elapsed 
#   7.277   0.000   7.338 

system.time(dist2 <- rdist(points))
#   user  system elapsed 
#  2.756   0.060   2.851 

class(dist2)
# [1] "matrix"
dim(dist2)
# [1] 1000 1000
dist2[1:3, 1:3]
#              [,1]         [,2]         [,3]
# [1,] 0.0000000001 3.9529674733 3.8051198575
# [2,] 3.9529674733 0.0000000001 3.6552146293
# [3,] 3.8051198575 3.6552146293 0.0000000001
```
星月不相逢

2021-02-02 11:18

You may find this useful [from ??dist]:

The lower triangle of the distance matrix stored by columns in a vector, say ‘do’. If ‘n’ is the number of observations, i.e., ‘n <- attr(do, "Size")’, then for i < j <= n, the dissimilarity between (row) i and j is ‘do[n*(i-1) - i*(i-1)/2 + j-i]’. The length of the vector is n*(n-1)/2, i.e., of order n^2.
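A quick way to convince yourself of the formula is to vectorise it and compare it against `as.matrix()` for every off-diagonal pair. A minimal sketch, where `dist_index()` is a hypothetical helper name and the pair order is normalised because the documented formula requires i < j:

```r
# Hypothetical helper: the ?dist index formula, vectorised and made
# order-insensitive (assumes i != j; the diagonal is not stored).
dist_index <- function(i, j, n) {
  lo <- pmin(i, j)
  hi <- pmax(i, j)
  n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

# Check against as.matrix() for every off-diagonal pair of a small example
set.seed(42)
d <- dist(matrix(rnorm(40), nrow = 8))   # 8 observations, 5 variables
n <- attr(d, "Size")
m <- as.matrix(d)
pairs <- which(upper.tri(m), arr.ind = TRUE)   # columns "row" and "col"
lookup <- unclass(d)[dist_index(pairs[, "row"], pairs[, "col"], n)]
all.equal(lookup, m[pairs])   # TRUE
```

`unclass()` is used so the lookup behaves like plain vector indexing even if a custom `[.dist` method has been defined in the session.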


南旧

2021-02-02 11:20

There aren't standard ways of doing this, unfortunately. Here are two functions that convert between the 1D index and the 2D matrix coordinates. They aren't pretty, but they work, and at least you can use the code to make something nicer if you need it. I'm posting them just because the equations aren't obvious.

distdex<-function(i,j,n) #given row, column, and n, return index
    n*(i-1) - i*(i-1)/2 + j-i

rowcol<-function(ix,n) { #given index, return row and column
    nr=ceiling(n-(1+sqrt(1+4*(n^2-n-2*ix)))/2)
    nc=n-(2*n-nr+1)*nr/2+ix+nr
    cbind(nr,nc)
}

A little test harness to show it works:

dist(rnorm(20))->testd
as.matrix(testd)[7,13]   #row<col
distdex(7,13,20) # =105
testd[105]   #same as above

testd[c(42,119)]
rowcol(c(42,119),20)  # = (3,8) and (8,15)
as.matrix(testd)[3,8]
as.matrix(testd)[8,15]
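Since `rowcol()` inverts the formula with `sqrt()` and `ceiling()`, an alternative that only does exact arithmetic on whole numbers may be easier to trust. A hedged sketch using base R's `cumsum()` and `findInterval()`; `rowcol_int()` is a hypothetical name, and it relies on column i of the lower triangle ending at index `ends[i]`:

```r
# Hypothetical alternative to rowcol(): invert the ?dist index formula
# with cumsum() + findInterval() instead of sqrt() and ceiling().
rowcol_int <- function(ix, n) {
  ends <- cumsum(n - seq_len(n - 1))   # last vector index stored in column k
  i <- findInterval(ix - 1, ends) + 1  # first column whose end is >= ix
  j <- ix - c(0, ends)[i] + i          # offset within column i, shifted by i
  cbind(row = i, col = j)
}

rowcol_int(c(42, 119), 20)   # rows (3, 8) and (8, 15), as in the test above

# Round trip over every index for n = 20, using the documented formula
n <- 20
ix <- seq_len(n * (n - 1) / 2)
rc <- rowcol_int(ix, n)
all(n * (rc[, 1] - 1) - rc[, 1] * (rc[, 1] - 1) / 2 + rc[, 2] - rc[, 1] == ix)  # TRUE
```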


轮回少年

2021-02-02 11:26
Converting to a matrix was also out of the question for me, because the resulting matrix would be 35K by 35K, so I left it as a vector (the result of dist) and wrote a function to find the position in the vector where a given distance is stored:
```
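distXY <- function(X,Y,n){
  A=min(X,Y)
  B=max(X,Y)

  # 1 + 2 + ... + (A-1) equals A*(A-1)/2, so no eval(parse()) is needed.
  # (Building the sum as a string also fails for A == 1, where 1:(A-1) is c(1, 0).)
  d=(A-1)*n - A*(A-1)/2 + B-A

  return(d)

}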
```
Here X and Y are the row numbers of the two observations in the matrix you passed to dist, and n is the number of rows (observations) in that matrix. The result is the position in the dist vector where their distance is stored. I hope it makes sense.
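For a 35K-row case, the same closed form can also be vectorised to pull all distances from one observation without ever building the n x n matrix. A minimal sketch with a hypothetical `dist_row()` helper:

```r
# Hypothetical helper: all distances from observation k to the others,
# read directly from the "dist" vector, so no n x n matrix is built.
dist_row <- function(d, k) {
  n <- attr(d, "Size")
  others <- setdiff(seq_len(n), k)
  lo <- pmin(k, others)   # the ?dist formula needs row index < column index
  hi <- pmax(k, others)
  idx <- n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
  setNames(unclass(d)[idx], others)
}

set.seed(7)
d <- dist(matrix(runif(60), nrow = 12))   # 12 observations
dist_row(d, 5)
all.equal(unname(dist_row(d, 5)), unname(as.matrix(d)[5, -5]))   # TRUE
```

For n = 35,000 the largest index is n(n-1)/2 = 612,482,500, which doubles (and even 32-bit integers) represent exactly, so the arithmetic stays safe.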
慢半拍i

2021-02-02 11:26

The disto package provides a class that wraps distance matrices in R (in-memory and out-of-core) and provides much more than convenience operators like [. Please check the vignette here.

PS: I am the author of the package.
