A basic/common class in R is called "dist", and is a relatively efficient representation of a symmetric distance matrix. Unlike for a "matrix" object, there do not seem to be tools in the stats package for accessing or replacing its elements by index pair. Thanks to @flodel for an alternative implementation in a non-core package.
I dug into the definition of the "dist" class in the core R source. It is old-school S3, and the dist.R source file contains no tools like the ones I'm asking about in this question.
The documentation of the dist() function does point out, usefully, that (and I quote):
The lower triangle of the distance matrix stored by columns in a vector, say do. If n is the number of observations, i.e., n <- attr(do, "Size"), then for i < j ≤ n, the dissimilarity between (row) i and j is:
do[n*(i-1) - i*(i-1)/2 + j-i]
The length of the vector is n*(n-1)/2, i.e., of order n^2.
(end quote)
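As a quick sanity check of the quoted formula, here is a minimal sketch that compares it against the full matrix returned by as.matrix(). The toy data and the helper name distIndex are my own assumptions for illustration and are not part of the stats package.

```r
# Toy data (assumed for illustration): 5 observations in 2 dimensions.
set.seed(1)
pts <- matrix(rnorm(10), nrow = 5)
d <- dist(pts)
n <- attr(d, "Size")

# Hypothetical helper: linear index of the pair (i, j), valid for i < j <= n.
distIndex <- function(n, i, j) n * (i - 1) - i * (i - 1) / 2 + j - i

m <- as.matrix(d)   # full symmetric matrix, used only for comparison
v <- as.vector(d)   # bare numeric vector with the attributes dropped

i <- 2
j <- 4
# The value found through the formula should match the matrix entry.
print(all.equal(v[distIndex(n, i, j)], m[j, i]))
# The stored vector holds n*(n-1)/2 values.
print(length(v) == n * (n - 1) / 2)
```

For n = 5 the pair (2, 4) maps to position 6, which is the second entry of the second column of the lower triangle.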
I took advantage of this in the following example code for a define-yourself "dist" accessor. Note that this example can only return one value at a time.
################################################################################
# Define dist accessor
################################################################################
setOldClass("dist")
getDistIndex <- function(x, i, j){
  n <- attr(x, "Size")
  # translate labels to integer positions
  if( is.character(i) ){ i <- which(i[1] == attr(x, "Labels")) }
  if( is.character(j) ){ j <- which(j[1] == attr(x, "Labels")) }
  # switch indices (symmetric) if i is bigger than j
  if( i > j ){
    i0 <- i
    i  <- j
    j  <- i0
  }
  # for i < j <= n (the diagonal, i == j, is not stored in a "dist" object)
  return( n*(i-1) - i*(i-1)/2 + j-i )
}
# Define the accessor
"[.dist" <- function(x, i, j, ...){
  x[[getDistIndex(x, i, j)]]
}
################################################################################
And this seems to work fine, as expected. However, I'm having trouble getting the replacement function to work.
################################################################################
# Define the replacement function
################################################################################
"[.dist<-" <- function(x, i, j, value){
  x[[get.dist.index(x, i, j)]] <- value
  return(x)
}
################################################################################
A test-run of this new assignment operator:
dist1["5", "3"] <- 7000
Returns:
R> Error in dist1["5", "3"] <- 7000 : incorrect number of subscripts on matrix
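The error most likely comes from the method name rather than from the indexing logic. S3 replacement methods are named by appending the class to the generic, so the method for "[<-" on class "dist" has to be called "[<-.dist". A function named "[.dist<-" is never dispatched, and the default "[<-" then rejects two subscripts on a plain vector. The body also calls get.dist.index, while the helper defined above is getDistIndex. Below is a minimal sketch of a corrected version. The toy data, the labels "1" to "5", and the unclass() step are my own assumptions, and the helper is restated only so the block runs on its own.

```r
# Helper from above, restated so this block is self-contained.
getDistIndex <- function(x, i, j) {
  n <- attr(x, "Size")
  if (is.character(i)) i <- which(i[1] == attr(x, "Labels"))
  if (is.character(j)) j <- which(j[1] == attr(x, "Labels"))
  if (i > j) { tmp <- i; i <- j; j <- tmp }   # symmetric: enforce i < j
  n * (i - 1) - i * (i - 1) / 2 + j - i
}

# Replacement method: note the name "[<-.dist", not "[.dist<-".
"[<-.dist" <- function(x, i, j, value) {
  k  <- getDistIndex(x, i, j)
  cl <- oldClass(x)
  x  <- unclass(x)   # drop the class so the assignment below does not re-dispatch
  x[k] <- value      # "Size", "Labels" and the other attributes are kept
  class(x) <- cl
  x
}

# Toy data with row names, so that attr(dist1, "Labels") is not NULL.
pts <- matrix(c(0, 1, 3, 6, 10), ncol = 1,
              dimnames = list(as.character(1:5), NULL))
dist1 <- dist(pts)
dist1["5", "3"] <- 7000
print(as.matrix(dist1)["5", "3"])
```

Like the accessor, this sketch only handles one (i, j) pair at a time and assumes that character indices match an existing label.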
As asked, I think @flodel answered the question better, but I still thought this "answer" might also be useful.
I also found some nice S4 examples of square-bracket accessor and replacement definitions in the Matrix package, which could be adapted from this current example pretty easily.