R Interclass distance matrix

Submitted by ☆樱花仙子☆ on 2019-12-06 08:46:19

For general n-dimensional Euclidean distance, we can exploit the equation (not R, but algebra):

square_dist(b,a) = sum_i(b[i]*b[i]) + sum_i(a[i]*a[i]) - 2*inner_prod(b,a)

where the sums run over the dimensions i = 1, ..., n of the vectors a and b. Here, a and b are one pair of points, with a taken from class A and b from class B. The key point is that this equation can be written as a single matrix equation covering all pairs in A and B.
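
As a quick sanity check of the identity for a single pair, here is a minimal sketch. The vectors a and b are made-up values chosen for illustration; they are not taken from the question's data.

```r
## Hypothetical 3-d vectors, for illustration only
a <- c(1, 2, 3)
b <- c(4, 6, 8)

## Left-hand side: squared distance computed directly
lhs <- sum((b - a)^2)

## Right-hand side: sums of squares minus twice the inner product
rhs <- sum(b * b) + sum(a * a) - 2 * sum(b * a)

## TRUE if both sides agree up to rounding error
isTRUE(all.equal(lhs, rhs))
```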

In code:

## First split the data by class (df is the data frame from the question)
n <- 2   ## the number of dimensions; 2 in this example
tmp <- split(df[,1:n], df$class)

d <- sqrt(matrix(rowSums(expand.grid(rowSums(tmp$B*tmp$B),rowSums(tmp$A*tmp$A))),
                 nrow=nrow(tmp$B)) - 
          2. * as.matrix(tmp$B) %*% t(as.matrix(tmp$A)))

Notes:

  1. The inner rowSums compute sum_i(b[i]*b[i]) and sum_i(a[i]*a[i]) for each b in B and a in A, respectively.
  2. expand.grid then generates all pairs between B and A.
  3. The outer rowSums computes the sum_i(b[i]*b[i]) + sum_i(a[i]*a[i]) for all these pairs.
  4. This result is then reshaped into a matrix. Note that the number of rows of this matrix is the number of points of class B as you requested.
  5. Then subtract two times the inner product of all pairs. This inner product can be written as a matrix multiply tmp$B %*% t(tmp$A) where I left out the coercion to matrix for clarity.
  6. Finally, take the square root.
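
The steps above can also be sketched as a small reusable function and checked against a plain double loop. This is only a sketch under stated assumptions: the name cross_dist and the toy points are hypothetical, outer(..., "+") is swapped in for the expand.grid/rowSums/matrix combination (it builds the same table of pairwise sums), and the pmax clamp is an added guard against tiny negative values from rounding error, which would otherwise turn into NaN under sqrt.

```r
## Hypothetical helper: rows of the result follow B, columns follow A
cross_dist <- function(B, A) {
  B <- as.matrix(B)
  A <- as.matrix(A)
  ## Pairwise sums of squared norms, minus twice all inner products
  sq <- outer(rowSums(B * B), rowSums(A * A), "+") - 2 * B %*% t(A)
  ## Clamp small negative rounding errors to zero before the square root
  sqrt(pmax(sq, 0))
}

## Made-up toy points, not the data from the question
A <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)
B <- matrix(c(0, 0,
              6, 8,
              1, 1), ncol = 2, byrow = TRUE)

d_toy <- cross_dist(B, A)

## Reference result: explicit loop over every (b, a) pair
ref <- matrix(0, nrow(B), nrow(A))
for (i in seq_len(nrow(B))) {
  for (j in seq_len(nrow(A))) {
    ref[i, j] <- sqrt(sum((B[i, ] - A[j, ])^2))
  }
}

## TRUE if the vectorised result matches the loop
isTRUE(all.equal(d_toy, ref, check.attributes = FALSE))
```

The loop makes one pass per pair, while the vectorised form does the same work in a few matrix operations, which is the point of rewriting the distance as a matrix equation.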

Using this code with your data:

print(d)
##          1         2         3         8         10
##4 0.0030000 0.3111688 0.4072174 0.0030000 0.01029563
##5 0.6061394 0.3000000 0.2000000 0.6061394 0.59682493
##6 0.2213707 0.1000000 0.2000000 0.2213707 0.21023796
##7 0.0010000 0.3149635 0.4110985 0.0010000 0.01272792
##9 0.3140143 0.0000000 0.1000000 0.3140143 0.30364453

Note that this code works for any n > 1. We can recover your previous 1-d result by setting n to 1 and skipping the inner rowSums (because there is now only one column in tmp$A and tmp$B):

n <- 1   ## the number of dimensions, set this now to 1
tmp <- split(df[,1:n], df$class)

d <- sqrt(matrix(rowSums(expand.grid(tmp$B*tmp$B,tmp$A*tmp$A)),
                 nrow=length(tmp$B)) - 
          2. * as.matrix(tmp$B) %*% t(as.matrix(tmp$A)))
print(d)
##      [,1]  [,2]  [,3]  [,4]  [,5]
##[1,] 0.003 0.295 0.395 0.003 0.005
##[2,] 0.598 0.300 0.200 0.598 0.590
##[3,] 0.198 0.100 0.200 0.198 0.190
##[4,] 0.001 0.299 0.399 0.001 0.009
##[5,] 0.298 0.000 0.100 0.298 0.290

Here's an alternative for the 1-d case: generate every combination of values across the two classes, then take the absolute difference within each pair:

abs(matrix(Reduce(`-`, expand.grid(split(df$values, df$class))), nrow=5, byrow=TRUE))
#      [,1]  [,2]  [,3]  [,4]  [,5]
#[1,] 0.003 0.295 0.395 0.003 0.005
#[2,] 0.598 0.300 0.200 0.598 0.590
#[3,] 0.198 0.100 0.200 0.198 0.190
#[4,] 0.001 0.299 0.399 0.001 0.009
#[5,] 0.298 0.000 0.100 0.298 0.290
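
For the 1-d case, the same table of absolute differences can probably be produced more directly with outer. The sketch below uses a made-up data frame with the same column names (values, class) as the one-liner above; the numbers are hypothetical, and the last line checks that both forms agree on them.

```r
## Made-up 1-d values for two classes, not the data from the question
df_toy <- data.frame(values = c(0.10, 0.40, 0.25, 0.90),
                     class  = c("A", "B", "A", "B"))
tmp_toy <- split(df_toy$values, df_toy$class)

## outer() form: rows follow class B, columns follow class A
d1 <- abs(outer(tmp_toy$B, tmp_toy$A, "-"))

## expand.grid/Reduce form from the answer, with nrow derived from the data
d2 <- abs(matrix(Reduce(`-`, expand.grid(tmp_toy)),
                 nrow = length(tmp_toy$B), byrow = TRUE))

## TRUE if the two approaches give the same matrix
isTRUE(all.equal(d1, d2))
```

Deriving nrow from length(tmp_toy$B) rather than hard-coding 5 keeps the reshape correct when the class sizes change.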