Extract diagonals from a distance matrix in R

后端 未结 2 374
礼貌的吻别
礼貌的吻别 2020-12-22 03:43

I would like to know how can I extract the values of the first diagonal from a distance matrix.

For example:

> mymatrix
     [,1] [,2]
[1,]    1           


        
相关标签:
2条回答
  • 2020-12-22 04:06

    One work around is to convert the dist object to matrix and then extract elements where row index is one larger than the column index:

    mat = as.matrix(dist(mymatrix))
    mat[row(mat) == col(mat) + 1]
    # [1] 2.828427 3.000000 2.828427
    
    0 讨论(0)
  • 2020-12-22 04:09

    A seemingly complicated but extremely efficient solution for extracting the dth sub-diagonal of a "dist" matrix.

    subdiag <- function (dist_obj, d) {
      if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object!")
      n <- attr(dist_obj, "Size")
      if (d < 1 || d > (n - 1)) stop(sprintf("'d' needs be between 1 and %d", n - 1L))
      j_1 <- c(0, seq.int(from = n - 1, by = -1, length = n - d - 1))
      subdiag_ind <- d + cumsum(j_1)
      dist_obj[subdiag_ind]
      }
    

    See R - How to get row & column subscripts of matched elements from a distance matrix for details of packed storage of the "dist" object. Inside this function, j_1 is the number of "X" in the (j - 1)th column. cumsum gives the 1D index for main diagonals (whose values are all zeros). A further offset by d gives 1D index of the dth sub-diagonal.

    set.seed(0)
    x <- dist(matrix(runif(10), 5))
    #          1         2         3         4
    #2 0.9401067                              
    #3 0.9095143 0.1162289                    
    #4 0.5618382 0.3884722 0.3476762          
    #5 0.4275871 0.6968296 0.6220650 0.3368478
    
    subdiag(x, 1)
    #[1] 0.9401067 0.1162289 0.3476762 0.3368478
    
    lapply(1:4, subdiag, dist_obj = x)
    #[[1]]
    #[1] 0.9401067 0.1162289 0.3476762 0.3368478
    #
    #[[2]]
    #[1] 0.9095143 0.3884722 0.6220650
    #
    #[[3]]
    #[1] 0.5618382 0.6968296
    #
    #[[4]]
    #[1] 0.4275871
    

    Good performance for big to large "dist" matrix.

    ## mimic a "dist" object without actually calling function `dist`
    n <- 2000
    x <- structure(numeric(n * (n - 1) / 2), class = "dist", Size = n)
    
    library(bench)
    bench::mark("Psidom" = {mat = as.matrix(x); mat[row(mat) == col(mat) + 1]},
                "zheyuan" = subdiag(x, 1))
    ## A tibble: 2 x 14
    #  expression      min     mean  median      max `itr/sec` mem_alloc  n_gc n_itr
    #  <chr>      <bch:tm> <bch:tm> <bch:t> <bch:tm>     <dbl> <bch:byt> <dbl> <int>
    #1 Psidom        553ms    553ms   553ms 552.74ms      1.81   251.8MB     5     1
    #2 zheyuan       106µs    111µs   108µs   3.85ms   9045.      62.7KB     2  4519
    ## ... with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
    ##   time <list>, gc <list>
    

    subdiag is 5120 times faster (553ms / 108µs), and 4112 times more memory efficient (251.8MB / 62.7KB).

    0 讨论(0)
提交回复
热议问题