all combinations of k numbers between 0 and n whose sum equals n, speed optimization

Happy的楠姐 2021-01-25 18:42

I have this R function to generate a matrix of all combinations of k numbers between 0 and n whose sum equals n. This is one of the bottlenecks of my program, as it becomes extremely slow for larger inputs.
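The original sum.comb isn't reproduced in this thread, but judging from the loop shown in the first answer and the later remark about expand.grid producing 2^100 rows, it was presumably along these lines (a reconstruction for reference, not the poster's exact code):

```r
# Hypothetical reconstruction of the original brute-force approach:
# materialize the full (n+1)^k grid, then filter rows summing to n.
sum.comb <- function(n, k) {
  ls1 <- list()
  for (i in 1:k) ls1[[i]] <- 0:n       # k copies of 0:n
  g <- expand.grid(ls1)                # (n+1)^k rows
  as.matrix(g[rowSums(g) == n, ])      # keep only rows summing to n
}

sum.comb(3, 2)
# four rows: (3,0), (2,1), (1,2), (0,3)
```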

5 Answers
  • 2021-01-25 19:15

    The following can be done with lapply.

    ls1 <- list()
    for(i in 1:k) {
      ls1[[i]] <- 0:n
    }
    

    Try substituting this in and see if you get any speedup.

    ls1 = lapply(1:k,function(x) 0:n)
    

    I changed 'ls' to 'ls1' because ls() is an R function.
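    As a quick sanity check (not part of the original answer), the two constructions build identical lists:

    ```r
    # Both approaches produce the same list of k copies of 0:n
    n <- 3; k <- 4

    ls1 <- list()
    for (i in 1:k) ls1[[i]] <- 0:n      # loop version

    ls2 <- lapply(1:k, function(x) 0:n) # lapply version

    identical(ls1, ls2)   # TRUE
    ```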

  • 2021-01-25 19:16

    It is hard to tell whether it will be useful unless you answer my question regarding your typical values of n and k (please do). Here is a version using recursion that seems to be faster than josilber's on his benchmark test:

    sum.comb3 <- function(n, k) {
    
       stopifnot(k > 0L)
    
       REC <- function(n, k) {
          if (k == 1L) list(n) else
          unlist(lapply(0:n, function(i)Map(c, i, REC(n - i, k - 1L))),
                 recursive = FALSE)
       }
    
       matrix(unlist(REC(n, k)), ncol = k, byrow = TRUE)
    }
    
    microbenchmark(sum.comb(3, 10), sum.comb2(3, 10), sum.comb3(3, 10))
    # Unit: milliseconds
    #              expr      min       lq   median       uq      max neval
    #  sum.comb2(3, 10) 39.55612 40.60798 41.91954 44.26756 70.44944   100
    #  sum.comb3(3, 10) 25.86008 27.74415 28.37080 29.65567 34.18620   100
    
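    A quick check on a small case (repeating the definition from above so the snippet runs on its own): the recursion fixes the first entry i and recurses on the remaining sum n - i with one fewer slot.

    ```r
    # sum.comb3 as defined in this answer, repeated for a standalone check
    sum.comb3 <- function(n, k) {
      stopifnot(k > 0L)
      REC <- function(n, k) {
        if (k == 1L) list(n) else
          unlist(lapply(0:n, function(i) Map(c, i, REC(n - i, k - 1L))),
                 recursive = FALSE)
      }
      matrix(unlist(REC(n, k)), ncol = k, byrow = TRUE)
    }

    sum.comb3(3, 2)
    # rows: (0,3), (1,2), (2,1), (3,0) -- every row sums to 3
    ```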
  • 2021-01-25 19:21

    Here's a different approach, which incrementally expands the set from size 1 to k, at each iteration pruning the combinations whose sums exceed n. This should result in speedups where you have a large k relative to n, because you won't need to compute anything close to the size of the power set.

    sum.comb2 <- function(n, k) {
      combos <- 0:n
      sums <- 0:n
      for (width in 2:k) {
        combos <- apply(expand.grid(combos, 0:n), 1, paste, collapse=" ")
        sums <- apply(expand.grid(sums, 0:n), 1, sum)
        if (width == k) {
          return(combos[sums == n])
        } else {
          combos <- combos[sums <= n]
          sums <- sums[sums <= n]
        }
      }
    }
    
    # Simple test
    sum.comb2(3, 2)
    # [1] "3 0" "2 1" "1 2" "0 3"
    
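    One caveat: sum.comb2 returns a character vector rather than the numeric matrix the question asked for. If a matrix is needed, the strings can be split back apart (a small follow-up sketch, not part of the original answer):

    ```r
    # The output of sum.comb2(3, 2) shown above, as character strings
    res <- c("3 0", "2 1", "1 2", "0 3")

    # Split each string on spaces and rebind the pieces into a matrix
    m <- do.call(rbind, lapply(strsplit(res, " "), as.integer))
    m
    # 4 rows x 2 columns, each row summing to 3
    ```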

    Here's an example of the speedups with small n and large k:

    library(microbenchmark)
    microbenchmark(sum.comb2(1, 100))
    # Unit: milliseconds
    #               expr      min      lq   median       uq      max neval
    #  sum.comb2(1, 100) 149.0392 158.716 162.1919 174.0482 236.2095   100
    

    This approach runs in under a second, while of course the approach with the power set would never get past the call to expand.grid, since you'll end up with 2^100 rows in your resulting matrix.

    Even in a less extreme case, with n=3 and k=10, we see a 20x speedup compared to the function in the original post:

    microbenchmark(sum.comb(3, 10), sum.comb2(3, 10))
    # Unit: milliseconds
    #              expr       min        lq    median        uq       max neval
    #   sum.comb(3, 10) 404.00895 439.94472 446.67452 461.24909 574.80426   100
    #  sum.comb2(3, 10)  23.27445  24.53771  25.60409  26.97439  65.59576   100
    
  • 2021-01-25 19:26

    See the partitions package, particularly compositions() and blockparts(); they'll be faster both as whole-matrix generators and for iterative use. If that still isn't fast enough, see the wide variety of publications on composition and partition generation algorithms (loopless, Gray-code, and parallel), such as Daniel Page's research.

    library(partitions)
    library(microbenchmark)
    
    # rcpp_comps is an Rcpp implementation of compositions using loop
    # free grey code, just for illustrative purposes.
    
    # Just get the matrix
    microbenchmark( compositions(3,10), compositions(10,3),
                    blockparts(rep(10,3),10), blockparts(rep(3,10),3),
                    rcpp_comps(10), times=10)
    ## Unit: microseconds
    ##                        expr    min     lq median   uq    max neval
    ##         compositions(3, 10) 1967.4 2050.9 2097.1 2173 3189.6    10
    ##         compositions(10, 3)  618.2  638.5  654.6  688  700.7    10
    ##  blockparts(rep(10, 3), 10)  612.2  620.8  645.6  663  963.5    10
    ##   blockparts(rep(3, 10), 3) 2057.2 2089.2 2176.0 2242 3116.4    10
    ##              rcpp_comps(10)  359.9  360.7  367.6  378  404.2    10
    
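    For context on why dedicated composition generators scale so much better: the number of compositions of n into k nonnegative parts is choose(n + k - 1, k - 1), which is tiny compared to the (n + 1)^k rows a full expand.grid scan has to materialize:

    ```r
    # Count of compositions of n into k nonnegative parts vs. the
    # full grid the brute-force approach enumerates, for n = 3, k = 10
    n <- 3; k <- 10
    choose(n + k - 1, k - 1)   # 220 rows actually needed
    (n + 1)^k                  # 1048576 rows in the full grid
    ```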
  • 2021-01-25 19:40

    What about something short like:

    comb = function(n, k) {
        all = combn(0:n, k)
        sums = colSums(all)
        all[, sums == n]
    }
    

    Then something like:

    comb(5, 3)
    

    which produces a matrix as you requested:

         [,1] [,2]
    [1,]    0    0
    [2,]    1    2
    [3,]    4    3
    

    Thanks to @josilber and the original poster for pointing out that the OP required all the permutations with repetition rather than combinations. A similar approach for the permutations would look like:

    perm = function(n, k) {
        grid = matrix(rep(0:n, k), n + 1, k)
        all = expand.grid(data.frame(grid))
        sums = rowSums(all)
        all[sums == n,]
    }
    

    Then something like:

    perm(5, 3)
    

    produces a matrix as you requested:

        X1 X2 X3
    6    5  0  0
    11   4  1  0
    16   3  2  0
    21   2  3  0
    26   1  4  0
    31   0  5  0
    ...
    