I have this R function to generate a matrix of all combinations of k numbers between 0 and n whose sum equals n. This is one of the bottlenecks of my program as it becomes extre
The following can be done with lapply.
ls1 <- list()
for(i in 1:k) {
ls1[[i]] <- 0:n
}
Try substituting this is and see if you get any speed up.
ls1 = lapply(1:k,function(x) 0:n)
I changed 'ls' to 'ls1' because ls() is an R function.
It is hard to tell if it will be useful unless you answer my question regarding your typical values for n
and k
(please do.) Here is a version using recursion that seems to be faster than josilber's using his benchmark test:
sum.comb3 <- function(n, k) {
stopifnot(k > 0L)
REC <- function(n, k) {
if (k == 1L) list(n) else
unlist(lapply(0:n, function(i)Map(c, i, REC(n - i, k - 1L))),
recursive = FALSE)
}
matrix(unlist(REC(n, k)), ncol = k, byrow = TRUE)
}
microbenchmark(sum.comb(3, 10), sum.comb2(3, 10), sum.comb3(3, 10))
# Unit: milliseconds
# expr min lq median uq max neval
# sum.comb2(3, 10) 39.55612 40.60798 41.91954 44.26756 70.44944 100
# sum.comb3(3, 10) 25.86008 27.74415 28.37080 29.65567 34.18620 100
Here's a different approach, which incrementally expands the set from size 1 to k, at each iteration pruning the combinations whose sums exceed n. This should result in speedups where you have a large k relative to n, because you won't need to compute anything close to the size of the power set.
sum.comb2 <- function(n, k) {
combos <- 0:n
sums <- 0:n
for (width in 2:k) {
combos <- apply(expand.grid(combos, 0:n), 1, paste, collapse=" ")
sums <- apply(expand.grid(sums, 0:n), 1, sum)
if (width == k) {
return(combos[sums == n])
} else {
combos <- combos[sums <= n]
sums <- sums[sums <= n]
}
}
}
# Simple test
sum.comb2(3, 2)
# [1] "3 0" "2 1" "1 2" "0 3"
Here's an example of the speedups with small n and large k:
library(microbenchmark)
microbenchmark(sum.comb2(1, 100))
# Unit: milliseconds
# expr min lq median uq max neval
# sum.comb2(1, 100) 149.0392 158.716 162.1919 174.0482 236.2095 100
This approach runs in under a second, while of course the approach with the power set would never get past the call to expand.grid
, since you'll end up with 2^100 rows in your resulting matrix.
Even in a less extreme case, with n=3 and k=10, we see a 20x speedup compared to function in the original post:
microbenchmark(sum.comb(3, 10), sum.comb2(3, 10))
# Unit: milliseconds
# expr min lq median uq max neval
# sum.comb(3, 10) 404.00895 439.94472 446.67452 461.24909 574.80426 100
# sum.comb2(3, 10) 23.27445 24.53771 25.60409 26.97439 65.59576 100
See the partitions
package partitcularly compositions()
and blockparts()
they'll be faster both as whole matrix generators and for iterative operation. Then if that still isn't fast enough see the wide variety of publications on composition and partition generation algorithms (loopless, gray code, and parallel) like Daniel Page's research.
library(partitions)
library(microbenchmark)
# rcpp_comps is an Rcpp implementation of compositions using loop
# free grey code, just for illustrative purposes.
# Just get the matrix
microbenchmark( compositions(3,10), compositions(10,3),
blockparts(rep(10,3),10), blockparts(rep(3,10),3),
rcpp_comps(10), times=10)
## Unit: microseconds
## expr min lq median uq max neval
## compositions(3, 10) 1967.4 2050.9 2097.1 2173 3189.6 10
## compositions(10, 3) 618.2 638.5 654.6 688 700.7 10
## blockparts(rep(10, 3), 10) 612.2 620.8 645.6 663 963.5 10
## blockparts(rep(3, 10), 3) 2057.2 2089.2 2176.0 2242 3116.4 10
## rcpp_comps(10) 359.9 360.7 367.6 378 404.2 10
What about something short like:
comb = function(n, k) {
all = combn(0:n, k)
sums = colSums(all)
all[, sums == n]
}
Then something like:
comb(5, 3)
which produces a matrix as you requested:
[,1] [,2]
[1,] 0 0
[2,] 1 2
[3,] 4 3
Thanks to @josilber and the original poster for pointing out that the OP required all the permutations with repetition rather than combinations. A similar approach for the permutations would look like:
perm = function(n, k) {
grid = matrix(rep(0:n, k), n + 1, k)
all = expand.grid(data.frame(grid))
sums = rowSums(all)
all[sums == n,]
}
Then something like:
perm(5, 3)
produces a matrix as you requested:
X1 X2 X3
6 5 0 0
11 4 1 0
16 3 2 0
21 2 3 0
26 1 4 0
31 0 5 0
...