问题
In this thread, I am trying to include all the commonly asked questions and their answers here. I hope this will be useful for someone.
General question: How to generate sequences of r
objects from n
objects?
- combination vs permutation.
- with replacement vs without replacement.
- distinct items vs non-distinct items (multisets).
There are in total 2^3=8
questions of this type.
[Update]
Josh O'Brien suggests that the 8 questions are related to twelvefold way. Indeed, the "distinct" questions are included in twelvefold way, while the "non-distinct" questions are not included. Anyway, it is interesting to compare the 8 questions here with twelvefold way. See the comments for further readings.
回答1:
EDIT: I have updated the answer to use a more efficient package arrangements
Getting start of using arrangement
arrangements contains some efficient generators and iterators for permutations and combinations. It has been demonstrated that arrangements
outperforms most of the existing packages of similar kind. Some benchmarks could be found here.
Here are the answers to the above questions
# 1) combinations: without replacement: distinct items
combinations(5, 2)
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 3
[6,] 2 4
[7,] 2 5
[8,] 3 4
[9,] 3 5
[10,] 4 5
# 2) combinations: with replacement: distinct items
combinations(5, 2, replace=TRUE)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 1 5
[6,] 2 2
[7,] 2 3
[8,] 2 4
[9,] 2 5
[10,] 3 3
[11,] 3 4
[12,] 3 5
[13,] 4 4
[14,] 4 5
[15,] 5 5
# 3) combinations: without replacement: non distinct items
combinations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "c"
# 4) combinations: with replacement: non distinct items
combinations(x = c("a", "b", "c"), k = 2, replace = TRUE) # as `freq` does not matter
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "b"
[5,] "b" "c"
[6,] "c" "c"
# 5) permutations: without replacement: distinct items
permutations(5, 2)
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 1
[6,] 2 3
[7,] 2 4
[8,] 2 5
[9,] 3 1
[10,] 3 2
[11,] 3 4
[12,] 3 5
[13,] 4 1
[14,] 4 2
[15,] 4 3
[16,] 4 5
[17,] 5 1
[18,] 5 2
[19,] 5 3
[20,] 5 4
# 6) permutations: with replacement: distinct items
permutations(5, 2, replace = TRUE)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 1 5
[6,] 2 1
[7,] 2 2
[8,] 2 3
[9,] 2 4
[10,] 2 5
[11,] 3 1
[12,] 3 2
[13,] 3 3
[14,] 3 4
[15,] 3 5
[16,] 4 1
[17,] 4 2
[18,] 4 3
[19,] 4 4
[20,] 4 5
[21,] 5 1
[22,] 5 2
[23,] 5 3
[24,] 5 4
[25,] 5 5
# 7) permutations: without replacement: non distinct items
permutations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "a"
[5,] "b" "c"
[6,] "c" "a"
[7,] "c" "b"
# 8) permutations: with replacement: non distinct items
permutations(x = c("a", "b", "c"), k = 2, replace = TRUE) # as `freq` doesn't matter
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "a"
[5,] "b" "b"
[6,] "b" "c"
[7,] "c" "a"
[8,] "c" "b"
[9,] "c" "c"
Compare to other packages
There are few advantages of using arrangements
over the existing packages.
Integral framework: you don't have to use different packages for different methods.
It is very efficient. See https://randy3k.github.io/arrangements/articles/benchmark.html for some benchmarks.
It is memory efficient, it is able to generate all 13! permutation of 1 to 13, existing packages will fail to do so because of the limitation of matrix size. The
getnext()
method of the iterators allow users to get the arrangements one by one.The generated arrangements are in dictionary order which may be desired for some users.
回答2:
A Walk Through a Slice of Combinatorics in R*
Below, we examine packages equipped with the capabilities of generating combinations & permutations. If I have left out any package, please forgive me and please leave a comment or better yet, edit this post.
Outline of analysis:
- Introduction
- Combinations
- Permutations
- Multisets
- Summary
- Memory
Before we begin, we note that combinations/permutations with replacement of distinct vs. non-distint items chosen m at a time are equivalent. This is so, because when we have replacement, it is not specific. Thus, no matter how many times a particular element originally occurs, the output will have an instance(s) of that element repeated 1 to m times.
1. Introduction
gtools
v 3.8.1combinat
v 0.0-8multicool
v 0.1-10partitions
v 1.9-19RcppAlgos
v 2.0.1 (I am the author)arrangements
v 1.1.0gRbase
v 1.8-3
I did not include permute
, permutations
, or gRbase::aperm/ar_perm
as they are not really meant to attack these types of problems.
|--------------------------------------- OVERVIEW ----------------------------------------|
|_______________| gtools | combinat | multicool | partitions |
| comb rep | Yes | | | |
| comb NO rep | Yes | Yes | | |
| perm rep | Yes | | | |
| perm NO rep | Yes | Yes | Yes | Yes |
| perm multiset | | | Yes | |
| comb multiset | | | | |
|accepts factors| | Yes | | |
| m at a time | Yes | Yes/No | | |
|general vector | Yes | Yes | Yes | |
| iterable | | | Yes | |
|parallelizable | | | | |
| big integer | | | | |
|_______________| iterpc | arrangements | RcppAlgos | gRbase |
| comb rep | Yes | Yes | Yes | |
| comb NO rep | Yes | Yes | Yes | Yes |
| perm rep | Yes | Yes | Yes | |
| perm NO rep | Yes | Yes | Yes | * |
| perm multiset | Yes | Yes | Yes | |
| comb multiset | Yes | Yes | Yes | |
|accepts factors| | Yes | Yes | |
| m at a time | Yes | Yes | Yes | Yes |
|general vector | Yes | Yes | Yes | Yes |
| iterable | | Yes | Partially | |
|parallelizable | | Yes | Yes | |
| big integer | | Yes | | |
The tasks, m at a time
and general vector
, refer to the capability of generating results "m at a time" (when m is less than the length of the vector) and rearranging a "general vector" as opposed to 1:n
. In practice, we are generally concerned with finding rearrangements of a general vector, therefore all examinations below will reflect this (when possible).
All benchmarks were ran on 3 different set-ups.
- Macbook Pro i7 16Gb
- Macbook Air i5 4Gb
- Lenovo Running Windows 7 i5 8Gb
The listed results were obtained from setup #1 (i.e. MBPro). The results for the other two systems were similar. Also, gc()
is periodically called to ensure all memory is available (See ?gc
).
2. Combinations
First, we examine combinations without replacement chosen m at a time.
RcppAlgos
combinat
(orutils
)gtools
arrangements
gRbase
How to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(13)
testVector1 <- sort(sample(100, 17))
m <- 9
t1 <- comboGeneral(testVector1, m) ## returns matrix with m columns
t3 <- combinat::combn(testVector1, m) ## returns matrix with m rows
t4 <- gtools::combinations(17, m, testVector1) ## returns matrix with m columns
identical(t(t3), t4) ## must transpose to compare
#> [1] TRUE
t5 <- combinations(testVector1, m)
identical(t1, t5)
#> [1] TRUE
t6 <- gRbase::combnPrim(testVector1, m)
identical(t(t6)[do.call(order, as.data.frame(t(t6))),], t1)
#> [1] TRUE
Benchmark:
microbenchmark(cbRcppAlgos = comboGeneral(testVector1, m),
cbGRbase = gRbase::combnPrim(testVector1, m),
cbGtools = gtools::combinations(17, m, testVector1),
cbCombinat = combinat::combn(testVector1, m),
cbArrangements = combinations(17, m, testVector1),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.064 1.079 1.160 1.012 1.086 2.318 100
#> cbGRbase 7.335 7.509 5.728 6.807 5.390 1.608 100
#> cbGtools 426.536 408.807 240.101 310.848 187.034 63.663 100
#> cbCombinat 97.756 97.586 60.406 75.415 46.391 41.089 100
#> cbArrangements 1.000 1.000 1.000 1.000 1.000 1.000 100
Now, we examine combinations with replacement chosen m at a time.
RcppAlgos
gtools
arrangements
How to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(97)
testVector2 <- sort(rnorm(10))
m <- 8
t1 <- comboGeneral(testVector2, m, repetition = TRUE)
t3 <- gtools::combinations(10, m, testVector2, repeats.allowed = TRUE)
identical(t1, t3)
#> [1] TRUE
## arrangements
t4 <- combinations(testVector2, m, replace = TRUE)
identical(t1, t4)
#> [1] TRUE
Benchmark:
microbenchmark(cbRcppAlgos = comboGeneral(testVector2, m, TRUE),
cbGtools = gtools::combinations(10, m, testVector2, repeats.allowed = TRUE),
cbArrangements = combinations(testVector2, m, replace = TRUE),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.000 1.000 1.000 1.000 1.000 1.00000 100
#> cbGtools 384.990 269.683 80.027 112.170 102.432 3.67517 100
#> cbArrangements 1.057 1.116 0.618 1.052 1.002 0.03638 100
3. Permutations
First, we examine permutations without replacement chosen m at a time.
RcppAlgos
gtools
arrangements
How to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(101)
testVector3 <- as.integer(c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
## RcppAlgos... permuteGeneral same as comboGeneral above
t1 <- permuteGeneral(testVector3, 6)
## gtools... permutations same as combinations above
t3 <- gtools::permutations(10, 6, testVector3)
identical(t1, t3)
#> [1] TRUE
## arrangements
t4 <- permutations(testVector3, 6)
identical(t1, t4)
#> [1] TRUE
Benchmark:
microbenchmark(cbRcppAlgos = permuteGeneral(testVector3, 6),
cbGtools = gtools::permutations(10, 6, testVector3),
cbArrangements = permutations(testVector3, 6),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.079 1.027 1.106 1.037 1.003 5.37 100
#> cbGtools 158.720 92.261 85.160 91.856 80.872 45.39 100
#> cbArrangements 1.000 1.000 1.000 1.000 1.000 1.00 100
Next, we examine permutations without replacement with a general vector (returning all permutations).
RcppAlgos
gtools
combinat
multicool
arrangements
How to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(89)
testVector3 <- as.integer(c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
testVector3Prime <- testVector3[1:7]
## For RcppAlgos, & gtools (see above)
## combinat
t4 <- combinat::permn(testVector3Prime) ## returns a list of vectors
## convert to a matrix
t4 <- do.call(rbind, t4)
## multicool.. we must first call initMC
t5 <- multicool::allPerm(multicool::initMC(testVector3Prime)) ## returns a matrix with n columns
all.equal(t4[do.call(order,as.data.frame(t4)),],
t5[do.call(order,as.data.frame(t5)),])
#> [1] TRUE
Benchmark:
microbenchmark(cbRcppAlgos = permuteGeneral(testVector3Prime, 7),
cbGtools = gtools::permutations(7, 7, testVector3Prime),
cbCombinat = combinat::permn(testVector3Prime),
cbMulticool = multicool::allPerm(multicool::initMC(testVector3Prime)),
cbArrangements = permutations(x = testVector3Prime, k = 7),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.152 1.275 0.7508 1.348 1.342 0.3159 100
#> cbGtools 965.465 817.645 340.4159 818.137 661.068 12.7042 100
#> cbCombinat 280.207 236.853 104.4777 238.228 208.467 9.6550 100
#> cbMulticool 2573.001 2109.246 851.3575 2039.531 1638.500 28.3597 100
#> cbArrangements 1.000 1.000 1.0000 1.000 1.000 1.0000 100
Now, we examine permutations without replacement for 1:n
(returning all permutations).
RcppAlgos
gtools
combinat
multicool
partitions
arrangements
How to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(89)
t1 <- partitions::perms(7) ## returns an object of type 'partition' with n rows
identical(t(as.matrix(t1)), permutations(7,7))
#> [1] TRUE
Benchmark:
microbenchmark(cbRcppAlgos = permuteGeneral(7, 7),
cbGtools = gtools::permutations(7, 7),
cbCombinat = combinat::permn(7),
cbMulticool = multicool::allPerm(multicool::initMC(1:7)),
cbPartitions = partitions::perms(7),
cbArrangements = permutations(7, 7),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max
#> cbRcppAlgos 1.235 1.429 1.412 1.503 1.484 1.720
#> cbGtools 1152.826 1000.736 812.620 939.565 793.373 499.029
#> cbCombinat 347.446 304.866 260.294 296.521 248.343 284.001
#> cbMulticool 3001.517 2416.716 1903.903 2237.362 1811.006 1311.219
#> cbPartitions 2.469 2.536 2.801 2.692 2.999 2.472
#> cbArrangements 1.000 1.000 1.000 1.000 1.000 1.000
#> neval
#> 100
#> 100
#> 100
#> 100
#> 100
#> 100
Lastly, we examine permutations with replacement.
RcppAlgos
iterpc
gtools
arrangements
How to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(34)
testVector3 <- as.integer(c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
t1 <- permuteGeneral(testVector3, 5, repetition = TRUE)
t3 <- gtools::permutations(10, 5, testVector3, repeats.allowed = TRUE)
t4 <- permutations(x = testVector3, k = 5, replace = TRUE)
This next benchmark is a little surprising given the results up until now.
microbenchmark(cbRcppAlgos = permuteGeneral(testVector3, 5, TRUE),
cbGtools = gtools::permutations(10, 5, testVector3, repeats.allowed = TRUE),
cbArrangements = permutations(x = testVector3, k = 5, replace = TRUE),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.106 0.9183 1.200 1.030 1.063 1.701 100
#> cbGtools 2.426 2.1815 2.068 1.996 2.127 1.367 100
#> cbArrangements 1.000 1.0000 1.000 1.000 1.000 1.000 100
That is not a typo... gtools::permutations
is almost as fast as the other compiled functions. I encourage the reader to go check out the source code of gtools::permutations
as it is one of the most elegant displays of programming out there (R
or otherwise).
4. Multisets
First, we examine combinations of multisets.
RcppAlgos
arrangements
To find combinations/permutations of multisets, with RcppAlgos
use the freqs
arguments to specify how many times each element of the source vector, v
, is repeated.
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(496)
myFreqs <- sample(1:5, 10, replace = TRUE)
## This is how many times each element will be repeated
myFreqs
#> [1] 2 4 4 5 3 2 2 2 3 4
testVector4 <- as.integer(c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89))
t1 <- comboGeneral(testVector4, 12, freqs = myFreqs)
t3 <- combinations(freq = myFreqs, k = 12, x = testVector4)
identical(t1, t3)
#> [1] TRUE
Benchmark:
microbenchmark(cbRcppAlgos = comboGeneral(testVector4, 12, freqs = myFreqs),
cbArrangements = combinations(freq = myFreqs, k = 12, x = testVector4),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.000 1.000 1.000 1.000 1.000 1.000 100
#> cbArrangements 1.254 1.221 1.287 1.259 1.413 1.173 100
For permutations of multisets chosen m at a time, we have:
RcppAlgos
arrangements
How to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(8128)
myFreqs <- sample(1:3, 5, replace = TRUE)
testVector5 <- sort(runif(5))
myFreqs
#> [1] 2 2 2 1 3
t1 <- permuteGeneral(testVector5, 7, freqs = myFreqs)
t3 <- permutations(freq = myFreqs, k = 7, x = testVector5)
identical(t1, t3)
#> [1] TRUE
Benchmark:
microbenchmark(cbRcppAlgos = permuteGeneral(testVector5, 7, freqs = myFreqs),
cbArrangements = permutations(freq = myFreqs, k = 7, x = testVector5),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.461 1.327 1.282 1.177 1.176 1.101 100
#> cbArrangements 1.000 1.000 1.000 1.000 1.000 1.000 100
For permutations of multisets returning all permutations, we have:
RcppAlgos
multicool
arrangements
How to:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(8128)
myFreqs2 <- c(2,1,2,1,2)
testVector6 <- (1:5)^3
## For multicool, you must have the elements explicitly repeated
testVector6Prime <- rep(testVector6, times = myFreqs2)
t3 <- multicool::allPerm(multicool::initMC(testVector6Prime))
## for comparison
t1 <- permuteGeneral(testVector6, freqs = myFreqs2)
identical(t1[do.call(order,as.data.frame(t1)),],
t3[do.call(order,as.data.frame(t3)),])
#> [1] TRUE
Benchmark:
microbenchmark(cbRcppAlgos = permuteGeneral(testVector6, freqs = myFreqs2),
cbMulticool = multicool::allPerm(multicool::initMC(testVector6Prime)),
cbArrangements = permutations(freq = myFreqs2, x = testVector6),
unit = "relative")
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgos 1.276 1.374 1.119 1.461 1.39 0.8856 100
#> cbMulticool 2434.652 2135.862 855.946 2026.256 1521.74 31.0651 100
#> cbArrangements 1.000 1.000 1.000 1.000 1.00 1.0000 100
5. Summary
Both gtools
and combinat
are well established packages for rearranging elements of a vector. With gtools
there are a few more options (see the overview above) and with combinat
, you can rearrange factors
. With multicool
, one is able to rearrange multisets. Although partitions
and gRbase
are limited for the purposes of this question, they are powerhouses packed with highly efficient functions for dealing with partitions and array objects respectively.
arrangements
- The output is in dictionary order.
- Allows the user to specify the format via the
layout
argument (r = row-major
,c = column-major
, andl = list
). - Offers convenient methods such as
collect
&getnext
when working with iterators. - Allows for the generation of more than
2^31 - 1
combinations/permutations viagetnext
. N.B.RcppAlgos
(vialower/upper
see below) andmulticool
(vianextPerm
) are also capable of doing this. - Speaking of
getnext
, this function, allows for a specific number of results by utilizing thed
argument. - Supports gmp's big integers to compute number of combinations/permutations.
Observe:
library(arrangements)
icomb <- icombinations(1000, 7)
icomb$getnext(d = 5)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 2 3 4 5 6 7
#> [2,] 1 2 3 4 5 6 8
#> [3,] 1 2 3 4 5 6 9
#> [4,] 1 2 3 4 5 6 10
#> [5,] 1 2 3 4 5 6 11
This feature is really nice when you only want a few combinations/permutations. With traditional methods, you would have to generate all combinations/permutations and then subset. This would render the previous example impossible as there are more than 10^17
results (i.e. ncombinations(1000, 7, bigz = TRUE)
= 194280608456793000).
This feature along with the improvements to the generators in arrangements
, allow it to be very efficient with respect to memory.
RcppAlgos
- The output is in dictionary order.
- There are convenient constraint features that we will not discuss here as they are off-topic for this question. I will only note that the types of problems that can be solved by utilizing these features were the motivation for creating this package.
- There is an argument
upper
(formallyrowCap
) that is analogous to thed
argument ofgetnext
.
Observe:
library(RcppAlgos)
comboGeneral(1000, 7, upper = 5)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 2 3 4 5 6 7
#> [2,] 1 2 3 4 5 6 8
#> [3,] 1 2 3 4 5 6 9
#> [4,] 1 2 3 4 5 6 10
#> [5,] 1 2 3 4 5 6 11
- Additionally, as of
2.0.0
, there is an argument calledlower
that allows one to start generation at a specific combination/permutation. This sets up nicely for parallelization and allows for fast generation beyond2^31 - 1
as chunks are generated independently.
Parallel example with more than 6 billion combinations:
system.time(parallel::mclapply(seq(1,6397478649,4390857), function(x) {
a <- comboGeneral(25, 15, freqs = c(rep(1:5, 5)), lower = x, upper = x + 4390856)
## do something
x
}, mc.cores = 7))
#> user system elapsed
#> 510.623 140.970 109.496
In case you were wondering how each package scales, I will leave you with this final example that measures how fast each package can generate over 100 million results (N.B. gtools::combinations
is left out here as it will throw the error: evaluation nested too deeply...
). Also, we are explicitly calling combn
from the utils
package because I was unable to get a successful run from combinat::combn
. The differences in memory usage between these two is quite bizarre given that they are only marginally different (see ?utils::combn
under the "Authors" section).
Observe:
library(RcppAlgos)
library(arrangements)
library(microbenchmark)
options(digits = 4)
set.seed(2187)
testVector7 <- sort(sample(10^7, 10^3))
system.time(utils::combn(testVector7, 3))
#> user system elapsed
#> 179.956 5.687 187.159
system.time(RcppAlgos::comboGeneral(testVector7, 3))
#> user system elapsed
#> 1.136 0.758 1.937
system.time(arrangements::combinations(x = testVector7, k = 3))
#> user system elapsed
#> 1.963 0.930 2.910
system.time(RcppAlgos::permuteGeneral(testVector7[1:500], 3))
#> user system elapsed
#> 1.095 0.631 1.738
system.time(arrangements::permutations(x = testVector7[1:500], k = 3))
#> user system elapsed
#> 1.399 0.584 1.993
6. Memory
When executing comboGeneral
as well as arrangements::combinations
, the memory will jump almost 2 Gbs before calling gc
. This seems about right as #rows * #nols * bytesPerCell / 2^30 bytes = choose(1000,3) * 3 * 4 / 2^30 bytes = (166167000 * 3 * 4)/2^30 = 1.857 Gbs
). However, when executing combn
, the memory behavior was eratic (e.g. sometimes it would use all 16 Gb of memory and other times it would only spike a couple of Gbs). When I tested this on the Windows set-up, it would often crash.
We can confirm this using Rprof
along with summaryRporf
. Observe:
Rprof("RcppAlgos.out", memory.profiling = TRUE)
t1 <- RcppAlgos::comboGeneral(testVector7, 3)
Rprof(NULL)
summaryRprof("RcppAlgos.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"CombinatoricsRcpp" 1.2 100 1901.6 1.2 100
"RcppAlgos::comboGeneral" 1.2 100 1901.6 0.0 0
Rprof("arrangements.out", memory.profiling = TRUE)
t3 <- arrangements::combinations(10^3, 3, testVector7)
Rprof(NULL)
summaryRprof("arrangements.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
".Call" 2.08 99.05 1901.6 2.08 99.05
With RcppAlgos
& arrangements
, mem.total
registers just over 1900 Mb
.
And here is the memory profile on a smaller vector comparing gtools
, utils
, and combinat
.
testVector7Prime <- testVector7[1:300]
Rprof("combinat.out", memory.profiling = TRUE)
t3 <- combinat::combn(testVector7Prime, 3)
Rprof(NULL)
summaryRprof("combinat.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"combinat::combn" 3.98 100.00 1226.9 3.72 93.47
Rprof("utils.out", memory.profiling = TRUE)
t4 <- utils::combn(testVector7Prime, 3)
Rprof(NULL)
summaryRprof("utils.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"utils::combn" 2.52 100.00 1952.7 2.50 99.21
Rprof("gtools.out", memory.profiling = TRUE)
t5 <- gtools::combinations(300, 3, testVector7Prime)
Rprof(NULL)
summaryRprof("gtools.out", memory = "both")$by.total
total.time total.pct mem.total self.time self.pct
"rbind" 4.94 95.00 6741.6 4.40 84.62
Interestingly, utils::combn
and combinat::combn
use different amounts of memory and take different amounts of time to execute. This does not hold up with smaller vectors:
microbenchmark(combinat::combn(2:13, 6), utils::combn(2:13, 6))
Unit: microseconds
expr min lq mean median uq max neval
combinat::combn(2:13, 6) 527.378 567.946 629.1268 577.163 604.3270 1816.744 100
utils::combn(2:13, 6) 663.150 712.872 750.8008 725.716 771.1345 1205.697 100
And with gtools
the total memory used is a little over 3x as much as utils
. It should be noted that for these 3 packages, I obtained different results every-time I ran them (e.g. for combinat::combn
sometimes I would get 9000 Mb and then I would get 13000 Mb).
Still, none can match RcppAlgos
OR arrangements
. Both only use 51 Mb when ran on the example above.
benchmark script: https://gist.github.com/randy3k/bd5730a6d70101c7471f4ae6f453862e (rendered by https://github.com/tidyverse/reprex)
*: An homage to A Walk through Combinatorics by Miklós Bóna
来源:https://stackoverflow.com/questions/22569176/r-permutations-and-combinations-with-without-replacement-and-for-distinct-non-d