问题
I want to write an R code to generate all distinct permutations of a list with a repeated characters in an efficient way. For example,
x<-c(1,1,2,2,3,4);
library(combinat);
unique(permn(x))
works, but it is very inefficient and dose not work if the length of the vector x is a bit longer. Does anybody know how to generate the unique permutations of above sequence in an efficient way?
回答1:
Permutations are unwieldy beasts. The number of permutations you get when selecting r objects from a set of n is
When selecting all of them, meaning r = n, this reduces to
For a set of 6 values, this is only 720, which isn't so impressive, but just look how quickly these numbers explode as you increase the size of the set:
data.frame(n=1:12,P=factorial(1:12));
## n P
## 1 1 1
## 2 2 2
## 3 3 6
## 4 4 24
## 5 5 120
## 6 6 720
## 7 7 5040
## 8 8 40320
## 9 9 362880
## 10 10 3628800
## 11 11 39916800
## 12 12 479001600
I doubt you're interested in building a list with 479,001,600 components!
Now, in your exact problem, you actually have what's called a multiset, which means you're dealing with multiset permutations, which reduces the total number of permutations. You probably think this will save you from the unwieldy-ness. Well, think again!
The formula for multiset permutations is
Starting with your example vector x
, we have 180 (factorial(6)/factorial(2)^2
), which again sounds wieldy, but let's try adding pairs to your vector and see what happens:
data.frame(m=paste('2 singles and',2:7,'pairs'),P=factorial(2+2*2:7)/factorial(2)^(2:7));
## m P
## 1 2 singles and 2 pairs 180
## 2 2 singles and 3 pairs 5040
## 3 2 singles and 4 pairs 226800
## 4 2 singles and 5 pairs 14968800
## 5 2 singles and 6 pairs 1362160800
## 6 2 singles and 7 pairs 163459296000
I decided to stop at 7 pairs because beyond that R starts to do its scientific notation thing which is annoying to look at.
How about a couple of pairs and adding sets of 4? Let's try it.
data.frame(m=paste('2 pairs and',1:3,'quartets'),P=factorial(2*2+4*1:3)/(factorial(2)^2*factorial(4)^(1:3)));
## m P
## 1 2 pairs and 1 quartets 420
## 2 2 pairs and 2 quartets 207900
## 3 2 pairs and 3 quartets 378378000
This time I had to stop at 3 to avoid scientific notation.
The point I'm trying to make with all of the above is that you can't go much beyond your current vector x
and hope to generate all (multiset) permutations, using any algorithm; the numbers are just too big.
Despite all of the above, I did try to find a solution for your problem, which could be useful for a narrow range of multisets that are too large for combinat::permn()
, but that are not too large to be completely impossible for any computer system to handle. I came up with the following recursive function (plus wrapper function):
gpermuteImpl <- function(uf) do.call(rbind,lapply(1:nrow(uf),function(r) { u <- uf$u[r]; if (uf$f[r] == 1L) if (nrow(uf) == 1L) return(u) else uf <- uf[-r,] else uf$f[r] <- uf$f[r]-1L; return(cbind(u,gpermuteImpl(uf))); }));
gpermute <- function(x) unname(gpermuteImpl(data.frame(u=unique(x),f=tabulate(x))));
This actually generates a matrix rather than a list. There's a lot of looping and rbind()
ing and cbind()
ing here, so it may not be the most efficient design possible, but it makes the code rather concise, since the cbind()
ing automatically copies each selected value for all permutations that use that value in that position.
Here's a demo of it in action on your vector x
:
x <- c(1,1,2,2,3,4);
gpermute(x);
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 1 2 2 3 4
## [2,] 1 1 2 2 4 3
## [3,] 1 1 2 3 2 4
## [4,] 1 1 2 3 4 2
## [5,] 1 1 2 4 2 3
## [6,] 1 1 2 4 3 2
## [7,] 1 1 3 2 2 4
## [8,] 1 1 3 2 4 2
## [9,] 1 1 3 4 2 2
## [10,] 1 1 4 2 2 3
## [11,] 1 1 4 2 3 2
## [12,] 1 1 4 3 2 2
## [13,] 1 2 1 2 3 4
## [14,] 1 2 1 2 4 3
## [15,] 1 2 1 3 2 4
## [16,] 1 2 1 3 4 2
## [17,] 1 2 1 4 2 3
## [18,] 1 2 1 4 3 2
## [19,] 1 2 2 1 3 4
## [20,] 1 2 2 1 4 3
## [21,] 1 2 2 3 1 4
## [22,] 1 2 2 3 4 1
## [23,] 1 2 2 4 1 3
## [24,] 1 2 2 4 3 1
## [25,] 1 2 3 1 2 4
## [26,] 1 2 3 1 4 2
## [27,] 1 2 3 2 1 4
## [28,] 1 2 3 2 4 1
## [29,] 1 2 3 4 1 2
## [30,] 1 2 3 4 2 1
## [31,] 1 2 4 1 2 3
## [32,] 1 2 4 1 3 2
## [33,] 1 2 4 2 1 3
## [34,] 1 2 4 2 3 1
## [35,] 1 2 4 3 1 2
## [36,] 1 2 4 3 2 1
## [37,] 1 3 1 2 2 4
## [38,] 1 3 1 2 4 2
## [39,] 1 3 1 4 2 2
## [40,] 1 3 2 1 2 4
## [41,] 1 3 2 1 4 2
## [42,] 1 3 2 2 1 4
## [43,] 1 3 2 2 4 1
## [44,] 1 3 2 4 1 2
## [45,] 1 3 2 4 2 1
## [46,] 1 3 4 1 2 2
## [47,] 1 3 4 2 1 2
## [48,] 1 3 4 2 2 1
## [49,] 1 4 1 2 2 3
## [50,] 1 4 1 2 3 2
## [51,] 1 4 1 3 2 2
## [52,] 1 4 2 1 2 3
## [53,] 1 4 2 1 3 2
## [54,] 1 4 2 2 1 3
## [55,] 1 4 2 2 3 1
## [56,] 1 4 2 3 1 2
## [57,] 1 4 2 3 2 1
## [58,] 1 4 3 1 2 2
## [59,] 1 4 3 2 1 2
## [60,] 1 4 3 2 2 1
## [61,] 2 1 1 2 3 4
## [62,] 2 1 1 2 4 3
## [63,] 2 1 1 3 2 4
## [64,] 2 1 1 3 4 2
## [65,] 2 1 1 4 2 3
## [66,] 2 1 1 4 3 2
## [67,] 2 1 2 1 3 4
## [68,] 2 1 2 1 4 3
## [69,] 2 1 2 3 1 4
## [70,] 2 1 2 3 4 1
## [71,] 2 1 2 4 1 3
## [72,] 2 1 2 4 3 1
## [73,] 2 1 3 1 2 4
## [74,] 2 1 3 1 4 2
## [75,] 2 1 3 2 1 4
## [76,] 2 1 3 2 4 1
## [77,] 2 1 3 4 1 2
## [78,] 2 1 3 4 2 1
## [79,] 2 1 4 1 2 3
## [80,] 2 1 4 1 3 2
## [81,] 2 1 4 2 1 3
## [82,] 2 1 4 2 3 1
## [83,] 2 1 4 3 1 2
## [84,] 2 1 4 3 2 1
## [85,] 2 2 1 1 3 4
## [86,] 2 2 1 1 4 3
## [87,] 2 2 1 3 1 4
## [88,] 2 2 1 3 4 1
## [89,] 2 2 1 4 1 3
## [90,] 2 2 1 4 3 1
## [91,] 2 2 3 1 1 4
## [92,] 2 2 3 1 4 1
## [93,] 2 2 3 4 1 1
## [94,] 2 2 4 1 1 3
## [95,] 2 2 4 1 3 1
## [96,] 2 2 4 3 1 1
## [97,] 2 3 1 1 2 4
## [98,] 2 3 1 1 4 2
## [99,] 2 3 1 2 1 4
## [100,] 2 3 1 2 4 1
## [101,] 2 3 1 4 1 2
## [102,] 2 3 1 4 2 1
## [103,] 2 3 2 1 1 4
## [104,] 2 3 2 1 4 1
## [105,] 2 3 2 4 1 1
## [106,] 2 3 4 1 1 2
## [107,] 2 3 4 1 2 1
## [108,] 2 3 4 2 1 1
## [109,] 2 4 1 1 2 3
## [110,] 2 4 1 1 3 2
## [111,] 2 4 1 2 1 3
## [112,] 2 4 1 2 3 1
## [113,] 2 4 1 3 1 2
## [114,] 2 4 1 3 2 1
## [115,] 2 4 2 1 1 3
## [116,] 2 4 2 1 3 1
## [117,] 2 4 2 3 1 1
## [118,] 2 4 3 1 1 2
## [119,] 2 4 3 1 2 1
## [120,] 2 4 3 2 1 1
## [121,] 3 1 1 2 2 4
## [122,] 3 1 1 2 4 2
## [123,] 3 1 1 4 2 2
## [124,] 3 1 2 1 2 4
## [125,] 3 1 2 1 4 2
## [126,] 3 1 2 2 1 4
## [127,] 3 1 2 2 4 1
## [128,] 3 1 2 4 1 2
## [129,] 3 1 2 4 2 1
## [130,] 3 1 4 1 2 2
## [131,] 3 1 4 2 1 2
## [132,] 3 1 4 2 2 1
## [133,] 3 2 1 1 2 4
## [134,] 3 2 1 1 4 2
## [135,] 3 2 1 2 1 4
## [136,] 3 2 1 2 4 1
## [137,] 3 2 1 4 1 2
## [138,] 3 2 1 4 2 1
## [139,] 3 2 2 1 1 4
## [140,] 3 2 2 1 4 1
## [141,] 3 2 2 4 1 1
## [142,] 3 2 4 1 1 2
## [143,] 3 2 4 1 2 1
## [144,] 3 2 4 2 1 1
## [145,] 3 4 1 1 2 2
## [146,] 3 4 1 2 1 2
## [147,] 3 4 1 2 2 1
## [148,] 3 4 2 1 1 2
## [149,] 3 4 2 1 2 1
## [150,] 3 4 2 2 1 1
## [151,] 4 1 1 2 2 3
## [152,] 4 1 1 2 3 2
## [153,] 4 1 1 3 2 2
## [154,] 4 1 2 1 2 3
## [155,] 4 1 2 1 3 2
## [156,] 4 1 2 2 1 3
## [157,] 4 1 2 2 3 1
## [158,] 4 1 2 3 1 2
## [159,] 4 1 2 3 2 1
## [160,] 4 1 3 1 2 2
## [161,] 4 1 3 2 1 2
## [162,] 4 1 3 2 2 1
## [163,] 4 2 1 1 2 3
## [164,] 4 2 1 1 3 2
## [165,] 4 2 1 2 1 3
## [166,] 4 2 1 2 3 1
## [167,] 4 2 1 3 1 2
## [168,] 4 2 1 3 2 1
## [169,] 4 2 2 1 1 3
## [170,] 4 2 2 1 3 1
## [171,] 4 2 2 3 1 1
## [172,] 4 2 3 1 1 2
## [173,] 4 2 3 1 2 1
## [174,] 4 2 3 2 1 1
## [175,] 4 3 1 1 2 2
## [176,] 4 3 1 2 1 2
## [177,] 4 3 1 2 2 1
## [178,] 4 3 2 1 1 2
## [179,] 4 3 2 1 2 1
## [180,] 4 3 2 2 1 1
We can prove that the result is identical to the result returned by unique(permn(x))
with the following code, which unfortunately has to be slightly involved, because (1) we have a list vs. matrix type mismatch, and (2) the permutation order happens to be different between the two solutions.
library('combinat');
mcombinat <- do.call(rbind,unique(permn(x)));
mcombinat.sorted <- mcombinat[do.call(order,lapply(1:ncol(mcombinat),function(c) mcombinat[,c])),];
mbgoldst <- gpermute(x);
identical(mcombinat.sorted,mbgoldst);
## [1] TRUE
Finally, let me demonstrate the code on the 2 pairs and 2 quartets input:
x2 <- c(1,1,1,1,2,2,2,2,3,3,4,4);
system.time({ m <- gpermute(x2); });
## user system elapsed
## 36.547 0.000 36.593
head(m);
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1 1 1 1 2 2 2 2 3 3 4 4
## [2,] 1 1 1 1 2 2 2 2 3 4 3 4
## [3,] 1 1 1 1 2 2 2 2 3 4 4 3
## [4,] 1 1 1 1 2 2 2 2 4 3 3 4
## [5,] 1 1 1 1 2 2 2 2 4 3 4 3
## [6,] 1 1 1 1 2 2 2 2 4 4 3 3
nrow(m);
## [1] 207900
So it took some time, but it got done. I tried running unique(permn(x2))
, but it didn't finish after tens of minutes, and I think we can probably assume it would never finish, since it would have to generate 479,001,600 non-unique permutations, which unique()
would then have the task of making unique. A quick calculation also suggests it would require about 46GB of RAM, which almost doubles the 24GB of RAM I have available on my system. That could also be a problem...
回答2:
Use the RccpAlgos
package.
> library(RcppAlgos)
> x <- permuteGeneral(c(1,2,3,4), freqs = c(2,2,1,1))
> dim(x)
[1] 180 6
> head(x)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 2 3 4
[2,] 1 1 2 2 4 3
[3,] 1 1 2 3 2 4
[4,] 1 1 2 3 4 2
[5,] 1 1 2 4 2 3
[6,] 1 1 2 4 3 2
回答3:
I have no idea if this is any faster, the bottleneck might be the high dimensionality of your problem as x
gets any longer:
library(gtools)
x=c(1,1,2,2,3,4)
order = permutations(n=length(x), r=length(x))
x = matrix(x[order], ncol=6)
x = x[!duplicated(x), ]
来源:https://stackoverflow.com/questions/30963085/distinct-permutations-of-a-list-with-repetitions