distinct permutations of a list with repetitions

我的梦境 提交于 2021-01-27 19:26:26

问题


I want to write an R code to generate all distinct permutations of a list with a repeated characters in an efficient way. For example,

x<-c(1,1,2,2,3,4);    
library(combinat);
unique(permn(x))    

works, but it is very inefficient and dose not work if the length of the vector x is a bit longer. Does anybody know how to generate the unique permutations of above sequence in an efficient way?


回答1:


Permutations are unwieldy beasts. The number of permutations you get when selecting r objects from a set of n is

perm

When selecting all of them, meaning r = n, this reduces to

permall

For a set of 6 values, this is only 720, which isn't so impressive, but just look how quickly these numbers explode as you increase the size of the set:

data.frame(n=1:12,P=factorial(1:12));
##     n         P
## 1   1         1
## 2   2         2
## 3   3         6
## 4   4        24
## 5   5       120
## 6   6       720
## 7   7      5040
## 8   8     40320
## 9   9    362880
## 10 10   3628800
## 11 11  39916800
## 12 12 479001600

I doubt you're interested in building a list with 479,001,600 components!


Now, in your exact problem, you actually have what's called a multiset, which means you're dealing with multiset permutations, which reduces the total number of permutations. You probably think this will save you from the unwieldy-ness. Well, think again!

The formula for multiset permutations is

permmulti

Starting with your example vector x, we have 180 (factorial(6)/factorial(2)^2), which again sounds wieldy, but let's try adding pairs to your vector and see what happens:

data.frame(m=paste('2 singles and',2:7,'pairs'),P=factorial(2+2*2:7)/factorial(2)^(2:7));
##                       m            P
## 1 2 singles and 2 pairs          180
## 2 2 singles and 3 pairs         5040
## 3 2 singles and 4 pairs       226800
## 4 2 singles and 5 pairs     14968800
## 5 2 singles and 6 pairs   1362160800
## 6 2 singles and 7 pairs 163459296000

I decided to stop at 7 pairs because beyond that R starts to do its scientific notation thing which is annoying to look at.

How about a couple of pairs and adding sets of 4? Let's try it.

data.frame(m=paste('2 pairs and',1:3,'quartets'),P=factorial(2*2+4*1:3)/(factorial(2)^2*factorial(4)^(1:3)));
##                        m         P
## 1 2 pairs and 1 quartets       420
## 2 2 pairs and 2 quartets    207900
## 3 2 pairs and 3 quartets 378378000

This time I had to stop at 3 to avoid scientific notation.

The point I'm trying to make with all of the above is that you can't go much beyond your current vector x and hope to generate all (multiset) permutations, using any algorithm; the numbers are just too big.


Despite all of the above, I did try to find a solution for your problem, which could be useful for a narrow range of multisets that are too large for combinat::permn(), but that are not too large to be completely impossible for any computer system to handle. I came up with the following recursive function (plus wrapper function):

gpermuteImpl <- function(uf) do.call(rbind,lapply(1:nrow(uf),function(r) { u <- uf$u[r]; if (uf$f[r] == 1L) if (nrow(uf) == 1L) return(u) else uf <- uf[-r,] else uf$f[r] <- uf$f[r]-1L; return(cbind(u,gpermuteImpl(uf))); }));
gpermute <- function(x) unname(gpermuteImpl(data.frame(u=unique(x),f=tabulate(x))));      

This actually generates a matrix rather than a list. There's a lot of looping and rbind()ing and cbind()ing here, so it may not be the most efficient design possible, but it makes the code rather concise, since the cbind()ing automatically copies each selected value for all permutations that use that value in that position.

Here's a demo of it in action on your vector x:

x <- c(1,1,2,2,3,4);
gpermute(x);
##        [,1] [,2] [,3] [,4] [,5] [,6]
##   [1,]    1    1    2    2    3    4
##   [2,]    1    1    2    2    4    3
##   [3,]    1    1    2    3    2    4
##   [4,]    1    1    2    3    4    2
##   [5,]    1    1    2    4    2    3
##   [6,]    1    1    2    4    3    2
##   [7,]    1    1    3    2    2    4
##   [8,]    1    1    3    2    4    2
##   [9,]    1    1    3    4    2    2
##  [10,]    1    1    4    2    2    3
##  [11,]    1    1    4    2    3    2
##  [12,]    1    1    4    3    2    2
##  [13,]    1    2    1    2    3    4
##  [14,]    1    2    1    2    4    3
##  [15,]    1    2    1    3    2    4
##  [16,]    1    2    1    3    4    2
##  [17,]    1    2    1    4    2    3
##  [18,]    1    2    1    4    3    2
##  [19,]    1    2    2    1    3    4
##  [20,]    1    2    2    1    4    3
##  [21,]    1    2    2    3    1    4
##  [22,]    1    2    2    3    4    1
##  [23,]    1    2    2    4    1    3
##  [24,]    1    2    2    4    3    1
##  [25,]    1    2    3    1    2    4
##  [26,]    1    2    3    1    4    2
##  [27,]    1    2    3    2    1    4
##  [28,]    1    2    3    2    4    1
##  [29,]    1    2    3    4    1    2
##  [30,]    1    2    3    4    2    1
##  [31,]    1    2    4    1    2    3
##  [32,]    1    2    4    1    3    2
##  [33,]    1    2    4    2    1    3
##  [34,]    1    2    4    2    3    1
##  [35,]    1    2    4    3    1    2
##  [36,]    1    2    4    3    2    1
##  [37,]    1    3    1    2    2    4
##  [38,]    1    3    1    2    4    2
##  [39,]    1    3    1    4    2    2
##  [40,]    1    3    2    1    2    4
##  [41,]    1    3    2    1    4    2
##  [42,]    1    3    2    2    1    4
##  [43,]    1    3    2    2    4    1
##  [44,]    1    3    2    4    1    2
##  [45,]    1    3    2    4    2    1
##  [46,]    1    3    4    1    2    2
##  [47,]    1    3    4    2    1    2
##  [48,]    1    3    4    2    2    1
##  [49,]    1    4    1    2    2    3
##  [50,]    1    4    1    2    3    2
##  [51,]    1    4    1    3    2    2
##  [52,]    1    4    2    1    2    3
##  [53,]    1    4    2    1    3    2
##  [54,]    1    4    2    2    1    3
##  [55,]    1    4    2    2    3    1
##  [56,]    1    4    2    3    1    2
##  [57,]    1    4    2    3    2    1
##  [58,]    1    4    3    1    2    2
##  [59,]    1    4    3    2    1    2
##  [60,]    1    4    3    2    2    1
##  [61,]    2    1    1    2    3    4
##  [62,]    2    1    1    2    4    3
##  [63,]    2    1    1    3    2    4
##  [64,]    2    1    1    3    4    2
##  [65,]    2    1    1    4    2    3
##  [66,]    2    1    1    4    3    2
##  [67,]    2    1    2    1    3    4
##  [68,]    2    1    2    1    4    3
##  [69,]    2    1    2    3    1    4
##  [70,]    2    1    2    3    4    1
##  [71,]    2    1    2    4    1    3
##  [72,]    2    1    2    4    3    1
##  [73,]    2    1    3    1    2    4
##  [74,]    2    1    3    1    4    2
##  [75,]    2    1    3    2    1    4
##  [76,]    2    1    3    2    4    1
##  [77,]    2    1    3    4    1    2
##  [78,]    2    1    3    4    2    1
##  [79,]    2    1    4    1    2    3
##  [80,]    2    1    4    1    3    2
##  [81,]    2    1    4    2    1    3
##  [82,]    2    1    4    2    3    1
##  [83,]    2    1    4    3    1    2
##  [84,]    2    1    4    3    2    1
##  [85,]    2    2    1    1    3    4
##  [86,]    2    2    1    1    4    3
##  [87,]    2    2    1    3    1    4
##  [88,]    2    2    1    3    4    1
##  [89,]    2    2    1    4    1    3
##  [90,]    2    2    1    4    3    1
##  [91,]    2    2    3    1    1    4
##  [92,]    2    2    3    1    4    1
##  [93,]    2    2    3    4    1    1
##  [94,]    2    2    4    1    1    3
##  [95,]    2    2    4    1    3    1
##  [96,]    2    2    4    3    1    1
##  [97,]    2    3    1    1    2    4
##  [98,]    2    3    1    1    4    2
##  [99,]    2    3    1    2    1    4
## [100,]    2    3    1    2    4    1
## [101,]    2    3    1    4    1    2
## [102,]    2    3    1    4    2    1
## [103,]    2    3    2    1    1    4
## [104,]    2    3    2    1    4    1
## [105,]    2    3    2    4    1    1
## [106,]    2    3    4    1    1    2
## [107,]    2    3    4    1    2    1
## [108,]    2    3    4    2    1    1
## [109,]    2    4    1    1    2    3
## [110,]    2    4    1    1    3    2
## [111,]    2    4    1    2    1    3
## [112,]    2    4    1    2    3    1
## [113,]    2    4    1    3    1    2
## [114,]    2    4    1    3    2    1
## [115,]    2    4    2    1    1    3
## [116,]    2    4    2    1    3    1
## [117,]    2    4    2    3    1    1
## [118,]    2    4    3    1    1    2
## [119,]    2    4    3    1    2    1
## [120,]    2    4    3    2    1    1
## [121,]    3    1    1    2    2    4
## [122,]    3    1    1    2    4    2
## [123,]    3    1    1    4    2    2
## [124,]    3    1    2    1    2    4
## [125,]    3    1    2    1    4    2
## [126,]    3    1    2    2    1    4
## [127,]    3    1    2    2    4    1
## [128,]    3    1    2    4    1    2
## [129,]    3    1    2    4    2    1
## [130,]    3    1    4    1    2    2
## [131,]    3    1    4    2    1    2
## [132,]    3    1    4    2    2    1
## [133,]    3    2    1    1    2    4
## [134,]    3    2    1    1    4    2
## [135,]    3    2    1    2    1    4
## [136,]    3    2    1    2    4    1
## [137,]    3    2    1    4    1    2
## [138,]    3    2    1    4    2    1
## [139,]    3    2    2    1    1    4
## [140,]    3    2    2    1    4    1
## [141,]    3    2    2    4    1    1
## [142,]    3    2    4    1    1    2
## [143,]    3    2    4    1    2    1
## [144,]    3    2    4    2    1    1
## [145,]    3    4    1    1    2    2
## [146,]    3    4    1    2    1    2
## [147,]    3    4    1    2    2    1
## [148,]    3    4    2    1    1    2
## [149,]    3    4    2    1    2    1
## [150,]    3    4    2    2    1    1
## [151,]    4    1    1    2    2    3
## [152,]    4    1    1    2    3    2
## [153,]    4    1    1    3    2    2
## [154,]    4    1    2    1    2    3
## [155,]    4    1    2    1    3    2
## [156,]    4    1    2    2    1    3
## [157,]    4    1    2    2    3    1
## [158,]    4    1    2    3    1    2
## [159,]    4    1    2    3    2    1
## [160,]    4    1    3    1    2    2
## [161,]    4    1    3    2    1    2
## [162,]    4    1    3    2    2    1
## [163,]    4    2    1    1    2    3
## [164,]    4    2    1    1    3    2
## [165,]    4    2    1    2    1    3
## [166,]    4    2    1    2    3    1
## [167,]    4    2    1    3    1    2
## [168,]    4    2    1    3    2    1
## [169,]    4    2    2    1    1    3
## [170,]    4    2    2    1    3    1
## [171,]    4    2    2    3    1    1
## [172,]    4    2    3    1    1    2
## [173,]    4    2    3    1    2    1
## [174,]    4    2    3    2    1    1
## [175,]    4    3    1    1    2    2
## [176,]    4    3    1    2    1    2
## [177,]    4    3    1    2    2    1
## [178,]    4    3    2    1    1    2
## [179,]    4    3    2    1    2    1
## [180,]    4    3    2    2    1    1

We can prove that the result is identical to the result returned by unique(permn(x)) with the following code, which unfortunately has to be slightly involved, because (1) we have a list vs. matrix type mismatch, and (2) the permutation order happens to be different between the two solutions.

library('combinat');
mcombinat <- do.call(rbind,unique(permn(x)));
mcombinat.sorted <- mcombinat[do.call(order,lapply(1:ncol(mcombinat),function(c) mcombinat[,c])),];
mbgoldst <- gpermute(x);
identical(mcombinat.sorted,mbgoldst);
## [1] TRUE

Finally, let me demonstrate the code on the 2 pairs and 2 quartets input:

x2 <- c(1,1,1,1,2,2,2,2,3,3,4,4);
system.time({ m <- gpermute(x2); });
##    user  system elapsed
##  36.547   0.000  36.593
head(m);
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]    1    1    1    1    2    2    2    2    3     3     4     4
## [2,]    1    1    1    1    2    2    2    2    3     4     3     4
## [3,]    1    1    1    1    2    2    2    2    3     4     4     3
## [4,]    1    1    1    1    2    2    2    2    4     3     3     4
## [5,]    1    1    1    1    2    2    2    2    4     3     4     3
## [6,]    1    1    1    1    2    2    2    2    4     4     3     3
nrow(m);
## [1] 207900

So it took some time, but it got done. I tried running unique(permn(x2)), but it didn't finish after tens of minutes, and I think we can probably assume it would never finish, since it would have to generate 479,001,600 non-unique permutations, which unique() would then have the task of making unique. A quick calculation also suggests it would require about 46GB of RAM, which almost doubles the 24GB of RAM I have available on my system. That could also be a problem...




回答2:


Use the RccpAlgos package.

> library(RcppAlgos)
> x <- permuteGeneral(c(1,2,3,4), freqs = c(2,2,1,1))
> dim(x)
[1] 180   6
> head(x)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    2    3    4
[2,]    1    1    2    2    4    3
[3,]    1    1    2    3    2    4
[4,]    1    1    2    3    4    2
[5,]    1    1    2    4    2    3
[6,]    1    1    2    4    3    2



回答3:


I have no idea if this is any faster, the bottleneck might be the high dimensionality of your problem as x gets any longer:

library(gtools)
x=c(1,1,2,2,3,4)
order = permutations(n=length(x), r=length(x))
x = matrix(x[order], ncol=6)
x = x[!duplicated(x), ]


来源:https://stackoverflow.com/questions/30963085/distinct-permutations-of-a-list-with-repetitions

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!