Permute all unique enumerations of a vector in R

前端 未结 7 679
忘了有多久
忘了有多久 2020-11-29 06:21

I\'m trying to find a function that will permute all the unique permutations of a vector, while not counting juxtapositions within subsets of the same element type.

相关标签:
7条回答
  • 2020-11-29 06:27

    One option that hasn't been mentioned here is the allPerm function from the multicool package. It can be used pretty easily to get all the unique permutations:

    library(multicool)
    perms <- allPerm(initMC(dat))
    dim(perms)
    # [1] 18900    10
    head(perms)
    #      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    # [1,]    4    4    3    3    1    1    0    0    0     0
    # [2,]    0    4    4    3    3    1    1    0    0     0
    # [3,]    4    0    4    3    3    1    1    0    0     0
    # [4,]    4    4    0    3    3    1    1    0    0     0
    # [5,]    3    4    4    0    3    1    1    0    0     0
    # [6,]    4    3    4    0    3    1    1    0    0     0
    

    In benchmarking I found it to be faster on dat than the solutions from the OP and daroczig but slower than the solution from Aaron.

    0 讨论(0)
  • 2020-11-29 06:29

    I don't actually know R, but here's how I'd approach the problem:

    Find how many of each element type, i.e.

    4 X 0
    2 X 1
    2 X 3
    2 X 4
    

    Sort by frequency (which the above already is).

    Start with the most frequent value, which takes up 4 of the 10 spots. Determine the unique combinations of 4 values within the 10 available spots. (0,1,2,3),(0,1,2,4),(0,1,2,5),(0,1,2,6) ... (0,1,2,9),(0,1,3,4),(0,1,3,5) ... (6,7,8,9)

    Go to the second most frequent value, it takes up 2 of 6 available spots, and determine it's unique combinations of 2 of 6. (0,1),(0,2),(0,3),(0,4),(0,5),(1,2),(1,3) ... (4,6),(5,6)

    Then 2 of 4: (0,1),(0,2),(0,3),(1,2),(1,3),(2,3)

    And the remaining values, 2 of 2: (0,1)

    Then you need to combine them into each possible combination. Here's some pseudocode (I'm convinced there's a more efficient algorithm for this, but this shouldn't be too bad):

    lookup = (0,1,3,4)
    For each of the above sets of combinations, example: input = ((0,2,4,6),(0,2),(2,3),(0,1))
    newPermutation = (-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)
    for i = 0 to 3
      index = 0
      for j = 0 to 9
        if newPermutation(j) = -1
          if index = input(i)(j)
            newPermutation(j) = lookup(i)
            break
          else
            index = index + 1
    
    0 讨论(0)
  • 2020-11-29 06:31

    The following function (which implements the classic formula for repeated permutations just like you did manually in your question) seems quite fast to me:

    upermn <- function(x) {
        n <- length(x)
        duplicates <- as.numeric(table(x))
        factorial(n) / prod(factorial(duplicates))
    }
    

    It does compute n! but not like permn function which generates all permutations first.

    See it in action:

    > dat <- c(1,0,3,4,1,0,0,3,0,4)
    > upermn(dat)
    [1] 18900
    > system.time(uperm(dat))
       user  system elapsed 
      0.000   0.000   0.001 
    

    UPDATE: I have just realized that the question was about generating all unique permutations not just specifying the number of them - sorry for that!

    You could improve the unique(perm(...)) part with specifying unique permutations for one less element and later adding the uniqe elements in front of them. Well, my explanation may fail, so let the source speak:

    uperm <- function(x) {
    u <- unique(x)                    # unique values of the vector
    result <- x                       # let's start the result matrix with the vector
    for (i in 1:length(u)) {
        v <- x[-which(x==u[i])[1]]    # leave the first occurance of duplicated values
        result <- rbind(result, cbind(u[i], do.call(rbind, unique(permn(v)))))
    }
    return(result)
    }
    

    This way you could gain some speed. I was lazy to run the code on the vector you provided (took so much time), here is a small comparison on a smaller vector:

    > dat <- c(1,0,3,4,1,0,0)
    > system.time(unique(permn(dat)))
       user  system elapsed 
      0.264   0.000   0.268 
    > system.time(uperm(dat))
       user  system elapsed 
      0.147   0.000   0.150 
    

    I think you could gain a lot more by rewriting this function to be recursive!


    UPDATE (again): I have tried to make up a recursive function with my limited knowledge:

    uperm <- function(x) {
        u <- sort(unique(x))
        l <- length(u)
        if (l == length(x)) {
            return(do.call(rbind,permn(x)))
        }
        if (l == 1) return(x)
        result <- matrix(NA, upermn(x), length(x))
        index <- 1
        for (i in 1:l) {
            v <- x[-which(x==u[i])[1]]
            newindex <- upermn(v)
            if (table(x)[i] == 1) {
                result[index:(index+newindex-1),] <- cbind(u[i], do.call(rbind, unique(permn(v))))
                } else {
                    result[index:(index+newindex-1),] <- cbind(u[i], uperm(v))
                }
            index <- index+newindex
        }
        return(result)
    }
    

    Which has a great gain:

    > system.time(unique(permn(c(1,0,3,4,1,0,0,3,0))))
       user  system elapsed 
     22.808   0.103  23.241 
    
    > system.time(uperm(c(1,0,3,4,1,0,0,3,0)))
       user  system elapsed 
      4.613   0.003   4.645 
    

    Please report back if this would work for you!

    0 讨论(0)
  • 2020-11-29 06:43

    EDIT: Here's a faster answer; again based on the ideas of Louisa Grey and Bryce Wagner, but with faster R code thanks to better use of matrix indexing. It's quite a bit faster than my original:

    > ffffd <- c(1,0,3,4,1,0,0,3,0,4)
    > system.time(up1 <- uniqueperm(d))
       user  system elapsed 
      0.183   0.000   0.186 
    > system.time(up2 <- uniqueperm2(d))
       user  system elapsed 
      0.037   0.000   0.038 
    

    And the code:

    uniqueperm2 <- function(d) {
      dat <- factor(d)
      N <- length(dat)
      n <- tabulate(dat)
      ng <- length(n)
      if(ng==1) return(d)
      a <- N-c(0,cumsum(n))[-(ng+1)]
      foo <- lapply(1:ng, function(i) matrix(combn(a[i],n[i]),nrow=n[i]))
      out <- matrix(NA, nrow=N, ncol=prod(sapply(foo, ncol)))
      xxx <- c(0,cumsum(sapply(foo, nrow)))
      xxx <- cbind(xxx[-length(xxx)]+1, xxx[-1])
      miss <- matrix(1:N,ncol=1)
      for(i in seq_len(length(foo)-1)) {
        l1 <- foo[[i]]
        nn <- ncol(miss)
        miss <- matrix(rep(miss, ncol(l1)), nrow=nrow(miss))
        k <- (rep(0:(ncol(miss)-1), each=nrow(l1)))*nrow(miss) + 
                   l1[,rep(1:ncol(l1), each=nn)]
        out[xxx[i,1]:xxx[i,2],] <- matrix(miss[k], ncol=ncol(miss))
        miss <- matrix(miss[-k], ncol=ncol(miss))
      }
      k <- length(foo)
      out[xxx[k,1]:xxx[k,2],] <- miss
      out <- out[rank(as.numeric(dat), ties="first"),]
      foo <- cbind(as.vector(out), as.vector(col(out)))
      out[foo] <- d
      t(out)
    }
    

    It doesn't return the same order, but after sorting, the results are identical.

    up1a <- up1[do.call(order, as.data.frame(up1)),]
    up2a <- up2[do.call(order, as.data.frame(up2)),]
    identical(up1a, up2a)
    

    For my first attempt, see the edit history.

    0 讨论(0)
  • 2020-11-29 06:44

    Another option is by using the Rcpp package. The difference is that it returns a list.

    //[[Rcpp::export]]
    std::vector<std::vector< int > > UniqueP(std::vector<int> v){
    std::vector< std::vector<int> > out;
    std::sort (v.begin(),v.end());
    do {
        out.push_back(v);
    } while ( std::next_permutation(v.begin(),v.end()));
    return out;
    }
     Unit: milliseconds
             expr       min      lq     mean    median       uq      max neval cld
     uniqueperm2(dat) 10.753426 13.5283 15.61438 13.751179 16.16061 34.03334   100   b
     UniqueP(dat)      9.090222  9.6371 10.30185  9.838324 10.20819 24.50451   100   a 
    
    0 讨论(0)
  • 2020-11-29 06:45

    Another option is the iterpc package, I believe it is the fastest of the existing method. More importantly, the result is in dictionary order (which may be somehow preferable).

    dat <- c(1, 0, 3, 4, 1, 0, 0, 3, 0, 4)
    library(iterpc)
    getall(iterpc(table(dat), order=TRUE))
    

    The benchmark indicates that iterpc is significant faster than all other methods described here

    library(multicool)
    library(microbenchmark)
    microbenchmark(uniqueperm2(dat), 
                   allPerm(initMC(dat)), 
                   getall(iterpc(table(dat), order=TRUE))
                  )
    
    Unit: milliseconds
                                         expr         min         lq        mean      median
                             uniqueperm2(dat)   23.011864   25.33241   40.141907   27.143952
                         allPerm(initMC(dat)) 1713.549069 1771.83972 1814.434743 1810.331342
     getall(iterpc(table(dat), order = TRUE))    4.332674    5.18348    7.656063    5.989448
              uq        max neval
       64.147399   74.66312   100
     1855.869670 1937.48088   100
        6.705741   49.98038   100
    
    0 讨论(0)
提交回复
热议问题