Efficiently create derangement of a vector in R

Question

I'm looking for an efficient way to create a derangement (and, conversely, permutations with specific properties) of a vector in R. As far as I can tell, there is no base function that does this, and there isn't much about it here on SO either.

An obvious starting point is sample, which creates a random permutation of a vector. But I need the permutation to have no fixed points, i.e. to be a derangement of the vector. For a nice explanation of this topic, see this Cross Validated post.

This is my first approach:

derangr <- function(x){
  while(TRUE){
    xp <- sample(x)                # random permutation of x
    if(sum(xp == x) == 0) break    # accept only if no element kept its position
  }
  return(xp)
}

So within a while loop, I check whether a random permutation xp of the vector x has any fixed point, i.e. any position where xp and x agree. If there is none, I break out of the loop and return xp.
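To gauge how often such a loop has to retry: for a vector of distinct values, the share of random permutations without a fixed point approaches 1/e (about 0.368) as the length grows, so roughly e, or about 2.7, draws are needed on average, largely independent of the length. A minimal sketch to check this empirically; accept_rate is a name made up for this illustration and is not part of the code above:

```r
# Estimate the fraction of random permutations of 1:n with no fixed point.
# `accept_rate` is a hypothetical helper used only for this illustration.
accept_rate <- function(n, reps = 10000) {
  ind <- seq_len(n)
  # sample.int(n) draws one random permutation of 1:n per replicate
  hits <- replicate(reps, !any(sample.int(n) == ind))
  mean(hits)
}

set.seed(1)
accept_rate(10)    # should land close to exp(-1)
accept_rate(200)   # about the same: the retry rate does not grow with n
```

This only covers distinct values; with duplicated values the value-based check rejects more permutations.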

As the results show, it works fine:

> derangr(1:10)
 [1]  4  5  6 10  7  2  1  9  3  8

> derangr(LETTERS)
 [1] "C" "O" "L" "J" "A" "I" "Y" "M" "G" "T" "S" "R" "Z" "V" "N" "K" "D" "Q" "B" "H" "F" "E" "X" "W" "U" "P"

So I'm wondering whether there's a better way of doing this, possibly by replacing the while loop with some kind of vectorization. I also want to keep an eye on scalability.

Here are the microbenchmark results for both examples:

library(microbenchmark)

> microbenchmark(derangr(1:10),times = 10000)
Unit: microseconds
          expr   min     lq    mean  median      uq      max neval
 derangr(1:10) 8.359 15.492 40.1807 28.3195 49.4435 6866.453 10000

> microbenchmark(derangr(LETTERS),times = 10000)
Unit: microseconds
             expr    min     lq     mean  median      uq      max neval
 derangr(LETTERS) 24.385 31.123 34.75819 32.4475 34.3225 10200.17 10000

The same question applies to the converse, producing permutations with a given number n of fixed points:

arrangr <- function(x,n){
  while(TRUE){
    xp <- sample(x)                # random permutation of x
    if(sum(xp == x) == n) break    # accept only if exactly n positions are unchanged
  }
  return(xp)
}
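For the fixed-point variant, one alternative to resampling the whole vector is to pick the n positions that stay put and derange only the remaining ones. The sketch below assumes fixed points are counted by position (equivalent to the value comparison only when all values are distinct); arrangr2 and derange_idx are hypothetical names introduced here. Note that exactly length(x) - 1 fixed points is impossible, so the original loop never terminates for that n; the sketch rejects it up front.

```r
# Sketch: build a permutation with exactly n fixed positions directly.
# `arrangr2` and `derange_idx` are hypothetical names for this illustration.
derange_idx <- function(idx) {
  # Rejection-sample a derangement of the index vector `idx`.
  # Indexing via sample.int() avoids sample()'s special case for a single
  # number, where sample(5) means sample(1:5).
  if (length(idx) == 0) return(idx)
  repeat {
    p <- idx[sample.int(length(idx))]
    if (!any(p == idx)) return(p)
  }
}

arrangr2 <- function(x, n) {
  len <- length(x)
  # n == len - 1 cannot occur: if all but one element are fixed,
  # the last one is fixed as well.
  stopifnot(n >= 0, n <= len, n != len - 1)
  ind   <- seq_len(len)
  fixed <- ind[sample.int(len, n)]   # positions that stay put
  moved <- setdiff(ind, fixed)       # positions that must all change
  perm  <- ind
  perm[moved] <- derange_idx(moved)
  x[perm]
}

set.seed(42)
y <- arrangr2(1:10, 3)
sum(y == 1:10)   # 3
```

Because every permutation with exactly n fixed positions corresponds to one choice of fixed set plus one derangement of the rest, this should sample such permutations uniformly, as long as the inner derangement is drawn uniformly.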

Answer 1:

If the values are not all unique, you could permute an index vector instead and use it to subset the input vector in the new order. With rep(LETTERS, 2), for example, the first A and the second A are then interchangeable: an element only has to leave its own position, even if an equal value takes its place. The derangr() function proposed in the question compares values, so it would reject such a permutation and rearrange these elements as well.

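derangr2 <- function(x){
  ind <- seq_along(x)                     # positions 1..length(x)
  while(TRUE){
    indp <- sample(ind)                   # random permutation of the positions
    if(sum(indp == ind) == 0) break       # accept only if every position changed
  }
  return(x[indp])
}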
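The difference between the two checks can be made concrete with a small vector containing duplicates. This is only an illustration; is_value_derangement and is_index_derangement are hypothetical helper names, not functions from the question or from any package:

```r
# Illustration with duplicated values; both helpers are hypothetical names.
x <- c("A", "A", "B", "B")

# Value-based check (as in derangr): no position may keep an equal value.
is_value_derangement <- function(perm, x) !any(x[perm] == x)
# Index-based check (as in derangr2): no element may keep its own position.
is_index_derangement <- function(perm) !any(perm == seq_along(perm))

# Swap the two A's and the two B's: every element moves to a new position.
perm <- c(2, 1, 4, 3)
is_index_derangement(perm)       # TRUE: accepted by the index-based check
x[perm]                          # "A" "A" "B" "B", looks unchanged
is_value_derangement(perm, x)    # FALSE: rejected by the value-based check

# Sending both B's to the front satisfies the value-based check too.
perm2 <- c(3, 4, 1, 2)
is_value_derangement(perm2, x)   # TRUE
```

One consequence: for an input such as c("A", "A", "B") no value-based derangement exists (two positions would need a B, but there is only one), so the value-based loop would never terminate, while the index-based one still returns.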

Some benchmark results:

microbenchmark(derangr(rep(LETTERS, 4)), 
               derangr2(rep(LETTERS, 4)), times = 1000)

# Unit: microseconds
#                      expr   min       lq       mean  median      uq      max neval
#  derangr(rep(LETTERS, 4)) 6.258 113.4895 441.831094 251.724 549.384 5837.143  1000
# derangr2(rep(LETTERS, 4)) 6.542   7.3960  23.173800  12.800  22.755 4645.936  1000

However, if all values are unique, this approach brings little to no improvement:

microbenchmark(derangr(1:1000), derangr2(1:1000), times = 1000)
# Unit: microseconds
#             expr    min     lq     mean median      uq      max neval
#  derangr(1:1000) 19.341 21.333 61.55154 40.959 78.0775 2770.382  1000
# derangr2(1:1000) 23.608 25.884 72.76647 46.079 84.1930 2674.243  1000
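If any derangement will do and it does not need to be drawn uniformly from all derangements, a different technique avoids retries entirely: Sattolo's algorithm, a variant of the Fisher-Yates shuffle that always produces a single cycle and therefore has no fixed positions for length 2 or more. The sketch below is an assumption-laden illustration, not one of the approaches benchmarked above; sattolo is a name chosen here, and since it loops at the R level it is not necessarily faster than the retry loop.

```r
# Sketch of Sattolo's algorithm: returns x rearranged by a random cyclic
# permutation of its positions. `sattolo` is a hypothetical name.
# Caveat: only cyclic permutations are produced, so derangements made of
# several shorter cycles can never occur.
sattolo <- function(x) {
  n <- length(x)
  stopifnot(n != 1)                 # a single element cannot be deranged
  perm <- seq_len(n)
  if (n >= 2) {
    for (i in n:2) {
      j <- sample.int(i - 1, 1)     # j is drawn from 1..(i - 1), never i itself
      tmp <- perm[i]
      perm[i] <- perm[j]
      perm[j] <- tmp
    }
  }
  x[perm]
}

set.seed(7)
y <- sattolo(1:10)
any(y == 1:10)   # FALSE
```

As with derangr2, the guarantee is about positions: with duplicated values an element can still be replaced by an equal value.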

Source: https://stackoverflow.com/questions/45459623/efficiently-create-derangement-of-a-vector-in-r
