permuting a data frame by rows and columns

懵懂的女人 submitted on 2020-01-13 18:56:27


Does anyone know how I can randomize all the data inside my data frame? I mean, I would like to get a new data frame in which the data are permuted by rows and by columns, giving a random new data frame with the same numbers as the first one.

Something like this:



It would be a lot faster to do this on a matrix:

dm <- matrix(1:25, ncol = 5); dm
dm[] <- sample(dm); dm
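A short sketch of why the answer assigns into dm[] rather than to dm: subset assignment keeps the matrix's dim attribute, while sample() on its own returns a plain vector. The names shuffled and flat are only for illustration.

```r
dm <- matrix(1:25, ncol = 5)

# Assigning into shuffled[] replaces the values but keeps the dim attribute,
# so the result is still a 5 x 5 matrix
shuffled <- dm
shuffled[] <- sample(dm)
dim(shuffled)

# Assigning sample(dm) directly gives a plain integer vector with no dimensions
flat <- sample(dm)
dim(flat)
```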

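Edit: The following claim was wrong: "I'm pretty sure that permuting first on columns and then on rows should give you the same result as permuting the entire vector and then reshaping to the original dimensions." Permuting rows and columns keeps the values of each row (and each column) together, whereas permuting the entire vector does not.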

The "Simpson method" would give different results and may be what was requested (but it will be faster with a matrix testbed if this is to be done as part of a simulation effort):

dm <- dm[sample(nrow(dm)), sample(ncol(dm))]
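A minimal check, assuming the 5 x 5 matrix from above, that the row and column permutation only reorders things: every row still contains the same set of values, which is what distinguishes it from shuffling the whole vector. row_sets() is a hypothetical helper written for this illustration.

```r
set.seed(1)
dm <- matrix(1:25, ncol = 5)
rc <- dm[sample(nrow(dm)), sample(ncol(dm))]

# Hypothetical helper: sort the values within each row, then order the rows
# by their smallest value so that row order no longer matters
row_sets <- function(m) {
  s <- t(apply(m, 1, sort))  # apply() over rows returns the result transposed
  s[order(s[, 1]), ]
}

# TRUE: each row of rc holds exactly the values of some row of dm
identical(row_sets(rc), row_sets(dm))
```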


Just use sample() separately on the number of rows and number of columns and then index with the results from sample().

df <- data.frame(matrix(1:25, ncol = 5))

permDF <- function(x) {
  nr <- nrow(x)
  nc <- ncol(x)
  # Reorder whole rows and whole columns; each value stays with its row and column
  x[sample(nr), sample(nc)]
}

> permDF(df)
  X3 X4 X2 X1 X5
4 14 19  9  4 24
5 15 20 10  5 25
1 11 16  6  1 21
3 13 18  8  3 23
2 12 17  7  2 22
> permDF(df)
  X1 X2 X4 X3 X5
2  2  7 17 12 22
4  4  9 19 14 24
1  1  6 16 11 21
3  3  8 18 13 23
5  5 10 20 15 25

Note that this keeps the values in each row and column together; only the order of the rows and columns changes. If you want the data set fully randomised, then there isn't a really simple way with a data frame. I would do this using a matrix, but it requires a bit more work, as @DWin shows:

mat <- matrix(1:25, ncol = 5)
pmat <- mat
pmat[] <- mat[sample(length(mat))]

> pmat
     [,1] [,2] [,3] [,4] [,5]
[1,]   23   11   24   10    5
[2,]   25   21   20    9    8
[3,]    7    3   13    1   18
[4,]   19   12    4   16    2
[5,]   14   17    6   15   22

You can do what I was doing with the data frame in the same way with the matrix, using slightly different indexing from the one above:

mat[sample(nrow(mat)), sample(ncol(mat))]

> set.seed(42)
> mat[sample(nrow(mat)), sample(ncol(mat))]
     [,1] [,2] [,3] [,4] [,5]
[1,]   15   25    5   10   20
[2,]   14   24    4    9   19
[3,]   11   21    1    6   16
[4,]   12   22    2    7   17
[5,]   13   23    3    8   18
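If the fully shuffled result is wanted as a data frame again, one option is to round-trip through a matrix. This is a sketch that assumes every column has the same type; with mixed types, as.matrix() coerces everything to a common type (for example, character). shuffle_cells() is a hypothetical helper, not part of any package.

```r
df <- data.frame(matrix(1:25, ncol = 5))

# Hypothetical helper: shuffle every cell of a data frame whose columns
# all share one type, by converting to a matrix and back
shuffle_cells <- function(x) {
  m <- as.matrix(x)
  m[] <- sample(m)    # full shuffle; m[] keeps the dimensions and column names
  as.data.frame(m)    # column names are carried over from colnames(m)
}

set.seed(42)
shuffled <- shuffle_cells(df)
shuffled
```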


The randomize function from the NMF package could be what you are looking for.

From the doc:

randomize permutates independently the entries in each column of a matrix-like object, to produce random data that can be used in permutation tests or bootstrap analysis.
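If installing NMF is not an option, here is a base R sketch of the same idea (an independent permutation within each column). It is not NMF's implementation, only an illustration of the behaviour the documentation describes.

```r
mat <- matrix(1:25, ncol = 5)

# Permute the entries of each column independently; apply() over columns
# returns one result column per input column, so the shape is kept.
# Caveat: sample() on a single number n samples from 1:n, so this assumes
# the matrix has more than one row.
col_shuffled <- apply(mat, 2, sample)

# Data frame version: lapply() shuffles each column, and assigning into df[]
# keeps the data.frame class and column names
df <- data.frame(mat)
df[] <- lapply(df, sample)
```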

