Question
Does anyone know how I can randomize all the data inside my data frame? I mean, I would like to get a new data frame in which the data are permuted by rows and by columns, giving a random new data frame with the same numbers as the original.
Something like this:
Thanks!
Answer 1:
It would be a lot faster to do this on a matrix:
dm <- matrix(1:25, ncol = 5); dm
dm[] <- sample(dm); dm
Edit: This is wrong: "I'm pretty sure that permuting first on columns and then on rows should give you the same result as permuting the entire vector and then reshaping to the original dimensions."
The "Simpson method" (permuting the rows and the columns separately) gives different results and may be what was requested, but it will be faster on a matrix if this is to be done as part of a simulation effort:
dm <- dm[ sample(nrow(dm)), sample( ncol(dm)) ]
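To see why the two approaches differ, here is a small sketch. The helpers `row_sets` and `same_row_sets` are made-up names for illustration, not functions from any package. Permuting whole rows and columns keeps the values of each original row together, whereas shuffling every cell generally does not.

```r
# Minimal sketch; row_sets() and same_row_sets() are hypothetical helpers.
dm <- matrix(1:25, ncol = 5)

# Describe each row by its sorted values, so that neither row order
# nor column order affects the comparison.
row_sets <- function(m) {
  sort(apply(m, 1, function(r) paste(sort(r), collapse = ",")))
}
same_row_sets <- function(a, b) identical(row_sets(a), row_sets(b))

set.seed(1)
by_margin <- dm[sample(nrow(dm)), sample(ncol(dm))]  # permute rows and columns
by_cell <- dm
by_cell[] <- sample(dm)                              # shuffle every cell

same_row_sets(dm, by_margin)  # TRUE: each row still holds the same values
same_row_sets(dm, by_cell)    # very unlikely to be TRUE after a full shuffle
```

Both results contain the numbers 1 to 25 exactly once; only the full shuffle breaks up the original rows and columns.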
Answer 2:
Just use sample() separately on the number of rows and the number of columns, and then index with the results from sample().
df <- data.frame(matrix(1:25, ncol = 5))
permDF <- function(x) {
  nr <- nrow(x)
  nc <- ncol(x)
  x[sample(nr), sample(nc)]
}
> permDF(df)
X3 X4 X2 X1 X5
4 14 19 9 4 24
5 15 20 10 5 25
1 11 16 6 1 21
3 13 18 8 3 23
2 12 17 7 2 22
> permDF(df)
X1 X2 X4 X3 X5
2 2 7 17 12 22
4 4 9 19 14 24
1 1 6 16 11 21
3 3 8 18 13 23
5 5 10 20 15 25
Note that this keeps the values within each row and each column together; only the order of the rows and columns changes. If you want the data set fully randomised, there isn't a really simple way to do it with a data frame. I would do this using a matrix, although it requires a bit more work, as @DWin shows:
mat <- matrix(1:25, ncol = 5)
pmat <- mat
set.seed(42)
pmat[] <- mat[sample(length(mat))]
pmat
> pmat
[,1] [,2] [,3] [,4] [,5]
[1,] 23 11 24 10 5
[2,] 25 21 20 9 8
[3,] 7 3 13 1 18
[4,] 19 12 4 16 2
[5,] 14 17 6 15 22
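If the data really do live in a data frame, one possible approach is to round-trip through a matrix. The sketch below assumes every column has the same type, since as.matrix() would otherwise coerce them to a common one; `shuffle_cells` is a hypothetical name used only here.

```r
# Sketch only: shuffle_cells() is a hypothetical helper and assumes
# that all columns of x share one type.
shuffle_cells <- function(x) {
  m <- as.matrix(x)
  m[] <- sample(m)   # permute every cell, keeping dimensions and names
  as.data.frame(m)
}

df <- data.frame(matrix(1:25, ncol = 5))
set.seed(42)
sdf <- shuffle_cells(df)

dim(sdf)                               # same shape as df
sort(unlist(sdf, use.names = FALSE))   # same values as df
```

Assigning into `m[]` rather than `m` is what preserves the shape and the column names.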
You can permute whole rows and columns of the matrix in the same way as I did with the data frame, using slightly different indexing from the example above:
mat[sample(nrow(mat)), sample(ncol(mat))]
> set.seed(42)
> mat[sample(nrow(mat)), sample(ncol(mat))]
[,1] [,2] [,3] [,4] [,5]
[1,] 15 25 5 10 20
[2,] 14 24 4 9 19
[3,] 11 21 1 6 16
[4,] 12 22 2 7 17
[5,] 13 23 3 8 18
Answer 3:
The randomize function from the NMF package could be what you are looking for.
From the doc:
randomize permutates independently the entries in each column of a matrix-like object, to produce random data that can be used in permutation tests or bootstrap analysis.
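The behaviour described in that quote can be approximated in base R as follows. This is only a sketch of the idea, not the NMF implementation, and `permute_within_columns` is a hypothetical name.

```r
# Sketch of per-column permutation in base R. This is NOT the NMF
# implementation; permute_within_columns() is a hypothetical helper.
permute_within_columns <- function(m) {
  # Index with sample.int() so that a one-element column is handled safely.
  m[] <- apply(m, 2, function(col) col[sample.int(length(col))])
  m
}

mat <- matrix(1:25, ncol = 5)
set.seed(42)
pm <- permute_within_columns(mat)

# Each column keeps its own set of values, in a new order.
# TRUE here because the columns of mat are already sorted.
all(apply(pm, 2, sort) == mat)
```

Unlike the full shuffle, values never move between columns here, so per-column summaries such as means are unchanged.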
Source: https://stackoverflow.com/questions/16487238/permuting-a-data-frame-by-rows-and-columns