Question
Here is the problem: I have a dataset, let's say:
a <- c(0,0,0,0,1,1,1,1,1,1)
I want to cut it into even pieces (e.g. 5 pieces). The problem is that I cannot use quantile or cut, because some values repeat, so the breakpoints are not distinct.
> quantile(a)
  0%  25%  50%  75% 100%
   0    0    1    1    1
(repeated breakpoints)
> cut(a, 5)
[1] (-0.001,0.199] (-0.001,0.199] (-0.001,0.199] (-0.001,0.199] (0.801,1]
[6] (0.801,1] (0.801,1] (0.801,1] (0.801,1] (0.801,1]
Levels: (-0.001,0.199] (0.199,0.4] (0.4,0.6] (0.6,0.801] (0.801,1]
(only two levels used)
I know I can produce a vector like this:
b <- c(1,1,2,2,3,3,4,4,5,5)
and use it for sampling. Or I can use a for loop and count instances. But both options need loops and some clumsy coding. I am looking for a simple and efficient (R-style) function that does better than this.
(I can write it but I don't want to reinvent the wheel.)
Answer 1:
You can use cut, but you have to use it on the numerical indices of the vector, i.e., seq(a), not the vector itself.
Then you split the vector into pieces of equal length with split:
split(a, cut(seq(a), 5, labels = FALSE))
This returns a list of five short vectors.
Another way, without cut, is given by
split(a, rep(seq(5), each = length(a) / 5))
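The rep-based version above assumes that length(a) is a multiple of 5. As a minimal sketch for other lengths, the group numbers can be computed from the positions directly. The helper name chunk_evenly is hypothetical (it is not part of base R or of the answers here), and the sketch assumes n is no larger than length(x). It uses seq_along(x) rather than seq(x), since seq(x) behaves differently when x has length one.

```r
# Hypothetical helper (not from the answers above): split x into n
# consecutive pieces whose sizes differ by at most one element.
# Assumes n <= length(x).
chunk_evenly <- function(x, n) {
  # ceiling(i * n / length(x)) maps position i to a group number in 1..n
  groups <- ceiling(seq_along(x) * n / length(x))
  split(x, groups)
}

a <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
pieces <- chunk_evenly(a, 5)
lengths(pieces)                     # sizes of the five pieces
lengths(chunk_evenly(c(a, 1), 5))   # 11 elements, still five pieces
```

For the 10-element example this should produce the same grouping as the two calls above. With 11 elements, one piece simply gets an extra element.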
Answer 2:
I think it depends on what you are going to do next. I like dim:
dim(a) <- c(2, length(a) / 2)
And now a looks like this:
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    1    1    1
[2,]    0    0    1    1    1
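If a should stay a plain vector, a possible variant of the same idea is to build a separate matrix with matrix(), which also fills column by column, so each column is one piece. The sketch below assumes that length(a) is a multiple of the number of pieces. The names m, piece_means and piece_list are illustrative only.

```r
a <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# matrix() fills column by column, so each column holds one piece of a;
# unlike dim(a) <- ..., this leaves a itself unchanged.
m <- matrix(a, ncol = 5)

piece_means <- colMeans(m)        # one summary value per piece
piece_list  <- split(m, col(m))   # the same pieces as a list of vectors
```

The matrix form is handy when the next step is a per-piece summary, because colMeans, colSums and apply work on it directly. The list form matches what split returns in the first answer.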
Source: https://stackoverflow.com/questions/20587852/how-to-cut-data-in-even-pieces-in-r