Using the dplyr package and its sample_frac function, it is possible to sample a percentage of the rows from every group. What I need instead is to first sort the elements within each group and then select the top x% of each group. How can I do that?
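For reference, here is a minimal sketch of the per-group sampling described above, assuming dplyr is installed (the name sampled is only illustrative). sample_frac() applies its fraction separately inside each gear group, but the rows it keeps are random rather than the heaviest ones:

```r
library(dplyr)

set.seed(42)  # make the random draw reproducible

# sample_frac() respects the grouping: it draws 20% of the rows of each gear group,
# but which rows are kept is random, not determined by wt
sampled <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  sample_frac(0.2)

# number of rows drawn from each gear group
count(sampled, gear)
```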
There is also the function top_n, but with it I can only specify an absolute number of rows, and I need a relative value (a fraction of each group's size).
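To illustrate the limitation, a short sketch (assuming dplyr is installed; top_two is a hypothetical name): top_n() works per group, but its n is a fixed count, so every gear group gets the same 2 rows regardless of whether it has 15 or 5 cars. Tied values are kept, and a negative n selects the smallest values instead.

```r
library(dplyr)

# top_n() keeps the rows with the 2 largest wt values in each gear group;
# n is an absolute count, not a fraction of the group size
top_two <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  top_n(2, wt)

top_two
```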
For example, the following data is grouped by gear and sorted by wt within each group:
    library(dplyr)

    mtcars %>%
      select(gear, wt) %>%
      group_by(gear) %>%
      arrange(gear, wt)

       gear    wt
     1    3 2.465
     2    3 3.215
     3    3 3.435
     4    3 3.440
     5    3 3.460
     6    3 3.520
     7    3 3.570
     8    3 3.730
     9    3 3.780
    10    3 3.840
    11    3 3.845
    12    3 4.070
    13    3 5.250
    14    3 5.345
    15    3 5.424
    16    4 1.615
    17    4 1.835
    18    4 1.935
    19    4 2.200
    20    4 2.320
    21    4 2.620
    22    4 2.780
    23    4 2.875
    24    4 3.150
    25    4 3.190
    26    4 3.440
    27    4 3.440
    28    5 1.513
    29    5 2.140
    30    5 2.770
    31    5 3.170
    32    5 3.570
Now I would like to select the top 20% of rows within each gear group.
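One possible sketch, assuming "top" means the largest wt values and that a fractional row count such as 0.2 * 12 = 2.4 should be rounded up (both are assumptions; drop desc() for the smallest values, or use floor() to round down). frac and top_share are hypothetical names. Because n() and row_number() are evaluated per group inside filter(), the cutoff scales with each group's size:

```r
library(dplyr)

frac <- 0.2  # hypothetical parameter: share of each group to keep

top_share <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  # sort so the heaviest cars come first within each gear
  arrange(gear, desc(wt)) %>%
  # keep the first ceiling(frac * group size) rows of each group;
  # ceiling() is an assumption that also guarantees at least one row per group
  filter(row_number() <= ceiling(frac * n())) %>%
  ungroup()

top_share
```

With ceiling(), this keeps 3 of 15 rows for gear 3, 3 of 12 for gear 4 and 1 of 5 for gear 5; ties at the cutoff are broken by row order rather than kept together.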
It would be very nice if the solution could be integrated with dplyr's group_by function.
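If a newer dplyr is available, a sketch that stays inside the group_by pipeline: slice_max() (added in dplyr 1.0.0, where it supersedes top_n()) accepts a prop argument that is applied per group. Per the dplyr documentation, prop * n() is rounded towards zero, and with_ties = TRUE (the default) can return extra rows when values tie at the cutoff. top_20 is an illustrative name, and slice_min() would select the smallest values instead.

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

top_20 <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  # keep the top 20% of rows by wt in each gear group;
  # the row count per group is prop * n() rounded towards zero
  slice_max(wt, prop = 0.2)

top_20
```

Here rounding down keeps 3 rows for gear 3, 2 for gear 4 (0.2 * 12 = 2.4) and 1 for gear 5, one row fewer than the ceiling()-based cutoff for gear 4.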