dplyr - Group by and select TOP x %

Anonymous (unverified), submitted 2019-12-03 01:23:02

Question:

Using the dplyr package and its sample_frac function, it is possible to sample a percentage of rows from every group. What I need instead is to first sort the elements in every group and then select the top x% from each group.

There is a function top_n, but it only lets me specify the number of rows, and I need a relative value.

For example, the following data is grouped by gear and sorted by wt within each group:

    library(dplyr)

    mtcars %>%
      select(gear, wt) %>%
      group_by(gear) %>%
      arrange(gear, wt)

       gear    wt
    1     3 2.465
    2     3 3.215
    3     3 3.435
    4     3 3.440
    5     3 3.460
    6     3 3.520
    7     3 3.570
    8     3 3.730
    9     3 3.780
    10    3 3.840
    11    3 3.845
    12    3 4.070
    13    3 5.250
    14    3 5.345
    15    3 5.424
    16    4 1.615
    17    4 1.835
    18    4 1.935
    19    4 2.200
    20    4 2.320
    21    4 2.620
    22    4 2.780
    23    4 2.875
    24    4 3.150
    25    4 3.190
    26    4 3.440
    27    4 3.440
    28    5 1.513
    29    5 2.140
    30    5 2.770
    31    5 3.170
    32    5 3.570

Now I would like to select the top 20% within each gear group.

It would be very nice if the solution could be integrated with dplyr's group_by function.

Answer 1:

Or another option with dplyr:

    mtcars %>%
      select(gear, wt) %>%
      group_by(gear) %>%
      arrange(gear, desc(wt)) %>%
      filter(wt > quantile(wt, .8))

    Source: local data frame [7 x 2]
    Groups: gear [3]

       gear    wt
      (dbl) (dbl)
    1     3 5.424
    2     3 5.345
    3     3 5.250
    4     4 3.440
    5     4 3.440
    6     4 3.190
    7     5 3.570


Answer 2:

Here's another way:

    mtcars %>%
      select(gear, wt) %>%
      arrange(gear, desc(wt)) %>%
      group_by(gear) %>%
      slice(seq(n()*.2))

       gear    wt
      (dbl) (dbl)
    1     3 5.424
    2     3 5.345
    3     3 5.250
    4     4 3.440
    5     4 3.440
    6     5 3.570

I take "top" to mean "having the highest value for wt" and so used desc().
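The quantile filter in Answer 1 returns 7 rows, while the slice in Answer 2 returns 6. The difference comes from gear 4: its 12 rows give an interpolated 80th percentile that sits just below 3.190, so three rows exceed it, whereas 12 * 0.2 = 2.4 is cut down to two rows. The following is a minimal sketch of that comparison, assuming the standard mtcars data and R's default quantile type; the column names n_rows, by_quantile and by_row_count are made up for illustration.

```r
library(dplyr)

# Count, per gear group, how many rows each approach keeps.
counts <- mtcars %>%
  group_by(gear) %>%
  summarise(
    n_rows       = n(),
    # Answer 1: rows strictly above the interpolated 80th percentile
    by_quantile  = sum(wt > quantile(wt, 0.8)),
    # Answer 2: 20% of the group size, rounded down
    by_row_count = floor(n() * 0.2)
  )

print(counts)
```

Which of the two is "right" depends on whether the top 20% is meant as a value cutoff or as a share of the rows. If a group could have fewer than five rows, seq_len(floor(n() * 0.2)) may be a safer index for slice() than seq(n() * .2), since seq_len(0) is empty.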



Answer 3:

I believe this gets to the answer you're looking for.

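    library(dplyr)

    mtcars %>%
      select(gear, wt) %>%
      group_by(gear) %>%
      arrange(gear, wt) %>%
      # The comparison was cut off in the source; `<= 0.2` is assumed
      # from the 20% target stated in the question.
      filter(row_number() / n() <= 0.2)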
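None of the answers here use it, but newer dplyr releases have a helper that takes a proportion directly. This is a sketch assuming dplyr 1.0.0 or later, where slice_max() accepts a prop argument; the name top20 is chosen only for illustration.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, where slice_max() accepts `prop`.
top20 <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  # prop = 0.2 keeps the heaviest 20% of each group, rounded down;
  # with_ties = FALSE caps each group at exactly that many rows.
  slice_max(wt, prop = 0.2, with_ties = FALSE) %>%
  ungroup()

print(top20)
```

slice_min() works the same way for the lowest values. Leaving with_ties at its default of TRUE can return extra rows when several rows share the cutoff value.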

