How to do range grouping on a column using dplyr?

自闭症患者 2020-12-03 05:46

I want to group a data.table based on the range that a column's values fall into. How can I do this with the dplyr library?

For example, my data table is like below:

1 answer
  • 2020-12-03 06:36

    We can use cut to do the grouping: with breaks = seq(0, 1, by = 0.05) it splits 'B' into right-closed intervals of width 0.05. We create the 'gr' column within group_by, use summarise to count the elements in each group (n()), and order the output by 'gr' (arrange).

    library(dplyr)
    DT %>%
        group_by(gr = cut(B, breaks = seq(0, 1, by = 0.05))) %>%
        summarise(n = n()) %>%
        arrange(as.numeric(gr))
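
    The question's table is not reproduced here, so as a minimal sketch the pipeline above can be tried on a small made-up table. The values in `B` below are hypothetical and deliberately avoid the break points, so floating-point rounding in `seq` cannot change which bin a value lands in.

    ```r
    library(dplyr)

    # Hypothetical sample data (assumption: the OP's real table is not shown)
    DT <- data.frame(B = c(0.01, 0.03, 0.06, 0.09, 0.11,
                           0.12, 0.14, 0.16, 0.19, 0.72))

    res <- DT %>%
        # cut() turns each value of B into a factor level such as "(0.05,0.1]"
        group_by(gr = cut(B, breaks = seq(0, 1, by = 0.05))) %>%
        # count the rows that fall into each interval
        summarise(n = n()) %>%
        # sort by the underlying factor codes, i.e. by interval position
        arrange(as.numeric(gr))

    print(res)
    ```

    Intervals that contain no rows do not show up in `res`, because recent dplyr versions drop empty factor levels in group_by by default.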
    

    As the initial object is a data.table, this can also be done using data.table methods (including @Frank's suggestion to use keyby)

    library(data.table)
    DT[, .N, keyby = .(gr = cut(B, breaks = seq(0, 1, by = 0.05)))]
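
    To see what keyby adds over a plain by, here is a small sketch on hypothetical, deliberately unsorted data (the values of `B` are made up for illustration). With by, groups come back in order of first appearance; with keyby, the result is sorted by 'gr' and keyed on it.

    ```r
    library(data.table)

    # Hypothetical, deliberately unsorted sample data (not the OP's table)
    DT <- data.table(B = c(0.72, 0.11, 0.01, 0.16, 0.06,
                           0.12, 0.03, 0.19, 0.09, 0.14))
    brks <- seq(0, 1, by = 0.05)

    # by: groups appear in the order they are first encountered in DT
    by_res <- DT[, .N, by = .(gr = cut(B, breaks = brks))]

    # keyby: same counts, but sorted by the factor 'gr' and keyed on it
    keyby_res <- DT[, .N, keyby = .(gr = cut(B, breaks = brks))]

    print(by_res)
    print(keyby_res)
    ```

    Because 'gr' is a factor, the sort follows the order of the intervals rather than the alphabetical order of their labels.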
    

    EDIT:

    Based on the update in the OP's post, we could subtract a small number from the seq breaks and use right = FALSE, while keeping the labels of the original intervals

    lvls <- levels(cut(DT$B, seq(0, 1, by = 0.05)))
    DT %>%
        group_by(gr = cut(B, breaks = seq(0, 1, by = 0.05) - .Machine$double.eps,
                          right = FALSE, labels = lvls)) %>%
        summarise(n = n()) %>%
        arrange(as.numeric(gr))
    #          gr n
    #1   (0,0.05] 2
    #2 (0.05,0.1] 2
    #3 (0.1,0.15] 3
    #4 (0.15,0.2] 2
    #5 (0.7,0.75] 1
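
    The OP's update is not reproduced here, so the following is only a sketch of how cut treats values that sit exactly on a break, using base R and a hypothetical vector `x`. It compares the default, include.lowest = TRUE, right = FALSE, and the shifted-breaks variant used above.

    ```r
    x    <- c(0, 0.02, 0.05)   # hypothetical values on or near the breaks
    brks <- seq(0, 1, by = 0.05)

    # Default: right-closed intervals (a, b], so 0 falls outside every bin (NA)
    default <- cut(x, brks)

    # include.lowest = TRUE closes the first interval on the left: [0, 0.05]
    incl <- cut(x, brks, include.lowest = TRUE)

    # right = FALSE gives left-closed intervals [a, b)
    left <- cut(x, brks, right = FALSE)

    # Variant from the answer: shifted breaks, left-closed, original labels
    shifted <- cut(x, brks - .Machine$double.eps, right = FALSE,
                   labels = levels(cut(x, brks)))

    print(data.frame(x, default, incl, left, shifted))
    ```

    With the shifted variant, 0 is reported under the label (0,0.05] and 0.05 under (0.05,0.1], so the reused labels no longer describe the actual intervals exactly. If only the value 0 needs to be counted, include.lowest = TRUE may be the simpler option.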
    