Grouping of R dataframe by connected values

臣服心动 2021-01-13 02:07

I couldn't find a solution for this common grouping problem in R.

This is my original dataset:

ID  State
1   A
2   A
3   B
4   B
5   B
6   A
7   A
8            


        
4 Answers
  •  南笙 (original poster)
    2021-01-13 02:32

    You could try:

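    library(dplyr)
    df %>%
      # start a new run id whenever State differs from the previous row
      mutate(rleid = cumsum(State != lag(State, default = ""))) %>%
      group_by(rleid) %>%
      # one row per run: its state plus the first and last ID
      summarise(State = first(State), min = min(ID), max = max(ID)) %>%
      select(-rleid)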
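
To see why the cumsum() trick produces one id per block of connected values, here is a minimal base R sketch of the same idea. The sample data is an assumption: the table in the question is cut off, so the values below are reconstructed from the output shown in this answer. The names is_new_run, run_id and result are illustrative only.

```r
# Sample data (assumed, reconstructed from the answer's output)
df <- data.frame(
  ID = 1:10,
  State = c("A", "A", "B", "B", "B", "A", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# TRUE wherever State differs from the previous row;
# the first row always starts a run
is_new_run <- c(TRUE, df$State[-1] != df$State[-nrow(df)])

# A running total of run starts gives one integer id per block
run_id <- cumsum(is_new_run)

# Collapse each run to its state and its first and last ID
result <- data.frame(
  State = df$State[is_new_run],
  min = as.vector(tapply(df$ID, run_id, min)),
  max = as.vector(tapply(df$ID, run_id, max))
)
print(result)
```

Because the comparison is made against the previous row only, the second block of "A" gets its own id instead of being merged with the first one, which is what a plain group_by(State) would do.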
    

    Or, as @alistaire mentioned in the comments, you can mutate within group_by() using the same syntax, which combines the first two steps. Borrowing data.table::rleid() and using summarise_all() simplifies this further:

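    # rleid() numbers each run of identical consecutive values;
    # group_by() can create that column inline
    # Note: funs() has been deprecated since dplyr 0.8.0
    df %>% 
      group_by(State, rleid = data.table::rleid(State)) %>% 
      summarise_all(funs(min, max)) %>% 
      select(-rleid)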
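
If data.table is not installed, base R's rle() can stand in for rleid(). The sketch below is only an illustration: the sample vector is assumed (taken from the output shown in this answer), and the names runs, run_index, first_row, last_row and run_summary are hypothetical.

```r
# Sample values (assumed, matching the output shown in the answer)
state <- c("A", "A", "B", "B", "B", "A", "A", "A", "C", "C")

# rle() returns the length and value of each run of equal consecutive elements
runs <- rle(state)

# Repeat each run's index by its length to get a per-row run id,
# which is the kind of vector rleid() is used for above
run_index <- rep(seq_along(runs$lengths), times = runs$lengths)

# Run boundaries expressed as row positions
last_row <- cumsum(runs$lengths)
first_row <- last_row - runs$lengths + 1

run_summary <- data.frame(State = runs$values, first_row, last_row)
print(run_summary)
```

Here the boundaries are row positions rather than ID values; they coincide only when ID runs from 1 to n without gaps, as in the assumed sample.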
    

    Which gives:

    # A tibble: 4 × 3
    #   State   min   max
    #1      A     1     2
    #2      B     3     5
    #3      A     6     8
    #4      C     9    10
    
