Is there a dplyr equivalent to data.table::rleid?

灰色年华 2020-11-22 13:16

data.table offers a nice convenience function, rleid, for run-length encoding:

library(data.table)
DT = data.table(grp=rep(c("A", "B", "C",
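
The definition above is cut off in this copy of the question. As a stand-in, the sketch below builds a table consistent with the outputs printed in the answers (ten rows, grp runs A, A, B, B, C, C, C, A, B, B, and value 1 to 10). The run values and lengths are an assumption inferred from those outputs, not the original call, and a plain data.frame is used so the snippet runs without extra packages.

```r
# Assumed reconstruction of the example data; not copied from the original post.
DT <- data.frame(
  grp   = rep(c("A", "B", "C", "A", "B"), times = c(2, 2, 3, 1, 2)),
  value = 1:10,
  stringsAsFactors = FALSE  # keep grp as character so rle() accepts it
)
DT
```

Passing the same two columns to data.table::data.table() instead should give the data.table form the question works with.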


        
4 Answers
  • 2020-11-22 13:48

    You can do it using the lag function from dplyr.

    DT <-
        DT %>%
        mutate(rleid = (grp != lag(grp, 1, default = "asdf"))) %>%
        mutate(rleid = cumsum(rleid))
    

    gives

    > DT
        grp value rleid
     1:   A     1     1
     2:   A     2     1
     3:   B     3     2
     4:   B     4     2
     5:   C     5     3
     6:   C     6     3
     7:   C     7     3
     8:   A     8     4
     9:   B     9     5
    10:   B    10     5
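
    The idea here is to flag every row whose value differs from the previous row and then take the cumulative sum of those flags. A minimal base R sketch of the same idea follows. It avoids the sentinel default, since "asdf" would make the ids start at 0 if it ever appeared as the first value of grp. `change_id` is a hypothetical helper name, and the sketch assumes the input contains no NA values.

```r
# Hypothetical helper: run ids via change detection plus cumsum().
change_id <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  # TRUE for the first element and wherever a value differs from its predecessor
  changed <- c(TRUE, x[-1L] != x[-n])
  cumsum(changed)
}

grp <- c("A", "A", "B", "B", "C", "C", "C", "A", "B", "B")
change_id(grp)
# 1 1 2 2 3 3 3 4 5 5
```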
    
  • 2020-11-22 13:55

    You can just do (when you have both data.table and dplyr loaded):

    DT <- DT %>% mutate(rlid = rleid(grp))
    

    this gives:

    > DT
        grp value rlid
     1:   A     1    1
     2:   A     2    1
     3:   B     3    2
     4:   B     4    2
     5:   C     5    3
     6:   C     6    3
     7:   C     7    3
     8:   A     8    4
     9:   B     9    5
    10:   B    10    5
    

    If you don't want to attach data.table, you can call the function through its namespace instead (as mentioned by @DavidArenburg in the comments):

    DT <- DT %>% mutate(rlid = data.table::rleid(grp))
    

    And, as @RichardScriven said in his comment, you can just copy/steal it:

    myrleid <- data.table::rleid
    
  • 2020-11-22 14:03

    A simplification (involving no additional package) of the approach used by the OP could be:

    DT %>%
     mutate(rleid = with(rle(grp), rep(seq_along(lengths), lengths)))
    
       grp value rleid
    1    A     1     1
    2    A     2     1
    3    B     3     2
    4    B     4     2
    5    C     5     3
    6    C     6     3
    7    C     7     3
    8    A     8     4
    9    B     9     5
    10   B    10     5
    

    Or:

    DT %>%
     mutate(rleid = rep(seq(ls <- rle(grp)$lengths), ls))
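
    Both variants rely on base R's rle(), which returns the length and the value of each run; repeating each run's index by its length yields the id column. A short sketch on a plain vector (the vector is assumed to match the grp column shown in the output above):

```r
grp <- c("A", "A", "B", "B", "C", "C", "C", "A", "B", "B")

runs <- rle(grp)
runs$values   # "A" "B" "C" "A" "B"
runs$lengths  # 2 2 3 1 2

# Repeat each run's index as many times as the run is long.
ids <- rep(seq_along(runs$lengths), times = runs$lengths)
ids
# 1 1 2 2 3 3 3 4 5 5

# inverse.rle() rebuilds the original vector from the run summary.
identical(inverse.rle(runs), grp)  # TRUE
```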
    
  • 2020-11-22 14:08

    If you want to use just base R and dplyr, a cleaner approach is to wrap your own one- or two-line version of rleid() in a function and apply it wherever you need it.

    library(dplyr)
    
    myrleid <- function(x) {
        x <- rle(x)$lengths
        rep(seq_along(x), times=x)
    }
    
    ## Try it out
    DT <- DT %>% mutate(rlid = myrleid(grp))
    DT
    #   grp value rlid
    # 1:   A     1    1
    # 2:   A     2    1
    # 3:   B     3    2
    # 4:   B     4    2
    # 5:   C     5    3
    # 6:   C     6    3
    # 7:   C     7    3
    # 8:   A     8    4
    # 9:   B     9    5
    #10:   B    10    5
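
    One caveat: base rle() only accepts atomic vectors, so it raises an error when the column is a factor. A sketch of a slightly more defensive wrapper follows, assuming that comparing factor labels as character strings is acceptable for your data. `myrleid_safe` is a hypothetical name.

```r
myrleid_safe <- function(x) {
  # rle() rejects factors, so compare their labels instead
  # (assumption: label equality is what defines a run).
  if (is.factor(x)) x <- as.character(x)
  len <- rle(x)$lengths
  rep(seq_along(len), times = len)
}

f <- factor(c("A", "A", "B", "B", "C", "C", "C", "A", "B", "B"))
myrleid_safe(f)
# 1 1 2 2 3 3 3 4 5 5
```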
    