Remove consecutive duplicates from dataframe

孤城傲影 2020-12-30 06:36

I have a data frame from which I want to remove consecutive duplicate rows (in base R). I know rle may be helpful here but can't think of how to use it. The exa

3 Answers
  • 2020-12-30 07:00

    Here is a fast solution using filter (base R's stats::filter):

    dat[(filter(dat,c(-1,1))!= 0)[,1],]
         v1   v2
    1     A  Jan
    3     E  May
    4     B  Feb
    7     A  Jan
    8     D  Apr
    10    A  Mar
    11    B  Feb
    12    E  May
    15    B  Feb
    18    C  Mar
    19    D  Apr
    NA <NA> <NA>
    

    The final row comes back as NA because it has no following row to compare with, so you need to add the last row of the original data to the result.
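
    As a rough sketch of that last step, the NA for the final row can be treated as TRUE before subsetting. The data frame below is hypothetical (the question's original data is not shown here), and the names changed and res are only illustrative. Note that the [, 1] in the expression means only the first column is compared.

    ```r
    # Hypothetical sample data, not the data from the question.
    dat <- data.frame(
      v1 = factor(c("A", "A", "E", "B", "B", "A")),
      v2 = factor(c("Jan", "Jan", "May", "Feb", "Feb", "Jan"))
    )

    # With c(-1, 1), stats::filter computes x[i] - x[i + 1] on the integer
    # codes of each column; the last element is NA (no following row).
    # [, 1] keeps only the result for the first column, v1.
    changed <- as.vector((stats::filter(dat, c(-1, 1)) != 0)[, 1])

    # Treat the NA for the final row as TRUE so the last row is always kept.
    changed[is.na(changed)] <- TRUE

    res <- dat[changed, ]
    print(res)
    ```

    This keeps the last row of each run of equal v1 values.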

  • 2020-12-30 07:01

    Here's a way, not with rle, but a way nonetheless:

    dat[with(dat, c(TRUE, diff(as.numeric(interaction(v1, v2))) != 0)), ]
    

    This assumes you're using factor columns, as your sample data implies.
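
    If the columns are character vectors rather than factors, a similar idea can be sketched by comparing each row with the one before it, column by column. The data and the names differs, keep and res below are hypothetical and only for illustration.

    ```r
    # Hypothetical sample data with character columns, not the question's data.
    dat <- data.frame(
      v1 = c("A", "A", "E", "B", "B", "A"),
      v2 = c("Jan", "Jan", "May", "Feb", "May", "Jan"),
      stringsAsFactors = FALSE
    )
    n <- nrow(dat)

    # TRUE where a row differs from the previous row in at least one column.
    differs <- dat$v1[-1] != dat$v1[-n] | dat$v2[-1] != dat$v2[-n]

    # The first row has no predecessor, so it is always kept.
    keep <- c(TRUE, differs)

    res <- dat[keep, ]
    print(res)
    ```

    Unlike the filter approach, this keeps the first row of each run and needs no special handling of the final row.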

  • 2020-12-30 07:09

    Using rle I came up with this:

    ind <- cumsum(rle(as.character(dat$v1))$lengths)
    dat[ind, ]
    

    ind holds the index of the last row in each run of consecutive equal v1 entries.

    EDIT:

    A simple solution to Matthew's comment would be

    dat[15, 2] <- "May"
    dat[cumsum(rle(paste0(dat$v1, dat$v2))$lengths), ]
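
    A small sketch of the same rle idea that also recovers the first row of each run. The data is hypothetical, and the "|" separator is an assumption that this character does not occur in the values; a separator guards against different pairs pasting to the same string (for example "AB" with "C" versus "A" with "BC").

    ```r
    # Hypothetical sample data, not the data from the question.
    dat <- data.frame(
      v1 = c("A", "A", "E", "B", "B", "A"),
      v2 = c("Jan", "Jan", "May", "Feb", "May", "Jan"),
      stringsAsFactors = FALSE
    )

    # One key per row; "|" is assumed not to appear in the data.
    key <- paste(dat$v1, dat$v2, sep = "|")
    runs <- rle(key)

    # cumsum of the run lengths gives the last row of each run;
    # subtracting the run length and adding 1 gives the first row.
    last_idx <- cumsum(runs$lengths)
    first_idx <- last_idx - runs$lengths + 1

    res_last <- dat[last_idx, ]
    res_first <- dat[first_idx, ]
    print(res_first)
    ```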
    