Grouping of R dataframe by connected values

臣服心动 2021-01-13 02:07

I couldn't find a solution to this common grouping problem in R.

This is my original dataset:

ID  State
1   A
2   A
3   B
4   B
5   B
6   A
7   A
8            


        
4 Answers
  •  失恋的感觉
    2021-01-13 02:48

    Here is a method that uses the rle function in base R for the data set you provided.

    # get the run length encoding
    temp <- rle(df$State)
    
    # construct the data.frame
    newDF <- data.frame(State=temp$values,
                        min.ID=c(1, head(cumsum(temp$lengths) + 1, -1)),
                        max.ID=cumsum(temp$lengths))
    

    which returns

    newDF
      State min.ID max.ID
    1     A      1      2
    2     B      3      5
    3     A      6      8
    4     C      9     10
    

    Note that rle needs an atomic vector such as a character vector and does not accept a factor, so the data below is read with as.is=TRUE to keep State as character.
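If State is already stored as a factor, one possible workaround is to convert it with as.character before calling rle. The following is a minimal sketch; the data frame df_factor and its values are made up for illustration and are not part of the original question.

```r
# Hypothetical data frame in which State is a factor (illustration only)
df_factor <- data.frame(ID = 1:5,
                        State = factor(c("A", "A", "B", "B", "A")))

# rle() rejects factors, so convert to character first
runs <- rle(as.character(df_factor$State))

# one entry per run of identical consecutive values
run_values  <- runs$values
run_lengths <- runs$lengths
```

Here run_values holds the state of each run and run_lengths holds how many consecutive rows each run covers.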


    As @cryo111 notes in the comments, the ID column might hold unordered timestamps, in which case the row positions calculated from rle do not correspond to the ID values. For this method to work, you would first need to convert the timestamps to a date-time class with a function like as.POSIXct, sort the rows with df <- df[order(df$ID),], and then use a slight alteration of the method above that indexes df$ID by those positions:

    # get the run length encoding
    temp <- rle(df$State)
    
    # construct the data.frame
    newDF <- data.frame(State=temp$values,
                        min.ID=df$ID[c(1, head(cumsum(temp$lengths) + 1, -1))],
                        max.ID=df$ID[cumsum(temp$lengths)])
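
The timestamp case can be sketched end to end as follows. This is only an illustration under assumptions: the data frame df_ts, its timestamp strings, and the UTC time zone are invented here and do not come from the original question.

```r
# Hypothetical data: ID holds timestamps as text, rows are out of order
df_ts <- data.frame(
  ID = c("2021-01-13 02:10:00", "2021-01-13 02:00:00",
         "2021-01-13 02:05:00", "2021-01-13 02:15:00"),
  State = c("B", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# convert to a date-time class, then sort so that runs follow time order
df_ts$ID <- as.POSIXct(df_ts$ID, tz = "UTC")
df_ts <- df_ts[order(df_ts$ID), ]

# run length encoding on the sorted states
runs <- rle(df_ts$State)
ends <- cumsum(runs$lengths)          # last row position of each run
starts <- c(1, head(ends + 1, -1))    # first row position of each run

# look up the actual timestamps at those positions
result <- data.frame(State = runs$values,
                     min.ID = df_ts$ID[starts],
                     max.ID = df_ts$ID[ends])
```

Because min.ID and max.ID are taken from df_ts$ID rather than from the row positions themselves, the result reports the first and last timestamp of each run.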
    

    data

    df <- read.table(header=TRUE, as.is=TRUE, text="ID  State
    1   A
    2   A
    3   B
    4   B
    5   B
    6   A
    7   A
    8   A
    9   C
    10  C")
    
