How to calculate time difference with previous row of a data.frame by group

心在旅途 2020-12-01 12:46

The problem I am trying to solve is that I have a data frame with a sorted POSIXct variable in it. Each row is categorized and I want to get the time differences between eac

2 Answers
  • 2020-12-01 13:10

    Try this:

    library(dplyr)
    df %>%
      arrange(category, randtime) %>%
      group_by(category) %>%
      # set the units explicitly: `-` picks them automatically per group,
      # so they can differ between groups and get mislabelled
      mutate(diff = difftime(randtime, lag(randtime), units = 'secs'),
             diff_secs = as.numeric(diff))
    
    #   category            randtime           diff diff_secs
    #     (fctr)              (time)         (dfft)     (dbl)
    # 1        A 2015-06-01 11:10:54        NA secs        NA
    # 2        A 2015-06-01 15:35:04 15850.027 secs 15850.027
    # 3        A 2015-06-01 17:01:22  5178.222 secs  5178.222
    # 4        B 2015-06-01 08:14:46        NA secs        NA
    # 5        B 2015-06-01 16:53:43 31137.323 secs 31137.323
    # 6        B 2015-06-01 17:37:48  2645.457 secs  2645.457
    

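    The first row of each group has no previous row, so its difference is NA. You may also want to add replace(is.na(.), 0) to the chain to turn those NA values into 0.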
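
    Another way to get 0 instead of NA in the first row of each group is to give lag() a default value. This is only a minimal sketch, assuming dplyr is installed; the data frame df_demo and its timestamps are made up for illustration and are not the asker's data:

```r
library(dplyr)

# Hypothetical sample data (not the data from the question)
df_demo <- data.frame(
  category = c("A", "A", "B", "B"),
  randtime = as.POSIXct(c("2015-06-01 11:00:00", "2015-06-01 11:30:00",
                          "2015-06-01 08:00:00", "2015-06-01 10:00:00"),
                        tz = "UTC")
)

res <- df_demo %>%
  arrange(category, randtime) %>%
  group_by(category) %>%
  # default = first(randtime): the first row of each group is compared
  # with itself, so its difference is 0 rather than NA
  mutate(diff_secs = as.numeric(difftime(randtime,
                                         lag(randtime, default = first(randtime)),
                                         units = "secs"))) %>%
  ungroup()

print(res)
```

    Here diff_secs is a plain numeric column, which avoids carrying difftime units around in later steps.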

  • 2020-12-01 13:22

    In base R you can use:

    # creating an ordered data.frame
    df <- data.frame(category, randtime)
    df <- df[order(df$category, df$randtime),]
    
    # calculating the time difference (0 for the first row of each category)
    # option 1:
    df$tdiff <- unlist(tapply(df$randtime, INDEX = df$category,
                              FUN = function(x) c(0, `units<-`(diff(x), "secs"))))
    # option 2:
    df$tdiff <- unlist(tapply(df$randtime, INDEX = df$category,
                              FUN = function(x) c(0, diff(as.numeric(x)))))
    

    which gives:

    > df
       category            randtime      tdiff
    6         A 2015-06-01 11:10:54     0.0000
    15        A 2015-06-01 15:35:04 15850.0271
    18        A 2015-06-01 17:01:22  5178.2223
    1         B 2015-06-01 08:14:46     0.0000
    17        B 2015-06-01 16:53:43 31137.3227
    19        B 2015-06-01 17:37:48  2645.4570
    3         C 2015-06-01 10:09:50     0.0000
    7         C 2015-06-01 12:46:40  9409.9693
    9         C 2015-06-01 13:56:29  4188.4578
    10        C 2015-06-01 14:24:18  1669.1326
    12        C 2015-06-01 14:54:25  1807.1447
    14        C 2015-06-01 15:05:07   641.7068
    2         D 2015-06-01 09:28:16     0.0000
    13        D 2015-06-01 14:55:40 19644.8313
    4         E 2015-06-01 10:18:58     0.0000
    5         E 2015-06-01 10:53:29  2071.2223
    8         E 2015-06-01 13:26:26  9176.6263
    11        E 2015-06-01 14:33:25  4019.0319
    16        E 2015-06-01 15:57:16  5031.4183
    20        E 2015-06-01 17:56:33  7156.8849
    

    If you want minutes or hours, you can use "mins" or "hours" instead of "secs".
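
    The unit switch can be sketched in base R as follows. The timestamps in x are hypothetical and only serve to show the conversion; the names tdiff_mins and tdiff_hours are likewise made up for this example:

```r
# Hypothetical timestamps for illustration
x <- as.POSIXct(c("2015-06-01 08:00:00", "2015-06-01 09:30:00",
                  "2015-06-01 12:30:00"), tz = "UTC")

d <- diff(x)         # difftime object; its units are chosen automatically
units(d) <- "mins"   # convert the stored values to minutes
tdiff_mins <- c(0, as.numeric(d))

# as.numeric() can also convert directly, whatever the current units are
tdiff_hours <- c(0, as.numeric(diff(x), units = "hours"))

print(tdiff_mins)
print(tdiff_hours)
```

    Converting to a plain number with explicit units is the safer choice when the result is stored in a column, because the automatically chosen units depend on the size of the differences.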


    An alternative with the data.table package:

    library(data.table)
    # creating an ordered/keyed data.table
    dt <- data.table(category, randtime, key = c("category", "randtime"))
    
    # calculating the time difference
    # option 1 (keeps the difftime class, as in the output below):
    dt[, tdiff := difftime(randtime, shift(randtime, fill=randtime[1L]), units="secs"), by=category]
    # option 2 (plain numeric seconds):
    dt[, tdiff := c(0, `units<-`(diff(randtime), "secs")), by = category]
    # option 3 (plain numeric seconds):
    dt[, tdiff := c(0, diff(as.numeric(randtime))), by = category]
    

    which results in:

    > dt
        category            randtime           tdiff
     1:        A 2015-06-01 11:10:54     0.0000 secs
     2:        A 2015-06-01 15:35:04 15850.0271 secs
     3:        A 2015-06-01 17:01:22  5178.2223 secs
     4:        B 2015-06-01 08:14:46     0.0000 secs
     5:        B 2015-06-01 16:53:43 31137.3227 secs
     6:        B 2015-06-01 17:37:48  2645.4570 secs
     7:        C 2015-06-01 10:09:50     0.0000 secs
     8:        C 2015-06-01 12:46:40  9409.9693 secs
     9:        C 2015-06-01 13:56:29  4188.4578 secs
    10:        C 2015-06-01 14:24:18  1669.1326 secs
    11:        C 2015-06-01 14:54:25  1807.1447 secs
    12:        C 2015-06-01 15:05:07   641.7068 secs
    13:        D 2015-06-01 09:28:16     0.0000 secs
    14:        D 2015-06-01 14:55:40 19644.8313 secs
    15:        E 2015-06-01 10:18:58     0.0000 secs
    16:        E 2015-06-01 10:53:29  2071.2223 secs
    17:        E 2015-06-01 13:26:26  9176.6263 secs
    18:        E 2015-06-01 14:33:25  4019.0319 secs
    19:        E 2015-06-01 15:57:16  5031.4183 secs
    20:        E 2015-06-01 17:56:33  7156.8849 secs
    