Time difference between rows in R dplyr, different units

吃可爱长大的小学妹 提交于 2019-12-31 01:12:11

问题


Here is my example. I am reading the following file: sample_data

library(dplyr)

txt <- c('"",  "MDN",                  "Cl_Date"',
          '"1",  "A",  "2017-04-15 15:10:42.510"',
          '"2",  "A",  "2017-04-01 14:47:23.210"',
          '"3",  "A",  "2017-04-01 14:49:54.063"',
          '"4",  "B",  "2017-04-30 13:25:00.000"',
          '"5",  "B",  "2017-04-03 17:53:13.217"',
          '"6",  "B",  "2017-04-15 15:17:43.780"')

ts <- read.csv(text = txt, as.is = TRUE)
ts$Cl_Date <- as.POSIXct(ts$Cl_Date)
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff = c(0,diff(Cl_Date)))
ts <-ts[order(ts$MDN, ts$Cl_Date),]

As a result I have

MDN Cl_Date         time_diff
A   4/1/2017 14:47  0
A   4/1/2017 14:49  2.514216665
A   4/15/2017 15:10 20180.80745
B   4/3/2017 17:53  0
B   4/15/2017 15:17 11.89202041
B   4/30/2017 13:25 14.92171551

So I group by MDN column and compute difference between Cl_Date column. As you can see sometime different in minutes (group A) and sometime difference in days (group B).

Why is time difference in different units and how to correct it?

P.S. I could not reproduce the same example with manual data.frame creation, so I had to read from file.

UPDATE 1 diff(ts$Cl_Date) seems to be consistent, everything is in minutes. Does something break within dplyr?

UPDATE 2

ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = Cl_Date-lag(Cl_Date))

produces the same result.


回答1:


ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = as.numeric(Cl_Date-lag(Cl_Date), units = 'mins'))

Convert the time difference to a numeric value. You can use units argument to make the return values consistent.




回答2:


According to @hadley here, the solution is to use lubridate instead of relying on base R.

This would be something like:

ts %>% 
  group_by(MDN) %>% 
  arrange(Cl_Date) %>%
  mutate(as.duration(Cl_Date %--% lag(Cl_Date)))


来源:https://stackoverflow.com/questions/44377637/time-difference-between-rows-in-r-dplyr-different-units

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!