Calculating differences of dates in hours between rows of a dataframe

Posted by 十年热恋 on 2019-12-12 06:56:43

Question


I have the following dataframe (ts1):

                D1 Diff
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01
3 20/11/2014 19:00 0.03

I would like to add a new column to ts1 containing the difference, in hours, between the D1 date-times of successive rows.

The new ts1 should be:

                D1 Diff N
1 20/11/2014 16:00 0.00 
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.03 2

For calculating the difference in hours independently I use:

library(lubridate)
difftime(dmy_hm("29/12/2014 11:00"), dmy_hm("29/12/2014 9:00"), units="hours") 

As far as I know, to calculate the difference between successive rows I need to convert ts1 into a matrix.

I use the following command:

> ts1$N<-difftime(dmy_hm(as.matrix(ts1$D1)), units="hours")

And I get:

Error in as.POSIXct(time2) : argument "time2" is missing, with no default

Answer 1:


Suppose ts1 is as shown in Note 2 at the end. Then create a POSIXct variable tt from D1, convert tt to numeric giving the number of seconds since the Epoch, divide that by 3600 to get the number of hours since the Epoch and take differences. No packages are used.

tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")
m <- transform(ts1, N = c(NA, diff(as.numeric(tt) / 3600)))

giving:

> m

                D1 Diff  N
1 20/11/2014 16:00 0.00 NA
2 20/11/2014 17:00 0.01  1
3 20/11/2014 19:00 0.03  2
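
As a side note on the error in the question: difftime() needs two time arguments, so calling it with a single vector fails. One way to use it row-wise is to pass the vector without its first element and the vector without its last element. The following is only a sketch: it rebuilds the Note 2 data inline so that it runs on its own, and tz = "UTC" is an assumption added here to sidestep daylight-saving effects.

```r
# Rebuild the sample data (same values as Note 2) so this block is self-contained.
ts1 <- data.frame(
  D1 = c("20/11/2014 16:00", "20/11/2014 17:00", "20/11/2014 19:00"),
  Diff = c(0.00, 0.01, 0.03),
  stringsAsFactors = FALSE
)

# Parse the date-times; tz = "UTC" is an assumption, not part of the original data.
tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M", tz = "UTC")

# difftime() takes two time vectors: each row's time minus the previous row's time.
d <- difftime(tt[-1], tt[-length(tt)], units = "hours")

# Pad with NA for the first row, which has no predecessor.
ts1$N <- c(NA, as.numeric(d))
ts1
```

This should give the same N column as the transform() call above; the only difference is that the units are requested from difftime() directly instead of dividing seconds by 3600.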

Note 1: I assume you are looking for N so that you can fill in the empty hours. In that case you don't really need N. Also, it would be easier to deal with time series if you use a time series representation. First we convert ts1 to a zoo object, then we create a zero width zoo object with the datetimes that we need and finally we merge them:

library(zoo)
z <- read.zoo(ts1, tz = "", format = "%d/%m/%Y %H:%M")

z0 <- zoo(, seq(start(z), end(z), "hours"))
zz <- merge(z, z0)

giving:

> zz
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00 
               0.00                0.01                  NA                0.03 

If you really did need a data frame back then:

DF <- fortify.zoo(zz)
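
If zoo is not available, roughly the same fill-in of missing hours can be sketched in base R by merging the observations onto a complete hourly grid. This is a swapped-in alternative, not the zoo approach above; the names grid, obs and filled, and tz = "UTC", are assumptions made for illustration.

```r
# Rebuild the sample data (same values as Note 2) so this block is self-contained.
ts1 <- data.frame(
  D1 = c("20/11/2014 16:00", "20/11/2014 17:00", "20/11/2014 19:00"),
  Diff = c(0.00, 0.01, 0.03),
  stringsAsFactors = FALSE
)

# Parse the date-times; tz = "UTC" is an assumption.
tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Complete hourly grid from the first to the last observed time.
grid <- data.frame(time = seq(min(tt), max(tt), by = "hour"))

# Observed values keyed by their parsed time.
obs <- data.frame(time = tt, Diff = ts1$Diff)

# Keep every grid row; hours without an observation get NA in Diff.
filled <- merge(grid, obs, by = "time", all.x = TRUE)

# Order by time explicitly so the result does not rely on merge()'s sorting.
filled <- filled[order(filled$time), ]
filled
```

The result has one row per hour between 16:00 and 19:00, with NA in Diff for the 18:00 row that has no observation.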

Note 2: Input used in reproducible form is:

Lines <- "D1,Diff
1,20/11/2014 16:00,0.00
2,20/11/2014 17:00,0.01
3,20/11/2014 19:00,0.03"

ts1 <- read.csv(text = Lines, as.is = TRUE)



Answer 2:


Thanks to @David Arenburg and @nicola, you can use either:

res <- diff(as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")) ; units(res) <- "hours" 

Or:

res <- diff(dmy_hm(ts1$D1)) ; units(res) <- "hours"  # dmy_hm() needs library(lubridate)

and afterwards:

ts1$N <- c(NA_real_, as.numeric(res))
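
One thing to watch: diff() on date-times returns a difftime whose units are chosen automatically from the data, so as.numeric(res) is in hours only if the units were set to hours first. A small sketch with hypothetical timestamps (not from the question) whose smallest gap is under an hour:

```r
# Hypothetical timestamps spaced 30 and 90 minutes apart; tz = "UTC" is an assumption.
x <- as.POSIXct(
  c("2014-11-20 16:00", "2014-11-20 16:30", "2014-11-20 18:00"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

res <- diff(x)

# The units are picked from the smallest gap, so here they are minutes, not hours.
units(res)

# Asking for hours explicitly avoids depending on the automatic choice.
hours <- as.numeric(res, units = "hours")
hours
```

Either setting units(res) <- "hours" before as.numeric(res), or passing units = "hours" to as.numeric() as here, keeps N in hours regardless of the spacing of the data.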


Source: https://stackoverflow.com/questions/34705674/calculating-differences-of-dates-in-hours-between-rows-of-a-dataframe
