Question
I have the following dataframe (ts1):
D1 Diff
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01
3 20/11/2014 19:00 0.03
I would like to add a new column to ts1 containing the difference in hours between the dates (D1) in successive rows.
The new ts1 should be:
D1 Diff N
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.03 2
To calculate the difference in hours between two individual dates, I use:
library(lubridate)
difftime(dmy_hm("29/12/2014 11:00"), dmy_hm("29/12/2014 9:00"), units="hours")
I believe that to calculate the difference between rows I need to convert ts1 into a matrix, so I use the following command:
> ts1$N<-difftime(dmy_hm(as.matrix(ts1$D1)), units="hours")
And I get:
Error in as.POSIXct(time2) : argument "time2" is missing, with no default
Answer 1:
Suppose ts1 is as shown in Note 2 at the end. Then create a POSIXct variable tt from D1, convert tt to numeric (giving the number of seconds since the Epoch), divide that by 3600 to get the number of hours since the Epoch, and take differences. No packages are used.
tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")
m <- transform(ts1, N = c(NA, diff(as.numeric(tt) / 3600)))
giving:
> m
D1 Diff N
1 20/11/2014 16:00 0.00 NA
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.03 2
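For comparison, the same column can be built with difftime itself, which needs two time vectors (the call in the question failed because only one was supplied). This is a minimal sketch rather than the only way to do it; the input is rebuilt inline so the snippet runs on its own, and tz = "UTC" is an assumption made here so that daylight saving changes cannot affect the differences:

```r
# Same data as in Note 2, rebuilt so the snippet is self-contained.
ts1 <- data.frame(
  D1 = c("20/11/2014 16:00", "20/11/2014 17:00", "20/11/2014 19:00"),
  Diff = c(0.00, 0.01, 0.03),
  stringsAsFactors = FALSE
)

# Parse the date-times; tz = "UTC" is an assumption, not part of the original data.
tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M", tz = "UTC")

# difftime() takes two arguments: each time from the 2nd on, and the one before it.
d <- difftime(tt[-1], tt[-length(tt)], units = "hours")

# The first row has no predecessor, so pad with NA.
ts1$N <- c(NA, as.numeric(d))
```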
Note 1: I assume you are looking for N so that you can fill in the empty hours. In that case you don't really need N. Also, it would be easier to deal with time series if you use a time series representation. First we convert ts1 to a zoo object, then we create a zero-width zoo object with the datetimes that we need, and finally we merge them:
library(zoo)
z <- read.zoo(ts1, tz = "", format = "%d/%m/%Y %H:%M")
z0 <- zoo(, seq(start(z), end(z), "hours"))
zz <- merge(z, z0)
giving:
> zz
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00
0.00 0.01 NA 0.03
If you really do need a data frame back, then:
DF <- fortify.zoo(zz)
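If zoo is not available, the same "fill in the missing hours" idea can be approximated in base R by merging against a complete hourly grid. This is only a sketch under assumptions: the names grid and full are illustrative, the input is rebuilt inline, and tz = "UTC" is chosen here rather than taken from the original data:

```r
# Same data as in Note 2, rebuilt so the snippet is self-contained.
ts1 <- data.frame(
  D1 = c("20/11/2014 16:00", "20/11/2014 17:00", "20/11/2014 19:00"),
  Diff = c(0.00, 0.01, 0.03),
  stringsAsFactors = FALSE
)

# Parse the date-times; tz = "UTC" is an assumption.
ts1$tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Complete hourly grid from the first to the last observation.
grid <- data.frame(tt = seq(min(ts1$tt), max(ts1$tt), by = "hour"))

# Left join on the time column: hours without an observation get NA in Diff.
full <- merge(grid, ts1[c("tt", "Diff")], by = "tt", all.x = TRUE)

# Make the chronological order explicit.
full <- full[order(full$tt), ]
```

Here full has one row per hour between 16:00 and 19:00, with Diff set to NA for the hour that has no observation (18:00).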
Note 2: Input used in reproducible form is:
Lines <- "D1,Diff
1,20/11/2014 16:00,0.00
2,20/11/2014 17:00,0.01
3,20/11/2014 19:00,0.03"
ts1 <- read.csv(text = Lines, as.is = TRUE)
Answer 2:
Thanks to @David Arenburg and @nicola, you can use either:
res <- diff(as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")) ; units(res) <- "hours"
Or:
res <- diff(dmy_hm(ts1$D1))  # dmy_hm() comes from lubridate: library(lubridate)
and afterwards:
ts1$N <- c(NA_real_, as.numeric(res, units = "hours"))  # request hours explicitly
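One detail worth checking when using diff() on date-times: the resulting difftime picks its units automatically from the smallest gap, so the plain numbers are not always hours. The sketch below uses made-up times (not the data from the question) with a 30-minute gap to illustrate this; tz = "UTC" is again an assumption:

```r
# Hypothetical times with a gap shorter than one hour.
tt <- as.POSIXct(
  c("20/11/2014 16:00", "20/11/2014 16:30", "20/11/2014 19:00"),
  format = "%d/%m/%Y %H:%M", tz = "UTC"
)

res <- diff(tt)

# Units are chosen automatically; here the smallest gap is 30 minutes.
auto_units <- units(res)

# Plain conversion returns values in whatever units were chosen.
raw <- as.numeric(res)

# Asking for hours explicitly gives a consistent result.
hours <- as.numeric(res, units = "hours")
```

With these times the automatic units are minutes, so raw holds minutes while hours holds the values in hours.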
Source: https://stackoverflow.com/questions/34705674/calculating-differences-of-dates-in-hours-between-rows-of-a-dataframe