Question
I am trying to calculate the difference in time between two columns; however, the second time passed to difftime is in the next row.
Sample data:
structure(list(code = c(10888, 10888, 10888, 10888, 10888, 10888,
10889, 10889, 10889, 10889, 10889, 10889, 10890, 10890, 10890
), station = c("F1", "F3", "F4", "F5", "L5", "L7", "F1", "F3",
"F4", "L5", "L6", "L7", "F1", "F3", "F5"), a = structure(c(1365895151,
1365969188, 1366105495, 1367433149, 1368005216, 1368011698, 1366244224,
1366414926, 1367513240, 1367790556, 1367946420, 1367923973, 1365896546,
1365907968, 1366144207), class = c("POSIXct", "POSIXt"), tzone = ""),
b = structure(c(1365895316, 1365976904, 1366105495, 1367436539,
1368005233, 1368033855, 1366244224, 1366415643, 1367513840,
1367915506, 1367946597, 1367954061, 1365897164, 1365907968,
1366157867), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("code",
"station", "a", "b"), row.names = 2:16, class = "data.frame")
I want to calculate the time between a value in column a and the value in column b on the following row (first row of a against second row of b, and so on).
difftime on the same row is easy enough:
difftime(test$a, test$b)
However, I am struggling to get the next row. I have tried:
difftime(test$a, c(test, b[seq_len(.N+1)])[])
and variations on this theme, but to no avail.
Finally, I would like the calculation to happen by code, so that the difftime is only calculated between rows with the same code, possibly using ddply or by=code.
Answer 1:
I think you are looking for something like:
difftime(head(test$a, -1), tail(test$b, -1))
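To see why this pairs row i of a with row i + 1 of b, here is a minimal sketch on toy data. The vectors a and b and their values are hypothetical, and units = "secs" is set only to make the numbers easy to read.

```r
# Toy data (hypothetical values): two POSIXct vectors of equal length.
origin <- as.POSIXct("2013-04-13 00:00:00", tz = "UTC")
a <- origin + c(0, 100, 200, 300)
b <- origin + c(10, 130, 260, 390)

# head(a, -1) drops the last element of a; tail(b, -1) drops the first
# element of b, so a[i] is compared with b[i + 1].
d <- difftime(head(a, -1), tail(b, -1), units = "secs")
as.numeric(d)
# -130 -160 -190
```

The result is one element shorter than the inputs, which is why the per-group version pads it with NA.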
Then, if you want to apply that idea to each code using plyr, like you suggested:
ddply(test, .(code), transform, diff = c(difftime(head(a, -1), tail(b, -1)), NA))
# code station a b diff
# 1 10888 F1 2013-04-13 19:19:11 2013-04-13 19:21:56 -22.709167
# 2 10888 F3 2013-04-14 15:53:08 2013-04-14 18:01:44 -37.863056
# 3 10888 F4 2013-04-16 05:44:55 2013-04-16 05:44:55 -369.734444
# 4 10888 F5 2013-05-01 14:32:29 2013-05-01 15:28:59 -158.912222
# 5 10888 L5 2013-05-08 05:26:56 2013-05-08 05:27:13 -7.955278
# 6 10888 L7 2013-05-08 07:14:58 2013-05-08 13:24:15 NA
# 7 10889 F1 2013-04-17 20:17:04 2013-04-17 20:17:04 -47.616389
# 8 10889 F3 2013-04-19 19:42:06 2013-04-19 19:54:03 -305.253889
# 9 10889 F4 2013-05-02 12:47:20 2013-05-02 12:57:20 -111.740556
# 10 10889 L5 2013-05-05 17:49:16 2013-05-07 04:31:46 -43.344722
# 11 10889 L6 2013-05-07 13:07:00 2013-05-07 13:09:57 -2.122500
# 12 10889 L7 2013-05-07 06:52:53 2013-05-07 15:14:21 NA
# 13 10890 F1 2013-04-13 19:42:26 2013-04-13 19:52:44 -3.172778
# 14 10890 F3 2013-04-13 22:52:48 2013-04-13 22:52:48 -69.416389
# 15 10890 F5 2013-04-16 16:30:07 2013-04-16 20:17:47 NA
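If plyr is not available, the same per-code result can be approximated in base R. This is only a sketch: it assumes the rows are already ordered so that each code forms one contiguous block, and the toy data frame and the column name diff are hypothetical. The idea is to compute the offset difference over the whole frame and then blank out every row whose next row belongs to a different code.

```r
# Toy data (hypothetical values), sorted so each code is contiguous.
origin <- as.POSIXct("2013-04-13 00:00:00", tz = "UTC")
test <- data.frame(
  code = c(1, 1, 1, 2, 2),
  a    = origin + c(0, 100, 200, 1000, 1100),
  b    = origin + c(10, 130, 260, 1005, 1150)
)

n <- nrow(test)

# a on row i minus b on row i + 1; the last row has no successor, so pad with NA.
test$diff <- c(as.numeric(difftime(test$a[-n], test$b[-1], units = "secs")), NA)

# TRUE where the next row has the same code; the last row is always FALSE.
same_code <- c(test$code[-n] == test$code[-1], FALSE)

# Discard differences that would span two different codes.
test$diff[!same_code] <- NA
test$diff
# -130 -160   NA -150   NA
```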
Answer 2:
Looks like you come from SAS-land, since .N has meaning there but none in base R (in R, .N is only defined inside a data.table call). If you want the difftime for offset vectors, you need to drop an element from the end of one vector and from the start of the other:
test$b[-nrow(test)] - test$a[-1]
# Since my session of R is locked up with a long computation,
# I am posting a simple example.
d$a
# 1 2 3 4 5 6 7
d$a[-1]
# 2 3 4 5 6 7
d$b
# 10 9 8 7 6 5 4
d$b[-nrow(d)]
# 10 9 8 7 6 5
Could also use:
d$b - c( NA, d$a[-1] ) # first element will be NA
> test$b - c( NA, test$a[-1] )
[1] NA "1969-12-31 18:08:36 PST"
[3] "1969-12-31 16:00:00 PST" "1969-12-31 16:56:30 PST"
[5] "1969-12-31 16:00:17 PST" "1969-12-31 22:09:17 PST"
[7] "1969-12-31 16:00:00 PST" "1969-12-31 16:11:57 PST"
[9] "1969-12-31 16:10:00 PST" "1970-01-02 02:42:30 PST"
[11] "1969-12-31 16:02:57 PST" "1970-01-01 00:21:28 PST"
[13] "1969-12-31 16:10:18 PST" "1969-12-31 16:00:00 PST"
[15] "1969-12-31 19:47:40 PST"
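Note that c(NA, test$a[-1]) only replaces the first element with NA, so the remaining elements stay on their original rows. Because the leading NA is logical, c() also returns plain numbers, which is why the result above prints as dates near the 1970 epoch rather than as time differences. A minimal sketch of an NA-padded version that does shift by one row and keeps the difftime class, using hypothetical toy vectors a and b:

```r
# Toy data (hypothetical values): two POSIXct vectors of equal length.
origin <- as.POSIXct("2013-04-13 00:00:00", tz = "UTC")
a <- origin + c(0, 100, 200, 300)
b <- origin + c(10, 130, 260, 390)

# Shift b up by one row: b_next[i] is b[i + 1], and the last slot is NA.
# Starting c() with a POSIXct value keeps the date-time class.
b_next <- c(b[-1], as.POSIXct(NA))

# Same length as the inputs; the last element is NA.
d <- difftime(a, b_next, units = "secs")
as.numeric(d)
# -130 -160 -190   NA
```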
Source: https://stackoverflow.com/questions/17015425/difftime-between-2-columns-and-next-row-within-a-variable