difftime between 2 columns and next.row within a variable

Posted by 我的未来我决定 on 2019-12-23 05:03:29

Question


I am trying to calculate the difference in time between two columns; however, time 2 in 'difftime' is in the next row.

sample data:    
structure(list(code = c(10888, 10888, 10888, 10888, 10888, 10888, 
10889, 10889, 10889, 10889, 10889, 10889, 10890, 10890, 10890
), station = c("F1", "F3", "F4", "F5", "L5", "L7", "F1", "F3", 
"F4", "L5", "L6", "L7", "F1", "F3", "F5"), a = structure(c(1365895151, 
1365969188, 1366105495, 1367433149, 1368005216, 1368011698, 1366244224, 
1366414926, 1367513240, 1367790556, 1367946420, 1367923973, 1365896546, 
1365907968, 1366144207), class = c("POSIXct", "POSIXt"), tzone = ""), 
b = structure(c(1365895316, 1365976904, 1366105495, 1367436539, 
1368005233, 1368033855, 1366244224, 1366415643, 1367513840, 
1367915506, 1367946597, 1367954061, 1365897164, 1365907968, 
1366157867), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("code", 
"station", "a", "b"), row.names = 2:16, class = "data.frame")

I want to calculate the time between the first row of column a and the second row of column b, then between the second row of a and the third row of b, and so on.

Difftime is easy enough:

difftime(test$a, test$b)

However, I am struggling to get the next row. I have tried:

difftime(test$a, c(test, b[seq_len(.N+1)])[])

and variations on this theme but to no avail.

Finally, I would like the calculation to happen by code, so that the difftime is only calculated between values with the same code, possibly using ddply or by=code.


Answer 1:


I think you are looking for something like:

difftime(head(test$a, -1), tail(test$b, -1))
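To make the pairing explicit, here is a minimal sketch on toy data. The vectors a and b below are hypothetical stand-ins, not the asker's data frame, and the units argument is set explicitly so the result does not depend on difftime's automatic unit choice.

```r
# Hypothetical toy timestamps (not the question's data)
a <- as.POSIXct(c("2013-04-13 10:00:00", "2013-04-13 11:00:00",
                  "2013-04-13 12:00:00"), tz = "UTC")
b <- as.POSIXct(c("2013-04-13 10:05:00", "2013-04-13 11:30:00",
                  "2013-04-13 12:45:00"), tz = "UTC")

# head(a, -1) drops the last a and tail(b, -1) drops the first b,
# so a[i] is paired with b[i + 1]
offset_diff <- difftime(head(a, -1), tail(b, -1), units = "mins")
as.numeric(offset_diff)
# -90 -105
```

The result is one element shorter than the inputs, which is why the grouped version pads each group with NA.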

Then if you want to apply that idea to each code and using plyr like you suggested:

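library(plyr)  # provides ddply() and .()
# units = "hours" keeps every group in the same unit instead of letting difftime choose
ddply(test, .(code), transform, diff = c(difftime(head(a, -1), tail(b, -1), units = "hours"), NA))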

#     code station                   a                   b        diff
# 1  10888      F1 2013-04-13 19:19:11 2013-04-13 19:21:56  -22.709167
# 2  10888      F3 2013-04-14 15:53:08 2013-04-14 18:01:44  -37.863056
# 3  10888      F4 2013-04-16 05:44:55 2013-04-16 05:44:55 -369.734444
# 4  10888      F5 2013-05-01 14:32:29 2013-05-01 15:28:59 -158.912222
# 5  10888      L5 2013-05-08 05:26:56 2013-05-08 05:27:13   -7.955278
# 6  10888      L7 2013-05-08 07:14:58 2013-05-08 13:24:15          NA
# 7  10889      F1 2013-04-17 20:17:04 2013-04-17 20:17:04  -47.616389
# 8  10889      F3 2013-04-19 19:42:06 2013-04-19 19:54:03 -305.253889
# 9  10889      F4 2013-05-02 12:47:20 2013-05-02 12:57:20 -111.740556
# 10 10889      L5 2013-05-05 17:49:16 2013-05-07 04:31:46  -43.344722
# 11 10889      L6 2013-05-07 13:07:00 2013-05-07 13:09:57   -2.122500
# 12 10889      L7 2013-05-07 06:52:53 2013-05-07 15:14:21          NA
# 13 10890      F1 2013-04-13 19:42:26 2013-04-13 19:52:44   -3.172778
# 14 10890      F3 2013-04-13 22:52:48 2013-04-13 22:52:48  -69.416389
# 15 10890      F5 2013-04-16 16:30:07 2013-04-16 20:17:47          NA
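If plyr is not available, the same per-code offset can probably be done in base R. The sketch below is an alternative, not the answerer's method: it uses ave() on a hypothetical data frame named toy, works on the underlying seconds, and converts to hours at the end.

```r
# Hypothetical toy data: two codes, timestamps built from seconds since the epoch
toy <- data.frame(
  code = c(1, 1, 1, 2, 2),
  a = as.POSIXct(c(0, 3600, 7200, 0, 1800), origin = "1970-01-01", tz = "UTC"),
  b = as.POSIXct(c(600, 1800, 9000, 900, 5400), origin = "1970-01-01", tz = "UTC")
)

# Within each code, shift b up by one row and pad the group's last row with NA
next_b <- ave(as.numeric(toy$b), toy$code, FUN = function(x) c(x[-1], NA))

# a[i] - b[i + 1] in hours; NA on the last row of each code
toy$diff_hours <- (as.numeric(toy$a) - next_b) / 3600
toy$diff_hours
# -0.5 -1.5 NA -1.5 NA
```

Working in plain seconds sidesteps the question of which unit difftime picks for each group.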



Answer 2:


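Looks like you come from SAS-land since .N has meaning there, but it has no meaning in base R (in R, .N is only defined inside a data.table's [ ] call). If you want to get the difftime for offset vectors, you will need to shorten the vectors so they line up: drop the last element of one and the first element of the other: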

  test$b[-nrow(test)] - test$a[-1]

 # Since my session of R is locked up with a long computation,
 # I am posting a simple example.
 d$a
 # 1 2 3 4 5 6 7
 d$a[-1]
 # 2 3 4 5 6 7
 d$b
 # 10 9 8 7 6 5 4
 d$b[-nrow(d)]   # nrow() needs the data frame; nrow(d$b) is NULL for a plain vector
 # 10 9 8 7 6 5

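Could also pad with NA so the result keeps one value per row:

 d$b - c( d$a[-1], NA ) # last element will be NA

Note that the NA has to go at the end. Putting it first, as in the call below, leaves every element of a aligned with its own row and makes c() drop the POSIXct class, so the result is same-row differences printed as date-times near the epoch rather than time differences:

> test$b - c( NA, test$a[-1] )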
 [1] NA                        "1969-12-31 18:08:36 PST"
 [3] "1969-12-31 16:00:00 PST" "1969-12-31 16:56:30 PST"
 [5] "1969-12-31 16:00:17 PST" "1969-12-31 22:09:17 PST"
 [7] "1969-12-31 16:00:00 PST" "1969-12-31 16:11:57 PST"
 [9] "1969-12-31 16:10:00 PST" "1970-01-02 02:42:30 PST"
[11] "1969-12-31 16:02:57 PST" "1970-01-01 00:21:28 PST"
[13] "1969-12-31 16:10:18 PST" "1969-12-31 16:00:00 PST"
[15] "1969-12-31 19:47:40 PST"
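The effect of the NA position can be checked on a small example. This is a sketch with hypothetical vectors x and y; it relies on c() dispatching on its first argument, so a leading NA selects the default method and the date-time class is lost.

```r
# Hypothetical toy timestamps, given as seconds since the epoch
x <- as.POSIXct(c(0, 60, 120), origin = "1970-01-01", tz = "UTC")
y <- as.POSIXct(c(30, 100, 200), origin = "1970-01-01", tz = "UTC")

class(c(NA, x))   # NA first: default c() is used, result is plain numeric
class(c(x, NA))   # POSIXct first: c.POSIXct keeps the date-time class

# y[i] - x[i + 1], padded to full length, as a difftime in seconds
lead_diff <- difftime(y, c(x[-1], NA), units = "secs")
as.numeric(lead_diff)
# -30 -20 NA
```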


Source: https://stackoverflow.com/questions/17015425/difftime-between-2-columns-and-next-row-within-a-variable
