Time lag between sequential observations giving negative values

Submitted by 谁说我不能喝 on 2019-12-12 22:26:30

Question


I am trying to calculate the time between sequential observations. A sample of my data is included below.

A subset of my data looks like:

head(d1) #visualize the first few lines of the data
date       time   year    km sps      pp datetime          next timedif seque
<fct>      <fct> <int> <dbl> <fct> <dbl> <chr>            <dbl>   <dbl> <fct>
2012/06/21 23:23  2012    80 MUXX      1 2012-06-21 23:23     0    4144 10   
2012/07/15 11:38  2012    80 MAMO      0 2012-07-15 11:38     1   33855 01   
2012/07/20 22:19  2012    80 MICRO     0 2012-07-20 22:19     0    7841 00   
2012/07/29 23:03  2012    80 MICRO     0 2012-07-29 23:03     0   13004 00   
2012/10/18 2:54   2012    80 MICRO     0 2012-10-18 02:54     0    -971 00   
2012/10/23 2:49   2012    80 MICRO     0 2012-10-23 02:49     0   -1094 00   

Where:

  • pp: whether the species (sps) is a predator (coded as 1) or prey (coded as 0)
  • next: the pp value of the very next observation after the current one
  • timedif: time difference between the current observation and the next one
  • seque: the sequence order, where the first digit is the current pp and the second digit is the next pp
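To make these column definitions concrete, here is a minimal sketch on a made-up vector, assuming `dplyr` is installed. The column name `next_pp` is hypothetical; it avoids `next`, which is a reserved word in R and can only be used as a column name inside backticks.

```r
library(dplyr)

# Made-up predator (1) / prey (0) codes, already in chronological order
toy <- tibble(pp = c(1, 0, 0, 1))

toy <- toy %>%
  mutate(
    next_pp = lead(pp),                       # pp of the following row; NA for the last row
    seque   = if_else(is.na(next_pp),         # no successor -> no sequence code
                      NA_character_,
                      paste0(pp, next_pp))    # e.g. "10" = predator followed by prey
  )

print(toy$seque)  # "10" "00" "01" NA
```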

To generate the datetime column, I did this:

d1$datetime=strftime(paste(d1$date,d1$time),'%Y-%m-%d %H:%M',usetz=FALSE) #converting the date/time into a new format
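`strftime()` returns a character vector (the `str()` output further down shows `datetime` as `chr`), so the column has to be re-parsed with `as.POSIXct()` every time it is used. A sketch that parses `date` and `time` once into a `POSIXct` column, assuming the `YYYY/MM/DD` and `H:MM` layouts seen in the sample and assuming UTC (substitute the real time zone; in a zone with daylight saving, gaps that cross a clock change differ by an hour). The two-row data frame is made up.

```r
# Made-up rows mimicking the factor columns `date` and `time` in d1
d1 <- data.frame(
  date = factor(c("2012/06/21", "2012/10/18")),
  time = factor(c("23:23", "2:54"))
)

# paste() turns the factors into their labels; %H also accepts a
# single-digit hour such as "2:54" when parsing
d1$datetime <- as.POSIXct(paste(d1$date, d1$time),
                          format = "%Y/%m/%d %H:%M", tz = "UTC")

str(d1$datetime)
```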

To make the other columns I used the following code:

d1 = d1 %>% 
    ungroup() %>% 
    group_by(km, year) %>%
    mutate(next = dplyr::lag(pp)) %>% 
    mutate(timedif = as.numeric(as.POSIXct(datetime) - lag(as.POSIXct(datetime))))
d1 = d1[2:nrow(d1),] %>% mutate(seque = as.factor(paste0(pp,prev)))
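A sketch of the same pipeline with the changes that usually address these two symptoms: `arrange()` so rows are in chronological order within each `km`/`year` group before any shifting, `lead()` instead of `lag()` so each row gets the following observation, and `difftime()` with explicit `units` so the numbers are always minutes (plain subtraction of `POSIXct` values chooses its units automatically, and `as.numeric()` then silently drops them). The column names `next_pp` and `timedif_min`, the sample rows, and the UTC time zone are assumptions for illustration.

```r
library(dplyr)

# Made-up rows, deliberately out of chronological order
d1 <- tibble(
  km   = 80,
  year = 2012L,
  pp   = c(0, 1, 0),
  datetime = as.POSIXct(c("2012-07-15 11:38", "2012-06-21 23:23", "2012-07-20 22:19"),
                        format = "%Y-%m-%d %H:%M", tz = "UTC")
)

d1 <- d1 %>%
  group_by(km, year) %>%
  arrange(datetime, .by_group = TRUE) %>%            # chronological order within each group
  mutate(
    next_pp     = lead(pp),                          # pp of the NEXT observation
    timedif_min = as.numeric(difftime(lead(datetime), datetime,
                                      units = "mins")),  # minutes until the next observation
    seque       = if_else(is.na(next_pp), NA_character_,
                          paste0(pp, next_pp))        # current pp followed by next pp
  ) %>%
  ungroup() %>%
  filter(!is.na(next_pp))                            # last row of each group has no successor

print(d1)
```

`lead()` inside a grouped `mutate()` never reaches across from one `km`/`year` group into another, so the last row of each group (which has no successor) is filtered out explicitly, rather than dropping only the first row of the whole table with `d1[2:nrow(d1),]`.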

I have two questions:

  1. My lag function appears to be recording the previous pp event, not the next pp event. How do I fix this?
  2. My timedif calculation is giving me negative values, which shouldn't be possible. Why is that happening?
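Both symptoms can be reproduced on made-up vectors. This sketch, assuming `dplyr`, shows that `lag()` gives each element the previous value while `lead()` gives it the next one, and that subtracting a lagged timestamp yields a negative gap whenever a later observation appears before an earlier one. Unsorted rows are one plausible cause of the negative `timedif` values, not a confirmed diagnosis for this dataset.

```r
library(dplyr)

pp <- c(1, 0, 0)
lag(pp)   # previous value: NA 1 0
lead(pp)  # next value:     0 0 NA

# Made-up timestamps that are NOT in chronological order
stamps <- as.POSIXct(c("2012-07-20 22:19", "2012-07-15 11:38", "2012-07-29 23:03"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")

# Gap to the previous row, in minutes: NA -7841 20845
gap <- as.numeric(difftime(stamps, lag(stamps), units = "mins"))

# After sorting, every gap is non-negative: NA 7841 13004
sorted <- sort(stamps)
gap_sorted <- as.numeric(difftime(sorted, lag(sorted), units = "mins"))
```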

For reference, here is the output of str(d1):

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   96 obs. of  10 variables:
 $ date    : Factor w/ 1093 levels "2012/05/30","2012/05/31",..: 23 47 52 61 71 76 76 88 90 98 ...
 $ time    : Factor w/ 1439 levels "0:00","0:01",..: 983 219 919 963 1016 5 47 52 923 1058 ...
 $ year    : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 $ km      : num  80 80 80 80 80 80 80 80 80 80 ...
 $ sps     : Factor w/ 17 levels "CACA","ERDO",..: 11 7 9 9 9 9 9 4 9 11 ...
 $ pp      : num  1 0 0 0 0 0 0 0 0 1 ...
 $ datetime: chr  "2012-06-21 23:23" "2012-07-15 11:38" "2012-07-20 22:19" "2012-07-29 23:03" ...
 $ next    : num  0 1 0 0 0 0 0 0 0 0 ...
 $ timedif : num  4144 33855 7841 13004 14453 ...
 $ seque   : Factor w/ 4 levels "00","01","10",..: 3 2 1 1 1 1 1 1 1 3 ...

And also:

dput(d1[1:10,])

structure(list(
date = structure(c(23L, 47L, 52L, 61L, 71L, 76L, 76L, 88L, 90L, 98L), 
.Label = c("2012/05/30", "2012/05/31", "2012/06/01", "2012/06/02", "2012/06/03", "2012/06/04", "2012/06/05",  "2013/06/18", "2013/06/19", "2013/06/20", "2013/06/21", "2013/06/22", "2014/07/19", "2014/07/20", "2014/07/21", "2014/07/22", "2014/07/23", "2015/08/06", "2015/08/07", "2015/08/08", "2015/08/09", "2015/08/10"), 
class = "factor"), 
time = structure(c(983L, 219L, 919L, 963L, 1016L, 5L, 47L, 52L, 923L, 1058L), 
.Label = c("0:00", "0:01", "0:02", "0:03", "0:04", "0:05", "0:06", "0:07", "0:33","0:34", "0:35", "0:36", "0:37","10:06", "10:07", "10:08", "10:09", "10:10", "10:11", "10:12", "10:13",  "2:05", "2:06", "2:07", "2:08", "2:09", "2:10", "2:11", "9:54", "9:55", "9:56", "9:57", "9:58", "9:59"), 
class = "factor"), 
year = c(2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L), 
km = c(80, 80, 80, 80, 80, 80, 80, 80, 80, 80), 
sps = structure(c(11L, 7L, 9L, 9L, 9L, 9L, 9L, 4L, 9L, 11L), 
.Label = c("CACA", "ERDO", "FEDO", "LEAM", "LOCA", "MAAM", "MAMO", "MEME", "MICRO", "MUVI", "MUXX", "ONZI", "PRLO", "TAHU", "TAST", "URAM", "VUVU"), 
class = "factor"),  
pp = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 1), 
datetime = c("2012-06-21 23:23", "2012-07-15 11:38", "2012-07-20 22:19", "2012-07-29 23:03", "2012-08-08 23:56", "2012-08-13 00:04", "2012-08-13 00:46", "2012-08-25 00:51", "2012-08-27 22:23", "2012-09-04 03:38"), 
prev = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0), 
timedif = c(4144, 33855, 7841, 13004, 14453, 5768, 42, 17285, 4172, 10395),  
seque = structure(c(3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("00", "01", "10", "11"), 
class = "factor")), 
class = c("tbl_df", "tbl", "data.frame"), 
row.names = c(NA, -10L))

Source: https://stackoverflow.com/questions/56171163/time-lag-between-sequential-observations-giving-negative-values
