Question
I am trying to calculate the time between sequential observations. I have attached a sample of my data here.
A subset of my data looks like this:
head(d1) #visualize the first few lines of the data
date       time   year    km sps      pp datetime          next timedif seque
<fct>      <fct> <int> <dbl> <fct> <dbl> <chr>            <dbl>   <dbl> <fct>
2012/06/21 23:23  2012    80 MUXX      1 2012-06-21 23:23     0    4144 10
2012/07/15 11:38  2012    80 MAMO      0 2012-07-15 11:38     1   33855 01
2012/07/20 22:19  2012    80 MICRO     0 2012-07-20 22:19     0    7841 00
2012/07/29 23:03  2012    80 MICRO     0 2012-07-29 23:03     0   13004 00
2012/10/18 2:54   2012    80 MICRO     0 2012-10-18 02:54     0    -971 00
2012/10/23 2:49   2012    80 MICRO     0 2012-10-23 02:49     0   -1094 00
Where:
- pp: whether the species (sps) is a predator (coded as 1) or prey (coded as 0)
- next: the pp of the very next observation after the current one
- timedif: the time difference between the current observation and the next one
- seque: the sequence order, where the first digit is the current pp and the second digit is the next pp
To generate the datetime column, I did this:
d1$datetime = strftime(paste(d1$date, d1$time), '%Y-%m-%d %H:%M', usetz = FALSE) # combine date and time into one "YYYY-MM-DD HH:MM" character string
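Note that strftime() returns a character vector, which is why datetime shows up as chr in the str() output below and has to be converted back with as.POSIXct() each time it is used. A minimal sketch of parsing the date and time columns straight into POSIXct instead, assuming all timestamps share one time zone (tz = "UTC" here is a placeholder assumption, not something stated in the data), and fixing the difference units explicitly with difftime():

```r
# First three rows of the question's data, as character strings
# (paste() also accepts the original factor columns and uses their labels)
date <- c("2012/06/21", "2012/07/15", "2012/07/20")
time <- c("23:23", "11:38", "22:19")

# Parse directly to POSIXct. tz = "UTC" is an assumption: the real time zone
# is not stated, and UTC avoids daylight-saving jumps in the differences.
datetime <- as.POSIXct(paste(date, time), format = "%Y/%m/%d %H:%M", tz = "UTC")

# Gap from each observation to the following one, with units fixed to minutes
gaps <- as.numeric(difftime(datetime[-1], datetime[-length(datetime)], units = "mins"))
print(gaps)  # gaps of 33855 and 7841 minutes
```

These two values match the timedif entries of the second and third rows in the dput output, which suggests the original timedif column is in minutes.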
To make the other columns I used the following code:
d1 = d1 %>%
  ungroup() %>%
  group_by(km, year) %>%
  mutate(prev = dplyr::lag(pp)) %>% # `next` is a reserved word in R, so this column is named prev (as in the dput below)
  mutate(timedif = as.numeric(as.POSIXct(datetime) - lag(as.POSIXct(datetime))))
d1 = d1[2:nrow(d1),] %>% mutate(seque = as.factor(paste0(pp, prev)))
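For reference, dplyr::lag() takes each value from the row above, while dplyr::lead() takes it from the row below. A small sketch on the pp values of the first ten rows of the dput, assuming dplyr is installed (the code above already uses it); next_pp and the choice to leave seque as NA on the last row, which has no following observation, are illustrative assumptions:

```r
library(dplyr)

# pp values of the first ten rows in the question's dput
pp <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 1)

prev_pp <- dplyr::lag(pp)   # value from the row above:  NA, 1, 0, ..., 0
next_pp <- dplyr::lead(pp)  # value from the row below:  0, 0, ..., 1, NA

# Current pp first, following pp second; NA where there is no next observation
seque <- ifelse(is.na(next_pp), NA_character_, paste0(pp, next_pp))
print(seque)
```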
I have two questions:
- My lag function appears to be recording the previous pp event, not the next pp event. How do I fix this?
- My timedif calculation is giving me negative values, which shouldn't be possible. Why is that happening?
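One plausible cause of the negative values is that rows within a km/year group are not in chronological order, so lag() compares each observation with whichever row happens to sit above it rather than with the one that came earlier in time. Separately, subtracting two POSIXct vectors lets R choose the units automatically (secs, mins, hours or days, based on the smallest gap), and inside a grouped mutate() that choice is made per group, so as.numeric() can mix units across groups. A minimal sketch that sorts within each group, uses lead() for the following observation, and fixes the units to minutes; the rows come from the dput but are shuffled on purpose, and next_pp is a hypothetical column name:

```r
library(dplyr)

# Four rows from the question's dput, deliberately out of chronological order
d1 <- data.frame(
  km       = 80,
  year     = 2012L,
  pp       = c(0, 1, 0, 0),
  datetime = c("2012-07-15 11:38", "2012-06-21 23:23",
               "2012-07-29 23:03", "2012-07-20 22:19"),
  stringsAsFactors = FALSE
)

d2 <- d1 %>%
  mutate(datetime = as.POSIXct(datetime, format = "%Y-%m-%d %H:%M", tz = "UTC")) %>%
  group_by(km, year) %>%
  arrange(datetime, .by_group = TRUE) %>%  # chronological order inside each group
  mutate(
    next_pp = lead(pp),                    # pp of the following observation
    timedif = as.numeric(difftime(lead(datetime), datetime, units = "mins")),
    seque   = paste0(pp, next_pp)          # current pp, then next pp
  ) %>%
  filter(!is.na(next_pp)) %>%              # last row of each group has no successor
  ungroup()

print(d2)
```

Dropping the rows where next_pp is NA removes the last observation of every group, whereas d1[2:nrow(d1),] only removes the first row of the whole data set.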
Just in case, here is the output of str(d1):
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 96 obs. of 10 variables:
$ date : Factor w/ 1093 levels "2012/05/30","2012/05/31",..: 23 47 52 61 71 76 76 88 90 98 ...
$ time : Factor w/ 1439 levels "0:00","0:01",..: 983 219 919 963 1016 5 47 52 923 1058 ...
$ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
$ km : num 80 80 80 80 80 80 80 80 80 80 ...
$ sps : Factor w/ 17 levels "CACA","ERDO",..: 11 7 9 9 9 9 9 4 9 11 ...
$ pp : num 1 0 0 0 0 0 0 0 0 1 ...
$ datetime: chr "2012-06-21 23:23" "2012-07-15 11:38" "2012-07-20 22:19" "2012-07-29 23:03" ...
$ next : num 0 1 0 0 0 0 0 0 0 0 ...
$ timedif : num 4144 33855 7841 13004 14453 ...
$ seque : Factor w/ 4 levels "00","01","10",..: 3 2 1 1 1 1 1 1 1 3 ...
And also:
dput(d1[1:10,])
structure(list(
date = structure(c(23L, 47L, 52L, 61L, 71L, 76L, 76L, 88L, 90L, 98L),
.Label = c("2012/05/30", "2012/05/31", "2012/06/01", "2012/06/02", "2012/06/03", "2012/06/04", "2012/06/05", "2013/06/18", "2013/06/19", "2013/06/20", "2013/06/21", "2013/06/22", "2014/07/19", "2014/07/20", "2014/07/21", "2014/07/22", "2014/07/23", "2015/08/06", "2015/08/07", "2015/08/08", "2015/08/09", "2015/08/10"),
class = "factor"),
time = structure(c(983L, 219L, 919L, 963L, 1016L, 5L, 47L, 52L, 923L, 1058L),
.Label = c("0:00", "0:01", "0:02", "0:03", "0:04", "0:05", "0:06", "0:07", "0:33","0:34", "0:35", "0:36", "0:37","10:06", "10:07", "10:08", "10:09", "10:10", "10:11", "10:12", "10:13", "2:05", "2:06", "2:07", "2:08", "2:09", "2:10", "2:11", "9:54", "9:55", "9:56", "9:57", "9:58", "9:59"),
class = "factor"),
year = c(2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L),
km = c(80, 80, 80, 80, 80, 80, 80, 80, 80, 80),
sps = structure(c(11L, 7L, 9L, 9L, 9L, 9L, 9L, 4L, 9L, 11L),
.Label = c("CACA", "ERDO", "FEDO", "LEAM", "LOCA", "MAAM", "MAMO", "MEME", "MICRO", "MUVI", "MUXX", "ONZI", "PRLO", "TAHU", "TAST", "URAM", "VUVU"),
class = "factor"),
pp = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 1),
datetime = c("2012-06-21 23:23", "2012-07-15 11:38", "2012-07-20 22:19", "2012-07-29 23:03", "2012-08-08 23:56", "2012-08-13 00:04", "2012-08-13 00:46", "2012-08-25 00:51", "2012-08-27 22:23", "2012-09-04 03:38"),
prev = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
timedif = c(4144, 33855, 7841, 13004, 14453, 5768, 42, 17285, 4172, 10395),
seque = structure(c(3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("00", "01", "10", "11"),
class = "factor")),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -10L))
Source: https://stackoverflow.com/questions/56171163/time-lag-between-sequential-observations-giving-negative-values