This is similar to this dplyr lag post, and this dplyr mutate lag post, but neither of those asks this question about defaulting to the input value. I am using dplyr to muta
In the OP's code ...
... d) group_by(ip) %>% e) mutate(shifted = dplyr::lag(fulldate, default=fulldate)) %>% ...
The default= argument should have a length of one, but the OP's code passes the whole fulldate vector. Replacing it with default = first(fulldate) should work in this case, since the first element of each group has no previous value and so is the only place where the default is applied.
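To make the point about the length-one default concrete, here is a minimal sketch on toy data. The vector x and the data frame df (with columns id and t) are made up for illustration and are not from the OP's data.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy vector (illustrative only).
x <- c(10, 20, 30)

# Without a default, the first element has nothing before it and becomes NA.
plain_lag <- dplyr::lag(x)

# With a length-one default, that first slot is filled with the first value.
filled_lag <- dplyr::lag(x, default = first(x))

# The same idea per group: first(t) is evaluated within each id,
# so every group falls back to its own first value.
df <- data.frame(id = c("a", "a", "b"), t = c(1, 5, 7))
out <- df %>%
  group_by(id) %>%
  mutate(prev = dplyr::lag(t, default = first(t))) %>%
  ungroup()
```

Here filled_lag is 10 10 20 and out$prev is 1 1 7, so subtracting the lagged column from the original gives 0 for the first row of each group.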
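Related cases:
For a lead, use dplyr::lead(x, default=last(x)).
For a shift of more than one position (n greater than 1), default= cannot do it, because a single value cannot supply a different fallback for each of the first n positions, and we'd probably need to switch to if_else or case_when or similar. (I'm not sure about the current tidyverse idiom.)

How about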
ifelse(is.na(lag(value)), value, lag(value))
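The related cases and the ifelse idea above can be sketched roughly as follows. The names x, n, and stamps are hypothetical, and using if_else with seq_along is just one possible way to handle n greater than 1, not necessarily the current tidyverse idiom. One caveat: base ifelse() drops attributes such as the date-time class, so on a POSIXct column dplyr::if_else() is probably the safer choice. Also, testing is.na(lag(value)) will replace genuine missing values too, not only the leading ones.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy vector (illustrative only).
x <- c(10, 20, 30, 40)

# Lead: the last element has nothing after it, so default to last(x).
led <- dplyr::lead(x, default = last(x))

# Lag by n > 1: the first n positions have no lagged value,
# so fall back to the original value at those positions.
n <- 2
lag2 <- if_else(seq_along(x) <= n, x, dplyr::lag(x, n = n))

# Date-times: if_else() keeps the POSIXct class, base ifelse() does not.
stamps <- as.POSIXct(c("2017-07-07 00:00:00", "2017-07-07 00:15:00"), tz = "UTC")
kept    <- if_else(is.na(dplyr::lag(stamps)), stamps, dplyr::lag(stamps))
dropped <- ifelse(is.na(dplyr::lag(stamps)), stamps, dplyr::lag(stamps))
```

With this input, led is 20 30 40 40 and lag2 is 10 20 10 20. kept is still a POSIXct vector, while dropped comes back as plain numbers (seconds since the epoch).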
I think Frank's solution works pretty well. Here is the complete example:
library(dplyr, warn.conflicts = FALSE)

test <- data.frame(
  ip   = c("192.168.1.2", "192.168.1.2", "192.168.1.4", "192.168.1.4", "192.168.1.4", "192.168.1.7"),
  hour = c(2017070700, 2017070700, 2017070700, 2017070701, 2017070702, 2017070700),
  snap = c(0, 15, 0, 45, 30, 15)
)

test %>%
  # zero-pad the minutes so that 0 becomes "00"
  mutate(snap = formatC(snap, width = 2, flag = "0")) %>%
  mutate(fulldateint = paste(hour, snap, sep = "")) %>%
  mutate(fulldate = as.POSIXct(strptime(fulldateint, "%Y%m%d%H%M"))) %>%
  group_by(ip) %>%
  # the first row of each ip has no previous value, so fall back to its own first timestamp
  mutate(shifted = lag(fulldate, default = first(fulldate))) %>%
  mutate(diff = fulldate - shifted) %>%
  ungroup() %>%
  select(ip, fulldate, shifted, diff)
#> # A tibble: 6 x 4
#>            ip            fulldate             shifted      diff
#>        <fctr>              <dttm>              <dttm>    <time>
#> 1 192.168.1.2 2017-07-07 00:00:00 2017-07-07 00:00:00    0 secs
#> 2 192.168.1.2 2017-07-07 00:15:00 2017-07-07 00:00:00  900 secs
#> 3 192.168.1.4 2017-07-07 00:00:00 2017-07-07 00:00:00    0 secs
#> 4 192.168.1.4 2017-07-07 01:45:00 2017-07-07 00:00:00 6300 secs
#> 5 192.168.1.4 2017-07-07 02:30:00 2017-07-07 01:45:00 2700 secs
#> 6 192.168.1.7 2017-07-07 00:15:00 2017-07-07 00:15:00    0 secs