问题
I have a panel with many IDs, begin
and end
dates. begin
to end
date create an interval
of time.
id begin end interval overlap
1: 1 2010-01-31 2011-06-30 2009-08-04 UTC--2011-12-27 UTC TRUE
2: 1 2011-01-31 2012-06-30 2010-08-04 UTC--2012-12-27 UTC TRUE
3: 1 2012-01-31 2013-06-30 2011-08-04 UTC--2013-12-27 UTC TRUE
4: 1 2013-01-31 2014-06-30 2012-08-04 UTC--2014-12-27 UTC TRUE
5: 1 2013-02-28 2013-07-31 2012-09-01 UTC--2014-01-27 UTC TRUE
6: 1 2015-02-28 2015-03-31 2014-09-01 UTC--2015-09-27 UTC TRUE
7: 1 2015-06-30 2015-07-31 2015-01-01 UTC--2016-01-27 UTC TRUE
8: 1 2015-09-30 2016-01-31 2015-04-03 UTC--2016-07-29 UTC TRUE
9: 2 2010-01-31 2011-06-30 2009-08-04 UTC--2011-12-27 UTC TRUE
10: 2 2011-01-31 2012-06-30 2010-08-04 UTC--2012-12-27 UTC TRUE
11: 2 2012-01-31 2013-06-30 2011-08-04 UTC--2013-12-27 UTC TRUE
12: 2 2013-01-31 2014-06-30 2012-08-04 UTC--2014-12-27 UTC TRUE
13: 2 2013-02-28 2013-07-31 2012-09-01 UTC--2014-01-27 UTC TRUE
14: 2 2015-02-28 2015-03-31 2014-09-01 UTC--2015-09-27 UTC TRUE
15: 2 2015-06-30 2015-07-31 2015-01-01 UTC--2016-01-27 UTC TRUE
16: 2 2015-09-30 2016-01-31 2015-04-03 UTC--2016-07-29 UTC TRUE
I need to test whether, for each ID, any of the begin
/end
dates is included in another interval
(of the same ID).
For instance, id1
begin
(2010-01-31) is not included in any other period of id1
than in the first line. However, id1
end
date (2011-06-30) is included in the interval of the second row (2010-08-04 UTC--2012-12-27 UTC).
I have tried lubridate
interval and %within%
in data table but it yields TRUE as it is included in its corresponding period. I need to know if it is included in any other period of the same ID.
customer[begin %within% interval | end %within% interval, overlap := TRUE, by = id]
I have checked foverlap of data.table but seems design for joining different tables and other problems are just vectors but not panels with intervals.
Any ideas?
Data:
structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2), begin = structure(c(14640, 15005, 15370, 15736, 15764,
16494, 16616, 16708, 14640, 15005, 15370, 15736, 15764, 16494,
16616, 16708), class = "Date"), end = structure(c(15155, 15521,
15886, 16251, 15917, 16525, 16647, 16831, 15155, 15521, 15886,
16251, 15917, 16525, 16647, 16831), class = "Date"), interval = structure(c(75600000,
75686400, 75686400, 75600000, 44323200, 33782400, 33782400, 41731200,
75600000, 75686400, 75686400, 75600000, 44323200, 33782400, 33782400,
41731200), start = structure(c(1249344000, 1280880000, 1312416000,
1344038400, 1346457600, 1409529600, 1420070400, 1428019200, 1249344000,
1280880000, 1312416000, 1344038400, 1346457600, 1409529600, 1420070400,
1428019200), tzone = "UTC", class = c("POSIXct", "POSIXt")), tzone = "UTC", class = structure("Interval", package = "lubridate")),
overlap = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)), .Names = c("id",
"begin", "end", "interval", "overlap"), row.names = c(NA, -16L
), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000140788>)
回答1:
Here is one way to do it, using int_overlaps
from lubridate
. I have defined the intervals from the begin and end dates, although in your data they are different - perhaps you could clarify which is correct.
library(lubridate)
df$interval <- interval(as.POSIXct(df$begin),as.POSIXct(df$end))
df <- df[order(df$id),] #needs to be sorted by id for next stage to work
df$overlap <- unlist(tapply(df$interval, #loop through intervals
df$id, #grouped by id
function(x) rowSums(outer(x,x,int_overlaps))>1))
#check if more than one overlap in subset for that id
df
id begin end interval overlap
1 1 2010-01-31 2011-06-30 2010-01-31 UTC--2011-06-30 UTC TRUE
2 1 2011-01-31 2012-06-30 2011-01-31 UTC--2012-06-30 UTC TRUE
3 1 2012-01-31 2013-06-30 2012-01-31 UTC--2013-06-30 UTC TRUE
4 1 2013-01-31 2014-06-30 2013-01-31 UTC--2014-06-30 UTC TRUE
5 1 2013-02-28 2013-07-31 2013-02-28 UTC--2013-07-31 UTC TRUE
6 1 2015-02-28 2015-03-31 2015-02-28 UTC--2015-03-31 UTC FALSE
7 1 2015-06-30 2015-07-31 2015-06-30 UTC--2015-07-31 UTC FALSE
8 1 2015-09-30 2016-01-31 2015-09-30 UTC--2016-01-31 UTC FALSE
9 2 2010-01-31 2011-06-30 2010-01-31 UTC--2011-06-30 UTC TRUE
10 2 2011-01-31 2012-06-30 2011-01-31 UTC--2012-06-30 UTC TRUE
11 2 2012-01-31 2013-06-30 2012-01-31 UTC--2013-06-30 UTC TRUE
12 2 2013-01-31 2014-06-30 2013-01-31 UTC--2014-06-30 UTC TRUE
13 2 2013-02-28 2013-07-31 2013-02-28 UTC--2013-07-31 UTC TRUE
14 2 2015-02-28 2015-03-31 2015-02-28 UTC--2015-03-31 UTC FALSE
15 2 2015-06-30 2015-07-31 2015-06-30 UTC--2015-07-31 UTC FALSE
16 2 2015-09-30 2016-01-31 2015-09-30 UTC--2016-01-31 UTC FALSE
来源:https://stackoverflow.com/questions/44200364/find-dates-within-a-period-interval-by-group