I have a data frame of time related events.
Here is an example:
Name Event Order Sequence start_event end_event duration Gr
As I understand it, you want to return any row where an event for John with a particular sequence number overlaps an event for anybody else with the same sequence value. To achieve this, you could use split-apply-combine to split by sequence, identify the overlapping rows, and then re-combine:
overlap <- function(start1, end1, start2, end2) pmin(end1, end2) > pmax(start2, start1)
do.call(rbind, lapply(split(dat, dat$Sequence), function(x) {
jpos <- which(x$Name == "JOHN")
njpos <- which(x$Name != "JOHN")
over <- outer(jpos, njpos, function(a, b) {
overlap(x$start_event[a], x$end_event[a], x$start_event[b], x$end_event[b])
})
x[c(jpos[rowSums(over) > 0], njpos[colSums(over) > 0]),]
}))
# Name EventOrder Sequence start_event end_event duration Group
# A.2 JOHN 2 A 60 112 52 ID1
# A.3 JOHN 3 A 392 429 37 ID1
# A.7 ADAM 1 A 19 75 56 ID2
# A.8 ADAM 2 A 384 407 23 ID2
# C.5 JOHN 5 C 147 226 79 ID1
# C.6 JOHN 6 C 566 611 45 ID1
# C.11 ADAM 5 C 140 205 65 ID2
# C.12 ADAM 6 C 522 599 77 ID2
Note that my output includes two additional rows that are not shown in the question -- sequence A for John from time range [60, 112], which overlaps sequence A for Adam from time range [19, 75].
This could be pretty easily mapped into dplyr
language:
library(dplyr)
overlap <- function(start1, end1, start2, end2) pmin(end1, end2) > pmax(start2, start1)
sliceRows <- function(name, start, end) {
jpos <- which(name == "JOHN")
njpos <- which(name != "JOHN")
over <- outer(jpos, njpos, function(a, b) overlap(start[a], end[a], start[b], end[b]))
c(jpos[rowSums(over) > 0], njpos[colSums(over) > 0])
}
dat %>%
group_by(Sequence) %>%
slice(sliceRows(Name, start_event, end_event))
# Source: local data frame [8 x 7]
# Groups: Sequence [3]
#
# Name EventOrder Sequence start_event end_event duration Group
# (fctr) (int) (fctr) (int) (int) (int) (fctr)
# 1 JOHN 2 A 60 112 52 ID1
# 2 JOHN 3 A 392 429 37 ID1
# 3 ADAM 1 A 19 75 56 ID2
# 4 ADAM 2 A 384 407 23 ID2
# 5 JOHN 5 C 147 226 79 ID1
# 6 JOHN 6 C 566 611 45 ID1
# 7 ADAM 5 C 140 205 65 ID2
# 8 ADAM 6 C 522 599 77 ID2
If you wanted to be able to compute the overlaps for a specified pair of users, this could be done by wrapping the operation into a function that specifies the pair of users to be processed:
overlap <- function(start1, end1, start2, end2) pmin(end1, end2) > pmax(start2, start1)
pair.overlap <- function(dat, user1, user2) {
dat <- dat[dat$Name %in% c(user1, user2),]
do.call(rbind, lapply(split(dat, dat$Sequence), function(x) {
jpos <- which(x$Name == user1)
njpos <- which(x$Name == user2)
over <- outer(jpos, njpos, function(a, b) {
overlap(x$start_event[a], x$end_event[a], x$start_event[b], x$end_event[b])
})
x[c(jpos[rowSums(over) > 0], njpos[colSums(over) > 0]),]
}))
}
You could use pair.overlap(dat, "JOHN", "ADAM")
to get the previous output. Generating the overlaps for every pair of users can now be done with combn
and apply
:
apply(combn(unique(as.character(dat$Name)), 2), 2, function(x) pair.overlap(dat, x[1], x[2]))