Joining data frames by lubridate date %within% intervals

Submitted by 戏子无情 on 2020-06-25 18:10:45

Question


I've been practicing wrangling R data frames whose columns contain lubridate data types, as in the example problem in my other question.

Now I am trying to do the equivalent of joining two data frames, where the join condition is whether a timestamp in one data frame falls within an interval in the other. For example:

This is df1:

> glimpse(df1)
Observations: 6,160
Variables: 4
$ upload_id  <int> 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, ...
$ site_id    <int> 2, 2, 2, 2, 2, 4, 4, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, ...
$ segment_id <int> 1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, ...
$ interval   <S4: Interval> 2015-04-12 UTC--2015-04-19 UTC, 2015-04-19 UTC--201...

Here, each lubridate time interval corresponds to a unique combination of upload_id, site_id, and segment_id.

And this is df2:

> glimpse(df2)
Observations: 32,385
Variables: 3
$ sequence_id <int> 2047, 2067, 2069, 2072, 2075, 2081, 2086, 2091, 2096, 2104,...
$ upload_id   <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 5, 5,...
$ taken       <dttm> 2015-04-11 23:09:59, 2015-04-15 19:17:10, 2015-04-16 07:42...

Here, each timestamp in the taken column corresponds to a unique combination of sequence_id and upload_id.

Essentially, I want to left_join(df2, df1) with a by argument that considers two things: (1) the shared upload_id column; and (2) whether taken in df2 falls within interval in df1. The upload_id condition is needed because any given taken might fall %within% multiple intervals, and any interval might contain multiple taken values, so I want to use upload_id as an identifier that matches each taken in df2 to only one row in df1. After the join, I expect the new data frame to have six columns: sequence_id, taken, upload_id, site_id, segment_id, and interval. How can this be done in a tidy way?

EDIT: One comment suggested that uploaded .Rdata files may be untrustworthy, and another stated that they are against the policy here. So I removed the .Rdata files and instead took a 300-row subset of each data frame via dput(). Here is df1:

structure(list(upload_id = c(1050L, 1582L, 2336L, 2665L, 1007L, 
2148L, 275L, 2738L, 1501L, 64L, 2737L, 1547L, 2146L, 2596L, 457L, 
2141L, 2790L, 362L, 2835L, 2741L, 575L, 914L, 2820L, 2572L, 2791L, 
2157L, 1117L, 1535L, 2738L, 794L, 1335L, 2737L, 2570L, 1597L, 
300L, 460L, 1701L, 2142L, 274L, 339L, 2109L, 500L, 2184L, 2837L, 
1238L, 2837L, 2727L, 1175L, 1524L, 303L, 1714L, 1412L, 1894L, 
340L, 1495L, 869L, 995L, 2438L, 1974L, 2762L, 205L, 1581L, 1527L, 
2818L, 1617L, 2537L, 1956L, 638L, 1808L, 2151L, 771L, 2709L, 
2185L, 2015L, 2511L, 1163L, 2557L, 1377L, 2213L, 2560L, 1417L, 
1934L, 1860L, 2772L, 2614L, 2698L, 421L, 2609L, 1418L, 2355L, 
463L, 2697L, 347L, 1531L, 1427L, 2548L, 2218L, 2781L, 1962L, 
396L, 234L, 2846L, 4L, 2742L, 2838L, 1676L, 1635L, 2810L, 1990L, 
2514L, 2809L, 1354L, 2668L, 2737L, 1606L, 764L, 1176L, 1442L, 
519L, 2584L, 1021L, 352L, 2314L, 2662L, 1368L, 1043L, 2207L, 
2792L, 684L, 1806L, 2743L, 2557L, 1971L, 1510L, 418L, 1866L, 
1569L, 1717L, 1992L, 1629L, 2189L, 316L, 2030L, 2840L, 2307L, 
1506L, 1962L, 1249L, 2791L, 670L, 592L, 236L, 2781L, 793L, 2790L, 
2640L, 2517L, 855L, 626L, 1303L, 2241L, 1541L, 910L, 155L, 1617L, 
29L, 916L, 732L, 2006L, 2742L, 2788L, 2830L, 2664L, 1455L, 1062L, 
937L, 1543L, 781L, 737L, 901L, 2633L, 194L, 1000L, 1170L, 1567L, 
2826L, 73L, 801L, 970L, 1327L, 2688L, 1538L, 2306L, 2170L, 1977L, 
2367L, 186L, 1990L, 2606L, 2000L, 2818L, 396L, 696L, 630L, 2835L, 
2067L, 1540L, 51L, 511L, 2587L, 2737L, 1961L, 594L, 1867L, 1042L, 
116L, 1532L, 760L, 2662L, 2814L, 2585L, 2596L, 2837L, 1870L, 
1971L, 73L, 2595L, 1955L, 692L, 2062L, 2742L, 2084L, 1098L, 2205L, 
1404L, 2627L, 809L, 2684L, 2570L, 322L, 2605L, 2016L, 2782L, 
54L, 2254L, 1165L, 655L, 532L, 732L, 534L, 2664L, 1880L, 1444L, 
1920L, 477L, 2728L, 2640L, 1434L, 100L, 2587L, 1545L, 250L, 282L, 
1756L, 940L, 2826L, 1005L, 2835L, 2152L, 203L, 1970L, 579L, 1234L, 
2682L, 1050L, 2594L, 199L, 945L, 758L, 1262L, 796L, 2156L, 921L, 
1961L, 817L, 486L, 982L, 394L, 1928L, 2237L, 2570L, 2144L, 2386L, 
325L, 2729L, 2685L, 901L, 2042L, 141L, 2248L), site_id = c(184L, 
278L, 73L, 364L, 231L, 244L, 72L, 364L, 74L, 52L, 350L, 248L, 
223L, 306L, 117L, 223L, 350L, 115L, 357L, 295L, 113L, 74L, 350L, 
348L, 364L, 267L, 74L, 248L, 364L, 198L, 73L, 350L, 347L, 260L, 
103L, 134L, 271L, 223L, 72L, 120L, 73L, 145L, 214L, 350L, 74L, 
350L, 361L, 227L, 160L, 73L, 73L, 237L, 292L, 110L, 267L, 205L, 
230L, 74L, 306L, 295L, 47L, 261L, 44L, 357L, 280L, 355L, 199L, 
119L, 160L, 73L, 186L, 348L, 214L, 295L, 348L, 160L, 306L, 74L, 
191L, 350L, 73L, 191L, 191L, 364L, 306L, 364L, 74L, 73L, 74L, 
74L, 155L, 350L, 54L, 248L, 260L, 114L, 241L, 360L, 292L, 31L, 
36L, 73L, 7L, 360L, 364L, 74L, 262L, 361L, 292L, 350L, 360L, 
256L, 73L, 350L, 280L, 184L, 44L, 258L, 146L, 347L, 217L, 44L, 
113L, 357L, 191L, 233L, 245L, 360L, 156L, 293L, 360L, 306L, 292L, 
226L, 74L, 36L, 73L, 73L, 199L, 244L, 241L, 110L, 295L, 361L, 
248L, 251L, 292L, 113L, 364L, 74L, 160L, 105L, 360L, 202L, 350L, 
306L, 351L, 201L, 160L, 247L, 320L, 248L, 213L, 54L, 280L, 41L, 
198L, 187L, 74L, 360L, 357L, 287L, 350L, 44L, 234L, 105L, 248L, 
200L, 174L, 198L, 73L, 54L, 217L, 236L, 277L, 361L, 63L, 194L, 
160L, 73L, 361L, 248L, 320L, 74L, 293L, 73L, 68L, 292L, 350L, 
199L, 357L, 31L, 166L, 165L, 357L, 312L, 248L, 42L, 148L, 350L, 
350L, 147L, 116L, 248L, 174L, 47L, 226L, 74L, 357L, 73L, 348L, 
306L, 350L, 293L, 292L, 63L, 348L, 298L, 174L, 316L, 360L, 312L, 
227L, 319L, 237L, 350L, 160L, 348L, 347L, 108L, 306L, 293L, 361L, 
54L, 74L, 74L, 73L, 56L, 187L, 74L, 350L, 199L, 74L, 271L, 56L, 
360L, 306L, 226L, 72L, 350L, 248L, 90L, 91L, 74L, 44L, 361L, 
217L, 357L, 73L, 55L, 191L, 73L, 226L, 347L, 184L, 357L, 95L, 
218L, 196L, 249L, 197L, 74L, 74L, 147L, 199L, 145L, 217L, 136L, 
295L, 73L, 347L, 223L, 113L, 47L, 350L, 350L, 198L, 310L, 23L, 
74L), segment_id = c(3L, 1L, 1L, 1L, 1L, 2L, 1L, 5L, 1L, 1L, 
7L, 1L, 2L, 7L, 1L, 1L, 3L, 3L, 7L, 1L, 2L, 1L, 8L, 2L, 11L, 
1L, 1L, 3L, 6L, 1L, 1L, 8L, 2L, 2L, 4L, 5L, 3L, 1L, 1L, 1L, 1L, 
3L, 1L, 17L, 1L, 3L, 4L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 
5L, 1L, 1L, 2L, 1L, 1L, 2L, 7L, 4L, 2L, 3L, 1L, 1L, 1L, 3L, 3L, 
1L, 6L, 2L, 2L, 5L, 1L, 2L, 5L, 1L, 2L, 3L, 2L, 4L, 3L, 1L, 1L, 
2L, 1L, 4L, 13L, 3L, 2L, 1L, 2L, 3L, 6L, 5L, 5L, 3L, 1L, 2L, 
7L, 10L, 1L, 1L, 1L, 7L, 4L, 2L, 2L, 1L, 9L, 1L, 1L, 1L, 10L, 
3L, 4L, 6L, 1L, 4L, 9L, 1L, 1L, 1L, 10L, 2L, 1L, 4L, 4L, 1L, 
1L, 1L, 1L, 1L, 1L, 8L, 1L, 1L, 1L, 7L, 15L, 2L, 8L, 7L, 3L, 
6L, 1L, 1L, 1L, 8L, 1L, 23L, 4L, 3L, 2L, 2L, 2L, 2L, 4L, 1L, 
1L, 3L, 2L, 5L, 1L, 1L, 6L, 5L, 1L, 12L, 2L, 2L, 1L, 1L, 3L, 
1L, 2L, 1L, 2L, 5L, 2L, 1L, 6L, 4L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 
2L, 1L, 4L, 5L, 5L, 7L, 4L, 17L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 
1L, 18L, 4L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 8L, 6L, 2L, 1L, 
6L, 1L, 1L, 2L, 1L, 1L, 10L, 1L, 1L, 1L, 2L, 10L, 1L, 15L, 4L, 
4L, 3L, 4L, 12L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 11L, 1L, 1L, 2L, 
2L, 2L, 7L, 3L, 1L, 2L, 4L, 2L, 2L, 1L, 2L, 16L, 2L, 4L, 1L, 
2L, 1L, 1L, 2L, 14L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 
1L, 1L, 3L, 1L, 2L, 1L, 7L, 2L, 1L, 2L, 2L, 15L, 6L, 1L, 1L, 
1L), interval = new("Interval", .Data = c(604800, 86400, 86400, 
259200, 604800, 604800, 604800, 604800, 86400, 86400, 604800, 
604800, 518400, 604800, 86400, 604800, 604800, 604800, 604800, 
518400, 604800, 86400, 604800, 604800, 259200, 604800, 86400, 
604800, 604800, 518400, 172800, 604800, 604800, 604800, 172800, 
432000, 604800, 604800, 259200, 432000, 86400, 604800, 432000, 
604800, 86400, 604800, 604800, 604800, 604800, 86400, 86400, 
604800, 604800, 604800, 604800, 172800, 604800, 345600, 518400, 
604800, 345600, 604800, 86400, 86400, 604800, 604800, 604800, 
604800, 604800, 86400, 86400, 604800, 518400, 604800, 604800, 
86400, 604800, 86400, 86400, 604800, 604800, 432000, 604800, 
604800, 604800, 604800, 86400, 86400, 259200, 86400, 604800, 
604800, 259200, 604800, 604800, 604800, 259200, 604800, 604800, 
604800, 604800, 86400, 604800, 604800, 604800, 172800, 604800, 
604800, 604800, 432000, 604800, 604800, 86400, 604800, 604800, 
518400, 518400, 604800, 604800, 604800, 172800, 604800, 604800, 
86400, 604800, 604800, 604800, 604800, 86400, 518400, 604800, 
604800, 604800, 518400, 518400, 604800, 86400, 86400, 172800, 
604800, 604800, 259200, 604800, 604800, 604800, 604800, 432000, 
604800, 604800, 86400, 604800, 432000, 604800, 604800, 604800, 
604800, 604800, 86400, 518400, 604800, 604800, 604800, 604800, 
518400, 604800, 604800, 604800, 604800, 172800, 604800, 86400, 
604800, 604800, 604800, 345600, 604800, 604800, 604800, 604800, 
604800, 86400, 86400, 345600, 172800, 172800, 604800, 604800, 
518400, 604800, 86400, 604800, 604800, 604800, 172800, 604800, 
86400, 86400, 604800, 604800, 604800, 604800, 432000, 604800, 
604800, 604800, 172800, 604800, 345600, 604800, 604800, 604800, 
604800, 604800, 604800, 172800, 604800, 172800, 86400, 604800, 
86400, 604800, 604800, 604800, 604800, 604800, 604800, 604800, 
604800, 604800, 86400, 518400, 259200, 604800, 604800, 604800, 
604800, 432000, 604800, 604800, 86400, 604800, 604800, 604800, 
259200, 86400, 86400, 86400, 518400, 86400, 86400, 604800, 604800, 
259200, 345600, 604800, 604800, 604800, 604800, 172800, 604800, 
604800, 259200, 604800, 86400, 86400, 604800, 604800, 604800, 
86400, 172800, 604800, 86400, 604800, 604800, 604800, 172800, 
432000, 604800, 518400, 345600, 518400, 86400, 86400, 604800, 
604800, 604800, 604800, 172800, 604800, 86400, 604800, 518400, 
86400, 604800, 604800, 518400, 172800, 259200, 86400, 86400), 
    start = structure(c(1463097600, 1479081600, 1499817600, 1511654400, 
    1464912000, 1493337600, 1440028800, 1514073600, 1478995200, 
    1438128000, 1507593600, 1475193600, 1491782400, 1507593600, 
    1445212800, 1487462400, 1505174400, 1445731200, 1519084800, 
    1515456000, 1449964800, 1463529600, 1508198400, 1504483200, 
    1517702400, 1485648000, 1468195200, 1476403200, 1514678400, 
    1460073600, 1472860800, 1508198400, 1504483200, 1475798400, 
    1444348800, 1451692800, 1481587200, 1488153600, 1439769600, 
    1445126400, 1492732800, 1449446400, 1494201600, 1513641600, 
    1470441600, 1505174400, 1510704000, 1469145600, 1478563200, 
    1444780800, 1483228800, 1475280000, 1485129600, 1444867200, 
    1477267200, 1462492800, 1464652800, 1503532800, 1488931200, 
    1516060800, 1441584000, 1475884800, 1479772800, 1519084800, 
    1478908800, 1505952000, 1486598400, 1444608000, 1485216000, 
    1493942400, 1459814400, 1505088000, 1494201600, 1488240000, 
    1504483200, 1469491200, 1506384000, 1474502400, 1495411200, 
    1506384000, 1475366400, 1487548800, 1485734400, 1512259200, 
    1505779200, 1512864000, 1448496000, 1509494400, 1475884800, 
    1500422400, 1448582400, 1511222400, 1444348800, 1474416000, 
    1475193600, 1506038400, 1495411200, 1513036800, 1487548800, 
    1439856000, 1441497600, 1519948800, 1428192000, 1513641600, 
    1517097600, 1481673600, 1475884800, 1508889600, 1488758400, 
    1505779200, 1510617600, 1471305600, 1511913600, 1508803200, 
    1477094400, 1457481600, 1469577600, 1473206400, 1449187200, 
    1505692800, 1465776000, 1444694400, 1497744000, 1511827200, 
    1473465600, 1465516800, 1494892800, 1515456000, 1454803200, 
    1485216000, 1511827200, 1505779200, 1485129600, 1478649600, 
    1447977600, 1465516800, 1479945600, 1483315200, 1489622400, 
    1479340800, 1494201600, 1444867200, 1488844800, 1517356800, 
    1495756800, 1477785600, 1488758400, 1468800000, 1514678400, 
    1455753600, 1452556800, 1442534400, 1514246400, 1456617600, 
    1517270400, 1505779200, 1505606400, 1462147200, 1453852800, 
    1471824000, 1495584000, 1477008000, 1462579200, 1439596800, 
    1478304000, 1433808000, 1462492800, 1457395200, 1489881600, 
    1513036800, 1517875200, 1518912000, 1510617600, 1476230400, 
    1466121600, 1463443200, 1475193600, 1458432000, 1457395200, 
    1460678400, 1510617600, 1441324800, 1465171200, 1469491200, 
    1477872000, 1511913600, 1439510400, 1460332800, 1464134400, 
    1472774400, 1508889600, 1476403200, 1494979200, 1494460800, 
    1485820800, 1501027200, 1441324800, 1487548800, 1506384000, 
    1489017600, 1517270400, 1447113600, 1455580800, 1453680000, 
    1516060800, 1491264000, 1475193600, 1437696000, 1449446400, 
    1503964800, 1514246400, 1487030400, 1452124800, 1485216000, 
    1464825600, 1438905600, 1479772800, 1459641600, 1506988800, 
    1518739200, 1508112000, 1506988800, 1504569600, 1485216000, 
    1488153600, 1437696000, 1503878400, 1487808000, 1455321600, 
    1489881600, 1515456000, 1491609600, 1466121600, 1494201600, 
    1471651200, 1509408000, 1460592000, 1512345600, 1505692800, 
    1445040000, 1505174400, 1487030400, 1515542400, 1437868800, 
    1496620800, 1469577600, 1455235200, 1450224000, 1.458e+09, 
    1450828800, 1510012800, 1485388800, 1476835200, 1487894400, 
    1447977600, 1510617600, 1507593600, 1474934400, 1438905600, 
    1504569600, 1477008000, 1443312000, 1443312000, 1484524800, 
    1464048000, 1517961600, 1463356800, 1517270400, 1494028800, 
    1441238400, 1488758400, 1452643200, 1470700800, 1511740800, 
    1461888000, 1508803200, 1441238400, 1463616000, 1455062400, 
    1471478400, 1460073600, 1494115200, 1463616000, 1488240000, 
    1460073600, 1448236800, 1463961600, 1447372800, 1485820800, 
    1496102400, 1507507200, 1489968000, 1499126400, 1444176000, 
    1504569600, 1512432000, 1463097600, 1490745600, 1440028800, 
    1496448000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    tzone = "UTC")), row.names = c(NA, -300L), class = c("tbl_df", 
"tbl", "data.frame"))

And here is df2:

structure(list(sequence_id = c(10545297L, 5696697L, 26853675L, 
26800598L, 5477912L, 3564676L, 11545989L, 26788357L, 26790778L, 
4682984L, 12887744L, 4254651L, 6472328L, 18236650L, 26829066L, 
26784117L, 26886686L, 797197L, 26820954L, 26791541L, 11657412L, 
3960964L, 10189029L, 21286407L, 12914356L, 26793531L, 26802965L, 
12435451L, 5484298L, 26827162L, 26853752L, 25711869L, 9030699L, 
14386264L, 26802894L, 26377583L, 13291447L, 1851672L, 26790782L, 
9900386L, 26797667L, 6561255L, 26818879L, 11648069L, 14259988L, 
26809952L, 26809264L, 15071783L, 26791374L, 26853008L, 6762100L, 
26853620L, 26880265L, 26878102L, 26809279L, 26787754L, 5502014L, 
17810813L, 18236753L, 5568166L, 9252741L, 26786093L, 18418962L, 
1218679L, 26801395L, 16954415L, 26853619L, 26800113L, 26817488L, 
26811724L, 26809375L, 26809666L, 5869152L, 7681085L, 26894216L, 
15810230L, 26829083L, 26817434L, 26789887L, 26785533L, 26796803L, 
26786930L, 26825007L, 26784040L, 26810066L, 26853657L, 18236660L, 
26797322L, 26825026L, 4103811L, 26878149L, 10545137L, 26784075L, 
26902434L, 3948950L, 26816568L, 11453844L, 26826969L, 26813846L, 
26897750L, 26802715L, 26790888L, 26815971L, 26797683L, 4726015L, 
4617411L, 26797067L, 9252726L, 26797067L, 26785670L, 26789320L, 
26901211L, 26894241L, 499985L, 26825082L, 21774171L, 26803324L, 
26815122L, 56056L, 18236919L, 5425808L, 13209778L, 4726052L, 
14386262L, 5477952L, 5564830L, 9756473L, 26894173L, 7136912L, 
26792378L, 26878986L, 7726907L, 26903079L, 9517618L, 10730383L, 
21774142L, 26901299L, 15071807L, 26786514L, 26901389L, 26903784L, 
26802651L, 7817686L, 26805379L, 4617432L, 21624158L, 9656749L, 
26789389L, 25399602L, 26901650L, 26797702L, 9900332L, 10965877L, 
15268795L, 26896376L, 26787716L, 26851798L, 15810222L, 12887738L, 
26827055L, 16102402L, 26796994L, 26784422L, 14725739L, 26901257L, 
26853712L, 26785221L, 26793075L, 11658007L, 26823570L, 26791524L, 
26797467L, 26796972L, 8501567L, 26799777L, 5572466L, 26787249L, 
18385461L, 4791179L, 15810380L, 26808430L, 10239023L, 26790569L, 
26805358L, 18158022L, 15810244L, 26878116L, 10623114L, 267502L, 
9517623L, 16102411L, 26377567L, 8230310L, 13076594L, 26878082L, 
415271L, 13833529L, 26823199L, 2410L, 26900200L), upload_id = c(851L, 
592L, 2314L, 1799L, 546L, 357L, 925L, 299L, 1611L, 465L, 976L, 
424L, 641L, 1249L, 2274L, 1436L, 2556L, 157L, 2166L, 1666L, 928L, 
388L, 836L, 1405L, 977L, 1698L, 1928L, 961L, 547L, 2261L, 2316L, 
1486L, 774L, 1038L, 1920L, 1503L, 993L, 229L, 1611L, 819L, 1767L, 
651L, 2151L, 927L, 1034L, 2049L, 2028L, 1074L, 1629L, 2302L, 
666L, 2314L, 2434L, 2387L, 2028L, 392L, 557L, 1217L, 1249L, 564L, 
783L, 883L, 1265L, 179L, 1846L, 1159L, 2314L, 1783L, 2138L, 2079L, 
2035L, 2045L, 594L, 736L, 2569L, 1102L, 2277L, 2089L, 52L, 1025L, 
1746L, 669L, 2230L, 1506L, 2055L, 2314L, 1249L, 1757L, 2230L, 
406L, 2387L, 851L, 1506L, 2787L, 385L, 2128L, 922L, 2251L, 2102L, 
2711L, 1907L, 1605L, 2125L, 1767L, 459L, 458L, 1746L, 783L, 1746L, 
1000L, 98L, 2750L, 2569L, 122L, 2230L, 1416L, 1929L, 2110L, 41L, 
1249L, 542L, 985L, 459L, 1038L, 546L, 563L, 815L, 2569L, 681L, 
1665L, 2419L, 738L, 2821L, 792L, 879L, 1416L, 2751L, 1074L, 779L, 
2755L, 2849L, 1904L, 740L, 1951L, 458L, 1399L, 810L, 98L, 1479L, 
2760L, 1767L, 819L, 891L, 1086L, 2693L, 440L, 2292L, 1102L, 976L, 
2257L, 1106L, 1746L, 1442L, 1055L, 2751L, 2314L, 1400L, 1680L, 
929L, 2194L, 1661L, 1765L, 1746L, 769L, 1774L, 570L, 572L, 1264L, 
473L, 1102L, 2009L, 838L, 1586L, 1951L, 1235L, 1102L, 2387L, 
864L, 95L, 792L, 1106L, 1503L, 762L, 984L, 2387L, 120L, 1012L, 
1681L, 5L, 2722L), taken = structure(c(1461607098, 1357440699, 
1497946386, 1480535568, 1450529748, 1446385695, 1463741872, 1444334424, 
1479280400, 1449136788, 1462488333, 1448183687, 1454753449, 1467598406, 
1497333513, 1475588136, 1507455271, 1440251873, 1494085620, 1481115392, 
1463814473, 1441262063, 1461931738, 1471111946, 1462814426, 1482484495, 
1488369500, 1463341759, 1451394079, 1496897690, 1499171773, 1478337380, 
1459646439, 1465542945, 1487492476, 1478507314, 1465151499, 1440878596, 
1479297148, 1461237979, 1484471493, 1455032917, 1493960869, 1462284996, 
1465967563, 1490769440, 1490547948, 1458713033, 1480133603, 1498456304, 
1454837375, 1497347897, 1502541854, 1499517904, 1490563199, 1443806209, 
1451728803, 1469188230, 1468317942, 1452000085, 1459446443, 1462629579, 
1469694294, 1438787731, 1486631809, 1469203046, 1497347627, 1485346076, 
1493760152, 1491737060, 1490640549, 1490971607, 1452390124, 1458148243, 
1506439827, 1465194751, 1497427230, 1493546423, 1437499385, 1465909309, 
1479587401, 1455275863, 1494462120, 1475150180, 1486585139, 1497692625, 
1467632404, 1483992126, 1494818410, 1443259589, 1499966514, 1461252282, 
1476463125, 1517825105, 1439276459, 1492732155, 1463060151, 1496495881, 
1492443646, 1513698078, 1487699018, 1478033857, 1493459209, 1484574255, 
1445463014, 1445377602, 1482270132, 1459068085, 1482270132, 1465324190, 
1437645893, 1516448011, 1506768001, 1439499230, 1495154336, 1475995917, 
1487326465, 1492842646, 1437512735, 1471084135, 1451331488, 1464596049, 
1445487433, 1465542768, 1450654515, 1450251138, 1458756627, 1505539318, 
1456158745, 1481191991, 1502958079, 1456851898, 1519301621, 1460132323, 
1462246721, 1475745018, 1516537759, 1459318655, 1460122320, 1514916703, 
1520412137, 1488024066, 1458195162, 1487453288, 1445389049, 1474006970, 
1459754632, 1438269539, 1477661255, 1516007192, 1484753445, 1461136855, 
1463031275, 1466667291, 1509613313, 1441042946, 1497589967, 1465033581, 
1462417047, 1496682390, 1467178192, 1481293492, 1469788770, 1462814225, 
1516529474, 1498386350, 1470051133, 1481928052, 1463302826, 1495262048, 
1480681123, 1483683739, 1481041639, 1459773430, 1484652813, 1451208417, 
1451471584, 1467788032, 1445564488, 1466521584, 1490178592, 1461418924, 
1478867863, 1486761277, 1470424975, 1465375208, 1499603574, 1462529520, 
1438348434, 1460184847, 1467258314, 1478446800, 1457830628, 1464092571, 
1499339617, 1439448916, 1465530027, 1491299676, 1431043226, 1511424274
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
-200L), class = c("tbl_df", "tbl", "data.frame"))

The problem with these subsets is that I'm not sure how much overlap remains between them for the join, but hopefully there is some. I tried to filter() one to include only upload_ids from the other, but I got an error saying:

Error in filter_impl(.data, quo) : Column interval classes Period and Interval from lubridate are currently not supported.

Sorry if this sounds complicated; please let me know if I can clarify the question further. I am truly grateful for your help!


Answer 1:


You can use the fuzzyjoin package:

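# interval_inner_join() relies on the Bioconductor package IRanges,
# which can be installed with BiocManager::install("IRanges")
library(lubridate)
library(fuzzyjoin)

# Give both data frames matching start/end columns to join on
colnames(df2) <- c("sequence_id", "upload_id", "start")
df1$start <- int_start(df1$interval)
df1$end <- int_end(df1$interval)
df2$end <- df2$start   # treat each timestamp as a zero-length interval

# Keep every pair of rows whose [start, end] ranges overlap
df3 <- interval_inner_join(df1, df2, by = c("start", "end"))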
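
The fuzzyjoin call above matches on the time ranges only, while the question also asks for a match on upload_id and for keeping every row of df2. One possible way to express both conditions is sketched below, assuming dplyr 1.1.0 or later (for join_by() with between()). The data frames df1_toy and df2_toy are small hypothetical stand-ins, not the question's data, and their start and end columns play the role of int_start(df1$interval) and int_end(df1$interval).

```r
library(dplyr)      # join_by() with between() assumes dplyr >= 1.1.0
library(lubridate)

# Hypothetical stand-in for df1: one row per interval, stored as start/end
df1_toy <- tibble(
  upload_id  = c(2L, 2L, 3L),
  site_id    = c(2L, 2L, 4L),
  segment_id = c(1L, 2L, 1L),
  start      = ymd(c("2015-04-12", "2015-04-19", "2015-04-12"), tz = "UTC"),
  end        = ymd(c("2015-04-19", "2015-04-26", "2015-04-19"), tz = "UTC")
)

# Hypothetical stand-in for df2: one row per timestamp
df2_toy <- tibble(
  sequence_id = c(2047L, 2067L, 2069L),
  upload_id   = c(2L, 2L, 3L),
  taken       = ymd_hms(c("2015-04-11 23:09:59",
                          "2015-04-15 19:17:10",
                          "2015-04-20 07:42:00"))
)

# Left join: equal upload_id AND start <= taken <= end.
# Rows of df2_toy without a matching interval are kept with NA columns.
joined <- left_join(
  df2_toy, df1_toy,
  by = join_by(upload_id, between(taken, start, end))
)

print(joined)
```

In this toy data only the second timestamp finds an interval with the same upload_id. The third one falls inside an interval belonging to upload_id 2, but it is not matched because its own upload_id is 3. Since between() is inclusive at both ends by default, a timestamp that sits exactly on a boundary shared by two consecutive intervals would match both rows; the bounds argument of between() inside join_by() may help if that matters.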


Source: https://stackoverflow.com/questions/51412533/joining-data-frames-by-lubridate-date-within-intervals
