问题
I am trying to merge two tables based on time ranges. I only found some old answers on this (e.g. Data Table merge based on date ranges) which don't use lubridate
.
Actually, lubridate
provides the %within%
function which can check if a date is within an interval. I constructed a minimal example and wondering if there is a way to merge these data frames together based on the overlapping dates/intervals. So checking if df1$Date
is in df2$interval
.
library(lubridate)
df1 <- data.frame(Date=c(ymd('20161222'),ymd('20161223'),ymd('20161228'),ymd('20170322')),
User=c('a','b','a','a'),
Units=c(1,2,3,1))
df2 <- data.frame(User=c('a','b','a'),
Start=c(ymd('20140101'), ymd('20140101'), ymd('20170101')),
End=c(ymd('20161231'),ymd('20170331'),ymd('20170331')),
Price=c(10,10,20))
df2$interval <- interval(df2$Start, df2$End)
My expected output would be something like this
| |User |Date | Units| Price|
|:--|:----|:----------|-----:|-----:|
|1 |a |2016-12-22 | 1| 10|
|3 |a |2016-12-28 | 3| 10|
|6 |a |2017-03-22 | 1| 20|
|7 |b |2016-12-23 | 2| 10|
回答1:
This may be inefficient for large dataframes (since you're creating a much larger match and subsetting), and I'm sure there's a more elegant way, but this works:
output <- merge(df1,df2,by="User")[test$Date %within% test$interval,]
Or you could use a loop:
for(x in 1:length(df1$User)){
df1$Price[x]<-df2[(df1$Date[x] %within% df2$interval)&df1$User[x]==df2$User,]$Price
}
I'm sure you could also make a function and use apply...
来源:https://stackoverflow.com/questions/42839399/merging-tables-based-on-time-ranges-intervals-using-lubridate