merging tables based on time ranges/intervals using lubridate

Submitted by 守給你的承諾、 on 2020-01-05 04:00:13

Question


I am trying to merge two tables based on time ranges. I only found some old answers on this (e.g. Data Table merge based on date ranges) which don't use lubridate.

lubridate provides the %within% operator, which checks whether a date falls within an interval. I constructed a minimal example and am wondering whether there is a way to merge these data frames based on the overlapping dates/intervals, i.e. by checking whether df1$Date is in df2$interval.
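For reference, here is a small sketch of how %within% behaves on its own. The interval and dates are made up for illustration and are not part of the example that follows.

```r
library(lubridate)

# Hypothetical interval covering the year 2016, for illustration only
iv <- interval(ymd("20160101"), ymd("20161231"))

inside  <- ymd("20160615") %within% iv   # date inside the interval
outside <- ymd("20170101") %within% iv   # date after the interval ends
on_edge <- ymd("20161231") %within% iv   # date equal to the end point

print(c(inside = inside, outside = outside, on_edge = on_edge))
```

Both end points count as being inside the interval, so a date equal to Start or End matches.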

library(lubridate)
df1 <- data.frame(Date=c(ymd('20161222'),ymd('20161223'),ymd('20161228'),ymd('20170322')),
                  User=c('a','b','a','a'),
                  Units=c(1,2,3,1))
df2 <- data.frame(User=c('a','b','a'),
                  Start=c(ymd('20140101'), ymd('20140101'), ymd('20170101')),
                  End=c(ymd('20161231'),ymd('20170331'),ymd('20170331')),
                  Price=c(10,10,20))
df2$interval <- interval(df2$Start, df2$End)

My expected output would be something like this

|   |User |Date       | Units| Price|
|:--|:----|:----------|-----:|-----:|
|1  |a    |2016-12-22 |     1|    10|
|3  |a    |2016-12-28 |     3|    10|
|6  |a    |2017-03-22 |     1|    20|
|7  |b    |2016-12-23 |     2|    10|

Answer 1:


This may be inefficient for large data frames, since the merge first pairs every df1 row with every df2 row for the same User and only then drops the rows whose Date lies outside the interval. I'm sure there's a more elegant way, but this works:

merged <- merge(df1, df2, by = "User")
output <- merged[merged$Date %within% merged$interval, ]

Or you could use a loop:

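df1$Price <- NA_real_  # create the column up front; unmatched rows stay NA
for (x in seq_len(nrow(df1))) {
  # df2 rows for the same user whose interval contains this date
  hit <- df1$Date[x] %within% df2$interval & df1$User[x] == df2$User
  df1$Price[x] <- df2$Price[hit][1]  # take the first match
}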

I'm sure you could also make a function and use apply...
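A minimal sketch of that idea, assuming the data frames from the question. The helper name lookup_price is hypothetical, and vapply is used here in place of apply so that each row is forced to return exactly one numeric value.

```r
library(lubridate)

# Data from the question, rebuilt so the block runs on its own
df1 <- data.frame(Date  = ymd(c("20161222", "20161223", "20161228", "20170322")),
                  User  = c("a", "b", "a", "a"),
                  Units = c(1, 2, 3, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(User  = c("a", "b", "a"),
                  Start = ymd(c("20140101", "20140101", "20170101")),
                  End   = ymd(c("20161231", "20170331", "20170331")),
                  Price = c(10, 10, 20),
                  stringsAsFactors = FALSE)

# lookup_price is a hypothetical helper, not part of lubridate.
# It returns the Price of the first row in `prices` whose User matches and
# whose Start..End interval contains `date`, or NA if no row matches.
lookup_price <- function(date, user, prices) {
  hit <- prices$User == user &
    date %within% interval(prices$Start, prices$End)
  prices$Price[hit][1]
}

# Call the helper once per row of df1
df1$Price <- vapply(seq_len(nrow(df1)),
                    function(i) lookup_price(df1$Date[i], df1$User[i], df2),
                    numeric(1))

print(df1)
```

This still does one scan of df2 per row of df1, so it is mainly a tidier form of the loop rather than a faster one.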



Source: https://stackoverflow.com/questions/42839399/merging-tables-based-on-time-ranges-intervals-using-lubridate
