Checking if Date is Between two Dates in R

慢半拍i 2020-11-30 14:54

I have two large datasets, df1 and df2. The first dataset, df1, contains the columns 'ID' and 'actual.date'.

df1 <- data.frame(ID=c(1,1,1,2,3,4,4), a         


        
3 Answers
  • 2020-11-30 15:33

    You can use foverlaps from data.table. Convert both data.frames to data.tables with 'start' and 'end' columns, and key each table on all of its columns. Called with which=TRUE and mult='first', foverlaps returns, for each row of dt1, the index of the first matching interval in dt2, or NA when there is none. Mapping non-NA values to 1 and NA values to 0 gives the binary 'match' column.

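    library(data.table)  # v1.9.5+
    # Treat each date in df1 as a zero-length interval [actual.date, actual.date]
    dt1 <- data.table(ID=df1$ID, start=df1$actual.date, end=df1$actual.date)
    setkeyv(dt1, colnames(dt1))
    # Rename df2's date columns so both tables use the names 'start' and 'end'
    dt2 <- as.data.table(df2)
    setnames(dt2, 2:3, c('start', 'end'))
    setkeyv(dt2, colnames(dt2))
    # Index of the first interval in dt2 that contains each dt1 row, or NA if none
    indx <- foverlaps(dt1, dt2, type='within', which=TRUE, mult='first')
    # Convert the index to 1/0, drop the helper column, restore the original names
    dt1[, match:= +(!is.na(indx))][,end:=NULL]
    setnames(dt1, 1:2, colnames(df1))
    dt1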
    #   ID actual.date match
    #1:  1  1997-10-01     0
    #2:  1  1998-02-01     1
    #3:  1  2002-05-01     1
    #4:  2  1999-07-01     0
    #5:  3  2005-09-01     1
    #6:  4  2003-02-03     1
    #7:  4  2006-05-01     0
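
    The same check can also be sketched as a non-equi join, which data.table supports from v1.9.8 onward. This is only an illustration under stated assumptions: the sample rows below are hypothetical (the question's data is truncated), and df2 is assumed to have the columns ID, before.date and after.date, with both endpoints counted as inside the interval.

    ```r
    library(data.table)  # non-equi joins need v1.9.8+

    # Hypothetical sample data, not the data from the question
    df1 <- data.frame(ID = c(1, 1, 2),
                      actual.date = as.Date(c("1997-10-01", "1998-02-01", "1999-07-01")))
    df2 <- data.frame(ID = c(1, 2),
                      before.date = as.Date(c("1998-01-01", "1999-08-01")),
                      after.date  = as.Date(c("1998-12-31", "1999-12-31")))

    dt1 <- as.data.table(df1)
    dt2 <- as.data.table(df2)

    # For each row of dt1, look up the first row of dt2 with the same ID and
    # before.date <= actual.date <= after.date; rows without a hit yield NA
    hits <- dt2[dt1,
                on = .(ID, before.date <= actual.date, after.date >= actual.date),
                which = TRUE, mult = "first"]

    # Turn the row index into a 1/0 flag
    dt1[, match := +(!is.na(hits))]
    dt1
    ```

    Unlike the foverlaps version, this needs no keys and no renamed or duplicated date columns, so dt1 keeps its original column names throughout.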
    
  • 2020-11-30 15:47

    Here is a solution with dplyr:

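    library(dplyr)
    # Pair every df1 row with every interval in df2 that has the same ID
    dat <- inner_join(df1, df2, by = "ID")
    dat %>% rowwise() %>%
            mutate(match = ifelse(between(actual.date, before.date, after.date), 1, 0)) %>%
            ungroup() %>%
            select(-c(before.date, after.date)) %>%
            # Sort so that a matching row (match = 1) comes first for each date
            arrange(actual.date, desc(match)) %>%
            # Keep one row per date; current dplyr needs .keep_all = TRUE to retain ID and match
            distinct(actual.date, .keep_all = TRUE)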
    

    The output is ordered slightly differently because the rows are sorted by actual.date. If that is a problem, I'll delete my answer.

    Source: local data frame [7 x 3]
    
      ID actual.date match
    1  1  1997-10-01     0
    2  1  1998-02-01     1
    3  2  1999-07-01     0
    4  1  2002-05-01     1
    5  4  2003-02-03     1
    6  3  2005-09-01     1
    7  4  2006-05-01     0
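
    If the original row order of df1 matters, one possible variation is to tag each row with its position before joining and collapse back to one row per position afterwards. This is a sketch under assumptions, not the answer's own method: the sample rows are hypothetical, the helper column `row` is introduced only for illustration, and `.groups = "drop"` requires dplyr 1.0 or later. With repeated IDs on both sides, dplyr 1.1.0 and later warn about a many-to-many relationship, which is expected here.

    ```r
    library(dplyr)

    # Hypothetical sample data, not the data from the question
    df1 <- data.frame(ID = c(1, 2, 1, 3),
                      actual.date = as.Date(c("2002-05-01", "1999-07-01",
                                              "1997-10-01", "2005-09-01")))
    df2 <- data.frame(ID = c(1, 1, 2),
                      before.date = as.Date(c("1998-01-01", "2002-01-01", "1999-08-01")),
                      after.date  = as.Date(c("1998-12-31", "2002-12-31", "1999-12-31")))

    res <- df1 %>%
      mutate(row = row_number()) %>%        # remember the original position
      left_join(df2, by = "ID") %>%         # keeps IDs that have no interval in df2
      group_by(row, ID, actual.date) %>%
      # 1 if the date falls inside any interval for that ID, endpoints included
      summarise(match = as.integer(any(actual.date >= before.date &
                                         actual.date <= after.date, na.rm = TRUE)),
                .groups = "drop") %>%
      arrange(row) %>%                      # restore the original order
      select(-row)
    res
    ```

    The comparison is vectorised, so rowwise() is not needed, and using left_join instead of inner_join means that IDs absent from df2 get match = 0 rather than being dropped.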
    
  • 2020-11-30 15:49

    Just another, hopefully correct, answer using the fuzzyjoin package:

    library(data.table)
    library(fuzzyjoin)
    dt1 <- data.table(df1)
    dt2 <- data.table(df2)
    
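    # Join on equal ID and before.date < actual.date < after.date;
    # note that `>` and `<` exclude dates equal to an endpoint
    fuzzy_left_join(dt1, dt2,
                    by = c("ID" = "ID", "actual.date" = "before.date", "actual.date" = "after.date"),
                    match_fun = list(`==`, `>`, `<`))[, .(ID = ID.x,
                                                          actual.date,
                                                          match = ifelse(is.na(ID.y), 0, 1))]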
    