Merge nearest date, and related variables from a another dataframe by group

后端 未结 4 1180
花落未央
花落未央 2020-12-03 08:57

I have two dataframes each with multiple rows per ID. I need to return the closest date and related data from the second dataframe based on the ID and date of the first data

相关标签:
4条回答
  • 2020-12-03 09:03

    We can also do this by one-liner with dplyr.

    library(dplyr)
    
    left_join(df1, df2, by = "ID") %>%
      mutate(dateDiff = abs(dateTarget.x - dateTarget.y)) %>%
      group_by(ID, dateTarget.x) %>%
      filter(dateDiff == min(dateDiff))
    
    0 讨论(0)
  • 2020-12-03 09:04

    Using data.table, simple and elegant solution:

    library(data.table)
    
    setDT(df1)
    setDT(df2)
    
    setkey(df2, ID, dateTarget)[, dateMatch:=dateTarget]
    df2[df1, roll='nearest']
    
        ID dateTarget ValueMatch  dateMatch Value
     1:  3 2015-11-14         48 2015-07-06    47
     2:  3 2015-02-22         94 2015-03-09    52
     3:  1 2014-12-29         88 2014-12-06    18
     4:  3 2015-12-08         48 2015-07-06    98
     5:  2 2013-01-14         77 2013-04-08    52
     6:  2 2015-07-29         68 2015-08-01    97
     7:  3 2013-05-30         85 2013-04-01    91
     8:  1 2013-11-04         35 2014-02-21    70
     9:  2 2015-06-15         68 2015-08-01    98
    10:  3 2014-11-17         95 2014-12-15    68
    
    0 讨论(0)
  • 2020-12-03 09:25

    Here is the solution based on the base package:

    z <- lapply(intersect(df1$ID,df2$ID),function(id) {
       d1 <- subset(df1,ID==id)
       d2 <- subset(df2,ID==id)
    
       d1$indices <- sapply(d1$dateTarget,function(d) which.min(abs(d2$dateTarget - d)))
       d2$indices <- 1:nrow(d2)
    
       merge(d1,d2,by=c('ID','indices'))
      })
    
    z2 <- do.call(rbind,z)
    z2$indices <- NULL
    
    print(z2)
    
    #    ID dateTarget.x Value dateTarget.y ValueMatch
    # 1   3   2015-11-14    47   2015-07-06         48
    # 2   3   2015-12-08    98   2015-07-06         48
    # 3   3   2015-02-22    52   2015-03-09         94
    # 4   3   2014-11-17    68   2014-12-15         95
    # 5   3   2013-05-30    91   2013-04-01         85
    # 6   1   2013-11-04    70   2014-02-21         35
    # 7   1   2014-12-29    18   2014-12-06         88
    # 8   2   2013-01-14    52   2013-04-08         77
    # 9   2   2015-07-29    97   2015-08-01         68
    # 10  2   2015-06-15    98   2015-08-01         68
    
    0 讨论(0)
  • 2020-12-03 09:27

    Here's my take using dplyr, based on the accepted answer. I wanted to have a bit more freedom on the grouping column.

    
    match_by_group_date <- function(df1, df2, grp, datecol) {
      
      grp1 <- df1 %>% pull({{grp}}) %>% unique()
      grp2 <- df2 %>% pull({{grp}}) %>% unique()
      
      li <-
      lapply(intersect(grp1, grp2), function(tt) {
        d1 <- filter(df1, {{grp}}== tt)
        d2 <- filter(df2, {{grp}}==tt) %>% mutate(indices = 1:n())
        d2_date <- d2 %>% pull({{datecol}}) %>% as.POSIXct()
        print(d2_date)
        d1 <- mutate(d1, indices = map_dbl({{datecol}}, function(d) which.min(abs(d2_date - as.POSIXct(d)))))
        
        left_join(d1,d2, by=c(quo_name(enquo(grp)), "indices"))
      })
      
      # bind rows
      return(bind_rows(li))
    }
    
    0 讨论(0)
提交回复
热议问题