Join and overwrite data in one table with data from another table

生来不讨喜 2021-01-14 07:57

How to join and overwrite data appears to be a common request, but I have yet to find an elegant solution that applies to an entire dataset.

(Note: to simplify the d

3 Answers
  • 2021-01-14 08:30

    I think it's easiest to go to long form:

    md1 = melt(d1, id="id")
    md2 = melt(d2, id="id")
    

    Then you can stack them and take the latest value:

    res1 = unique(rbind(md1, md2), by=c("id", "variable"), fromLast=TRUE)
    

    From the question: "I'd also like to know how this can be done if you only want to update the NA values in [d3], that is, make sure existing non-NA values are not overwritten."

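    You can exclude rows from the update table, md2, if they already appear with a non-NA value in md3:

    # na.rm=TRUE drops NA cells, so only non-NA values in d3 block an update
    md3 = melt(d3, id="id", na.rm=TRUE)
    
    res3 = unique(rbind(md3, md2[!md3, on=.(id, variable)]), 
      by=c("id", "variable"), fromLast=TRUE)   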
    

    dcast can be used to go back to wide format if necessary, e.g., dcast(res3, id ~ ...).
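
As a rough end-to-end illustration of the melt, stack, and dcast steps described in this answer, here is a minimal sketch. The tables `d1` and `d2` below are made-up toy data (the question's own data is not shown here), and the names `long` and `wide` are just for illustration.

```r
library(data.table)

# Hypothetical toy data, not taken from the original question
d1 <- data.table(id = 1:3, x = c(10, 20, 30), y = c(1, 2, 3))
d2 <- data.table(id = 2:3, x = c(200, 300), y = c(22, 33))

# Go to long form: one row per (id, variable) pair
md1 <- melt(d1, id.vars = "id")
md2 <- melt(d2, id.vars = "id")

# Stack both tables; for duplicated (id, variable) pairs keep the last row,
# i.e. the value coming from d2
long <- unique(rbind(md1, md2), by = c("id", "variable"), fromLast = TRUE)

# Back to wide form, one column per variable
wide <- dcast(long, id ~ variable, value.var = "value")
print(wide)
```

Note that melt puts all value columns into a single `value` column, so this sketch assumes the columns being updated share a type.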

  • 2021-01-14 08:31

    Here's @Frank's solution from the comments. (Note: d1 and d2 must already be data.tables; data.frames can be converted with setDT().)

    library(data.table)
    cols = setdiff(intersect(names(d1), names(d2)), "id") 
    d1[d2, on=.(id), (cols) := mget(paste0("i.", cols))]
    

    As he notes, the original solution I provided below is generally a bad idea. It assigns d2's values to the selected rows of d1 by position rather than by matching ids, so if ids appear multiple times or in a different order, it will do the wrong thing.

    d1[d1$id %in% d2$id, names(d2):=d2]
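
To see the update join from this answer on a small case, here is a minimal sketch with made-up data. The tables below are hypothetical; `d2` deliberately lists its ids in a different order than `d1`, and `d1` has an extra column `z` that `d2` lacks.

```r
library(data.table)

# Hypothetical toy data; d2's ids are in a different order than d1's
d1 <- data.table(id = c(1, 2, 3), x = c(10, 20, 30), z = c("a", "b", "c"))
d2 <- data.table(id = c(3, 2), x = c(300, 200))

# Columns present in both tables, excluding the join key
cols <- setdiff(intersect(names(d1), names(d2)), "id")

# Update join: for each d1 row whose id matches a d2 row,
# overwrite the shared columns with d2's values (the i.-prefixed columns)
d1[d2, on = .(id), (cols) := mget(paste0("i.", cols))]
print(d1)
```

Because the rows are matched on `id`, the order of `d2` does not matter, and columns that exist only in `d1` (here `z`) are left untouched.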

  • 2021-01-14 08:42
    library("dplyr")
    
    d12 <- anti_join(d1, d2, by = "id") %>%
             bind_rows(d2)
    

    This solution takes the rows from d1 that aren't in d2, then adds the d2 rows on to them.

    This won't work for the 'Additional scenario', which looks much messier to resolve and should probably be a separate question.
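
For illustration, here is a minimal sketch of the anti_join plus bind_rows approach on made-up data. The tibbles below are hypothetical, and the final `arrange(id)` is an addition of this sketch to restore the row order, not part of the answer.

```r
library(dplyr)

# Hypothetical toy data, not taken from the original question
d1 <- tibble(id = 1:3, x = c(10, 20, 30))
d2 <- tibble(id = 2:3, x = c(200, 300))

d12 <- anti_join(d1, d2, by = "id") %>%  # rows of d1 whose id is not in d2
  bind_rows(d2) %>%                      # append all rows of d2
  arrange(id)                            # restore id order (added for this sketch)
print(d12)
```

Since whole rows are swapped, any column that exists only in d1 would come out as NA for the ids taken from d2.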
