Join and overwrite data in one table with data from another table


How to join and overwrite data appears to be a common request, but I have yet to find an elegant solution that applies to an entire dataset.

(Note: to simplify the d


3 Answers

萌比男神i

2021-01-14 08:30
I think it's easiest to go to long form:
```
md1 = melt(d1, id="id")
md2 = melt(d2, id="id")
```
Then you can stack them and take the latest value:
```
res1 = unique(rbind(md1, md2), by=c("id", "variable"), fromLast=TRUE)
```
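As a minimal sketch of the melt-and-stack idea, here is a self-contained run on made-up data. The contents of `d1` and `d2` are assumptions for illustration only, since the question's tables are not shown here.

```r
library(data.table)

# Hypothetical data: d1 is the base table, d2 carries newer values for ids 2 and 3
d1 <- data.table(id = 1:3, x = c(10, 20, 30), y = c(1, 2, 3))
d2 <- data.table(id = 2:3, x = c(21, 31), y = c(5, 6))

# Long form: one row per (id, variable) pair
md1 <- melt(d1, id.vars = "id")
md2 <- melt(d2, id.vars = "id")

# Base rows first, updates last; fromLast = TRUE keeps the update when both exist
res1 <- unique(rbind(md1, md2), by = c("id", "variable"), fromLast = TRUE)

# Back to wide format for inspection
wide1 <- dcast(res1, id ~ variable, value.var = "value")
print(wide1)
```

The order of the `rbind` arguments matters: whichever table comes last wins for duplicated (id, variable) pairs.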
> I'd also like to know how this can be done if you only want to update the NA values in [d3], that is, make sure existing non-NA values are not overwritten.

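You can exclude rows from the update table, md2, if d3 already holds a non-NA value for the same id and variable. Melting d3 with na.rm=TRUE leaves exactly those rows in md3:
```
md3 = melt(d3, id="id", na.rm=TRUE)

res3 = unique(rbind(md3, md2[!md3, on=.(id, variable)]),
  by=c("id", "variable"), fromLast=TRUE)
```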
dcast can be used to go back to wide format if necessary, e.g., dcast(res3, id ~ ...).
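
The NA-only update can be sketched the same way. The data below is hypothetical, and the sketch assumes ids are unique within each table. It passes `na.rm = TRUE` to `melt` so that only the non-NA cells of `d3` block an update, and it leaves out the final `unique()` because the anti-join already leaves at most one row per (id, variable).

```r
library(data.table)

# Hypothetical data: d3 has gaps (NA) that should be filled from d2
d3 <- data.table(id = 1:3, x = c(10, NA, 30), y = c(NA, 2, NA))
d2 <- data.table(id = 2:3, x = c(21, 31), y = c(5, 6))

md2 <- melt(d2, id.vars = "id")
md3 <- melt(d3, id.vars = "id", na.rm = TRUE)  # only the known values of d3

# Anti-join: keep update rows only where d3 has no known value
res3 <- rbind(md3, md2[!md3, on = .(id, variable)])

wide3 <- dcast(res3, id ~ variable, value.var = "value")
print(wide3)
```

In this run, `x` for id 2 and `y` for id 3 are filled from `d2`, while the existing values (`x` for id 3, `y` for id 2) are left alone.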

礼貌的吻别

2021-01-14 08:31
Here's @Frank's solution from the comments. (Note: d1 and d2 need to be data.tables first, for example via setDT(d1) and setDT(d2).)
```
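library(data.table)
# columns present in both tables, excluding the join key
cols = setdiff(intersect(names(d1), names(d2)), "id")
# update join: for ids matched in d2, overwrite d1's columns by reference with d2's values (i.*)
d1[d2, on=.(id), (cols) := mget(paste0("i.", cols))]
```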
```
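To see the update join at work, here is a small sketch on hypothetical data, where `d2` lists its ids in a different order and carries only one of `d1`'s value columns.

```r
library(data.table)

# Hypothetical data: d2 is in a different row order and has only column x
d1 <- data.table(id = 1:3, x = c(10, 20, 30), y = c(1, 2, 3))
d2 <- data.table(id = c(3L, 2L), x = c(31, 21))

# Shared value columns (here just "x")
cols <- setdiff(intersect(names(d1), names(d2)), "id")

# Rows are matched on id, so row order in d2 does not matter;
# d1 is modified in place and column y is left untouched
d1[d2, on = .(id), (cols) := mget(paste0("i.", cols))]
print(d1)
```

Because `:=` updates by reference, `d1` itself changes; take a `copy(d1)` first if the original is still needed.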
As he notes, the original solution I provided below is a bad idea generally speaking. If ids appear multiple times or in a different order, it will do the wrong thing.

~~d1[d1$id %in% d2$id, names(d2):=d2]~~

猫巷女王i

2021-01-14 08:42
```
library("dplyr")

d12 <- anti_join(d1, d2, by = "id") %>%
         bind_rows(d2)
```
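This solution takes the rows from d1 whose id isn't in d2, then appends the d2 rows to them. Matched rows are replaced whole, so d2 should carry every column of d1 (bind_rows fills any missing ones with NA), and the original row order is not preserved.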
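
A quick sketch of that behaviour on hypothetical data, with an `arrange()` added at the end (not part of the answer) to restore a sorted row order:

```r
library(dplyr)

# Hypothetical data: id 2 is overwritten, id 4 is new
d1 <- tibble(id = 1:3, x = c(10, 20, 30))
d2 <- tibble(id = c(2L, 4L), x = c(21, 41))

d12 <- anti_join(d1, d2, by = "id") %>%  # rows of d1 with no match in d2
  bind_rows(d2) %>%                      # append every row of d2
  arrange(id)                            # optional: sort by id again
print(d12)
```

Note that ids present only in `d2` (id 4 here) are added as new rows, which differs from an update join that only touches existing rows.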

This won't work for the 'Additional scenario', which looks much much messier to resolve, and maybe should be a separate question.
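
For what it's worth, one possible route for the NA-only case within dplyr is `rows_patch()`, assuming dplyr 1.0.0 or later is installed. The sketch below uses hypothetical data and assumes every id in `d2` also exists in `d3` and that ids are unique; it is not something the answer above proposes.

```r
library(dplyr)

# Hypothetical data: d3 has gaps (NA) to be filled from d2
d3 <- tibble(id = 1:3, x = c(10, NA, 30), y = c(NA, 2, NA))
d2 <- tibble(id = 2:3, x = c(21, 31), y = c(5, 6))

# rows_patch() overwrites only the NA cells of d3 with the matching values of d2
patched <- rows_patch(d3, d2, by = "id")
print(patched)
```

Its sibling `rows_update()` overwrites matched values regardless of whether they are NA, which corresponds to the main scenario.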