What is the easiest way to do a left outer join on two data tables (dt1, dt2) with the fill value being 0 (or some other value) instead of NA (the default), without overwriting valid NA values in the left table?
The cleanest way at present may simply be to seed an intermediary table with the key values from the left table (dt1), join dt2 onto it, set the resulting NA values to 0, and then merge the intermediary table with dt1. This can be done entirely with data.table and doesn't depend on data.frame syntax. The intermediary step ensures that the second merge produces no NA results from unmatched keys (nomatch), so the NAs already present in dt1 are left untouched:
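```r
library(data.table)

dt1 <- data.table(x = c('a', 'b', 'c', 'd', 'e'), y = c(NA, 'w', NA, 'y', 'z'))
dt2 <- data.table(x = c('a', 'b', 'c'), new_col = c(1, 2, 3))
setkey(dt1, x)
setkey(dt2, x)

# Join dt2 onto dt1's key values only; keys missing from dt2 get NA in new_col
inter_table <- dt2[dt1[, list(x)]]
# inter_table holds only x and dt2's columns, so dt1's NAs cannot be affected
inter_table[is.na(inter_table)] <- 0
setkey(inter_table, x)
# Every key of dt1 now matches, so dt1's own NAs (in y) survive unchanged
merged <- inter_table[dt1]
```

```
> merged
   x new_col  y
1: a       1 NA
2: b       2  w
3: c       3 NA
4: d       0  y
5: e       0  z
```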
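To see why the intermediary table matters, compare it with the naive approach of doing the left join first and then replacing every NA in the result. The sketch below reuses the same toy tables; the name `naive` is just for illustration. Because y is a character column, the blanket replacement turns its legitimate NAs into the string "0", whereas the intermediary approach only ever replaces NAs in dt2's columns.

```r
library(data.table)

dt1 <- data.table(x = c('a', 'b', 'c', 'd', 'e'), y = c(NA, 'w', NA, 'y', 'z'))
dt2 <- data.table(x = c('a', 'b', 'c'), new_col = c(1, 2, 3))
setkey(dt1, x)
setkey(dt2, x)

# Naive: left join first, then replace every NA in the combined result
naive <- dt2[dt1]
naive[is.na(naive)] <- 0
# y is character, so its valid NAs are coerced to "0" and lost
print(naive$y)

# Intermediary approach: the replacement runs before dt1's columns are joined in
inter_table <- dt2[dt1[, list(x)]]
inter_table[is.na(inter_table)] <- 0
setkey(inter_table, x)
merged <- inter_table[dt1]
print(merged$y)
```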
The benefit of this approach is that it doesn't need to know which columns dt2 adds on the right, and it stays within data.table's keyed-join speed optimizations. I'm crediting the answer to @SamFirke because his solution also works and may be more useful in other contexts.
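The point that the approach doesn't depend on which columns dt2 contributes can be made concrete by wrapping the three steps in a function. This is a sketch, not part of the original answer: `left_join_fill` and its `key_cols` parameter are hypothetical names, it uses data.table's `on =` argument instead of `setkey()`, fills by reference with `set()`, and assumes `fill` is type-compatible with every non-key column of the right table. Unlike the code above, it also deduplicates the left table's keys, so repeated keys in dt1 don't multiply rows in the intermediary table.

```r
library(data.table)

# Hypothetical helper (not from the original answer): left join `right` onto
# `left` by the columns named in `key_cols`, filling unmatched cells of
# right's columns with `fill`. Assumes `fill` fits those columns' types.
left_join_fill <- function(left, right, key_cols, fill = 0) {
  # Step 1: join right onto the distinct key values of left only
  keys <- unique(left[, key_cols, with = FALSE])
  inter <- right[keys, on = key_cols]

  # Step 2: fill NAs in right's non-key columns by reference; left's own
  # columns are not in `inter`, so their NAs cannot be touched
  for (col in setdiff(names(right), key_cols)) {
    idx <- which(is.na(inter[[col]]))
    if (length(idx) > 0L) set(inter, i = idx, j = col, value = fill)
  }

  # Step 3: join back to left; every key of left now has a row in `inter`
  inter[left, on = key_cols]
}

dt1 <- data.table(x = c('a', 'b', 'c', 'd', 'e'), y = c(NA, 'w', NA, 'y', 'z'))
dt2 <- data.table(x = c('a', 'b', 'c'), new_col = c(1, 2, 3))
res <- left_join_fill(dt1, dt2, key_cols = "x")
print(res)
```

Because the fill loop iterates over `setdiff(names(right), key_cols)`, any number of new columns from the right table are handled without naming them.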