R Left Outer Join with 0 Fill Instead of NA While Preserving Valid NA's in Left Table

礼貌的吻别 2021-02-18 22:53

What is the easiest way to do a left outer join on two data tables (dt1, dt2) with the fill value being 0 (or some other value) instead of NA (the default), without overwriting valid NA values in the left table?

3 Answers
  • 2021-02-18 23:36

    I ran into the same problem with dplyr and wrote a small helper function to solve it (it requires both tidyr and dplyr):

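    library(dplyr)
    library(tidyr)

    # Left join that fills NA only in the columns contributed by y.
    # Extra arguments (e.g. by = "x") are passed on to left_join().
    left_join0 <- function(x, y, fill = 0L, ...){
      z <- left_join(x, y, ...)
      new_cols <- setdiff(names(z), names(x))  # columns that came from y
      replace_na(z, setNames(as.list(rep(fill, length(new_cols))), new_cols))
    }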
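
As a quick check, here is a hedged usage sketch that applies the helper to a small pair of tables. The sample data `dt1` and `dt2` and the result name `res` are illustrative, and the helper is repeated with an explicit `by` argument (an addition for this sketch) so the block runs on its own. The expectation is that the NAs already present in the left table's `y` column are kept, while unmatched rows get 0 in the joined column.

```r
library(dplyr)
library(tidyr)

# Same idea as the helper above, repeated with an explicit `by` argument
# (an assumption added for this sketch) so the block is self-contained.
left_join0 <- function(x, y, by, fill = 0) {
  z <- left_join(x, y, by = by)
  new_cols <- setdiff(names(z), names(x))  # columns contributed by y
  replace_na(z, setNames(as.list(rep(fill, length(new_cols))), new_cols))
}

# Illustrative data: dt1$y contains NAs that must be preserved.
dt1 <- data.frame(x = c("a", "b", "c", "d", "e"),
                  y = c(NA, "w", NA, "y", "z"),
                  stringsAsFactors = FALSE)
dt2 <- data.frame(x = c("a", "b", "c"),
                  new_col = c(1, 2, 3),
                  stringsAsFactors = FALSE)

res <- left_join0(dt1, dt2, by = "x")

# Only the joined column was filled; dt1's own NAs are still NA.
print(res)
```

Note that the fill value has to be compatible with the type of the joined columns; a numeric fill would not suit a character column coming from `y`.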
    
  • 2021-02-18 23:39

    You could refer to the new columns by index, since left_join appends them to the right of the resulting data.frame. With dplyr:

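    library(dplyr)

    dt1 <- data.frame(x = c('a', 'b', 'c', 'd', 'e'),
                      y = c(NA, 'w', NA, 'y', 'z'),
                      stringsAsFactors = FALSE)
    dt2 <- data.frame(x = c('a', 'b', 'c'),
                      new_col = c(1, 2, 3),
                      stringsAsFactors = FALSE)

    merged <- left_join(dt1, dt2, by = "x")
    # Columns contributed by dt2 sit to the right of dt1's columns.
    index_new_col <- (ncol(dt1) + 1):ncol(merged)
    # Fill NA only in those columns; NAs in dt1's own columns are untouched.
    merged[, index_new_col][is.na(merged[, index_new_col])] <- 0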
    
    > merged
      x    y new_col
    1 a <NA>       1
    2 b    w       2
    3 c <NA>       3
    4 d    y       0
    5 e    z       0
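
The same idea can be sketched without relying on column position, by selecting the joined columns by name instead. The following is a minimal base R variant, not part of the answer above; it assumes the join key is `x` and that `dt1` and `dt2` share no other column names (otherwise `merge` would add `.x`/`.y` suffixes and the names would no longer match). The name `new_cols` is introduced here for illustration.

```r
# Illustrative data, same shape as in the answer above.
dt1 <- data.frame(x = c("a", "b", "c", "d", "e"),
                  y = c(NA, "w", NA, "y", "z"),
                  stringsAsFactors = FALSE)
dt2 <- data.frame(x = c("a", "b", "c"),
                  new_col = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# Left outer join in base R; all.x = TRUE keeps every row of dt1.
merged <- merge(dt1, dt2, by = "x", all.x = TRUE)

# Identify the joined columns by name: everything in dt2 except the key.
new_cols <- setdiff(names(dt2), "x")

# Fill NA only in those columns, leaving dt1's own NAs alone.
for (col in new_cols) {
  merged[[col]][is.na(merged[[col]])] <- 0
}

print(merged)
```

Selecting by name keeps the fill correct even if later steps reorder the columns. Keep in mind that `merge` sorts the result by the key by default.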
    
  • 2021-02-18 23:42

    The cleanest way at present may be to build an intermediate table from the join-key column of the left table (dt1), join dt2 onto it, set the resulting NA values to 0, and then join the intermediate table back to dt1. This can be done entirely in data.table without relying on data.frame syntax, and because the intermediate table contains every key in dt1, the second join produces no unmatched (nomatch) NA rows:

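    library(data.table)
    dt1 <- data.table(x = c('a', 'b', 'c', 'd', 'e'), y = c(NA, 'w', NA, 'y', 'z'))
    dt2 <- data.table(x = c('a', 'b', 'c'), new_col = c(1, 2, 3))
    setkey(dt1, x)
    setkey(dt2, x)
    # Look up every key of dt1 in dt2; unmatched keys get NA in new_col.
    inter_table <- dt2[dt1[, list(x)]]
    # Only the key and dt2's columns are present here, so dt1's NAs cannot be touched.
    inter_table[is.na(inter_table)] <- 0
    setkey(inter_table, x)
    # Every key of dt1 now has a match, so this join introduces no new NAs.
    merged <- inter_table[dt1]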
    
    > merged
       x new_col  y
    1: a       1 NA
    2: b       2  w
    3: c       3 NA
    4: d       0  y
    5: e       0  z
    

    The benefit of this approach is that it does not depend on the new columns being appended on the right, and it stays within data.table's keyed-join optimizations. I am crediting the answer to @SamFirke because his solution also works and may be more useful in other contexts.
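
For comparison, a shorter data.table route is an update join, which adds the column to the left table by reference and then fills only that column. This is a hedged sketch rather than the method described above; it assumes the join key is `x` and the joined column is `new_col`, and it works on a copy so the original `dt1` is left unchanged.

```r
library(data.table)

# Illustrative data, same shape as in the answer above.
dt1 <- data.table(x = c("a", "b", "c", "d", "e"), y = c(NA, "w", NA, "y", "z"))
dt2 <- data.table(x = c("a", "b", "c"), new_col = c(1, 2, 3))

# Work on a copy because := modifies a data.table by reference.
merged <- copy(dt1)

# Update join: for rows of merged that match dt2 on x, copy new_col across.
# The i. prefix refers to the column from dt2 (the table in the i position).
merged[dt2, on = "x", new_col := i.new_col]

# Rows without a match are NA in new_col; fill only that column.
merged[is.na(new_col), new_col := 0]

print(merged)
```

Because the fill targets `new_col` by name, the NAs in `y` are never examined, and the row order of `dt1` is kept.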
