How to implement coalesce efficiently in R

深忆病人 2020-11-21 23:20

Background

Several SQL dialects (I mostly use PostgreSQL) have a function called coalesce, which returns the first non-null column element for each row. This can b

8 answers
  •  盖世英雄少女心
    2020-11-21 23:51

    Using dplyr package:

    library(dplyr)
    coalesce(a, b, c)
    # [1]  1  2 NA  4  6
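
    The vectors a, b and c come from the question, which is cut off above. As a rough sketch of what the call computes, the base R snippet below uses assumed values for them (hypothetical, chosen only so that the result matches the printed output) and spells out the "first non-NA per position" rule with nested ifelse:

    ```r
    # Assumed example vectors: the question's own definitions are cut off above,
    # so these values are hypothetical, chosen to reproduce the printed output.
    a <- c(1, 2, NA, 4, NA)
    b <- c(NA, NA, NA, 5, 6)
    c <- c(7, 8, NA, 9, 10)

    # Position by position: take a if it is present, else b, else c.
    first_non_na <- ifelse(!is.na(a), a, ifelse(!is.na(b), b, c))
    print(first_non_na)
    # [1]  1  2 NA  4  6
    ```

    Position 3 stays NA because all three assumed vectors are missing there.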
    

    Benchmark: on these small vectors, dplyr's coalesce is not as fast as the accepted solution (coalesce2 below):

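    coalesce2 <- function(...) {
      # Fold over the arguments from left to right
      Reduce(function(x, y) {
        i <- which(is.na(x))  # positions still missing in the running result
        x[i] <- y[i]          # fill them from the next vector
        x
      }, list(...))
    }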
    
    microbenchmark::microbenchmark(
      coalesce(a, b, c),
      coalesce2(a, b, c)
    )
    
    # Unit: microseconds
    #                expr    min     lq     mean median      uq     max neval cld
    #   coalesce(a, b, c) 21.951 24.518 27.28264 25.515 26.9405 126.293   100   b
    #  coalesce2(a, b, c)  7.127  8.553  9.68731  9.123  9.6930  27.368   100  a 
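
    To see how the Reduce-based version arrives at its result, the sketch below keeps the intermediate states of the fold by passing accumulate = TRUE. The vectors are assumed example values (the question's definitions are cut off above), and fill_na is a hypothetical name for the inner function of coalesce2:

    ```r
    # Assumed example vectors (hypothetical; not taken from the truncated question)
    a <- c(1, 2, NA, 4, NA)
    b <- c(NA, NA, NA, 5, 6)
    c <- c(7, 8, NA, 9, 10)

    # Same body as the anonymous function inside coalesce2
    fill_na <- function(x, y) {
      i <- which(is.na(x))  # positions still missing
      x[i] <- y[i]          # fill them from the next vector
      x
    }

    # accumulate = TRUE returns every intermediate result as a list
    steps <- Reduce(fill_na, list(a, b, c), accumulate = TRUE)
    print(steps[[1]])  # a itself:         1  2 NA  4 NA
    print(steps[[2]])  # after filling b:  1  2 NA  4  6
    print(steps[[3]])  # after filling c:  1  2 NA  4  6
    ```

    With these values the last step changes nothing, because the only position still missing is also NA in c. Each step only touches positions that are still NA, so later vectors can never overwrite earlier values.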
    

    On a larger dataset (100,000 elements per vector), the two are comparable:

    aa <- sample(a, 100000, TRUE)
    bb <- sample(b, 100000, TRUE)
    cc <- sample(c, 100000, TRUE)
    
    microbenchmark::microbenchmark(
      coalesce(aa, bb, cc),
      coalesce2(aa, bb, cc))
    
    # Unit: milliseconds
    #                   expr      min       lq     mean   median       uq      max neval cld
    #   coalesce(aa, bb, cc) 1.708511 1.837368 5.468123 3.268492 3.511241 96.99766   100   a
    #  coalesce2(aa, bb, cc) 1.474171 1.516506 3.312153 1.957104 3.253240 91.05223   100   a
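
    Timings only mean something if the implementations agree, so a quick sanity check on the sampled vectors may be worthwhile. The sketch below is not part of the original answer: the base vectors and the seed are assumptions, and the reference result uses a different base R technique (max.col on the non-NA mask) purely as an independent cross-check:

    ```r
    set.seed(42)  # assumed seed, only to make the sample repeatable

    # Assumed example vectors (hypothetical; the question's definitions are cut off)
    a <- c(1, 2, NA, 4, NA)
    b <- c(NA, NA, NA, 5, 6)
    c <- c(7, 8, NA, 9, 10)

    aa <- sample(a, 100000, TRUE)
    bb <- sample(b, 100000, TRUE)
    cc <- sample(c, 100000, TRUE)

    coalesce2 <- function(...) {
      Reduce(function(x, y) {
        i <- which(is.na(x))
        x[i] <- y[i]
        x
      }, list(...))
    }

    # Independent reference: for each row, find the first non-NA column
    # and pick that element by matrix indexing.
    m <- cbind(aa, bb, cc)
    first <- max.col(!is.na(m), ties.method = "first")
    reference <- m[cbind(seq_len(nrow(m)), first)]

    # Rows that are NA everywhere fall back to column 1, which is NA as well.
    stopifnot(identical(coalesce2(aa, bb, cc), reference))
    ```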
    
