How to implement coalesce efficiently in R

深忆病人 2020-11-21 23:20

Background

Several SQL dialects (I mostly use PostgreSQL) have a function called coalesce, which returns the first non-null column element for each row. This can b

8 Answers
  • 2020-11-22 00:13

    I have a ready-to-use implementation called coalesce.na in my misc package. It seems to be competitive, but not the fastest. It also works for vectors of different lengths, and has special treatment for vectors of length one:

                        expr        min          lq      median          uq         max neval
        coalesce(aa, bb, cc) 990.060402 1030.708466 1067.000698 1083.301986 1280.734389    10
       coalesce1(aa, bb, cc)  11.356584   11.448455   11.804239   12.507659   14.922052    10
      coalesce1a(aa, bb, cc)   2.739395    2.786594    2.852942    3.312728    5.529927    10
       coalesce2(aa, bb, cc)   2.929364    3.041345    3.593424    3.868032    7.838552    10
     coalesce.na(aa, bb, cc)   4.640552    4.691107    4.858385    4.973895    5.676463    10
    

    Here's the code:

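    coalesce.na <- function(x, ...) {
      x.len <- length(x)
      ly <- list(...)
      # Fill the NAs left in x from each further argument, in order
      for (y in ly) {
        y.len <- length(y)
        if (y.len == 1) {
          # A length-one value fills every remaining NA
          x[is.na(x)] <- y
        } else {
          # Warn when x's length is not a multiple of y's length
          if (x.len %% y.len != 0)
            warning('object length is not a multiple of first object length')
          # Recycle y along x: position i in x takes y[(i - 1) %% y.len + 1]
          pos <- which(is.na(x))
          x[pos] <- y[(pos - 1) %% y.len + 1]
        }
      }
      x
    }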
    

    Of course, as Kevin pointed out, an Rcpp solution might be faster by orders of magnitude.
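
A brief usage sketch of how coalesce.na treats shorter arguments. The function body is repeated from the answer so that the snippet runs on its own; the vectors aa, bb and cc here are small made-up inputs, not the data used in the benchmark.

```r
# coalesce.na as posted in the answer, repeated so this snippet is self-contained
coalesce.na <- function(x, ...) {
  x.len <- length(x)
  ly <- list(...)
  for (y in ly) {
    y.len <- length(y)
    if (y.len == 1) {
      x[is.na(x)] <- y
    } else {
      if (x.len %% y.len != 0)
        warning('object length is not a multiple of first object length')
      pos <- which(is.na(x))
      x[pos] <- y[(pos - 1) %% y.len + 1]
    }
  }
  x
}

# Made-up inputs for illustration only
aa <- c(1, NA, NA, 4, NA, NA)
bb <- c(NA, 2)  # shorter than aa, so it is recycled along aa
cc <- 0         # length one, so it fills whatever is still NA

result <- coalesce.na(aa, bb, cc)
print(result)
# 1 2 0 4 0 2
```

Positions 2 and 6 are filled from the recycled bb, and the length-one cc then fills positions 3 and 5, where both aa and the recycled bb are NA.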

  • 2020-11-22 00:13

    Here is my solution:

    coalesce <- function(x) {
      y <- head(x[!is.na(x)], 1)
      return(y)
    }

    It returns the first value that is not NA, and it works with data.table. For example, if you want to apply coalesce across a few columns whose names are stored in a character vector:

    column_names <- c("col1", "col2", "col3")

    How to use:

    ranking[, coalesce_column := coalesce( mget(column_names) ), by = 1:nrow(ranking)]
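
For readers without data.table, the same row-wise idea can be sketched in base R. This is a swapped-in approach rather than the answer's own method: the ranking data frame and its values are invented for illustration, and vapply over row indices stands in for the by = 1:nrow(ranking) grouping.

```r
# Single-vector coalesce, as in the answer: first element that is not NA
coalesce <- function(x) {
  head(x[!is.na(x)], 1)
}

# Hypothetical data; every row has at least one non-NA value
ranking <- data.frame(
  col1 = c(NA, 2, NA),
  col2 = c(5, NA, NA),
  col3 = c(7, 8, 9)
)
column_names <- c("col1", "col2", "col3")

# Apply coalesce to each row, restricted to the chosen columns
ranking$coalesce_column <- vapply(
  seq_len(nrow(ranking)),
  function(i) coalesce(unlist(ranking[i, column_names])),
  numeric(1)
)

print(ranking$coalesce_column)
# 5 2 9
```

Note that for a row in which every selected column is NA, coalesce returns a zero-length vector, so vapply would stop with an error; such rows would need separate handling.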
