How to implement coalesce efficiently in R

前端未结

Background

Several SQL dialects (I mostly use PostgreSQL) have a function called coalesce, which returns the first non-null column element for each row. This can b
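For readers coming from SQL, a minimal base-R sketch of that row-wise behaviour may help. It assumes all arguments are vectors of the same length and a compatible type (no recycling), and the name coalesce_sketch is hypothetical. It starts from the first vector and fills the slots that are still NA from each later argument in turn, stopping early once nothing is missing.

```r
# Hypothetical helper: row-wise COALESCE over equal-length vectors (no recycling)
coalesce_sketch <- function(...) {
  args <- list(...)
  out <- args[[1]]
  for (y in args[-1]) {
    na <- is.na(out)
    if (!any(na)) break      # every slot is filled; later arguments are not needed
    out[na] <- y[na]         # take the value from y only where out is still NA
  }
  out
}

coalesce_sketch(c(NA, 2, NA, NA), c(1, NA, NA, 4), c(9, 9, 3, 9))
# [1] 1 2 3 4
```

Each pass is a single vectorized assignment per argument rather than a loop over rows, which is the property any efficient R version needs to keep.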

8 Answers

无人共我

2020-11-22 00:13

I have a ready-to-use implementation called coalesce.na in my misc package. It appears to be competitive, though not the fastest. It also works for vectors of different lengths and has special handling for vectors of length one:

                    expr        min          lq      median          uq         max neval
    coalesce(aa, bb, cc) 990.060402 1030.708466 1067.000698 1083.301986 1280.734389    10
   coalesce1(aa, bb, cc)  11.356584   11.448455   11.804239   12.507659   14.922052    10
  coalesce1a(aa, bb, cc)   2.739395    2.786594    2.852942    3.312728    5.529927    10
   coalesce2(aa, bb, cc)   2.929364    3.041345    3.593424    3.868032    7.838552    10
 coalesce.na(aa, bb, cc)   4.640552    4.691107    4.858385    4.973895    5.676463    10

Here's the code:

coalesce.na <- function(x, ...) {
  x.len <- length(x)
  ly <- list(...)
  for (y in ly) {
    y.len <- length(y)
    if (y.len == 1) {
      # scalar replacement: fill every remaining NA in x with the same value
      x[is.na(x)] <- y
    } else {
      if (x.len %% y.len != 0)
        warning('object length is not a multiple of first object length')
      # recycle y against x: an NA at position p takes y[(p - 1) %% y.len + 1]
      pos <- which(is.na(x))
      x[pos] <- y[(pos - 1) %% y.len + 1]
    }
  }
  x
}
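The least obvious line in coalesce.na is the index expression (pos - 1) %% y.len + 1. A small standalone check, using made-up values for x and y, suggests it selects the same elements that rep_len(y, length(x)) would place at the NA positions, i.e. ordinary R recycling:

```r
x <- c(1, NA, 3, NA, NA, 6)   # made-up input with NAs at positions 2, 4, 5
y <- c(10, 20, 30)            # shorter replacement vector (length 3)

pos <- which(is.na(x))              # 2 4 5
idx <- (pos - 1) %% length(y) + 1   # 2 1 2: wrap positions back into 1..length(y)
x[pos] <- y[idx]
x
# [1]  1 20  3 10 20  6
```

Indexing only the NA positions, rather than materialising rep_len(y, length(x)), avoids building a full-length copy of y for each argument.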

Of course, as Kevin pointed out, an Rcpp solution might be faster by orders of magnitude.

盖世英雄少女心

2020-11-22 00:13

Here is my solution:

coalesce <- function(x) {
  y <- head(x[!is.na(x)], 1)
  return(y)
}

It returns the first value that is not NA, and it works with data.table. For example, to coalesce several columns whose names are stored in a character vector:

column_names <- c("col1", "col2", "col3")

How to use it:

ranking[, coalesce_column := coalesce( mget(column_names) ), by = 1:nrow(ranking)]
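Inside each by group, mget(column_names) hands coalesce() a named list with one element per column, not an atomic vector, and is.na() on a list flags the elements that are a single NA. A base-R check of the expression inside coalesce(), using a hypothetical row (the list below only stands in for what mget would return), shows that the result is a one-element named list, and an empty list when every column is NA:

```r
# Hypothetical stand-in for mget(column_names) evaluated on one row
row <- list(col1 = NA, col2 = 5, col3 = 7)
first_non_na <- head(row[!is.na(row)], 1)   # equivalent to the body of coalesce()
str(first_non_na)

# Edge case: every column is NA, so nothing survives the filter
all_missing <- list(col1 = NA, col2 = NA_real_, col3 = NA_real_)
length(head(all_missing[!is.na(all_missing)], 1))
# [1] 0
```

Because by = 1:nrow(ranking) evaluates the function once per row, this pattern is convenient but is likely to be slower on large tables than a vectorized approach such as coalesce.na.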
