Sub-assign by reference on vector in R

闹比i 2020-12-05 18:51

Can I somehow use sub-assign by reference on atomic vectors?
Ideally without wrapping the vector in a one-column data.table just to use :=.

librar         


        
2 answers
  • 2020-12-05 19:18

    In recent R versions (3.1 to 3.1.2 and later, or so), assigning into a vector does not copy it. You will not see that by running the OP's code, though. The OP's code reuses x by passing it to another object, and R is not notified that x was copied at that point, so it has to assume that it was not and that x may still be shared. Because of that, R copies x the next time x is modified. (In this particular case I think it would be good to change data.table::data.table so that it notifies R that a copy has been made, but that is a separate issue; data.frame suffers from the same problem.) If you change the order of the commands a bit, you see no difference between the two approaches:

    library(data.table)

    N <- 5e7
    x <- sample(letters, N, TRUE)
    upd_i <- sample(N, 1L, FALSE)
    # no copy here:
    system.time(x[upd_i] <- NA_character_)
    #   user  system elapsed 
    #      0       0       0 
    X <- data.table(x = x)
    system.time(X[upd_i, x := NA_character_])
    #   user  system elapsed 
    #      0       0       0 
    
    # but now R will copy:
    system.time(x[upd_i] <- NA_character_)
    #   user  system elapsed 
    #   0.28    0.08    0.36 
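
    The same effect can be watched directly with base R's tracemem(), which reports each time an object is duplicated. This is only a minimal sketch: the small vector is a stand-in for the large one above, tracemem() needs an R build with memory profiling enabled, and some IDEs keep extra references to objects, which can cause copies that a plain R session would not make.

    ```r
    x <- c(1, 2, 3)   # small stand-in for the 5e7-element vector
    tracemem(x)       # ask R to report whenever this object is duplicated

    x[2] <- 10        # only one binding exists, so this should report nothing

    y <- x            # a second binding now points at the same vector
    x[2] <- 20        # R has to protect y, so it duplicates x and tracemem prints a line

    untracemem(x)     # stop tracing
    ```

    After running this, y still holds the value from before the second assignment, which is exactly what the copy is there to guarantee.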
    

    (old answer, mostly left as a curiosity)

    You can actually use data.table's := operator to modify a vector in place (I think you need R 3.1+ so that list() does not copy the vector):

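    library(data.table)

    # Wrap v in a list without copying it, turn the list into a data.table by
    # reference, then sub-assign into its single column V1 with :=
    modify.vector = function (v, idx, value) setDT(list(v))[idx, V1 := value]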
    
    v = 1:5
    address(v)
    #[1] "000000002CC7AC48"
    
    modify.vector(v, 4, 10)
    v
    #[1]  1  2  3 10  5
    
    address(v)
    #[1] "000000002CC7AC48"
    
  • 2020-12-05 19:26

    As suggested by @Frank, it's possible to do this using Rcpp. Here's a version with a macro, inspired by Rcpp's dispatch.h, that handles all atomic vector types (the switch also covers lists and expression vectors):

    mod_vector.cpp

    #include <Rcpp.h>
    using namespace Rcpp;
    
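    // Assign value[a] to x[i[a]] in place, using R's 1-based indices.
    template <int RTYPE>
    Vector<RTYPE> mod_vector_impl(Vector<RTYPE> x, IntegerVector i, Vector<RTYPE> value) {
      if (i.size() != value.size()) {
        stop("i and value must have same length.");
      }
      // Validate every index first so that a bad one cannot cause a partial
      // or out-of-bounds write.
      for (R_xlen_t a = 0; a < i.size(); a++) {
        if (i[a] == NA_INTEGER || i[a] < 1 || i[a] > x.size()) {
          stop("i must contain indices between 1 and length(x).");
        }
      }
      for (R_xlen_t a = 0; a < i.size(); a++) {
        x[i[a] - 1] = value[a];
      }
      return x;
    }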
    
    #define __MV_HANDLE_CASE__(__RTYPE__) case __RTYPE__ : return mod_vector_impl(Vector<__RTYPE__>(x), i, Vector<__RTYPE__>(value));
    
    // [[Rcpp::export]]
    SEXP mod_vector(SEXP x, IntegerVector i, SEXP value) {
      switch(TYPEOF(x)) {
        __MV_HANDLE_CASE__(INTSXP)
        __MV_HANDLE_CASE__(REALSXP)
        __MV_HANDLE_CASE__(RAWSXP)
        __MV_HANDLE_CASE__(LGLSXP)
        __MV_HANDLE_CASE__(CPLXSXP)
        __MV_HANDLE_CASE__(STRSXP)
        __MV_HANDLE_CASE__(VECSXP)
        __MV_HANDLE_CASE__(EXPRSXP)
      }
      stop("Not supported.");
      return x;
    }
    

    Example:

    Rcpp::sourceCpp("mod_vector.cpp")  # compile the file above
    library(data.table)                # for address()

    x <- 1:20
    address(x)
    #[1] "0x564e7e8"
    mod_vector(x, 4:5, 12:13)
    # [1]  1  2  3 12 13  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    address(x)
    #[1] "0x564e7e8"
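
    One caveat with any in-place approach like this: R is not told about the write, so every other name bound to the same vector changes too. The sketch below illustrates that with a deliberately tiny function, set_first, which is a hypothetical helper compiled with Rcpp::cppFunction and not part of the answer's code. It assumes Rcpp and a working C++ toolchain are installed.

    ```r
    library(Rcpp)

    # set_first is a hypothetical helper for illustration only
    cppFunction('
    NumericVector set_first(NumericVector x, double value) {
      x[0] = value;   // writes into the memory R passed in, no copy is made
      return x;
    }')

    a <- c(1, 2, 3)
    b <- a                       # a and b share one vector
    invisible(set_first(a, 99))  # modify a in place from C++

    a[1]  # changed
    b[1]  # changed as well, because b was never given its own copy
    ```

    If other bindings to the vector may exist, take an explicit copy first (for example with data.table::copy) or use ordinary assignment and accept the copy.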
    

    Comparison with the base and data.table methods. In this benchmark mod_vector is much faster than both on median:

    x <- 1:2e7
    microbenchmark::microbenchmark(mod_vector(x, 4:5, 12:13), x[4:5] <- 12:13, modify.vector(x, 4:5, 12:13))
    #Unit: microseconds
    #                         expr     min       lq        mean    median         uq
    #    mod_vector(x, 4:5, 12:13)   5.967   7.3480    15.05259     9.718    21.0135
    #              x[4:5] <- 12:13   2.953   5.3610 45722.61334 48122.996 52623.1505
    # modify.vector(x, 4:5, 12:13) 954.577 988.7785  1177.17925  1021.380  1361.1210
    #        max neval
    #     58.463   100
    # 126978.146   100
    #   1559.985   100
    