Rcpp function to select (and to return) a sub-dataframe

北海茫月 2020-12-03 12:56

Is it possible to write a C++ function that takes an R data frame as input, modifies it (in our case, by taking a subset), and returns the new data frame (in this q…

3 answers
  • 2020-12-03 13:11

    You don't need Rcpp and RcppArmadillo for that; you can just use R's subset or perhaps dplyr::filter. This is likely to be more efficient than your code, which has to deep-copy data from the data frame into Armadillo vectors, create new Armadillo vectors, and then copy these back into R vectors so that you can build the data frame. This produces lots of waste. Another source of waste is that you perform the exact same find three times.

    Anyway, to answer your question, just use DataFrame::create.

    // id_sub, alph_sub and mess_sub are the subsetted columns from your code
    DataFrame::create(_["id"] = id_sub, _["alpha"] = alph_sub, _["mess"] = mess_sub);
    
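To make the "compute the index once, reuse it for every column" point concrete, here is a minimal sketch in plain C++ (no Rcpp, so it compiles on its own). It assumes a hypothetical Frame struct of three equal-length std::vector columns standing in for the R data frame; rows_with_id, gather and subset_rows are illustrative names, not Rcpp API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an R data frame: three equal-length columns.
struct Frame {
    std::vector<int> id;
    std::vector<std::string> alph;
    std::vector<double> mess;
};

// Hypothetical helper: 0-based positions of the rows whose id equals value.
// This is the "find" step, done exactly once.
std::vector<std::size_t> rows_with_id(const Frame& df, int value) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < df.id.size(); ++i) {
        if (df.id[i] == value) {
            idx.push_back(i);
        }
    }
    return idx;
}

// Copy the elements of one column at the 0-based positions in idx.
template <typename T>
std::vector<T> gather(const std::vector<T>& col, const std::vector<std::size_t>& idx) {
    std::vector<T> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        out.push_back(col.at(i));  // at() throws std::out_of_range on a bad index
    }
    return out;
}

// Apply the same index to every column and assemble the result.
Frame subset_rows(const Frame& df, const std::vector<std::size_t>& idx) {
    return Frame{gather(df.id, idx), gather(df.alph, idx), gather(df.mess, idx)};
}
```

For example, with id = {1, 1, 2, 2}, rows_with_id(df, 2) returns {2, 3}, and subset_rows keeps the last two rows of every column. In the Rcpp version, the final assembly step is the DataFrame::create call.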

    Also, note that in your code, alpha will be a factor (an integer vector of level codes), so arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]); will give you the codes rather than the labels, which is not likely to be what you want.
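For background: R stores a factor as an integer vector of 1-based codes plus a levels attribute, so a numeric conversion sees only the codes. A plain C++ sketch of turning codes back into labels, using a hypothetical decode_factor helper. NA codes (stored by R as INT_MIN) are not special-cased here and are simply rejected as out of range.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: map 1-based factor codes onto their level labels,
// roughly what as.character() does for a factor in R, minus NA handling.
std::vector<std::string> decode_factor(const std::vector<int>& codes,
                                       const std::vector<std::string>& levels) {
    std::vector<std::string> labels;
    labels.reserve(codes.size());
    for (int code : codes) {
        // Valid codes run from 1 to nlevels; anything else (including NA) is rejected.
        if (code < 1 || static_cast<std::size_t>(code) > levels.size()) {
            throw std::out_of_range("factor code outside 1..nlevels");
        }
        labels.push_back(levels[static_cast<std::size_t>(code) - 1]);
    }
    return labels;
}
```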

  • 2020-12-03 13:17

    Here is a complete test file. It does not need your extractor function; it simply reassembles the subsets. For that, however, it needs the very newest Rcpp, as currently on GitHub, where Kevin has added some work on subset indexing that is just what we need here:

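    #include <Rcpp.h>
    
    /*** R
    ##  Suppose I have the data frame below created in R:
    ##  NB: stringsAsFactors set to FALSE
    ##  NB: setting seed as well
    set.seed(42)
    myDF <- data.frame(id = rep(c(1,2), each = 5), 
                       alph = letters[1:10], 
                       mess = rnorm(10), 
                       stringsAsFactors = FALSE)
    */
    
    // Subset every column of D by the same index vector and reassemble
    // the pieces into a new data frame. Note: idx is 0-based (C++ style),
    // so idx = c(2,4,6,8) selects R rows 3, 5, 7 and 9.
    // [[Rcpp::export]]
    Rcpp::DataFrame extract(Rcpp::DataFrame D, Rcpp::IntegerVector idx) {
    
      // Pull out the columns (id is double in R and is coerced to integer here)
      Rcpp::IntegerVector     id = D["id"];
      Rcpp::CharacterVector alph = D["alph"];
      Rcpp::NumericVector   mess = D["mess"];
    
      // Index each column with idx and build the result
      return Rcpp::DataFrame::create(Rcpp::Named("id")    = id[idx],
                                     Rcpp::Named("alpha") = alph[idx],
                                     Rcpp::Named("mess")  = mess[idx]);
    }
    
    /*** R
    extract(myDF, c(2,4,6,8))
    */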
    

    With that file, we get the expected result:

    R> library(Rcpp)
    R> sourceCpp("/tmp/sepher.cpp")
    
    R> ##  Suppose I have the data frame below created in R:
    R> ##  NB: stringsAsFactors set to FALSE
    R> ##  NB: setting seed as well
    R> set.seed(42)
    
    R> myDF <- data.frame(id = rep(c(1,2), each = 5), 
    +                    alph = letters[1:10], 
    +                    mess = rnorm(10), 
    +               .... [TRUNCATED] 
    
    R> extract(myDF, c(2,4,6,8))
      id alpha     mess
    1  1     c 0.363128
    2  1     e 0.404268
    3  2     g 1.511522
    4  2     i 2.018424
    R>
    R> packageDescription("Rcpp")$Version   ## unreleased version
    [1] "0.11.1.1"
    R> 
    

    I needed something similar a few weeks ago (though not involving character vectors) and used Armadillo's elem() function with an unsigned integer vector (arma::uvec) as the index.

  • 2020-12-03 13:32

    To add to Romain's answer, you can also call the [ operator through Rcpp. Once we understand how df[x, ] is evaluated (i.e., it is really a call to "[.data.frame"(df, x, R_MissingArg)), this is easy to do:

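    #include <Rcpp.h>
    using namespace Rcpp;
    
    // Look up R's data-frame method for [ once, when the library is loaded
    Function subset("[.data.frame");
    
    // [[Rcpp::export]]
    DataFrame subset_test(DataFrame x, IntegerVector y) {
      // Equivalent to x[y, ] in R: R_MissingArg fills the empty column slot
      return subset(x, y, R_MissingArg);
    }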
    
    /*** R
    df <- data.frame(x=1:3, y=letters[1:3])
    subset_test(df, c(1L, 2L))
    */
    

    gives me

    > df <- data.frame(x=1:3, y=letters[1:3])
    > subset_test(df, c(1L, 2L))
      x y
    1 1 a
    2 2 b
    

    Calling back into R from Rcpp is generally slower than staying in C++, but depending on how much of a bottleneck this step is, it may still be fast enough for you.

    Be careful, though: because this calls R's own [ method, the integer vector is interpreted with R's 1-based indexing, not the 0-based indexing that C++ uses.
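Since the same index vector can end up crossing between R and C++, the off-by-one is easy to trip over. A minimal sketch, using a hypothetical to_zero_based helper, of converting R-style 1-based row numbers into C++ 0-based positions with a bounds check. R's negative (exclusion) indices and NA are deliberately not handled here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: convert R-style 1-based row numbers to 0-based
// C++ positions, validating each one against the number of rows.
std::vector<std::size_t> to_zero_based(const std::vector<int>& r_idx, std::size_t nrow) {
    std::vector<std::size_t> out;
    out.reserve(r_idx.size());
    for (int i : r_idx) {
        // R rows run from 1 to nrow; 0, negatives and overshoots are rejected.
        if (i < 1 || static_cast<std::size_t>(i) > nrow) {
            throw std::out_of_range("row index outside 1..nrow");
        }
        out.push_back(static_cast<std::size_t>(i) - 1);
    }
    return out;
}
```

For instance, to_zero_based({1, 2}, 3) yields {0, 1}, the C++ positions of the two rows that subset_test(df, c(1L, 2L)) returns. Going the other way, the extract(myDF, c(2,4,6,8)) call in the previous answer indexes in C++ and therefore returns R rows 3, 5, 7 and 9.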
