Is it possible to write a C++ function that takes an R data frame as input, modifies it (in our case, by taking a subset), and returns the new data frame (in this q
You don't need Rcpp and RcppArmadillo for that; you can just use R's subset or perhaps dplyr::filter. This is likely to be more efficient than your code, which has to deep-copy data from the data frame into Armadillo vectors, create new Armadillo vectors, and then copy these back into R vectors so that you can build the data frame. This produces lots of waste. Another source of waste is that you call find three times to compute the exact same thing.
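To make the point about repeated work concrete, here is a minimal sketch in plain C++ (no Rcpp or Armadillo, so it compiles on its own) that computes the matching row indices once and reuses them for every column. The struct `Columns`, the helpers `which_equal`, `gather` and `subset_by_id`, and the choice of filtering on `id == value` are all hypothetical stand-ins for the question's code, not anyone's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the three columns of the question's data frame.
struct Columns {
    std::vector<int> id;
    std::vector<std::string> alph;
    std::vector<double> mess;
};

// Find the 0-based rows where id == value. Called once, not once per column.
std::vector<std::size_t> which_equal(const std::vector<int>& id, int value) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < id.size(); ++i) {
        if (id[i] == value) rows.push_back(i);
    }
    return rows;
}

// Copy the selected rows of a single column; .at() throws on a bad index.
template <typename T>
std::vector<T> gather(const std::vector<T>& col, const std::vector<std::size_t>& rows) {
    std::vector<T> out;
    out.reserve(rows.size());
    for (std::size_t r : rows) out.push_back(col.at(r));
    return out;
}

// Reuse the same index vector for every column.
Columns subset_by_id(const Columns& df, int value) {
    assert(df.id.size() == df.alph.size() && df.id.size() == df.mess.size());
    const std::vector<std::size_t> rows = which_equal(df.id, value);
    return Columns{gather(df.id, rows), gather(df.alph, rows), gather(df.mess, rows)};
}
```

The filtering pass runs once per call, and each column is then copied exactly once, which is the saving the paragraph above describes.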
Anyway, to answer your question, just use DataFrame::create:
DataFrame::create( _["id"] = id_sub, _["alpha"] = alph_sub, _["mess"] = mess_sub );
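Also, note that in your code, alpha will be a factor (an integer vector of codes with a levels attribute), so arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]); is not likely to do what you want: it gives you the integer codes, not the labels.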
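Why the factor matters: R stores a factor as an integer vector of 1-based codes plus a character vector of levels, so reading the column as numbers yields the codes rather than the labels. The plain C++ sketch below mimics that layout to show the difference; the struct `Factor` and the function `labels` are hypothetical illustrations, not Rcpp API. In practice the simplest fix is on the R side, by creating the column with stringsAsFactors = FALSE so it arrives as character.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of an R factor: 1-based integer codes indexing into levels.
struct Factor {
    std::vector<int> codes;           // what a numeric read of the column sees
    std::vector<std::string> levels;  // the labels the user actually wants
};

// Decode codes to labels, translating each 1-based code to a 0-based position.
// NA codes are not handled in this sketch; any out-of-range code throws.
std::vector<std::string> labels(const Factor& f) {
    std::vector<std::string> out;
    out.reserve(f.codes.size());
    for (int code : f.codes) {
        if (code < 1 || static_cast<std::size_t>(code) > f.levels.size())
            throw std::out_of_range("factor code outside levels");
        out.push_back(f.levels[static_cast<std::size_t>(code) - 1]);
    }
    return out;
}
```

Reading `codes` directly (as the as<arma::vec> conversion effectively does) gives numbers like 3, 1, 2; only the decoding step recovers "c", "a", "b".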
Here is a complete test file. It does not need your extractor function and simply re-assembles the subsets, but for that it needs the very newest Rcpp (currently on GitHub), where Kevin happened to add some work on subset indexing, which is just what we need here:
#include <Rcpp.h>
/*** R
## Suppose I have the data frame below created in R:
## NB: stringsAsFactors set to FALSE
## NB: setting seed as well
set.seed(42)
myDF <- data.frame(id = rep(c(1,2), each = 5),
                   alph = letters[1:10],
                   mess = rnorm(10),
                   stringsAsFactors = FALSE)
*/
// [[Rcpp::export]]
Rcpp::DataFrame extract(Rcpp::DataFrame D, Rcpp::IntegerVector idx) {
    Rcpp::IntegerVector id = D["id"];       // coerced from double to integer
    Rcpp::CharacterVector alph = D["alph"];
    Rcpp::NumericVector mess = D["mess"];
    // Subsetting with an IntegerVector uses 0-based indices,
    // so idx = c(2,4,6,8) selects rows 3, 5, 7 and 9.
    return Rcpp::DataFrame::create(Rcpp::Named("id") = id[idx],
                                   Rcpp::Named("alpha") = alph[idx],
                                   Rcpp::Named("mess") = mess[idx]);
}
/*** R
extract(myDF, c(2,4,6,8))
*/
With that file, we get the expected result:
R> library(Rcpp)
R> sourceCpp("/tmp/sepher.cpp")
R> ## Suppose I have the data frame below created in R:
R> ## NB: stringsAsFactors set to FALSE
R> ## NB: setting seed as well
R> set.seed(42)
R> myDF <- data.frame(id = rep(c(1,2), each = 5),
+ alph = letters[1:10],
+ mess = rnorm(10),
+ .... [TRUNCATED]
R> extract(myDF, c(2,4,6,8))
id alpha mess
1 1 c 0.363128
2 1 e 0.404268
3 2 g 1.511522
4 2 i 2.018424
R>
R> packageDescription("Rcpp")$Version ## unreleased version
[1] "0.11.1.1"
R>
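The rows in that output follow from Rcpp's 0-based indexing: with idx = c(2,4,6,8), alph[idx] picks the 3rd, 5th, 7th and 9th elements (c, e, g, i), not b, d, f, h. The plain C++ sketch below (no Rcpp; the helper `take` and the constant `kLetters` are hypothetical) reproduces that selection on the letters column so the convention is easy to check by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Select elements by 0-based position, mirroring x[idx] with an integer idx in Rcpp.
// This sketch throws on an out-of-range index.
template <typename T>
std::vector<T> take(const std::vector<T>& x, const std::vector<int>& idx) {
    std::vector<T> out;
    out.reserve(idx.size());
    for (int i : idx) {
        if (i < 0 || static_cast<std::size_t>(i) >= x.size())
            throw std::out_of_range("index outside vector");
        out.push_back(x[static_cast<std::size_t>(i)]);
    }
    return out;
}

// letters[1:10] from the R example.
const std::vector<std::string> kLetters = {"a", "b", "c", "d", "e",
                                           "f", "g", "h", "i", "j"};
```

Calling take(kLetters, {2, 4, 6, 8}) yields c, e, g, i, matching the alpha column printed above.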
I just needed something similar a few weeks ago (though not involving character vectors) and used Armadillo's elem() member function with an unsigned int vector as the index.
To add to Romain's answer, you can try calling the [ operator through Rcpp. Once we understand how df[x, ] is evaluated (i.e., it is really a call to "[.data.frame"(df, x, R_MissingArg)), this is easy to do:
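#include <Rcpp.h>
using namespace Rcpp;

// Handle to R's own data frame subsetting method.
Function subset("[.data.frame");

// [[Rcpp::export]]
DataFrame subset_test(DataFrame x, IntegerVector y) {
    // Equivalent to x[y, ] in R: R_MissingArg stands in for the empty column index.
    return subset(x, y, R_MissingArg);
}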
/*** R
df <- data.frame(x=1:3, y=letters[1:3])
subset_test(df, c(1L, 2L))
*/
gives me
> df <- data.frame(x=1:3, y=letters[1:3])
> subset_test(df, c(1L, 2L))
x y
1 1 a
2 2 b
Calling back into R from Rcpp is generally slower than native C++ code, but depending on how much of a bottleneck this is, it may still be fast enough for you.
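Be careful, though: because this function dispatches to R's "[.data.frame", it uses R's 1-based indexing for integer vectors, not the 0-based indexing that Rcpp uses when you subset with an IntegerVector (as in the extract() example).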
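If you mix the two approaches, for example computing row numbers in R with which() and then subsetting in C++, the indices have to be shifted by one. A minimal sketch in plain C++, assuming the indices arrive as positive 1-based integers; the name `to_zero_based` is hypothetical. R's [ also accepts negative (exclusion), zero and NA indices, which this sketch deliberately rejects rather than emulates.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Convert R-style 1-based row indices to 0-based ones for C++/Rcpp subsetting.
// Only indices in [1, n] are accepted; R's richer semantics
// (negative = exclude, 0 = drop, NA) are intentionally not emulated here.
std::vector<int> to_zero_based(const std::vector<int>& one_based, int n) {
    std::vector<int> out;
    out.reserve(one_based.size());
    for (int i : one_based) {
        if (i < 1 || i > n)
            throw std::out_of_range("index must be in [1, n]");
        out.push_back(i - 1);
    }
    return out;
}
```

With this, the c(1L, 2L) passed to subset_test corresponds to positions {0, 1} for an Rcpp-side subset of the same three-row data frame.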