I want to construct a data frame in an Rcpp function, but when I get it back, it doesn't really look like a data frame. I've tried pushing vectors etc., but it leads to the same result.
I concur with joran. A C function called from R through .C() returns a list of all its arguments, both "in" and "out", so each "column" of the data frame can be passed to the C function as an argument. Once the result is back in R, all that remains is to extract those list elements by indexing and give them the appropriate names.
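For reference, a routine written for the .C interface returns void and receives every argument as a pointer into the data of an R vector; the filled-in vectors come back in the list that .C() returns. A minimal sketch, where the name fill_columns and its two numeric "columns" are hypothetical:

```cpp
#include <cassert>

// Hypothetical routine in the style expected by R's .C interface:
// it returns void and every argument is a pointer into an R vector.
// After loading the shared library with dyn.load(), R could call:
//   res <- .C("fill_columns", n = 3L, a = double(3), b = double(3))
//   data.frame(a = res$a, b = res$b)
extern "C" void fill_columns(int *n, double *col_a, double *col_b) {
    for (int i = 0; i < *n; ++i) {
        col_a[i] = i + 1.0;          // first "column": 1, 2, 3, ...
        col_b[i] = 2.0 * (i + 1.0);  // second "column": 2, 4, 6, ...
    }
}
```

Because the arguments are named in the .C() call, the returned list has elements n, a and b, which is what makes the final data.frame() step a simple matter of picking and naming elements.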
Using the information from @baptiste's answer, this is what finally does give a well formed data frame:
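    // Build a data frame from the first two columns of the input data frame.
    RcppExport SEXP makeDataFrame(SEXP in) {
        Rcpp::DataFrame dfin(in);
        Rcpp::DataFrame dfout;
        Rcpp::CharacterVector namevec;
        std::string namestem = "Column Heading ";
        for (int i = 0; i < 2; i++) {
            dfout.push_back(dfin(i));  // copy column i into the output
            // make up a column name: "Column Heading a", "Column Heading b", ...
            namevec.push_back(namestem + std::string(1, (char)('a' + i)));
        }
        dfout.attr("names") = namevec;
        // Coerce the named list into a proper data.frame via R's as.data.frame().
        Rcpp::Language call("as.data.frame", dfout);
        Rcpp::DataFrame x(call.eval());
        return x;
    }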
I think the point remains that this might be inefficient, both because of push_back (as @Dirk pointed out) and because of the second evaluation through a Language call. I looked through the Rcpp unit tests and haven't been able to come up with anything better yet. Does anybody have any ideas?
Update:
Using @Dirk's suggestions (thanks!), this seems to be a simpler, efficient solution:
    RcppExport SEXP makeDataFrame(SEXP in) {
        Rcpp::DataFrame dfin(in);
        Rcpp::List myList(dfin.length());             // preallocate one slot per column
        Rcpp::CharacterVector namevec(dfin.length()); // preallocate the names too
        std::string namestem = "Column Heading ";
        for (int i = 0; i < dfin.length(); i++) {
            myList[i] = dfin(i);  // adding vectors
            namevec[i] = namestem + std::string(1, (char)('a' + i));  // making up column names
        }
        myList.attr("names") = namevec;
        Rcpp::DataFrame dfout(myList);  // wrap the named list as a DataFrame
        return dfout;
    }
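The name-building expression in the loop above can be pulled out into a small helper. This is a sketch under my own assumptions: make_column_names is a hypothetical name, and because the scheme uses one letter per column it only covers 26 columns, so the helper throws rather than silently producing non-letter characters.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper mirroring the naming scheme in makeDataFrame():
// "Column Heading a", "Column Heading b", ... (one letter per column).
std::vector<std::string> make_column_names(const std::string& stem, int ncol) {
    if (ncol < 0 || ncol > 26)
        throw std::out_of_range("make_column_names: ncol must be in [0, 26]");
    std::vector<std::string> names;
    names.reserve(ncol);  // allocate once up front
    for (int i = 0; i < ncol; ++i)
        names.push_back(stem + static_cast<char>('a' + i));  // append one letter
    return names;
}
```

Inside the Rcpp function, the resulting std::vector<std::string> could be converted to a character vector with Rcpp::wrap() and used as the list's names attribute.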
It seems Rcpp can return a proper data.frame, provided you supply the names explicitly. I'm not sure how to adapt this to your example with arbitrary names.
    mkdf <- '
    Rcpp::DataFrame dfin(input);
    Rcpp::DataFrame dfout;
    for (int i = 0; i < dfin.length(); i++) {
        dfout.push_back(dfin(i));
    }
    // C++ indices are 0-based: dfout(1) and dfout(2) are the 2nd and 3rd columns
    return Rcpp::DataFrame::create(Named("x") = dfout(1), Named("y") = dfout(2));
    '
    library(inline)
    test <- cxxfunction(signature(input = "data.frame"),
                        mkdf, plugin = "Rcpp")
    test(input = head(iris))
Briefly:
DataFrames are indeed just like lists, with the added restriction that all columns must have a common length, so they are best constructed column by column.
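To make the "list plus a common-length rule" idea concrete, here is a toy model in plain C++. It is an illustration under my own assumptions (a hypothetical ToyDataFrame holding only numeric columns), not how Rcpp's DataFrame class is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model: a data frame as a named list of columns, with the extra
// rule that every column must have the same length.
struct ToyDataFrame {
    std::vector<std::string> names;
    std::vector<std::vector<double>> columns;

    // Number of rows is the length of any column (0 if there are none).
    std::size_t nrow() const {
        return columns.empty() ? 0 : columns.front().size();
    }

    // Add one column; reject it if its length differs from existing columns.
    void add_column(const std::string& name, std::vector<double> values) {
        if (!columns.empty() && values.size() != nrow())
            throw std::invalid_argument("column '" + name + "' has the wrong length");
        names.push_back(name);
        columns.push_back(std::move(values));
    }
};
```

Building column by column means the length check happens once per column, and a rejected column leaves the existing frame untouched.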
The best approach is often to look at our unit tests. Here, inst/unitTests/runit.DataFrame.R collects the tests for the DataFrame class.
You also found the .push_back() member function in Rcpp, which we added for convenience and by analogy with the STL. We do warn that it is not recommended: because of the way R objects are constructed, we essentially always need to make a full copy, so .push_back() is not very efficient.
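A rough model of why this gets expensive, assuming (as described above) that every append allocates a new vector one element larger and copies the old contents. FixedVec is a hypothetical type standing in for an R vector; it is not Rcpp code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Toy model of a fixed-size vector: "appending" means allocating a new
// buffer one element larger and copying every existing element over.
struct FixedVec {
    std::unique_ptr<double[]> data;
    std::size_t size = 0;
    std::size_t copies = 0;  // total elements copied by all appends so far

    void push_back(double v) {
        std::unique_ptr<double[]> bigger(new double[size + 1]);
        std::copy(data.get(), data.get() + size, bigger.get());  // full copy
        copies += size;
        bigger[size] = v;
        data = std::move(bigger);
        ++size;
    }
};
```

Appending n elements this way copies n(n-1)/2 elements in total (4950 for n = 100). Sizing the container up front and assigning by index, as in Rcpp::List myList(dfin.length()) in the updated answer, avoids that quadratic cost.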
Even though I answer here frequently, the rcpp-devel list is a better place for Rcpp questions.