I want to write a function that slices a 'string' into a vector, sequentially, at a given index. I have a fairly adequate R solution for it; however, I figure that writing
This one-liner using strapplyc from the gsubfn package is fast enough that Rcpp may not be needed. Here we apply it to the entire text of James Joyce's Ulysses, which takes only a few seconds:
library(gsubfn)
joyce <- readLines("http://www.gutenberg.org/files/4300/4300-8.txt")
joycec <- paste(joyce, collapse = " ") # all in one string
n <- 2
system.time(s <- strapplyc(joycec, paste(rep(".", n), collapse = ""))[[1]])
I would use substring. Something like this:
strslice <- function( x, n ){
    starts <- seq( 1L, nchar(x), by = n )
    substring( x, starts, starts + n-1L )
}
strslice( "abcdef", 2 )
# [1] "ab" "cd" "ef"
About your Rcpp code: maybe you can allocate the std::vector<std::string> with the right size up front, so that you avoid resizing it (which might mean extra memory allocations), or perhaps use an Rcpp::CharacterVector directly. Something like this:
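# rcpp() is assumed here to be the wrapper provided by the inline package
strslice_rcpp <- rcpp( signature(x="character", n="integer"), '
    std::string myString = as<std::string>(x);
    int cutpoint = as<int>(n);
    int len = myString.length();
    // integer division: a trailing partial chunk is dropped
    int nout = len / cutpoint ;
    CharacterVector out( nout ) ;
    for( int i=0; i<nout; i++ ) {
        // take cutpoint characters per slice (was hard-coded to 2)
        out[i] = myString.substr( cutpoint*i, cutpoint ) ;
    }
    return out ;
')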
strslice_rcpp( "abcdefg", 2 )
# [1] "ab" "cd" "ef"
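The other option mentioned above, keeping the std::vector<std::string> but sizing it up front, might look roughly like this in plain C++ (no Rcpp). This is only a sketch: the name slice_string is hypothetical, and unlike strslice_rcpp it keeps a shorter trailing piece, the way substring does in R.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split `s` into consecutive pieces of `n` characters.
// The last piece may be shorter than `n`.
std::vector<std::string> slice_string(const std::string& s, std::size_t n) {
    if (n == 0) {
        throw std::invalid_argument("n must be positive");
    }
    std::vector<std::string> out;
    // Reserve room for ceil(len / n) pieces so push_back never reallocates.
    out.reserve((s.size() + n - 1) / n);
    for (std::size_t i = 0; i < s.size(); i += n) {
        // substr clamps the count at the end of the string.
        out.push_back(s.substr(i, n));
    }
    return out;
}
```

Calling slice_string("abcdefg", 2) gives four pieces, the last being "g". Whether reserving makes a measurable difference compared with letting the vector grow would need to be timed on your actual data.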