Slice a string at consecutive indices with R / Rcpp?

鱼传尺愫 2021-01-05 18:55

I want to write a function that slices a 'string' into a vector, sequentially, at a given index. I have a fairly adequate R solution for it; however, I figure that writing

2 answers
  • 2021-01-05 19:38

    This one-liner, using strapplyc from the gsubfn package, is fast enough that Rcpp may not be needed. Here we apply it to the entire text of James Joyce's Ulysses, which takes only a few seconds:

    library(gsubfn)
    joyce <- readLines("http://www.gutenberg.org/files/4300/4300-8.txt") 
    joycec <- paste(joyce, collapse = " ") # all in one string 
    n <- 2
    system.time(s <- strapplyc(joycec, paste(rep(".", n), collapse = ""))[[1]])
    
  • 2021-01-05 19:45

    I would use substring. Something like this:

    strslice <- function( x, n ){   
        starts <- seq( 1L, nchar(x), by = n )
        substring( x, starts, starts + n-1L )
    }
    strslice( "abcdef", 2 )
    # [1] "ab" "cd" "ef"
    

    As for your Rcpp code, you could allocate the std::vector<std::string> at its final size up front, which avoids repeated resizing (and the memory allocations that may come with it). Alternatively, fill an Rcpp::CharacterVector directly. Something like this:

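    library(inline)  # rcpp() comes from the inline package (cxxfunction with the Rcpp plugin)
    strslice_rcpp <- rcpp( signature(x="character", n="integer"), '
        std::string myString = as<std::string>(x);
        int cutpoint = as<int>(n);
        int len = myString.length();
        // integer division: a trailing chunk shorter than cutpoint is dropped
        int nout = len / cutpoint ;
        CharacterVector out( nout ) ;
        for( int i=0; i<nout; i++ ) {
          // take cutpoint characters per slice (not a hard-coded 2)
          out[i] = myString.substr( cutpoint*i, cutpoint ) ;
        }
        return out ;
    ')
    strslice_rcpp( "abcdefg", 2 )
    # [1] "ab" "cd" "ef"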
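
    The other option mentioned above, sizing a std::vector<std::string> up front, could look roughly like the following plain C++ sketch. The name slice_string is hypothetical and not from the original code. Unlike the Rcpp snippet, this version assumes you want to keep a shorter final chunk, as the substring solution does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): split `s` into
// consecutive chunks of `n` characters, keeping a shorter final chunk.
std::vector<std::string> slice_string(const std::string& s, std::size_t n) {
    std::vector<std::string> out;
    if (n == 0) return out;  // nothing sensible to return; also avoids dividing by zero

    // ceil(size / n) chunks are needed; reserving once means push_back
    // never has to reallocate the vector while it is being filled.
    out.reserve((s.size() + n - 1) / n);

    for (std::size_t pos = 0; pos < s.size(); pos += n) {
        // substr clamps the length at the end of the string,
        // so the last chunk may be shorter than n.
        out.push_back(s.substr(pos, n));
    }
    return out;
}
```

    With this sketch, slice_string("abcdefg", 2) gives the four chunks "ab", "cd", "ef" and "g". Inside an Rcpp function the resulting vector should be convertible to an R character vector with Rcpp::wrap.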
    