I'm trying to figure out an efficient way to split a string like
"111110000011110000111000"
into a vector of runs of identical characters:
[1] "11111" "00000" "1111" "0000" "111" "000"
Original approach: here is a stringi approach that uses rle() to find the runs.
x <- "111110000011110000111000"
library(stringi)
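# rle() gives the length of each run of identical characters;
# their cumulative sum is the position where each run ends
cs <- cumsum(
  rle(stri_split_boundaries(x, type = "character")[[1L]])$lengths
)
# each run starts one character after the previous run ends
stri_sub(x, c(1L, head(cs + 1L, -1L)), cs)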
# [1] "11111" "00000" "1111" "0000" "111" "000"
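The start/end arithmetic is the core of this approach, so here is a minimal base-R walk-through of the intermediate values for the example string. It swaps in strsplit(x, NULL) for stri_split_boundaries(); for a plain ASCII string like this one I'm assuming both produce the same single characters.

```r
x <- "111110000011110000111000"

chars  <- strsplit(x, NULL)[[1L]]       # one element per character
runs   <- rle(chars)                    # lengths and values of consecutive runs
ends   <- cumsum(runs$lengths)          # position of the last character of each run
starts <- c(1L, head(ends + 1L, -1L))   # each run starts one past the previous run's end

runs$lengths  # 5 5 4 4 3 3
ends          # 5 10 14 18 21 24
starts        # 1 6 11 15 19 22
substring(x, starts, ends)
# [1] "11111" "00000" "1111" "0000" "111" "000"
```

Dropping the last element of ends + 1 (head(..., -1L)) and prepending 1 is what shifts the end positions into start positions.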
Alternatively, you can pass the run lengths to the length argument of stri_sub(), so only the start positions need to be computed:
rl <- rle(stri_split_boundaries(x, type = "character")[[1L]])
with(rl, {
  # inside with(), `lengths` is rl$lengths (it shadows base::lengths())
  stri_sub(x, c(1L, head(cumsum(lengths) + 1L, -1L)), length = lengths)
})
# [1] "11111" "00000" "1111" "0000" "111" "000"
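With the length form, the start positions can also be written directly as ends - lengths + 1, which avoids the head(..., -1L) shift. A base-R sketch of that alternative formulation (not the code used above; substring() has no length argument, so it still takes the end positions):

```r
x <- "111110000011110000111000"

lens <- rle(strsplit(x, NULL)[[1L]])$lengths
ends <- cumsum(lens)
# a run of length k that ends at position e starts at e - k + 1
starts <- ends - lens + 1L
substring(x, starts, ends)
# [1] "11111" "00000" "1111" "0000" "111" "000"
```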
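Updated for efficiency: after realizing that base::strsplit() is faster than stringi::stri_split_boundaries() for splitting a string into single characters, here is a more efficient version of my previous answer. The splitting and extraction use only base functions; stringi is used only to generate the test string.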
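set.seed(24)
# one random 1e6-character test string (stri_rand_strings() defaults to [A-Za-z0-9])
x3 <- stri_rand_strings(1L, 1e6L)
system.time({
  # rle(...)[[1L]] is the $lengths component
  cs <- cumsum(rle(strsplit(x3, NULL)[[1L]])[[1L]])
  substring(x3, c(1L, head(cs + 1L, -1L)), cs)
})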
#    user  system elapsed
#   0.686   0.012   0.697
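If you need this repeatedly, the base version can be wrapped in a small helper. This is a sketch under my own assumptions: split_runs() is a hypothetical name (not part of base R or stringi), it accepts a single non-NA string, and it returns character(0) for the empty string instead of calling substring() with no positions.

```r
# split_runs() is a hypothetical helper, not a base R or stringi function
split_runs <- function(s) {
  stopifnot(is.character(s), length(s) == 1L, !is.na(s))
  # an empty string has no runs; return early rather than pass
  # empty position vectors to substring()
  if (!nzchar(s)) return(character(0))
  ends <- cumsum(rle(strsplit(s, NULL)[[1L]])$lengths)
  substring(s, c(1L, head(ends + 1L, -1L)), ends)
}

split_runs("111110000011110000111000")
# [1] "11111" "00000" "1111" "0000" "111" "000"
split_runs("")
# character(0)
```

For a character vector, lapply(strings, split_runs) returns one vector of runs per input string.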