Split string based on alternating character in R

醉话见心 2021-01-30 10:02

I'm trying to figure out an efficient way to go about splitting a string like

"111110000011110000111000"

into a vector

[1] \         


        
9 answers
  •  失恋的感觉
    2021-01-30 10:54

    You could probably make use of substr or read.fwf along with rle (though it is unlikely to be as efficient as any regex-based solution):

    x <- "111110000011110000111000"
    unlist(read.fwf(textConnection(x), 
                    rle(strsplit(x, "")[[1]])$lengths, 
                    colClasses = "character"))
    #      V1      V2      V3      V4      V5      V6 
    # "11111" "00000"  "1111"  "0000"   "111"   "000"
    

    One advantage of this approach is that it would work even with, say:

    x <- paste(c(rep("a", 5), rep("b", 2), rep("c", 7),
                 rep("b", 3), rep("a", 1), rep("d", 1)), collapse = "")
    x
    # [1] "aaaaabbcccccccbbbad"
    
    unlist(read.fwf(textConnection(x), 
                    rle(strsplit(x, "")[[1]])$lengths, 
                    colClasses = "character"))
    #        V1        V2        V3        V4        V5        V6 
    #   "aaaaa"      "bb" "ccccccc"     "bbb"       "a"       "d" 
    
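    The substr route mentioned above is not shown in the answer, so here is a minimal sketch of how it might look. It uses substring (the vectorized relative of substr) with start and end positions derived from the rle run lengths. The helper name split_runs is hypothetical and introduced only for illustration; no claim is made about how it compares in speed to the read.fwf version or to a regex.

    ```r
    # Hypothetical helper (not from the original answer): split a string
    # into runs of identical consecutive characters using rle + substring.
    split_runs <- function(x) {
      chars <- strsplit(x, "")[[1]]       # individual characters
      run_lengths <- rle(chars)$lengths   # length of each run
      ends <- cumsum(run_lengths)         # last position of each run
      starts <- ends - run_lengths + 1    # first position of each run
      substring(x, starts, ends)          # vectorized over starts/ends
    }

    split_runs("111110000011110000111000")
    # [1] "11111" "00000" "1111"  "0000"  "111"   "000"

    split_runs("aaaaabbcccccccbbbad")
    # [1] "aaaaa"   "bb"      "ccccccc" "bbb"     "a"       "d"
    ```

    Unlike the read.fwf version, this returns an unnamed character vector and does not need a textConnection.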
