Identifying substrings based on complex rules

野趣味 2021-01-20 02:13

Assume I have text strings that look something like this:

A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3

Here I want to identify sequ

3 Answers
  • 2021-01-20 02:25

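    Try the following expression, where capture group 1 holds the text that precedes each run of I-markers containing I3 (including the adjoining hyphen):

    (.*?)(?:I[0-9]-)*I3(?:-I[0-9])*

    See the match groups: https://regex101.com/r/yA6aV9/1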
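
    As a rough sketch of how the match groups from the expression above could be pulled out in base R: this assumes perl = TRUE, so that gregexpr() reports capture positions. The variable names and the final hyphen-trimming step are illustrative additions, not part of the original answer.

    ```r
    x <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3"
    pattern <- "(.*?)(?:I[0-9]-)*I3(?:-I[0-9])*"

    # With perl = TRUE, gregexpr() attaches the start and length of each
    # capture group to the match object as attributes.
    m <- gregexpr(pattern, x, perl = TRUE)[[1]]
    starts <- attr(m, "capture.start")[, 1]
    lens <- attr(m, "capture.length")[, 1]

    # Extract group 1 of every match, then trim the hyphens left at the edges
    # (the trimming is an assumption about the desired output format).
    groups <- substring(x, starts, starts + lens - 1)
    result <- gsub("^-+|-+$", "", groups)
    print(result)
    # [1] "A-B-C-I1-I2-D-E-F" "D-D-D-D"
    ```

    Note that this approach only captures text that is followed by an I3 run; anything after the last such run is not returned, because every match needs an I3 to anchor it.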

  • 2021-01-20 02:26

    Use strsplit and split on every run of I-markers that contains I3:

    > x <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3"
    > strsplit(x, "(?:-?I\\d+)*-?\\bI3-?(?:I\\d+-?)*")
    [[1]]
    [1] "A-B-C-I1-I2-D-E-F" "D-D-D-D"
    
    > strsplit("A-B-I3-C-I3", "(?:-?I\\d+)*-?\\bI3\\b-?(?:I\\d+-?)*")
    [[1]]
    [1] "A-B" "C" 
    

    or

    > strsplit("A-B-I3-C-I3", "(?:-?I\\d+)*-?\\bI3\\b-?(?:I3-?)*")
    [[1]]
    [1] "A-B" "C"
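
    The last two patterns return the same result on this input, but they are not interchangeable. The sketch below uses a made-up input, A-I3-I1-B (not from the question), to show where they diverge: the first pattern also consumes any I-marker that follows I3, while the second consumes only further I3 markers. perl = TRUE is added here as an assumption, so that the patterns are interpreted by PCRE.

    ```r
    # Hypothetical input in which a non-I3 marker (I1) directly follows I3.
    y <- "A-I3-I1-B"

    # Variant 1: the trailing group (?:I\\d+-?)* swallows any I-marker after I3.
    v1 <- strsplit(y, "(?:-?I\\d+)*-?\\bI3\\b-?(?:I\\d+-?)*", perl = TRUE)[[1]]

    # Variant 2: the trailing group (?:I3-?)* swallows only further I3 markers,
    # so the I1 that follows I3 stays in the output.
    v2 <- strsplit(y, "(?:-?I\\d+)*-?\\bI3\\b-?(?:I3-?)*", perl = TRUE)[[1]]

    print(v1)  # [1] "A" "B"
    print(v2)  # [1] "A"    "I1-B"
    ```

    Which variant fits depends on whether markers that trail an I3 should be dropped along with it.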
    
  • 2021-01-20 02:27

    You can identify the sequences that contain I3 with the following regex (backslashes are doubled, as in an R string literal):

    (?:I\\d-?)*I3(?:-?I\\d)*
    

    You can then split your text on this regex to get the desired result.

    See the demo: https://regex101.com/r/bJ3iA3/4
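
    A minimal sketch of that split in R, assuming perl = TRUE and the sample string from the question. The regex above does not consume the hyphens on either side of a run, so the sketch strips them afterwards; that clean-up step is an addition, not part of the original answer.

    ```r
    x <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3"

    # Split on every run of I-markers that contains I3.
    pieces <- strsplit(x, "(?:I\\d-?)*I3(?:-?I\\d)*", perl = TRUE)[[1]]

    # The pattern leaves the hyphens around each run in place,
    # so strip them from both ends of every piece.
    pieces <- gsub("^-+|-+$", "", pieces)
    print(pieces)
    # [1] "A-B-C-I1-I2-D-E-F" "D-D-D-D"
    ```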
