R split on delimiter not in parentheses

慢半拍i 2020-12-11 21:35

I am currently trying to split a string on the pipe delimiter: 

999|150|222|(123|145)|456|12,260|(10|10000)

The catch is I don't want to split on the pipes inside parentheses.

3 Answers
  • 2020-12-11 21:56

    You can switch on PCRE by using perl=TRUE and some dark magic:

    x <- '999|150|222|(123|145)|456|12,260|(10|10000)'
    strsplit(x, '\\([^)]*\\)(*SKIP)(*F)|\\|', perl=TRUE)
    
    # [[1]]
    # [1] "999"        "150"        "222"        "(123|145)"  "456"       
    # [6] "12,260"     "(10|10000)"
    

    The idea is to skip content in parentheses.

    On the left side of the alternation operator we match anything in parentheses, then (*SKIP)(*F) makes that subpattern fail and tells the regular expression engine not to retry any position inside the skipped substring. The right side of the alternation operator matches a | outside parentheses, which is what we want to split on.
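
    The skipping behaviour can be checked on a second input with several pipes inside one group. This is a minimal sketch: `split_outside_parens` is a hypothetical helper name introduced here for illustration, and it assumes the parentheses are balanced and not nested.

```r
# Hypothetical helper wrapping the (*SKIP)(*F) pattern.
# Assumption: parentheses are balanced and not nested.
split_outside_parens <- function(x) {
  # Left alternative: consume "(...)" and discard it via (*SKIP)(*F).
  # Right alternative: a literal | that was not skipped.
  pattern <- "\\([^)]*\\)(*SKIP)(*F)|\\|"
  strsplit(x, pattern, perl = TRUE)[[1]]
}

split_outside_parens("999|150|222|(123|145)|456|12,260|(10|10000)")
# [1] "999"        "150"        "222"        "(123|145)"  "456"
# [6] "12,260"     "(10|10000)"

split_outside_parens("(a|b|c)|d|(e|f)")
# [1] "(a|b|c)" "d"       "(e|f)"
```

    Taking `[[1]]` returns a plain character vector, which is convenient when a single string is split.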

  • 2020-12-11 22:06

    This seems to work

    x <- '999|150|222|(123|145)|456|12,260|(10|10000)'
    strsplit(x, '\\|(?=[^)]+(\\||$))', perl=TRUE)
    
    # [[1]]
    # [1] "999"        "150"        "222"        "(123|145)"  "456"        "12,260"    
    # [7] "(10|10000)"
    

    Here we don't just split on the |; we also use a lookahead to make sure that there is no ")" before the next | or the end of the string. Note that this method neither requires nor ensures that the parentheses are balanced and closed. We assume your input is well formed.
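
    To make the "well-formed input" assumption concrete, the sketch below runs the same lookahead pattern on three inputs. The test strings and variable names are made up for illustration. The last case shows a further limitation that follows from the pattern itself: a group containing more than one | is split, because the first inner | is followed by another | before any ")".

```r
# Lookahead pattern from this answer
pattern <- "\\|(?=[^)]+(\\||$))"

# Well formed, one pipe per group: splits as intended
ok <- strsplit("1|(2|3)|4", pattern, perl = TRUE)[[1]]
ok
# [1] "1"     "(2|3)" "4"

# Unclosed parenthesis: no error, the inner pipe is simply split on
unbalanced <- strsplit("1|(2|3", pattern, perl = TRUE)[[1]]
unbalanced
# [1] "1"  "(2" "3"

# Two pipes inside one group: the first inner pipe is split on
multi <- strsplit("1|(2|3|4)|5", pattern, perl = TRUE)[[1]]
multi
# [1] "1"    "(2"   "3|4)" "5"
```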

  • 2020-12-11 22:08

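    One option, which drops the parentheses from the result:

    # Turn each parenthesis into a quote character so scan() reads a group as one quoted field
    scan(text=gsub("\\(|\\)", "'", x), what='', sep="|")
    #[1] "999"      "150"      "222"      "123|145"  "456"      "12,260"   "10|10000"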
    

    Here's another way using strsplit. Other answers here also use strsplit, but this seems to be the simplest pattern that works for this input. The negative lookahead refuses to split on a | that is followed by digits and a closing parenthesis:

    strsplit(x, "\\|(?!\\d+\\))", perl=TRUE)
    # [[1]]
    # [1] "999"        "150"        "222"        "(123|145)"  "456"        "12,260"     "(10|10000)"
    