Split string based on regex but keep delimiters

遇见更好的自我 2021-01-24 20:26

I'm trying to split a string using a variety of characters as delimiters, while also keeping those delimiters as their own array elements. For example, say I want to split the string:

3 Answers
  • 2021-01-24 21:08

    Well, you can use lookaround to split at points between characters without consuming the delimiters:

    (?<=[()>*;\s-])|(?=[()>*;\s-])
    

    This will create a split point before and after each delimiter character. The hyphen goes last in the character class so that it is taken literally; written as *-; it would form a range that also covers the digits and a few other characters. You might need to remove superfluous whitespace elements from the resulting array, though.

    Quick PowerShell test (| marks the split points):

    PS Home:\> 'if (x>1) return x * fact(x-1);' -split '(?<=[()>*;\s-])|(?=[()>*;\s-])' -join '|'
    if| |(|x|>|1|)| |return| |x| |*| |fact|(|x|-|1|)|;|
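
For anyone who wants to try the same idea outside PowerShell, here is a rough Python sketch. It assumes Python 3.7 or later, where `re.split` also splits on zero-width matches. The names `DELIMS`, `SPLIT_AT` and `split_keep_delims` are made up for this illustration, and the character class lists the hyphen last so that it is read as a literal `-`. The filter at the end drops the whitespace-only and empty pieces mentioned above.

```python
import re

# Delimiter class; the hyphen is last so it is a literal "-", not a range.
DELIMS = r"[()>*;\s-]"

# Zero-width split points: just after a delimiter, or just before one.
SPLIT_AT = re.compile(rf"(?<={DELIMS})|(?={DELIMS})")


def split_keep_delims(text):
    """Split around every delimiter, keep the delimiters themselves,
    and drop pieces that are empty or whitespace-only."""
    return [piece for piece in SPLIT_AT.split(text) if piece.strip()]


tokens = split_keep_delims("if (x>1) return x * fact(x-1);")
print(tokens)
# ['if', '(', 'x', '>', '1', ')', 'return', 'x', '*', 'fact', '(', 'x', '-', '1', ')', ';']
```

Because the split points are zero-width, nothing is consumed: joining the unfiltered pieces gives back the original string.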
    
  • 2021-01-24 21:09

    How about this pattern?

    (\w+)|([\p{P}\p{S}])
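
Note that a pattern like this is meant for matching tokens rather than for splitting. A hedged Python sketch of that approach follows. Python's built-in `re` module does not support `\p{P}` or `\p{S}`, so the sketch swaps in `[^\w\s]` (any single character that is neither a word character nor whitespace) as an approximation; it is not an exact equivalent. `TOKEN` and `tokenize` are hypothetical names.

```python
import re

# Approximation of (\w+)|([\p{P}\p{S}]): the stdlib "re" module has no
# \p{...} classes, so [^\w\s] stands in for "punctuation or symbol".
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return runs of word characters and single non-word,
    non-whitespace characters, in order of appearance."""
    return TOKEN.findall(text)


print(tokenize("if (x>1) return x * fact(x-1);"))
# ['if', '(', 'x', '>', '1', ')', 'return', 'x', '*', 'fact', '(', 'x', '-', '1', ')', ';']
```

Whitespace never matches either alternative, so it simply disappears from the result and no post-filtering is needed.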
    
  • 2021-01-24 21:12

    To answer your question, "Why?", it's because your entire expression is a lookahead assertion. As long as that assertion is true at each character (or maybe I should say "between"), it is able to split.

    Also, you cannot group within character classes: inside [...], (<=) is not a group but simply the four literal characters (, <, = and ).
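
Both points can be checked with a small Python sketch (Python 3.7+ assumed). The asker's original expression is not shown here, so the patterns below are hypothetical stand-ins: a lookahead-only split, and a character class containing `(<=)`.

```python
import re

DELIMS = r"[()>*;\s-]"  # hypothetical delimiter set, hyphen last

# Lookahead only: a split point exists before each delimiter but not
# after it, so every delimiter stays glued to the text that follows.
lookahead_only = re.split(rf"(?={DELIMS})", "fact(x-1);")
print(lookahead_only)  # ['fact', '(x', '-1', ')', ';']

# Inside [...] the characters ( < = ) are four independent literals,
# so "<=" is matched as two separate characters.
in_class = re.findall(r"[(<=)]", "a<=b")
print(in_class)  # ['<', '=']

# To treat "<=" as one delimiter, use alternation outside the class
# and list the longer alternative first.
as_unit = re.findall(r"<=|[()<>=]", "a<=b")
print(as_unit)  # ['<=']
```

Putting the multi-character operator first in the alternation matters: otherwise the single-character class would match `<` before `<=` gets a chance.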
