Regular expressions: Ensuring b doesn't come between a and c

无人共我 2020-11-22 15:26

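Here's something I'm trying to do with regular expressions, and I can't figure out how. I have a big file, and strings abc, 123 and xyz. I want to match a substring of the file that begins with abc, contains 123 somewhere in the middle, and ends with xyz, with no other abc or xyz between the start and the end.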

4 Answers
  • 2020-11-22 15:53

    You could use lookaround.

    /^abc(?!.*abc).*123.*(?<!xyz.*)xyz$/g
    

    (I've not tested it.)
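    One caveat about the pattern above: (?<!xyz.*) is a variable-width lookbehind. JavaScript (ES2018 and later) and .NET accept that, but Python's re module rejects it. The sketch below is an illustration, not part of this answer. It checks that rejection and shows one hypothetical lookahead-only rewrite with the same intent. The (?!.*xyz.) lookahead forbids any xyz that is followed by another character on the line, so only the final xyz can occur.

```python
import re

# The pattern from the answer above; (?<!xyz.*) is a variable-width lookbehind.
ORIGINAL = r'^abc(?!.*abc).*123.*(?<!xyz.*)xyz$'


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Hypothetical lookahead-only rewrite (not from the answer):
#   (?!.*abc)  -> no second abc after the leading one
#   (?!.*xyz.) -> no xyz followed by another character, so the only
#                 xyz allowed is the one that ends the line
# re.MULTILINE makes ^ and $ work per line; . never crosses a newline here.
LOOKAHEAD_ONLY = re.compile(r'^abc(?!.*abc)(?!.*xyz.).*123.*xyz$', re.MULTILINE)

text = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc 123 xyz xyz"
print(compiles(ORIGINAL))            # False
print(LOOKAHEAD_ONLY.findall(text))  # ['abc 123 xyz', 'abc text 123 xyz']
```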

  • 2020-11-22 15:56

    When your left- and right-hand delimiters are single characters, the problem can easily be solved with negated character classes. So, if your match lies between a and c and must not contain b (literally), you may use:

    a[^abc]*c
    

    This is the same technique you use when you want to make sure there is a b between the closest a and c:

    a[^abc]*b[^ac]*c
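    A quick Python check of the two single-character patterns above, on a made-up sample string: the first keeps only a...c spans that contain no b, the second only spans that have a b between the closest a and c.

```python
import re

s = "axc abc aybc"  # hypothetical sample

# a[^abc]*c: an a, then no a, b or c, then the nearest c.
print(re.findall(r'a[^abc]*c', s))         # ['axc']

# a[^abc]*b[^ac]*c: requires a b between the closest a and c.
print(re.findall(r'a[^abc]*b[^ac]*c', s))  # ['abc', 'aybc']
```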
    

    When your left- and right-hand delimiters are multi-character strings, you need a tempered greedy token:

    abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz
    


    To let a match span multiple lines, compile the regex with the re.DOTALL flag, so that . also matches newline characters.
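    A minimal illustration of that flag with the tempered pattern, using a hypothetical sample where abc, 123 and xyz sit on separate lines: without re.DOTALL, . stops at the newline right after abc, so nothing matches.

```python
import re

TEMPERED = r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz'
s = "abc\n123\nxyz"  # the three parts on separate lines

print(re.findall(TEMPERED, s))             # [] because . does not match '\n'
print(re.findall(TEMPERED, s, re.DOTALL))  # ['abc\n123\nxyz']
```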

    Note that to achieve better performance with such a heavy pattern, you should consider unrolling it, which can be done with negated character classes and negative lookaheads.
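    One possible unrolled form, written here as a sketch rather than taken from this answer: each tempered token becomes a run of "safe" characters ([^ax1]* or [^ax]*) followed by repetitions of a lone a, x or 1 that does not begin abc, xyz or 123. Negated character classes also match newlines, so this behaves like the tempered pattern compiled with re.DOTALL. The check below compares the two on the answer's own sample string.

```python
import re

TEMPERED = re.compile(r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)

# Unrolled sketch: safe-character runs, with a, x or 1 allowed only when
# it does not start abc, xyz or 123 respectively.
UNROLLED = re.compile(
    r'abc'
    r'[^ax1]*(?:(?:a(?!bc)|x(?!yz)|1(?!23))[^ax1]*)*'
    r'123'
    r'[^ax]*(?:(?:a(?!bc)|x(?!yz))[^ax]*)*'
    r'xyz'
)

s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
print(UNROLLED.findall(s))                        # ['abc 123 xyz', 'abc 123 xyz', 'abc text 123 xyz']
print(UNROLLED.findall(s) == TEMPERED.findall(s)) # True
```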

    Pattern details:

    • abc - match abc
    • (?:(?!abc|xyz|123).)* - zero or more characters, each of which is not the start of an abc, xyz or 123 character sequence
    • 123 - a literal string 123
    • (?:(?!abc|xyz).)* - zero or more characters, each of which is not the start of an abc or xyz character sequence
    • xyz - a trailing substring xyz

    [Diagram of the pattern not preserved. If re.S is used, . matches any character, including newlines.]

    See the Python demo:

    import re
    p = re.compile(r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)
    s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
    print(p.findall(s))
    # => ['abc 123 xyz', 'abc 123 xyz', 'abc text 123 xyz']
    
  • 2020-11-22 16:06

    Using PCRE, a solution would be:

    This uses the m flag. If you want the match to run only from the start to the end of a line, add ^ and $ at the beginning and end, respectively.

    abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz
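    The pattern above uses only lookaheads, so Python's re module accepts it unchanged. The sketch below runs it on made-up sample lines, both as-is and with the ^...$ anchors mentioned above. Because (abc|xyz) are capturing groups, it reads group(0) from search() rather than calling findall(), which would return the group contents. The lookaheads scan to the end of the line, so an extra xyz later on the line (last sample) rejects the whole line, whereas the tempered greedy token from the previous answer would still find abc 123 xyz there.

```python
import re

# Pattern copied from the answer; only lookaheads, so re accepts it as-is.
PCRE_ANSWER = r'abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz'
UNANCHORED = re.compile(PCRE_ANSWER)
ANCHORED = re.compile('^' + PCRE_ANSWER + '$')  # the ^...$ variant described above


def whole_match(pattern: re.Pattern, line: str):
    """Return the full matched text (group 0), or None if there is no match."""
    m = pattern.search(line)
    return m.group(0) if m else None


lines = ["abc 123 xyz", "abc abc 123 xyz", "abc text 123 xyz", "abc 123 xyz xyz"]
for line in lines:
    print(repr(line), whole_match(UNANCHORED, line), whole_match(ANCHORED, line))
```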
    


  • 2020-11-22 16:16

    The comment by hvd is quite appropriate, and this just provides an example. In SQL, for instance, I think it would be clearer to do:

    where val like 'abc%123%xyz' and
          val not like 'abc%abc%' and
          val not like '%xyz%xyz'
    

    I imagine something quite similar is simple to do in other environments.
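    As a sketch of that idea, the block below runs the same WHERE clause in an in-memory SQLite database and then mirrors the three conditions in plain Python with fnmatchcase, translating LIKE's % into a glob *. The table name t and the sample values are hypothetical; val is the column from the answer. SQLite's LIKE is case-insensitive for ASCII letters by default while fnmatchcase is case-sensitive; the samples are all lowercase, so the two agree here.

```python
import sqlite3
from fnmatch import fnmatchcase

samples = ["abc 123 xyz", "abc abc 123 xyz", "abc text 123 xyz",
           "abc text xyz xyz", "abc 123 xyz xyz"]

# Run the answer's WHERE clause against a hypothetical table t(val).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
conn.executemany("INSERT INTO t (val) VALUES (?)", [(v,) for v in samples])
sql_hits = [row[0] for row in conn.execute(
    """
    SELECT val FROM t
    WHERE val LIKE 'abc%123%xyz'
      AND val NOT LIKE 'abc%abc%'
      AND val NOT LIKE '%xyz%xyz'
    ORDER BY rowid
    """
)]
conn.close()


# The same three conditions in plain Python (LIKE's % becomes a glob *).
def keep(val: str) -> bool:
    return (fnmatchcase(val, 'abc*123*xyz')
            and not fnmatchcase(val, 'abc*abc*')
            and not fnmatchcase(val, '*xyz*xyz'))


py_hits = [v for v in samples if keep(v)]
print(sql_hits)             # ['abc 123 xyz', 'abc text 123 xyz']
print(py_hits == sql_hits)  # True
```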
