How to implement a verbose REGEX in Python

终归单人心 2020-12-30 06:03

I am trying to use a verbose regular expression in Python (2.7). If it matters, I am just trying to make it easier to go back and more clearly understand the expression some

2 Answers
  • 2020-12-30 06:37
    • It is a good habit to use raw string literals when defining regex patterns. A lot of regex patterns use backslashes, and using a raw string literal will allow you to write single backslashes instead of having to worry about whether or not Python will interpret your backslash to have a different meaning (and having to use two backslashes in those cases).

    • \b? is not a valid pattern: Python's re module refuses to compile it ("nothing to repeat"). It asks for 0-or-1 word boundaries, but at any given position there either is a word boundary (1) or there is not (0). So even if \b? did compile, it would always succeed and constrain nothing.

    • Regex makes a distinction between the end of a string and the end of a line. (A string may consist of multiple lines.)

      • \A matches only at the start of the string.
      • \Z matches only at the end of the string.
      • $ matches at the end of the string (or just before a newline that ends the string), and also at the end of each line in re.MULTILINE mode.
      • ^ matches at the start of the string, and also at the start of each line in re.MULTILINE mode.
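
    The last two points can be checked directly. The following is a minimal sketch rather than part of the original answer; the sample text and variable names are made up for illustration, and it is written so that it should also run on Python 3.

```python
import re

# \b? : the re module refuses to compile a quantified word boundary.
try:
    re.compile(r"\b?")
    boundary_error = None
except re.error as exc:
    boundary_error = exc
print(boundary_error is not None)  # True

text = "first line\nsecond line"  # made-up two-line sample

# Without re.MULTILINE, ^ and $ only anchor at the ends of the whole string.
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# With re.MULTILINE, ^ and $ also anchor at every line start and line end.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)

# \A and \Z ignore re.MULTILINE and stay tied to the whole string.
starts_string = re.findall(r"\A\w+", text, re.MULTILINE)
ends_string = re.findall(r"\w+\Z", text, re.MULTILINE)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(starts_string)     # ['first']
```

    If a pattern has to be pinned to the whole string even when re.MULTILINE is on, \A and \Z are the safer choice.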

    import re
    verbose_item_pattern = re.compile(r"""
        $            # end of line boundary
        \s{1,2}      # 1-or-2 whitespace characters, including the newline
        I            # a capital I
        [tT][eE][mM] # one character from each of the three sets; this allows for unknown case
        \s+          # 1-or-more whitespace characters, INCLUDING newline
        \d{1,2}      # 1-or-2 digits
        [.]?         # 0-or-1 literal .
        \(?          # 0-or-1 literal open paren
        [a-e]?       # 0-or-1 letter in the range a-e
        \)?          # 0-or-1 closing paren
        .*           # any number of unknown characters so we can have words and punctuation
        [^0-9]       # anything but [0-9]
        $            # end of line boundary
        """, re.VERBOSE|re.MULTILINE)
    
    x = verbose_item_pattern.search("""
     Item 1.0(a) foo bar
    """)
    
    print(x)
    

    yields

    <_sre.SRE_Match object at 0xb76dd020>
    

    (indicating there is a match)
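
    To see what re.VERBOSE changes, it can help to compare a verbose pattern with its one-line equivalent and to look at the matched text rather than only the match object. This is a sketch under assumptions: the pattern is a shortened, hypothetical version of the one above, and the names compact, verbose and literal_space are invented for the example.

```python
import re

# Hypothetical, shortened pattern used only for this illustration.
compact = re.compile(r"[iI][tT][eE][mM]\s+\d{1,2}[.]?")

# Same pattern in verbose mode: whitespace is ignored and # starts a comment.
verbose = re.compile(r"""
    [iI][tT][eE][mM]  # the word Item in any letter case
    \s+               # 1-or-more whitespace characters
    \d{1,2}           # 1-or-2 digits
    [.]?              # 0-or-1 literal .
    """, re.VERBOSE)

sample = "\n Item 1.0(a) foo bar\n"
compact_match = compact.search(sample)
verbose_match = verbose.search(sample)

# group() returns the matched text, span() its start and end offsets.
print(verbose_match.group())  # Item 1.
print(verbose_match.span())   # (2, 9)

# An unescaped space is dropped in verbose mode, so a literal space has to be
# written as [ ] (or escaped with a backslash).
literal_space = re.compile(r"""
    foo [ ] bar   # [ ] matches exactly one space
    """, re.VERBOSE)
dropped_space = re.compile(r"foo bar", re.VERBOSE)  # behaves like "foobar"

print(literal_space.search(sample) is not None)  # True
print(dropped_space.search(sample) is None)      # True
```

    The same rule applies to a literal #, which needs escaping or a character class when the pattern is compiled with re.VERBOSE.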

  • 2020-12-30 06:54

    As said in the comments, you should either escape your backslashes or use a raw string, even with triple quotes.

    verbose_item_pattern = re.compile(r"""
    ...
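
    A small sketch of why this matters, using a made-up pattern and sample text (not the pattern from the question): in an ordinary string literal Python turns \b into a backspace character before the regex engine ever sees it.

```python
import re

# In a normal string literal "\b" is the backspace character (0x08)...
plain = "\bItem\b"
# ...while in a raw string the backslash reaches the regex engine intact.
raw = r"\bItem\b"
# Doubling each backslash is the non-raw equivalent of the raw string.
escaped = "\\bItem\\b"

sample = "an Item here"  # made-up sample text

print(raw == escaped)                        # True
print(re.search(plain, sample) is None)      # True: it looks for backspaces
print(re.search(raw, sample) is not None)    # True: \b is a word boundary
```

    The r prefix works the same way in front of a triple-quoted string, which is why the verbose pattern is written as r"""...""".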
    