I am trying to use a verbose regular expression in Python (2.7). If it matters I am just trying to make it easier to go back and more clearly understand the expression some
It is a good habit to use raw string literals when defining regex patterns. A lot of regex patterns use backslashes, and using a raw string literal will allow you to write single backslashes instead of having to worry about whether or not Python will interpret your backslash to have a different meaning (and having to use two backslashes in those cases).
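To see why this matters, here is a minimal sketch comparing a normal string literal with a raw one. The sample text and the variable names are made up for illustration; the point is only that "\b" inside a normal literal becomes a backspace character before the regex engine ever sees it.

```python
import re

# In a normal string literal, "\b" is the backspace character (ASCII 8),
# so the regex engine never receives a word-boundary escape.
plain = "\bcat\b"

# In a raw string literal the backslash is kept as-is, so the regex
# engine receives \b and treats it as a word boundary.
raw = r"\bcat\b"

text = "the cat sat"  # hypothetical sample input
plain_match = re.search(plain, text)
raw_match = re.search(raw, text)

print(len(plain))             # 5: backspace, c, a, t, backspace
print(len(raw))               # 7: both backslashes are real characters
print(plain_match)            # None
print(raw_match is not None)  # True
```

The non-raw pattern silently looks for literal backspace characters, which is why this kind of bug tends to show up as "my regex never matches" rather than as an error.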
\b? is not a valid pattern for Python's re module. It asks for 0-or-1 word boundaries, but either you have a word boundary or you don't. If you have a word boundary, then you have 1 word boundary. If you don't have a word boundary, then you have 0 word boundaries. So \b? would (if it were valid) always be true.
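One way to check this is to ask Python to compile the pattern. The sketch below assumes CPython's re module, which (as far as I know) refuses to apply a quantifier to a bare zero-width anchor; the variable names are only for illustration, and the exact error message may differ between versions.

```python
import re

try:
    re.compile(r"\b?")
    compiled = True
except re.error as exc:
    # The quantifier has nothing it can meaningfully repeat.
    compiled = False
    message = str(exc)

print(compiled)  # False

# Dropping the ? gives a pattern that compiles and behaves as intended.
boundary_match = re.search(r"\bItem", "an Item here")
print(boundary_match is not None)  # True
```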
Regex makes a distinction between the end of a string and the end of a line. (A string may consist of multiple lines.)
\A matches only the start of a string.
\Z matches only the end of a string.
$ matches the end of a string, and also the end of a line in re.MULTILINE mode.
^ matches the start of a string, and also the start of a line in re.MULTILINE mode.

import re
verbose_item_pattern = re.compile(r"""
$ # end of line boundary
\s{1,2} # 1-or-2 whitespace characters, including the newline
I # a capital I
[tT][eE][mM] # one character from each of the three sets; this allows for unknown case
\s+ # 1-or-more whitespace characters, INCLUDING newline
\d{1,2} # 1-or-2 digits
[.]? # 0-or-1 literal .
\(? # 0-or-1 literal open paren
[a-e]? # 0-or-1 letter in the range a-e
\)? # 0-or-1 closing paren
.* # any number of unknown characters so we can have words and punctuation
[^0-9] # anything but [0-9]
$ # end of line boundary
""", re.VERBOSE|re.MULTILINE)
x = verbose_item_pattern.search("""
Item 1.0(a) foo bar
""")
print(x)
yields
<_sre.SRE_Match object at 0xb76dd020>
(indicating there is a match)
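The difference between the string anchors and the line anchors described above can be sketched on a smaller input. The two-line sample text here is made up for illustration and is not part of the original question.

```python
import re

text = "first line\nsecond line"  # hypothetical two-line input

# Without re.MULTILINE, ^ and $ only anchor at the ends of the string.
start_plain = re.findall(r"^\w+", text)
end_plain = re.findall(r"\w+$", text)
print(start_plain)   # ['first']
print(end_plain)     # ['line']

# With re.MULTILINE, ^ and $ also anchor at every line boundary.
start_multi = re.findall(r"^\w+", text, re.MULTILINE)
end_multi = re.findall(r"\w+$", text, re.MULTILINE)
print(start_multi)   # ['first', 'second']
print(end_multi)     # ['line', 'line']

# \A and \Z ignore re.MULTILINE and only match at the string ends.
start_string = re.findall(r"\A\w+", text, re.MULTILINE)
end_string = re.findall(r"\w+\Z", text, re.MULTILINE)
print(start_string)  # ['first']
print(end_string)    # ['line']
```

This is why the verbose pattern needs re.MULTILINE: its leading and trailing $ are meant to match at line ends inside a longer string, not only at the very end of it.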
As said in the comments, you should escape your backslashes or use a raw string, even with triple quotes.
verbose_item_pattern = re.compile(r"""
...