Inline Modifiers in the re module
Python implements inline (embedded) modifiers, such as (?s), (?i) or (?aiLmsux), but not as part of a non-capturing group modifier like you were trying to use. (?smi:subpattern) works in Perl and PCRE, but not in Python.
Moreover, an inline modifier placed anywhere in the pattern applies to the whole pattern, and it can't be turned off.
From regular-expressions.info:
In Python, putting a modifier in the middle of the regex affects the whole regex. So in Python, (?i)caseless and caseless(?i) are both case insensitive.
Example:
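import re

text = "A\nB"
print("Text: '%s'\n---" % text)

# Some patterns place the flag mid-pattern on purpose; newer Python
# versions reject that placement instead of applying it globally.
patterns = ["a", "a(?i)", "A.*B", "A(?s).*B", "A.*(?s)B"]
for p in patterns:
    match = re.search(p, text)
    # Print the span of the match, or None when nothing matched
    print("Pattern: '%s' \tMatch: %s" % (p, match.span() if match else None))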
Output:
Text: 'A
B'
---
Pattern: 'a' Match: None
Pattern: 'a(?i)' Match: (0, 1)
Pattern: 'A.*B' Match: None
Pattern: 'A(?s).*B' Match: (0, 3)
Pattern: 'A.*(?s)B' Match: (0, 3)
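The output above comes from an older interpreter. As far as I know, Python 3.6 through 3.10 only emit a DeprecationWarning for a global flag that is not at the start of the pattern, Python 3.11 and later raise re.error for it, and scoped groups such as (?s:...) have been accepted by re since 3.6. Here is a minimal sketch that reports what the running interpreter does instead of assuming either behaviour. The helper name probe is made up for illustration.

```python
import re
import warnings


def probe(pattern, text):
    """Return the match span, None for no match, or the string 're.error'
    if this interpreter's re module rejects the pattern."""
    try:
        with warnings.catch_warnings():
            # Some versions only warn about flags placed mid-pattern.
            warnings.simplefilter("ignore", DeprecationWarning)
            match = re.search(pattern, text)
    except re.error:
        return "re.error"
    return match.span() if match else None


text = "A\nB"
# Flag at the start, flag mid-pattern, and a scoped (group-local) flag.
for p in ["(?s)A.*B", "A(?s).*B", "A(?s:.*)B"]:
    print("Pattern: %-12r Result: %s" % (p, probe(p, text)))
```

Only the first pattern is safe on every version. The other two depend on the interpreter, so check the printed result before relying on them.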
Solution
(?s) (aka singleline or re.DOTALL) makes . also match newlines. Since you're trying to apply it to only a part of the pattern, there are 2 alternatives:
- Match anything except newlines: set (?s) for the whole pattern (either passed as a flag or inline), and use [^\n]* instead of a dot to match any characters except newlines.
- Match everything including newlines: use [\S\s]* instead of a dot to match any characters including newlines. The character class includes all whitespace and everything that is not whitespace (thus, all characters).
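A minimal sketch of both alternatives side by side. The sample text and the variable names are made up for illustration. Each pattern keeps its first group on a single line and lets its second group span lines.

```python
import re

text = "start one\ntwo\nthree end"

# Alternative 1: DOTALL is on for the whole pattern, so '.' crosses
# newlines; [^\n]* is used where the match must stay on one line.
m1 = re.search(r"(?s)start([^\n]*)\n(.*)end", text)

# Alternative 2: DOTALL is off, so '.' stays on one line;
# [\s\S]* is used where the match must cross newlines.
m2 = re.search(r"start(.*)\n([\s\S]*)end", text)

print(m1.groups())
print(m2.groups())

# For comparison: a plain dot without DOTALL cannot reach "end".
print(re.search(r"start.*end", text))
```

Both searches capture the same two groups. The only difference is which construct is the exception: [^\n] when DOTALL is on, [\s\S] when it is off.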
For the specific case you presented, you can use the following expression:
(?m)^DOCUMENTATION.*(\"{3}|'{3})\n-*\n?([\s\S]+?)^\1[\s\S]*
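To try the expression from Python, something like the following should work. The sample input is a guess at the kind of file you are parsing (a DOCUMENTATION block delimited by triple quotes), not your actual data, and the names PATTERN and sample are just for illustration.

```python
import re

# The expression from above, written as a raw string so the backslashes
# reach the regex engine unchanged.
PATTERN = re.compile(
    r"(?m)^DOCUMENTATION.*(\"{3}|'{3})\n-*\n?([\s\S]+?)^\1[\s\S]*"
)

# Hypothetical input, assumed for this sketch.
sample = (
    "DOCUMENTATION = '''\n"
    "---\n"
    "module: example\n"
    "short_description: demo\n"
    "'''\n"
    "EXAMPLES = '''\n"
    "- name: run it\n"
    "'''\n"
)

match = PATTERN.search(sample)
if match:
    print("Delimiter:", match.group(1))
    print("Body:")
    print(match.group(2))
```

Group 1 holds whichever triple quote opened the block, and the backreference \1 makes sure the block is closed by the same one. Group 2 is the documentation body without the leading dashes.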
Note: This post covers inline modifiers in the re module, whereas Matthew Barnett's regex module does in fact implement inline modifiers (scoped flags) with the same behaviour observed in PCRE and Perl.