matching any character including newlines in a Python regex subexpression, not globally

难免孤独 2020-12-01 15:47

I want to use re.MULTILINE but NOT re.DOTALL, so that I can have a regex that includes both an "any character" wildcard and the normal . wild

3 Answers
  • 2020-12-01 16:00
    [^]
    

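    In a regex, square brackets contain a list and/or range of the values allowed for one matched character. In engines that accept an empty list (JavaScript, for example), [] can never match any character.

    The caret at the start of the list negates it, so in those engines [^], the negation of an empty list, matches any character, including a newline. Python's re module does not accept this form: a ] placed directly after [ or [^ is treated as a literal character, so [^] is rejected as an unterminated character set.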
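
Since the question is about Python, it may be worth checking how the standard re module treats this pattern. The sketch below assumes CPython's built-in re; the helper name compiles is made up for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# "]" directly after "[^" is read as a literal, so the set is never closed.
print(compiles(r"[^]"))     # False
# A pair of complementary classes is accepted and matches any character.
print(compiles(r"[\s\S]"))  # True
```

So [^] is an option in JavaScript-style engines, but in Python a class such as [\s\S] is needed instead.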

  • 2020-12-01 16:06

    To match almost any character, including a newline:

    Regular expression (note that the class also contains a literal space ' '; it does not cover \r, \f, or other whitespace that is not listed):

    [\S\n\t\v ]
    

    Example:

    import re
    
    text = 'abc def ###A quick brown fox.\nIt jumps over the lazy dog### ghi jkl'
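    # We want to extract "A quick brown fox.\nIt jumps over the lazy dog"
    # The raw string avoids invalid-escape warnings; the group makes
    # findall return only the text between the ### markers.
    matches = re.findall(r'###([\S\n ]+)###', text)
    print(matches[0])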
    

    matches[0] will contain:
    'A quick brown fox.\nIt jumps over the lazy dog'

    Description of \S from the Python docs:

    \S Matches any character which is not a whitespace character.

    ( See: https://docs.python.org/3/library/re.html#regular-expression-syntax )
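
A hand-listed class only matches the whitespace characters that are written into it. The following sketch, with the variable names narrow and broad chosen for illustration, compares the class above with [\s\S] on a few single characters:

```python
import re

narrow = re.compile(r"[\S\n\t\v ]")  # the class from this answer
broad = re.compile(r"[\s\S]")        # whitespace plus non-whitespace

# Print whether each class matches the character on its own.
for ch in ["a", " ", "\n", "\r", "\f"]:
    print(repr(ch), bool(narrow.fullmatch(ch)), bool(broad.fullmatch(ch)))
```

Carriage return and form feed are whitespace, so \S rejects them, and they are not listed explicitly; only the broader class matches them.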

  • 2020-12-01 16:09

    To match any character, newlines included, without re.S/re.DOTALL, you can use any of the following:

    [\s\S]
    [\w\W]
    [\d\D]
    

    The idea is that two complementary shorthand classes inside one character class together match every character that can appear in the input string.

    Compared with (.|\s) and other alternation-based variants, the character class is much more efficient because it involves far less backtracking when used with a * or + quantifier. In one small example, (?:.|\n)+ takes 45 steps to complete, while [\s\S]+ takes just 2.
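
As a minimal sketch of the setup described in the question, the example below uses re.MULTILINE without re.DOTALL and mixes an ordinary . with [\s\S] in one pattern. The sample text and variable names are invented for illustration. The last pattern uses a scoped inline flag, (?s:...), which is not mentioned in the answers here and requires Python 3.6 or later.

```python
import re

text = "start\nline one\nline two\nend"

# "." still stops at newlines because re.DOTALL is not set.
dot_only = re.search(r"^start.*end$", text, re.MULTILINE)

# [\s\S] crosses newlines in just this part of the pattern;
# "^", "$" and "." keep their MULTILINE, non-DOTALL meaning.
mixed = re.search(r"^start[\s\S]*?^line t.o$", text, re.MULTILINE)

# Alternative: turn DOTALL on for one group only.
scoped = re.search(r"^start(?s:.*?)^line t.o$", text, re.MULTILINE)

print(dot_only)               # None
print(repr(mixed.group(0)))
print(repr(scoped.group(0)))
```

Both working patterns match from "start" through "line two", while the . in "t.o" still refuses to match a newline.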
