Why does Python's re.search method hang?

遇见更好的自我 2021-01-18 13:06

I'm using the Python regex library to parse some strings, and I have found that either my regex is too complicated or the string I'm searching is too long.

Here'

1 answer
  •  天涯浪人
    2021-01-18 13:51

    The code hangs because of catastrophic backtracking. The quantified group (\w+'?\s*)+ contains one obligatory pattern (\w+) and two optional ones ('? and \s*, both of which can match an empty string). This lets the regex engine divide the same text between iterations of the group in many different ways, and on a non-matching string it tests so many of these matching paths that the search takes too long to complete.
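
    A rough way to see why the number of paths explodes (a counting sketch, not a model of what the engine does internally): a run of n word characters with no ' or whitespace in it can be divided among iterations of (\w+'?\s*)+ in 2^(n-1) ways, because each gap between adjacent characters either starts a new iteration or does not. The helper name count_splits is made up for this illustration.

```python
from itertools import combinations


def count_splits(n: int) -> int:
    """Count the ways to cut a run of n word characters into
    consecutive non-empty chunks (one chunk per group iteration)."""
    gaps = range(1, n)  # positions between adjacent characters
    total = 0
    for cuts in range(n):  # choose 0 .. n-1 cut positions
        total += sum(1 for _ in combinations(gaps, cuts))
    return total


for n in (1, 2, 4, 8, 16):
    print(n, count_splits(n))
```

    The count doubles with every extra character, which is why a fairly short non-matching string can already be enough to make the search appear to hang.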

    I suggest unrolling the problematic group so that ' or \s becomes obligatory, and wrapping that separator together with the following \w+ in an optional, repeatable group:

    (\w+(?:['\s]+\w+)*)\s*[-~]\s*(\$?\d+(?:\.\d+)?\$?)
    ^^^^^^^^^^^^^^^^^^^***
    

    See the regex demo

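    Here, (\w+(?:['\s]+\w+)*) matches 1+ word chars, and then 0+ sequences of 1+ ' or whitespace chars followed by 1+ word chars. Because two runs of word chars are now always separated by at least one obligatory ' or whitespace, the text can be divided between iterations in only one way. The pattern becomes linear, and the regex engine fails the match much more quickly when it meets a non-matching string.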
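
    A minimal sketch of the rewritten pattern in use. The input strings below are hypothetical, since the data from the question is not shown; only the pattern itself comes from the answer.

```python
import re

# The rewritten pattern from the answer, compiled once.
PATTERN = re.compile(r"(\w+(?:['\s]+\w+)*)\s*[-~]\s*(\$?\d+(?:\.\d+)?\$?)")

# Hypothetical matching input: a name, a dash, then a price.
match = PATTERN.search("Bob's burger - $5.50")
if match:
    print(match.group(1))  # the text before the "-" or "~"
    print(match.group(2))  # the number with its optional "$"

# Hypothetical non-matching input: many words, no "-"/"~" and no number.
no_match = PATTERN.search("word " * 200)
print(no_match)
```

    Note that re.search still retries the pattern from every start position, so very long non-matching inputs cost more than a single pass; what the rewrite removes is the exponential blow-up inside one attempt.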

    The rest of the pattern:

    • \s*[-~]\s* - either - or ~ wrapped with 0+ whitespaces
    • (\$?\d+(?:\.\d+)?\$?) - Group 2, capturing:
      • \$? - 1 or 0 $ symbols
      • \d+ - 1+ digits
      • (?:\.\d+)? - 1 or 0 sequences of:
        • \. - a dot
        • \d+ - 1+ digits
      • \$? - 1 or 0 $ symbols
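
    To check the Group 2 breakdown above, the number part can be tried on its own. This is only a sketch: the sample strings and the helper name is_price are invented for illustration.

```python
import re

# Only the Group 2 part of the pattern, without the capturing parentheses.
PRICE = re.compile(r"\$?\d+(?:\.\d+)?\$?")


def is_price(text: str) -> bool:
    """Return True if the whole string matches the Group 2 sub-pattern."""
    return PRICE.fullmatch(text) is not None


# Hypothetical samples: the first three fit the pattern, the rest do not.
for sample in ["$5", "5.50$", "12", "5.", ".5", "$"]:
    print(repr(sample), is_price(sample))
```

    The rejected samples show that the dot is only accepted when digits follow it, and that at least one digit is always required.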
