I'm using the Python regex library to parse some strings, and I found that my regex is either too complicated or the string I'm searching is too long.
Here's
The code execution hangs because of catastrophic backtracking: the quantified group (\w+'?\s*)+ contains one obligatory pattern (\w+) and two optional ones ('? and \s*, which can match an empty string). That lets the regex engine try a huge number of ways to match the same text, so many that it takes too long to complete.
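To get a feel for how many paths that is, here is a small Python sketch. The helper count_paths is hypothetical and written only for illustration: it counts the ways a string can be cut into consecutive chunks, each fully matched by one iteration of \w+'?\s*. It is a rough stand-in for the work a backtracking engine may do before concluding that the rest of the pattern fails, not a model of the engine's internals.

```python
import re
from functools import lru_cache

# One iteration of the problematic quantified group (\w+'?\s*)+.
ITERATION = re.compile(r"\w+'?\s*")


def count_paths(text: str) -> int:
    """Count the ways text can be cut into consecutive chunks that are
    each fully matched by ITERATION (hypothetical illustration helper)."""

    @lru_cache(maxsize=None)
    def ways(start: int) -> int:
        if start == len(text):
            return 1  # the whole string has been consumed
        # Try every possible end position for the next iteration.
        return sum(
            ways(end)
            for end in range(start + 1, len(text) + 1)
            if ITERATION.fullmatch(text, start, end)
        )

    return ways(0)


for n in (5, 10, 20):
    print(n, count_paths("a" * n))  # 16, 512, 524288
```

For a run of n word chars the count is 2^(n-1): since '? and \s* may match nothing, every gap between two letters can either end an iteration or not.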
I suggest unwrapping the problematic group so that ' or \s becomes obligatory, and placing them inside a group that repeats zero or more times:
(\w+(?:['\s]+\w+)*)\s*[-~]\s*(\$?\d+(?:\.\d+)?\$?)
^^^^^^^^^^^^^^^^^^^***
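A minimal usage sketch in Python, assuming the input is processed one line at a time. The sample lines are made up, since the original input isn't shown:

```python
import re

# The unwrapped pattern: group 1 is the name, group 2 is the price.
PATTERN = re.compile(r"(\w+(?:['\s]+\w+)*)\s*[-~]\s*(\$?\d+(?:\.\d+)?\$?)")

# Hypothetical sample lines (not from the question).
samples = [
    "Fish and chips - $12.50",
    "Today's special ~ 9$",
]

for line in samples:
    m = PATTERN.search(line)
    if m:
        print(m.group(1), "|", m.group(2))
# Fish and chips | $12.50
# Today's special | 9$
```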
Here, (\w+(?:['\s]+\w+)*) matches 1+ word chars, and then 0+ sequences of 1+ ' or whitespace chars followed by 1+ word chars. Since every repetition has to start with a separator, a run of word chars can only be matched in one way, so the number of matching paths no longer explodes and the regex engine fails quickly when a non-matching string occurs.
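A small counting sketch illustrates why. The helper count_paths_fixed is hypothetical: it counts the ways a string can be split into one chunk fully matched by \w+ followed by zero or more chunks fully matched by ['\s]+\w+. Because word chars and the separator class ['\s] don't overlap, every string the group can match should come out at exactly 1, compared with 2^(n-1) splits of n word chars for the original (\w+'?\s*)+ group.

```python
import re
from functools import lru_cache

# Pieces of the unwrapped group \w+(?:['\s]+\w+)*.
FIRST = re.compile(r"\w+")         # the leading, obligatory word
REPEAT = re.compile(r"['\s]+\w+")  # one iteration of the starred group


def count_paths_fixed(text: str) -> int:
    """Count the ways text splits into one FIRST chunk followed by
    zero or more REPEAT chunks (hypothetical illustration helper)."""

    @lru_cache(maxsize=None)
    def rest(start: int) -> int:
        # Ways to cover text[start:] using REPEAT chunks only.
        if start == len(text):
            return 1
        return sum(
            rest(end)
            for end in range(start + 1, len(text) + 1)
            if REPEAT.fullmatch(text, start, end)
        )

    return sum(
        rest(end)
        for end in range(1, len(text) + 1)
        if FIRST.fullmatch(text, 0, end)
    )


for sample in ("a" * 20, "ab cd", "it's a test"):
    print(repr(sample), count_paths_fixed(sample))  # 1 for each sample
```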
The rest of the pattern:
\s*[-~]\s* - either - or ~, wrapped with 0+ whitespaces
(\$?\d+(?:\.\d+)?\$?) - Group 2 capturing:
    \$? - 1 or 0 $ symbols
    \d+ - 1+ digits
    (?:\.\d+)? - 1 or 0 sequences of:
        \. - a dot
        \d+ - 1+ digits
    \$? - 1 or 0 $ symbols
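The breakdown above maps directly onto a re.VERBOSE version of the pattern. This is a sketch: the inline comments paraphrase the list, the sample line is hypothetical, and the verbose form is expected to produce the same groups as the one-line pattern.

```python
import re

# Verbose form: whitespace and # comments in the pattern are ignored.
PRICE_LINE = re.compile(r"""
    (\w+(?:['\s]+\w+)*)   # Group 1: words separated by ' or whitespace
    \s*[-~]\s*            # either - or ~, wrapped with 0+ whitespaces
    (                     # Group 2:
      \$?                 #   1 or 0 $ symbols
      \d+                 #   1+ digits
      (?:\.\d+)?          #   1 or 0 sequences of a dot and 1+ digits
      \$?                 #   1 or 0 $ symbols
    )
""", re.VERBOSE)

ONE_LINE = re.compile(r"(\w+(?:['\s]+\w+)*)\s*[-~]\s*(\$?\d+(?:\.\d+)?\$?)")

line = "Fish and chips - $12.50"  # hypothetical input
print(PRICE_LINE.search(line).groups())  # ('Fish and chips', '$12.50')
print(PRICE_LINE.search(line).groups() == ONE_LINE.search(line).groups())  # True
```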