Please see my regular expression pattern code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
print('Start')
str1 = 'abcdefgasdsdfswossdfasdaef'
m =
See Runaway Regular Expressions: Catastrophic Backtracking.
In brief, if a substring can be split among the parts of the regex in an extremely large number of ways, the regex matcher may end up trying them all.
Constructs like (x+)+ and x+x+ practically guarantee this behaviour.
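A rough way to see the effect is to time a nested quantifier against its flat equivalent on an input that cannot match. This is only a sketch, assuming Python 3 and the standard `re` module; the helper `time_match` and the trailing `y` are illustrative choices, not part of the original question.

```python
import re
import time

def time_match(pattern, text):
    """Return (match result, elapsed seconds) for a single re.match call."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start

# 20 x's and no trailing 'y', so both patterns below must fail.
# Keep this number small: each extra character roughly doubles the
# work done by the nested pattern.
text = 'x' * 20

nested_result, nested_time = time_match(r'(x+)+y', text)  # nested quantifier
flat_result, flat_time = time_match(r'x+y', text)         # same language, no nesting

print(nested_result, flat_result)   # neither pattern matches
print('nested: %.6f s, flat: %.6f s' % (nested_time, flat_time))
```

Both calls fail to match, but the nested version should take noticeably longer, and the gap grows quickly as the run of x's gets longer.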
To detect and fix the problematic constructs, the following concept can be used:
At a conceptual level, the presence of a problematic construct means that your regex is ambiguous: if you disregard greedy/lazy behaviour, there is no single "correct" way to split some text into the parts of the regex (or, equivalently, of a subexpression thereof). So, to avoid or fix the problems, you need to find and eliminate all ambiguities.
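To make the ambiguity concrete, one can count how many ways a run of n identical characters can be cut into non-empty pieces, which is the number of ways (x+)+ can divide that run among its repetitions. The function `count_splits` below is a hypothetical helper written for illustration (assuming Python 3); it is not something the regex engine exposes.

```python
def count_splits(n):
    """Count the ways to cut a run of n characters into one or more non-empty pieces."""
    # ways[k] = number of ways to split a run of length k;
    # ways[0] = 1 stands for "nothing left to split".
    ways = [1] + [0] * n
    for k in range(1, n + 1):
        # Choose the length of the first piece, then split the remainder.
        ways[k] = sum(ways[k - first] for first in range(1, k + 1))
    return ways[n]

for n in (1, 2, 3, 4, 10, 20):
    print(n, count_splits(n))
```

The count doubles with each extra character (it equals 2 to the power n-1), which is why a backtracking matcher that has to rule out every split slows down so sharply.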
One way to do this is to
Reposting the answer and solution from the comments by nhahtdh and Marc B:
([A-Za-z\-\s\:\.]+)+
--> [A-Za-z\-\s\:\.]+
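As a quick sanity check that dropping the outer group does not change what gets matched, the two forms can be compared on a few short inputs. This is a sketch assuming Python 3; the sample strings and the helper `same_match` are made up for illustration, and the rest of the original pattern is not known here, so only this fragment is tested.

```python
import re

NESTED = re.compile(r'([A-Za-z\-\s\:\.]+)+')  # original form, nested quantifier
FLAT = re.compile(r'[A-Za-z\-\s\:\.]+')       # simplified form

# Hypothetical sample inputs, kept short on purpose.
samples = ['abcdefg', 'Dr. No: part-two', 'abc123', '', '42']

def same_match(text):
    """True if both patterns match the same leading text (or both fail)."""
    a = NESTED.match(text)
    b = FLAT.match(text)
    # Compare only the overall match; the flat pattern has no group 1.
    return (a.group(0) if a else None) == (b.group(0) if b else None)

results = [same_match(s) for s in samples]
print(results)  # [True, True, True, True, True]
```

One difference to keep in mind: the simplified pattern no longer has a capturing group, so any code that reads group(1) needs a single pair of parentheses around the whole character-class repetition instead.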
Thanks so much to nhahtdh and Marc B!