Question
This call gives me neither an error nor a result:
re.sub('\\.(\\W|\\.)*[o0](\\W|[o0])*', '*', '..........................................')
Why does it behave like this? Also, if I reduce the number of periods, it works.
Thank you.
Answer 1:
You've got catastrophic backtracking.
Answer 2:
You have no o or 0 in your input string, yet your regular expression requires at least one of those characters to be present ([o0]).
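A quick check of this point, as a minimal sketch: the input contains neither character, so the pattern cannot match anywhere and re.sub returns the string unchanged. A shorter run of periods is used so the check finishes quickly, and the second sample string is a made-up input for illustration.

```python
import re

# The question's pattern, spelled exactly as in the question.
pattern = '\\.(\\W|\\.)*[o0](\\W|[o0])*'

short = '.' * 10  # fewer periods than the question, so this finishes fast
print(re.search('[o0]', short))      # None: there is no o or 0 anywhere
print(re.sub(pattern, '*', short))   # unchanged, because nothing matches

sample = 'a..o.o b'  # hypothetical input that does contain an o
print(re.sub(pattern, '*', sample))  # a*b
```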
>>> re.compile('\\.(\\W|\\.)*[o0](\\W|[o0])*', re.DEBUG)
literal 46
max_repeat 0 65535
  subpattern 1
    branch
      in
        category category_not_word
    or
      literal 46
in
  literal 111
  literal 48
max_repeat 0 65535
  subpattern 2
    branch
      in
        category category_not_word
    or
      in
        literal 111
        literal 48
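Update: Your regular expression suffers from catastrophic backtracking. Avoid putting an alternation of a character class and a character set inside a repeated group (the branch ... or parts inside a max_repeat in the dump above). Because \W already matches '.', every period can be matched by either alternative of (\W|\.), so when no o or 0 follows, the engine tries exponentially many ways of dividing the periods between the two branches before it gives up. That is also why fewer periods appear to work: the work roughly doubles with each extra period. You can put the character class inside a character set to avoid this.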
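To make that growth concrete, here is a toy counting model, not CPython's actual sre engine (real engines add their own optimizations, which may change the numbers). The hypothetical function count_o0_tests explores every way a naive backtracking matcher could consume the periods with \.X*[o0], where X is either the overlapping alternation (\W|\.) or the single set [\W\.], and counts how often [o0] gets tested. It assumes the input contains no o or 0, so every path fails and must be exhausted; the trailing group is never reached in that case, so it is left out.

```python
import re

NOT_WORD = re.compile(r"\W")


def count_o0_tests(s: str, split_branches: bool) -> int:
    r"""Toy model: count [o0] tests for \.X*[o0] at every start position.

    X is (\W|\.) when split_branches is True, else the set [\W\.].
    Assumes s has no 'o' or '0', so every path fails (hypothetical helper).
    """
    tests = 0

    def star(pos: int) -> None:
        nonlocal tests
        if pos < len(s):
            ch = s[pos]
            if split_branches:
                # Two alternatives that both match '.', explored one after the other.
                if NOT_WORD.fullmatch(ch):
                    star(pos + 1)
                if ch == ".":
                    star(pos + 1)
            elif NOT_WORD.fullmatch(ch) or ch == ".":
                # One character set: a '.' can be consumed in only one way.
                star(pos + 1)
        # Stop repeating here and test [o0] (always fails under our assumption).
        tests += 1

    for start in range(len(s)):
        if s[start] == ".":  # the leading literal \.
            star(start + 1)
    return tests


for n in (5, 10, 15):
    s = "." * n
    print(n, count_o0_tests(s, True), count_o0_tests(s, False))
# 5 57 15
# 10 2036 55
# 15 65519 120
```

Under this model the alternation costs 2^(n+1) - n - 2 tests for n periods, so each extra period roughly doubles the work, while the single set costs n(n+1)/2.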
Also note that you can use raw string notation (r'') to avoid escaping all the backslashes.
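A small check that both spellings produce the same pattern string (the variable names are illustrative):

```python
escaped = '\\.(\\W|\\.)*[o0](\\W|[o0])*'  # the question's spelling
raw = r'\.(\W|\.)*[o0](\W|[o0])*'         # the same pattern as a raw string
print(escaped == raw)  # True
```

A raw string only changes how the Python literal is written, not what the regex engine receives, so on its own it does not affect backtracking.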
The following works:
re.sub(r'\.[\W\.]*[o0][\Wo0]*', '*', '..........................................')
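A sketch exercising the rewritten pattern with the same '*' replacement: on a run of periods much longer than the question's, it still finds no match but returns promptly, and on a hypothetical string containing an o it replaces the matched run.

```python
import re

fixed = r'\.[\W\.]*[o0][\Wo0]*'

dots = '.' * 1000  # far longer than the question's input
print(re.sub(fixed, '*', dots) == dots)  # True: no match, and no blow-up
print(re.sub(fixed, '*', 'a..o.o b'))    # a*b
```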
because it compiles to:
>>> re.compile(r'\.[\W\.]*[o0][\Wo0]*', re.DEBUG)
literal 46
max_repeat 0 65535
  in
    category category_not_word
    literal 46
in
  literal 111
  literal 48
max_repeat 0 65535
  in
    category category_not_word
    literal 111
    literal 48
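Note that the branches are now gone: each repeated element is a single character set, so each character can be consumed in only one way and there are no combinations of branches left to backtrack through.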
Source: https://stackoverflow.com/questions/12014991/python-regular-expression-why-does-this-not-work