Why does my Python regular expression pattern run so slowly?

醉梦人生 2021-01-22 08:50

Please see my regular expression pattern code:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import re

print 'Start'
str1 = 'abcdefgasdsdfswossdfasdaef'
m =         


        
2 Answers
  • 2021-01-22 09:21

    See Runaway Regular Expressions: Catastrophic Backtracking.

    In brief, if there are extremely many ways to split a substring among the parts of the regex, the regex matcher may end up trying them all.

    Constructs like (x+)+ and x+x+ practically guarantee this behaviour whenever the rest of the pattern fails to match.
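
To make the blow-up concrete, here is a minimal sketch that is not part of the original answer. It counts how many ways a run of n x's can be split among the iterations of the inner x+ in (x+)+, and times a match that is bound to fail. The helper names count_splits and time_failed_match, and the trailing y used to force the failure, are assumptions made for illustration. Timings depend on the machine and the Python version.

```python
import re
import time


def count_splits(n):
    """Count the ways to cut a run of n characters into one or more
    non-empty pieces. Each piece stands for one iteration of the
    inner x+ in (x+)+."""
    # ways[k] = number of ways to split the first k characters
    ways = [1] + [0] * n
    for k in range(1, n + 1):
        # the last piece may start at any earlier position j
        ways[k] = sum(ways[j] for j in range(k))
    return ways[n]


def time_failed_match(pattern, text):
    """Return (match result, elapsed seconds) for re.match."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start


for n in (8, 12, 16):
    text = 'x' * n  # no trailing 'y', so the match has to fail
    result, elapsed = time_failed_match(r'(x+)+y', text)
    print(n, count_splits(n), result, '%.4fs' % elapsed)
```

The number of splits doubles with every extra character, which is why a few more characters in the input can turn a fast failure into one that appears to hang.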

    To detect and fix the problematic constructs, the following concept can be used:

    • At a conceptual level, the presence of a problematic construct means that your regex is ambiguous - i.e. if you disregard greedy/lazy behaviour, there's no single "correct" split of some text into the parts of the regex (or, equivalently, a subexpression thereof). So, to avoid/fix the problems, you need to find and eliminate all ambiguities.

      • One way to do this is to

        • always split the text into its meaningful parts (=parts that have separate meanings for the task at hand), and
        • define the parts in such a way that they cannot be confused (=using the same characteristics that you yourself would use to tell which is which if you were parsing it by hand)
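
The advice in the list above can be sketched on a made-up task that is not from the question: matching one or more words followed by a colon. The two pattern names and the sample strings are assumptions for illustration. In the first pattern the separator is optional, so a single word can also be read as several shorter words. In the second, words and separators cannot be confused, so each input has only one possible split.

```python
import re

# Ambiguous: because \s? is optional, "abc" can be read as one word
# "abc" or as the words "a", "b", "c" with empty separators.
AMBIGUOUS = re.compile(r'(?:\w+\s?)+:')

# Unambiguous: a word is \w+, and two words are always separated by
# exactly one whitespace character, so there is only one way to split.
UNAMBIGUOUS = re.compile(r'\w+(?:\s\w+)*\s?:')

# Short inputs only: the ambiguous form is still cheap at this size.
for text in ('first name:', 'city:', 'no colon here'):
    print(repr(text),
          AMBIGUOUS.match(text) is not None,
          UNAMBIGUOUS.match(text) is not None)

# A long input without a colon. Only the unambiguous form is tried,
# because it gives up after a number of steps proportional to the length.
long_text = 'a' * 5000
print(UNAMBIGUOUS.match(long_text))
```

Both patterns accept the same strings. The difference is only in how many ways the matcher can arrive at a given split, and therefore how much it has to retry before reporting a failure.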
  • 2021-01-22 09:25

    Just reposting the answer and solution from the comments by nhahtdh and Marc B:

    ([A-Za-z\-\s\:\.]+)+ --> [A-Za-z\-\s\:\.]+
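
A rough way to check the effect of this change is sketched below. The full pattern from the question is not shown here, so the trailing \d is a hypothetical tail added only to make the match fail. The names NESTED, FLAT and timed_match are likewise assumptions for illustration.

```python
import re
import time

CHAR_CLASS = r'[A-Za-z\-\s\:\.]'

# Hypothetical tail: a required digit, used only to force a failure.
NESTED = re.compile('(' + CHAR_CLASS + '+)+' + r'\d')  # shape before the fix
FLAT = re.compile(CHAR_CLASS + '+' + r'\d')            # shape after the fix

sample = 'abcdefgasdsdfswossdfasdaef'  # str1 from the question


def timed_match(regex, text):
    """Return (matched?, elapsed seconds) for regex.match(text)."""
    start = time.perf_counter()
    matched = regex.match(text) is not None
    return matched, time.perf_counter() - start


# On these samples both shapes agree on what matches.
for text in ('abc def1', 'a-b: c.9', 'abc'):
    print(repr(text), NESTED.match(text) is not None, FLAT.match(text) is not None)

# The nested form is tried on a short prefix only, because every extra
# character roughly doubles the work of a failing match.
print('nested, 14 chars:', timed_match(NESTED, sample[:14]))
print('flat, 26000 chars:', timed_match(FLAT, sample * 1000))
```

If the captured text is still needed, keeping the parentheses but dropping the outer + as in ([A-Za-z\-\s\:\.]+) avoids the nested quantifier as well.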

    Thanks so much to nhahtdh and Marc B!
