Python re infinite execution

前端 未结 4 2140
日久生厌
日久生厌 2020-12-05 18:40

I\'m trying to execute this code :

import re
pattern = r\"(\\w+)\\*([\\w\\s]+)*/$\"
re_compiled = re.compile(pattern)
results = re_compiled.search(\'COPRO*H         


        
相关标签:
4条回答
  • 2020-12-05 19:22

    Your regex runs into catastrophic backtracking because you have nested quantifiers (([...]+)*). Since your regex requires the string to end in / (which fails on your example), the regex engine tries all permutations of the string in the vain hope to find a matching combination. That's where it gets stuck.

    To illustrate, let's assume "A*BCD" as the input to your regex and see what happens:

    1. (\w+) matches A. Good.
    2. \* matches *. Yay.
    3. [\w\s]+ matches BCD. OK.
    4. / fails to match (no characters left to match). OK, let's back up one character.
    5. / fails to match D. Hum. Let's back up some more.
    6. [\w\s]+ matches BC, and the repeated [\w\s]+ matches D.
    7. / fails to match. Back up.
    8. / fails to match D. Back up some more.
    9. [\w\s]+ matches B, and the repeated [\w\s]+ matches CD.
    10. / fails to match. Back up again.
    11. / fails to match D. Back up some more, again.
    12. How about [\w\s]+ matches B, repeated [\w\s]+ matches C, repeated [\w\s]+ matches D? No? Let's try something else.
    13. [\w\s]+ matches BC. Let's stop here and see what happens.
    14. Darn, / still doesn't match D.
    15. [\w\s]+ matches B.
    16. Still no luck. / doesn't match C.
    17. Hey, the whole group is optional (...)*.
    18. Nope, / still doesn't match B.
    19. OK, I give up.

    Now that was a string of just three letters. Yours had about 30, trying all permutations of which would keep your computer busy until the end of days.

    I suppose what you're trying to do is to get the strings before/after *, in which case, use

    pattern = r"(\w+)\*([\w\s]+)$"
    
    0 讨论(0)
  • 2020-12-05 19:23

    Try re2 or any other regular expression engine base on automata theory. The one in a current python re module is a simple and slow backtracking engine (for now, things may change in future). But automata based engines have some restriction, it wouldn't allow you to use backreferences for example. Collate with this re2 syntax page to find out will it satisfy your needs or not.

    0 讨论(0)
  • 2020-12-05 19:29

    Looks like it might be something in your pattern. I'm not sure what you are trying to do with the last '*' in your expression. The following code seems to work for me:

    import re
    
    pattern = r"(\w+)\*([\w\s]+)$"
    
    re_compiled = re.compile(pattern)
    
    results = re_compiled.search('COPRO*HORIZON 2000                 HOR')
    
    print(results.groups())
    
    0 讨论(0)
  • 2020-12-05 19:36

    Interestingly, Perl runs it very quickly

    -> perl -e 'print "Match\n" if "COPRO*HORIZON 2000                 HOR" =~ m|(\w+)\*([\w\s]+)*/$|'
    -> perl -e 'print "Match\n" if "COPRO*HORIZON 2000                 HOR/" =~ m|(\w+)\*([\w\s]+)*/$|'
    Match
    
    0 讨论(0)
提交回复
热议问题