Regular Expressions in Python unexpectedly slow

迷失自我 2021-02-01 16:16

Consider this Python code:

import timeit
import re

def one():
    any(s in mystring for s in ('foo', 'bar', 'hello'))

r = re.compile('(foo|bar|hello)')

  •  旧时难觅i
    2021-02-01 16:25

    Note to future readers

    I think the correct answer is actually that Python's plain substring search (the `in` operator) is highly optimized for this case, and the re module is simply a bit slower. What I've written below is true, but it is probably not relevant to the simple regexes in the question.
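
    A minimal sketch for checking this on your own machine, assuming Python 3. The test strings (`"hello" * 1000`, where a match occurs right at the start, and `"goodbye" * 1000`, where nothing matches) and the helper names `with_in`, `with_regex` and `compare` are hypothetical choices for illustration, not the question's own harness:

```python
import re
import timeit

# The words come from the question; everything else here (helper names and
# test strings) is a hypothetical choice made for this sketch.
WORDS = ("foo", "bar", "hello")
PATTERN = re.compile("|".join(map(re.escape, WORDS)))  # foo|bar|hello


def with_in(text: str) -> bool:
    """One plain substring search per word, using the `in` operator."""
    return any(word in text for word in WORDS)


def with_regex(text: str) -> bool:
    """A single search with a compiled alternation."""
    return PATTERN.search(text) is not None


def compare(text: str, number: int = 1000) -> dict:
    """Total seconds taken by each approach over `number` runs on `text`."""
    return {
        "in": timeit.timeit(lambda: with_in(text), number=number),
        "regex": timeit.timeit(lambda: with_regex(text), number=number),
    }


if __name__ == "__main__":
    hit = "hello" * 1000     # assumed input: a match right at the start
    miss = "goodbye" * 1000  # assumed input: no match, so the whole string is scanned
    for label, text in (("match", hit), ("no match", miss)):
        assert with_in(text) == with_regex(text)
        print(label, compare(text))
```

    Absolute numbers depend on the machine and Python version, so compare the two timings against each other rather than reading much into either one alone.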

    Original Answer

    Apparently this is not a random fluke: Python's re module really is slower. It looks like it uses a backtracking approach, trying one alternative at a time and backing up whenever a match attempt fails, as opposed to building a DFA and simulating it.

    It uses backtracking even when the regular expression contains no backreferences!
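
    To make the difference concrete, here is a toy sketch of the two strategies on a deliberately tiny pattern language, invented for this example, where a pattern is a list of literal characters, each optionally marked `?`. `backtrack` tries one choice at a time and backs up on failure; `simulate` tracks the set of all pattern positions reachable so far, in the style of a Thompson NFA simulation (the state sets a DFA would precompute). Neither function is how the re module is actually implemented; both just count their own work:

```python
from typing import List, Set, Tuple

# Toy pattern language, invented for this sketch: a list of (char, optional)
# items, so "a?a?aa" is [("a", True), ("a", True), ("a", False), ("a", False)].
Item = Tuple[str, bool]


def backtrack(pattern: List[Item], text: str) -> Tuple[bool, int]:
    """Full match by backtracking; returns (matched, number of calls made)."""
    calls = 0

    def go(p: int, t: int) -> bool:
        nonlocal calls
        calls += 1
        if p == len(pattern):
            return t == len(text)
        ch, optional = pattern[p]
        # Greedy: try consuming the character first.
        if t < len(text) and text[t] == ch and go(p + 1, t + 1):
            return True
        # That failed, so back up and, if allowed, skip this item instead.
        return optional and go(p + 1, t)

    return go(0, 0), calls


def simulate(pattern: List[Item], text: str) -> Tuple[bool, int]:
    """Full match by tracking every reachable pattern position at once.

    Returns (matched, number of state visits). At most len(pattern) + 1
    states are live per input character, so the work is linear in len(text).
    """

    def closure(states: Set[int]) -> Set[int]:
        # Optional items may be skipped without consuming input.
        result: Set[int] = set()
        stack = list(states)
        while stack:
            p = stack.pop()
            if p not in result:
                result.add(p)
                if p < len(pattern) and pattern[p][1]:
                    stack.append(p + 1)
        return result

    visits = 0
    states = closure({0})
    for c in text:
        visits += len(states)
        states = closure({p + 1 for p in states
                          if p < len(pattern) and pattern[p][0] == c})
    return len(pattern) in states, visits


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        pattern = [("a", True)] * n + [("a", False)] * n  # a?^n a^n
        text = "a" * n
        print(n, backtrack(pattern, text), simulate(pattern, text))
```

    On the pattern a?^n a^n against a^n, the call count of `backtrack` at least doubles with each increase of n, because the successful choice (every `a?` matching nothing) is the last one the greedy search tries. `simulate` does at most 2n + 1 visits per input character.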

    This means that in the worst case, Python regexes can take time exponential, rather than linear, in the length of the input!
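
    The worst case can be observed with the pathological pattern used in Russ Cox's article: `a?` repeated n times followed by `a` repeated n times, matched against a string of n `a` characters. The pattern always matches (every `a?` matches nothing), but a greedy backtracking engine only finds that match after trying the other choices. A rough sketch, assuming CPython's standard re module; the function name `time_pathological` and the range of n are arbitrary choices for illustration:

```python
import re
import time


def time_pathological(n: int) -> float:
    """Seconds taken by re.fullmatch of a?^n a^n against a^n (hypothetical helper)."""
    pattern = re.compile("a?" * n + "a" * n)
    text = "a" * n
    start = time.perf_counter()
    match = pattern.fullmatch(text)
    elapsed = time.perf_counter() - start
    assert match is not None  # the match exists; the cost is in finding it
    return elapsed


if __name__ == "__main__":
    for n in range(10, 19, 2):
        print(f"n={n:2d}  {time_pathological(n):.4f} s")
```

    The times should grow by a roughly constant factor per step of n rather than in proportion to n; the exact figures depend on the machine and Python version.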

    Russ Cox's article "Regular Expression Matching Can Be Simple And Fast" describes the issue in great detail: http://swtch.com/~rsc/regexp/regexp1.html

    I think this graph near the end summarizes it succinctly:

    [Graph: performance of various regular expression implementations, time vs. string length]
