Consider this Python code:

    import timeit
    import re

    mystring = 'hello world'  # placeholder; the original test string was cut off

    def one():
        # True if any of the three literals occurs in mystring
        return any(s in mystring for s in ('foo', 'bar', 'hello'))

    r = re.compile('(foo|bar|hello)')  # pattern completed from the tuple above
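The rest of the benchmark was cut off; a minimal reconstruction of the missing half, assuming the regex version simply calls search() on the same string (the name two is mine, not from the original), might look like:

    def two():
        # the regex counterpart to one(); reconstructed, not from the original
        return r.search(mystring) is not None

    print(timeit.timeit(one))  # plain substring checks via `in`
    print(timeit.timeit(two))  # precompiled alternation via re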
I think the correct answer is that Python's string-handling algorithms are really well optimized for this case, and the re module is actually a bit slower. What I've written below is true, but probably isn't relevant to the simple regexes in the question.
Apparently this is not a random fluke: Python's re module really is slower. It looks like it uses a recursive backtracking approach when it fails to find a match, as opposed to building a DFA and simulating it.
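To make the contrast concrete, here is a toy sketch of the automaton idea for the plain-substring case (the function automaton_any is mine, not from any library): instead of backtracking, it advances a set of partial-match states in lockstep, one character at a time, so it never rescans the input.

    def automaton_any(text, words):
        """Linear scan: does any of the words occur in text?
        Assumes all words are non-empty."""
        states = set()  # (word_index, chars_matched) pairs still alive
        for ch in text:
            # a match for any word may start at the current character
            states |= {(i, 0) for i in range(len(words))}
            next_states = set()
            for i, pos in states:
                if words[i][pos] == ch:
                    if pos + 1 == len(words[i]):
                        return True  # words[i] ends at this character
                    next_states.add((i, pos + 1))
            states = next_states
        return False

Thompson's construction generalizes this lockstep simulation from fixed strings to arbitrary regular expressions, which is how the linear-time engines described in the paper below work.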
Python's re module uses the backtracking approach even when there are no backreferences in the regular expression! What this means is that in the worst case, Python regexes take exponential, not linear, time!
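You can demonstrate the blow-up with the pathological pattern from the paper linked just below: a? repeated n times followed by a repeated n times, matched against a string of n a's (a rough benchmark sketch; absolute timings will vary by machine):

    import re
    import time

    # Pathological case: the pattern a?a?...a?aa...a (n of each) matched
    # against 'a' * n. A backtracking engine tries exponentially many ways
    # of assigning the optional a?'s before the match finally succeeds.
    for n in range(18, 25, 2):
        pattern = re.compile('a?' * n + 'a' * n)
        start = time.perf_counter()
        pattern.match('a' * n)
        print(n, time.perf_counter() - start)  # roughly quadruples per step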
This is a very detailed paper describing the issue: http://swtch.com/~rsc/regexp/regexp1.html
I think the graph near the end of that paper, plotting match time against n for a backtracking implementation versus Thompson's NFA, summarizes it succinctly.