Consider this Python code:
import timeit
import re
def one():
    any(s in mystring for s in ('foo', 'bar', 'hello'))
r = re.compile('(foo|bar|hello
The regex is slow because it not only has to go through the whole string, it also has to do several comparisons at every character.
On the first string, the regex simply does this:
Does f match h? No.
Does b match h? No.
Does h match h? Yes.
Does e match e? Yes.
Does l match l? Yes.
Does l match l? Yes.
Does o match o? Yes.
Done. Match found.
On the second string, the regex does this:
Does f match g? No.
Does b match g? No.
Does h match g? No.
Does f match o? No.
Does b match o? No.
Does h match o? No.
Does f match o? No.
Does b match o? No.
Does h match o? No.
Does f match d? No.
Does b match d? No.
Does h match d? No.
Does f match b? No.
Does b match b? Yes.
Does a match y? No.
Does h match b? No.
Does f match y? No.
Does b match y? No.
Does h match y? No.
Does f match e? No.
Does b match e? No.
Does h match e? No.
... 999 more times ...
Done. No match found.
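The two traces above can be reproduced with a small counter. This is only a sketch of the naive "try every alternative at every position" idea, not how Python's re module is actually implemented; the function name count_comparisons and the sample strings 'hello' and 'goodbye' are assumptions made for illustration (the second is inferred from the characters in the trace).

```python
def count_comparisons(text, words):
    """Naively try each word at each position, counting character comparisons.

    Illustrative only: a real regex engine is far more sophisticated.
    Returns (found, number_of_comparisons).
    """
    comparisons = 0
    for start in range(len(text)):
        for word in words:
            for offset, ch in enumerate(word):
                pos = start + offset
                if pos >= len(text):
                    break  # ran off the end of the text: this alternative fails
                comparisons += 1
                if text[pos] != ch:
                    break  # mismatch: move on to the next alternative
            else:
                # Every character of the word matched.
                return True, comparisons
    return False, comparisons


words = ('foo', 'bar', 'hello')
print(count_comparisons('hello', words))    # (True, 7)
print(count_comparisons('goodbye', words))  # (False, 22)
```

One pass over 'goodbye' already costs 22 comparisons without finding anything, and that cost repeats for every further copy of the text.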
I can only speculate about the difference between any and the regex, but my guess is that the regex is slower mostly because it runs in a highly complex, general-purpose engine; with the state machine bookkeeping and everything else, it just isn't as efficient as a specific implementation (the in operator).
In the first string, the regex will find a match almost instantaneously, while any has to loop through the string twice before finding anything.
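To make the "twice" concrete, here is a rough sketch that unrolls any into an explicit loop and records which substrings were tested before it stopped. The helper name any_with_log and the sample string are hypothetical, chosen only for illustration.

```python
def any_with_log(text, words):
    """Behave like any(word in text for word in words), but also
    return the list of words that were actually tested."""
    tested = []
    for word in words:
        tested.append(word)
        if word in text:  # each failed test scans the whole text
            return True, tested
    return False, tested


# Assumed example: the match sits at the very start of a long string.
sample = 'hello' + 'goodbye' * 1000
found, tested = any_with_log(sample, ('foo', 'bar', 'hello'))
print(found, tested)
```

Here 'foo' and 'bar' each require a full, unsuccessful scan of the string before 'hello' is found right at the start.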
In the second string, however, any performs essentially the same steps as the regex, but in a different order. This suggests that the any solution is faster there, probably because it is simpler.
Specific code is more efficient than generic code. Any knowledge about the problem can be put to use in optimizing the solution. Simple code is preferred over complex code. Essentially, the regex is faster when the pattern is near the start of the string, but the in operator is faster when the pattern is near the end of the string or not found at all.
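If you want to check this on your own machine, something along these lines should do. It is a minimal sketch, not the original benchmark: the test strings, the helper names with_any and with_regex, and the repetition count are all assumptions, and the timings will vary by Python version and hardware.

```python
import re
import timeit

WORDS = ('foo', 'bar', 'hello')
PATTERN = re.compile('|'.join(WORDS))


def with_any(text):
    # One substring search per word, stopping at the first hit.
    return any(word in text for word in WORDS)


def with_regex(text):
    # A single left-to-right scan trying all alternatives.
    return PATTERN.search(text) is not None


# Assumed inputs: one with a match at the start, one with no match at all.
early = 'hello' + 'goodbye' * 1000
missing = 'goodbye' * 1000

for label, text in (('match at start', early), ('no match', missing)):
    # Both approaches must agree before their speed is worth comparing.
    assert with_any(text) == with_regex(text)
    t_any = timeit.timeit(lambda: with_any(text), number=200)
    t_re = timeit.timeit(lambda: with_regex(text), number=200)
    print(f'{label}: any={t_any:.4f}s regex={t_re:.4f}s')
```

Measure before trusting either my reasoning or anyone else's; the crossover point depends on the string length and where the match falls.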
Disclaimer: I don't know Python. I know algorithms.