Question
When checking whether a substring occurs in a larger string,
I am considering two options:
(1)
if "aaaa" in "bbbaaaaaabbb":
    dosomething()
(2)
pattern = re.compile("aaaa")
if pattern.search("bbbaaaaaabbb"):
    dosomething()
Which of the two is more efficient and faster, considering that the string is huge?
Is there any other option that is faster?
Thanks
Answer 1:
Option (1) is definitely faster. In the future, you can test it yourself like this:
>>> import time, re
>>> if True:
...     s = time.time()
...     "aaaa" in "bbbaaaaaabbb"
...     print time.time()-s
...
True
1.78813934326e-05
>>> if True:
...     s = time.time()
...     pattern = re.compile("aaaa")
...     pattern.search("bbbaaaaaabbb")
...     print time.time()-s
...
<_sre.SRE_Match object at 0xb74a91e0>
0.0143280029297
gnibbler's way of doing this is better; I had never really played around with interpreter options, so I didn't know about that one.
Answer 2:
Regex will be slower.
$ python -m timeit '"aaaa" in "bbbaaaaaabbb"'
10000000 loops, best of 3: 0.0767 usec per loop
$ python -m timeit -s 'import re; pattern = re.compile("aaaa")' 'pattern.search("bbbaaaaaabbb")'
1000000 loops, best of 3: 0.356 usec per loop
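For anyone who prefers to run the same comparison from a script rather than the command line, here is a minimal sketch using the standard timeit module under Python 3. The names HAYSTACK, NEEDLE, PATTERN, time_in and time_regex are made up for this example, and the loop count is an arbitrary choice. As in the command above, the pattern is compiled once, outside the timed statement. Absolute numbers will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical names for illustration; the strings are the ones from the question.
HAYSTACK = "bbbaaaaaabbb"
NEEDLE = "aaaa"
PATTERN = re.compile(NEEDLE)  # compiled once, so compilation is not timed


def time_in(number=100_000):
    """Total seconds for `number` runs of the plain `in` test."""
    return timeit.timeit(lambda: NEEDLE in HAYSTACK, number=number)


def time_regex(number=100_000):
    """Total seconds for `number` runs of the precompiled regex search."""
    return timeit.timeit(lambda: PATTERN.search(HAYSTACK), number=number)


if __name__ == "__main__":
    print("in operator :", time_in())
    print("regex search:", time_regex())
```

The lambda wrapper adds the same call overhead to both measurements, so the ratio is a rough guide rather than an exact figure.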
Answer 3:
I happen to have the E. coli genome at hand, so I tested the two options. Looking for "AAAA" in the E. coli genome 10,000,000 times (just to get measurable times) takes about 3.7 seconds with option (1). With option (2), with pattern = re.compile("AAAA") kept outside the loop, of course, it took about 8.4 seconds. In my case, "dosomething()" added 1 to an arbitrary variable. The E. coli genome I used is 4,639,675 nucleotides (letters) long.
Source: https://stackoverflow.com/questions/19911508/python-speed-for-in-vs-regular-expression
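If you don't have a genome lying around, a rough way to try the large-string case is to build a synthetic string. The sketch below is only a stand-in, not the data used above: build_haystack and compare are hypothetical helpers, the "ACGT" filler and the default length of 1,000,000 are assumptions, and the needle is deliberately placed at the very end so that both approaches have to scan the whole string. Both `in` and `search` stop at the first match, so with real data the timings depend heavily on where that first match sits.

```python
import re
import timeit


def build_haystack(length=1_000_000, needle="AAAA"):
    """Hypothetical stand-in for a long sequence.

    The repeating "ACGT" filler never contains two adjacent A's, so the
    only occurrence of the needle is the copy appended at the end.
    """
    filler = ("ACGT" * (length // 4 + 1))[:length]
    return filler + needle


def compare(haystack, needle, number=20):
    """Return (seconds for `in`, seconds for regex) over `number` runs each."""
    pattern = re.compile(re.escape(needle))  # compiled outside the timed code
    t_in = timeit.timeit(lambda: needle in haystack, number=number)
    t_re = timeit.timeit(lambda: pattern.search(haystack), number=number)
    return t_in, t_re


if __name__ == "__main__":
    genome_like = build_haystack()
    t_in, t_re = compare(genome_like, "AAAA")
    print("in operator :", t_in)
    print("regex search:", t_re)
```

Moving the needle to the front of the string, or using real sequence data, should show how much of the measured time is the scan itself versus per-call overhead.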