I've been working on a project that manages big lists of words and passes them through a lot of tests to validate or reject each word in the list. The funny thing is that each ti
Actually, the any() function is equivalent to the following function:
def any(iterable):
    for element in iterable:
        if element:
            return True
    return False
which is like your second function. But since any() returns a boolean value by itself, you don't need to check its result and then return a new value. So part of the performance difference comes from the redundant if condition and return, and from calling any() inside another function.
So the advantage of any() here is that you don't need to wrap it in another function, because it already does all of that for you.
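As a rough illustration of that point, the two hypothetical functions below return the same result; the first re-checks the boolean that any() has already produced, while the second uses it directly. The function names and the sample list are assumptions made for this sketch, not the code from the question.

```python
# Hypothetical sample data; the real list from the question is not shown here.
unallowed_combinations = ["ab", "zx", "qq"]

def is_valid_wrapped(string):
    # Redundant: any() already returns a bool, so this if/return adds nothing.
    if any(combination in string for combination in unallowed_combinations):
        return False
    return True

def is_valid_direct(string):
    # Same result, using the boolean from any() directly.
    return not any(combination in string for combination in unallowed_combinations)

print(is_valid_wrapped("hello"), is_valid_direct("hello"))  # True True
print(is_valid_wrapped("cab"), is_valid_direct("cab"))      # False False
```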
Also, as @interjay mentioned in a comment, the most important reason (which I missed) seems to be that you are passing a generator expression to any(). A generator expression doesn't provide its results all at once; because it produces each result on demand, it does extra work per item.
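To make "on demand" concrete, here is a small sketch that logs which items actually get evaluated. The helper is_big and the sample values are invented for illustration only.

```python
# Hypothetical helper that records every value it is asked to test.
seen_lazy = []
seen_eager = []

def is_big(x, log):
    log.append(x)
    return x > 2

# Generator expression: any() pulls items one at a time and stops early.
lazy_result = any(is_big(x, seen_lazy) for x in [1, 2, 3, 4, 5])

# List comprehension: every item is evaluated before any() even starts.
eager_result = any([is_big(x, seen_eager) for x in [1, 2, 3, 4, 5]])

print(lazy_result, seen_lazy)    # True [1, 2, 3]
print(eager_result, seen_eager)  # True [1, 2, 3, 4, 5]
```

So the generator form does less checking when a match comes early, but each item it does produce costs a generator resume, which is the per-item overhead described above.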
From PEP 289 -- Generator Expressions:
The semantics of a generator expression are equivalent to creating an anonymous generator function and calling it. For example:
g = (x**2 for x in range(10))
print g.next()
is equivalent to:
def __gen(exp):
    for x in exp:
        yield x**2
g = __gen(iter(range(10)))
print g.next()
So as you can see, Python calls iter once to set up the generator, and then calls the generator's next method each time it wants the next item. The result is that it's overkill to use any() with a generator expression in such cases.
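The PEP snippet above uses Python 2 syntax (the print statement and g.next()). A Python 3 rendering of the same equivalence, using the built-in next(), might look like this; the name _gen is just a stand-in for the anonymous generator function the PEP describes.

```python
# Generator expression form.
g = (x**2 for x in range(10))
first_from_expression = next(g)

# Roughly equivalent explicit generator function (name is illustrative).
def _gen(exp):
    for x in exp:
        yield x**2

h = _gen(iter(range(10)))
first_from_function = next(h)

print(first_from_expression, first_from_function)  # 0 0
```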
Since your true question is answered, I'll take a shot at the implied question:
You can get a free speed boost by just doing unallowed_combinations = sorted(set(unallowed_combinations)), since it contains duplicates.
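A tiny sketch of that deduplication step, using a made-up list in place of the real one:

```python
# Hypothetical list with duplicates; the real data is not shown here.
unallowed_combinations = ["ab", "cd", "ab", "ef", "cd"]

# set() drops the duplicates, sorted() gives back a list in a stable order.
unallowed_combinations = sorted(set(unallowed_combinations))

print(unallowed_combinations)  # ['ab', 'cd', 'ef']
```

Fewer entries means fewer substring checks per word, which is where the speed boost comes from.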
Given that, the fastest way I know of doing this is
import re

# One alternation of all the escaped combinations, compiled once up front.
valid3_re = re.compile("|".join(map(re.escape, unallowed_combinations)))
def combination_is_valid3(string):
    return not valid3_re.search(string)
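For a self-contained picture of how this behaves, here is a sketch with a made-up list. It also shows why re.escape matters: without it, a combination such as "c.d" would treat the dot as a wildcard.

```python
import re

# Hypothetical combinations, including one with a regex metacharacter.
unallowed_combinations = sorted(set(["ab", "c.d", "ab"]))

# Each combination is escaped so it is matched literally.
valid3_re = re.compile("|".join(map(re.escape, unallowed_combinations)))

def combination_is_valid3(string):
    # search() returns a match object (truthy) or None.
    return not valid3_re.search(string)

print(combination_is_valid3("xxabxx"))  # False: contains "ab"
print(combination_is_valid3("c.d"))     # False: contains the literal "c.d"
print(combination_is_valid3("cxd"))     # True: the dot is not a wildcard
```

One caveat with this approach: if the list is empty, the joined pattern is the empty string, which matches everywhere, so every input would be reported as invalid.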
With CPython 3.5 I get, for some test data with a line length of 60 characters,
combination_is_valid ended in 3.3051061630249023 seconds
combination_is_valid2 ended in 2.216959238052368 seconds
combination_is_valid3 ended in 1.4767844676971436 seconds
where the third is the regex version, and on PyPy3 I get
combination_is_valid ended in 2.2926249504089355 seconds
combination_is_valid2 ended in 2.0935239791870117 seconds
combination_is_valid3 ended in 0.14300894737243652 seconds
FWIW, this is competitive with Rust (a low-level language, like C++) and actually noticeably wins out on the regex side. Shorter strings favour PyPy over CPython a lot more (e.g. 4x CPython for a line length of 10), since overhead matters more then.
Since only about a third of CPython's regex runtime is loop overhead, we conclude that PyPy's regex implementation is better optimized for this use-case. I'd recommend looking to see if there is a CPython regex implementation that makes this competitive with PyPy.
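If you want to rerun this kind of comparison yourself, a rough harness could look like the following. The test data behind the timings above is not shown, so the lists here are invented, and the bodies of combination_is_valid and combination_is_valid2 are assumptions (an any()-based version and an explicit loop, as the first answer describes). Expect different absolute numbers.

```python
import re
import timeit

# Hypothetical data: a small deduplicated list and two 60-character lines.
unallowed_combinations = sorted(set(["ab", "cd", "ef", "ab"]))
lines = ["xyz" * 20, "xyzab" * 12]

def combination_is_valid(string):
    # Assumed shape of the first function: any() over a generator expression.
    return not any(c in string for c in unallowed_combinations)

def combination_is_valid2(string):
    # Assumed shape of the second function: explicit loop with early return.
    for c in unallowed_combinations:
        if c in string:
            return False
    return True

valid3_re = re.compile("|".join(map(re.escape, unallowed_combinations)))

def combination_is_valid3(string):
    return not valid3_re.search(string)

def bench(func, number=1000):
    # Total time in seconds to run func over every line, `number` times.
    return timeit.timeit(lambda: [func(line) for line in lines], number=number)

for func in (combination_is_valid, combination_is_valid2, combination_is_valid3):
    print("{} ended in {} seconds".format(func.__name__, bench(func)))
```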