Why is “any()” running slower than using loops?


I've been working on a project that manages big lists of words and passes them through a lot of tests to validate or not each word of the list. The funny thing is that each ti


2 Answers

野的像风

2021-01-04 10:50
Actually, the any() function is equivalent to the following function:
```
def any(iterable):
    for element in iterable:
        if element:
            return True
    return False
```
which is like your second function. But since any() already returns a boolean value, you don't need to check its result and then return a new value. So part of the performance difference comes from the redundant if condition and return, and from calling any() inside another function.

So the advantage of any here is that you don't need to wrap it with another function because it does all the things for you.

Also, as @interjay mentioned in a comment, the most important reason, which I missed, seems to be that you are passing a generator expression to any(). A generator does not provide its results all at once; it produces each result on demand, and that costs extra work per item.

From PEP 289 -- Generator Expressions:

The semantics of a generator expression are equivalent to creating an anonymous generator function and calling it. For example:
```
# The PEP writes this as `print g.next()` (Python 2); Python 3 form shown here.
g = (x**2 for x in range(10))
print(next(g))
```
is equivalent to:
```
def __gen(exp):
    for x in exp:
        yield x**2
g = __gen(iter(range(10)))
print(next(g))
```
So, as you can see, the generator is created with a call to iter, and each time any() needs the next item Python has to call the generator's next method and resume the generator function. The result is that using any() is overkill in such cases.
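
To see the generator overhead for yourself, here is a minimal sketch that times both styles. The names `unallowed_combinations`, `valid_with_any` and `valid_with_loop` and the sample data are my own stand-ins, not the code from the question, and the absolute numbers will depend on your machine and Python version.

```python
import timeit

# Hypothetical stand-ins for the data in the question.
unallowed_combinations = ["ab", "cd", "ef", "gh", "ij"]
# Contains none of the combinations, so every combination gets checked.
word = "klmnopqrstuvwxyz" * 4


def valid_with_any(string):
    # any() pulls items from a generator expression, resuming it once per item.
    return not any(combination in string for combination in unallowed_combinations)


def valid_with_loop(string):
    # Plain loop: no generator object is created or resumed.
    for combination in unallowed_combinations:
        if combination in string:
            return False
    return True


if __name__ == "__main__":
    for func in (valid_with_any, valid_with_loop):
        seconds = timeit.timeit(lambda: func(word), number=100_000)
        print(f"{func.__name__}: {seconds:.3f} s")
```

Both functions return the same answers; only the way the items are produced differs, which is where the timing gap comes from.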
野趣味

2021-01-04 10:56
Since your true question is answered, I'll take a shot at the implied question:

You can get a free speed boost by just doing unallowed_combinations = sorted(set(unallowed_combinations)), since it contains duplicates.

Given that, the fastest way I know of doing this is
```
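import re

# unallowed_combinations is the (deduplicated) list from the question.
# re.escape makes every combination match literally inside the alternation.
valid3_re = re.compile("|".join(map(re.escape, unallowed_combinations)))

def combination_is_valid3(string):
    return not valid3_re.search(string)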
```
With CPython 3.5 I get, for some test data with a line length of 60 characters,
```
combination_is_valid ended in 3.3051061630249023 seconds
combination_is_valid2 ended in 2.216959238052368 seconds
combination_is_valid3 ended in 1.4767844676971436 seconds
```
where the third is the regex version, and on PyPy3 I get
```
combination_is_valid ended in 2.2926249504089355 seconds
combination_is_valid2 ended in 2.0935239791870117 seconds
combination_is_valid3 ended in 0.14300894737243652 seconds
```
FWIW, this is competitive with Rust (a low-level language, like C++) and actually noticeably wins out on the regex side. Shorter strings favour PyPy over CPython a lot more (e.g. 4x CPython for a line length of 10), since overhead matters more then.

Since only about a third of CPython's regex runtime is loop overhead, we conclude that PyPy's regex implementation is better optimized for this use-case. I'd recommend looking to see if there is a CPython regex implementation that makes this competitive with PyPy.
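
If you want to check that the regex version really is a drop-in replacement before benchmarking it, a small self-contained sketch like the following may help. The sample list and the names `unique_combinations`, `is_valid_loop` and `is_valid_regex` are made up for illustration; the real data comes from the question.

```python
import re

# Hypothetical sample data, with duplicates and a regex metacharacter on purpose.
unallowed_combinations = ["ab", "cd", "ab", "a.c", "ef", "cd"]

# Deduplicate first so the pattern has no repeated alternatives.
unique_combinations = sorted(set(unallowed_combinations))

# re.escape keeps characters such as "." literal inside the alternation.
pattern = re.compile("|".join(map(re.escape, unique_combinations)))


def is_valid_loop(string):
    # Reference implementation: plain substring checks.
    for combination in unique_combinations:
        if combination in string:
            return False
    return True


def is_valid_regex(string):
    # Valid when no unallowed combination occurs anywhere in the string.
    return pattern.search(string) is None


# The two implementations should agree on every sample.
for sample in ["xyz", "xxabxx", "a.c", "abc", "axc", ""]:
    assert is_valid_loop(sample) == is_valid_regex(sample)
```

One caveat: if the list is empty, the joined pattern is the empty string, which matches everything, so that case needs separate handling.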