Why is "any()" running slower than using loops?

再見小時候 2021-01-04 10:25

I've been working on a project that manages big lists of words and passes them through a lot of tests to validate or reject each word in the list. The funny thing is that each ti

2 Answers
  • 2021-01-04 10:50

    Actually, the any() function is equivalent to the following function:

    def any(iterable):
        for element in iterable:
            if element:
                return True
        return False
    

    This is much like your second function. Since any() already returns a boolean value, you don't need to test its result and then return a new value. Part of the performance difference therefore comes from the redundant if and return statements, and from calling any() inside another function.

    So the advantage of any() here is that you don't need to wrap it in another function, because it already does all of the work for you.
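    The question's code is cut off on this page, so the following is only a sketch of what the functions might look like. The names combination_is_valid, combination_is_valid2 and unallowed_combinations appear in this thread; the function bodies, the sample list and combination_is_valid_direct are assumptions made for illustration.

```python
# Hypothetical stand-in data; the real list is not shown in the question.
unallowed_combinations = ["ab", "cd", "zz"]


def combination_is_valid(string):
    # Assumed shape: any() wrapped in a redundant if/return.
    if any(combination in string for combination in unallowed_combinations):
        return False
    return True


def combination_is_valid2(string):
    # Assumed shape: explicit loop that stops at the first match.
    for combination in unallowed_combinations:
        if combination in string:
            return False
    return True


def combination_is_valid_direct(string):
    # Hypothetical variant: use the boolean from any() directly.
    return not any(combination in string for combination in unallowed_combinations)


print(combination_is_valid_direct("hello"))   # True
print(combination_is_valid_direct("xxabxx"))  # False
```

    All three return the same answers; they differ only in how much work is done around the membership tests.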

    Also, as @interjay mentioned in a comment, the most important reason, which I missed, seems to be that you are passing a generator expression to any(). A generator expression doesn't produce its results all at once; it produces each result on demand, and that costs extra work per item.
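    One way to check this on your own machine is to time both forms with timeit. This is a rough sketch: the word list, the 60-character test string and the function names with_any and with_loop are made up for the example, and the numbers will vary with the interpreter and the data.

```python
import timeit

# Hypothetical data, chosen so that no word matches and every item is checked.
words = ["ab", "cd", "ef", "gh"]
text = "x" * 60


def with_any():
    # Generator expression consumed by any(): one generator resume per item.
    return any(w in text for w in words)


def with_loop():
    # Plain loop doing the same membership tests.
    for w in words:
        if w in text:
            return True
    return False


if __name__ == "__main__":
    for fn in (with_any, with_loop):
        elapsed = timeit.timeit(fn, number=100_000)
        print(f"{fn.__name__}: {elapsed:.3f} s")
```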

    From PEP 0289 -- Generator Expressions:

    The semantics of a generator expression are equivalent to creating an anonymous generator function and calling it. For example:

    g = (x**2 for x in range(10))
    print g.next()
    

    is equivalent to:

    def __gen(exp):
        for x in exp:
            yield x**2
    g = __gen(iter(range(10)))
    print g.next()
    

    So, as you can see, iter is called once when the generator is created, and every time Python needs the next item it has to call the generator's next method and resume the generator. The result is that using any() is overkill in such cases.
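    The PEP excerpt uses Python 2 syntax (print g.next()). A Python 3 rendering of the same equivalence might look like this; the helper name _gen and the result variables are illustrative only.

```python
# Generator expression: nothing is computed until next() is called.
g = (x**2 for x in range(10))
first_from_genexpr = next(g)


def _gen(exp):
    # Roughly what the generator expression expands to.
    for x in exp:
        yield x**2


g2 = _gen(iter(range(10)))
first_from_function = next(g2)

print(first_from_genexpr, first_from_function)  # 0 0
```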

  • 2021-01-04 10:56

    Since your true question is answered, I'll take a shot at the implied question:

    You can get a free speed boost by just doing unallowed_combinations = sorted(set(unallowed_combinations)), since it contains duplicates.
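    As a small illustration with a made-up list (the real contents of unallowed_combinations are not shown here):

```python
# Hypothetical list containing duplicates.
unallowed_combinations = ["ab", "cd", "ab", "zz", "cd"]

# set() drops the duplicates, sorted() gives a stable, ordered list back.
unallowed_combinations = sorted(set(unallowed_combinations))

print(unallowed_combinations)  # ['ab', 'cd', 'zz']
```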

    Given that, the fastest way I know of doing this is

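    import re

    # Build one alternation pattern; re.escape makes each entry match literally.
    valid3_re = re.compile("|".join(map(re.escape, unallowed_combinations)))

    def combination_is_valid3(string):
        # Valid only if none of the unallowed combinations occurs in the string.
        return not valid3_re.search(string)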
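    A self-contained usage sketch of the regex approach, with a hypothetical list that includes regex metacharacters to show what re.escape is for:

```python
import re

# Hypothetical data; "." and "+" would be special without re.escape.
unallowed_combinations = sorted(set(["a.b", "cd", "cd", "x+"]))

valid3_re = re.compile("|".join(map(re.escape, unallowed_combinations)))


def combination_is_valid3(string):
    return not valid3_re.search(string)


print(combination_is_valid3("hello world"))  # True: nothing unallowed found
print(combination_is_valid3("abcd"))         # False: contains "cd"
print(combination_is_valid3("aXb"))          # True: "a.b" only matches a literal dot
```

    One caveat with this sketch: if unallowed_combinations is empty, the joined pattern is the empty string, which matches everywhere, so every input would be reported as invalid. Guard against that case if the list can be empty.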
    

    With CPython 3.5 I get, for some test data with a line length of 60 characters,

    combination_is_valid ended in 3.3051061630249023 seconds
    combination_is_valid2 ended in 2.216959238052368 seconds
    combination_is_valid3 ended in 1.4767844676971436 seconds
    

    where the third is the regex version, and on PyPy3 I get

    combination_is_valid ended in 2.2926249504089355 seconds
    combination_is_valid2 ended in 2.0935239791870117 seconds
    combination_is_valid3 ended in 0.14300894737243652 seconds
    

    FWIW, this is competitive with Rust (a low-level language, like C++) and actually noticeably wins out on the regex side. Shorter strings favour PyPy over CPython a lot more (e.g. 4x faster than CPython for a line length of 10), since overhead matters more then.

    Since only about a third of CPython's regex runtime is loop overhead, we conclude that PyPy's regex implementation is better optimized for this use-case. I'd recommend looking to see if there is a CPython regex implementation that makes this competitive with PyPy.
