Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?

梦毁少年i 2020-11-22 03:46

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.
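For concreteness, here is the membership test from the title; it returns immediately even though the range covers 10**15 + 1 values:

```python
# The test the question refers to: effectively instant, because range
# does an O(1) arithmetic check instead of scanning 10**15 values.
big = 10 ** 15
print(big in range(big + 1))  # True
```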

11 Answers
  •  清酒与你
    2020-11-22 04:13

    The fundamental misunderstanding here is in thinking that range is a generator. It's not. In fact, it's not any kind of iterator.

    You can tell this pretty easily:

    >>> a = range(5)
    >>> print(list(a))
    [0, 1, 2, 3, 4]
    >>> print(list(a))
    [0, 1, 2, 3, 4]
    

    If it were a generator, iterating it once would exhaust it:

    >>> def my_crappy_range(n):
    ...     i = 0
    ...     while i < n:
    ...         yield i
    ...         i += 1
    ...
    >>> b = my_crappy_range(5)
    >>> print(list(b))
    [0, 1, 2, 3, 4]
    >>> print(list(b))
    []
    

    What range actually is, is a sequence, just like a list. You can even test this:

    >>> import collections.abc
    >>> isinstance(a, collections.abc.Sequence)
    True
    

    This means it has to follow all the rules of being a sequence:

    >>> a[3]         # indexable
    3
    >>> len(a)       # sized
    5
    >>> 3 in a       # membership
    True
    >>> reversed(a)  # reversible
    <range_iterator object at 0x...>
    >>> a.index(3)   # implements 'index'
    3
    >>> a.count(3)   # implements 'count'
    1
    

    The difference between a range and a list is that a range is a lazy or dynamic sequence: it doesn't store all of its values; it just remembers its start, stop, and step, and computes each value on demand in __getitem__.
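A rough sketch of that idea, assuming a simplified start-0, step-1 range for brevity (LazyRange is a toy illustration, not CPython's actual implementation):

```python
class LazyRange:
    """Toy lazy sequence: stores only `stop` and computes items on demand.
    The real range also remembers start and step and handles negatives."""
    def __init__(self, stop):
        self.stop = stop

    def __len__(self):
        return self.stop

    def __getitem__(self, i):
        if 0 <= i < self.stop:
            return i          # the value is computed, never stored
        raise IndexError(i)

r = LazyRange(10 ** 15)       # instant: no values are materialized
print(r[123456789])           # 123456789
```

Creating the object is O(1) in time and memory no matter how large `stop` is, because nothing is built until an index is actually requested.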

    (As a side note, if you print(iter(a)) in current CPython 3.x, you'll see that range has its own dedicated range_iterator type; older descriptions say it shares list's listiterator type. Either way, the point stands: a generic sequence iterator needs nothing special from its sequence beyond a C implementation of __getitem__, so iterator machinery like list's would work fine for range too.)
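Independently of which iterator type a given implementation uses, Python's fallback sequence-iteration protocol shows that iteration really can be driven by __getitem__ alone. Squares below is a made-up class for illustration:

```python
class Squares:
    """Defines only __getitem__. iter() falls back to the legacy
    sequence protocol: it calls __getitem__ with 0, 1, 2, ...
    until IndexError is raised."""
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError(i)
        return i * i

print(list(Squares()))  # [0, 1, 4, 9, 16]
```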


    Now, there's nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn't. But there's nothing that says it can't be. And it's easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn't it do it the better way?
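A hedged sketch of that arithmetic check, handling only positive steps (CPython's C implementation also deals with negative steps and bool/int subtleties):

```python
def range_contains(r, val):
    """O(1) membership test for an int val in range r, assuming r.step > 0.
    A sketch of the idea behind range.__contains__, not the real code."""
    if r.step <= 0:
        raise NotImplementedError("this sketch handles positive steps only")
    return r.start <= val < r.stop and (val - r.start) % r.step == 0

print(range_contains(range(0, 10 ** 15 + 1), 10 ** 15))  # True
print(range_contains(range(0, 100, 7), 50))              # False (not a multiple of 7)
```

Two comparisons and one modulo, regardless of how many values the range represents; iterating instead would take on the order of `len(r)` equality tests.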

    But there doesn't seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhary points out, if you give it a non-integral value, instead of converting to an integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x happen to contain this optimization, and it's an obvious good idea and easy to do, there's no reason that IronPython or NewKickAssPython 3.x couldn't leave it out. (And in fact, CPython 3.0-3.1 didn't include it.)
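That fallback can be observed without timing anything. Probe below is a made-up wrapper that counts how often it is compared: since it is not an exact int, CPython's fast path is skipped and the membership test degrades to element-by-element comparison.

```python
class Probe:
    """Hypothetical int-like wrapper that counts equality comparisons."""
    def __init__(self, value):
        self.value = value
        self.eq_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1
        return self.value == other

    def __hash__(self):
        return hash(self.value)

p = Probe(50)
print(p in range(100))   # True, but found by iterating and comparing
print(p.eq_calls > 1)    # True: many comparisons happened along the way
```

With a plain `50 in range(100)`, by contrast, the arithmetic fast path runs and no element comparisons are needed at all.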


    If range actually were a generator, like my_crappy_range, then it wouldn't make sense to test __contains__ this way, or at least the way it makes sense wouldn't be obvious. If you'd already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?
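That ambiguity is easy to demonstrate with a real generator (gen here is just an illustrative wrapper over range): `in` iterates until it finds a match, permanently consuming everything up to and including it.

```python
def gen():
    # Illustrative generator yielding 0..4
    yield from range(5)

g = gen()
print(3 in g)    # True: consumed 0, 1, 2, 3 to find it
print(list(g))   # [4] -- only the value after the match remains
print(3 in g)    # False: 3 is gone for good
```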
