It is my understanding that the range()
function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.
The other answers explained it well already, but I'd like to offer another experiment illustrating the nature of range objects:
>>> r = range(5)
>>> for i in r:
print(i, 2 in r, list(r))
0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]
As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.
It's all about a lazy approach to the evaluation and some extra optimization of range
.
Values in ranges don't need to be computed until real use, or even further due to extra optimization.
By the way, your integer is not such big, consider sys.maxsize
sys.maxsize in range(sys.maxsize)
is pretty fast
due to optimization - it's easy to compare given integer just with min and max of range.
but:
Decimal(sys.maxsize) in range(sys.maxsize)
is pretty slow.
(in this case, there is no optimization in range
, so if python receives unexpected Decimal, python will compare all numbers)
You should be aware of an implementation detail but should not be relied upon, because this may change in the future.
Use the source, Luke!
In CPython, range(...).__contains__
(a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we're using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:
start
and stop
, andFor example, 994
is in range(4, 1000, 2)
because:
4 <= 994 < 1000
, and(994 - 4) % 2 == 0
.The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:
static int
range_contains_long(rangeobject *r, PyObject *ob)
{
int cmp1, cmp2, cmp3;
PyObject *tmp1 = NULL;
PyObject *tmp2 = NULL;
PyObject *zero = NULL;
int result = -1;
zero = PyLong_FromLong(0);
if (zero == NULL) /* MemoryError in int(0) */
goto end;
/* Check if the value can possibly be in the range. */
cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
if (cmp1 == -1)
goto end;
if (cmp1 == 1) { /* positive steps: start <= ob < stop */
cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
}
else { /* negative steps: stop < ob <= start */
cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
}
if (cmp2 == -1 || cmp3 == -1) /* TypeError */
goto end;
if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
result = 0;
goto end;
}
/* Check that the stride does not invalidate ob's membership. */
tmp1 = PyNumber_Subtract(ob, r->start);
if (tmp1 == NULL)
goto end;
tmp2 = PyNumber_Remainder(tmp1, r->step);
if (tmp2 == NULL)
goto end;
/* result = ((int(ob) - start) % step) == 0 */
result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
end:
Py_XDECREF(tmp1);
Py_XDECREF(tmp2);
Py_XDECREF(zero);
return result;
}
static int
range_contains(rangeobject *r, PyObject *ob)
{
if (PyLong_CheckExact(ob) || PyBool_Check(ob))
return range_contains_long(r, ob);
return (int)_PySequence_IterSearch((PyObject*)r, ob,
PY_ITERSEARCH_CONTAINS);
}
The "meat" of the idea is mentioned in the line:
/* result = ((int(ob) - start) % step) == 0 */
As a final note - look at the range_contains
function at the bottom of the code snippet. If the exact type check fails then we don't use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch
! You can check this behaviour in the interpreter (I'm using v3.5.0 here):
>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
... pass
...
>>> x_ = MyInt(x)
>>> x in r # calculates immediately :)
True
>>> x_ in r # iterates for ages.. :(
^\Quit (core dumped)
If you're wondering why this optimization was added to range.__contains__
, and why it wasn't added to xrange.__contains__
in 2.7:
First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__
. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because "xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late." (2.7 was nearly out at that point.)
Meanwhile:
Originally, xrange
was a not-quite-sequence object. As the 3.1 docs say:
Range objects have very little behavior: they only support indexing, iteration, and the
len
function.
This wasn't quite true; an xrange
object actually supported a few other things that come automatically with indexing and len
,* including __contains__
(via linear search). But nobody thought it was worth making them full sequences at the time.
Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange
/range
claimed to implement collections.Sequence
, even though it still only handled the same "very little behavior". Nobody noticed that problem until issue 9213. The patch for that issue not only added index
and count
to 3.2's range
, it also re-worked the optimized __contains__
(which shares the same math with index
, and is directly used by count
).** This change went in for 3.2 as well, and was not backported to 2.x, because "it's a bugfix that adds new methods". (At this point, 2.7 was already past rc status.)
So, there were two chances to get this optimization backported to 2.7, but they were both rejected.
* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange
objects got a custom iterator.
** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False
. But Daniel Stutzbach's updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch
that pre-3.2 range.__contains__
was implicitly using when the optimization doesn't apply.
The object returned by range()
is actually a range
object. This object implements the iterator interface so you can iterate over its values sequentially, just like a generator, list, or tuple.
But it also implements the __contains__ interface which is actually what gets called when an object appears on the right hand side of the in
operator. The __contains__()
method returns a bool
of whether or not the item on the left-hand-side of the in
is in the object. Since range
objects know their bounds and stride, this is very easy to implement in O(1).