The issue of the performance of the python random module, and in particular, random.sample
and random.shuffle
came up in this question. On my computer,
----------- What Changed -----------------------------------------------
Several things happened:
Integers became less efficient in the int/long unification. That is also why integers are 28 bytes wide now instead of 24 bytes on 64-bit Linux/MacOS builds.
Shuffle became more accurate by using _randbelow
. This eliminated a subtle bias in the previous algorithm.
Indexing became slower because the special case for integer indices was removed from ceval.c primarily because it was harder to do with the newer integers and because a couple of the core devs didn't think the optimization was worth it.
The range() function was replaced with xrange(). This is relevant because the OP's timings both use range() in the inner-loop.
The algorithms for shuffle() and sample() were otherwise unchanged.
Python 3 made a number of changes like unicode-everywhere that made the internals more complex, a little slower, and more memory intensive. In return, Python 3 makes it easier for users to write correct code.
Likewise, int/long unification made the language simpler but at a cost of speed and space. The switch to using _randbelow()
in the random module had a runtime cost but benefited in terms of accuracy and correctness.
----------- Conclusion --------------------------------------------------
In short, Python 3 is better in some ways that matter to many users and worse in some ways that people rarely notice. Engineering is often about trade-offs.
----------- Details ---------------------------------------------------------
Python2.7 code for shuffle():
def shuffle(self, x, random=None):
if random is None:
random = self.random
_int = int
for i in reversed(xrange(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = _int(random() * (i+1))
x[i], x[j] = x[j], x[i]
Python3.6 code for shuffle():
def shuffle(self, x, random=None):
if random is None:
randbelow = self._randbelow
for i in reversed(range(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = randbelow(i+1) # <-- This part changed
x[i], x[j] = x[j], x[i]
else:
_int = int
for i in reversed(range(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = _int(random() * (i+1))
x[i], x[j] = x[j], x[i]
Python 2.7 integer size:
>>> import sys
>>> sys.getsizeof(1)
24
Python 3.6 integer size:
>>> import sys
>>> sys.getsizeof(1)
28
Relative speed of indexed lookups (binary subscriptions with integer arguments indexing into a list):
$ python2.7 -m timeit -s 'a=[0]' 'a[0]'
10000000 loops, best of 3: 0.0253 usec per loop
$ python3.6 -m timeit -s 'a=[0]' 'a[0]'
10000000 loops, best of 3: 0.0313 usec per loop
Python 2.7 code in ceval.c with an optimization for indexed lookups:
TARGET_NOARG(BINARY_SUBSCR)
{
w = POP();
v = TOP();
if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {
/* INLINE: list[int] */
Py_ssize_t i = PyInt_AsSsize_t(w);
if (i < 0)
i += PyList_GET_SIZE(v);
if (i >= 0 && i < PyList_GET_SIZE(v)) {
x = PyList_GET_ITEM(v, i);
Py_INCREF(x);
}
else
goto slow_get;
}
else
slow_get:
x = PyObject_GetItem(v, w);
Py_DECREF(v);
Py_DECREF(w);
SET_TOP(x);
if (x != NULL) DISPATCH();
break;
}
Python 3.6 code in ceval.c without the optimization for indexed lookups:
TARGET(BINARY_SUBSCR) {
PyObject *sub = POP();
PyObject *container = TOP();
PyObject *res = PyObject_GetItem(container, sub);
Py_DECREF(container);
Py_DECREF(sub);
SET_TOP(res);
if (res == NULL)
goto error;
DISPATCH();
}