Note: I'm a Ruby developer trying to find my way in Python.
When I wanted to figure out why some scripts use mylist[:] instead of list(mylist)
I can't comment on the Ruby timing vs. the Python timing, but I can comment on list vs. slice. Here's a quick inspection of the bytecode:
>>> import dis
>>> a = range(10)
>>> def func(a):
...     return a[:]
...
>>> def func2(a):
...     return list(a)
...
>>> dis.dis(func)
  2           0 LOAD_FAST                0 (a)
              3 SLICE+0
              4 RETURN_VALUE
>>> dis.dis(func2)
  2           0 LOAD_GLOBAL              0 (list)
              3 LOAD_FAST                0 (a)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE
Notice that list(a) requires a LOAD_GLOBAL to find the function list. Looking up globals (and calling functions) in Python is relatively slow. This would explain why a[0:len(a)] is also slower, since it has to look up and call len. Also remember that list needs to be able to handle arbitrary iterators, whereas slicing doesn't. This means that list needs to allocate a new list, pack elements into it as it iterates over the input, and resize when necessary. A few things here are expensive: resizing if necessary and iterating (effectively in Python, not C). With the slicing method, you can calculate the size of the memory you'll need, so you can probably avoid resizing, and the iteration can be done completely in C (probably with a memcpy or something).
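To make the difference in accepted inputs concrete, here is a minimal sketch. It assumes Python 3 (the session above is Python 2), and the helper names copy_with_list and copy_with_slice are made up for illustration. It only shows behaviour that is visible from Python, not how the interpreter implements either form.

```python
def copy_with_list(items):
    """Copy via the list() constructor; accepts any iterable."""
    return list(items)


def copy_with_slice(seq):
    """Copy via a full slice; needs an object that supports slicing."""
    return seq[:]


original = [[1, 2], [3, 4]]
via_list = copy_with_list(original)
via_slice = copy_with_slice(original)

# Both results are new list objects that share the same element references
# (a shallow copy in each case).
print(via_list == original, via_list is original)
print(via_slice[0] is original[0])

# list() can consume a generator, whose length is not known up front.
squares = copy_with_list(x * x for x in range(5))
print(squares)

# A generator cannot be sliced, so the slice form raises TypeError here.
try:
    copy_with_slice(x for x in range(5))
    slice_failed = False
except TypeError:
    slice_failed = True
print(slice_failed)
```

So the two spellings are interchangeable only when the input is already a list (or another sliceable sequence whose slice type you are happy with).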
Disclaimer: I'm not a Python dev, so I don't know for sure how the internals of list() are implemented. I'm just speculating based on what I know of the specification.
EDIT: So I've looked at the source (with a little guidance from Martijn). The relevant code is in listobject.c. list calls list_init, which then calls listextend at line 799. That function has some checks to see if it can use a fast branch if the object is a list or a tuple (line 812). Finally, the heavy lifting is done starting at line 834:
src = PySequence_Fast_ITEMS(b);
dest = self->ob_item + m;
for (i = 0; i < n; i++) {
    PyObject *o = src[i];
    Py_INCREF(o);
    dest[i] = o;
}
Compare that to the slice version, which I think is defined in list_subscript (line 2544). That calls list_slice (line 2570), where the heavy lifting is done by the following loop (line 486):
src = a->ob_item + ilow;
dest = np->ob_item;
for (i = 0; i < len; i++) {
    PyObject *v = src[i];
    Py_INCREF(v);
    dest[i] = v;
}
They're pretty much the same code, so it's not surprising that the performance is almost the same for large lists (where the overhead of the small stuff like unpacking slices, looking up global variables, etc. becomes less important).
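As a rough illustration of what both C loops do, here is a pure-Python model of the same steps: allocate the destination at its final size, then copy one reference per element. The function name copy_references and its parameters are hypothetical, and this is only a sketch of the idea (Python 3 assumed), not a translation of CPython's code.

```python
def copy_references(src, start=0, count=None):
    """Model of the C copy loop: preallocate, then copy element references."""
    if count is None:
        count = len(src) - start
    dest = [None] * count            # size is known up front, so no resizing
    for i in range(count):
        dest[i] = src[start + i]     # copies the reference, not the object
    return dest


inner = ["x"]
a = [inner, 1, 2]
b = copy_references(a)

# Mutating the shared inner list is visible through both copies,
# because only references were copied.
inner.append("y")
print(b[0])
print(b is a)
```

The Py_INCREF call in the C loops has no direct counterpart here; in Python, binding dest[i] to the object takes care of the reference count.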
Here's how I would run the python tests (and the results for my Ubuntu system):
$ python -m timeit -s 'a=range(30)' 'list(a)'
1000000 loops, best of 3: 0.39 usec per loop
$ python -m timeit -s 'a=range(30)' 'a[:]'
10000000 loops, best of 3: 0.183 usec per loop
$ python -m timeit -s 'a=range(30)' 'a[0:len(a)]'
1000000 loops, best of 3: 0.254 usec per loop
Use the timeit module in python for testing timings.
from copy import copy

a = range(1000)  # Python 2: range() returns a list

def cop():
    b = copy(a)

def func1():
    b = list(a)

def slice():
    b = a[:]

def slice_len():
    b = a[0:len(a)]

if __name__ == "__main__":
    import timeit
    # timeit.timeit runs each statement 1,000,000 times by default
    print "copy(a)", timeit.timeit("cop()", setup="from __main__ import cop")
    print "list(a)", timeit.timeit("func1()", setup="from __main__ import func1")
    print "a[:]", timeit.timeit("slice()", setup="from __main__ import slice")
    print "a[0:len(a)]", timeit.timeit("slice_len()", setup="from __main__ import slice_len")
Results:
copy(a) 3.98940896988
list(a) 2.54542589188
a[:] 1.96630120277 #winner
a[0:len(a)] 10.5431251526
The extra steps involved in a[0:len(a)] are surely the reason for its slowness.
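For readers on Python 3, a comparable benchmark might look like the sketch below. It is an adaptation, not the script that produced the numbers above: it builds a real list (in Python 3, range is lazy), uses a much smaller repeat count so it finishes quickly, and wraps each candidate in a lambda, which adds the same call overhead to every entry. Timings depend on the machine and interpreter version, so no expected figures are given.

```python
import copy
import timeit

a = list(range(1000))  # Python 3: range() is lazy, so build a real list

# Each candidate is a zero-argument callable that timeit can call directly.
candidates = {
    "copy.copy(a)": lambda: copy.copy(a),
    "list(a)": lambda: list(a),
    "a[:]": lambda: a[:],
    "a[0:len(a)]": lambda: a[0:len(a)],
}

results = {}
for label, func in candidates.items():
    # Take the best of 3 runs of 10,000 calls each to reduce noise.
    results[label] = min(timeit.repeat(func, number=10000, repeat=3))

# Print fastest first.
for label, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print("%-14s %.4f s" % (label, seconds))
```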
Here's the byte code comparison of the two:
In [19]: dis.dis(func1)
2 0 LOAD_GLOBAL 0 (range)
3 LOAD_CONST 1 (100000)
6 CALL_FUNCTION 1
9 STORE_FAST 0 (a)
3 12 LOAD_FAST 0 (a)
15 SLICE+0
16 STORE_FAST 1 (b)
19 LOAD_CONST 0 (None)
22 RETURN_VALUE
In [20]: dis.dis(func2)
2 0 LOAD_GLOBAL 0 (range)
3 LOAD_CONST 1 (100000)
6 CALL_FUNCTION 1
9 STORE_FAST 0 (a)
3 12 LOAD_FAST 0 (a) #same up to here
15 LOAD_CONST 2 (0) #loads 0
18 LOAD_GLOBAL 1 (len) # loads the builtin len(),
# so it might take some lookup time
21 LOAD_FAST 0 (a)
24 CALL_FUNCTION 1
27 SLICE+3
28 STORE_FAST 1 (b)
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
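The listings above come from Python 2, and opcode names differ between versions (SLICE+0 and SLICE+3 no longer exist in Python 3). A version-tolerant way to check the same point, that the explicit-bounds form needs a global lookup for len while the plain slice does not, is sketched below for Python 3. The function names are made up for illustration.

```python
import dis


def full_slice(a):
    return a[:]


def explicit_bounds(a):
    return a[0:len(a)]


def global_names(func):
    """Names the function's bytecode loads as globals or builtins."""
    return [ins.argval for ins in dis.get_instructions(func)
            if ins.opname == "LOAD_GLOBAL"]


print("a[:]        ->", global_names(full_slice))
print("a[0:len(a)] ->", global_names(explicit_bounds))
```

Running dis.dis(full_slice) and dis.dis(explicit_bounds) on your own interpreter shows the full instruction sequence for that version.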