What is the difference between applying list() on a numpy array vs. calling tolist()?
I was checking the types of both outputs.
Your example already shows the difference; consider the following 2D array:
>>> import numpy as np
>>> a = np.arange(4).reshape(2, 2)
>>> a
array([[0, 1],
       [2, 3]])
>>> a.tolist()
[[0, 1], [2, 3]] # nested vanilla lists
>>> list(a)
[array([0, 1]), array([2, 3])] # list of arrays
tolist handles the full conversion to nested vanilla lists (i.e. list of list of int), whereas list just iterates over the first dimension of the array, creating a list of arrays (list of np.array of np.int64). Although both are lists:
>>> type(list(a))
<type 'list'>
>>> type(a.tolist())
<type 'list'>
the elements of each list have a different type:
>>> type(list(a)[0])
<type 'numpy.ndarray'>
>>> type(a.tolist()[0])
<type 'list'>
The other difference, as you note, is that list will work on any iterable, whereas tolist can only be called on objects that specifically implement that method.
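To make that last point concrete, here is a minimal sketch, assuming only NumPy is installed. The helper name has_tolist is made up for illustration and is not part of NumPy.

```python
import numpy as np

def has_tolist(obj):
    """Return True if obj exposes a callable tolist() method (hypothetical helper)."""
    return callable(getattr(obj, "tolist", None))

a = np.arange(4).reshape(2, 2)

# list() accepts any iterable: ranges, tuples, generators, arrays ...
from_range = list(range(3))
from_array = list(a)  # a list holding the two rows of a, each still an ndarray

# ... whereas tolist() only exists on types that define it.
print(has_tolist(a))         # ndarray defines tolist()
print(has_tolist(range(3)))  # range does not
```

Calling range(3).tolist() directly would raise AttributeError, which is why the sketch checks for the method rather than calling it.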
.tolist() appears to convert all of the values recursively to Python primitives (list), whereas list creates a Python list from an iterable. Since iterating over a multi-dimensional numpy array yields its sub-arrays, list(...) creates a list of arrays.
You can think of list as a function that looks like this:
# Not the actual implementation, just for demo purposes
def list(iterable):
    newlist = []
    for obj in iter(iterable):
        newlist.append(obj)
    return newlist
The major difference is that tolist recursively converts all data to Python standard library types.
For instance:
>>> import numpy
>>> arr = numpy.arange(2)
>>> [type(item) for item in list(arr)]
[numpy.int64, numpy.int64]
>>> [type(item) for item in arr.tolist()]
[builtins.int, builtins.int]
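The recursive conversion can be pictured with a pure-Python stand-in. This is only a sketch for intuition, not how NumPy implements tolist() internally; the function name to_nested_list is hypothetical.

```python
import numpy as np

def to_nested_list(arr):
    """Hypothetical pure-Python stand-in for ndarray.tolist()."""
    arr = np.asarray(arr)
    if arr.ndim == 0:
        # .item() turns a NumPy scalar into the matching built-in Python type
        return arr.item()
    # Iterate over the first axis; each sub-array has one dimension fewer
    return [to_nested_list(sub) for sub in arr]

a = np.arange(4).reshape(2, 2)
nested = to_nested_list(a)

print(nested == a.tolist())   # same nested structure and values
print(type(nested[0][0]))     # built-in int, not a NumPy scalar
print(type(list(a)[0]))       # list() stops after the first axis
```

The contrast with the list sketch above is the recursion: list stops after one level of iteration, while the conversion here keeps descending until it reaches scalars.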
Aside from the functional differences, tolist will generally be quicker, as it knows it has a numpy array and has access to the backing array, whereas list falls back to using an iterator to add all the elements.
In [2]: arr = numpy.arange(1000)
In [3]: %timeit arr.tolist()
10000 loops, best of 3: 33 µs per loop
In [4]: %timeit list(arr)
10000 loops, best of 3: 80.7 µs per loop
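Outside IPython, a comparable measurement can be sketched with the standard library's timeit module. The array size and repeat count below are arbitrary choices, and absolute numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

arr = np.arange(1000)
n_calls = 1000  # arbitrary repeat count for this sketch

# Total time in seconds for n_calls invocations of each conversion
t_tolist = timeit.timeit(arr.tolist, number=n_calls)
t_list = timeit.timeit(lambda: list(arr), number=n_calls)

print(f"arr.tolist(): {t_tolist / n_calls * 1e6:.1f} µs per call")
print(f"list(arr):    {t_list / n_calls * 1e6:.1f} µs per call")
```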
I would expect the tolist to be