Difference between list(numpy_array) and numpy_array.tolist()

误落风尘 2021-01-04 21:07

What is the difference between applying list() on a numpy array vs. calling tolist()?

I was checking the types of both outputs

3 Answers
  • 2021-01-04 21:47

    Your example already shows the difference; consider the following 2D array:

    >>> import numpy as np
    >>> a = np.arange(4).reshape(2, 2)
    >>> a
    array([[0, 1],
           [2, 3]])
    >>> a.tolist()
    [[0, 1], [2, 3]] # nested vanilla lists
    >>> list(a)
    [array([0, 1]), array([2, 3])] # list of arrays
    

    tolist handles the full conversion to nested vanilla lists (i.e. list of list of int), whereas list just iterates over the first dimension of the array, creating a list of arrays (list of np.array of np.int64). Although both are lists:

    >>> type(list(a))
    <type 'list'>
    >>> type(a.tolist())
    <type 'list'>
    

    the elements of each list have a different type:

    >>> type(list(a)[0])
    <type 'numpy.ndarray'>
    >>> type(a.tolist()[0])
    <type 'list'>
    

    The other difference, as you note, is that list will work on any iterable, whereas tolist can only be called on objects that specifically implement that method.
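
    As a small illustration of that last point, the sketch below feeds a few non-array iterables to list and checks which objects expose a tolist method. The variable names are made up for the example.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)

# list() accepts any iterable, not just arrays.
from_tuple = list((1, 2, 3))
from_generator = list(x * x for x in range(3))
from_string = list("ab")

# tolist() is a method of the array object; plain Python
# containers such as tuples do not provide it.
has_tolist_array = hasattr(a, "tolist")
has_tolist_tuple = hasattr((1, 2, 3), "tolist")

print(from_tuple, from_generator, from_string)
print(has_tolist_array, has_tolist_tuple)
```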

  • 2021-01-04 21:47

    .tolist() converts all of the values recursively to Python built-ins (nested lists of Python scalars), whereas list simply creates a Python list from an iterable. Since iterating over a multidimensional numpy array yields its sub-arrays, list(...) creates a list of arrays.

    You can think of list as a function that looks like this:

    # Not the actual implementation, just for demo purposes
    def list(iterable):
        newlist = []
        for obj in iter(iterable):
            newlist.append(obj)
        return newlist
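
    In the same spirit, tolist can be pictured as a recursive function. This is only a rough sketch for demo purposes and not how NumPy implements it; tolist_sketch is a hypothetical helper name.

```python
import numpy as np

def tolist_sketch(arr):
    # Hypothetical helper, NOT NumPy's actual implementation.
    arr = np.asarray(arr)
    if arr.ndim == 0:
        # A zero-dimensional array (or numpy scalar) becomes a Python scalar.
        return arr.item()
    # Otherwise recurse into each sub-array along the first dimension.
    return [tolist_sketch(sub) for sub in arr]

a = np.arange(4).reshape(2, 2)
result = tolist_sketch(a)
print(result)
print(type(result[0]), type(result[0][0]))
```

    Unlike the list sketch, the recursion goes all the way down to the individual elements, so no arrays or numpy scalars are left in the result.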
    
  • 2021-01-04 21:59

    The major difference is that tolist recursively converts all data to Python built-in types.

    For instance:

    >>> import numpy
    >>> arr = numpy.arange(2)
    >>> [type(item) for item in list(arr)]
    [numpy.int64, numpy.int64]
    >>> [type(item) for item in arr.tolist()]
    [builtins.int, builtins.int]
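
    One place where the element type tends to matter is serialization. The sketch below assumes the standard json module as the consumer; it is just one example of code that accepts Python ints but not numpy integer scalars.

```python
import json
import numpy as np

arr = np.arange(2)

# list(arr) still holds numpy integer scalars, which json cannot encode.
try:
    json.dumps(list(arr))
    list_ok = True
except TypeError:
    list_ok = False

# arr.tolist() holds plain Python ints, so encoding works.
encoded = json.dumps(arr.tolist())

print(list_ok, encoded)
```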
    

    Aside from the functional differences, tolist will generally be quicker, since it knows it has a numpy array and has access to the backing array, whereas list falls back to using an iterator to add the elements one at a time.

    In [2]: arr = numpy.arange(1000)
    
    In [3]: %timeit arr.tolist()
    10000 loops, best of 3: 33 µs per loop
    
    In [4]: %timeit list(arr)
    10000 loops, best of 3: 80.7 µs per loop
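
    Outside IPython, a comparable measurement can be sketched with the standard timeit module. The repetition count is an arbitrary choice, and the absolute numbers will vary with the machine and the NumPy version, so none are shown here.

```python
import timeit
import numpy as np

arr = np.arange(1000)

# Total time for 1000 calls of each conversion.
t_tolist = timeit.timeit(arr.tolist, number=1000)
t_list = timeit.timeit(lambda: list(arr), number=1000)

print(f"tolist: {t_tolist:.4f} s, list: {t_list:.4f} s")
```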
    

    I would expect the tolist to be
