Why is a `for` loop over a Python list faster than over a Numpy array?

太阳男子 · 2020-12-02 23:45

So, without telling a really long story: I was working on some code where I was reading in some data from a binary file and then looping over every single point using a `for` loop. Looping over the numpy array this way turned out to be noticeably slower than looping over a plain Python list, which surprised me. Why is that?

2 Answers
  • 2020-12-03 00:23

    We can do a little sleuthing to figure this out:

    >>> import numpy as np
    >>> a = np.arange(32)
    >>> a
    array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
           17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])
    >>> a.data
    <read-write buffer for 0x107d01e40, size 256, offset 0 at 0x107d199b0>
    >>> id(a.data)
    4433424176
    >>> id(a[0])
    4424950096
    >>> id(a[1])
    4424950096
    >>> for item in a:
    ...   print id(item)
    ... 
    4424950096
    4424950120
    4424950096
    4424950120
    ... (the same two ids keep alternating for the remaining elements)
    

    So what is going on here? First, I took a look at the memory location of the array's memory buffer. It's at 4433424176. That in itself isn't too illuminating. However, numpy stores its data as a contiguous C array, so the first element in the numpy array should correspond to the memory address of the array itself, but it doesn't:

    >>> id(a[0])
    4424950096
    

    and it's a good thing it doesn't, because that would break Python's invariant that two objects never share the same id while both are alive.

    So, how does numpy accomplish this? Well, the answer is that numpy has to wrap the returned object in a Python type (e.g. numpy.float64 or numpy.int64 in this case), which takes time if you're iterating item by item¹. Further proof of this is demonstrated when iterating -- we see that we're alternating between 2 separate ids while iterating over the array. This means that Python's memory allocator and garbage collector are working overtime to create new objects and then free them.

    A list doesn't have this memory allocator/garbage collector overhead. The objects in the list already exist as Python objects (and they'll still exist after iteration), so neither the allocator nor the collector plays any role in iterating over a list.
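
    To make the contrast concrete, here is a small sketch of my own (works on Python 2 and 3) showing that iterating a list hands back the objects already stored in it, while every access into an array boxes the element into a fresh numpy scalar:

    import numpy as np

    lst = [10, 20, 30]
    arr = np.array([10, 20, 30])

    # List iteration yields the very objects the list holds: ids are stable.
    print([id(x) for x in lst] == [id(x) for x in lst])  # True

    # Each array access creates a brand-new numpy scalar wrapper.
    print(arr[0] is arr[0])  # False: two separate wrapper objects
    print(type(arr[0]))      # numpy.int64 (numpy.int32 on some platforms)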

    Timing methodology:

    Also note, your timings are thrown off a little bit by your assumptions. You were assuming that k + 1 should take about the same amount of time in both cases, but it doesn't. Notice what happens if I repeat your timings without doing any addition:

    mgilson$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k"
    1000 loops, best of 3: 233 usec per loop
    mgilson$ python -m timeit "for k in range(5000): k"
    10000 loops, best of 3: 114 usec per loop
    

    there's only about a factor of 2 difference. Doing the addition, however, leads to a factor of 5 difference or so:

    mgilson$ python -m timeit "for k in range(5000): k+1"
    10000 loops, best of 3: 179 usec per loop
    mgilson$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k+1"
    1000 loops, best of 3: 786 usec per loop
    

    For fun, let's just time the addition by itself:

    $ python -m timeit -s "v = 1" "v + 1"
    10000000 loops, best of 3: 0.0261 usec per loop
    mgilson$ python -m timeit -s "import numpy; v = numpy.int64(1)" "v + 1"
    10000000 loops, best of 3: 0.121 usec per loop
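
    One way to see that the overhead lives in numpy's scalar wrapper type, and not in the value itself, is to unbox the scalar to a plain int before timing. A sketch (mine, not from the original timings; it uses the globals argument of timeit.timeit, which needs Python 3.5+):

    import timeit
    import numpy

    v_np = numpy.int64(1)
    v_py = int(v_np)  # unbox to a plain Python int once, up front

    # The numpy scalar pays wrapper-type dispatch on every addition;
    # the plain int does not.
    print(timeit.timeit("v + 1", globals={"v": v_np}, number=1000000))
    print(timeit.timeit("v + 1", globals={"v": v_py}, number=1000000))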
    

    And finally, your timeit also includes list/array construction time, which isn't ideal:

    mgilson$ python -m timeit -s "v = range(5000)" "for k in v: k"
    10000 loops, best of 3: 80.2 usec per loop
    mgilson$ python -m timeit -s "import numpy; v = numpy.arange(5000)" "for k in v: k"
    1000 loops, best of 3: 237 usec per loop
    

    Notice that numpy actually got even further away from the list solution in this case. This shows that the iteration itself really is slower, and you might get some speedups by converting the numpy values to standard Python types first, as in the sketch below.
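
    For example (a sketch of that conversion; tolist() produces plain Python ints, and the absolute numbers will vary by machine):

    import timeit
    import numpy

    a = numpy.arange(5000)

    # Loop over boxed numpy scalars vs. plain Python ints from tolist().
    print(timeit.timeit(lambda: [k + 1 for k in a], number=100))
    print(timeit.timeit(lambda: [k + 1 for k in a.tolist()], number=100))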

    ¹ Note, this doesn't take a lot of time when slicing, because slicing only has to allocate O(1) new objects -- numpy returns a view into the original array.
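
    A quick illustration of that footnote (my sketch, not part of the original answer): a slice shares the parent array's buffer instead of boxing each element:

    import numpy as np

    a = np.arange(32)
    b = a[4:8]          # O(1): a view onto a's buffer, no per-element boxing
    b[0] = -1
    print(a[4])         # -1 -- writing through the view changes the original
    print(b.base is a)  # True: b borrows a's memory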

  • 2020-12-03 00:34

    Using Python 2.7

    Here are my timings, with xrange included as well:

    python -m timeit -s "import numpy" "for k in numpy.arange(5000): k+1"
    

    1000 loops, best of 3: 1.22 msec per loop

    python -m timeit "for k in range(5000): k+1"
    

    10000 loops, best of 3: 186 usec per loop

    python -m timeit "for k in xrange(5000): k+1"
    

    10000 loops, best of 3: 161 usec per loop


    Numpy is noticeably slower here because you're iterating over the array element by element, which is not its primary intended use. In many cases, numpy arrays should be treated as a monolithic collection of numbers rather than as simple lists/iterables. For example, if we have a rather large-ish Python list of numbers that we want to raise to the third power, we might do something like this:

    python -m timeit "lst1 = [x for x in range(100000)];" "lst2 = map(lambda x: x**3, lst1)"
    

    10 loops, best of 3: 125 msec per loop

    Note: lst1 represents an arbitrary list. I'm aware you could speed this up by computing x**3 for x in range directly, but this is dealing with a list that is assumed to already exist and may very well not be sequential.

    Anyway, a numpy array is meant to be operated on as a whole:

    python -m timeit -s "import numpy" "lst1 = numpy.arange(100000)" "lst2 = lst1**3"
    

    10000 loops, best of 3: 120 usec per loop

    Say you had two lists of arbitrary values that you want to multiply together element-wise. In vanilla Python, you might do:

    python -m timeit -s "lst1 = [x for x in xrange(0, 10000, 2)]" "lst2 = [x for x in xrange(2, 10002, 2)]" "lst3 = [x*y for x,y in zip(lst1, lst2)]"
    

    1000 loops, best of 3: 736 usec per loop

    And in Numpy:

    python -m timeit -s "import numpy" "lst1 = numpy.arange(0, 10000, 2)" "lst2 = numpy.arange(2, 10002, 2)" "lst3 = lst1*lst2"
    

    100000 loops, best of 3: 10.9 usec per loop
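
    The same comparison as one self-contained script, for anyone who wants to rerun it (a sketch using the timeit module; absolute numbers will differ by machine):

    import timeit
    import numpy

    lst1 = list(range(0, 10000, 2))
    lst2 = list(range(2, 10002, 2))
    arr1 = numpy.arange(0, 10000, 2)
    arr2 = numpy.arange(2, 10002, 2)

    # Element-wise multiply: Python-level loop vs. one vectorized C loop.
    print(timeit.timeit(lambda: [x * y for x, y in zip(lst1, lst2)], number=1000))
    print(timeit.timeit(lambda: arr1 * arr2, number=1000))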

    In these last two examples, NumPy skyrockets ahead as the clear winner. For simple iteration over a list, range or xrange is perfectly sufficient, but your example does not take into account the true purpose of numpy arrays. It's like comparing planes and cars: yes, planes are generally faster for what they're intended to do, but trying to fly to your local supermarket would not be prudent.
