How do numpy's in-place operations (e.g. `+=`) work?

一向 2020-12-01 14:47

The basic question is: What happens under the hood when doing: a[i] += b?

Given the following:

import numpy as np
a = np.arange(4)
i = a         


        
4 Answers
  • 2020-12-01 15:09

    As lvc explains, there is no in-place item add method, so under the hood it uses __getitem__, then __iadd__, then __setitem__. Here's a way to empirically observe that behavior:

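    import numpy

    class A(numpy.ndarray):
        """ndarray subclass that logs the special methods Python calls."""
        def __getitem__(self, *args, **kwargs):
            print("getitem")
            return numpy.ndarray.__getitem__(self, *args, **kwargs)
        def __setitem__(self, *args, **kwargs):
            print("setitem")
            return numpy.ndarray.__setitem__(self, *args, **kwargs)
        def __iadd__(self, *args, **kwargs):
            print("iadd")
            return numpy.ndarray.__iadd__(self, *args, **kwargs)

    # Note: A([1,2,3]) is read as a shape, so this allocates an uninitialized
    # array of shape (1, 2, 3); a[0] is then a sub-array view of type A,
    # which is why the logged __iadd__ is the one that runs.
    a = A([1,2,3])
    print("about to increment a[0]")
    a[0] += 1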
    

    It prints

    about to increment a[0]
    getitem
    iadd
    setitem
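
    One practical consequence of that three-step sequence shows up with list (fancy) indices, where __getitem__ returns a copy rather than a view. The sketch below spells the steps out by hand; the array contents and the name idx are only illustrative assumptions, not part of the original question.

```python
import numpy as np

a = np.zeros(3, dtype=int)
idx = [0, 0, 1]            # index 0 appears twice

# Hand-written equivalent of: a[idx] += 1
tmp = a.__getitem__(idx)   # fancy indexing returns a copy: [0, 0, 0]
tmp = tmp.__iadd__(1)      # the in-place add happens on that copy: [1, 1, 1]
a.__setitem__(idx, tmp)    # write back; a[0] is assigned 1 twice

print(a)                   # [1 1 0], not [2 1 0]
```

    The repeated index is incremented only once, because the addition ran on the temporary copy and not on a itself.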
    
  • 2020-12-01 15:20

    I don't know what's going on under the hood, but in-place operations on items in NumPy arrays and in Python lists will return the same reference, which IMO can lead to confusing results when passed into a function.

    Start with Python

    >>> a = [1, 2, 3]
    >>> b = a
    >>> a is b
    True
    >>> id(a[2])
    12345
    >>> id(b[2])
    12345
    

    ... where 12345 is a unique id for the location of the value at a[2] in memory, which is the same as b[2].

    So a and b refer to the same list in memory. Now try in-place addition on an item in the list.

    >>> a[2] += 4
    >>> a
    [1, 2, 7]
    >>> b
    [1, 2, 7]
    >>> a is b
    True
    >>> id(a[2])
    67890
    >>> id(b[2])
    67890
    

    So in-place addition on the list item only changed the item at index 2. a and b still reference the same list, but its third slot was rebound to a new object, 7 (hence the new id). The same rebinding explains why, if a = 4 and b = a were integers (or floats) instead of lists, a += 1 would rebind a to a new object, leaving a and b as different references. However, for list addition, e.g. a += [5] with a and b referencing the same list, a is not rebound; the list is extended in place, so both names see the appended item.
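
    A minimal sketch of that contrast between immutable integers and mutable lists (the variable names here are just for illustration):

```python
# Integers are immutable: += computes a new object and rebinds the name.
a = 4
b = a
a += 1
int_result = (a, b)      # a is now 5, b is still 4

# Lists are mutable: += extends the existing list object in place.
x = [1, 2, 3]
y = x
x += [5]
same_list = x is y       # still the same object, so y sees the new item

print(int_result, same_list, y)
```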

    Now for NumPy

    >>> import numpy as np
    >>> a = np.array([1, 2, 3], float)
    >>> b = a
    >>> a is b
    True
    

    Again these are the same reference, and in-place operators seem to have the same effect as for lists in Python:

    >>> a += 4
    >>> a
    array([ 5.,  6.,  7.])
    >>> b
    array([ 5.,  6.,  7.])
    

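    In-place addition on an ndarray modifies the existing array, so every name bound to it sees the change. This is not the same as a = a + 4 (which is what numpy.add does without an out argument): that creates a new array and rebinds a to it, leaving b untouched.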

    >>> a = a + 4
    >>> a
    array([  9.,  10.,  11.])
    >>> b
    array([ 5.,  6.,  7.])
    

    In-place operations on borrowed references

    I think the danger here is if the reference is passed to a different scope.

    >>> def f(x):
    ...     x += 4
    ...     return x
    

    The reference bound to x is passed into f without making a copy, so x += 4 modifies the caller's array in place and then returns that same array.

    >>> f(a)
    array([ 13.,  14.,  15.])
    >>> f(a)
    array([ 17.,  18.,  19.])
    >>> f(a)
    array([ 21.,  22.,  23.])
    >>> f(a)
    array([ 25.,  26.,  27.])
    

    The same would be true for a Python list as well:

    >>> def f(x, y):
    ...     x += [y]
    
    >>> a = [1, 2, 3]
    >>> b = a
    >>> f(a, 5)
    >>> a
    [1, 2, 3, 5]
    >>> b
    [1, 2, 3, 5]
    

    IMO this can be confusing and sometimes difficult to debug, so I try to only use in-place operators on references that belong to the current scope, and I try to be careful with borrowed references.
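
    As a rough sketch of that habit, here are two hypothetical helpers (the names add_four_inplace and add_four_copy are made up for this example): one mutates the caller's array, the other returns a new array and leaves the argument alone.

```python
import numpy as np

def add_four_inplace(x):
    x += 4            # mutates the array the caller passed in
    return x

def add_four_copy(x):
    return x + 4      # builds a new array; the caller's array is untouched

a = np.array([1.0, 2.0, 3.0])

r1 = add_four_copy(a)
unchanged = a.tolist()       # a still holds its original values here

r2 = add_four_inplace(a)     # now a itself has been modified
print(unchanged, a, r2 is a)
```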

  • 2020-12-01 15:26

    Actually that has nothing to do with numpy. There is no "set/getitem in-place" in Python; these things are equivalent to a[indices] = a[indices] + x. Knowing that, it becomes pretty obvious what is going on. (EDIT: As lvc writes, the right-hand side is actually computed in place, so it is a[indices] = (a[indices] += x) if that were legal syntax; that has largely the same effect, though.)

    Of course a += x actually is in-place, by mapping a to the np.add out argument.
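
    To illustrate the out argument, here is a small sketch (the array values are arbitrary): passing the array itself as out writes the result into the existing buffer, which is roughly what a += x amounts to, while omitting out allocates a new array.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a                    # second name for the same array

np.add(a, 4, out=a)      # result is written into a's own buffer
c = np.add(a, 4)         # no out argument: a new array is allocated

print(b, c, c is a)
```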

    It has been discussed before, and numpy cannot do anything about it as such, though there is an idea to have np.add.at(array, index_expression, x) to at least allow such operations.
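
    Current NumPy releases do ship this as the ufunc method at. A small sketch comparing it with the indexed += form, assuming a made-up index list with a repeated entry:

```python
import numpy as np

idx = [0, 0, 1]                   # index 0 appears twice

buffered = np.zeros(3, dtype=int)
buffered[idx] += 1                # getitem / iadd / setitem: a[0] ends up 1

unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, idx, 1)     # applied once per index entry: a[0] ends up 2

print(buffered, unbuffered)
```

    np.add.at modifies its first argument in place and returns None, so it is called as a statement rather than assigned.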

  • 2020-12-01 15:29

    The first thing you need to realise is that a += x doesn't map exactly to a.__iadd__(x); instead, it maps to a = a.__iadd__(x). Notice that the documentation specifically says that in-place operators return their result, and this doesn't have to be self (although in practice, it usually is). This means a[i] += x trivially maps to:

    a.__setitem__(i, a.__getitem__(i).__iadd__(x))
    

    So, the addition technically happens in-place, but only on a temporary object. There is still potentially one less temporary object created than if it called __add__, though.
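
    One way to see all three steps in plain Python, without numpy, is a list stored inside a tuple. This is only an illustrative sketch: the in-place add on the list succeeds, and the final __setitem__ on the tuple is what fails.

```python
t = ([1, 2],)            # a tuple holding one mutable list
inner = t[0]
error_message = None

try:
    t[0] += [3]          # getitem works, list.__iadd__ works, tuple setitem fails
except TypeError as exc:
    error_message = str(exc)

# The list was already extended in place before the assignment was rejected.
print(inner, error_message)
```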
