The basic question is: what happens under the hood when doing a[i] += b?
Given the following:
import numpy as np
a = np.arange(4)
i = a
As lvc explains, there is no in-place item add method, so under the hood Python calls __getitem__, then __iadd__, then __setitem__. Here's a way to observe that behavior empirically:
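import numpy

class A(numpy.ndarray):
    """ndarray subclass that reports which special methods are called."""
    def __getitem__(self, *args, **kwargs):
        print("getitem")
        return numpy.ndarray.__getitem__(self, *args, **kwargs)
    def __setitem__(self, *args, **kwargs):
        print("setitem")
        return numpy.ndarray.__setitem__(self, *args, **kwargs)
    def __iadd__(self, *args, **kwargs):
        print("iadd")
        return numpy.ndarray.__iadd__(self, *args, **kwargs)

# Note: A([1,2,3]) would create an uninitialized array of *shape* (1, 2, 3).
# A 2-D array is used so that a[0] is itself an A (a row view); indexing a
# 1-D array returns a NumPy scalar, which has no A.__iadd__ to trace.
a = numpy.zeros((2, 3)).view(A)
print("about to increment a[0]")
a[0] += 1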
It prints
about to increment a[0]
getitem
iadd
setitem
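The same sequence can be observed without NumPy at all, which suggests this is Python's augmented-assignment protocol rather than anything NumPy-specific. A minimal sketch in Python 3; the classes Cell and TracedList are hypothetical, written only for this demonstration:

```python
calls = []  # records the order in which the special methods run

class Cell:
    """Hypothetical mutable box whose in-place addition is logged."""
    def __init__(self, value):
        self.value = value

    def __iadd__(self, other):
        calls.append("iadd")
        self.value += other
        return self  # the returned object is what gets stored back

class TracedList(list):
    """Hypothetical list subclass that logs item access and assignment."""
    def __getitem__(self, index):
        calls.append("getitem")
        return super().__getitem__(index)

    def __setitem__(self, index, value):
        calls.append("setitem")
        super().__setitem__(index, value)

first = Cell(1)
items = TracedList([first, Cell(2)])
items[0] += 10
print(calls)        # ['getitem', 'iadd', 'setitem']
print(first.value)  # 11
```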
I don't know what's going on under the hood, but in-place operations on NumPy arrays and on Python lists modify the object itself, so every reference to that object sees the change, which IMO can lead to confusing results when the object is passed into a function.
>>> a = [1, 2, 3]
>>> b = a
>>> a is b
True
>>> id(a[2])
12345
>>> id(b[2])
12345
... where 12345 is a unique id for the location of the value at a[2] in memory, which is the same as b[2].
So a and b refer to the same list in memory. Now try in-place addition on an item in the list.
>>> a[2] += 4
>>> a
[1, 2, 7]
>>> b
[1, 2, 7]
>>> a is b
True
>>> id(a[2])
67890
>>> id(b[2])
67890
So in-place addition on an item of the list only rebound the slot at index 2 to a new value, 7 (a new int object, hence the new id), while a and b still reference the same list. The reassignment explains why, if a = 4 and b = a were integers (or floats) instead of lists, a += 1 would cause a to be reassigned, and then b and a would be different references. However, if list addition is called, e.g. a += [5] with a and b referencing the same list, a is rebound to the very same list object (list.__iadd__ extends the list in place and returns it), so the appended item shows up through both a and b.
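Rather than reading raw id values (the 12345 and 67890 above look like illustrative stand-ins; real ids vary from run to run), the same points can be checked with the is operator. A minimal sketch covering the list item, the int, and the list += cases:

```python
# Item identity: a and b are two names for one list
a = [1, 2, 3]
b = a
assert a is b and a[2] is b[2]

old_item = a[2]
a[2] += 4              # ints have no __iadd__, so a[2] + 4 is stored back in slot 2
assert a is b and b == [1, 2, 7]
assert a[2] is b[2] and a[2] is not old_item   # same slot, new int object

# Immutable int: += rebinds the name, leaving b alone
a = 4
b = a
a += 1
assert (a, b) == (5, 4)

# Mutable list: list.__iadd__ extends in place and returns the same list
a = [1, 2, 3]
b = a
a += [5]
assert a is b and b == [1, 2, 3, 5]

# Plain + builds a new list, so only a sees the result
a = a + [6]
assert a is not b and b == [1, 2, 3, 5]
```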
>>> import numpy as np
>>> a = np.array([1, 2, 3], float)
>>> b = a
>>> a is b
True
Again these are the same reference, and in-place operators seem to have the same effect as for a list in Python:
>>> a += 4
>>> a
array([ 5., 6., 7.])
>>> b
array([ 5., 6., 7.])
In-place addition on an ndarray modifies the array the reference points to. This is not the same as a = a + 4, which calls numpy.add without an out argument, creates a new array, and rebinds a to it.
>>> a = a + 4
>>> a
array([ 9., 10., 11.])
>>> b
array([ 5., 6., 7.])
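A small check of the difference, assuming a += 4 on a whole array behaves like np.add(a, 4, out=a) (which is how the answers further down describe it): the in-place forms keep a and b as one object, while a = a + 4 produces an array that shares no memory with b.

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = a
a += 4                       # in-place: writes into the buffer a and b share
assert a is b
assert np.array_equal(b, [5., 6., 7.])

r = np.add(a, 4, out=a)      # the explicit form of a += 4
assert r is a and r is b     # the out array itself is returned
assert np.array_equal(b, [9., 10., 11.])

a = a + 4                    # np.add(a, 4) with no out: new array, a is rebound
assert a is not b
assert not np.shares_memory(a, b)
assert np.array_equal(a, [13., 14., 15.])
assert np.array_equal(b, [9., 10., 11.])   # b keeps the old array
```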
I think the danger here is if the reference is passed to a different scope.
>>> def f(x):
... x += 4
... return x
The argument x is passed into the scope of f as a reference to the same array; f does not make a copy, so x += 4 changes the array in place and returns that same array.
>>> f(a)
array([ 13., 14., 15.])
>>> f(a)
array([ 17., 18., 19.])
>>> f(a)
array([ 21., 22., 23.])
>>> f(a)
array([ 25., 26., 27.])
The same would be true for a Python list as well:
>>> def f(x, y):
... x += [y]
>>> a = [1, 2, 3]
>>> b = a
>>> f(a, 5)
>>> a
[1, 2, 3, 5]
>>> b
[1, 2, 3, 5]
IMO this can be confusing and sometimes difficult to debug, so I try to use in-place operators only on references that belong to the current scope, and I try to be careful with borrowed references.
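One way to follow that rule is to use the out-of-place form, which returns a new object, whenever a function works on a borrowed reference. A minimal sketch; the function names add_four_inplace, add_four_copy and append_copy are hypothetical:

```python
import numpy as np

def add_four_inplace(x):
    x += 4          # modifies whatever array the caller passed in
    return x

def add_four_copy(x):
    return x + 4    # allocates a new array; the caller's array is untouched

def append_copy(x, y):
    return x + [y]  # new list, instead of x += [y] on the caller's list

a = np.array([9., 10., 11.])
r = add_four_copy(a)
assert np.array_equal(a, [9., 10., 11.]) and np.array_equal(r, [13., 14., 15.])

r = add_four_inplace(a)
assert r is a and np.array_equal(a, [13., 14., 15.])

lst = [1, 2, 3]
out = append_copy(lst, 5)
assert lst == [1, 2, 3] and out == [1, 2, 3, 5]
```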
Actually, that has nothing to do with NumPy. There is no "set/getitem in-place" in Python; these things are equivalent to a[indices] = a[indices] + x. Knowing that, it becomes pretty obvious what is going on. (EDIT: As lvc writes, the right-hand side actually is in place, so it is a[indices] = (a[indices] += x) if that were legal syntax; that has largely the same effect, though.)
Of course, a += x on a whole array really is in-place: it maps to np.add with a passed as the out argument, i.e. np.add(a, x, out=a).
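This has been discussed before, and NumPy cannot do anything about it as such, though there is an idea to add np.add.at(array, index_expression, x) to at least allow such operations. (That idea has since been implemented: ufunc.at, including np.add.at, is available from NumPy 1.8 onward.)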
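A small comparison, assuming NumPy 1.8 or later (where ufunc.at exists): with a repeated index, a[i] += 1 behaves like a[i] = a[i] + 1, so each position is written once, while np.add.at applies the addition for every occurrence of the index.

```python
import numpy as np

i = np.array([0, 0, 1])        # index 0 appears twice

a = np.zeros(3)
a[i] += 1                      # one gather, one add, one scatter
print(a)                       # [1. 1. 0.]

b = np.zeros(3)
b[i] = b[i] + 1                # the spelled-out form gives the same result
assert np.array_equal(a, b)

c = np.zeros(3)
np.add.at(c, i, 1)             # unbuffered: every occurrence of an index counts
print(c)                       # [2. 1. 0.]
```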
The first thing you need to realise is that a += x doesn't map exactly to a.__iadd__(x); instead it maps to a = a.__iadd__(x). Notice that the documentation specifically says that in-place operators return their result, and this doesn't have to be self (although in practice, it usually is). This means a[i] += x trivially maps to:
a.__setitem__(i, a.__getitem__(i).__iadd__(x))
So the addition technically happens in-place, but only on a temporary object. There is still potentially one fewer temporary object created than if it called __add__, though.
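The expansion can be run by hand. A sketch, assuming current NumPy indexing rules: with a fancy index (a list of positions) the temporary returned by __getitem__ is a copy, so only the final __setitem__ changes a; with a basic slice the temporary is a view of a, so the __iadd__ step already writes into a and the __setitem__ merely stores the same values back.

```python
import numpy as np

x = 10

# Fancy index: __getitem__ returns a copy, so __iadd__ works on a temporary
a = np.arange(5)
i = [1, 3]
tmp = a.__getitem__(i)                 # new array holding a[1] and a[3]
assert not np.shares_memory(tmp, a)
a.__setitem__(i, tmp.__iadd__(x))      # the expansion of a[i] += x
assert np.array_equal(a, [0, 11, 2, 13, 4])

# Same result via the operator
b = np.arange(5)
b[i] += x
assert np.array_equal(a, b)

# Basic slice: __getitem__ returns a view, so __iadd__ already writes into s
s = np.arange(5)
view = s.__getitem__(slice(1, 3))
assert np.shares_memory(view, s)
view.__iadd__(x)                       # s changes before any __setitem__ call
assert np.array_equal(s, [0, 11, 12, 3, 4])
```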