How do numpy's in-place operations (e.g. `+=`) work?

The basic question is: What happens under the hood when doing: a[i] += b?

Given the following:

import numpy as np
a = np.arange(4)
i = a


                      
4 answers

梦毁少年i
2020-12-01 15:09

As lvc explains, there is no in-place item-add method, so under the hood it uses __getitem__, then __iadd__, then __setitem__. Here's a way to observe that behavior empirically:

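import numpy

class A(numpy.ndarray):
    # Log each special method, then defer to the ndarray implementation.
    def __getitem__(self, *args, **kwargs):
        print("getitem")
        return numpy.ndarray.__getitem__(self, *args, **kwargs)
    def __setitem__(self, *args, **kwargs):
        print("setitem")
        return numpy.ndarray.__setitem__(self, *args, **kwargs)
    def __iadd__(self, *args, **kwargs):
        print("iadd")
        return numpy.ndarray.__iadd__(self, *args, **kwargs)

# Note: this calls the ndarray constructor, so [1,2,3] is a shape, not data.
# a is an uninitialized array of shape (1, 2, 3), and a[0] is a sub-array of type A.
a = A([1,2,3])
print("about to increment a[0]")
a[0] += 1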


It prints

about to increment a[0]
getitem
iadd
setitem
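
The same three steps can also be written out by hand on a plain ndarray. This is only a sketch of the equivalent Python-level calls, not NumPy's internal code path; the names idx, tmp and is_copy are illustrative and do not come from the question. It also shows that with an integer-array index the temporary is a copy, so the in-place add only reaches a through the final __setitem__.

```python
import numpy as np

a = np.arange(4)
idx = [1, 3]                      # hypothetical index array

# Hand-written equivalent of: a[idx] += 10
tmp = a.__getitem__(idx)          # integer-array indexing returns a copy
is_copy = not np.shares_memory(tmp, a)
tmp = tmp.__iadd__(10)            # in-place add on that temporary
a.__setitem__(idx, tmp)           # write the temporary back into a

b = np.arange(4)
b[idx] += 10                      # the augmented assignment itself

print(is_copy)      # True
print(a.tolist())   # [0, 11, 2, 13]
print(b.tolist())   # [0, 11, 2, 13]
```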

                                                                        
                                                        
            
            
              
                

无人及你
2020-12-01 15:20

I don't know what's going on under the hood, but in-place operations on NumPy arrays and on Python lists keep the same reference instead of creating a new object, which IMO can lead to confusing results when the object is passed into a function.

Start with Python

>>> a = [1, 2, 3]
>>> b = a
>>> a is b
True
>>> id(a[2])
12345
>>> id(b[2])
12345


... where 12345 stands for the id of the object stored at a[2] (in CPython, its memory address), which is the same object as b[2].

So a and b refer to the same list in memory. Now try in-place addition on an item in the list.

>>> a[2] += 4
>>> a
[1, 2, 7]
>>> b
[1, 2, 7]
>>> a is b
True
>>> id(a[2])
67890
>>> id(b[2])
67890


So in-place addition on the item only changed the value at index 2: a and b still reference the same list, but the third slot of that list was reassigned to a new object, 7. The same reassignment explains why, if a and b were integers (or floats) instead of lists, e.g. a = 4 and b = a, then a += 1 would rebind a to a new object, and b and a would then be different references. However, if in-place addition is applied to the list itself, e.g. a += [5] with a and b referencing the same list, a is not reassigned; the list is extended in place, so both names see the appended item.
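
A short sketch of the three cases described above, using only built-in types (the variable names x and y are illustrative):

```python
# Integers are immutable: += rebinds the name to a new object.
a = 4
b = a
a += 1
print(a, b)        # 5 4

# Lists are mutable: += extends the existing list in place.
x = [1, 2, 3]
y = x
x += [5]
print(x is y, y)   # True [1, 2, 3, 5]

# Item-level +=: the list is the same object, but slot 2 is rebound.
x[2] += 4
print(x is y, y)   # True [1, 2, 7, 5]
```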

Now for NumPy

>>> import numpy as np
>>> a = np.array([1, 2, 3], float)
>>> b = a
>>> a is b
True


Again these are the same reference, and in-place operators seem to have the same effect as for lists in Python:

>>> a += 4
>>> a
array([ 5.,  6.,  7.])
>>> b
array([ 5.,  6.,  7.])


In-place addition on an ndarray modifies the array that both names refer to. This is not the same as a = a + 4, which calls numpy.add without an out argument, creates a new array, and rebinds a to it:

>>> a = a + 4
>>> a
array([  9.,  10.,  11.])
>>> b
array([ 5.,  6.,  7.])
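
One way to check this without relying on printed values is to compare identity and memory sharing after each step. This is a minimal sketch; np.shares_memory is an existing NumPy function, and the flag names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = a

a += 4                                   # modifies the existing buffer
same_after_iadd = a is b
shared_after_iadd = np.shares_memory(a, b)

a = a + 4                                # builds a new array and rebinds a
same_after_add = a is b
shared_after_add = np.shares_memory(a, b)

print(same_after_iadd, shared_after_iadd)   # True True
print(same_after_add, shared_after_add)     # False False
print(a.tolist(), b.tolist())               # [9.0, 10.0, 11.0] [5.0, 6.0, 7.0]
```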


In-place operations on borrowed references

I think the danger here is if the reference is passed to a different scope.

>>> def f(x):
...     x += 4
...     return x


The reference bound to a is passed into the scope of f as x. No copy is made, so f changes the caller's array in place and returns that same array:

>>> f(a)
array([ 13.,  14.,  15.])
>>> f(a)
array([ 17.,  18.,  19.])
>>> f(a)
array([ 21.,  22.,  23.])
>>> f(a)
array([ 25.,  26.,  27.])


The same would be true for a Python list as well:

>>> def f(x, y):
...     x += [y]

>>> a = [1, 2, 3]
>>> b = a
>>> f(a, 5)
>>> a
[1, 2, 3, 5]
>>> b
[1, 2, 3, 5]


IMO this can be confusing and sometimes difficult to debug, so I try to use in-place operators only on references that belong to the current scope, and I try to be careful with borrowed references.
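
As a sketch of that habit, the two hypothetical functions below differ only in whether they mutate the argument. The function names are made up for illustration.

```python
import numpy as np

def add_four_inplace(x):
    x += 4          # mutates the caller's array
    return x

def add_four_copy(x):
    return x + 4    # new array; the caller's array is untouched

a = np.array([1, 2, 3], float)

r1 = add_four_copy(a)
print(a.tolist(), r1.tolist())   # [1.0, 2.0, 3.0] [5.0, 6.0, 7.0]

r2 = add_four_inplace(a)
print(a.tolist(), r2 is a)       # [5.0, 6.0, 7.0] True
```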
                                                                        
                                                        
            
            
              
                

清酒与你
2020-12-01 15:26

Actually that has nothing to do with NumPy. There is no "set/getitem in-place" in Python; these things are equivalent to a[indices] = a[indices] + x. Knowing that, it becomes pretty obvious what is going on. (EDIT: As lvc writes, the right-hand side actually is in-place, so it is a[indices] = (a[indices] += x), if that were legal syntax. That has largely the same effect, though.)

Of course a += x actually is in-place, by mapping a to the np.add out argument.

It has been discussed before, and NumPy cannot do anything about it as such, though there is an idea to have an np.add.at(array, index_expression, x) to at least allow such operations.
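
Current NumPy releases do provide ufunc.at, so both points can be sketched as follows. The arrays and the repeated index list are made up for illustration.

```python
import numpy as np

a = np.zeros(3)
np.add(a, 1, out=a)        # roughly what a += 1 does: the result is written into a
print(a.tolist())          # [1.0, 1.0, 1.0]

idx = [0, 0, 1]            # index 0 appears twice

b = np.zeros(3)
b[idx] += 1                # getitem / iadd / setitem: index 0 is incremented once
print(b.tolist())          # [1.0, 1.0, 0.0]

c = np.zeros(3)
np.add.at(c, idx, 1)       # unbuffered: every occurrence of an index is accumulated
print(c.tolist())          # [2.0, 1.0, 0.0]
```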
                                                                        
                                                        
            
            
              
                

执笔经年
2020-12-01 15:29

The first thing you need to realise is that a += x doesn't map exactly to a.__iadd__(x); instead, it maps to a = a.__iadd__(x). Notice that the documentation specifically says that in-place operators return their result, and this doesn't have to be self (although in practice it usually is). This means a[i] += x trivially maps to:

a.__setitem__(i, a.__getitem__(i).__iadd__(x))


So, the addition technically happens in-place, but only on a temporary object. There is still potentially one less temporary object created than if it called __add__, though.
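
The rebinding is easiest to see with a toy class whose __iadd__ deliberately returns a new object. Tracker is a hypothetical class written for this illustration, not something from NumPy.

```python
class Tracker:
    """Toy class whose __iadd__ returns a new object instead of self."""
    def __init__(self, value):
        self.value = value

    def __iadd__(self, other):
        return Tracker(self.value + other)   # deliberately not self

a = Tracker(1)
b = a
a += 1                 # equivalent to: a = a.__iadd__(1)
print(a.value, b.value, a is b)   # 2 1 False

# The same rule inside a container: the slot is reassigned via __setitem__.
box = [b]
box[0] += 5            # box.__setitem__(0, box.__getitem__(0).__iadd__(5))
print(box[0].value, b.value)      # 6 1
```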
                                                                        
                                                        
            
            
              
                