Question
I have a class that returns large NumPy arrays. These arrays are cached within the class. I would like the returned arrays to be copy-on-write arrays. If the caller ends up just reading from the array, no copy is ever made, so no extra memory is used. If the caller does write to it, the returned array behaves as modifiable, but the write does not touch the internal cached array.
My solution at the moment is to make any cached arrays read-only (a.flags.writeable = False). This means that the caller of the function may have to make their own copy of the array if they want to modify it. Of course, if the array did not come from the cache and was already writable, they would duplicate the data unnecessarily.
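As a minimal sketch of the read-only approach described above: the ArrayCache class, its get method, and the np.arange stand-in for the expensive computation are hypothetical names used only for illustration, not part of the original code.

```python
import numpy as np


class ArrayCache:
    """Hypothetical cache that hands out read-only arrays."""

    def __init__(self):
        self._store = {}

    def get(self, key, size=5):
        if key not in self._store:
            # Stand-in for an expensive computation (assumption).
            arr = np.arange(size, dtype=float)
            arr.flags.writeable = False  # protect the cached data
            self._store[key] = arr
        return self._store[key]


cache = ArrayCache()
a = cache.get("x")

# Writing to the cached array is rejected by NumPy.
try:
    a[0] = 99.0
    write_failed = False
except ValueError:
    write_failed = True

# A caller who wants to modify the data makes an explicit copy.
b = a.copy()
b[0] = 99.0
```

The cost mentioned in the question is visible here: the caller cannot tell whether the copy was actually necessary, so they copy every time they want to write.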
So, optimally, I would love something like a.view(flag=copy_on_write). There seems to be a flag for the reverse of this, UPDATEIFCOPY, which causes a copy to update the original once it is deallocated.
Thanks!
Answer 1:
Copy-on-write is a nice concept, but explicit copying seems to be "the NumPy philosophy". So personally I would keep the "readonly" solution if it isn't too clumsy.
But I admit having written my own copy-on-write wrapper class. I don't try to detect write access to the array. Instead the class has a method "get_array(readonly)" returning its (otherwise private) numpy array. The first time you call it with "readonly=False" it makes a copy. This is very explicit, easy to read and quickly understood.
If your copy-on-write numpy array looks like a classical numpy array, the reader of your code (possibly you in 2 years) may have a hard time.
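The wrapper described in this answer might look roughly like the sketch below. Only the get_array(readonly) method name comes from the answer; the class name CowArray, the _owned flag, and the choice to return a read-only view for readonly=True are assumptions made for illustration.

```python
import numpy as np


class CowArray:
    """Hypothetical explicit copy-on-write holder around a shared array."""

    def __init__(self, array):
        self._array = array
        self._owned = False  # becomes True once we hold a private copy

    def get_array(self, readonly=True):
        if readonly:
            # Hand out a view that cannot be written through (assumption).
            view = self._array.view()
            view.flags.writeable = False
            return view
        if not self._owned:
            # First write request: detach from the shared data.
            self._array = self._array.copy()
            self._owned = True
        return self._array


cached = np.arange(4.0)
w = CowArray(cached)

r = w.get_array(readonly=True)
shares_before = np.shares_memory(r, cached)  # still the cached buffer

m = w.get_array(readonly=False)  # the copy happens here, once
m[0] = 100.0
```

The copy is triggered by the caller stating their intent, not by intercepting writes, which keeps the behavior visible at the call site.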
Answer 2:
To implement copy-on-write, we need to modify the base, data, and strides of the ndarray object. I think this can't be done in pure Python code, so I use some Cython code to modify these attributes.
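To make the three attributes concrete, here is a small illustrative sketch (not part of the original answer) that inspects base and strides for a view and for a copy, assuming a float64 array:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])  # float64, owns its data
v = a[::2]                          # strided view, no data copied
c = v.copy()                        # independent, contiguous copy

view_base_is_a = v.base is a        # the view keeps a reference to its owner
copy_has_no_base = c.base is None   # the copy owns its own buffer

a_strides = a.strides               # 8 bytes between consecutive elements
v_strides = v.strides               # 16 bytes: every second element of a
c_strides = c.strides               # 8 bytes again after the copy

print(view_base_is_a, copy_has_no_base)
print(a_strides, v_strides, c_strides)
```

Turning a view into a private copy in place therefore means changing all three: the owner reference, the data pointer, and the strides, since the copy is laid out contiguously.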
Here is the code, in an IPython notebook:
%load_ext cythonmagic
Use Cython to define copy_view():
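%%cython
cimport numpy as np

np.import_array()
np.import_ufunc()

def copy_view(np.ndarray a):
    cdef np.ndarray b
    cdef object base
    cdef int i
    base = np.get_array_base(a)
    # Nothing to do if a owns its data, or if its base is already an
    # instance of a's own class (the private copy made below).
    if base is None or isinstance(base, a.__class__):
        return a
    else:
        print("copy")
        b = a.copy()
        # Re-point a at the private copy: new base, data pointer, and strides.
        np.set_array_base(a, b)
        a.data = b.data
        for i in range(b.ndim):
            a.strides[i] = b.strides[i]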
Define a subclass of ndarray:
import numpy as np

class cowarray(np.ndarray):
    def __setitem__(self, key, value):
        # Detach from the shared buffer before any item assignment.
        copy_view(self)
        np.ndarray.__setitem__(self, key, value)

    def __array_prepare__(self, array, context=None):
        # Detach when this array is itself the output of a ufunc.
        if self is array:
            copy_view(self)
        return array

    def __array__(self):
        copy_view(self)
        return self
Some tests:
a = np.array([1.0, 2, 3, 4])
b = a.view(cowarray)
b[1] = 100  # copy
print(a, b)
b[2] = 200  # no copy
print(a, b)
c = a[::2].view(cowarray)
c[0] = 1000  # copy
print(a, c)
d = a.view(cowarray)
np.sin(d, d)  # copy
print(a, d)
The output:
copy
[ 1. 2. 3. 4.] [ 1. 100. 3. 4.]
[ 1. 2. 3. 4.] [ 1. 100. 200. 4.]
copy
[ 1. 2. 3. 4.] [ 1000. 3.]
copy
[ 1. 2. 3. 4.] [ 0.84147098 0.90929743 0.14112001 -0.7568025 ]
Source: https://stackoverflow.com/questions/21896030/numpy-array-copy-on-write