Problem
I failed to store a boolean mask as an attribute of a Cython class. In the real code I need this mask to perform tasks more efficiently. Here is some sample code:
core.pyx
import numpy as np
cimport numpy as np
cdef class MyClass():
    cdef public np.uint8_t[:] mask # uint8 has the same data structure of a boolean array
    cdef public np.float64_t[:] data
    def __init__(self, size):
        self.data = np.random.rand(size).astype(np.float64)
        self.mask = np.zeros(size, np.uint8)
script.py
import numpy as np
import pyximport
pyximport.install(setup_args={'include_dirs': np.get_include()})
from core import MyClass
mc = MyClass(1000000)
mc.mask = np.asarray(mc.data) > 0.5
Error
When I run script.py, the Cython module compiles successfully, but the following error is thrown:
Traceback (most recent call last):
  File "script.py", line 8, in <module>
    mc.mask = np.asarray(mc.data) > 0.5
  File "core.pyx", line 6, in core.MyClass.mask.__set__
    cdef public np.uint8_t[:] mask
ValueError: Does not understand character buffer dtype format string ('?')
Workaround
My current workaround is to pass the mask to every function that needs it, using cast=True, for example:
cpdef func(MyClass mc, np.ndarray[np.uint8_t, ndim=1, cast=True] mask):
    return np.asarray(mc.data)[mask]
Question
Are there any ideas out there on how the mask could be stored in the Cython class?
Answer 1:
So I don't believe memoryviews actually support boolean indexing anyway. Therefore to index the array you're always going to have to do
np.asarray(mc.data)[mask]
# or
mc.data.base[mask] # if you're sure it's always a view of something that supports boolean indexing
I don't think this will change with the Cython update that @ead mentions. I suspect the reason is that assignment (mc.data[mask] = x) would probably be fairly easy to support, but it isn't obvious what type mc.data[mask] should return, since it isn't a memoryview.
Therefore, whatever you do is going to involve some messy code.
Assignment to the memoryview can be done with
mc.mask = (np.asarray(mc.data) > 0.5).view(np.uint8)
and returning it to a Numpy bool array with:
np.asarray(mc.mask).view(np.bool_)  # np.bool_ rather than np.bool, which some NumPy versions do not provide
neither of which should involve copying.
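The no-copy claim can be checked in plain NumPy, without Cython. The following is a minimal sketch; the variable names are illustrative and not part of the original code. It uses np.shares_memory to confirm that both views refer to the same buffer as the original boolean array.

```python
import numpy as np

# Illustrative data; stands in for np.asarray(mc.data) from the question.
data = np.random.rand(1000)

# Boolean mask, reinterpreted as uint8 without copying.
bool_mask = data > 0.5
as_uint8 = bool_mask.view(np.uint8)

# Reinterpreted back as a boolean array, again without copying.
back_to_bool = as_uint8.view(np.bool_)

# Both views share memory with the original mask.
shares_uint8 = np.shares_memory(bool_mask, as_uint8)
shares_bool = np.shares_memory(bool_mask, back_to_bool)

# Writing through the uint8 view is visible in the boolean array.
as_uint8[0] = 1
first_is_true = bool(bool_mask[0])
```

Because the views alias the same memory, a write through one of them changes the others as well, which is what keeps a stored uint8 memoryview and the boolean array consistent.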
If it were me designing this I'd keep the memoryviews non-public (for Cython-only use) and have normal object attributes that just hold the underlying Numpy arrays for the Python interface. You could use property to keep them in sync (and do the casting):
cdef class MyClass:
    cdef np.uint8_t[:] mask_mview
    cdef object _mask
    @property
    def mask(self):
        return np.asarray(self._mask).view(np.bool_)
    @mask.setter
    def mask(self, value):
        self._mask = value
        # keep the typed memoryview pointing at the same buffer
        self.mask_mview = value.view(np.uint8)
    # and the same for data
That way you have a memoryview to use for things that memoryviews are good at (iterating quickly element-by-element in Cython), access to the plain Numpy array for Python, and the two are held in sync (at least by the Python interface).
Answer 2:
Your best option (if you don't want to use the workaround) is probably to wait for Cython 0.29.14 to be released. This hiccup has been fixed, and the fix will probably be part of 0.29.14.
The following minimal example
%%cython
import numpy as np
cimport numpy as np
cdef np.uint8_t[:] mask = np.random.rand(20)>.5
will fail to import with the usual
ValueError: Does not understand character buffer dtype format string ('?')
for Cython 0.29.13, but works with the current state of the 0.29.x branch on GitHub (or master).
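For context, the '?' in the error message is the buffer-protocol format code that a NumPy boolean array exports, whereas a uint8 array exports 'B'. The short sketch below (plain Python, illustrative variable names) shows the two format codes side by side. Both have an item size of one byte, which is why reinterpreting one as the other works.

```python
import numpy as np

# Format codes exported through the buffer protocol.
bool_view = memoryview(np.zeros(4, dtype=np.bool_))
uint8_view = memoryview(np.zeros(4, dtype=np.uint8))

bool_format = bool_view.format    # format code for a boolean array
uint8_format = uint8_view.format  # format code for an unsigned 8-bit array

# Both element types occupy a single byte.
bool_itemsize = bool_view.itemsize
uint8_itemsize = uint8_view.itemsize

print(bool_format, uint8_format, bool_itemsize, uint8_itemsize)
```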
Source: https://stackoverflow.com/questions/58252561/how-to-store-a-boolean-mask-as-an-attribute-of-a-cython-class