How to store a boolean mask as an attribute of a Cython class?

Posted by ╄→尐↘猪︶ㄣ on 2021-01-27 13:26:33

Question


I have been unable to store a boolean mask as an attribute of a Cython class. In the real code I need this mask to perform some tasks more efficiently. Sample code follows:

core.pyx

import numpy as np
cimport numpy as np

cdef class MyClass():
    cdef public np.uint8_t[:] mask # uint8 has the same data structure of a boolean array
    cdef public np.float64_t[:] data

    def __init__(self, size):
        self.data = np.random.rand(size).astype(np.float64)
        self.mask = np.zeros(size, np.uint8)

script.py

import numpy as np
import pyximport
pyximport.install(setup_args={'include_dirs': np.get_include()})

from core import MyClass

mc = MyClass(1000000)
mc.mask = np.asarray(mc.data) > 0.5 

Error

When I run script.py, the Cython module compiles successfully, but the script throws this error:

Traceback (most recent call last):
  File "script.py", line 8, in <module>
    mc.mask = np.asarray(mc.data) > 0.5
  File "core.pyx", line 6, in core.MyClass.mask.__set__
    cdef public np.uint8_t[:] mask
ValueError: Does not understand character buffer dtype format string ('?')

Workaround

My current workaround is to pass the mask to every function that needs it, using cast=True, for example:

cpdef func(MyClass mc, np.ndarray[np.uint8_t, ndim=1, cast=True] mask):
    return np.asarray(mc.data)[mask]

Question

Are there any ideas out there on how the mask could be stored in the Cython class?


Answer 1:


I don't believe memoryviews actually support boolean indexing anyway. Therefore, to index the array you will always have to do

np.asarray(mc.data)[mask]
# or
mc.data.base[mask]  # if you're sure it's always a view of something that supports boolean indexing

I don't think this will change with the Cython update that @ead mentions. I suspect the reason is that assignment (mc.data[mask] = x) would probably be fairly easy to support, but it isn't obvious what type mc.data[mask] should return, since the result isn't a memoryview.
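As a rough illustration of the np.asarray route, here is a small pure-Python sketch. It uses the built-in memoryview only as a stand-in for a Cython typed memoryview (that substitution is my assumption, the two are not the same type), but the point carries over: wrapping the buffer in an ndarray does not copy it, and the ndarray is what provides boolean indexing.

```python
import numpy as np

data = np.random.rand(8)
mask = data > 0.5

# Built-in memoryview, used here only as a stand-in for a Cython typed memoryview.
mv = memoryview(data)

wrapped = np.asarray(mv)    # wraps the same buffer, no copy
selected = wrapped[mask]    # boolean indexing is done by the ndarray
no_copy = np.shares_memory(data, wrapped)
```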

Therefore, whatever you do is going to involve some messy code.


Assignment to the memoryview can be done with

mc.mask = (np.asarray(mc.data) > 0.5).view(np.uint8)

and it can be converted back to a NumPy bool array with:

np.asarray(mc.mask).view(np.bool_)

Neither operation should involve copying.
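If you want to convince yourself that the reinterpretation really is copy-free, a quick check in plain NumPy (no Cython needed) might look like the sketch below. The variable names are just for illustration.

```python
import numpy as np

data = np.random.rand(10)
bool_mask = data > 0.5                 # dtype bool
u8_mask = bool_mask.view(np.uint8)     # same bytes reinterpreted as uint8
round_trip = u8_mask.view(np.bool_)    # and back to bool

# Both views share the original buffer, so no copy was made.
shares_u8 = np.shares_memory(bool_mask, u8_mask)
shares_back = np.shares_memory(bool_mask, round_trip)

# Writing through the uint8 view is visible in the bool array.
u8_mask[0] = 1
first_is_true = bool(bool_mask[0])
```

This works because both dtypes have an item size of one byte, so view() only changes how the existing bytes are interpreted.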


If I were designing this, I'd keep the memoryviews non-public (for Cython-only use) and have normal object attributes that just hold the underlying NumPy arrays for the Python interface. You could use a property to keep them in sync (and do the casting):

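cdef class MyClass:
    cdef np.uint8_t[:] mask_mview  # typed view for fast Cython-only access
    cdef object _mask              # underlying NumPy array for the Python interface

    @property
    def mask(self):
        return np.asarray(self._mask).view(np.bool_)

    @mask.setter
    def mask(self, value):
        self._mask = value
        # reinterpret the bool bytes as uint8 (no copy) so the memoryview accepts them
        self.mask_mview = value.view(np.uint8)

    # and the same for data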

That way you have a memoryview to use for things that memoryviews are good at (iterating quickly element-by-element in Cython), access to the plain NumPy array for Python, and the two are kept in sync (at least by the Python interface).
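To see the syncing idea without compiling anything, here is a plain-Python analogue. MaskHolder and its attribute names are hypothetical, and an ordinary uint8 ndarray view stands in for the Cython memoryview, so treat it as a sketch of the pattern rather than the cdef class itself.

```python
import numpy as np

class MaskHolder:
    """Plain-Python stand-in for the cdef class (hypothetical name)."""

    def __init__(self, size):
        self._mask = np.zeros(size, dtype=np.bool_)
        # uint8 view of the same bytes; plays the role of the typed memoryview
        self._mask_u8 = self._mask.view(np.uint8)

    @property
    def mask(self):
        return self._mask

    @mask.setter
    def mask(self, value):
        # No copy is made when value is already a contiguous bool array.
        value = np.ascontiguousarray(value, dtype=np.bool_)
        self._mask = value
        self._mask_u8 = value.view(np.uint8)

holder = MaskHolder(5)
holder.mask = np.array([True, False, True, False, False])
holder._mask_u8[1] = 1   # a write through the uint8 view shows up in holder.mask
```

Because the setter rebuilds the uint8 view every time the mask is replaced, the two attributes cannot drift apart as long as callers go through the property.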




Answer 2:


Your best option (if you don't want to use the workaround) is probably to wait for Cython 0.29.14 to be released. This issue has been fixed, and the fix will probably be part of 0.29.14.

The following minimal example

%%cython
import numpy as np
cimport numpy as np
cdef np.uint8_t[:] mask = np.random.rand(20) > .5

will fail to import with the usual

ValueError: Does not understand character buffer dtype format string ('?')

with Cython 0.29.13, but works with the current state of the 0.29.x branch on GitHub (or master).
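For context, the '?' in that message is the buffer-protocol format code that NumPy reports for bool arrays, while uint8 arrays report 'B'. A short plain-Python check (using the built-in memoryview purely to read the format string) shows the mismatch, and why viewing the array as uint8 sidesteps it:

```python
import numpy as np

bool_arr = np.zeros(4, dtype=np.bool_)
u8_arr = np.zeros(4, dtype=np.uint8)

# Format codes exposed through the buffer protocol.
bool_format = memoryview(bool_arr).format                    # '?'
u8_format = memoryview(u8_arr).format                        # 'B'
viewed_format = memoryview(bool_arr.view(np.uint8)).format   # 'B'
```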



Source: https://stackoverflow.com/questions/58252561/how-to-store-a-boolean-mask-as-an-attribute-of-a-cython-class
