Question
I was looking into numpy issue 2972 and several related problems. It turns out that all those problems are related to the situation where the array itself is structured, but its mask is not:
In [38]: R = numpy.zeros(10, dtype=[("A", "<f2"), ("B", "<f4")])
In [39]: Rm = numpy.ma.masked_where(R["A"]<5, R)
In [41]: Rm.dtype
Out[41]: dtype([('A', '<f2'), ('B', '<f4')])
In [42]: Rm.mask.dtype
Out[42]: dtype('bool')
# Now, both `__getitem__` and `__repr__` will result in errors — see issue #2972
If I create a masked array differently, the mask dtype is structured like the dtype of the array itself:
In [44]: Q.dtype
Out[44]: dtype([('A', '<f4'), ('B', '<f4')])
In [45]: Q.mask.dtype
Out[45]: dtype([('A', '?'), ('B', '?')])
The former situation exposes several problems. For example, Rm.__repr__() and Rm["A"] both result in IndexError, although it was a ValueError in the past.
By design, is the pattern supposed to be possible, where A.dtype is structured, but A.mask.dtype is not structured?
In other words: is the bug in the __repr__ and __getitem__ methods of numpy.ma.core.MaskedArray, or does the real bug occur earlier, by permitting such a masked structured array to exist in the first place?
Answer 1:
The errors in your first case indicate that the methods expect the mask to have the same number (and names) of fields as the base array:
__getitem__: dout._mask = _mask[indx]
_recursive_printoption: (curdata, curmask) = (result[name], mask[name])
If the masked array is made with the 'main' constructor, the mask has the same structure:
Rn = np.ma.masked_array(R, mask=R['A']>5)
Rn.mask.dtype: dtype([('A', '?'), ('B', '?')])
In other words, there is a mask value for each field of each element.
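This can be checked directly. The sketch below assumes a reasonably recent NumPy and fills field A with sample values (not in the original) so that the condition is mixed. It compares the constructor's mask dtype with the boolean dtype that np.ma.make_mask_descr derives from the data dtype.

```python
import numpy as np

# Same structured dtype as in the question; the values in "A" are
# illustrative only, so that the condition is True for some rows.
R = np.zeros(10, dtype=[("A", "<f2"), ("B", "<f4")])
R["A"] = np.arange(10)

# Boolean counterpart of R.dtype: one '?' entry per field.
expected = np.ma.make_mask_descr(R.dtype)

# The main constructor expands a plain boolean condition to every field.
Rn = np.ma.masked_array(R, mask=R["A"] > 5)

print(expected)
print(Rn.mask.dtype)
print(Rn.mask["A"])
```

Both printed dtypes should have the fields A and B, each of boolean type.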
The masked_array doc evidently intends for 'same shape' to include dtype structure. Mask: Must be convertible to an array of booleans with the same shape as 'data'.
If I try to set the mask in the same way that masked_where does
Rn._mask=R['A']>5
I get the same print error. The structured mask gets overwritten with the new boolean array, changing its dtype. In contrast, if I use
Rn.mask=R['A']<5
Rn prints fine. .mask is a property, whose set method evidently handles the structured mask correctly.
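The difference between the two assignments can be seen side by side. This is a sketch with illustrative values in field A and hypothetical variable names (bypassed, via_property). It inspects only the mask dtypes and deliberately avoids printing the array whose mask was overwritten.

```python
import numpy as np

R = np.zeros(10, dtype=[("A", "<f2"), ("B", "<f4")])
R["A"] = np.arange(10)          # illustrative values
cond = R["A"] < 5

# Assigning to the private attribute bypasses the property setter,
# so the plain boolean array is stored as-is.
bypassed = np.ma.masked_array(R.copy())
bypassed._mask = cond
print(bypassed._mask.dtype)

# Assigning through the property goes through the setter, which
# expands the condition to one boolean per field.
via_property = np.ma.masked_array(R.copy())
via_property.mask = cond
print(via_property.mask.dtype)
```

The first dtype is a plain bool; the second should have one boolean field per field of R.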
Without digging into the code history (on GitHub), my guess is that masked_where is a convenience function that wasn't updated when structured dtypes were added to other parts of the ma code. Compared to ma.masked_array, it's a simple function that does not look at the dtype at all. Other convenience functions like ma.masked_greater use masked_where. Changing result._mask = cond to result.mask = cond might be all that is needed to correct this issue.
How thoroughly have you tested the consequences of an unstructured mask?
Rm.flatten() returns an array with a structured mask, even when it started with an unstructured one. That's because it uses Rm.__setmask__, which is sensitive to fields. And that's the set function for the mask property.
Rm.tolist() # same error as str()
masked_where starts with:
cond = make_mask(condition)
make_mask returns the simple 'bool' dtype. It can also be called with a dtype, producing a structured mask: np.ma.make_mask(R['A']<5,dtype=R.dtype). But such a structured mask gets flattened when used in masked_where. masked_where not only allows an unstructured mask, it forces it to be unstructured.
Your unstructured mask is already partly implemented, as the recordmask property:
recordmask = property(fget=_get_recordmask)
I say partly because it has a get method, but the set method is not yet implemented. See def _set_recordmask(self):
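Reading recordmask looks like this. The sketch uses a small array with an explicitly structured mask (values chosen for illustration); the property collapses the per-field mask to one boolean per record.

```python
import numpy as np

R = np.zeros(4, dtype=[("A", "<f2"), ("B", "<f4")])

# Explicit structured mask: one (A, B) pair per element.
Rn = np.ma.masked_array(
    R,
    mask=[(True, True), (True, False), (False, False), (False, True)],
)

print(Rn.mask.dtype)   # structured, one boolean per field
print(Rn.recordmask)   # plain boolean array, one value per record
```

A record counts as masked in recordmask only when all of its fields are masked, so only the first element is True here.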
The more I look at this the more I'm convinced that masked_where is wrong. It could be changed to set a structured mask, but then it's not much different from masked_array. It might be better if it raised an error when the array is structured (has dtype.names). That way masked_where will remain useful for unstructured numeric arrays, while preventing misapplication to structured ones.
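That guard could be sketched as a user-side wrapper. masked_where_unstructured is a hypothetical name, not part of NumPy, and the choice of TypeError is an assumption.

```python
import numpy as np


def masked_where_unstructured(condition, a, copy=True):
    """Hypothetical wrapper: reject structured input, else defer to masked_where."""
    arr = np.asanyarray(a)
    # dtype.names is None for ordinary numeric arrays and a tuple of
    # field names for structured ones.
    if arr.dtype.names is not None:
        raise TypeError(
            "structured arrays are not supported here; "
            "use np.ma.masked_array(a, mask=...) instead"
        )
    return np.ma.masked_where(condition, a, copy=copy)


plain = np.arange(5.0)
result = masked_where_unstructured(plain < 2, plain)
print(result)
```

Calling the wrapper with a structured array such as R raises the TypeError instead of returning an array with a mismatched mask.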
I should also look at the test code.
Source: https://stackoverflow.com/questions/28182408/is-the-mask-of-a-structured-array-supposed-to-be-structured-itself