How to subclass numpy.ma.core.masked_array?

Question

I'm trying to write a subclass of masked_array. What I've got so far is this:

class gridded_array(ma.core.masked_array):
    def __init__(self, data, dimensions, mask=False, dtype=None,
                 copy=False, subok=True, ndmin=0, fill_value=None,
                 keep_mask=True, hard_mask=None, shrink=True):
        ma.core.masked_array.__init__(data, mask, dtype, copy, subok,
                                      ndmin, fill_value, keep_mask, hard_mask,
                                      shrink)
        self.dimensions = dimensions

However, when I now create a gridded_array, I don't get what I expect:

dims = OrderedDict()
dims['x'] = np.arange(4)
gridded_array(np.random.randn(4), dims)

masked_array(data = [-- -- -- --],
             mask = [ True  True  True  True],
             fill_value = 1e+20)

I would expect an unmasked array. I suspect that the dimensions argument I'm passing gets passed on to the masked_array.__init__ call, but since I'm quite new to OOP, I don't know how to resolve this.

Any help is greatly appreciated.

PS: I'm on Python 2.7

Answer 1:

A word of warning: if you're brand new to OOP, subclassing ndarrays and MaskedArrays is not the easiest way to get started, by far...

Before anything else, you should go and check this tutorial. That should introduce you to the mechanisms involved in subclassing ndarrays.

MaskedArrays, like ndarrays, use the __new__ method for creating class instances, not __init__. By the time you get to the __init__ of your subclass, you already have a fully instantiated object, with the actual initialization delegated to the __array_finalize__ method. In simpler terms: your __init__ doesn't work as you would expect with standard Python objects. (Actually, I wonder whether it's called at all... After __array_finalize__, if I recall correctly...)
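To see the order for yourself, here is a minimal sketch that just records which hooks fire while an instance is being built. The class name Traced and the calls list are made up for illustration; only __new__, __array_finalize__ and __init__ are real hooks. How many times __array_finalize__ shows up may vary between NumPy versions, so the sketch only relies on the relative order.

```python
import numpy as np
import numpy.ma as ma

calls = []  # hypothetical log of which hooks ran, in order


class Traced(ma.MaskedArray):
    """Illustrative subclass that records the construction hooks."""

    def __new__(cls, data, *args, **kwargs):
        calls.append("__new__ start")
        obj = super(Traced, cls).__new__(cls, data, *args, **kwargs)
        calls.append("__new__ end")
        return obj

    def __array_finalize__(self, obj):
        # Runs whenever NumPy creates an instance, including via views.
        calls.append("__array_finalize__")
        super(Traced, self).__array_finalize__(obj)

    def __init__(self, data, *args, **kwargs):
        # The array already exists at this point; nothing left to build.
        calls.append("__init__")


t = Traced(np.arange(3))
print(calls)
```

In this sketch __init__ does run, but only after __new__ has returned the finished array, which is why extra constructor arguments have to be intercepted in __new__.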

Now that you've been warned, you may want to consider whether you really need to go through the hassle of subclassing an ndarray:

What are your objectives with your gridded_array?
Should you support all methods of ndarrays, or only some? All dtypes?
What should happen when you take a single element or a slice of your object?
Will you be using gridded_arrays extensively as inputs to NumPy functions?

If you're in doubt, then it might be easier to design gridded_array as a generic class that takes an ndarray (or a MaskedArray) as an attribute (say, gridded_array._array), and add only the methods you need to operate on self._array.
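A rough sketch of that composition approach could look like the following. The class name GriddedArray and the two forwarded methods (__getitem__ and mean) are only examples of "the methods you need"; pick whatever your use case requires.

```python
import numpy as np
import numpy.ma as ma


class GriddedArray(object):
    """Hypothetical wrapper: owns a masked array plus its dimensions."""

    def __init__(self, data, dimensions, mask=ma.nomask):
        # A regular __init__ works here because this is a plain Python class.
        self._array = ma.masked_array(data, mask=mask)
        self.dimensions = dimensions

    def __getitem__(self, index):
        # Forward indexing to the wrapped masked array.
        return self._array[index]

    def mean(self):
        # Forward only the operations that are actually needed.
        return self._array.mean()


grid = GriddedArray([1.0, 2.0, 3.0, 4.0],
                    {"x": np.arange(4)},
                    mask=[False, False, False, True])
print(grid.mean())  # mean over the three unmasked values
```

The trade-off is that a wrapper like this is not itself an array, so NumPy functions need to be handed grid._array explicitly (or you add more forwarding methods).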

Suggestions

If you only need to "tag" each item of your gridded_array, you may be interested in pandas.
If you only have to deal with floats, MaskedArray might be a bit overkill: just use NaNs to represent invalid data, since many NumPy functions have NaN-aware equivalents (np.nanmean, np.nansum, ...). At worst, you can always mask your gridded_array when needed: taking a view of a subclass of ndarray with .view(np.ma.MaskedArray) should return a masked version of your input...
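For instance, a small sketch of the NaN route, with made-up sample values. np.nanmean and np.ma.masked_invalid are existing NumPy functions; note that, as far as I know, a bare .view(np.ma.MaskedArray) does not mask the NaNs by itself, so masked_invalid is used here when an actual mask is wanted.

```python
import numpy as np

# Illustrative data: NaN marks the invalid entry.
values = np.array([1.0, np.nan, 3.0])

# NaN-aware reduction: the NaN entry is skipped.
nan_mean = np.nanmean(values)

# Reinterpret the same data as a MaskedArray (nothing is masked yet).
viewed = values.view(np.ma.MaskedArray)

# Build a mask from the NaN/inf entries when masked semantics are needed.
masked = np.ma.masked_invalid(values)

print(nan_mean, masked.mean())
print(masked.mask)
```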

Answer 2:

The issue is that masked_array uses __new__ instead of __init__, so your dimensions argument is being misinterpreted.

To override __new__, use:

import numpy.ma as ma

class gridded_array(ma.core.masked_array):
    def __new__(cls, data, dimensions, *args, **kwargs):
        # Take `dimensions` out here so it is not forwarded as the mask.
        self = super(gridded_array, cls).__new__(cls, data, *args, **kwargs)
        self.dimensions = dimensions
        return self
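As a quick check, here is the class above used with the data from the question (np.arange(4.0) replaces the random values so the result is predictable). The variable names grid and partly_masked are just for illustration.

```python
from collections import OrderedDict

import numpy as np
import numpy.ma as ma


class gridded_array(ma.core.masked_array):
    def __new__(cls, data, dimensions, *args, **kwargs):
        # `dimensions` is consumed here and never reaches masked_array.
        self = super(gridded_array, cls).__new__(cls, data, *args, **kwargs)
        self.dimensions = dimensions
        return self


dims = OrderedDict()
dims['x'] = np.arange(4)

# No mask given: nothing should be masked.
grid = gridded_array(np.arange(4.0), dims)
print(ma.is_masked(grid))
print(list(grid.dimensions.keys()))

# Keyword arguments such as mask still reach masked_array.__new__.
partly_masked = gridded_array(np.arange(4.0), dims,
                              mask=[False, True, False, False])
print(partly_masked.count())
```

Keep in mind that an attribute attached this way is not guaranteed to survive slicing, views or arithmetic, so check the results of such operations before relying on .dimensions there.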

Source: https://stackoverflow.com/questions/12597827/how-to-subclass-numpy-ma-core-masked-array

Tags: python, oop, inheritance, python-2.7, subclass