Subclassing a Pandas DataFrame, updates?

北海茫月 2020-12-03 20:17

To inherit, or not to inherit?

What is the latest on the subclassing issue for Pandas? (Most of the other threads are 3-4 years old).

I am hoping to do somet

1 answer
  • 2020-12-03 21:16

    This is how I've done it, following the advice found in:

    • subclassing-pandas-data-structures
    • Fix Finalize Issue

    The example below only shows how to construct a subclass of pandas.DataFrame. If you follow the advice in my first link, you may also want to subclass pandas.Series, so that one-dimensional slices of your pandas.DataFrame subclass keep a custom type as well.
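    As a hedged sketch of that Series advice: pandas documents two further constructor hooks, _constructor_sliced on the DataFrame subclass and _constructor_expanddim on the Series subclass. The class names SomeSeries and SomeFrame are hypothetical and not part of the original answer, and the sketch deliberately leaves out the _metadata handling covered in the SomeData example.

```python
import pandas as pd


class SomeSeries(pd.Series):
    """Hypothetical one-dimensional companion to the DataFrame subclass."""

    @property
    def _constructor(self):
        return SomeSeries

    @property
    def _constructor_expanddim(self):
        # Used when a Series grows into a frame, e.g. via `to_frame()`.
        return SomeFrame


class SomeFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass paired with `SomeSeries`."""

    @property
    def _constructor(self):
        return SomeFrame

    @property
    def _constructor_sliced(self):
        # Used when a frame is sliced down to a single column or row.
        return SomeSeries


frame = SomeFrame(dict(A=[1, 2, 3], B=[4, 5, 6]))
column = frame["A"]             # one-dimensional slice
round_trip = column.to_frame()  # back to two dimensions
print(type(column).__name__, type(round_trip).__name__)
```

    With both hooks in place, selecting a column and converting it back to a frame should stay within the custom types instead of falling back to plain pandas objects.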

    Defining SomeData

    import pandas as pd
    import numpy as np
    
    class SomeData(pd.DataFrame):
        # This class variable tells Pandas the name of the attributes
        # that are to be ported over to derivative DataFrames.  There
        # is a method named `__finalize__` that grabs these attributes
        # and assigns them to newly created `SomeData`
        _metadata = ['my_attr']
    
        @property
        def _constructor(self):
            """This is the key to letting Pandas know how to keep
            derivative `SomeData` the same type as yours.  It should
            be enough to return the name of the Class.  However, in
            some cases, `__finalize__` is not called and `my_attr` is
            not carried over.  We can fix that by constructing a callable
            that makes sure to call `__finalize__` every time."""
            def _c(*args, **kwargs):
                return SomeData(*args, **kwargs).__finalize__(self)
            return _c
    
        def __init__(self, *args, **kwargs):
            # grab the keyword argument that is supposed to be my_attr
            self.my_attr = kwargs.pop('my_attr', None)
            super().__init__(*args, **kwargs)
    
        def my_method(self, other):
            return self * np.sign(self - other)
    

    Demonstration

    mydata = SomeData(dict(A=[1, 2, 3], B=[4, 5, 6]), my_attr='an attr')
    
    print(mydata, type(mydata), mydata.my_attr, sep='\n' * 2)
    
       A  B
    0  1  4
    1  2  5
    2  3  6
    
    <class '__main__.SomeData'>
    
    an attr
    
    newdata = mydata.mul(2)
    
    print(newdata, type(newdata), newdata.my_attr, sep='\n' * 2)
    
       A   B
    0  2   8
    1  4  10
    2  6  12
    
    <class '__main__.SomeData'>
    
    an attr
    
    newerdata = mydata.my_method(newdata)
    
    print(newerdata, type(newerdata), newerdata.my_attr, sep='\n' * 2)
    
       A  B
    0 -1 -4
    1 -2 -5
    2 -3 -6
    
    <class '__main__.SomeData'>
    
    an attr
    

    Gotchas

    This approach breaks the method pd.DataFrame.equals, at least in the pandas version shown in the traceback below:

    newerdata.equals(newdata)  # Should be `False`
    
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-304-866170ab179e> in <module>()
    ----> 1 newerdata.equals(newdata)
    
    ~/anaconda3/envs/3.6.ml/lib/python3.6/site-packages/pandas/core/generic.py in equals(self, other)
       1034         the same location are considered equal.
       1035         """
    -> 1036         if not isinstance(other, self._constructor):
       1037             return False
       1038         return self._data.equals(other._data)
    
    TypeError: isinstance() arg 2 must be a type or tuple of types
    

    What happens is that this method expects the _constructor attribute to hold a class (an object of type type) that it can pass to isinstance. Instead, it finds the callable that I placed there to fix the __finalize__ issue I came across.
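    The failure can be reproduced without pandas. In this minimal sketch, the hypothetical names Plain and factory stand in for the subclass and for the _c callable returned by _constructor; neither appears in the original answer.

```python
class Plain:
    """Hypothetical stand-in for the DataFrame subclass."""


def factory(*args, **kwargs):
    # Hypothetical stand-in for the `_c` callable: it builds instances,
    # but it is a function, not a class.
    return Plain(*args, **kwargs)


obj = Plain()

# A class is a valid second argument to isinstance.
ok = isinstance(obj, Plain)

# A plain function is not, so isinstance raises TypeError.
try:
    isinstance(obj, factory)
    raised = False
except TypeError:
    raised = True

print(ok, raised)
```

    Any code path that passes _constructor straight to isinstance will hit the same TypeError once _constructor returns a function.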

    Workaround

    Override the equals method with the following in your class definition.

        def equals(self, other):
            try:
                pd.testing.assert_frame_equal(self, other)
                return True
            except AssertionError:
                return False
    
    newerdata.equals(newdata)  # Should be `False`
    
    False
    