Error in copying a composite object consisting mostly of pandas.DataFrame

Submitted by 穿精又带淫゛_ on 2019-12-14 03:58:32

Question


I'm trying to use composition with pandas.DataFrame as shown below, but I get an error when I try to copy the object.

import numpy as np
import pandas as pd
import copy


class Foo(object):
    """
    Foo is composed mostly of a pd.DataFrame, and behaves like it too. 
    """

    def __init__(self, df, attr_custom):
        self._ = df
        self.attr_custom = attr_custom

    # the following code allows Foo objects to behave like pd.DataFrame,
    # and I want to keep this behavior.
    def __getattr__(self, attr):
        return getattr(self._, attr)


df = pd.DataFrame(np.random.randint(0,2,(3,2)), columns=['A','B'])
foo = Foo(df, attr_custom='some value')  # __init__ requires attr_custom
foo_cp = copy.deepcopy(foo)

The error I get:

---> 16 foo_cp = copy.deepcopy(foo)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in deepcopy(x, memo, _nil)
    188                             raise Error(
    189                                 "un(deep)copyable object of type %s" % cls)
--> 190                 y = _reconstruct(x, rv, 1, memo)
    191 
    192     memo[d] = y

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in _reconstruct(x, info, deep, memo)
    341                 slotstate = None
    342             if state is not None:
--> 343                 y.__dict__.update(state)
    344             if slotstate is not None:
    345                 for key, value in slotstate.iteritems():

TypeError: 'BlockManager' object is not iterable 

My questions:

  1. Any idea what is going on here?
  2. What is the "recommended" way of using composition with pandas.DataFrame?
  3. If for some reason it is a bad idea to use _ as the name of the dummy attribute, please let me know.

Answer 1:


The standard way to do this is to subclass pd.DataFrame and define a _constructor property:

import copy
import pandas as pd

class Foo(pd.DataFrame):
    @property
    def _constructor(self):
        return Foo  # pandas uses this to build the results of operations

Then most DataFrame methods should work, and return a Foo.

In [11]: df = pd.DataFrame([[1, 2], [3, 4]])

In [12]: foo = Foo(df)

In [13]: foo.copy()
Out[13]:
   0  1
0  1  2
1  3  4

In [14]: type(foo.copy())
Out[14]: __main__.Foo

Including copy.deepcopy:

In [15]: copy.deepcopy(foo)
Out[15]:
   0  1
0  1  2
1  3  4

In [16]: type(copy.deepcopy(foo))
Out[16]: __main__.Foo
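The question's Foo also carried attr_custom, which this subclass does not keep. A minimal sketch, assuming a reasonably recent pandas: the "Extending pandas" documentation describes a _metadata class attribute listing extra attribute names that pandas propagates to derived objects, and copy() (which DataFrame's deepcopy support delegates to) should carry them over. The attribute name comes from the question; the sample data is made up.

```python
import copy

import pandas as pd


class Foo(pd.DataFrame):
    # Extra attribute names listed in _metadata are copied onto new objects
    # by pandas' __finalize__, which copy() (and hence deepcopy) calls.
    _metadata = ['attr_custom']

    @property
    def _constructor(self):
        # Make operations that build a new frame return Foo, not DataFrame.
        return Foo


foo = Foo({'A': [0, 1, 1], 'B': [1, 0, 1]})
foo.attr_custom = 'some value'  # stored as an attribute, not as a column

foo_cp = copy.deepcopy(foo)
print(type(foo_cp).__name__, foo_cp.attr_custom)
```

Foo deliberately has no custom __init__ here: pandas calls _constructor internally without any extra arguments of yours, so the custom attribute is set after construction instead of being passed in.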

Aside: I wouldn't use _ as a variable/method name; it's not descriptive at all. You can prefix a name with _ to show that it should be considered "private", but give it a (descriptive!) name, e.g. _df.

_ is often used in Python to mean "discard this variable", so you might write:

sum(1 for _ in x)  # this is basically the same as len!
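To illustrate the "discard" convention and the comparison with len: count_items is a hypothetical helper name, not a standard function. One difference from len is that the generator-expression form also works on iterators that have no length, at the cost of consuming them.

```python
def count_items(iterable):
    # '_' signals that the loop variable itself is never used.
    return sum(1 for _ in iterable)


# Unlike len(), this also works on a generator (and consumes it).
squares = (n * n for n in range(5))
n_squares = count_items(squares)

# '_' is also the usual name for a discarded value when unpacking.
pairs = [('a', 1), ('b', 2), ('c', 3)]
values = [value for _, value in pairs]

print(n_squares, values)
```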

Although it is perfectly valid Python to actually use the value of _, e.g.:

sum(_ ** 2 for _ in x)

This would generally be frowned upon (use i or a more descriptive name instead).

In IPython, _ holds the most recently returned value (the standard interactive interpreter does the same).
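The answer switches from composition to inheritance. If you prefer to keep the question's composition design, here is a minimal sketch under an assumption about the cause: copy.deepcopy looks up hooks such as __deepcopy__, __getstate__ and __setstate__ on the instance, and when Foo does not itself provide them, Foo.__getattr__ forwards those lookups to the wrapped DataFrame. On the blank instance that copying creates before restoring its state, self._ does not exist yet, so the forwarding recurses instead. Depending on the Python and pandas versions, this can break reconstruction as in the traceback above, or make deepcopy quietly return a plain DataFrame. The sketch keeps __getattr__ but refuses to forward dunder names and reads the wrapped frame through __dict__; the _df name follows the naming advice above.

```python
import copy

import numpy as np
import pandas as pd


class Foo(object):
    """Composition wrapper around a DataFrame (a variant of the question's Foo)."""

    def __init__(self, df, attr_custom):
        self._df = df  # descriptive name instead of '_'
        self.attr_custom = attr_custom

    def __getattr__(self, attr):
        # __getattr__ only runs when normal attribute lookup fails.
        # Do not forward special names such as __deepcopy__, __getstate__
        # or __setstate__: copy/pickle then use Foo's default protocol.
        if attr.startswith('__') and attr.endswith('__'):
            raise AttributeError(attr)
        # Read _df through __dict__ so a blank instance (created by copy
        # before its state is restored) raises AttributeError instead of
        # recursing forever.
        try:
            df = self.__dict__['_df']
        except KeyError:
            raise AttributeError(attr)
        return getattr(df, attr)


df = pd.DataFrame(np.random.randint(0, 2, (3, 2)), columns=['A', 'B'])
foo = Foo(df, attr_custom='some value')
foo_cp = copy.deepcopy(foo)

print(type(foo_cp).__name__, foo_cp.attr_custom, foo_cp.shape)
```

In both the original and this version, implicit special-method calls such as len(foo) or foo['A'] are looked up on the type and bypass __getattr__, so they are not forwarded to the DataFrame.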



Source: https://stackoverflow.com/questions/29569005/error-in-copying-a-composite-object-consisting-mostly-of-pandas-dataframe
