Question
I am trying to use composition with pandas.DataFrame in the following way, but I get an error when I try to copy the object.
import numpy as np
import pandas as pd
import copy

class Foo(object):
    """
    Foo is composed mostly of a pd.DataFrame, and behaves like it too.
    """
    def __init__(self, df, attr_custom):
        self._ = df
        self.attr_custom = attr_custom

    # The following code allows Foo objects to behave like pd.DataFrame,
    # and I want to keep this behavior.
    def __getattr__(self, attr):
        return getattr(self._, attr)

df = pd.DataFrame(np.random.randint(0, 2, (3, 2)), columns=['A', 'B'])
foo = Foo(df, attr_custom='custom')  # placeholder value; __init__ requires attr_custom
foo_cp = copy.deepcopy(foo)
The error I get:
---> 16 foo_cp = copy.deepcopy(foo)
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in deepcopy(x, memo, _nil)
188 raise Error(
189 "un(deep)copyable object of type %s" % cls)
--> 190 y = _reconstruct(x, rv, 1, memo)
191
192 memo[d] = y
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in _reconstruct(x, info, deep, memo)
341 slotstate = None
342 if state is not None:
--> 343 y.__dict__.update(state)
344 if slotstate is not None:
345 for key, value in slotstate.iteritems():
TypeError: 'BlockManager' object is not iterable
My questions:
- Any idea what is going on here?
- What is the "recommended" way of using composition with pandas.DataFrame?
- If for some reason it is a bad idea to use _ as the name of the dummy attribute, please let me know.
Answer 1:
The standard way to do this is to subclass pd.DataFrame and define a _constructor property:
class Foo(pd.DataFrame):
    @property
    def _constructor(self):
        return Foo
Then most DataFrame methods should work, and return a Foo.
In [11]: df = pd.DataFrame([[1, 2], [3, 4]])
In [12]: foo = Foo(df)
In [13]: foo.copy()
Out[13]:
   0  1
0  1  2
1  3  4
In [14]: type(foo.copy())
Out[14]: __main__.Foo
Including copy.deepcopy:
In [15]: copy.deepcopy(foo)
Out[15]:
   0  1
0  1  2
1  3  4
In [16]: type(copy.deepcopy(foo))
Out[16]: __main__.Foo
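The question's Foo also carries attr_custom, which the subclass above does not handle yet. As a minimal sketch, assuming the `_metadata` class attribute described in the pandas subclassing documentation, the attribute could be declared so that copy and copy.deepcopy carry it over. The keyword-only `attr_custom` argument, its `None` default, and the sample value are illustrative choices, not part of the original answer.

```python
import copy

import pandas as pd


class Foo(pd.DataFrame):
    # Names listed in _metadata are treated by pandas as custom
    # properties that should follow the object into derived results.
    _metadata = ["attr_custom"]

    def __init__(self, *args, attr_custom=None, **kwargs):
        # Forward everything else to the normal DataFrame constructor.
        super().__init__(*args, **kwargs)
        self.attr_custom = attr_custom

    @property
    def _constructor(self):
        return Foo


foo = Foo([[1, 2], [3, 4]], attr_custom="some label")
foo_cp = copy.deepcopy(foo)
print(type(foo_cp).__name__, foo_cp.attr_custom)
```

As far as I can tell, pandas carries the attribute value over by reference rather than deep-copying it, so a mutable attr_custom may need extra care.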
Aside: I wouldn't use _ as a variable/method name; it's not descriptive at all. You can prefix a name with _ to show that it should be considered "private", but give it a descriptive name, e.g. _df.
In Python, _ is often used to mean "discard this variable", so you might write:
sum(1 for _ in x) # this is basically the same as len!
It would be perfectly valid Python to use the value of _ as well, e.g.:
sum(_ ** 2 for _ in x)
but this would generally be frowned upon (use i or something similar instead).
In IPython, _ refers to the previously returned value.
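If composition is preferred over subclassing, a likely cause of the original error is that __getattr__ also forwards the special-method lookups made by the copy module (such as __getstate__ and __setstate__) to the wrapped DataFrame. Below is a minimal sketch, not a definitive implementation, that keeps the delegation but refuses dunder names and defines __deepcopy__ explicitly. The attribute name _df, the guard condition, and the sample data are illustrative assumptions.

```python
import copy

import pandas as pd


class Foo:
    """Wraps a DataFrame and delegates unknown attributes to it."""

    def __init__(self, df, attr_custom):
        self._df = df
        self.attr_custom = attr_custom

    def __getattr__(self, attr):
        # Called only when normal lookup fails. Do not forward special
        # methods, and do not recurse if _df itself is not set yet.
        if attr == "_df" or (attr.startswith("__") and attr.endswith("__")):
            raise AttributeError(attr)
        return getattr(self._df, attr)

    def __deepcopy__(self, memo):
        # Copy both parts explicitly instead of relying on the default
        # reduce/reconstruct machinery.
        return Foo(
            copy.deepcopy(self._df, memo),
            copy.deepcopy(self.attr_custom, memo),
        )


df = pd.DataFrame({"A": [0, 1, 0], "B": [1, 1, 0]})
foo = Foo(df, attr_custom=["meta"])
foo_cp = copy.deepcopy(foo)
print(type(foo_cp).__name__, foo_cp.shape)
```

Unlike the subclassing approach, DataFrame methods called through the wrapper still return plain DataFrames, so results would have to be rewrapped by hand where a Foo is needed.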
Source: https://stackoverflow.com/questions/29569005/error-in-copying-a-composite-object-consisting-mostly-of-pandas-dataframe