Why is it not possible to pass attributes of an instance through a copy? I want to pass the name
attribute to another dataframe.
import copy
df = pd
Attaching custom metadata to DataFrames seems to be unsupported for pandas. See this answer (possible duplicate?) and this github issue.
This code works:
>>> class test():
...     @property
...     def name(self):
...         return self._name
...     @name.setter
...     def name(self, value):
...         self._name = value
...
>>>
>>> a = test()
>>> a.name = 'Test123'
>>> import copy
>>> a2 = copy.deepcopy(a)
>>> print(a2.name)
Test123
So I think this behavior is defined by pd.DataFrame. I found that pandas defines a __deepcopy__ method, but I don't fully understand the reason.
pandas/core/indexes/base.py#L960
As noted elsewhere, the DataFrame class has a custom __deepcopy__ method, which does not necessarily copy arbitrary attributes assigned to an instance the way copying a normal object would.
Interestingly, there is an internal _metadata attribute that seems intended to list additional attributes of an NDFrame that should be kept when copying/serializing it. This is discussed somewhat here: https://github.com/pandas-dev/pandas/issues/9317
Unfortunately this is still considered an undocumented internal detail, so it probably shouldn't be used. From looking at the code, you can in principle do:
mydf = pd.DataFrame(...)
mydf.name = 'foo'
mydf._metadata += ['name']
and when you copy it, it should take the name with it.
You could subclass DataFrame to make this the default:
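import functools
import pandas as pd

class NamedDataFrame(pd.DataFrame):
    # Attributes listed in _metadata are carried over when copying.
    _metadata = pd.DataFrame._metadata + ['name']

    def __init__(self, name, *args, **kwargs):
        self.name = name
        super().__init__(*args, **kwargs)

    @property
    def _constructor(self):
        # Pass the current name along to any new frame pandas constructs.
        return functools.partial(self.__class__, self.name)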
You could also do this without relying on the internal _metadata attribute if you provide your own wrapper around the existing copy method, and possibly also __getstate__ and __setstate__.
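A minimal sketch of such a wrapper, assuming the extra attributes were set directly on the instance (as in mydf.name = 'foo'). The helper name copy_with_attrs and its attrs parameter are hypothetical, not part of pandas:

```python
import pandas as pd


def copy_with_attrs(df, attrs=("name",), deep=True):
    """Copy a DataFrame and carry over selected instance attributes.

    Hypothetical helper for illustration, not a pandas API.
    """
    new = df.copy(deep=deep)
    for attr in attrs:
        # Only look at attributes stored on the instance itself, so a
        # column that happens to be called "name" is not picked up.
        if attr in vars(df):
            # Bypass DataFrame.__setattr__, which could otherwise treat
            # the assignment as a column update.
            object.__setattr__(new, attr, vars(df)[attr])
    return new


df = pd.DataFrame({"a": [1, 2, 3]})
df.name = "foo"

df2 = copy_with_attrs(df)
print(df2.name)  # foo
```

This only covers explicit calls to the wrapper; copies that pandas makes internally (slicing, arithmetic, and so on) would still drop the attribute.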
Update: It seems that use of the _metadata attribute for extending pandas classes is now actually documented, so the above example should more or less work. These docs are aimed more at development of pandas itself, so this might still be a bit volatile. But this is how pandas itself extends subclasses of NDFrame.
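For reference, a minimal sketch of the subclassing pattern those docs describe. The class name NamedFrame is made up for illustration, and the exact propagation behavior may vary between pandas versions:

```python
import copy

import pandas as pd


class NamedFrame(pd.DataFrame):
    # Attribute names listed here are carried over to the results
    # of operations such as copy().
    _metadata = ["name"]

    @property
    def _constructor(self):
        # Make pandas operations return NamedFrame instead of DataFrame.
        return NamedFrame


df = NamedFrame({"a": [1, 2, 3]})
df.name = "foo"

print(df.copy().name)
print(copy.deepcopy(df).name)
```

Unlike the NamedDataFrame variant, this version does not require the name in the constructor, so the attribute simply does not exist until it is assigned.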
copy.deepcopy will use a custom __deepcopy__ method if one is found in the MRO, and that method may return whatever it likes (including completely bogus results). Indeed, DataFrames implement a __deepcopy__ method:
def __deepcopy__(self, memo=None):
    if memo is None:
        memo = {}
    return self.copy(deep=True)
It delegates to self.copy, where you will find this note in the docstring:
Notes
-----
When ``deep=True``, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
This is in contrast to `copy.deepcopy` in the Standard Library,
which recursively copies object data (see examples below).
And you will find in the v0.13 release notes (merged in PR 4039):
__deepcopy__ now returns a shallow copy (currently: a view) of the data - allowing metadata changes.
Related issue: 17406.
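The mechanism can be reproduced without pandas. In the sketch below, the classes Plain and Custom are made up for illustration; Custom.__deepcopy__ mimics a class whose copy logic only knows about its own data and is not the actual pandas implementation:

```python
import copy


class Plain:
    def __init__(self, data):
        self.data = data


class Custom(Plain):
    def __deepcopy__(self, memo=None):
        # Rebuild the object from its data only; anything else that was
        # attached to the instance is not carried over.
        return type(self)(copy.deepcopy(self.data, memo))


p = Plain([1, 2])
p.name = "foo"
c = Custom([1, 2])
c.name = "foo"

p2 = copy.deepcopy(p)  # default machinery copies the instance __dict__
c2 = copy.deepcopy(c)  # dispatches to Custom.__deepcopy__

print(p2.name)              # foo
print(hasattr(c2, "name"))  # False
```

The plain object keeps its ad hoc attribute because the default deep copy walks the instance dictionary, while the class with its own __deepcopy__ returns only what that method chooses to rebuild.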