Pandas _metadata of DataFrame persistence error

Submitted by 那年仲夏 on 2019-12-09 18:20:10

Question


I have finally figured out how to use _metadata on a DataFrame. Everything works, except that I am unable to persist it, for example to HDF5 or JSON. I know the mechanism works because when I copy the frame, the _metadata attributes carry over while "non-_metadata" attributes don't.

example

import pandas

df = pandas.DataFrame()  # make up a frame to your liking
pandas.DataFrame._metadata = ["testmeta"]  # class-level list of attribute names to propagate
df.testmeta = "testmetaval"
df.badmeta = "badmetaval"
newframe = df.copy()
newframe.testmeta  # --> outputs "testmetaval"
newframe.badmeta   # --> raises AttributeError

# json test
df.to_json(Path)
revivedjsonframe = pandas.read_json(Path)
revivedjsonframe.testmeta  # --> raises AttributeError

# hdf5 test
revivedhdf5frame.testmeta  # --> returns None

This person (https://stackoverflow.com/a/25715719/4473236) says it worked for him, but I'm new to this site (and to pandas) and can't post to that thread or ask him directly.


Answer 1:


_metadata is prefixed with an underscore, which means it's not part of the public API. It's not intended for user code -- we might break it in any future version of pandas without warning.

I would strongly recommend against using this "feature". For now, the best option for persisting metadata with a DataFrame is probably to write your own wrapper class and handle the persistence yourself.
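One way to read the wrapper-class suggestion is sketched here. It is only an illustration, not the answerer's code: the class name FrameWithMeta, its to_json/read_json methods, the roundtrip_demo helper, and the file layout (a "frame" key and a "meta" key in a single JSON file) are all assumptions made for the example.

```python
import json
import os
import tempfile

import pandas as pd


class FrameWithMeta:
    """Hypothetical wrapper: a DataFrame plus a plain dict of metadata."""

    def __init__(self, frame, meta=None):
        self.frame = frame
        self.meta = dict(meta or {})

    def to_json(self, path):
        # Store the frame (in "split" orientation) and the metadata side by side.
        payload = {
            "frame": json.loads(self.frame.to_json(orient="split")),
            "meta": self.meta,
        }
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)

    @classmethod
    def read_json(cls, path):
        with open(path, encoding="utf-8") as fh:
            payload = json.load(fh)
        split = payload["frame"]
        # Rebuild the frame from the data, index and columns written above.
        frame = pd.DataFrame(
            split["data"], index=split["index"], columns=split["columns"]
        )
        return cls(frame, payload["meta"])


def roundtrip_demo():
    """Write a wrapped frame to a temporary file and read it back."""
    wrapped = FrameWithMeta(
        pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]}),
        {"testmeta": "testmetaval"},
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "frame_with_meta.json")
        wrapped.to_json(path)
        return FrameWithMeta.read_json(path)


restored = roundtrip_demo()
print(restored.meta)
print(restored.frame)
```

Keeping the metadata in an ordinary dict next to the frame avoids relying on _metadata at all, at the cost of having to unwrap .frame whenever pandas operations are needed.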




Answer 2:


This is my code, which works using python 3.3.3.2 64-bit:

In [69]:

df = pd.DataFrame() #make up a frame to your liking
pd.DataFrame._metadata = ["testmeta"]
print(pd.DataFrame._metadata)
df.testmeta = "testmetaval"
df.badmeta = "badmetaval"
newframe = df.copy()
print(newframe.testmeta)
print("newframe", newframe.badmeta)
df.to_json(r'c:\data\test.json')
read_json = pd.read_json(r'c:\data\test.json')
read_json.testmeta
print(pd.version.version)
print(np.version.full_version)
Out[69]:

['testmeta']
testmetaval
newframe badmetaval
0.15.2
1.9.1

JSON contents as df:

In [70]:

read_json
Out[70]:
Empty DataFrame
Columns: []
Index: []
In [71]:

read_json.info()
<class 'pandas.core.frame.DataFrame'>
Float64Index: 0 entries
Empty DataFrame

In [72]:

read_json.testmeta
Out[72]:
'testmetaval'

Strangely, the JSON that is written is just an empty pair of braces:

{}

which would indicate that the metadata is actually being propagated by the class-level statement pd.DataFrame._metadata = ["testmeta"], not by the file.
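The observation about the file contents can be checked without writing to disk. This is a small sketch under a few assumptions: the helper name serialized_text and the non-empty example frame are made up for illustration, and it deliberately does not modify pd.DataFrame._metadata, so it only shows what ends up in the JSON text, not how the attribute reappeared in the session above.

```python
import pandas as pd


def serialized_text():
    """Return the JSON text pandas produces for an empty and a non-empty frame."""
    empty = pd.DataFrame()
    empty.testmeta = "testmetaval"  # plain instance attribute, as in the session
    filled = pd.DataFrame({"a": [1, 2]})  # hypothetical non-empty frame
    filled.testmeta = "testmetaval"
    # Called without a path, to_json() returns the JSON as a string.
    return empty.to_json(), filled.to_json()


empty_text, filled_text = serialized_text()
print(empty_text)
print("testmeta" in filled_text)
```

If the attribute name never appears in the serialized text, whatever value shows up on a frame read back from that file must have come from somewhere other than the file.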

It still seems to work if you overwrite the metadata attribute on a second DataFrame:

In [75]:

df.testmeta = 'foo'
df2 = pd.DataFrame()
df2.testmeta = 'bar'
read_json = pd.read_json(r'c:\data\test.json')
print(read_json.testmeta)
print(df2.testmeta)
testmetaval
bar


Source: https://stackoverflow.com/questions/28041762/pandas-metadata-of-dataframe-persistence-error
