Update pandas DataFrame in stored in a Pytable with another pandas DataFrame

流过昼夜 提交于 2019-12-22 06:37:40

问题


I am trying to create a function that updates a pandas DataFrame stored that I have stored in a PyTable with new data from a pandas DataFrame. I want to check if some data is missing in the PyTable for specific DatetimeIndexes (value is NaN or a new Timestamp is available), replace this with new values from a given pandas DataFrame and append this to the Pytable. Basically, just update a Pytable. I can get the combined DataFrame using the combine_first method in Pandas. Below the Pytable is created with dummy data:

import pandas as pd
import numpy as np
import datetime as dt
index = pd.DatetimeIndex(start = dt.datetime(2001,1,1,0,0), periods = 20000,freq='10T')
data_in_pytable = pd.DataFrame(index=index,data=np.random.randn(20000,2),columns=['value_1','value_2'])
data.to_hdf(r'C:\pytable.h5','test',mode='r+',append=True,complevel=9,complib='zlib')

So the pytable is created. Assuming I have another dataFrame with which I want to update the Pytable with:

new_index = pd.DatetimeIndex(start = dt.datetime(2001,5,1,0,0), periods = 10000,freq='10T')
data_to_update=pd.DataFrame(index=new_index,data=np.random.randn(10000,2),columns=['value_1','value_2'])
store=pd.HDFStore(r'C:\pytable.h5',mode='r+',complevel=9,complib='zlib')
store.append('test',store.select('test').combine_first(data_to_update))
store.close()

The problem is that the PyTable keeps the original values, does not update the existing ones. I now have duplicate entries (by index) because the original values are not overwritten.

Summary: How can I update a PyTable with another DataFrame?

Thanks, Elv


回答1:


In the end, I found it out myself. In my case, when it is okay to overwrite the entire Node as the "combine_first" gets you the original and the new values', it is fine to use

store.put(key,value,table=True,append=False) 

instead of the

store.append(key,value).



回答2:


This is not supported currently. PyTables does support an update method, but is not implemented in pandas.

Easiest method is to either use mode='w' and write a new file or

store.remove(key); store.append(.....)

HDF5 is not a 'regular' database and updating is not a common operation, if you need that SQL might be an option.

Feel free to request update as an enhancement as an issue.



来源:https://stackoverflow.com/questions/17020987/update-pandas-dataframe-in-stored-in-a-pytable-with-another-pandas-dataframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!