I'm trying to write data from a Pandas dataframe into a nested hdf5 file, with multiple groups and datasets within each group. I'd like to keep it as a single file which w
df.to_hdf() expects a string as its key parameter (the second parameter):
key : string
    identifier for the group in the store
So try this:
df.to_hdf('database.h5', key=ds.name, format='table', mode='a')
where ds.name should return a string (the key name):
In [26]: ds.name
Out[26]: '/A1'
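As a minimal sketch of the same idea without h5py: the key is just a string, and slashes in it produce nested groups. The file path and the key names below are hypothetical, chosen only for illustration, and the example assumes PyTables is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file location and key names, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "database.h5")
df = pd.DataFrame(np.random.randn(8, 3), columns=["Col1", "Col2", "Col3"])

# Each key is a plain string; the slashes create nested groups in the file.
for key in ["/A1/x", "/A1/y", "/B1/x"]:
    df.to_hdf(path, key=key, format="table", mode="a")

# Reopen the file read-only and list what was stored.
with pd.HDFStore(path, mode="r") as store:
    keys = sorted(store.keys())
    print(keys)
```

Because mode='a' appends to an existing file, every call adds a new node to the same single HDF5 file.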
I thought I'd have a go with pandas/PyTables and the HDFStore class instead of h5py. So I tried the following:
import numpy as np
import pandas as pd
db = pd.HDFStore('Database.h5')
index = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=['Col1', 'Col2', 'Col3'])
groups = ['A', 'B', 'C']
subgroups = ['d', 'e', 'f']
for m in groups:
    for n in subgroups:
        # A key such as 'A/d' stores the frame under group A, node d
        db.put(m + '/' + n, df, format='table', data_columns=True)
It works: 9 groups (groups instead of datasets in PyTables, as opposed to h5py?) are created, from A/d to C/f. Columns and indexes are preserved and I can do the dataframe operations I need. I'm still wondering, though, whether this is an efficient way to retrieve data from a specific group, which will become huge in the future, i.e. operations like
db['A/d'].Col1[4:]
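For what it's worth, db['A/d'] reads the whole table into memory before the slice is applied. A possible alternative, sketched here under the assumption that the data was stored with format='table' (as above), is HDFStore.select, which can restrict the rows and columns read from disk. The temporary file path is hypothetical, and I have not benchmarked this on large data.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical temporary file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "Database.h5")
index = pd.date_range("1/1/2000", periods=8)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 3)), index=index,
                  columns=["Col1", "Col2", "Col3"])

with pd.HDFStore(path) as db:
    db.put("A/d", df, format="table", data_columns=True)

    # Loads the entire table, then slices in memory.
    full = db["A/d"].Col1[4:]

    # Reads only rows 4 onward, and only Col1, from disk.
    partial = db.select("A/d", start=4, columns=["Col1"])["Col1"]

    # data_columns=True also allows filtering on column values on disk.
    positive = db.select("A/d", where="Col1 > 0", columns=["Col1"])
```

For very large tables, select also accepts a chunksize argument to iterate over the result in pieces instead of loading it at once.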