How to write a Pandas DataFrame into an HDF5 dataset

既然无缘 2021-01-12 11:59

I'm trying to write data from a Pandas DataFrame into a nested HDF5 file, with multiple groups and datasets within each group. I'd like to keep it as a single file which w…

2 Answers
  • 2021-01-12 12:21

    df.to_hdf() expects a string as its key parameter (the second positional argument):

    key : string

    identifier for the group in the store

    so try this:

    df.to_hdf('database.h5', key=ds.name, format='table', mode='a')
    

    where ds.name should return a string (the key name):

    In [26]: ds.name
    Out[26]: '/A1'
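
    Since the key is just a path-like string, a minimal sketch of the round trip could look like the following (the file name 'database.h5' and key '/A1' are illustrative; format='table' is the current spelling of the older table=True):

    import numpy as np
    import pandas as pd

    # Write a DataFrame under an explicit string key, appending to the file
    df = pd.DataFrame(np.random.randn(4, 2), columns=['x', 'y'])
    df.to_hdf('database.h5', key='/A1', format='table', mode='a')

    # Read it back by the same key
    restored = pd.read_hdf('database.h5', key='/A1')
    print(restored.equals(df))  # True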
    
  • 2021-01-12 12:26

    I thought I'd have a go with pandas/PyTables and the HDFStore class instead of h5py, so I tried the following:

    import numpy as np
    import pandas as pd

    # Open (or create) the HDF5 store
    db = pd.HDFStore('Database.h5')

    index = pd.date_range('1/1/2000', periods=8)
    df = pd.DataFrame(np.random.randn(8, 3), index=index,
                      columns=['Col1', 'Col2', 'Col3'])

    groups = ['A', 'B', 'C']
    subgroups = ['d', 'e', 'f']

    # Store the frame under each group/subgroup key, e.g. 'A/d' ... 'C/f'
    for m in groups:
        for n in subgroups:
            db.put(m + '/' + n, df, format='table', data_columns=True)

    db.close()

    It works: 9 groups (groups rather than datasets, since this is PyTables rather than h5py?) are created, from A/d to C/f. Columns and indexes are preserved, and I can do the DataFrame operations I need. I'm still wondering, though, whether this is an efficient way to retrieve data from a specific group that will become huge in the future, i.e. operations like

    db['A/d'].Col1[4:]
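
    Because the tables were written with format='table' and data_columns=True, a more disk-friendly pattern for large stores is to push the selection into HDFStore.select (or pd.read_hdf with where=), so only the matching rows and columns are read rather than the whole table. A minimal sketch, reusing the Database.h5 file and the 'A/d' key from the snippet above:

    import pandas as pd

    db = pd.HDFStore('Database.h5', mode='r')

    # Pull only Col1, skipping the first 4 rows, without loading the full table
    col1_tail = db.select('A/d', columns=['Col1'], start=4)

    # Query on the DatetimeIndex and on a data column
    recent = db.select('A/d', where="index >= pd.Timestamp('2000-01-05') & Col1 > 0")

    db.close()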
    