Iteratively writing to HDF5 Stores in Pandas

别跟我提以往 2020-12-23 07:48

Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files:

Prepare some data:

         


        
2 Answers
  •  生来不讨喜
    2020-12-23 08:27

    1. As soon as the statement is executed, e.g. store['df'] = df. The close just closes the actual file (which will be closed for you if the process exits, but will print a warning message)

    2. Read the section http://pandas.pydata.org/pandas-docs/dev/io.html#storing-in-table-format

      It is generally not a good idea to put a LOT of nodes in an .h5 file. You probably want to append and create a smaller number of nodes.

      You can just iterate through your .csv files and store/append them one by one. Something like:

      for f in files:
          df = pd.read_csv(f)
          # to_hdf takes the file path and the node key; the frame is not passed again
          df.to_hdf('file.h5', key=f)
      

      Would be one way (creating a separate node for each file)

    3. Not appendable - once you write it, you can only retrieve it all at once, e.g. you cannot select a sub-section

      If you have a table, then you can do things like:

      pd.read_hdf('my_store.h5', 'a_table_node', where='index>100')
      

      which is like a database query, only getting part of the data

      Thus, a node written in the default (fixed) format is neither appendable nor queryable, while a table is both.
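
To tie the three points together, here is a minimal sketch of the "append to a small number of nodes" approach. It assumes PyTables is installed and that all CSV files share the same columns and dtypes. The function names, the node key "data", and the chunk size are made up for illustration and are not from the question. The with block closes the file on exit, which is the explicit close mentioned in point 1.

```python
import pandas as pd


def append_csvs_to_table(csv_paths, h5_path, key="data", chunksize=50_000):
    """Append every CSV in csv_paths to ONE table-format node.

    Returns the total number of rows in the node afterwards.
    (Hypothetical helper; name and defaults are assumptions.)
    """
    # mode="a" keeps whatever is already in the file
    with pd.HDFStore(h5_path, mode="a") as store:
        # Continue numbering after any rows that are already stored
        offset = store.get_storer(key).nrows if key in store else 0
        for path in csv_paths:
            # Read each file in pieces so it never has to fit in memory at once
            for chunk in pd.read_csv(path, chunksize=chunksize):
                # Give the rows one continuous integer index across all files,
                # so that a query such as 'index > N' is meaningful
                chunk.index = pd.RangeIndex(offset, offset + len(chunk))
                offset += len(chunk)
                # format="table" is what makes the node appendable and queryable
                store.append(key, chunk, format="table")
    return offset


def read_rows_after(h5_path, key, min_index):
    """Read only the rows whose index is greater than min_index."""
    return pd.read_hdf(h5_path, key, where=f"index > {min_index}")
```

Calling append_csvs_to_table(files, 'file.h5') and then read_rows_after('file.h5', 'data', 100) would load only the matching rows instead of the whole node. If the files contain string columns, later chunks with longer strings may fail to append unless min_itemsize is passed to store.append.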
