hdf5

Converting CSV file to HDF5 using pandas

不打扰是莪最后的温柔 submitted on 2021-02-08 11:01:05
Question: When I use pandas to convert CSV files to HDF5 files, the resulting file is extremely large. For example, a test CSV file (23 columns, 1.3 million rows) of 170 MB results in an HDF5 file of 2 GB. However, if pandas is bypassed and the HDF5 file is written directly (using PyTables), it is only 20 MB. In the following code (used to do the conversion in pandas), the values of the object columns in the dataframe are explicitly converted to string objects (to prevent pickling):

    # Open the csv file
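For reference, a minimal sketch of such a conversion (not the asker's full script; the file names, column handling, and compression settings are assumptions): casting object columns to str and writing in table format with compression usually keeps the HDF5 file close to the CSV in size.

    import pandas as pd

    df = pd.read_csv("data.csv")
    # Cast object columns to plain strings so HDFStore stores them as
    # fixed-width strings instead of pickling each value.
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].astype(str)

    df.to_hdf("data.h5", key="data",
              format="table",   # table format supports compression and queries
              complib="blosc", complevel=9)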

Python 2.7: Appending Data to Table in Pandas

假如想象 submitted on 2021-02-08 09:29:14
Question: I am reading data from image files and I want to append this data into a single HDF file. Here is my code:

    datafile = pd.HDFStore(os.path.join(path, 'imageData.h5'))
    for file in fileList:
        data = {'X Position': pd.Series(xpos, index=index1),
                'Y Position': pd.Series(ypos, index=index1),
                'Major Axis Length': pd.Series(major, index=index1),
                'Minor Axis Length': pd.Series(minor, index=index1),
                'X Velocity': pd.Series(xVelocity, index=index1),
                'Y Velocity': pd.Series(yVelocity, index=index1)}
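A minimal sketch of the usual approach: wrap the dict in a DataFrame and call HDFStore.append, which accumulates rows in one growing table instead of overwriting it. The per-image values come from the question; the read_image helper is a hypothetical placeholder.

    import pandas as pd

    with pd.HDFStore('imageData.h5') as store:
        for file in fileList:
            # hypothetical helper standing in for the asker's image processing
            xpos, ypos, major, minor, xVelocity, yVelocity, index1 = read_image(file)
            frame = pd.DataFrame({'X Position': pd.Series(xpos, index=index1),
                                  'Y Position': pd.Series(ypos, index=index1),
                                  'Major Axis Length': pd.Series(major, index=index1),
                                  'Minor Axis Length': pd.Series(minor, index=index1),
                                  'X Velocity': pd.Series(xVelocity, index=index1),
                                  'Y Velocity': pd.Series(yVelocity, index=index1)})
            # append (rather than put) requires format='table' and adds rows
            # to a single dataset across loop iterations
            store.append('frames', frame, format='table')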

HDF5 file (h5py) with version control - hash changes on every save

自作多情 submitted on 2021-02-07 13:49:39
Question: I am using h5py to store intermediate data from numerical work in an HDF5 file. I have the project under version control, but this doesn't work well with the HDF5 files, because every time a script that generates an HDF5 file is re-run, the binary file changes even if the data within does not. Here is a small example to illustrate this:

    In [1]: import h5py, numpy as np
    In [2]: A = np.arange(5)
    In [3]: f = h5py.File('test.h5', 'w'); f['A'] = A; f.close()
    In [4]: !md5sum test.h5
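The hash changes because HDF5 embeds per-object modification timestamps in the file. A minimal sketch of one known workaround: create datasets with track_times=False (an existing h5py create_dataset keyword), so identical data yields byte-identical files.

    import h5py, numpy as np

    A = np.arange(5)
    with h5py.File('test.h5', 'w') as f:
        # Disable HDF5's object modification timestamps so re-running the
        # script with unchanged data produces the same bytes (and same hash).
        f.create_dataset('A', data=A, track_times=False)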

Permission denied while starting HDF5 library on Python

喜你入骨 submitted on 2021-02-07 12:47:42
Question: I'm working on Unix and I need to use HDF5 to store data. According to h5py's quick-start guide, http://docs.h5py.org/en/latest/quick.html#quick, one should start by creating a new file.

    import h5py
    import numpy as np
    f = h5py.File("mytestfile.hdf5", "w")

However, as soon as I run this code I get a strange error:

    IOError: Unable to create file (Unable to open file: name = 'mytestfile.hdf5',
    errno = 13, error message = 'permission denied', flags = 13, o_flags = 602)

I don't get the meaning of the error.
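errno = 13 is the OS-level "permission denied": the process cannot create a file in the current working directory. A minimal sketch of the common fix (the temporary-directory path is an assumption): point h5py at a location the process can write to.

    import os, tempfile
    import h5py
    import numpy as np

    # Create the file somewhere the process is guaranteed write access.
    path = os.path.join(tempfile.gettempdir(), "mytestfile.hdf5")
    with h5py.File(path, "w") as f:
        f.create_dataset("dset", data=np.arange(10))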

save multiple pd.DataFrames with hierarchy to hdf5

◇◆丶佛笑我妖孽 submitted on 2021-02-07 09:48:30
Question: I have multiple pd.DataFrames that have a hierarchical organization. Let's say I have:

    day_temperature_london_df = pd.DataFrame(...)
    night_temperature_london_df = pd.DataFrame(...)
    day_temperature_paris_df = pd.DataFrame(...)
    night_temperature_paris_df = pd.DataFrame(...)

I want to group them in an HDF5 file so that two of them go into a group 'london' and the other two into 'paris'. If I use h5py I lose the format of the pd.DataFrame, losing the indexes and columns.

    f = h5py.File("temperature.h5", "w")
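A minimal sketch of the pandas-native alternative: HDFStore keys behave like paths, so 'london/day' and 'london/night' group the frames inside the file while preserving indexes and columns (the DataFrame contents below are placeholders).

    import pandas as pd

    day_temperature_london_df = pd.DataFrame({"temp": [10.1, 11.3]})
    night_temperature_london_df = pd.DataFrame({"temp": [4.2, 5.0]})
    day_temperature_paris_df = pd.DataFrame({"temp": [12.0, 13.5]})
    night_temperature_paris_df = pd.DataFrame({"temp": [6.1, 6.8]})

    with pd.HDFStore("temperature.h5", "w") as store:
        # keys act like paths, creating the 'london' and 'paris' groups
        store.put("london/day", day_temperature_london_df)
        store.put("london/night", night_temperature_london_df)
        store.put("paris/day", day_temperature_paris_df)
        store.put("paris/night", night_temperature_paris_df)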

HDF5: How to append data to a dataset (extensible array)

穿精又带淫゛_ submitted on 2021-02-07 08:44:55
Question: By following this tutorial, I've tried to extend my HDF5 dataset. The code is the following; however, the data is not properly written to the dataset (the dataset has the proper final size but contains only zeros). The only difference from the tutorial is that I have to use dynamic arrays. Any idea?

    int main() {
        hsize_t dims[1], max_dims[1], newdims[1], chunk_dims[1], offset[1];
        hid_t file, file_space, plist, dataset, mem_space;
        int32_t *buffer1, *buffer2;
        file = H5Fcreate("test.h5", H5F_ACC
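The question uses the HDF5 C API; as a point of comparison, here is the same append-to-an-extensible-dataset pattern sketched in h5py (Python), which a correct run should mirror, leaving both buffers' values rather than zeros in the file. Names and sizes are illustrative.

    import h5py
    import numpy as np

    buffer1 = np.arange(10, dtype=np.int32)
    buffer2 = np.arange(10, 20, dtype=np.int32)

    with h5py.File("test.h5", "w") as f:
        # An unlimited (extensible) dimension requires chunked storage.
        dset = f.create_dataset("data", shape=(10,), maxshape=(None,),
                                chunks=(10,), dtype="i4")
        dset[:] = buffer1
        dset.resize((20,))       # extend the dataset
        dset[10:20] = buffer2    # write only into the newly added region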

compressed files bigger in h5py

蓝咒 submitted on 2021-02-07 07:29:57
Question: I'm using h5py to save numpy arrays in HDF5 format from Python. Recently, I tried to apply compression, and the files I get are bigger... I went from things (every file has several datasets) like this

    self._h5_current_frame.create_dataset(
        'estimated position', shape=estimated_pos.shape,
        dtype=float, data=estimated_pos)

to things like this

    self._h5_current_frame.create_dataset(
        'estimated position', shape=estimated_pos.shape,
        dtype=float, data=estimated_pos, compression="gzip",
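The usual cause: compression forces chunked storage, and for very small datasets the chunk index and B-tree overhead outweighs any savings. A minimal sketch demonstrating the effect (file names and the tiny three-element array are illustrative):

    import os
    import h5py
    import numpy as np

    pos = np.random.rand(3)  # a tiny dataset, e.g. one estimated position

    with h5py.File("plain.h5", "w") as f:
        f.create_dataset("estimated position", data=pos)  # contiguous storage

    with h5py.File("gzipped.h5", "w") as f:
        # gzip implies chunked storage, whose per-dataset overhead dwarfs
        # a 24-byte payload
        f.create_dataset("estimated position", data=pos,
                         compression="gzip", compression_opts=9)

    print(os.path.getsize("plain.h5"), os.path.getsize("gzipped.h5"))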

reading nested .h5 group into numpy array

戏子无情 submitted on 2021-02-06 11:53:53
Question: I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found many questions and answers here about reading these files, but I couldn't find a way to get to the lower levels of the groups or folders the file contains. The file contains two main folders, i.e. X and Y. X contains a folder named 0, which contains two folders named A and B. Y contains ten folders named 1-10. The data I want to
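A minimal sketch of descending into nested groups with h5py, following the layout described in the question (X/0/A, X/0/B, Y/1 ... Y/10). Whether A, B, and 1-10 hold datasets directly is an assumption; reading a dataset with [...] returns it as a numpy array.

    import h5py

    with h5py.File("data.h5", "r") as f:
        # Path syntax walks nested groups; assumes A is a dataset
        # (if it is another group, descend one more level).
        a = f["X/0/A"][...]
        y1 = f["Y/1"][...]

        # When the layout is unknown, list every dataset recursively:
        def show(name, obj):
            if isinstance(obj, h5py.Dataset):
                print(name, obj.shape, obj.dtype)
        f.visititems(show)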