hdf5

Converting CSV file to HDF5 using pandas

不打扰是莪最后的温柔 submitted on 2021-02-08 11:01:05
Question: When I use pandas to convert CSV files to HDF5 files, the resulting file is extremely large. For example, a test CSV file (23 columns, 1.3 million rows) of 170 MB results in an HDF5 file of 2 GB. However, if pandas is bypassed and the HDF5 file is written directly (using PyTables), it is only 20 MB. In the following code (used to do the conversion in pandas), the values of the object columns in the dataframe are explicitly converted to string objects (to prevent pickling):

    # Open the csv file
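For reference, a minimal sketch of such a conversion (not the asker's full script; the file names, column handling, and compression settings are assumptions): casting object columns to str and writing in table format with compression usually keeps the HDF5 file close to the CSV in size.

    import pandas as pd

    df = pd.read_csv("data.csv")
    # Cast object columns to plain strings so HDFStore stores them as
    # fixed-width strings instead of pickling each value.
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].astype(str)

    df.to_hdf("data.h5", key="data",
              format="table",   # table format supports compression and queries
              complib="blosc", complevel=9)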

Python 2.7: Appending Data to Table in Pandas

假如想象 submitted on 2021-02-08 09:29:14
Question: I am reading data from image files and I want to append this data into a single HDF file. Here is my code:

    datafile = pd.HDFStore(os.path.join(path, 'imageData.h5'))
    for file in fileList:
        data = {'X Position': pd.Series(xpos, index=index1),
                'Y Position': pd.Series(ypos, index=index1),
                'Major Axis Length': pd.Series(major, index=index1),
                'Minor Axis Length': pd.Series(minor, index=index1),
                'X Velocity': pd.Series(xVelocity, index=index1),
                'Y Velocity': pd.Series(yVelocity, index=index1)}
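A minimal sketch of the usual approach: wrap the dict in a DataFrame and call HDFStore.append, which accumulates rows in one growing table instead of overwriting it. The per-image values come from the question; the read_image helper is a hypothetical placeholder.

    import pandas as pd

    with pd.HDFStore('imageData.h5') as store:
        for file in fileList:
            # hypothetical helper standing in for the asker's image processing
            xpos, ypos, major, minor, xVelocity, yVelocity, index1 = read_image(file)
            frame = pd.DataFrame({'X Position': pd.Series(xpos, index=index1),
                                  'Y Position': pd.Series(ypos, index=index1),
                                  'Major Axis Length': pd.Series(major, index=index1),
                                  'Minor Axis Length': pd.Series(minor, index=index1),
                                  'X Velocity': pd.Series(xVelocity, index=index1),
                                  'Y Velocity': pd.Series(yVelocity, index=index1)})
            # append (rather than put) requires format='table' and adds rows
            # to a single dataset across loop iterations
            store.append('frames', frame, format='table')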

HDF5 file (h5py) with version control - hash changes on every save

自作多情 submitted on 2021-02-07 13:49:39
Question: I am using h5py to store intermediate data from numerical work in an HDF5 file. I have the project under version control, but this doesn't work well with the HDF5 files, because every time a script that generates an HDF5 file is re-run, the binary file changes even if the data within does not. Here is a small example to illustrate this:

    In [1]: import h5py, numpy as np
    In [2]: A = np.arange(5)
    In [3]: f = h5py.File('test.h5', 'w'); f['A'] = A; f.close()
    In [4]: !md5sum test.h5
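The hash changes because HDF5 embeds per-object modification timestamps in the file. A minimal sketch of one known workaround: create datasets with track_times=False (an existing h5py create_dataset keyword), so identical data yields byte-identical files.

    import h5py, numpy as np

    A = np.arange(5)
    with h5py.File('test.h5', 'w') as f:
        # Disable HDF5's object modification timestamps so re-running the
        # script with unchanged data produces the same bytes (and same hash).
        f.create_dataset('A', data=A, track_times=False)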

Permission denied while starting HDF5 library on Python

喜你入骨 submitted on 2021-02-07 12:47:42
Question: I'm working on Unix and I need to use HDF5 to store data. According to h5py's quick-start guide, http://docs.h5py.org/en/latest/quick.html#quick, one should start by creating a new file.

    import h5py
    import numpy as np
    f = h5py.File("mytestfile.hdf5", "w")

However, as soon as I run this code I get a strange error:

    IOError: Unable to create file (Unable to open file: name = 'mytestfile.hdf5',
    errno = 13, error message = 'permission denied', flags = 13, o_flags = 602)

I don't get the meaning of the error.
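errno = 13 is the OS-level "permission denied": the process cannot create a file in the current working directory. A minimal sketch of the common fix (the temporary-directory path is an assumption): point h5py at a location the process can write to.

    import os, tempfile
    import h5py
    import numpy as np

    # Create the file somewhere the process is guaranteed write access.
    path = os.path.join(tempfile.gettempdir(), "mytestfile.hdf5")
    with h5py.File(path, "w") as f:
        f.create_dataset("dset", data=np.arange(10))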

save multiple pd.DataFrames with hierarchy to hdf5

◇◆丶佛笑我妖孽 submitted on 2021-02-07 09:48:30
Question: I have multiple pd.DataFrames that have a hierarchical organization. Let's say I have:

    day_temperature_london_df = pd.DataFrame(...)
    night_temperature_london_df = pd.DataFrame(...)
    day_temperature_paris_df = pd.DataFrame(...)
    night_temperature_paris_df = pd.DataFrame(...)

I want to group them in an HDF5 file so that two of them go into a group 'london' and the other two into 'paris'. If I use h5py I lose the format of the pd.DataFrame, losing the indexes and columns.

    f = h5py.File("temperature.h5", "w")
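A minimal sketch of the pandas-native alternative: HDFStore keys behave like paths, so 'london/day' and 'london/night' group the frames inside the file while preserving indexes and columns (the DataFrame contents below are placeholders).

    import pandas as pd

    day_temperature_london_df = pd.DataFrame({"temp": [10.1, 11.3]})
    night_temperature_london_df = pd.DataFrame({"temp": [4.2, 5.0]})
    day_temperature_paris_df = pd.DataFrame({"temp": [12.0, 13.5]})
    night_temperature_paris_df = pd.DataFrame({"temp": [6.1, 6.8]})

    with pd.HDFStore("temperature.h5", "w") as store:
        # keys act like paths, creating the 'london' and 'paris' groups
        store.put("london/day", day_temperature_london_df)
        store.put("london/night", night_temperature_london_df)
        store.put("paris/day", day_temperature_paris_df)
        store.put("paris/night", night_temperature_paris_df)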

HDF5: How to append data to a dataset (extensible array)

穿精又带淫゛_ submitted on 2021-02-07 08:44:55
Question: By following this tutorial, I've tried to extend my HDF5 dataset. The code is the following; however, the data is not properly written to the dataset (the dataset has the proper final size but contains only zeros). The only difference from the tutorial is that I have to use dynamic arrays. Any idea?

    int main() {
        hsize_t dims[1], max_dims[1], newdims[1], chunk_dims[1], offset[1];
        hid_t file, file_space, plist, dataset, mem_space;
        int32_t *buffer1, *buffer2;
        file = H5Fcreate("test.h5", H5F_ACC
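The question uses the HDF5 C API; as a point of comparison, here is the same append-to-an-extensible-dataset pattern sketched in h5py (Python), which a correct run should mirror, leaving both buffers' values rather than zeros in the file. Names and sizes are illustrative.

    import h5py
    import numpy as np

    buffer1 = np.arange(10, dtype=np.int32)
    buffer2 = np.arange(10, 20, dtype=np.int32)

    with h5py.File("test.h5", "w") as f:
        # An unlimited (extensible) dimension requires chunked storage.
        dset = f.create_dataset("data", shape=(10,), maxshape=(None,),
                                chunks=(10,), dtype="i4")
        dset[:] = buffer1
        dset.resize((20,))       # extend the dataset
        dset[10:20] = buffer2    # write only into the newly added region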

compressed files bigger in h5py

蓝咒 submitted on 2021-02-07 07:29:57
Question: I'm using h5py to save numpy arrays in HDF5 format from Python. Recently, I tried to apply compression, and the files I get are bigger... I went from things (every file has several datasets) like this

    self._h5_current_frame.create_dataset(
        'estimated position', shape=estimated_pos.shape,
        dtype=float, data=estimated_pos)

to things like this

    self._h5_current_frame.create_dataset(
        'estimated position', shape=estimated_pos.shape,
        dtype=float, data=estimated_pos, compression="gzip",
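The usual cause: compression forces chunked storage, and for very small datasets the chunk index and B-tree overhead outweighs any savings. A minimal sketch demonstrating the effect (file names and the tiny three-element array are illustrative):

    import os
    import h5py
    import numpy as np

    pos = np.random.rand(3)  # a tiny dataset, e.g. one estimated position

    with h5py.File("plain.h5", "w") as f:
        f.create_dataset("estimated position", data=pos)  # contiguous storage

    with h5py.File("gzipped.h5", "w") as f:
        # gzip implies chunked storage, whose per-dataset overhead dwarfs
        # a 24-byte payload
        f.create_dataset("estimated position", data=pos,
                         compression="gzip", compression_opts=9)

    print(os.path.getsize("plain.h5"), os.path.getsize("gzipped.h5"))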

reading nested .h5 group into numpy array

戏子无情 submitted on 2021-02-06 11:53:53
Question: I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found many questions and answers here about reading these files, but I couldn't find a way to get to the lower levels of the groups or folders the file contains. The file contains two main folders, i.e. X and Y. X contains a folder named 0, which contains two folders named A and B. Y contains ten folders named 1-10. The data I want to
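A minimal sketch of descending into nested groups with h5py, following the layout described in the question (X/0/A, X/0/B, Y/1 ... Y/10). Whether A, B, and 1-10 hold datasets directly is an assumption; reading a dataset with [...] returns it as a numpy array.

    import h5py

    with h5py.File("data.h5", "r") as f:
        # Path syntax walks nested groups; assumes A is a dataset
        # (if it is another group, descend one more level).
        a = f["X/0/A"][...]
        y1 = f["Y/1"][...]

        # When the layout is unknown, list every dataset recursively:
        def show(name, obj):
            if isinstance(obj, h5py.Dataset):
                print(name, obj.shape, obj.dtype)
        f.visititems(show)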