h5py | 易学教程

HDF5 file (h5py) with version control - hash changes on every save

Question: I am using h5py to store intermediate data from numerical work in an HDF5 file. The project is under version control, but this does not work well with the HDF5 files: every time a script that generates an HDF5 file is re-run, the binary file changes even if the data within it does not. Here is a small example to illustrate this:

    In [1]: import h5py, numpy as np
    In [2]: A = np.arange(5)
    In [3]: f = h5py.File('test.h5', 'w'); f['A'] = A; f.close()
    In [4]: !md5sum test.h5
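One possible direction, sketched here under an assumption: HDF5 can record timestamps in a dataset's object header, and if that is what differs between runs, h5py's `track_times=False` argument to `create_dataset` turns it off. The helper names `write_array` and `md5_of` are hypothetical, and whether this alone makes the file byte-stable may depend on the h5py and HDF5 versions in use.

```python
import hashlib

import h5py
import numpy as np


def write_array(path, array):
    """Write `array` as dataset 'A' without per-dataset timestamps."""
    with h5py.File(path, "w") as f:
        # track_times=False asks HDF5 not to store timestamps for this dataset
        # (assumed here to be the source of the changing hash).
        f.create_dataset("A", data=array, track_times=False)


def md5_of(path):
    """Return the MD5 hex digest of a file, like `md5sum` would print."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()
```

Calling `write_array('test.h5', np.arange(5))` twice and comparing `md5_of('test.h5')` after each run is a quick way to check whether the hash still changes in a given environment.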

save multiple pd.DataFrames with hierarchy to hdf5

Question: I have multiple pd.DataFrames that are organized hierarchically. Let's say I have:

    day_temperature_london_df = pd.DataFrame(...)
    night_temperature_london_df = pd.DataFrame(...)
    day_temperature_paris_df = pd.DataFrame(...)
    night_temperature_paris_df = pd.DataFrame(...)

I want to group them in an HDF5 file so that two of them go into the group 'london' and the other two into 'paris'. If I use h5py, I lose the format of the pd.DataFrame, including its indexes and columns.

    f = h5py.File("temperature.h5", "w")

compressed files bigger in h5py

Question: I'm using h5py to save NumPy arrays in HDF5 format from Python. Recently I tried to apply compression, and the files I get are bigger... I went from calls like this (every file has several datasets):

    self._h5_current_frame.create_dataset(
        'estimated position', shape=estimated_pos.shape,
        dtype=float, data=estimated_pos)

to calls like this:

    self._h5_current_frame.create_dataset(
        'estimated position', shape=estimated_pos.shape,
        dtype=float, data=estimated_pos, compression="gzip",

reading nested .h5 group into numpy array

Question: I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found many questions and answers here about reading these files, but I couldn't find a way to get to the lower levels of the groups (folders) the file contains. The file contains two main folders, X and Y. X contains a folder named 0, which contains two folders named A and B. Y contains ten folders named 1-10. The data I want to
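For a nested layout like the one described, one generic approach is to walk every group and collect the datasets. The sketch below uses h5py's `Group.visititems`; the function name `load_datasets` is hypothetical, and it assumes every dataset is small enough to load into memory in full.

```python
import h5py


def load_datasets(path):
    """Return {dataset path inside the file: NumPy array} for every dataset."""
    arrays = {}

    def collect(name, obj):
        # visititems calls this for every group and dataset below the root;
        # `name` is the path relative to the root, e.g. "X/0/A".
        if isinstance(obj, h5py.Dataset):
            arrays[name] = obj[()]  # read the whole dataset into memory

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return arrays
```

A single known dataset can also be read directly by its full path, for example `f["X/0/A"][()]`, if that path turns out to be a dataset rather than a group.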

TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))

Question: My model uses pre-processed data to predict whether a customer is a private or non-private customer. The pre-processing step uses calls like feature_column.bucketized_column(…), feature_column.embedding_column(…) and so on. After training, I try to save the model but get the following error:

    File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
    File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
    File "h5py\h5o.pyx", line 202, in h5py.h5o.link
    OSError

I want to convert very large csv data to hdf5 in python

Question: I have a very large CSV file. Each row looks like this: [Date, Firm name, value 1, value 2, ..., value 60]. I want to convert it to an HDF5 file. For example, say I have two dates (2019-07-01, 2019-07-02), each date has 3 firms (firm 1, firm 2, firm 3), and each firm has [value 1, value 2, ... value 60]. I want to use the date and firm name as groups. Specifically, I want this hierarchy: 'Date/Firm name'. For example, 2019-07-01 has firm 1, firm 2, and firm 3. When you look at each firm, there
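A minimal sketch of the 'Date/Firm name' hierarchy, streaming the CSV one row at a time so the whole file never has to fit in memory. It makes several assumptions not stated in the question: the CSV has no header row, each (date, firm) pair appears only once, and the numeric values are stored in a dataset with the hypothetical name `values`. The function name `csv_to_hdf5` is also made up for illustration.

```python
import csv

import h5py
import numpy as np


def csv_to_hdf5(csv_path, h5_path):
    """Copy rows of [date, firm, value 1, ...] into 'date/firm' groups."""
    with open(csv_path, newline="") as src, h5py.File(h5_path, "w") as dst:
        for row in csv.reader(src):
            date, firm = row[0], row[1]
            values = np.array([float(v) for v in row[2:]])
            # require_group creates the date group and the firm group
            # if they do not exist yet, and reuses them otherwise.
            group = dst.require_group(f"{date}/{firm}")
            # Assumed dataset name; fails if the same pair occurs twice.
            group.create_dataset("values", data=values)
```

Note that a firm name containing "/" would be interpreted as a further level of nesting, so such names would need to be escaped first.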

How in python 3.6 to get data array from hdf5 file if dtype is “<u4”?

Question: I want to get a dataset with shape {N, 16, 512, 128} as a 4D NumPy array from an HDF5 file. N is the number of 3D arrays of shape {16, 512, 128}. I tried this:

    import os
    import sys
    import h5py as h5
    import numpy as np
    import subprocess
    import re

    file_name = sys.argv[1]
    path = sys.argv[2]
    f = h5.File(file_name, 'r')
    data = f[path]
    print(data.shape)  # {27270, 16, 512, 128}
    print(data.dtype)  # "<u4"
    data = np.array(data, dtype=np.uint32)
    print(data.shape)

Unfortunately, after data = np.array(data,
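The excerpt is cut off before the actual failure is described. If the problem is memory, note that a uint32 array of shape (27270, 16, 512, 128) holds about 28.6 billion elements, roughly 114 GB, so converting it in one call is unlikely to fit in RAM. The sketch below reads the dataset in slabs along its first axis instead; the name `iter_blocks` and the default block size are hypothetical choices.

```python
import h5py


def iter_blocks(file_name, path, block=64):
    """Yield consecutive slabs of a dataset along axis 0 as NumPy arrays."""
    with h5py.File(file_name, "r") as f:
        data = f[path]
        for start in range(0, data.shape[0], block):
            # Slicing an h5py dataset reads only that part from disk.
            yield data[start:start + block]
```

Each yielded slab keeps the dataset's dtype and has shape (up to `block`, 16, 512, 128) for the dataset in the question, so it can be processed and discarded before the next one is read.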

Python: Can I write to a file without loading its contents in RAM?

Question: I have a big data set that I want to shuffle. The entire set won't fit into RAM, so it would be good if I could open several files (e.g. HDF5, NumPy) simultaneously, loop through my data chronologically, and randomly assign each data point to one of the piles (then shuffle each pile afterwards). I'm really inexperienced with working with data in Python, so I'm not sure whether it's possible to write to files without holding the rest of their contents in RAM (been using np.save and savez with little
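With HDF5 this kind of incremental writing is possible: a dataset created with an unlimited first dimension can be grown and written in place, without loading what is already on disk. A minimal sketch follows, assuming every data point is a fixed-length row of floats; the names `append_rows`, `scatter_to_piles`, the dataset name `data`, and the chunk size are all hypothetical.

```python
import h5py
import numpy as np


def append_rows(dataset, rows):
    """Grow a resizable dataset along axis 0 and write `rows` at the end."""
    rows = np.atleast_2d(rows)
    old = dataset.shape[0]
    dataset.resize(old + rows.shape[0], axis=0)
    dataset[old:] = rows


def scatter_to_piles(points, pile_paths, width, seed=0):
    """Randomly assign each point (a row of `width` floats) to one pile file."""
    rng = np.random.default_rng(seed)
    files = [h5py.File(p, "w") for p in pile_paths]
    try:
        piles = [
            # maxshape=(None, width) makes the first axis unlimited.
            f.create_dataset("data", shape=(0, width), maxshape=(None, width),
                             dtype="f8", chunks=(1024, width))
            for f in files
        ]
        for point in points:
            append_rows(piles[rng.integers(len(piles))], point)
    finally:
        for f in files:
            f.close()
```

Appending one row at a time keeps the sketch short but is slow; buffering a few thousand points per pile and appending them in one call would likely be much faster.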