h5py

Python: Can I write to a file without loading its contents in RAM?

淺唱寂寞╮ submitted on 2021-01-29 07:56:36
Question: I've got a big dataset that I want to shuffle. The entire set won't fit into RAM, so it would be good if I could open several files (e.g. hdf5, numpy) simultaneously, loop through my data chronologically and randomly assign each data point to one of the piles (then afterwards shuffle each pile). I'm really inexperienced with working with data in Python, so I'm not sure if it's possible to write to files without holding the rest of their contents in RAM (I've been using np.save and savez with little …
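A minimal sketch of one way to do this with h5py alone (the source file name 'big_data.h5', its dataset name 'data', and the slice size are assumptions, not taken from the question): create one resizable dataset per pile and stream modest slices through RAM, appending each slice's rows to a randomly chosen pile.

import numpy as np
import h5py

k = 4  # number of piles
with h5py.File('big_data.h5', 'r') as src:
    n, dim = src['data'].shape
    piles = [h5py.File('pile_%d.h5' % i, 'w') for i in range(k)]
    dsets = [p.create_dataset('data', shape=(0, dim), maxshape=(None, dim),
                              chunks=(4096, dim)) for p in piles]
    for start in range(0, n, 10000):               # only ~10000 rows in RAM at once
        block = src['data'][start:start + 10000]
        targets = np.random.randint(0, k, len(block))
        for i, ds in enumerate(dsets):
            rows = block[targets == i]
            if len(rows) == 0:
                continue
            ds.resize(ds.shape[0] + len(rows), axis=0)   # grow the pile
            ds[-len(rows):] = rows                       # append this slice's rows
    for p in piles:
        p.close()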

Adding new data into HDF5 file results an empty array

点点圈 submitted on 2021-01-28 19:00:59
Question: While playing with the HDF5 package for Python I discovered a strange behavior. I want to insert more data into a table, but somehow I cannot get it to work properly. As you can see from the source code, I am getting the last row of data in key 'X' using fromRow = hf["X"].shape[0] and writing tempArray2 afterwards. The result is an empty table. import h5py tempArray1 = [[0.9293237924575806, -0.32789671421051025, 0.18110771477222443], [0.9293237924575806, -0.32789671421051025, 0.18110771477222443], [0 …
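A minimal sketch of the usual fix (assuming the dataset 'X' holds rows of 3 floats, as in the question's arrays, and a placeholder file name): the dataset has to be created with an extendable first axis (maxshape) and then resized before the new rows are written, otherwise the write lands outside the current shape.

import numpy as np
import h5py

tempArray2 = np.random.rand(5, 3)            # stand-in for the new rows
with h5py.File('example.h5', 'a') as hf:
    if 'X' not in hf:
        hf.create_dataset('X', data=tempArray2, maxshape=(None, 3))
    else:
        fromRow = hf['X'].shape[0]                          # current row count
        hf['X'].resize(fromRow + len(tempArray2), axis=0)   # make room first
        hf['X'][fromRow:] = tempArray2                      # then append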

AttributeError: 'int' object has no attribute 'encode' HDF5

夙愿已清 submitted on 2021-01-28 05:41:06
Question: I'm trying to open an HDF5 file in Python using the following code: with h5py.File('example.hdf5', 'r') as f: ls = list(f.keys()) dat = f.get('data') dt = np.array(dat) However, I get this error when executing the last line: AttributeError: 'int' object has no attribute 'encode'. dat has the following class: h5py._hl.group.Group. Does anyone know where the error could come from? The output from iterating inside the file is the following. How can I access each part of the file: checking hdf5 file …
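A minimal sketch of what usually resolves this (the member names read out below are hypothetical): 'data' here is an h5py Group, not a Dataset, so np.array(dat) cannot read it directly; walk the group and read each member dataset with [()] instead.

import h5py

with h5py.File('example.hdf5', 'r') as f:
    grp = f['data']                        # this is a Group, not a Dataset
    arrays = {}
    for name, item in grp.items():
        if isinstance(item, h5py.Dataset):
            arrays[name] = item[()]        # read the dataset into a NumPy array
    print({k: v.shape for k, v in arrays.items()})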

Is there a way to open hdf5 files with the POSIX_FADV_DONTNEED flag?

隐身守侯 submitted on 2021-01-27 20:35:50
Question: We are working with large (1.2 TB) uncompressed, unchunked hdf5 files with h5py in Python for a machine learning application, which requires us to work through the full dataset repeatedly, loading slices of ~15 MB individually in a randomized order. We are working on a Linux (Ubuntu 18.04) machine with 192 GB RAM. We noticed that the program is slowly filling the cache. When the total size of the cache reaches a size comparable to the full machine RAM (free memory in top is almost 0 but plenty is 'available' …
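h5py itself has no open flag for this, but on Linux one workaround is to open a second file descriptor on the same path and periodically call posix_fadvise with POSIX_FADV_DONTNEED so the kernel may evict the file's cached pages. A minimal sketch (path, dataset name and slice size are placeholders; whether pages are actually dropped is up to the kernel):

import os
import h5py

path = 'big_file.h5'
fd = os.open(path, os.O_RDONLY)            # extra descriptor used only for fadvise
with h5py.File(path, 'r') as f:
    dset = f['data']
    for start in range(0, dset.shape[0], 1000):
        chunk = dset[start:start + 1000]
        # ... feed `chunk` to the training code ...
        # offset 0, length 0 means "from the start to the end of the file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)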

Read h5 file using AWS S3 s3fs/boto3

*爱你&永不变心* submitted on 2021-01-27 07:06:49
Question: I am trying to read an h5 file from AWS S3. I am getting the following errors using s3fs/boto3. Can you help? Thanks! import s3fs fs = s3fs.S3FileSystem(anon=False, key='key', secret='secret') with fs.open('file', mode='rb') as f: h5 = pd.read_hdf(f) TypeError: expected str, bytes or os.PathLike object, not S3File fs = s3fs.S3FileSystem(anon=False, key='key', secret='secret') with fs.open('file', mode='rb') as f: hf = h5py.File(f) TypeError: expected str, bytes or os.PathLike object, not S3File
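A minimal sketch of two common workarounds (bucket/key names are placeholders): recent h5py versions (2.9+) accept Python file-like objects, so the s3fs file can be handed to h5py.File directly; alternatively the object can be read into an in-memory buffer first.

import io
import s3fs
import h5py

fs = s3fs.S3FileSystem(anon=False, key='key', secret='secret')

# option 1: let h5py read through the s3fs file object (needs h5py >= 2.9)
with fs.open('my-bucket/file.h5', 'rb') as f:
    with h5py.File(f, 'r') as hf:
        print(list(hf.keys()))

# option 2: pull the whole object into memory and open the buffer
with fs.open('my-bucket/file.h5', 'rb') as f:
    buf = io.BytesIO(f.read())
with h5py.File(buf, 'r') as hf:
    print(list(hf.keys()))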

h5py randomly unable to open object (component not found)

生来就可爱ヽ(ⅴ<●) submitted on 2021-01-07 02:55:48
Question: I'm trying to load hdf5 datasets into a PyTorch training for-loop. Regardless of num_workers in the dataloader, this randomly throws "KeyError: 'Unable to open object (component not found)'" (traceback below). I'm able to start the training loop, but not able to get through 1/4 of one epoch without this error, which happens for random 'datasets' (which are 2D arrays each). I'm able to separately load these arrays in the console using the regular f['group/subroup'][()], so it doesn't appear like the …
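A minimal sketch of the workaround that usually fixes this kind of error (class layout, path and key names are assumptions): open the HDF5 file lazily inside __getitem__, once per DataLoader worker, instead of in __init__, so a single handle is never shared across forked worker processes.

import h5py
import torch
from torch.utils.data import Dataset

class H5Dataset(Dataset):
    def __init__(self, path, keys):
        self.path = path
        self.keys = keys            # dataset names such as 'group/subgroup'
        self._file = None           # opened lazily, per worker

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.path, 'r')
        return torch.from_numpy(self._file[self.keys[idx]][()])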

parallel write to different groups with h5py

拟墨画扇 submitted on 2020-12-15 06:18:41
Question: I'm trying to use parallel h5py to create an independent group for each process and fill each group with some data. What happens is that only one group gets created and filled with data. This is the program: from mpi4py import MPI import h5py rank = MPI.COMM_WORLD.Get_rank() f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD) data = range(1000) dset = f.create_dataset(str(rank), data=data) f.close() Any thoughts on what is going wrong here? Thanks a lot. Answer 1: Ok, so as …
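A minimal sketch of the usual fix (requires h5py built with parallel HDF5 support): dataset creation is a collective operation, so every rank must create every dataset; each rank then writes only its own dataset independently.

import numpy as np
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

with h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=comm) as f:
    # every rank takes part in creating all datasets (collective metadata call)
    dsets = [f.create_dataset(str(r), (1000,), dtype='i')
             for r in range(comm.Get_size())]
    # each rank fills only its own dataset (independent write)
    dsets[rank][:] = np.arange(1000)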

RuntimeError: Unable to create link (name already exists) when I append hdf5 file?

让人想犯罪 __ submitted on 2020-11-29 03:30:45
Question: I am trying to append an hdf5 dataset to a previous hdf5 dataset, and the following error occurred: h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl) File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper File "h5py/h5o.pyx", line 202, in h5py.h5o.link RuntimeError: Unable to create link (name already exists) sal_maps = np.array([], dtype=np.float32).reshape((0,) + img_size) probs = np.array([], dtype=np …
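A minimal sketch of how this error is usually avoided (img_size, batch shapes and the output file name are placeholders that mirror the question's variables): create_dataset fails with "name already exists" on the second call, so create each dataset once with an extendable first axis and resize it on every later write.

import numpy as np
import h5py

img_size = (224, 224)

def append_to(f, name, batch):
    if name not in f:
        f.create_dataset(name, data=batch, chunks=True,
                         maxshape=(None,) + batch.shape[1:])
    else:
        d = f[name]
        d.resize(d.shape[0] + len(batch), axis=0)   # extend first, then write
        d[-len(batch):] = batch

with h5py.File('results.h5', 'a') as f:
    sal_maps = np.random.rand(8, *img_size).astype(np.float32)
    probs = np.random.rand(8, 10).astype(np.float32)
    append_to(f, 'sal_maps', sal_maps)
    append_to(f, 'probs', probs)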
