I want to read a dataset of shape {N, 16, 512, 128} from an HDF5 file as a 4D numpy array, where N is the number of 3D arrays of shape {16, 512, 128}. This is what I tried:
import os
import sys
import h5py as h5
import numpy as np
import subprocess
import re
file_name = sys.argv[1]
path = sys.argv[2]
f = h5.File(file_name, 'r')
data = f[path]
print(data.shape) #{27270, 16, 512, 128}
print(data.dtype) #"<u4"
data = np.array(data, dtype=np.uint32)
Unfortunately, after the data = np.array(data, dtype=np.uint32) command the script seems to crash: nothing happens afterwards.
I need to retrieve this dataset as a numpy array, or something similar, for further calculations. If you have any suggestions, let me know.
I have no problems writing/fetching <u4 and np.uint32:
In [14]: import h5py
In [15]: f=h5py.File('u4.h5','w')
In [16]: ds = f.create_dataset('data', dtype='<u4', shape=(10,))
In [17]: ds
Out[17]: <HDF5 dataset "data": shape (10,), type "<u4">
In [18]: ds[:]
Out[18]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)
In [19]: ds[:] = np.arange(-5,5)
In [20]: ds
Out[20]: <HDF5 dataset "data": shape (10,), type "<u4">
In [21]: ds[:]
Out[21]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4], dtype=uint32)
In [22]: np.array(ds, dtype='uint32')
Out[22]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4], dtype=uint32)
In [23]: f.close()
You could be hitting a memory limit. I get a memory error when trying to create an array of that size:
In [24]: np.zeros((27270, 16, 512, 128),np.uint32);
MemoryError Traceback (most recent call last)
<ipython-input-24-2cfe704044b6> in <module>
----> 1 np.zeros((27270, 16, 512, 128),np.uint32);
MemoryError: Unable to allocate 107. GiB for an array with shape (27270, 16, 512, 128) and data type uint32
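The size in that error message can be checked by hand before attempting the load. The following is a minimal sketch that assumes only the shape and dtype printed in the question; the variable names are illustrative.

```python
import math
import numpy as np

# Shape and dtype as printed in the question
shape = (27270, 16, 512, 128)
itemsize = np.dtype(np.uint32).itemsize  # 4 bytes per element

# Memory needed to hold the whole dataset as one array
n_bytes = math.prod(shape) * itemsize
print(f"full array: {n_bytes / 2**30:.1f} GiB")

# Memory needed for a single {16, 512, 128} block
per_block = math.prod(shape[1:]) * itemsize
print(f"one 3D block: {per_block / 2**20:.0f} MiB")
```

Each 3D block is only 4 MiB, so the problem is the total size rather than any individual block.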
You might still be able to load a slice of the data, e.g. data[0:100].
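If the further calculations can be done piece by piece, one option is to loop over the first axis in blocks and keep only the running result. This is a sketch under that assumption: the helper name sum_in_blocks, the block size, and the small demo file are made up for illustration, and a sum stands in for whatever calculation is actually needed.

```python
import os
import tempfile

import h5py
import numpy as np


def sum_in_blocks(dataset, block=100):
    """Sum all elements, reading at most `block` rows of axis 0 at a time."""
    n_rows = dataset.shape[0]
    total = 0
    for start in range(0, n_rows, block):
        stop = min(start + block, n_rows)
        chunk = dataset[start:stop]  # only this slice is read into memory
        total += int(chunk.sum(dtype=np.uint64))
    return total


# Small demo file standing in for the real one (hypothetical name and shape)
tmp_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
small = np.arange(10 * 2 * 4 * 3, dtype=np.uint32).reshape(10, 2, 4, 3)
with h5py.File(tmp_path, "w") as h5w:
    h5w.create_dataset("data", data=small)

with h5py.File(tmp_path, "r") as h5r:
    block_total = sum_in_blocks(h5r["data"], block=3)

# The block-wise result should match the in-memory result
print(block_total == int(small.sum(dtype=np.uint64)))
```

With the real file, the block size would be chosen so that one block fits comfortably in RAM.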
Turns out you don't even need to reshape. Here's an example of accessing a dataset then slicing to get the array. I think it's exactly what you want.
Edit 30-April-2020
The question was about uint32. My initial answer used an array of floats, which demonstrated the desired behavior. For completeness, I made a slight modification to create the dataset from a uint32 integer array.
Note: I used a0=100. The HDF5 file it creates is 840 MB for floats and 416 MB for uint32. Multiply that by 273 for a0=27270. I don't have nearly enough RAM to create that in one shot. The code below shows the process.
(Note: The dataset was created with maxshape=None for axis=0 to allow for expansion. If you are interested in testing larger datasets, you can modify this example by adding a loop to create more data and add to the end of the dataset.)
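import numpy as np
import h5py

# a0 = 27270  # full size from the question; too large to create in RAM in one shot
a0 = 100      # reduced first dimension so the example fits in memory
a1 = 16
a2 = 512
a3 = 128

# float array from the initial version of this answer (not written to the file here)
f_arr = np.random.rand(a0*a1*a2*a3).reshape(a0, a1, a2, a3)
# uint32 array matching the dtype in the question
i_arr = np.random.randint(0, 254, (a0, a1, a2, a3), dtype=np.uint32)

with h5py.File('SO_61508870.h5', mode='w') as h5w:
    # maxshape=None on axis 0 allows the dataset to be extended later
    h5w.create_dataset('array1', data=i_arr, maxshape=(None, a1, a2, a3))

with h5py.File('SO_61508870.h5', mode='r') as h5r:
    data_ds = h5r['array1']
    print('dataset shape:', data_ds.shape)
    for i in range(5):
        # slicing the dataset reads only this 3D block into a numpy array
        sliced_arr = data_ds[i, :, :, :]
        print('array shape:', sliced_arr.shape)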
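The note about maxshape=None suggests growing the dataset in a loop. A rough sketch of that idea follows; it is not part of the tested answer, and the small dimensions, batch count, and file name are assumptions chosen so it runs quickly. It uses the h5py Dataset.resize method to extend axis 0 before writing each batch.

```python
import os
import tempfile

import h5py
import numpy as np

# Small illustrative dimensions (the question uses 16, 512, 128)
a1, a2, a3 = 4, 8, 8
rows_per_batch = 5
n_batches = 3

rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")

with h5py.File(path, "w") as h5w:
    # Start empty on axis 0; maxshape=None on that axis makes it resizable
    ds = h5w.create_dataset(
        "array1",
        shape=(0, a1, a2, a3),
        maxshape=(None, a1, a2, a3),
        dtype=np.uint32,
        chunks=(1, a1, a2, a3),
    )
    for _ in range(n_batches):
        batch = rng.integers(0, 254, size=(rows_per_batch, a1, a2, a3),
                             dtype=np.uint32)
        old_rows = ds.shape[0]
        ds.resize(old_rows + rows_per_batch, axis=0)  # grow axis 0
        ds[old_rows:] = batch                         # write the new rows

with h5py.File(path, "r") as h5r:
    final_shape = h5r["array1"].shape
    print("dataset shape:", final_shape)
```

Only one batch is held in memory at a time, so the file can grow well beyond available RAM.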