How to get a data array from an HDF5 file in Python 3.6 if the dtype is "<u4"?

Posted by 会有一股神秘感。 on 2021-01-29 16:21:29

Question


I want to read a dataset of shape (N, 16, 512, 128) from an HDF5 file as a 4D NumPy array, where N is the number of 3D arrays of shape (16, 512, 128). This is what I tried:

import os
import sys
import h5py as h5
import numpy as np
import subprocess
import re

file_name = sys.argv[1]
path = sys.argv[2]

f = h5.File(file_name, 'r')
data = f[path]
print(data.shape) #{27270, 16, 512, 128}
print(data.dtype) #"<u4"

data = np.array(data, dtype=np.uint32)
print(data.shape)

Unfortunately, after the data = np.array(data, dtype=np.uint32) call the script seems to crash, because nothing is printed after it.

I need to retrieve this dataset as a NumPy array, or something similar, for further calculations. If you have any suggestions, let me know.


Answer 1:


I have no problems writing/fetching <u4 and np.uint32:

In [14]: import h5py                                                                                   
In [15]: f=h5py.File('u4.h5','w')                                                                      
In [16]: ds = f.create_dataset('data', dtype='<u4', shape=(10,))                                       
In [17]: ds                                                                                            
Out[17]: <HDF5 dataset "data": shape (10,), type "<u4">
In [18]: ds[:]                                                                                         
Out[18]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)
In [19]: ds[:] = np.arange(-5,5)                                                                       
In [20]: ds                                                                                            
Out[20]: <HDF5 dataset "data": shape (10,), type "<u4">
In [21]: ds[:]                                                                                         
Out[21]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4], dtype=uint32)
In [22]: np.array(ds, dtype='uint32')                                                                  
Out[22]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4], dtype=uint32)
In [23]: f.close()     

You could be hitting a memory limit. I get a memory error when trying to create an array of that size:

In [24]: np.zeros((27270, 16, 512, 128),np.uint32);                                                    
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-24-2cfe704044b6> in <module>
----> 1 np.zeros((27270, 16, 512, 128),np.uint32);

MemoryError: Unable to allocate 107. GiB for an array with shape (27270, 16, 512, 128) and data type uint32
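
The figure in that error message can be checked before touching the file at all. Here is a rough sketch of the arithmetic, using the shape and dtype from the question. The 1 GiB block budget is only an illustrative assumption, not something from the original post:

```python
import numpy as np

shape = (27270, 16, 512, 128)          # dataset shape reported in the question
itemsize = np.dtype("<u4").itemsize    # bytes per little-endian uint32 element

# Plain Python ints are used so the product cannot overflow.
n_elements = 1
for dim in shape:
    n_elements *= dim

total_bytes = n_elements * itemsize
total_gib = total_bytes / 2**30
print(f"{n_elements:,} elements need about {total_gib:.1f} GiB")

# How many rows along axis 0 fit in a chosen memory budget?
# (1 GiB is an arbitrary, illustrative budget.)
budget_bytes = 1 * 2**30
bytes_per_row = total_bytes // shape[0]
rows_per_block = budget_bytes // bytes_per_row
print("rows per block:", rows_per_block)
```

The same calculation works on a real dataset with `data.shape` and `data.dtype.itemsize`, since h5py exposes both without reading the data.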

You might still be able to load a slice of data, e.g. data[0:100].
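
If the further calculations can be done piece by piece, the slicing idea extends to a loop over blocks of axis 0. This is only a sketch: `blockwise_sum` is a hypothetical helper, the sum stands in for whatever calculation is actually needed, and a small in-memory file stands in for the real one.

```python
import h5py
import numpy as np

def blockwise_sum(dataset, block_rows):
    """Sum all elements, reading `block_rows` rows of axis 0 at a time."""
    total = 0
    n_rows = dataset.shape[0]
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        block = dataset[start:stop]   # only this slice is loaded as a NumPy array
        # Accumulate in uint64 so the uint32 values cannot overflow within a block.
        total += int(block.sum(dtype=np.uint64))
    return total

# Small stand-in dataset held in memory (nothing is written to disk).
demo = np.arange(10 * 2 * 3 * 4, dtype="<u4").reshape(10, 2, 3, 4)
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("data", data=demo)
    result = blockwise_sum(ds, block_rows=3)

print(result)
```

For the real file, `block_rows` would be chosen so that one block fits comfortably in RAM.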




Answer 2:


Turns out you don't even need to reshape. Here's an example of accessing a dataset then slicing to get the array. I think it's exactly what you want.

Edit 30-April-2020
The original question was about uint32, while my initial answer used an array of floats. It demonstrated the desired behavior, but for completeness I made a slight modification to create the dataset from a uint32 integer array.
Note: I used a0=100. The HDF5 file it creates is 840 MB for floats and 416 MB for uint32. Multiply that by 273 for a0=27270. I don't have nearly enough RAM to create that in one shot. The code below shows the process.

(Note: The dataset was created with maxshape=None for axis=0 to allow for expansion. If you are interested in testing larger datasets, you can modify this example by adding a loop to create more data and add to the end of the dataset.)

import numpy as np
import h5py

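a0 = 27270  # full size from the question; too large to create in RAM in one shot
a0 = 100    # reduced size actually used for this test
a1 = 16
a2 = 512
a3 = 128

# f_arr is the float array from the initial version of this answer; it is not written below
f_arr = np.random.rand(a0*a1*a2*a3).reshape(a0, a1, a2, a3)
i_arr = np.random.randint(0, 254, (a0, a1, a2, a3), dtype=np.uint32)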

with h5py.File('SO_61508870.h5', mode='w') as h5w:
     h5w.create_dataset('array1', data=i_arr, maxshape=(None, a1, a2, a3))

with h5py.File('SO_61508870.h5', mode='r') as h5r:
     data_ds = h5r['array1']
     print ('dataset shape:', data_ds.shape)
     for i in range(5):
         sliced_arr = data_ds[i,:,:,:]
         print ('array shape:', sliced_arr.shape)
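
The note above about adding a loop to create more data and append it to the end of the dataset could look roughly like this. It is a sketch under a few assumptions: the dimensions are shrunk so it runs quickly, the batch sizes are made up, and the file is kept in memory rather than written to disk.

```python
import h5py
import numpy as np

a1, a2, a3 = 4, 8, 2      # small stand-ins for 16, 512, 128
rows_per_batch = 5        # illustrative batch size along axis 0
n_batches = 3

def make_batch():
    # Random uint32 data, generated the same way as i_arr in the answer.
    return np.random.randint(0, 254, (rows_per_batch, a1, a2, a3), dtype=np.uint32)

with h5py.File("append_demo.h5", "w", driver="core", backing_store=False) as h5w:
    # maxshape=None on axis 0 makes the dataset resizable along that axis.
    ds = h5w.create_dataset("array1", data=make_batch(),
                            maxshape=(None, a1, a2, a3))
    for _ in range(n_batches - 1):
        old_rows = ds.shape[0]
        ds.resize(old_rows + rows_per_batch, axis=0)  # grow axis 0
        ds[old_rows:] = make_batch()                  # fill the new rows
    final_shape = ds.shape
    final_dtype = ds.dtype

print(final_shape, final_dtype)
```

Only one batch is held in memory at a time, so the same loop can build a file much larger than the available RAM.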


Source: https://stackoverflow.com/questions/61508870/how-in-python-3-6-to-get-data-array-from-hdf5-file-if-dtype-is-u4
