Question
I know that in C we can easily construct a compound dataset using a struct type and assign data chunk by chunk. I am currently implementing a similar structure in Python with h5py.
import h5py
import numpy as np
# create an HDF5 file; the mode is passed explicitly because h5py 3.x defaults to read-only
f = h5py.File("test.h5", "a")
# define a compound datatype using np.dtype
dt_type = np.dtype({"names": ["image", "feature"],
                    "formats": [('<f4', (4, 4)), ('<f4', (10,))]})
# define the dataset with 5 instances
a = f.create_dataset("test", shape=(5,), dtype=dt_type)
To write data, we can do this...
# "feature" array is 1D
a['feature']
The output is:
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
# Write 1s to data field "feature"
a["feature"] = np.ones((5,10))
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)
The problem arises when I write the 2D array "image" to the file.
a["image"] = np.ones((5,4,4))
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
I read the documentation and did some research but did not find a good solution. I understand that we could use separate groups/datasets to mimic this compound data, but I really want to keep this structure. Is there a good way to do this?
Any help would be appreciated. Thank you.
Answer 1:
You can use PyTables (aka tables) to populate your HDF5 file with the desired arrays. You should think of each row as an independent entry (defined by a dtype). So, the 'image' array is stored as 5 (4x4) ndarrays, not a single (5x4x4) ndarray. The same goes for the 'feature' array.
This example adds the 'feature' and 'image' arrays one row at a time. Alternatively, you can build a NumPy record array that holds both fields for multiple rows, then add all of them at once with Table.append().
The code below creates the file, then reopens it read-only to check the data.
import tables as tb
import numpy as np
# open h5 file for writing
with tb.open_file('test1_tb.h5', 'w') as h5f:
    # define a compound datatype using np.dtype
    dt_type = np.dtype({"names": ["feature", "image"],
                        "formats": [('<f4', (10,)), ('<f4', (4, 4))]})
    # create empty table (dataset)
    a = h5f.create_table('/', "test1", description=dt_type)
    # create dataset row iterator
    a_row = a.row
    # create array data and append to dataset, one row per iteration
    for i in range(5):
        a_row['feature'] = i * np.ones(10)
        a_row['image'] = np.random.random((4, 4))
        a_row.append()
    # write buffered rows to the file
    a.flush()
# open h5 file read only and print contents
with tb.open_file('test1_tb.h5', 'r') as h5fr:
    a = h5fr.get_node('/', 'test1')
    print(a.coldtypes)
    print('# of rows:', a.nrows)
    for row in a:
        print(row['feature'])
        print(row['image'])
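The alternative mentioned above, filling a record array and adding all rows with Table.append(), might look like the following. This is a minimal sketch: the file name test1_tb_append.h5, the temporary directory, and the variable names are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Same compound layout as in the answer above.
dt_type = np.dtype({"names": ["feature", "image"],
                    "formats": [("<f4", (10,)), ("<f4", (4, 4))]})

# Build all 5 rows in memory as a NumPy structured (record) array.
rows = np.zeros(5, dtype=dt_type)
rows["feature"] = np.arange(5, dtype=np.float32)[:, None] * np.ones(10)
rows["image"] = np.random.random((5, 4, 4))

# Hypothetical output location, chosen so the sketch leaves the working directory untouched.
h5_path = os.path.join(tempfile.mkdtemp(), "test1_tb_append.h5")

with tb.open_file(h5_path, "w") as h5f:
    table = h5f.create_table("/", "test1", description=dt_type)
    table.append(rows)   # add every row in a single call
    table.flush()

# Reopen read-only and pull one column back out.
with tb.open_file(h5_path, "r") as h5fr:
    table = h5fr.get_node("/", "test1")
    n_rows = table.nrows
    features = table.col("feature")
```

Appending in bulk avoids the Python-level loop over rows, which tends to matter once the table holds many entries.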
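Answer 2:
This blog post helped me with this issue: https://www.christopherlovell.co.uk/blog/2016/04/27/h5py-intro.html
The key code is below. Note that it stores each array as a separate dataset inside a group rather than as fields of a single compound dataset: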
from os import path
import numpy as np
import h5py
# root_dir is assumed to be defined already as the directory holding the .npy files
# Load your dataset into numpy
audio = np.load(path.join(root_dir, 'X_dev.npy')).astype(np.float32)
text = np.load(path.join(root_dir, 'T_dev.npy')).astype(np.float32)
gesture = np.load(path.join(root_dir, 'Y_dev.npy')).astype(np.float32)
# open a hdf5 file
hf = h5py.File(path.join(root_dir, "dev.hdf5"), 'a')
# create group
g1 = hf.create_group('dev')
# put dataset in subgroups
g1.create_dataset('audio', data=audio)
g1.create_dataset('text', data=text)
g1.create_dataset('gesture', data=gesture)
# close the hdf5 file
hf.close()
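If the compound layout from the question has to be kept in h5py itself, one approach that usually works is to fill a NumPy structured array first and assign whole rows to the dataset, rather than assigning the 2D "image" field directly. The sketch below assumes the same dtype as the question; the file name compound_demo.h5 and the temporary directory are hypothetical choices for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Compound dtype from the question: a 4x4 image and a length-10 feature per row.
dt_type = np.dtype({"names": ["image", "feature"],
                    "formats": [("<f4", (4, 4)), ("<f4", (10,))]})

# Fill the fields on an in-memory structured array, where field assignment is unproblematic.
rows = np.zeros(5, dtype=dt_type)
rows["image"] = np.ones((5, 4, 4))
rows["feature"] = np.ones((5, 10))

# Hypothetical output location for this sketch.
h5_path = os.path.join(tempfile.mkdtemp(), "compound_demo.h5")

with h5py.File(h5_path, "w") as f:
    dset = f.create_dataset("test", shape=(5,), dtype=dt_type)
    # Write complete rows, chunk by chunk, instead of a single field.
    dset[0:3] = rows[0:3]
    dset[3:5] = rows[3:5]

# Reopen read-only and read each field back.
with h5py.File(h5_path, "r") as f:
    image_back = f["test"]["image"]
    feature_back = f["test"]["feature"]
```

Because each assignment carries every field of the selected rows, the slices can be as small as a single row, which keeps the chunk-by-chunk workflow described in the question.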
Source: https://stackoverflow.com/questions/57667412/how-to-write-data-to-a-compound-data-using-h5py