问题
I have selected specific hdf5 datasets and want to copy them to a new hdf5 file. I could find some tutorials on copying between two files, but what if you have just created a new file and you want to copy datasets to the file? I thought the way below would work, but it doesn't. Are there any simple ways to do this?
>>> dic_oldDataset['old_dataset']
<HDF5 dataset "old_dataset": shape (333217,), type "|V14">
>>> new_file = h5py.File('new_file.h5', 'a')
>>> new_file.create_group('new_group')
>>> new_file['new_group']['new_dataset'] = dic_oldDataset['old_dataset']
RuntimeError: Unable to create link (interfile hard links are not allowed)
回答1:
Answer 1 (using h5py):
This creates a simple structured array to populate the first dataset in the first file.
The data is then read from that dataset and copied to the second file using my_array
.
import h5py, numpy as np
arr = np.array([(1,'a'), (2,'b')],
dtype=[('foo', int), ('bar', 'S1')])
print (arr.dtype)
h5file1 = h5py.File('test1.h5', 'w')
h5file1.create_dataset('/ex_group1/ex_ds1', data=arr)
print (h5file1)
my_array=h5file1['/ex_group1/ex_ds1']
h5file2 = h5py.File('test2.h5', 'w')
h5file2.create_dataset('/exgroup2/ex_ds2', data=my_array)
print (h5file2)
h5file1.close()
h5file2.close()
回答2:
Answer 2 (using pytables):
This follows the same process as above with pytables functions. It creates the same simple structured array to populate the first dataset in the first file. The data is then read from that dataset and copied to the second file using my_array
.
import tables, numpy as np
arr = np.array([(1,'a'), (2,'b')],
dtype=[('foo', int), ('bar', 'S1')])
print (arr.dtype)
h5file1 = tables.open_file('test1.h5', mode = 'w', title = 'Test file')
my_group = h5file1.create_group('/', 'ex_group1', 'Example Group')
my_table = h5file1.create_table(my_group, 'ex_ds1', None, 'Example dataset', obj=arr)
print (h5file1)
my_array=my_table.read()
h5file2 = tables.open_file('test2.h5', mode = 'w', title = 'Test file')
h5file2.create_table('/exgroup2', 'ex_ds2', createparents=True, obj=my_array)
print (h5file2)
h5file1.close()
h5file2.close()
来源:https://stackoverflow.com/questions/53455713/how-to-copy-a-dataset-object-to-a-different-hdf5-file-using-pytables-or-h5py