Failing to write in hdf5 file

Submitted by 天涯浪子 on 2021-02-11 13:59:08

Question


I am trying to create an HDF5 file, but the output file is empty.

I have written Python code that is supposed to run in a loop and write strings into the datasets it creates. After the file is saved, the output file is always empty.

Below is the code I have written:

import h5py

h5_file_name = 'sample.h5'
hf = h5py.File(h5_file_name, 'w')
g1 = hf.create_group('Objects')
dt = h5py.special_dtype(vlen=str)
d1 = g1.create_dataset('D1', (2, 10), dtype=dt)
d2 = g1.create_dataset('D2', (3, 10), dtype=dt)
for i in range(10):
    d1[0][i] = 'Sample'
    d1[1][i] = str(i)
    d2[0][i] = 'Hello'
    d2[1][i] = 'World'
    d2[2][i] = str(i)
hf.close()

The output file is empty, as mentioned above.

Can anyone please point out what I am missing here? Many thanks in advance!


Answer 1:


Your code works for me (in an ipython session):

In [1]: import h5py                                                                                    
In [2]: h5_file_name = 'sample.h5' 
   ...: hf = h5py.File(h5_file_name, 'w') 
   ...: g1 = hf.create_group('Objects') 
   ...: dt = h5py.special_dtype(vlen=str) 
   ...: d1 = g1.create_dataset('D1', (2, 10), dtype=dt) 
   ...: d2 = g1.create_dataset('D2', (3, 10), dtype=dt) 
   ...: for i in range(10): 
   ...:     d1[0][i] = 'Sample' 
   ...:     d1[1][i] = str(i) 
   ...:     d2[0][i] = 'Hello' 
   ...:     d2[1][i] = 'World' 
   ...:     d2[2][i] = str(i) 
   ...: hf.close()   

This runs and creates a file. The file is not "empty" in the usual sense: the datasets exist. But if by "empty" you mean that the strings were not written to the file, that is correct. Every element still holds its initial value, ''.

In [4]: hf = h5py.File(h5_file_name, 'r')                                                              
In [5]: hf['Objects/D1']                                                                               
Out[5]: <HDF5 dataset "D1": shape (2, 10), type "|O">
In [6]: hf['Objects/D1'][:]                                                                            
Out[6]: 
array([['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']], dtype=object)

===

The problem isn't with the file setup, but rather with how you are trying to set elements:

In [45]: h5_file_name = 'sample.h5' 
    ...: hf = h5py.File(h5_file_name, 'w') 
    ...: g1 = hf.create_group('Objects') 
    ...: dt = h5py.special_dtype(vlen=str) 
    ...: d1 = g1.create_dataset('D1', (2, 10), dtype=dt) 
    ...: d2 = g1.create_dataset('D2', (3, 10), dtype=dt) 
    ...:                                                                                               
In [46]: d1[:]                                                                                         
Out[46]: 
array([['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']], dtype=object)
In [47]: d1[0][0] = 'sample'                                                                           
In [48]: d1[:]                                                                                         
Out[48]: 
array([['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']], dtype=object)

Use the tuple style of indexing:

In [49]: d1[0, 0] = 'sample'                                                                           
In [50]: d1[:]                                                                                         
Out[50]: 
array([['sample', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']], dtype=object)

With a NumPy array, d1[0][0] = ... works because d1[0] is a view of d1, so writing to the view changes the original. h5py (apparently) does not quite replicate this: d1[0] is a copy, an actual NumPy array, not the dataset itself.
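The difference comes down to which special method Python calls. d1[0][0] = x first calls __getitem__(0) and then assigns into whatever that returns, while d1[0, 0] = x calls __setitem__((0, 0), x) on the dataset itself. The sketch below illustrates this with a toy class; CopyOnRead is a hypothetical stand-in written for this example, not h5py's actual implementation.

```python
class CopyOnRead:
    """Toy 2-D container whose row reads return copies (hypothetical, not h5py)."""

    def __init__(self, rows, cols):
        self._data = [['' for _ in range(cols)] for _ in range(rows)]

    def __getitem__(self, key):
        if isinstance(key, tuple):
            row, col = key
            return self._data[row][col]
        # Reading a whole row hands back a fresh list, like a dataset read
        # that returns a new in-memory array.
        return list(self._data[key])

    def __setitem__(self, key, value):
        # Only tuple-style assignment reaches the underlying storage.
        row, col = key
        self._data[row][col] = value


d = CopyOnRead(2, 3)

d[0][0] = 'lost'      # modifies the temporary copy returned by d[0]
chained = d[0, 0]     # the stored value is unchanged

d[0, 0] = 'kept'      # goes through __setitem__ on the container
tupled = d[0, 0]

print(repr(chained), repr(tupled))
```

The chained assignment raises no error, which is why the original loop appeared to run cleanly while writing nothing.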

Variations on that whole-array indexing:

In [51]: d1[0, :] = 'sample'                                                                           
In [52]: d1[1, :] = np.arange(10)                                                                      
In [53]: d1[:]                                                                                         
Out[53]: 
array([['sample', 'sample', 'sample', 'sample', 'sample', 'sample',
        'sample', 'sample', 'sample', 'sample'],
       ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']], dtype=object)
In [54]: d2[:,0] = ['one','two','three']                                                               
In [55]: d2[:]                                                                                         
Out[55]: 
array([['one', '', '', '', '', '', '', '', '', ''],
       ['two', '', '', '', '', '', '', '', '', ''],
       ['three', '', '', '', '', '', '', '', '', '']], dtype=object)

Verifying the change in type with indexing:

In [64]: type(d1)                                                                                      
Out[64]: h5py._hl.dataset.Dataset
In [65]: type(d1[0])                                                                                   
Out[65]: numpy.ndarray

d1[0][0]='foobar' would change that d1[0] array without affecting the d1 dataset.
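Putting this together, here is a minimal sketch of the question's loop rewritten with tuple indexing. The write_sample and as_text helpers and the temporary directory are assumptions added for illustration. The as_text helper exists because, depending on the h5py version, variable-length strings may be read back as either bytes or str.

```python
import os
import tempfile

import h5py


def write_sample(path):
    """Write the two string datasets from the question using tuple indexing."""
    dt = h5py.special_dtype(vlen=str)  # variable-length string dtype, as in the question
    with h5py.File(path, 'w') as hf:   # the context manager closes the file on exit
        g1 = hf.create_group('Objects')
        d1 = g1.create_dataset('D1', (2, 10), dtype=dt)
        d2 = g1.create_dataset('D2', (3, 10), dtype=dt)
        for i in range(10):
            # d1[row, col] = ... assigns through the dataset, so it reaches the file
            d1[0, i] = 'Sample'
            d1[1, i] = str(i)
            d2[0, i] = 'Hello'
            d2[1, i] = 'World'
            d2[2, i] = str(i)


def as_text(value):
    """Return str whether h5py handed back bytes or str (hypothetical helper)."""
    return value.decode('utf-8') if isinstance(value, bytes) else value


# Hypothetical location: a temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), 'sample.h5')
write_sample(path)

# Reopen the file read-only and check the first column of D2.
with h5py.File(path, 'r') as hf:
    first_column = [as_text(v) for v in hf['Objects/D2'][:, 0]]
print(first_column)
```

Using a with block instead of an explicit hf.close() is a style choice here; it ensures the file is closed even if the loop raises.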




Answer 2:


I am not sure how this can be solved using h5py, but if you are not bound to a specific library, take a look at HDFql, which makes it really easy to handle HDF5 files.

Using HDFql in Python, your use-case can be solved with the help of hyperslabs as follows:

import HDFql

HDFql.execute("CREATE FILE sample.h5")
HDFql.execute("USE FILE sample.h5")
HDFql.execute("CREATE CHUNKED(1) DATASET objects/D1 AS VARCHAR(10, 2)")
HDFql.execute("CREATE CHUNKED(1) DATASET objects/D2 AS VARCHAR(10, 3)")

for i in range(10):
    HDFql.execute("INSERT INTO objects/D1(%d:::1) VALUES(Sample, %d)" % (i, i))
    HDFql.execute("INSERT INTO objects/D2(%d:::1) VALUES(Hello, World, %d)" % (i, i))

HDFql.execute("CLOSE FILE")

Additional examples on how to use HDFql can be found here.



Source: https://stackoverflow.com/questions/61162318/failing-to-write-in-hdf5-file
