Inexplicable behavior when using vlen with h5py

北海茫月 2020-12-07 03:47

I am using h5py to build a dataset. Since I want to store arrays with a varying number of rows, I use the h5py special_dtype vlen. However, I experience behavior I can't

1 answer
  • 2020-12-07 04:35

    I think

    train_targets[0] = test
    

    has stored your (11,5) array as an F ordered array in a row of train_targets. According to the (9549,5) shape, that's a row of 5 elements. And since it is vlen, each element is a 1d array of length 11.

    That's what you get back in train_targets[0] - an array of 5 arrays, each shape (11,), with values taken from test (order F).

    So I think there are 2 issues - what a 2d shape means, and what vlen allows.


    My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1d arrays, an extension, so to speak, of byte strings.

    Does the 5 in shape=(9549, 5,) have anything to do with 5 in the test.shape? I don't think it does, at least not as numpy and h5py see it.

    When I make a file following the string vlen example:

    >>> import h5py
    >>> f = h5py.File('foo.hdf5', 'w')  # explicit write mode; recent h5py versions default to read-only
    >>> dt = h5py.special_dtype(vlen=str)
    >>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)
    

    and then do:

    ds[0]='this one string'
    

    and look at ds[0], I get an object array with 100 elements, each being this string. That is, I've set a whole row of ds.

    ds[0,0]='another'
    

    is the correct way to set just one element.
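    The difference between the two assignments can be checked with a small sketch. The dataset is shrunk to (3, 4), and the file name and the in-memory `core` driver are my own choices so that nothing is written to disk. Depending on the h5py version, the elements read back as str or bytes, so the sketch compares lengths rather than values.

```python
import h5py

dt = h5py.special_dtype(vlen=str)

# Assumption: an in-memory file (core driver, no backing store) stands in for foo.hdf5.
with h5py.File('vlen_str_sketch.h5', 'w', driver='core', backing_store=False) as f:
    ds = f.create_dataset('VLDS', (3, 4), dtype=dt)

    ds[0] = 'this one string'   # scalar is broadcast across every column of row 0
    ds[1, 0] = 'another'        # only element (1, 0) is written

    row0 = ds[0]                # object array with one entry per column
    row1 = ds[1]

    # Elements may be str or bytes depending on the h5py version,
    # so look at their lengths instead of their values.
    row0_lengths = [len(s) for s in row0]
    row1_lengths = [len(s) for s in row1]

print(row0.shape, row0_lengths)
print(row1_lengths)
```

    Every entry of row 0 holds the same string, while in row 1 only the first entry is non-empty.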

    vlen means 'variable length', not 'variable shape'. While the HDF5 documentation (https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html) is not entirely clear on this, I think you can store 1d arrays with shape (11,) and (38,) with vlen, but not 2d ones.
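    As a minimal sketch of the 1d case, assuming h5py 2.3 or later (needed for non-string vlen) and a float64 base type: the dataset name `targets`, the file name, and the in-memory driver are made up for illustration. The last part, flattening a 2d array on write and reshaping it on read, is only one possible workaround and assumes the number of columns (5 here) is known when reading.

```python
import h5py
import numpy as np

# Assumption: float64 elements; non-string vlen needs h5py >= 2.3.
dt = h5py.special_dtype(vlen=np.dtype('float64'))

# In-memory file (core driver, no backing store) keeps the sketch off the disk.
with h5py.File('vlen_sketch.h5', 'w', driver='core', backing_store=False) as f:
    # 1d dataset: each element is itself a variable-length 1d array.
    ds = f.create_dataset('targets', (3,), dtype=dt)

    ds[0] = np.arange(11, dtype='float64')   # shape (11,)
    ds[1] = np.arange(38, dtype='float64')   # shape (38,)
    shapes = [ds[0].shape, ds[1].shape]

    # Possible workaround for a 2d array: flatten on write, reshape on read.
    m = np.arange(55, dtype='float64').reshape(11, 5)
    ds[2] = m.ravel()                        # stored as a 1d array of 55 values
    restored = ds[2].reshape(-1, 5)          # the column count must be known here

print(shapes)
print(restored.shape)
```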


    Actually, train_targets output is reproduced with:

    In [54]: test1 = np.empty((5,), dtype=object)   # object array holding 5 sub arrays
    In [55]: for i in range(5):
        ...:     test1[i] = test.T.flatten()[i:i+11]   # 11 values in F order, offset by i
    

    It's 11 values taken from the transpose (F order), but shifted for each sub array.
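    To see the shift without the original data, here is a numpy-only sketch. The array `test` is a stand-in, since the real (11,5) array from the question is not shown; its values are chosen so that each entry equals its position in C order.

```python
import numpy as np

# Stand-in for the question's (11, 5) array (hypothetical values).
test = np.arange(55).reshape(11, 5)

# Flattening the transpose walks test column by column, i.e. in F order.
flat = test.T.flatten()

test1 = np.empty((5,), dtype=object)
for i in range(5):
    # Each sub array is a window of 11 values, starting one position later each time.
    test1[i] = flat[i:i + 11]

print(test1[0])   # the first column of test
print(test1[1])   # the same window shifted by one, ending in the second column
```

    The first sub array is exactly the first column of `test`; each later one drops a leading value and picks up the next value from the following column.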
