I am using h5py to build a dataset. Since I want to store arrays with a different number of rows, I use the h5py special_dtype vlen. However, I experience behavior I can't
I think train_targets[0] = test has stored your (11,5) array as an F-ordered array in a row of train_targets. According to the (9549,5) shape, that row has 5 elements, and since the dtype is vlen, each element is a 1d array of length 11. That's what you get back in train_targets[0]: an array of 5 arrays, each of shape (11,), with values taken from test (in F order).
So I think there are two issues: what a 2d dataset shape means, and what vlen allows.
My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1d arrays: an extension, so to speak, of byte strings.
Does the 5 in shape=(9549, 5,) have anything to do with the 5 in test.shape? I don't think it does, at least not as numpy and h5py see it.
When I make a file following the string vlen example:
>>> import h5py
>>> f = h5py.File('foo.hdf5', 'w')  # explicit mode; h5py >= 3.0 defaults to read-only 'r'
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100, 100), dtype=dt)
and then do:
>>> ds[0] = 'this one string'
and look at ds[0], I get an object array with 100 elements, each being this string. That is, I've set a whole row of ds. To set just one element, index both dimensions:
>>> ds[0, 0] = 'another'
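h5py dataset indexing follows numpy here, so the same row-versus-element distinction can be reproduced without a file. The sketch below uses a plain numpy object array as a stand-in for the (100,100) vlen str dataset; it illustrates the indexing semantics only and is not h5py itself:

```python
import numpy as np

# Stand-in for the (100, 100) vlen-str dataset: a numpy object array
# (elements start out as None).
ds = np.empty((100, 100), dtype=object)

# A single integer index selects a whole row, so the scalar string
# is broadcast to all 100 elements of row 0.
ds[0] = 'this one string'

# Two integer indices select exactly one element.
ds[0, 0] = 'another'

# The first element of row 0 now differs from the rest of the row.
print(ds[0, :3])
```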
vlen means 'variable length', not 'variable shape'. While the https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html documentation is not entirely clear on this, I think you can store 1d arrays with shapes (11,) and (38,) in a vlen dataset, but not 2d ones.
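If the goal is to store 2d arrays whose number of rows varies, one common workaround (a sketch under stated assumptions, not something prescribed above) is to flatten each array into a 1d vlen element and keep its shape in a second dataset. The dataset names 'targets' and 'shapes', the float64 base type, and the sample arrays are all hypothetical. This assumes h5py 2.3 or later (vlen with numeric base types) and uses an in-memory file so nothing is written to disk:

```python
import numpy as np
import h5py

# Hypothetical sample data: 2d arrays with a varying number of rows, 5 columns.
arrays = [np.arange(55, dtype='f8').reshape(11, 5),
          np.arange(190, dtype='f8').reshape(38, 5)]

# In-memory HDF5 file (core driver, no backing file on disk).
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    # Each element of 'targets' is a 1d float64 array of any length.
    vlen_f8 = h5py.special_dtype(vlen=np.dtype('f8'))
    targets = f.create_dataset('targets', (len(arrays),), dtype=vlen_f8)
    # 'shapes' records the original (rows, cols) of each array.
    shapes = f.create_dataset('shapes', (len(arrays), 2), dtype='i8')

    for i, a in enumerate(arrays):
        targets[i] = a.ravel()  # store the 2d array as 1d (C order)
        shapes[i] = a.shape

    # Read back while the file is open and restore each 2d shape.
    restored = [targets[i].reshape(tuple(shapes[i]))
                for i in range(len(arrays))]
```

Keeping the shape in a separate integer dataset means each vlen element stays strictly 1d, which is the case vlen is designed for.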
Actually, the train_targets output is reproduced with the following (numpy imported as np, and test being your (11,5) array):
In [54]: test1 = np.empty((5,), dtype=object)
In [55]: for i in range(5):
    ...:     test1[i] = test.T.flatten()[i:i+11]
Each sub-array holds 11 consecutive values taken from the transpose (F order), but the starting point shifts by one for each sub-array.
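For contrast with the per-column reading suggested at the start of this answer, the numpy sketch below builds both the shifted windows reproduced above and a plain, non-overlapping split of the F-ordered values into 5 chunks of 11. The test array here is a hypothetical stand-in (np.arange) for the question's (11,5) array:

```python
import numpy as np

# Hypothetical stand-in for the question's (11, 5) array.
test = np.arange(55).reshape(11, 5)
flat_f = test.T.flatten()  # values in F (column-major) order

# Shifted windows: the start index moves by 1 for each sub-array.
windows = [flat_f[i:i + 11] for i in range(5)]

# Non-overlapping chunks of 11: these are exactly the columns of test.
columns = [flat_f[11 * i:11 * (i + 1)] for i in range(5)]

# Only the first sub-array agrees between the two readings.
same = [np.array_equal(w, c) for w, c in zip(windows, columns)]
print(same)
```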