I'm running into trouble reading an HDF5 MATLAB 7.3 file with Python. I'm using h5py 2.0.1.
I can read all the matrices that are stored in the file, but I cannot r
I assume you mean it is a cell array of strings in MATLAB? This output looks normal: the dataset is an array of objects (|O4 is the NumPy object datatype). Each object is an array of 2-byte integers (
The easiest way to get the strings out is probably a list comprehension that uses unichr to convert the characters, like this:
strlist = [u''.join(unichr(c) for c in h5file[obj_ref]) for obj_ref in dataset]
What this does is iterate over the dataset (for obj_ref in dataset) to create a new list. For each object reference, it dereferences the object (h5file[obj_ref]) to get an array of integers. It converts each integer into a character (unichr(c)) and joins those characters together into a Unicode string (u''.join()).
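In Python 3, unichr no longer exists and chr returns a Unicode character for any code point, so the same comprehension can be written with chr. The sketch below is a minimal, self-contained illustration: a plain dict named fake_h5file and a list of keys named fake_dataset are hypothetical stand-ins for the h5py file and the dataset of object references, so it runs without h5py or a .mat file.

```python
def decode_strings(h5file, dataset):
    """Dereference each object reference and join its character codes into a str.

    h5file: anything indexable by a reference (an h5py File in real use).
    dataset: an iterable of references (a dataset of object refs in real use).
    """
    return [''.join(chr(c) for c in h5file[obj_ref]) for obj_ref in dataset]


# Hypothetical stand-ins so the sketch runs without h5py:
# each "referenced object" is just a list of 2-byte character codes.
fake_h5file = {
    "ref0": [72, 105],                 # "Hi"
    "ref1": [77, 65, 84, 76, 65, 66],  # "MATLAB"
}
fake_dataset = ["ref0", "ref1"]

strlist = decode_strings(fake_h5file, fake_dataset)
print(strlist)  # ['Hi', 'MATLAB']
```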
Note that this produces a list of Unicode strings. If you are absolutely sure that every string contains only ASCII characters, you can replace u'' by '' and unichr by chr.
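In Python 3, where str is already Unicode, there is no byte-string swap to make; the closest equivalent of the ASCII-only shortcut is to check that assumption explicitly. A minimal sketch, using a hypothetical helper codes_to_ascii that takes one string's character codes (a list, or a 1-D array of integers):

```python
def codes_to_ascii(codes):
    """Convert a sequence of character codes to str, accepting only ASCII.

    Each code is converted with int() first, so a 1-D NumPy array works too
    (calling bytes() directly on a uint16 array would copy its raw buffer,
    not its values). bytes() raises ValueError for codes outside 0..255, and
    decode('ascii') raises UnicodeDecodeError for codes 128..255, so
    non-ASCII input fails loudly instead of being silently mangled.
    """
    return bytes(int(c) for c in codes).decode('ascii')


print(codes_to_ascii([72, 105]))  # Hi

try:
    codes_to_ascii([233])  # 'é' is not ASCII
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)
```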
Caveat: I don't have h5py; this post is based on my experience with MATLAB and NumPy. You may need to adjust the syntax or iteration order to suit your dataset.
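One likely reason to adjust the iteration order: MATLAB stores arrays in column-major order, so a 1-by-n cell or char array often shows up in h5py with a transposed, two-dimensional shape such as (n, 1), and iterating over it yields one-element rows rather than scalars. A minimal sketch of flattening before decoding, assuming that shape and using nested lists as hypothetical stand-ins for the h5py datasets (with NumPy arrays you would more typically call .flatten() or numpy.ravel):

```python
from itertools import chain


def flatten2d(rows):
    """Flatten an iterable of rows (e.g. shape (n, 1)) into one flat list."""
    return list(chain.from_iterable(rows))


def decode_strings_2d(h5file, dataset):
    """Flatten both the reference array and each character array,
    then dereference and decode as in the one-dimensional case."""
    return [''.join(chr(c) for c in flatten2d(h5file[obj_ref]))
            for obj_ref in flatten2d(dataset)]


# Hypothetical stand-ins: a (2, 1) array of references and
# (n, 1) arrays of character codes, mimicking transposed MATLAB data.
fake_h5file = {
    "ref0": [[72], [105]],  # "Hi"
    "ref1": [[79], [75]],   # "OK"
}
fake_dataset = [["ref0"], ["ref1"]]

print(decode_strings_2d(fake_h5file, fake_dataset))  # ['Hi', 'OK']
```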