Loading HDF5 MATLAB strings into Python


I'm running into trouble reading an HDF5 MATLAB 7.3 file with Python. I'm using h5py 2.0.1.

I can read all the matrices that are stored in the file, but I can not r

2 Answers
  • 2021-01-18 08:32

    You can get the original MATLAB class name of Group and Dataset objects with:

    dataset.attrs['MATLAB_class']
    

    If the dataset contains a string, this returns b'char'.
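    A minimal Python 3 sketch of putting that attribute to work, for example to decide whether a node should be decoded as text. It assumes the attribute comes back as bytes (as the b'char' above suggests) but also accepts str. The helper names matlab_class and is_matlab_string are hypothetical; both take any dict-like object, so an h5py attrs object or a plain dict works.

```python
from collections.abc import Mapping
from typing import Optional


def matlab_class(attrs: Mapping) -> Optional[str]:
    """Return the MATLAB class name recorded on an HDF5 node, or None.

    `attrs` is assumed to be dict-like, e.g. an h5py `dataset.attrs`
    object or a plain dict (handy for testing without a .mat file).
    """
    cls = attrs.get('MATLAB_class')
    if cls is None:
        return None
    # Fixed-length string attributes usually come back as bytes, e.g. b'char'.
    if isinstance(cls, bytes):
        return cls.decode('ascii')
    return str(cls)


def is_matlab_string(attrs: Mapping) -> bool:
    """True if the node was saved from a MATLAB char array."""
    return matlab_class(attrs) == 'char'


print(matlab_class({'MATLAB_class': b'char'}))      # char
print(is_matlab_string({'MATLAB_class': b'cell'}))  # False
```

    With a real file you would call something like is_matlab_string(h5file['x'].attrs), where 'x' is a placeholder for one of your own variable names.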

  • 2021-01-18 08:33

    I assume you mean it is a cell array of strings in MATLAB? This output looks normal: the dataset is an array of objects (|O4 is the NumPy object datatype). Each object is an array of 2-byte integers (<u2 is the NumPy little-endian unsigned 2-byte integer datatype). h5py has no way of knowing that the dataset is a cell array of strings; it could just as well be a cell array of arbitrary 16-bit integers.
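    To make "<u2" concrete: the stored data is just pairs of bytes in little-endian order, and nothing in those bytes marks them as text. A small stdlib-only sketch (no h5py or NumPy; the byte string is made up for illustration) that unpacks such bytes with the struct format "<H", the counterpart of NumPy's "<u2". The function name unpack_u2 is hypothetical.

```python
import struct
from typing import Tuple


def unpack_u2(raw: bytes) -> Tuple[int, ...]:
    """Interpret raw bytes as little-endian unsigned 16-bit integers ('<u2')."""
    if len(raw) % 2:
        raise ValueError('length must be a multiple of 2')
    return struct.unpack(f'<{len(raw) // 2}H', raw)


raw = b'H\x00i\x00'  # illustrative bytes, not taken from a real .mat file
codes = unpack_u2(raw)
print(codes)                     # (72, 105): plain integers
print(''.join(map(chr, codes)))  # Hi: the same integers read as characters
```

    Whether (72, 105) means two numbers or the text "Hi" is exactly the decision h5py cannot make for you.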

    The easiest way to get the strings out is probably a list comprehension that uses unichr to convert each character code, like this:

    # Python 2: dereference each object reference and decode its 16-bit character codes
    strlist = [u''.join(unichr(c) for c in h5file[obj_ref]) for obj_ref in dataset]
    

    What this does is iterate over the dataset (for obj_ref in dataset) to create a new list. For each object reference, it dereferences the object (h5file[obj_ref]) to get an array of integers. It converts each integer into a character (unichr(c)) and joins those characters all together into a Unicode string (u''.join()).
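    A Python 3 sketch of the same one-liner (chr in place of unichr), with the dereferencing step passed in as a function so the logic can be tried without a .mat file. The name cell_strings is hypothetical, and the h5py call in the note below assumes both the reference array and each character array should be flattened, since MATLAB saves even 1-D arrays as 2-D.

```python
from typing import Any, Callable, Iterable, List, Sequence


def cell_strings(refs: Iterable[Any],
                 deref: Callable[[Any], Sequence[int]]) -> List[str]:
    """Turn a cell array of MATLAB strings into a list of Python strs.

    refs  -- flat iterable of object references (one per cell)
    deref -- maps a reference to the flat array of 16-bit character codes
    """
    # For each reference: fetch its character codes, convert each code with
    # chr(), and join the characters into one string.
    return [''.join(chr(c) for c in deref(ref)) for ref in refs]


# Stand-in for an HDF5 file: the "references" are just dict keys here.
fake_file = {'ref0': [72, 105], 'ref1': [79, 75]}
print(cell_strings(['ref0', 'ref1'], fake_file.__getitem__))  # ['Hi', 'OK']
```

    With h5py (assuming h5file and dataset are as in the answer), the call would look like cell_strings(dataset[()].ravel(), lambda ref: h5file[ref][()].ravel()).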

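    Note that this produces a list of unicode strings. If you are absolutely sure that every string contains only ASCII characters, you can replace u'' with '' and unichr with chr. (That advice is for Python 2; in Python 3, unichr no longer exists and chr already returns a Unicode string for any code point.)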
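    On the ASCII-versus-Unicode point: as far as I know, MATLAB char data is stored as 16-bit UTF-16 code units, so calling chr() (or unichr()) on each value splits characters outside the Basic Multilingual Plane, such as emoji, into two lone surrogates. A stdlib-only Python 3 sketch, assuming UTF-16 code units, that decodes the whole array at once instead, plus a strict ASCII-only variant. Both function names are hypothetical.

```python
import struct
from typing import Sequence


def codes_to_str(codes: Sequence[int]) -> str:
    """Decode MATLAB char code units into a Python str.

    Assumes the 16-bit values are UTF-16 code units, so surrogate pairs are
    combined into one character. Raises UnicodeDecodeError on an unpaired
    surrogate.
    """
    raw = struct.pack(f'<{len(codes)}H', *codes)  # back to little-endian bytes
    return raw.decode('utf-16-le')


def ascii_codes_to_str(codes: Sequence[int]) -> str:
    """Strict variant: raises ValueError or UnicodeDecodeError on non-ASCII."""
    return bytes(codes).decode('ascii')


pair = [0xD83D, 0xDE00]  # UTF-16 surrogate pair for U+1F600
print(codes_to_str(pair) == '\U0001F600')  # True
print(len(''.join(map(chr, pair))))        # 2 (two lone surrogates)
print(ascii_codes_to_str([72, 105]))       # Hi
```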

    Caveat: I don't have h5py; this post is based on my experiences with MATLAB and NumPy. You may need to adjust the syntax or iteration order to suit your dataset.
