Can I store my own class object into hdf5?

独厮守ぢ 2021-01-16 06:31

I have a class like this:

class C:
    def __init__(self, id, user_id, photo):
        self.id = id
        self.user_id = user_id
        self.photo = photo

1 answer
  • 2021-01-16 07:20

    Although you can store the whole data structure in a single HDF5 table, it is probably much easier to store the described class as three separate variables - two 1D arrays of integers and a data structure for storing your 'photo' attribute.

    If you care about file size and speed and do not care about human-readability of your files, you can model your 64 bool values either as 8 1D arrays of UINT8 or as a 2D N x 8 array of UINT8 (or CHARs). You can then implement a simple interface that packs your bool values into the bits of UINT8 values and back (see, e.g., "How to convert a boolean array to an int array").
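    The packing step can be sketched with NumPy's `packbits` and `unpackbits`. This is a minimal illustration; the `flags` array is made-up sample data standing in for one 64-value 'photo' attribute.

```python
import numpy as np

# Hypothetical sample data: one 'photo' made of 64 boolean flags.
flags = np.array([True, False, True, True, False, True, True, True] * 8)

# 64 bools -> 8 bytes (each uint8 holds 8 flags, most significant bit first).
packed = np.packbits(flags)

# 8 bytes -> 64 bools again.
restored = np.unpackbits(packed).astype(bool)

print(packed.dtype, packed.shape)
print(np.array_equal(flags, restored))
```

    Storing the packed form means each record needs 8 bytes for its flags instead of 64.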

    As far as I know, there are no built-in search functions in HDF5, but you can read in the variable containing the user_ids and then simply use Python to find the indexes of all elements matching your user_id.
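    A rough sketch of that search, assuming h5py and NumPy are available. The file name and sample values are invented for illustration, and the in-memory `core` driver is used only so that the example leaves nothing on disk.

```python
import h5py
import numpy as np

# In-memory HDF5 file (illustrative choice; a normal on-disk file works the same way).
with h5py.File("search_demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("id", data=np.array([1, 34, 35, 36], dtype="int32"))
    f.create_dataset("user_id", data=np.array([3, 53, 7, 53], dtype="int32"))

    # Read the whole user_id column into memory and search it with NumPy.
    user_ids = f["user_id"][:]
    hits = np.flatnonzero(user_ids == 53)

    # Read only the matching rows of another dataset.
    # h5py index lists must be in increasing order, which flatnonzero guarantees.
    matching_ids = f["id"][hits]

print(hits, matching_ids)
```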

    Once you have the indexes, you can read in the relevant slices of your other variables. HDF5 natively supports efficient slicing, but it works on ranges, so you might want to think about how to store records with the same user_id in contiguous chunks; see the discussion here:

    h5py: Correct way to slice array datasets
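    One possible way to get such contiguous ranges is to sort the records by user_id before writing them, so that each user occupies a single block that can be located with a binary search. The sketch below uses plain NumPy arrays with made-up values; an h5py dataset accepts the same `start:stop` slice syntax.

```python
import numpy as np

# Hypothetical sample columns, in arbitrary order.
ids = np.array([10, 11, 12, 13, 14])
user_ids = np.array([53, 3, 53, 7, 3])

# Sort all columns by user_id so that equal user_ids sit next to each other.
order = np.argsort(user_ids, kind="stable")
ids_sorted = ids[order]
user_ids_sorted = user_ids[order]

# Binary search for the block that belongs to user 53.
start = np.searchsorted(user_ids_sorted, 53, side="left")
stop = np.searchsorted(user_ids_sorted, 53, side="right")

# One contiguous range read instead of scattered index lookups.
block = ids_sorted[start:stop]
print(start, stop, block)
```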

    You might also want to look into PyTables, a Python interface built on top of HDF5 that stores data in table-like structures.
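    For comparison, here is a minimal sketch of the PyTables route, assuming the `tables` package is installed. The `Record` description, the table name and the sample values are hypothetical; the point is only that a table can be filtered with a query string instead of a manual search.

```python
import os
import tempfile

import numpy as np
import tables


# Hypothetical row layout mirroring class C: two integers plus 8 packed bytes.
class Record(tables.IsDescription):
    id = tables.Int32Col()
    user_id = tables.Int32Col()
    photo = tables.UInt8Col(shape=(8,))


path = os.path.join(tempfile.mkdtemp(), "records.h5")

# Write three sample rows.
with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "records", Record)
    row = table.row
    for rec_id, user_id in [(1, 3), (34, 53), (35, 53)]:
        row["id"] = rec_id
        row["user_id"] = user_id
        row["photo"] = np.packbits([True, False] * 32)  # 64 flags -> 8 bytes
        row.append()
    table.flush()

# Query the table by user_id.
with tables.open_file(path, mode="r") as h5:
    matches = h5.root.records.read_where("user_id == 53")
    matching_ids = matches["id"].tolist()

print(matching_ids)
```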

    import numpy as np
    import h5py


    class C:
        def __init__(self, id, user_id, photo):
            self.id = id
            self.user_id = user_id
            self.photo = photo  # sequence of 64 bools


    def write_records(records, file_out):
        n = len(records)
        # Pack each 64-bool photo into 8 bytes; result has shape (n, 8).
        photos = np.array(
            [np.packbits(np.asarray(r.photo, dtype=bool)) for r in records],
            dtype="uint8",
        ).reshape(n, 8)

        # The context manager closes the file even if writing fails.
        with h5py.File(file_out, "w") as f:
            # Datasets are sized to the data, so there are no zero-filled
            # placeholder rows that a search could match by accident.
            f.create_dataset("id", data=np.array([r.id for r in records], dtype="int32"))
            f.create_dataset("user_id", data=np.array([r.user_id for r in records], dtype="int32"))
            # 'uint8' is one byte per value ('u8' would mean uint64 in NumPy).
            f.create_dataset("photo", data=photos)


    def read_records_by_id(file_in, record_id):
        res = []
        with h5py.File(file_in, "r") as f:
            ids = f["id"][:]  # read all ids, not only the first two
            for idx in np.flatnonzero(ids == record_id):
                idx = int(idx)
                photo = np.unpackbits(f["photo"][idx]).astype(bool)
                res.append(C(int(f["id"][idx]), int(f["user_id"][idx]), photo))
        return res


    m = [True, False, True, True, False, True, True, True]
    m = m * 8  # 64 flags
    records = [C(1, 3, m), C(34, 53, m)]

    # Write records to file
    write_records(records, "mytestfile.h5")

    # Read record from file
    res = read_records_by_id("mytestfile.h5", 34)

    print(res[0].id)
    print(res[0].user_id)
    print(res[0].photo)
    