Can I store my own class object into hdf5?

I have a class like this:

class C:
    def __init__(self, id, user_id, photo):
        self.id = id
        self.user_id = user_id
        self.photo = photo

1 answer

你的背包, 2021-01-16 07:20

Although you can store the whole data structure in a single HDF5 table, it is probably much easier to store the described class as three separate datasets: two 1D arrays of integers (id and user_id) and one dataset for your 'photo' attribute.
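For comparison, the single-table option can be sketched with a NumPy structured (compound) dtype, which h5py stores as an HDF5 compound type. This is a minimal sketch, assuming each photo is already packed into 8 bytes; the file name `records_table.h5`, the dataset name `records`, and the helpers `write_table` / `read_table` are hypothetical.

```python
import numpy as np
import h5py

# One row per record: two int32 fields plus an 8-byte packed photo field
record_dtype = np.dtype([
    ("id", "i4"),
    ("user_id", "i4"),
    ("photo", "u1", (8,)),
])

def write_table(rows, path="records_table.h5"):
    # rows: list of (id, user_id, packed_photo) tuples (hypothetical helper)
    table = np.array(rows, dtype=record_dtype)
    with h5py.File(path, "w") as f:
        f.create_dataset("records", data=table)

def read_table(path="records_table.h5"):
    with h5py.File(path, "r") as f:
        return f["records"][:]  # comes back as a structured NumPy array

photo = np.zeros(8, dtype="u1")
write_table([(1, 3, photo), (34, 53, photo)])
table = read_table()
print(table["user_id"])  # fields are accessed by name
```

The trade-off is that every read pulls whole rows, whereas separate datasets let you read just the `user_id` column when searching.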

If you care about file size and speed and do not care about the human-readability of your files, you can model your 64 bool values either as eight 1D arrays of UINT8 or as a single N x 8 2D array of UINT8 (or CHARs). Then you can implement a simple interface that packs your bool values into the bits of UINT8 values and back (see, e.g., the question "How to convert a boolean array to an int array").
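A minimal round-trip sketch of that packing interface using NumPy's `np.packbits` / `np.unpackbits`, assuming each photo is exactly 64 bools; the function names `pack_photo` and `unpack_photo` are hypothetical.

```python
import numpy as np

def pack_photo(bools):
    # 64 bools -> 8 uint8 values (NumPy's default bit order: first bool is the high bit)
    bits = np.asarray(bools, dtype=bool)
    if bits.shape != (64,):
        raise ValueError("expected exactly 64 bool values")
    return np.packbits(bits)

def unpack_photo(packed):
    # 8 uint8 values -> 64 bools
    return np.unpackbits(np.asarray(packed, dtype=np.uint8)).astype(bool)

photo = [True, False, True, True, False, True, True, True] * 8
packed = pack_photo(photo)
print(packed)  # [183 183 183 183 183 183 183 183]  (0b10110111 == 183)
print(unpack_photo(packed).tolist() == photo)  # True

# For N records at once, pack along axis 1 to get the N x 8 layout
photos = np.array([photo, photo])             # shape (2, 64), dtype bool
packed_matrix = np.packbits(photos, axis=1)   # shape (2, 8), dtype uint8
```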

As far as I know, there are no built-in search functions in HDF5, but you can read in the dataset containing the user_ids and then simply use Python (e.g. NumPy) to find the indexes of all elements matching your user_id.
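A minimal sketch of that lookup, assuming the user_id column fits in memory; the demo file `lookup_demo.h5`, its contents, and the helper `find_user_indexes` are hypothetical.

```python
import numpy as np
import h5py

# Build a small demo file (hypothetical contents) with a user_id dataset
with h5py.File("lookup_demo.h5", "w") as f:
    f.create_dataset("user_id", data=np.array([3, 53, 3, 7, 53], dtype="i4"))

def find_user_indexes(path, user_id):
    with h5py.File(path, "r") as f:
        user_ids = f["user_id"][:]  # load the whole column into memory
    # HDF5 itself has no query engine, so the comparison happens in NumPy
    return np.flatnonzero(user_ids == user_id)

print(find_user_indexes("lookup_demo.h5", 53))  # [1 4]
```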

Once you have the indexes, you can read in the relevant slices of your other datasets. HDF5 natively supports efficient slicing, but it is most efficient on contiguous ranges, so you might want to think about how to store records with the same user_id in contiguous chunks; see the discussion here:

h5py: Correct way to slice array datasets
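One way to get contiguous chunks is to sort all columns by user_id before writing. All rows for a user then form a single range, which `np.searchsorted` can locate in the sorted column, and that range can be read with one plain slice. A minimal sketch, assuming the user_id column fits in memory; the file `sorted_demo.h5`, the sample data, and the helper `read_ids_for_user` are hypothetical.

```python
import numpy as np
import h5py

ids = np.array([10, 11, 12, 13, 14], dtype="i4")
user_ids = np.array([53, 3, 53, 7, 3], dtype="i4")

# Sort every column by user_id so each user's rows are contiguous on disk
order = np.argsort(user_ids, kind="stable")
with h5py.File("sorted_demo.h5", "w") as f:
    f.create_dataset("id", data=ids[order])
    f.create_dataset("user_id", data=user_ids[order])

def read_ids_for_user(path, user_id):
    with h5py.File(path, "r") as f:
        sorted_users = f["user_id"][:]
        # First and one-past-last positions of user_id in the sorted column
        start = int(np.searchsorted(sorted_users, user_id, side="left"))
        stop = int(np.searchsorted(sorted_users, user_id, side="right"))
        if start == stop:
            return np.empty(0, dtype="i4")  # user not present
        # One contiguous slice instead of many scattered reads
        return f["id"][start:stop]

print(read_ids_for_user("sorted_demo.h5", 3))  # [11 14]
```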

You might also want to look into PyTables, a Python interface built on top of HDF5 for storing data in table-like structures.
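A minimal PyTables sketch of the same record layout, assuming the photo is packed into 8 bytes; the file name `pytables_demo.h5`, the table name `records`, and the class name `Record` are hypothetical. Unlike plain h5py, PyTables can evaluate a query string such as `user_id == uid` itself via `Table.read_where`.

```python
import numpy as np
import tables

class Record(tables.IsDescription):
    id = tables.Int32Col()
    user_id = tables.Int32Col()
    photo = tables.UInt8Col(shape=(8,))  # 64 bools packed into 8 bytes

with tables.open_file("pytables_demo.h5", mode="w") as h5:
    table = h5.create_table("/", "records", Record)
    row = table.row
    packed = np.packbits(np.ones(64, dtype=bool))
    for rec_id, uid in [(1, 3), (34, 53), (35, 53)]:
        row["id"] = rec_id
        row["user_id"] = uid
        row["photo"] = packed
        row.append()
    table.flush()
    # Optional: an index on the search column speeds up read_where on large tables
    table.cols.user_id.create_index()

with tables.open_file("pytables_demo.h5", mode="r") as h5:
    matches = h5.root.records.read_where("user_id == uid", condvars={"uid": 53})

print(matches["id"])
```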

import numpy as np
import h5py


class C:
    def __init__(self, id, user_id, photo):
        self.id = id
        self.user_id = user_id
        self.photo = photo  # 64 bool values

def write_records(records, file_out):
    # One dataset per attribute; row i of every dataset belongs to records[i].
    # Datasets are sized to the number of records, so there are no zero-filled padding rows.
    with h5py.File(file_out, "w") as f:
        f.create_dataset("id", data=np.array([r.id for r in records], dtype='i4'))
        f.create_dataset("user_id", data=np.array([r.user_id for r in records], dtype='i4'))
        # Pack each photo's 64 bools into 8 bytes -> N x 8 array of uint8.
        # Note: 'u1' is uint8; 'u8' would be uint64 and use 8x the space.
        photos = np.array([np.packbits(np.asarray(r.photo, dtype=bool)) for r in records], dtype='u1')
        f.create_dataset("photo", data=photos)

def read_records_by_id(file_in, record_id):
    res = []
    with h5py.File(file_in, "r") as f:
        # Read the whole id column (not just the first two rows) and search it in NumPy
        ids = f["id"][:]
        for idx in np.where(ids == record_id)[0]:
            idx = int(idx)
            photo = np.unpackbits(f["photo"][idx]).astype(bool)
            res.append(C(int(f["id"][idx]), int(f["user_id"][idx]), photo))
    return res

m = [True, False, True, True, False, True, True, True]
m = m * 8  # 64 bool values
records = [C(1, 3, m), C(34, 53, m)]

# Write records to file
write_records(records, "mytestfile.h5")

# Read record from file
res = read_records_by_id("mytestfile.h5", 34)

print(res[0].id)
print(res[0].user_id)
print(res[0].photo)
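If records arrive over time rather than all at once, the datasets can be created as resizable (`maxshape` with `None` on the first axis) and grown with `Dataset.resize()` on each append, instead of guessing a fixed capacity up front. A minimal sketch, assuming the same `id` / `user_id` / `photo` layout with photos already packed into 8 bytes; the file `append_demo.h5` and the helpers `create_store` / `append_records` are hypothetical.

```python
import numpy as np
import h5py

def create_store(path):
    with h5py.File(path, "w") as f:
        # Start empty; maxshape None on axis 0 allows unlimited growth (needs chunked storage)
        f.create_dataset("id", shape=(0,), maxshape=(None,), dtype="i4", chunks=True)
        f.create_dataset("user_id", shape=(0,), maxshape=(None,), dtype="i4", chunks=True)
        f.create_dataset("photo", shape=(0, 8), maxshape=(None, 8), dtype="u1", chunks=True)

def append_records(path, ids, user_ids, packed_photos):
    ids = np.asarray(ids, dtype="i4")
    user_ids = np.asarray(user_ids, dtype="i4")
    packed_photos = np.asarray(packed_photos, dtype="u1").reshape(-1, 8)
    with h5py.File(path, "a") as f:
        old = f["id"].shape[0]
        new = old + len(ids)
        # Grow all three datasets along axis 0, then fill the new rows
        for name in ("id", "user_id", "photo"):
            f[name].resize(new, axis=0)
        f["id"][old:new] = ids
        f["user_id"][old:new] = user_ids
        f["photo"][old:new] = packed_photos

create_store("append_demo.h5")
photo = np.packbits(np.zeros(64, dtype=bool))
append_records("append_demo.h5", [1], [3], [photo])
append_records("append_demo.h5", [34, 35], [53, 53], [photo, photo])
with h5py.File("append_demo.h5", "r") as f:
    print(f["id"][:], f["photo"].shape)
```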
