pytables add repetitive subclass as column

Question

I am creating an HDF5 file with a strict layout. It has one table with a variable number of columns. At some point the columns become repetitive, with only the appended data differing. Apparently, I can't put a loop inside an IsDescription class. Currently the class Segments is added under class Summary_data twice, but I need segments_k 70 times. What is the best approach? Thank you.

from tables import IsDescription, Int16Col, Float32Col, open_file

class Header(IsDescription):
    _v_pos    = 1
    id        = Int16Col(dflt=1, pos = 0)
    timestamp = Int16Col(dflt=1, pos = 1)

class Segments(IsDescription):
    segment_id      = Int16Col(dflt=1, pos = 0)
    segment_quality = Float32Col(dflt=1, pos = 1)
    segment_length  = Float32Col(dflt=1, pos = 2)

class Summary_data(IsDescription):
    latency     = Float32Col(dflt=1, pos = 2)
    segments_k  = Int16Col(dflt=1, pos = 4)
    segments_k0 = Segments()
    segments_k1 = Segments()

class Everything(IsDescription):
    header       = Header()
    summary_data = Summary_data()
    
def write_new_file():
    h5file = "results.hdf5"
    with open_file(h5file, mode = "w") as f:
        root    = f.root
        table1  = f.create_table(root, "Table1", Everything)
        row     = table1.row
        length  = [[23.5, 16.3], [8, 6]]
        quality = [[0.9, 0.7], [0.6, 0.4]]
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0

            for data in range(2):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()

Answer 1:

OK, I think I understand, and I will try to explain how I did this (and how to extend it to handle all 70 segments). As an aside, your nested fields are far more complicated than anything I've seen. Are you sure you need this many levels of nesting?

The key is using an np.dtype() to define the table description. I always use dtypes to define my tables, not the IsDescription method. (I use NumPy to process my HDF5 data, so I'm comfortable with the module.) In your case, you need a dtype because it is the only way I know to create your complex table structure with code. Otherwise you will be writing IsDescription entries for hours. :-)

The code below uses 3 different methods to create 3 tables (schema and data in each table should be identical). An explanation for each:

Table 1 is created with your code. It uses the IsDescription method to create 3 summary_data/segments_k# entries. (I added segments_k2 = Segments() to class Summary_data.) Note this line of code: print (tb.description.dtype_from_descr(Everything) ). It prints the equivalent np.dtype for the Everything description used by Table1. I referenced this output for Tables 2 and 3 below.
Table 2's description references np.dtype tb2_dt. I copied and pasted this from the previous output. I could have referenced it as a variable, but I want you to see it so you understand what I did for Table 3. The code to populate the table is the same as for Table 1.
Table 3's description references np.dtype tb3_dt. This is where things get tricky. The np.dtype structure is COMPLICATED: it is a list of tuples, and some of those tuples contain lists. The dtype is built from seg_kn_tlist and tb3_dt_list. The code to populate the table is the same as for Tables 1 and 2.
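
To make the "list of tuples" structure concrete, here is a minimal NumPy-only sketch of a single nested field. The names segment_fields and nested_dt are illustrative and do not appear in the answer's code; the field names and formats are taken from the dtype printed for Table 1.

```python
import numpy as np

# One repeated sub-structure: a list of (field_name, format) tuples.
segment_fields = [
    ("segment_id", "<i2"),
    ("segment_quality", "<f4"),
    ("segment_length", "<f4"),
]

# A nested field is a (name, list_of_fields) tuple: the list simply
# sits where a format string such as "<f4" would otherwise go.
nested_dt = np.dtype([
    ("latency", "<f4"),
    ("segments_k0", segment_fields),
])

print(nested_dt.names)                 # top-level field names
print(nested_dt["segments_k0"].names)  # field names one level down
```

Because the nested part is an ordinary Python list, it can be built or repeated in a loop, which is what the Table 3 approach relies on.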

To get this to work for 70 segments, "all" you have to do is change the 2 range(3) arguments that create seg_kn_tlist and populate the data rows. (Of course, you also need to provide the data.)
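
As a rough sketch of that change, the dtype construction can be wrapped in a small function that takes the segment count. The function name build_everything_dtype and its parameter are hypothetical; the field layout is the one used for Tables 2 and 3.

```python
import numpy as np

def build_everything_dtype(n_segments):
    """Build the nested table dtype with n_segments segments_k# fields.

    Hypothetical helper: it mirrors the seg_kn_tlist / tb3_dt_list
    construction, with the segment count as a parameter.
    """
    segment_fields = [("segment_id", "<i2"),
                      ("segment_quality", "<f4"),
                      ("segment_length", "<f4")]
    summary_fields = [("latency", "<f4"), ("segments_k", "<i2")]
    for n in range(n_segments):
        summary_fields.append((f"segments_k{n}", segment_fields))
    return np.dtype([("header", [("id", "<i2"), ("timestamp", "<i2")]),
                     ("summary_data", summary_fields)])

dt70 = build_everything_dtype(70)
print(len(dt70["summary_data"].names))  # latency, segments_k, plus 70 segments
```

The resulting dtype can then be passed as the description argument of create_table, in the same way tb3_dt is used for Table 3.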

Code below:

    import tables as tb
    import numpy as np

    h5file = "SO_64449277np.h5"
    with tb.open_file(h5file, mode = "w") as h5f:
        length  = [[23.5, 16.3], [8, 6], [11.0, 7.7]]
        quality = [[0.9, 0.7], [0.6, 0.4], [0.8, 0.5]]

        root    = h5f.root
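        # Everything is the IsDescription class from the question, with
        # segments_k2 = Segments() added to Summary_data.
        table1  = h5f.create_table(root, "Table1", Everything)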
        print (tb.description.dtype_from_descr(Everything) )

        row     = table1.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0

            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()

        tb2_dt = np.dtype([('header', [('id', '<i2'), ('timestamp', '<i2')]), 
                           ('summary_data', [('latency', '<f4'), ('segments_k', '<i2'), 
                           ('segments_k0', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]), 
                           ('segments_k1', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                           ('segments_k2', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                           ])] )

        table2  = h5f.create_table(root, "Table2", tb2_dt)
        row     = table2.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0

            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()

        # Create np.dtype() iteratively.
        # Start with latency and segments_k, then use a loop to add the
        # segments_k# fields (id, quality and length).

        seg_kn_tlist = [('latency', '<f4'), ('segments_k', '<i2') ]
        for cnt in range(3) :            
            seg_kn_tlist.append( ('segments_k'+str(cnt), 
                                [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')] ) ) 
 
        # Finish the np.dtype() definition: header (id, timestamp) and
        # summary_data, which takes the list built above.
        tb3_dt_list = [ ('header', [('id', '<i2'), ('timestamp', '<i2')]), ('summary_data', seg_kn_tlist) ]
        
        tb3_dt = np.dtype( tb3_dt_list ) 

        table3  = h5f.create_table(root, "Table3", tb3_dt)
        row     = table3.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0

            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()
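
The same sample data can also be assembled as a NumPy structured array before anything is written. The sketch below uses only NumPy; the names N_SEGMENTS, row_dt and rows are illustrative, and the dtype matches the one built for Table 3.

```python
import numpy as np

N_SEGMENTS = 3  # assumption: 3 segments, matching the sample data

segment_fields = [("segment_id", "<i2"),
                  ("segment_quality", "<f4"),
                  ("segment_length", "<f4")]
summary_fields = [("latency", "<f4"), ("segments_k", "<i2")]
summary_fields += [(f"segments_k{n}", segment_fields) for n in range(N_SEGMENTS)]
row_dt = np.dtype([("header", [("id", "<i2"), ("timestamp", "<i2")]),
                   ("summary_data", summary_fields)])

length  = [[23.5, 16.3], [8, 6], [11.0, 7.7]]
quality = [[0.9, 0.7], [0.6, 0.4], [0.8, 0.5]]

rows = np.zeros(2, dtype=row_dt)          # two rows, all fields zeroed
rows["header"]["id"] = [0, 1]
rows["header"]["timestamp"] = [0, 2]
for n in range(N_SEGMENTS):
    # Indexing by field name returns a view, so these writes land in rows.
    seg = rows["summary_data"][f"segments_k{n}"]
    seg["segment_id"] = n
    seg["segment_quality"] = quality[n]
    seg["segment_length"] = length[n]
```

PyTables' Table.append() is documented to accept a structured array, so an array like rows could in principle be appended in one call instead of filling table.row field by field. That step was not tested against the tables above.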

Answer 2:

Sorry, I'm not following your explanation in the comments. Below is a screen grab from HDFView showing the schema/table layout. As I see it, segments_k0{0} and segments_k1{0} are nested under summary_data. See how it shows summary_data->segments_k0{0}. Is that what you want: 70 nested columns, each with 3 nested columns (segment_id, segment_quality and segment_length)?

Or do you just want segments_k0{0} nested under the 0 row? I created a second table (Table2) with that schema. See the screen grab below. Notice that segments_k0{0} is not preceded by summary_data-> (it's not nested).

Either of these is possible when you enter the description using an np.dtype(). You can define dtypes with a dictionary and populate the field names and formats programmatically. The second table is easier to define. I want to be sure which you want before I show you how to create the np.dtype().
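
For the dictionary form mentioned above, a minimal sketch might look like the following. It assumes the simpler layout, with the segments_k# columns at the top level rather than under summary_data. The exact schema of the second table is only visible in the screen grab, so the names N_SEGMENTS, segment_dt and flat_dt are illustrative.

```python
import numpy as np

N_SEGMENTS = 70  # the count asked for in the question

# The repeated sub-structure, defined once.
segment_dt = np.dtype([("segment_id", "<i2"),
                       ("segment_quality", "<f4"),
                       ("segment_length", "<f4")])

# Dictionary form: parallel lists of field names and field formats,
# both generated programmatically.
names = [f"segments_k{n}" for n in range(N_SEGMENTS)]
formats = [segment_dt] * N_SEGMENTS
flat_dt = np.dtype({"names": names, "formats": formats})

print(flat_dt.names[:3])
```

The dictionary form keeps names and formats in two flat lists, which some people find easier to generate than the nested list of tuples.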

Source: https://stackoverflow.com/questions/64449277/pytables-add-repetitive-subclass-as-column

Tags

python

pandas

hdf5

pytables