Question
I am creating an HDF5 file with strict parameters. It has one table with a variable number of columns. At some point the columns become repetitive, with different data being appended. Apparently, I can't add a loop inside an IsDescription class. Currently the class Segments has been added under class Summary_data twice. I need to repeat the segments_k entry 70 times. What is the best approach? Thank you.
    from tables import IsDescription, Int16Col, Float32Col, open_file

    class Header(IsDescription):
        _v_pos = 1
        id = Int16Col(dflt=1, pos = 0)
        timestamp = Int16Col(dflt=1, pos = 1)

    class Segments(IsDescription):
        segment_id = Int16Col(dflt=1, pos = 0)
        segment_quality = Float32Col(dflt=1, pos = 1)
        segment_length = Float32Col(dflt=1, pos = 2)

    class Summary_data(IsDescription):
        latency = Float32Col(dflt=1, pos = 2)
        segments_k = Int16Col(dflt=1, pos = 4)
        # One nested Segments block per segment; 70 of these are needed
        segments_k0 = Segments()
        segments_k1 = Segments()

    class Everything(IsDescription):
        header = Header()
        summary_data = Summary_data()

    def write_new_file():
        h5file = "results.hdf5"
        with open_file(h5file, mode = "w") as f:
            root = f.root
            table1 = f.create_table(root, "Table1", Everything)
            row = table1.row
            # Indexed as [segment][row]
            length = [[23.5, 16.3], [8, 6]]
            quality = [[0.9, 0.7], [0.6, 0.4]]
            for i in range(2):
                row['header/id'] = i
                row['header/timestamp'] = i * 2.
                row['summary_data/latency'] = 0.0
                row['summary_data/segments_k'] = 0
                for data in range(2):
                    row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                    row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                    row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
                row.append()
Answer 1:
Ok, I think I understand, and will attempt to explain how I did this (and how to extend to handle all 70 segments). As an aside, your nested fields are exceedingly complex, far more complicated than anything I've seen. Are you sure you need this many levels of nested fields?
The key is using a np.dtype() to define the table description. I always use them to define my tables, not the IsDescription method. (I use NumPy to process my HDF5 data, so I'm comfortable with the module.) In your case, you need a dtype because it is the only way I know to create your complex table structure with code. Otherwise you will be creating IsDescription entries for hours. :-)
The code below uses 3 different methods to create 3 tables (schema and data in each table should be identical). An explanation for each:
- Table 1 is created with your code. It uses the IsDescription method to create 3 summary_data/segments_k# entries. (I added segments_k2 = Segments() to class Summary_data().) Note this line of code: print (tb.description.dtype_from_descr(Everything) ). It prints the equivalent np.dtype for the Everything description used by Table1. I referenced this for Tables 2 and 3 below.
- Table 2's description references np.dtype tb2_dt. I copied/pasted this from the previous output. I could have referenced it as a variable, but I want you to see it to understand what I did for Table 3. The code to populate the table is the same as for Table 1.
- Table 3's description references np.dtype tb3_dt. This is where things get tricky. The np.dtype structure is COMPLICATED: it is a list of tuples and tuples of lists. The dtype is built from seg_kn_tlist and tb3_dt_list. The code to populate the table is the same as for Tables 1 and 2.
To get this to work for 70 segments, "all" you have to do is change the two range(3) arguments in the Table 3 code: the one that creates seg_kn_tlist and the one that populates the data rows. (Of course, you also need to provide the data.)
Code below:
    import tables as tb
    import numpy as np

    # Note: Everything is the IsDescription class from the question,
    # with segments_k2 = Segments() added to class Summary_data.

    h5file = "SO_64449277np.h5"
    with tb.open_file(h5file, mode = "w") as h5f:
        # Sample data, indexed as [segment][row]
        length = [[23.5, 16.3], [8, 6], [11.0, 7.7]]
        quality = [[0.9, 0.7], [0.6, 0.4], [0.8, 0.5]]
        root = h5f.root

        # Table 1: description from the IsDescription classes
        table1 = h5f.create_table(root, "Table1", Everything)
        print (tb.description.dtype_from_descr(Everything) )
        row = table1.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0
            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()

        # Table 2: description from a np.dtype copied from the printed output above
        tb2_dt = np.dtype([('header', [('id', '<i2'), ('timestamp', '<i2')]),
                           ('summary_data', [('latency', '<f4'), ('segments_k', '<i2'),
                               ('segments_k0', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                               ('segments_k1', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                               ('segments_k2', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                           ])] )
        table2 = h5f.create_table(root, "Table2", tb2_dt)
        row = table2.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0
            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()

        # Table 3: create np.dtype() iteratively
        # Start with latency and segments_k, and use a loop to add segments_k# id, quality and length
        seg_kn_tlist = [('latency', '<f4'), ('segments_k', '<i2') ]
        for cnt in range(3) :
            seg_kn_tlist.append( ('segments_k'+str(cnt),
                                  [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')] ) )
        # Finish np.dtype() definition with fields for header (id, timestamp) and summary_data, followed by tuple with list above
        tb3_dt_list = [ ('header', [('id', '<i2'), ('timestamp', '<i2')]), ('summary_data', seg_kn_tlist) ]
        tb3_dt = np.dtype( tb3_dt_list )
        table3 = h5f.create_table(root, "Table3", tb3_dt)
        row = table3.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0
            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()
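For the full 70 segments, the Table 3 approach could be wrapped in two small functions, roughly as sketched below. This is only a sketch under assumptions: the names build_table_dtype, write_table, N_SEGMENTS and "Table70" are made up for illustration, the file goes to a temporary directory, and the quality/length values are placeholders because the real data is not shown in the question.

```python
import os
import tempfile

import numpy as np
import tables as tb

N_SEGMENTS = 70  # number of segments_k# groups asked for in the question

# Fields shared by every segments_k# group
SEGMENT_FIELDS = [('segment_id', '<i2'),
                  ('segment_quality', '<f4'),
                  ('segment_length', '<f4')]


def build_table_dtype(n_segments):
    """Return the nested dtype with n_segments segments_k# groups."""
    summary = [('latency', '<f4'), ('segments_k', '<i2')]
    for k in range(n_segments):
        summary.append((f'segments_k{k}', SEGMENT_FIELDS))
    return np.dtype([('header', [('id', '<i2'), ('timestamp', '<i2')]),
                     ('summary_data', summary)])


def write_table(path, n_segments, n_rows=2):
    """Write n_rows rows of placeholder data; return the table's row count."""
    with tb.open_file(path, mode='w') as h5f:
        table = h5f.create_table(h5f.root, 'Table70',
                                 build_table_dtype(n_segments))
        row = table.row
        for i in range(n_rows):
            row['header/id'] = i
            row['header/timestamp'] = i * 2
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0
            for k in range(n_segments):
                prefix = f'summary_data/segments_k{k}'
                row[prefix + '/segment_id'] = k
                # Placeholder values: substitute the real measurements here
                row[prefix + '/segment_quality'] = 0.5
                row[prefix + '/segment_length'] = 1.0
            row.append()
        table.flush()
        return table.nrows


table_dtype = build_table_dtype(N_SEGMENTS)
h5_path = os.path.join(tempfile.mkdtemp(), 'segments70.h5')
n_written = write_table(h5_path, N_SEGMENTS)
print(len(table_dtype['summary_data'].names), n_written)
```

Keeping the segment count in one constant means the dtype and the fill loop cannot drift apart, which is the main risk when the two range() arguments are edited by hand.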
Answer 2:
Sorry, I'm not following your explanation in the comments. Below is a screen grab from HDFView showing the schema/table layout. As I see it, segments_k0{0} and segments_k1{0} are nested under summary_data.
See how it shows summary_data->segments_k0{0}. Is that what you want? 70 nested columns, each with 3 nested columns (segment_id, segment_quality and segment_length)?
Or, do you just want segments_k0{0} nested under the 0 row?
I created a second table (Table2) with that schema. See the screen grab below.
Notice that segments_k0{0} is not preceded by summary_data-> (it's not nested).
Either of these is possible when you enter the description using an np.dtype(). You can define dtypes with a dictionary and populate the field names and formats programmatically. The second table is easier to define. I want to be sure which you want before I show you how to create the np.dtype().
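The dictionary form of np.dtype() could look something like the sketch below. The layout is an assumption, since the screen grabs are not reproduced here: it places header, latency, segments_k and each segments_k# group at the top level instead of nesting them under summary_data. The variable names are illustrative.

```python
import numpy as np

N_SEGMENTS = 70  # number of segments_k# groups

header_dt = np.dtype([('id', '<i2'), ('timestamp', '<i2')])
segment_dt = np.dtype([('segment_id', '<i2'),
                       ('segment_quality', '<f4'),
                       ('segment_length', '<f4')])

# Build parallel lists of field names and formats programmatically
names = ['header', 'latency', 'segments_k']
formats = [header_dt, np.dtype('<f4'), np.dtype('<i2')]
for k in range(N_SEGMENTS):
    names.append(f'segments_k{k}')
    formats.append(segment_dt)

# Dictionary form of the dtype constructor
flat_dt = np.dtype({'names': names, 'formats': formats})

# Quick check with an in-memory structured array of two rows
rows = np.zeros(2, dtype=flat_dt)
rows['segments_k3']['segment_quality'] = [0.9, 0.7]
print(len(flat_dt.names), flat_dt.itemsize)
```

A dtype built this way can be passed to create_table() in the same way as tb3_dt in the first answer, and the row fields would then be addressed without the summary_data/ prefix, for example 'segments_k3/segment_quality'.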
Source: https://stackoverflow.com/questions/64449277/pytables-add-repetitive-subclass-as-column