I have several HDF5 files, each with the same structure. I'd like to merge them into a single PyTables file. What I mean is that if an array in file1 has size x and the corresponding array in file2 has size y, the resulting array in the merged file should have size x+y, containing first all the entries from file1 and then all the entries from file2.
How you want to do this depends slightly on the data type that you have. Arrays and CArrays have a static size, so you need to preallocate the data space. Thus you would do something like the following:
import tables as tb

# Open the source files read-only and the destination file for writing.
file1 = tb.open_file('/path/to/file1', 'r')
file2 = tb.open_file('/path/to/file2', 'r')
file3 = tb.open_file('/path/to/file3', 'w')

x = file1.root.x
y = file2.root.y

# Preallocate an array large enough for both sources, then copy each
# source into its slice of the destination.
z = file3.create_array('/', 'z', atom=x.atom, shape=(x.nrows + y.nrows,))
z[:x.nrows] = x[:]
z[x.nrows:] = y[:]
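If the arrays are too large to read into memory in one go, the same preallocate-and-copy approach works chunk by chunk. A minimal sketch, reusing the x, y, z handles from above; chunk_size and copy_in_chunks are hypothetical names, not part of the PyTables API:

# Copy each source into the destination a slice at a time, so only
# chunk_size rows are ever in memory at once.
chunk_size = 1_000_000  # hypothetical tuning parameter

def copy_in_chunks(src, dst, offset):
    for start in range(0, src.nrows, chunk_size):
        stop = min(start + chunk_size, src.nrows)
        dst[offset + start:offset + stop] = src[start:stop]

copy_in_chunks(x, z, 0)
copy_in_chunks(y, z, x.nrows)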
However, EArrays and Tables are extendable, so you don't need to preallocate the size; you can use copy_node() and append() instead.
import tables as tb

# Open the source files read-only and the destination file for writing.
file1 = tb.open_file('/path/to/file1', 'r')
file2 = tb.open_file('/path/to/file2', 'r')
file3 = tb.open_file('/path/to/file3', 'w')

y = file2.root.y

# Copy the extendable node from file1 into file3, then append the
# contents of file2's node onto the end of the copy.
z = file1.copy_node('/', name='x', newparent=file3.root, newname='z')
z.append(y[:])  # reads y into memory as one array and appends it
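If you have more than two files, the same pattern generalizes: copy the node from the first file, then append from each of the rest. A minimal sketch under the assumption that every source file stores the node at /x; the paths and the merged filename are hypothetical placeholders:

import tables as tb

# Hypothetical source files, each assumed to hold a node at /x.
paths = ['/path/to/part1.h5', '/path/to/part2.h5', '/path/to/part3.h5']

with tb.open_file('/path/to/merged.h5', 'w') as out:
    # Seed the output with a copy of the first file's node.
    with tb.open_file(paths[0], 'r') as first:
        z = first.copy_node('/', name='x', newparent=out.root, newname='z')
    # Append the remaining files onto the end, one at a time.
    for path in paths[1:]:
        with tb.open_file(path, 'r') as src:
            z.append(src.root.x[:])  # reads each source node into memory

For very large sources you would combine this with the chunked copy above, appending slices of each node instead of reading it whole.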
Source: https://stackoverflow.com/questions/19116917/merging-several-hdf5-files-into-one-pytable