Question
I have created external links from one HDF5 file to others using PyTables. How can I dereference them in a loop?
For example, assume file_name = "collection.h5" is the file where the external links are stored.
I created the external links under the root node, and when I traverse the nodes under the root I get the following output:
/link1 (ExternalLink) -> /files/data1.h5:/weights/Image
/link2 (ExternalLink) -> /files/data2.h5:/weights/Image
and so on.
I know that a single link can be dereferenced with natural naming, like this:
from tables import open_file
f = open_file('collection.h5', mode='r')
plink1 = f.root.link1()  # calling a link dereferences it
plink2 = f.root.link2()
But I want to do this in a for loop. How can I do that?
Answer 1:
This is a more complete (more robust, but more complicated) answer that handles the general case where an ExternalLink can sit at any group level. It is similar to the simple iter_nodes() example in the other answer, but uses walk_nodes() because the links are stored under 3 groups at the root level, and it tests for ExternalLink types with isinstance(). It also shows how to use the _v_children attribute to get a dictionary of nodes. (I couldn't get list_nodes() to work with an ExternalLink.)
import tables as tb
import glob

# Build collection.h5: one group per file prefix, one external link per file.
h5f = tb.open_file('collection.h5', mode='w')
link_cnt = 0
pre_list = ['SO_53', 'SO_54', 'SO_55']
for h5f_pre in pre_list:
    h5f_pre_grp = h5f.create_group('/', h5f_pre)
    for h5name in glob.glob('./' + h5f_pre + '*.h5'):
        link_cnt += 1
        # Target ':/' points at the root group of the linked file.
        h5f.create_external_link(h5f_pre_grp, 'link_' + '%02d' % (link_cnt), h5name + ':/')
h5f.close()

# Walk every node recursively and dereference only the external links.
h5f = tb.open_file('collection.h5', mode='r')
for link_node in h5f.walk_nodes('/'):
    if isinstance(link_node, tb.link.ExternalLink):
        print('\nFor Node %s:' % (link_node._v_pathname))
        print("``%s`` is an external link to: ``%s``" % (link_node, link_node.target))
        # Calling the link opens the linked file and returns the target node
        # (here the root group of that file).
        plink = link_node(mode='r')
        linked_nodes = plink._v_children  # dictionary of child nodes
        print(linked_nodes)
h5f.close()
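The example above reads _v_children, which is only available when a link points at a Group. The links in the question point at /weights/Image, which is presumably a Leaf, so here is a minimal sketch that handles both cases. The file names, the group name SO_53, the link names, and the helpers describe_target() and summarize_links() are made up for illustration; umount() is used on the assumption that it closes the linked file once the link has been dereferenced.

```python
import os
import tempfile

import numpy as np
import tables as tb


def describe_target(target):
    """Summarize a dereferenced link target, whether it is a Group or a Leaf."""
    if isinstance(target, tb.Group):
        return ("group", sorted(target._v_children))
    if isinstance(target, tb.Leaf):
        return ("leaf", tuple(int(n) for n in target.shape))
    return ("other", None)


def summarize_links(collection_path):
    """Walk the whole tree and describe the target of each external link."""
    summary = {}
    with tb.open_file(collection_path, mode="r") as coll:
        for node in coll.walk_nodes("/"):
            if isinstance(node, tb.link.ExternalLink):
                summary[node._v_pathname] = describe_target(node(mode="r"))
                node.umount()  # close the linked file before moving on
    return summary


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical data file holding a single array at /weights/Image.
    data_path = os.path.join(tmp, "data1.h5")
    with tb.open_file(data_path, mode="w") as data_file:
        data_file.create_array("/weights", "Image", np.zeros((3, 4)),
                               createparents=True)

    # Collection file with one link to a group and one link to an array.
    coll_path = os.path.join(tmp, "collection.h5")
    with tb.open_file(coll_path, mode="w") as coll:
        grp = coll.create_group("/", "SO_53")
        coll.create_external_link(grp, "to_root", data_path + ":/")
        coll.create_external_link(grp, "to_array", data_path + ":/weights/Image")

    summary = summarize_links(coll_path)
    for path in sorted(summary):
        print(path, summary[path])
```

Checking the type of the dereferenced node first avoids an AttributeError when a link resolves to an array or table instead of a group.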
Answer 2:
You can use iter_nodes() or walk_nodes(); walk_nodes() is recursive, iter_nodes() is not. An example of iter_nodes() is explained in my answer to this SO topic: cannot-retrieve-datasets-in-pytables-using-natural-naming
I discovered that you can't use get_node() to reference an ExternalLink. You need to reach the link another way, such as iterating over the nodes as in the example that follows.
Here's a simple example that creates collection.h5 from a list of HDF5 files in my local folder, then uses iter_nodes() in a for loop. Note that this is a very basic example. It does not check the node's object type (Group, Leaf, or ExternalLink). It assumes each node at the root level is an ExternalLink and dereferences it by calling the node. There are additional PyTables methods and attributes to check for these situations. See the other answer, which uses walk_nodes(), for a more robust (and more complicated) method.
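import tables as tb
import glob

# Build collection.h5 with one root-level external link per matching file.
h5f = tb.open_file('collection.h5', mode='w')
link_cnt = 0
for h5name in glob.glob('./SO*.h5'):
    link_cnt += 1
    # Target ':/' points at the root group of the linked file.
    h5f.create_external_link('/', 'link' + str(link_cnt), h5name + ':/')
h5f.close()

# Loop over the root-level nodes (non-recursive) and dereference each link.
h5f = tb.open_file('collection.h5', mode='r')
for link_node in h5f.iter_nodes('/'):
    print("``%s`` is an external link to: ``%s``" % (link_node, link_node.target))
    # Calling the link opens the linked file and returns the target node
    # (here the root group of that file).
    plink = link_node(mode='r')
h5f.close()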
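To tie this back to the layout in the question, here is a self-contained sketch that builds two small data files with an array at /weights/Image, links them from collection.h5, and then reads every linked array in a for loop. The temporary folder, the array contents, and the helper names build_demo_files() and read_linked_arrays() are assumptions made for illustration, not part of the original setup.

```python
import os
import tempfile

import numpy as np
import tables as tb


def build_demo_files(folder, n_files=2):
    """Create small data files plus a collection file that links to them."""
    for i in range(1, n_files + 1):
        data_path = os.path.join(folder, "data%d.h5" % i)
        with tb.open_file(data_path, mode="w") as data_file:
            # Each data file stores one 2x2 array at /weights/Image.
            data_file.create_array("/weights", "Image", np.full((2, 2), i),
                                   createparents=True)

    collection_path = os.path.join(folder, "collection.h5")
    with tb.open_file(collection_path, mode="w") as coll:
        for i in range(1, n_files + 1):
            data_path = os.path.join(folder, "data%d.h5" % i)
            coll.create_external_link("/", "link%d" % i,
                                      data_path + ":/weights/Image")
    return collection_path


def read_linked_arrays(collection_path):
    """Dereference every root-level external link and read its target array."""
    arrays = {}
    with tb.open_file(collection_path, mode="r") as coll:
        for node in coll.iter_nodes("/"):
            if not isinstance(node, tb.link.ExternalLink):
                continue  # skip ordinary groups and leaves
            target = node(mode="r")  # loop equivalent of f.root.link1()
            arrays[node._v_name] = target.read()
            node.umount()  # close the linked file once the data is in memory
    return arrays


with tempfile.TemporaryDirectory() as tmp:
    arrays = read_linked_arrays(build_demo_files(tmp))
    for name in sorted(arrays):
        print(name, arrays[name].tolist())
```

Reading the data into memory before unmounting keeps only one linked file open at a time, which may matter when the collection holds many links.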
Source: https://stackoverflow.com/questions/55391339/how-to-de-reference-a-list-of-external-links-using-pytables