I am using XPath with Python lxml (Python 2). I run through two passes on the data, one to select the records of interest, and one to extract values from the data. Here is a
You should try the following:
for node in nodes:
    print(node.attrib)
This prints a dict-like mapping of all the node's attributes, e.g. {'id': '1', 'weight': '80', 'height': '160'}.
If you want a list of tuples such as [('@id', '1'), ('@height', '160'), ('@weight', '80')] instead, build it like this:
list_of_attributes = []
for node in nodes:
    attrs = []
    for att in node.attrib:
        attrs.append(("@" + att, node.attrib[att]))
    list_of_attributes.append(attrs)
Output:
[[('@id', '1'), ('@height', '160'), ('@weight', '80')], [('@id', '2'), ('@weight', '70')], [('@id', '3'), ('@height', '140')]]
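The same loop can be written more compactly as a nested list comprehension over attrib.items(). This is only a sketch: the sample XML is copied from the question, and the fallback to the standard library's xml.etree.ElementTree is my own assumption so that the snippet still runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib parser, which exposes the same
    # fromstring / findall / attrib interface used below.
    import xml.etree.ElementTree as etree

xml = """
<records>
<row id="1" height="160" weight="80" />
<row id="2" weight="70" ><height>150</height></row>
<row id="3" height="140" />
</records>
"""

nodes = etree.fromstring(xml).findall('row')

# One list of ('@name', value) tuples per <row> element
list_of_attributes = [
    [("@" + name, value) for name, value in node.attrib.items()]
    for node in nodes
]
print(list_of_attributes)
```

Only attributes are collected here, so the <height> child element of the second row does not appear in the result.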
I was wrong in my assertion that I was not going to use Python. I found that the lxml/etree implementation is easily extended, so I can use the XPath DSL with modifications.
I registered a custom XPath function, "dictify", and changed the XPath expression to:
dictify('@id|@height|@weight|weight|height')
The new code is:
from lxml import etree
xml = """
<records>
<row id="1" height="160" weight="80" />
<row id="2" weight="70" ><height>150</height></row>
<row id="3" height="140" />
</records>
"""
def dictify(context, names):
    # lxml passes the evaluation context as the first argument.
    node = context.context_node
    rv = []
    rv.append('__dictify_start_marker__')
    names = names.split('|')
    for n in names:
        if n.startswith('@'):
            # Attribute name: emit it only if the attribute is present.
            val = node.attrib.get(n[1:])
            if val is not None:
                rv.append(n)
                rv.append(val)
        else:
            # Element name: emit one name/text pair per matching child.
            children = node.findall(n)
            for child_node in children:
                rv.append(n)
                rv.append(child_node.text)
    rv.append('__dictify_end_marker__')
    return rv
etree_functions = etree.FunctionNamespace(None)
etree_functions['dictify'] = dictify
parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print(node.xpath("dictify('@id|@height|@weight|weight|height')"))
This produces the following output:
['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']
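If real dicts are needed rather than the flat, marker-delimited lists, the name/value pairs can be folded back together after the XPath call. A minimal sketch, assuming each result has exactly the shape shown above; pairs_to_dict is a hypothetical helper name of mine, not part of lxml, and a name that occurs more than once (for example several <height> children) would keep only its last value.

```python
START = '__dictify_start_marker__'
END = '__dictify_end_marker__'

def pairs_to_dict(flat):
    """Turn [START, name, value, name, value, ..., END] into a dict."""
    if not flat or flat[0] != START or flat[-1] != END:
        raise ValueError('not a dictify() result')
    body = flat[1:-1]
    # Even positions hold names, odd positions hold the matching values.
    return dict(zip(body[0::2], body[1::2]))

# The second row of the output above, used as sample input
row = [START, '@id', '2', '@weight', '70', 'height', '150', END]
print(pairs_to_dict(row))
```

With the code from this answer, it would be applied as pairs_to_dict(node.xpath("dictify('@id|@height|@weight|weight|height')")) inside the loop.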