I have an xml, small part of it looks like this:
From what I gather, it has something to do with the namespace recognition in ET.
from here http://effbot.org/zone/element-namespaces.htm
When you save an Element tree to XML, the standard Element serializer generates unique prefixes for all URI:s that appear in the tree. The prefixes usually have the form “ns” followed by a number. For example, the above elements might be serialized with the prefix ns0 for “http://www.w3.org/1999/xhtml” and ns1 for “http://effbot.org/namespace/letters”.
If you want to use specific prefixes, you can add prefix/uri mappings to a global table in the ElementTree module. In 1.3 and later, you do this by calling the register_namespace function. In earlier versions, you can access the internal table directly:
ET.register_namespace(prefix, uri)
ET._namespace_map[uri] = prefix
Note the argument order; the function takes the prefix first, while the raw dictionary maps from URI:s to prefixes.
This snippet from your question,
for a in root.findall('{urn:com:xml:data}image'):
print a.attrib
does not output anything because it only looks for direct {urn:com:xml:data}image
children of the root of the tree.
This slightly modified code,
for a in root.findall('.//{urn:com:xml:data}image'):
print a.attrib
will print {'imageId': '1'}
because it uses .//
, which selects matching subelements on all levels.
Reference: https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax.
It is a bit annoying that ElementTree does not just retain the original namespace prefixes by default, but keep in mind that it is not the prefixes that matter anyway. The register_namespace()
function can be used to set the wanted prefix when serializing the XML. The function does not have any effect on parsing or searching.