I have the following XML which I want to parse using Python\'s ElementTree
:
ElementTree is not too smart about namespaces. You need to give the .find()
, findall()
and iterfind()
methods an explicit namespace dictionary. This is not documented very well:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed
root.findall('owl:Class', namespaces)
Prefixes are only looked up in the namespaces
parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl:
part, looks up the corresponding namespace URL in the namespaces
dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl}Class
instead. You can use the same syntax yourself too of course:
root.findall('{http://www.w3.org/2002/07/owl#}Class')
If you can switch to the lxml library things are better; that library supports the same ElementTree API, but collects namespaces for you in a .nsmap
attribute on elements.