Python's ElementTree seems unusable with namespaces. What are my alternatives? BeautifulSoup is pretty rubbish with namespaces too. I don't want to strip them out.
Ex
How about:
http://docs.python.org/library/pyexpat.html
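For what it's worth, here is a minimal sketch of namespace-aware parsing with pyexpat through the standard `xml.parsers.expat` module. The helper name `qualified_names` and the sample document are my own, for illustration only. Passing `namespace_separator` to `ParserCreate` turns on namespace processing, so element names are reported as the namespace URI, the separator, and the local name.

```python
import xml.parsers.expat


def qualified_names(xml_text):
    """Return the namespace-qualified name of every start tag, in document order."""
    names = []
    # namespace_separator enables namespace processing: names arrive as
    # "<namespace URI><separator><local name>" instead of "prefix:local".
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names


# Hypothetical sample document, not the one from the question.
sample = '<root xmlns="foo" xmlns:stuff="bar"><bar><stuff:baz /></bar></root>'
print(qualified_names(sample))
```

This is a low-level, event-driven API, so you have to build any tree structure yourself.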
lxml is namespace-aware.
>>> from lxml import etree
>>> et = etree.XML("""<root xmlns="foo" xmlns:stuff="bar"><bar><stuff:baz /></bar></root>""")
>>> etree.tostring(et, encoding=str) # encoding=str only needed in Python 3, to avoid getting bytes
'<root xmlns="foo" xmlns:stuff="bar"><bar><stuff:baz/></bar></root>'
>>> et.xpath("f:bar", namespaces={"b":"bar", "f": "foo"})
[<Element {foo}bar at ...>]
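If you would rather not set up a prefix map, the same element can also be addressed by its fully qualified tag in `{namespace}localname` form. Below is a small sketch, assuming the same toy document as above. The fallback import is my own addition so the snippet still runs where lxml is not installed; only `fromstring`, `iter`, `find` and `.tag` are used, and both libraries provide them.

```python
try:
    from lxml import etree  # preferred, as in this answer
except ImportError:  # assumption: fall back to the stdlib API if lxml is missing
    import xml.etree.ElementTree as etree

root = etree.fromstring(
    '<root xmlns="foo" xmlns:stuff="bar"><bar><stuff:baz /></bar></root>'
)

# Tags are stored fully qualified, in {namespace}localname form.
tags = [el.tag for el in root.iter()]

# Two equivalent lookups: qualified tags, or a prefix map of our own choosing.
by_qualified_tag = root.find("{foo}bar/{bar}baz")
by_prefix = root.find("f:bar/b:baz", namespaces={"f": "foo", "b": "bar"})

print(tags)
print(by_qualified_tag.tag == by_prefix.tag)
```

The prefixes in the map are local to your code. They do not have to match the prefixes used in the document, only the namespace URIs do.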
Edit: On your example:
from lxml import etree
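# The b prefix (bytes literal) is needed in Python 3, because lxml rejects
# str input that carries an encoding declaration:
# "Unicode strings with encoding declaration are not supported."
# In Python 2 you can drop the prefix.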
et = etree.XML(b"""...""")
ns = {
    'lom': 'http://ltsc.ieee.org/xsd/LOM',
    'zs': 'http://www.loc.gov/zing/srw/',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'voc': 'http://www.schooletc.co.uk/vocabularies/',
    'srw_dc': 'info:srw/schema/1/dc-schema'
}
# According to the lxml docs, .xpath always returns a list when querying for elements.
# .find returns a single element, but only supports a subset of XPath.
record = et.xpath("zs:records/zs:record", namespaces=ns)[0]
# In this example we know there is only one record;
# otherwise, apply the following to every element the query above returns.
# The leading "." keeps the search inside this record; a bare "//" would search the whole document.
name = record.xpath(".//voc:name", namespaces=ns)[0].text
print("name:", name)
lom_entry = record.xpath("zs:recordData/srw_dc:dc/"
                         "lom:metaMetadata/lom:identifier/"
                         "lom:entry",
                         namespaces=ns)[0].text
print('lom_entry:', lom_entry)
lom_ids = [taxon_id.text for taxon_id in
           record.xpath("zs:recordData/srw_dc:dc/"
                        "lom:classification/lom:taxonPath/"
                        "lom:taxon/lom:id",
                        namespaces=ns)]
print("lom_ids:", lom_ids)
Output:
name: Frank Malina
lom_entry: 2.6
lom_ids: ['PYTHON', 'XML', 'XML-NAMESPACES']
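If the response contains several records, the same lookups can be run in a loop. Here is a rough sketch. The sample document is hypothetical and heavily shortened (it is not the XML from the question), and the fallback import is my own addition so that it runs without lxml. It uses `findall` and `findtext`, which only cover a subset of XPath but are enough here.

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib fallback, same find* API
    import xml.etree.ElementTree as etree

NS = {
    "zs": "http://www.loc.gov/zing/srw/",
    "voc": "http://www.schooletc.co.uk/vocabularies/",
}

# Hypothetical, shortened response with made-up names and a made-up root element.
SAMPLE = b"""<zs:response
    xmlns:zs="http://www.loc.gov/zing/srw/"
    xmlns:voc="http://www.schooletc.co.uk/vocabularies/">
  <zs:records>
    <zs:record><zs:recordData><voc:name>Alice</voc:name></zs:recordData></zs:record>
    <zs:record><zs:recordData><voc:name>Bob</voc:name></zs:recordData></zs:record>
  </zs:records>
</zs:response>"""

root = etree.fromstring(SAMPLE)

# One name per record; ".//" searches below the current record only.
names = [
    record.findtext(".//voc:name", namespaces=NS)
    for record in root.findall("zs:records/zs:record", namespaces=NS)
]
print(names)
```

With lxml you could swap `findall` for `.xpath` as above if you need full XPath expressions.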
libxml (http://xmlsoft.org/) is the best and fastest library for XML parsing that I know of. Python bindings are available for it.