Parsing XML with namespace in Python via 'ElementTree'

臣服心动 2020-11-21 09:48

I have the following XML which I want to parse using Python's ElementTree:



        
6 Answers
  •  -上瘾入骨i
    2020-11-21 10:35

    Note: This is an answer useful for Python's ElementTree standard library without using hardcoded namespaces.

    To extract the namespace prefixes and URIs from XML data, you can use the ElementTree.iterparse function and parse only the namespace start events (start-ns):

    >>> from io import StringIO
    >>> from xml.etree import ElementTree
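    >>> my_schema = u'''<rdf:RDF
    ...     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    ...     xmlns:owl="http://www.w3.org/2002/07/owl#"
    ...     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    ...     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    ...     xmlns="http://dbpedia.org/ontology/">
    ...
    ...     <owl:Class>
    ...         <rdfs:label>basketball league</rdfs:label>
    ...         <rdfs:comment>
    ...           a group of sports teams that compete against each other
    ...           in Basketball
    ...         </rdfs:comment>
    ...     </owl:Class>
    ...
    ... </rdf:RDF>
    ... '''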
    >>> my_namespaces = dict([
    ...     node for _, node in ElementTree.iterparse(
    ...         StringIO(my_schema), events=['start-ns']
    ...     )
    ... ])
    >>> from pprint import pprint
    >>> pprint(my_namespaces)
    {'': 'http://dbpedia.org/ontology/',
     'owl': 'http://www.w3.org/2002/07/owl#',
     'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
     'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
     'xsd': 'http://www.w3.org/2001/XMLSchema#'}
    

    The dictionary can then be passed as an argument to the search functions:

    root = ElementTree.fromstring(my_schema)
    root.findall('owl:Class', my_namespaces)
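    The two steps above can be combined into one small script. The following is only a minimal sketch: the helper name collect_namespaces and the SAMPLE_XML document are made up for illustration and are not part of ElementTree or of the original question.

```python
from io import StringIO
from xml.etree import ElementTree

# Hypothetical sample document, invented for this sketch.
SAMPLE_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class>
    <rdfs:label>basketball league</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def collect_namespaces(xml_text):
    """Return a {prefix: uri} dict built from the xmlns declarations.

    Hypothetical helper: it wraps ElementTree.iterparse and listens
    only for 'start-ns' events, each of which carries a (prefix, uri)
    tuple.
    """
    events = ElementTree.iterparse(StringIO(xml_text), events=["start-ns"])
    return {prefix: uri for _, (prefix, uri) in events}


namespaces = collect_namespaces(SAMPLE_XML)
root = ElementTree.fromstring(SAMPLE_XML)

# Prefixes used in the search paths are resolved through the mapping.
labels = [
    label.text
    for cls in root.findall("owl:Class", namespaces)
    for label in cls.findall("rdfs:label", namespaces)
]
print(labels)  # ['basketball league']
```

    Note that this parses the document twice, once for the namespaces and once for the tree. Because the result is a plain dict, a prefix that is declared again deeper in the document with a different URI overwrites the earlier entry, so the sketch assumes every prefix maps to a single URI.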
    
