Python XML Parsing without root

庸人自扰 2021-02-14 14:33

I want to parse a fairly large XML-like file that doesn't have a root element. The format of the file is:




         


        
3 answers
  •  别跟我提以往
    2021-02-14 15:17

    lxml.html can parse fragments:

    from lxml import html
    s = """
     
     
    
    
    
     
    """
    doc = html.fromstring(s)
    for thing in doc:
        print(thing)
        for other in thing:
            print(other)
    """
    >>> 
    
    
    
    
    >>>
    """
    

    Courtesy of this Stack Overflow answer
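If lxml is not available, a different technique from the one above is to wrap the text in a made-up root element and hand it to the standard library parser. This is only a minimal sketch: the sample data, the `parse_rootless` helper, and the `synthetic_root` tag name are all hypothetical, and it assumes the content is well-formed XML apart from the missing root and carries no XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two top-level elements and no single root.
fragment = """
<item id="1"><name>first</name></item>
<item id="2"><name>second</name></item>
"""


def parse_rootless(text, wrapper="synthetic_root"):
    """Parse rootless XML by wrapping it in a made-up root element."""
    return ET.fromstring(f"<{wrapper}>{text}</{wrapper}>")


root = parse_rootless(fragment)
# Each child of the wrapper is one of the original top-level elements.
top_level = [(child.tag, child.get("id")) for child in root]
print(top_level)
```

Unlike lxml.html, this parser is strict, so it will raise `ParseError` on markup that is not well-formed, and it reads the whole text into memory.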

    And if there is more than one level of nesting:

    def flatten(nested):
        """Recursively flatten nested elements.

        Yields individual elements.
        """
        for thing in nested:
            yield thing
            # Recurse into the children of each element.
            yield from flatten(thing)

    doc = html.fromstring(s)
    for thing in flatten(doc):
        print(thing)
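To see what order `flatten` produces, here is a small self-contained check. It uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without third-party packages (elements in both libraries iterate over their children), and the sample markup is made up for illustration. The walk is depth-first in document order, which matches `Element.iter()` except that `iter()` also yields the starting element.

```python
import xml.etree.ElementTree as ET


def flatten(nested):
    """Yield every descendant element, depth-first, in document order."""
    for thing in nested:
        yield thing
        yield from flatten(thing)


# Hypothetical sample with three levels of nesting.
root = ET.fromstring("<r><a><b><c/></b></a><d/></r>")

tags = [el.tag for el in flatten(root)]
# iter() includes the root itself, so skip the first item to compare.
iter_tags = [el.tag for el in root.iter()][1:]
print(tags)
```

In practice `iter()` may be the simpler choice, since it avoids deep recursion on heavily nested input.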
    

    Similarly, lxml.etree.HTML will parse this. It adds html and body tags:

    from lxml import etree

    d = etree.HTML(s)
    for thing in d.iter():
        print(thing)
    
    """ 
    
    
    
    
    
    
    """
    
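Since the question mentions a fairly large file, loading everything into one string may not be practical. The following is a sketch of a streaming alternative, not the method used above: it feeds a made-up opening tag, then the file in chunks, then the closing tag into the standard library's `XMLPullParser`. The `iter_top_level` name, the `synthetic_root` wrapper, and the sample data are hypothetical, and it assumes a text-mode stream of well-formed XML with no XML declaration.

```python
import io
import xml.etree.ElementTree as ET


def iter_top_level(stream, chunk_size=64 * 1024, wrapper="synthetic_root"):
    """Yield each top-level element of a rootless XML text stream."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None

    def drain():
        nonlocal depth, root
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    root = elem  # the made-up wrapper element
            else:
                depth -= 1
                if depth == 1:
                    yield elem    # a complete top-level element
                    root.clear()  # drop finished children to bound memory

    parser.feed(f"<{wrapper}>")
    yield from drain()
    # Read until the text stream returns an empty string (end of file).
    for chunk in iter(lambda: stream.read(chunk_size), ""):
        parser.feed(chunk)
        yield from drain()
    parser.feed(f"</{wrapper}>")
    yield from drain()
    parser.close()


# Hypothetical sample; a tiny chunk size forces tags to be split across reads.
sample = "<a><b>1</b></a>\n<c x='2'/>\n<a/>"
seen = [(e.tag, len(e)) for e in iter_top_level(io.StringIO(sample), chunk_size=4)]
print(seen)
```

With a real file, the stream would come from something like `open(path, encoding="utf-8")`, and each element should be processed before the next one is requested, because finished elements are detached from the wrapper afterwards.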
