I want to parse a fairly large XML-like file that doesn't have a root element. The format of the file is:
lxml.html can parse fragments:
from lxml import html

# s holds the rootless, XML-like text from the question
s = """
"""
doc = html.fromstring(s)
# iterating an element yields its direct children
for thing in doc:
    print(thing)
    for other in thing:
        print(other)
Courtesy of this SO answer.
And if there is more than one level of nesting:
def flatten(nested):
    """Recursively flatten nested elements.

    Yields individual elements in document order.
    """
    for thing in nested:
        yield thing
        # descend into this element's children before moving on
        for other in flatten(thing):
            yield other

doc = html.fromstring(s)
for thing in flatten(doc):
    print(thing)
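The flatten generator relies only on iteration, so it is not tied to lxml. As a rough sketch, here it is run with the standard library's xml.etree.ElementTree, which is a strict XML parser and therefore needs the fragments wrapped in a synthetic root first. The sample text, its tag names, and the parse_rootless helper are all made up for illustration and are not taken from the question's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two rootless fragments, nested up to three levels.
SAMPLE = "<thing><other><leaf>1</leaf></other></thing><thing><other/></thing>"


def flatten(nested):
    """Recursively yield every descendant element in document order."""
    for thing in nested:
        yield thing
        for other in flatten(thing):
            yield other


def parse_rootless(text):
    """Hypothetical helper: wrap the fragments in a synthetic <root> element
    so that a strict XML parser accepts them."""
    return ET.fromstring("<root>" + text + "</root>")


root = parse_rootless(SAMPLE)
tags = [element.tag for element in flatten(root)]
print(tags)  # ['thing', 'other', 'leaf', 'thing', 'other']
```

Wrapping the whole text in one string is only practical when the file fits in memory, so for a really large file this is a starting point and not a complete solution.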
lxml.etree.HTML will also parse this input. It adds html and body tags:
from lxml import etree

d = etree.HTML(s)
# iter() walks the whole tree, including the added html and body elements
for thing in d.iter():
    print(thing)