I use Python\'s iterparse
to parse the XML result of a nessus scan (.nessus file). The parsing fails on unexpected records, wile similar ones have been parsed corre
From the iterparse() docs:
Note: iterparse() only guarantees that it has seen the “>” character of a starting tag when it emits a “start” event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present. If you need a fully populated element, look for “end” events instead.
Drop inReport*
variables and process ReportHost only on "end" events when it fully parsed. Use ElementTree API to get necessary info such as cvss_base_score
from current ReportHost element.
To preserve memory, do:
import xml.etree.cElementTree as etree
def getelements(filename_or_file, tag):
context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
_, root = next(context) # get root element
for event, elem in context:
if event == 'end' and elem.tag == tag:
yield elem
root.clear() # preserve memory
for host in getelements("test2.nessus", "ReportHost"):
for cvss_el in host.iter("cvss_base_score"):
print(cvss_el.text)