iterparse fails to parse a field, while other similar ones are fine

我寻月下人不归 2021-01-21 18:15

I use Python's iterparse to parse the XML result of a Nessus scan (.nessus file). The parsing fails on unexpected records, while similar ones have been parsed correctly…

1 answer
  •  囚心锁ツ
    2021-01-21 19:16

    From the iterparse() docs:

    Note: iterparse() only guarantees that it has seen the “>” character of a starting tag when it emits a “start” event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present. If you need a fully populated element, look for “end” events instead.
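A minimal sketch of what the note means in practice, assuming a tiny in-memory document rather than a real .nessus file (the `SAMPLE` data and the helper name `texts_on_end` are made up for illustration). The text is read only when the "end" event arrives, which is the point where the element is fully populated:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature document, not a real .nessus export.
SAMPLE = (
    b"<Report><ReportHost name='h1'>"
    b"<cvss_base_score>7.5</cvss_base_score>"
    b"</ReportHost></Report>"
)

def texts_on_end(source, tag):
    """Collect the text of every `tag` element, reading it only on 'end' events."""
    found = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        # On 'start' the text may not have been read yet, so it is skipped here.
        if event == "end" and elem.tag == tag:
            found.append(elem.text)
    return found

scores = texts_on_end(io.BytesIO(SAMPLE), "cvss_base_score")
print(scores)
```

Reading `elem.text` in the "start" branch instead may appear to work on small inputs and then fail on some records of a large file, which would match the symptom described in the question.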

    Drop the inReport* variables and process each ReportHost only on its "end" event, when it has been fully parsed. Use the ElementTree API to get the information you need, such as cvss_base_score, from the current ReportHost element.

    To preserve memory, do:

    import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9
    
    def getelements(filename_or_file, tag):
        """Yield each fully parsed element named `tag`, freeing memory as it goes."""
        context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
        _, root = next(context)  # the first event is the 'start' of the root element
        for event, elem in context:
            if event == 'end' and elem.tag == tag:
                yield elem
                root.clear()  # drop already-processed children to preserve memory
    
    for host in getelements("test2.nessus", "ReportHost"):
        for cvss_el in host.iter("cvss_base_score"):
            print(cvss_el.text)
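As a self-contained variation on the same pattern, here is a sketch that also reads the host name and handles hosts without any score. The sample document is a simplified, hypothetical stand-in for a .nessus file, and `host_scores` is an illustrative name, not part of any library:

```python
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a .nessus file.
SAMPLE = b"""<NessusClientData_v2><Report name="demo">
<ReportHost name="10.0.0.1">
<ReportItem pluginID="1"><cvss_base_score>5.0</cvss_base_score></ReportItem>
<ReportItem pluginID="2"><cvss_base_score>9.3</cvss_base_score></ReportItem>
</ReportHost>
<ReportHost name="10.0.0.2">
<ReportItem pluginID="3"></ReportItem>
</ReportHost>
</Report></NessusClientData_v2>"""

def host_scores(source):
    """Yield (host name, list of CVSS base scores) for each ReportHost."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # 'start' event of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "ReportHost":
            # The element is complete here, so its children and text are safe to read.
            scores = [float(el.text) for el in elem.iter("cvss_base_score") if el.text]
            yield elem.get("name"), scores
            root.clear()  # release hosts that have already been processed

results = list(host_scores(io.BytesIO(SAMPLE)))
for name, scores in results:
    print(name, scores)
```

Extracting plain values before `root.clear()` runs means the caller never holds on to an element that has since been emptied.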
    
