iterparse fails to parse a field, while other similar ones are fine

后端未结

关注

 1  1236

我寻月下人不归 2021-01-21 18:15

I use Python\'s iterparse to parse the XML result of a nessus scan (.nessus file). The parsing fails on unexpected records, wile similar ones have been parsed corre

1条回答

囚心锁ツ (楼主)

2021-01-21 19:16
From the iterparse() docs:

Note: iterparse() only guarantees that it has seen the “>” character of a starting tag when it emits a “start” event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present. If you need a fully populated element, look for “end” events instead.

Drop inReport* variables and process ReportHost only on "end" events when it fully parsed. Use ElementTree API to get necessary info such as cvss_base_score from current ReportHost element.

To preserve memory, do:
```
import xml.etree.cElementTree as etree

def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context) # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear() # preserve memory

for host in getelements("test2.nessus", "ReportHost"):
    for cvss_el in host.iter("cvss_base_score"):
        print(cvss_el.text)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...