XML: Process large data

盖世英雄少女心 2021-01-14 13:18

What XML-parser do you recommend for the following purpose:

The XML-file (formatted, containing whitespaces) is around 800 MB. It mostly contains three types of tag

6 answers
  •  迷失自我
    2021-01-14 13:26

    What XML-parser do you recommend for the following purpose: The XML-file (formatted, containing whitespaces) is around 800 MB.

    Perhaps you should take a look at VTD-XML: http://en.wikipedia.org/wiki/VTD-XML (see http://sourceforge.net/projects/vtd-xml/ for download)

    It mostly contains three types of tag (let's call them n, w and r). They have an attribute called id which I'd have to search for, as fast as possible.

    I know it's blasphemy, but have you considered preprocessing with awk or grep? I know you can't actually parse XML, or detect errors in nested structures, with those tools, but perhaps your XML just happens to be in a form where that is possible.
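To make the grep-style idea concrete, here is a minimal sketch in Python rather than awk or grep. It assumes that every start tag sits on a single line and that the id attribute is double-quoted, which may not hold for your file; the names `START_TAG` and `scan_ids` are made up for this illustration. It does not validate the XML in any way.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

# Assumption: the start tag of an n, w or r element fits on one line and
# carries a double-quoted id attribute. This is pattern matching, not parsing.
START_TAG = re.compile(r'<(n|w|r)\b[^>]*?\bid="([^"]*)"')


def scan_ids(lines: Iterable[str], wanted: Set[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, tag, id) for start tags whose id is in `wanted`."""
    for line_number, line in enumerate(lines, start=1):
        for match in START_TAG.finditer(line):
            tag, element_id = match.groups()
            if element_id in wanted:
                yield line_number, tag, element_id


# Tiny in-memory stand-in for the real file.
sample = [
    '<root>\n',
    '  <n id="1" label="a"/>\n',
    '  <w id="2">\n',
    '    <nd ref="1"/>\n',
    '  </w>\n',
    '  <r id="3"/>\n',
    '</root>\n',
]
hits = list(scan_ids(sample, {"2", "3"}))
```

For the real file you would pass an open file object instead of `sample`, so that only one line is held in memory at a time. If tags can span several lines, or ids can be single-quoted, this approach breaks and a real parser is the safer choice.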

    I know that XSLT could be used. Or are there any easy alternatives?

    As far as I know, XSLT processors operate on a DOM tree of the source document, so they would need to parse and load the entire document into memory. That is probably not a good idea for a document this large (unless you have enough memory for it). There is something called streaming XSLT, but I think the technique is quite young and there aren't many implementations around, none of them free AFAIK, though you could still try one.
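To illustrate the difference between loading a whole tree and streaming, here is a rough sketch using `iterparse` from Python's standard `xml.etree.ElementTree` module. It is neither XSLT nor VTD-XML, just one way to look for an id without keeping the whole document in memory. The function name `find_by_id` and the sample data are invented for the example.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Optional, Set


def find_by_id(source: BinaryIO, wanted_tags: Set[str], wanted_id: str) -> Optional[Dict[str, str]]:
    """Stream through `source`; return the attributes of the first match, or None."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "start":
            # Attributes are already available when the start tag has been read.
            if elem.tag in wanted_tags and elem.get("id") == wanted_id:
                return dict(elem.attrib)
        else:
            # Drop everything parsed so far so the tree does not grow with the file.
            root.clear()
    return None


# Tiny in-memory stand-in for the 800 MB file.
sample = b"""<root>
  <n id="1"/>
  <w id="2" kind="example"><nd ref="1"/></w>
  <r id="3"/>
</root>"""
result = find_by_id(io.BytesIO(sample), {"n", "w", "r"}, "2")
```

This still reads the file from the top for every lookup. If you need many lookups, a single streaming pass that builds a dictionary from id to whatever data you need is probably the better trade-off.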
