Which XML parser do you recommend for the following purpose? The XML file (formatted, containing whitespace) is around 800 MB.
Perhaps you should take a look at VTD-XML: http://en.wikipedia.org/wiki/VTD-XML (see http://sourceforge.net/projects/vtd-xml/ for download)
It mostly contains three types of tags (let's call them n, w and r). Each has an attribute called id, which I'd have to search for as fast as possible.
I know it's blasphemy, but have you considered awk or grep to preprocess? I know you can't actually parse XML or detect errors in its nested structure that way, but perhaps your XML is in such a form that it just happens to be possible?
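To make the grep idea concrete, here is a rough Python sketch of the same kind of line-based scan. It is not a real XML parser. It assumes every n, w or r start tag sits on a single line and writes its attribute as id="..." with double quotes; the function name and the sample data are made up for illustration.

```python
import re

# Assumption: each <n>, <w> or <r> start tag fits on one line and the
# attribute is written as id="..." with double quotes.
# \b after the tag name keeps longer names such as <nd> from matching,
# and \b before "id" keeps attributes such as uid="..." from matching.
TAG_WITH_ID = re.compile(r'<(n|w|r)\b[^>]*?\bid="([^"]*)"')

def find_lines_with_id(lines, wanted_id):
    """Yield (line_number, tag_name) for every n/w/r tag carrying wanted_id."""
    for number, line in enumerate(lines, start=1):
        for match in TAG_WITH_ID.finditer(line):
            if match.group(2) == wanted_id:
                yield number, match.group(1)

# Small in-memory sample; for the real file you would pass an open file
# object instead, which is read lazily line by line.
sample = [
    '<root>',
    '  <n id="1" x="a"/>',
    '  <w id="2">',
    '    <nd ref="1"/>',
    '  </w>',
    '  <r uid="9" id="1"/>',
    '</root>',
]
hits = list(find_lines_with_id(sample, "1"))
```

If the file ever uses single quotes, line breaks inside a tag, or comments and CDATA that contain tag-like text, this breaks silently, so treat it as a quick filter rather than a replacement for a parser.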
I know that XSLT could be used. Or are there any easy alternatives?
As far as I know, XSLT processors operate on a DOM tree of the source document, so they'd need to parse and load the entire document into memory. That's probably not a good idea for a document this large (or perhaps you have enough memory for that?). There is something called streaming XSLT, but I think the technique is quite young and there aren't many implementations around, none of them free AFAIK, but you could give it a try.
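For comparison, a streaming parser avoids the memory problem because it never holds the whole tree. Below is a minimal sketch using Python's standard xml.etree.ElementTree.iterparse, not XSLT and not VTD-XML. The tag names n, w and r come from the question; the function name, the nd child element and the sample document are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

WANTED_TAGS = {"n", "w", "r"}  # tag names taken from the question

def find_by_id(source, wanted_id):
    """Stream through the XML in source and collect (tag, attributes)
    for every n/w/r element whose id attribute equals wanted_id."""
    matches = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag in WANTED_TAGS and elem.get("id") == wanted_id:
            matches.append((elem.tag, dict(elem.attrib)))
        # Detach everything already processed from the root so memory use
        # stays roughly constant instead of growing with the file size.
        root.clear()
    return matches

# Tiny in-memory stand-in for the 800 MB file; a file name or an open
# binary file object can be passed in the same way.
xml_bytes = b'<root><n id="1" v="a"/><w id="2"><nd ref="1"/></w><r id="1"/></root>'
found = find_by_id(io.BytesIO(xml_bytes), "1")
```

This still reads the file from start to end for every lookup. If you need many lookups, one pass that builds a dictionary from id to whatever data you need is probably the better trade-off, assuming that dictionary fits in memory.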