I'm looking for an XML parser that, rather than parsing from an InputStream or InputSource, will allow blocks of text to be pushed into the parser. E.g. I would like t
Surprisingly, no one has mentioned the one Java XML parser that does implement non-blocking ("async") parsing: Aalto. Part of the reason may be the lack of documentation (and its low level of activity). Aalto implements the basic StAX API, plus minor extensions that allow input to be pushed into the parser (the functionality exists, but the API has not been finalized). For more information, you could check out the related discussion group.
Edit: Now I see. You receive the XML in chunks and you want to feed it into a proper XML parser. So you need an object that is a queue at one end and an InputStream at the other?
You could aggregate the byte arrays received into a ByteArrayOutputStream, convert it to ByteArrayInputStream and feed it to the SAXParser.
Or you could check out the PipedInputStream/PipedOutputStream pair. In this case, you'll need to do the parsing in another thread, because the SAX parser uses the calling thread to emit events and would otherwise block your receive().
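A minimal sketch of the piped approach, using only the JDK. The class name PipedSaxDemo, the parseChunks method, and the handler that just collects element names are all made up for illustration; in a real program the writes would come from your receive() callback instead of a list.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative, not from any library.
class PipedSaxDemo {

    /** Writes the chunks into a pipe while a second thread runs the blocking SAX parser. */
    static List<String> parseChunks(List<byte[]> chunks) throws Exception {
        final List<String> elementNames =
                Collections.synchronizedList(new ArrayList<String>());
        final PipedOutputStream sink = new PipedOutputStream();
        final PipedInputStream source = new PipedInputStream(sink);
        final Throwable[] failure = new Throwable[1];

        // The parser blocks on source.read(), so it gets its own thread.
        Thread parserThread = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        elementNames.add(qName);
                    }
                });
            } catch (Throwable t) {
                failure[0] = t;
            } finally {
                // Closing the read side makes a pending write fail instead of hanging.
                try { source.close(); } catch (IOException ignored) { }
            }
        });
        parserThread.start();

        try {
            for (byte[] chunk : chunks) {
                sink.write(chunk); // this is what receive() would do per chunk
            }
        } finally {
            sink.close(); // end of stream: lets the parser finish the document
        }
        parserThread.join();
        if (failure[0] != null) {
            throw new Exception("XML parsing failed", failure[0]);
        }
        return elementNames;
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> chunks = Arrays.asList(
                "<root><a>te".getBytes(StandardCharsets.UTF_8),
                "xt</a><b/></ro".getBytes(StandardCharsets.UTF_8),
                "ot>".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseChunks(chunks));
    }
}
```

Note that the pipe has a fixed-size buffer, so a write blocks whenever the parser falls behind. That gives you back-pressure for free, but it also means the receiving thread is not strictly non-blocking.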
Edit: Based on the comments, I suggest taking the aggregation route. You collect the chunks into a ByteArrayOutputStream. To know whether you have received all chunks of your XML, check whether the current chunk or the contents of the ByteArrayOutputStream contain the end tag of the XML root node. Then you can just pass the data to a SAXParser, which can now run in the current thread without problems. To avoid unnecessary array re-creation, you could implement your own simple, unsynchronized byte array wrapper or look for an existing implementation.
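A rough sketch of the aggregation route, again JDK only. ChunkAggregator and its receive method are hypothetical names. It assumes a UTF-8 (or ASCII-compatible) encoding, a known root element name, an end tag written without internal whitespace, and that the root's end tag text does not appear earlier in the document (for example inside a comment, CDATA section, or a nested element of the same name). It searches the accumulated bytes rather than only the current chunk, since the end tag can be split across two chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; names are illustrative, not from any library.
class ChunkAggregator {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final byte[] endTag;

    ChunkAggregator(String rootElementName) {
        this.endTag = ("</" + rootElementName + ">").getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Appends a chunk. Returns null while the document is incomplete; once the
     * root end tag has arrived, parses the document in the calling thread and
     * returns the element names in document order.
     */
    List<String> receive(byte[] chunk) throws Exception {
        buffer.write(chunk, 0, chunk.length);
        byte[] data = buffer.toByteArray(); // copies; fine for a sketch

        // A new match must end inside the chunk just added, so only the tail is searched.
        int from = data.length - chunk.length - endTag.length + 1;
        if (indexOf(data, endTag, from) < 0) {
            return null;
        }
        buffer.reset(); // ready for the next document

        final List<String> elementNames = new ArrayList<String>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(data), new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        elementNames.add(qName);
                    }
                });
        return elementNames;
    }

    /** Naive byte-wise search for pattern in data, starting at index from. */
    private static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(0, from); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        ChunkAggregator aggregator = new ChunkAggregator("root");
        String[] chunks = { "<root><a>hi</a", "><b/></ro", "ot>" };
        for (String chunk : chunks) {
            System.out.println(aggregator.receive(chunk.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

The toByteArray() call copies the whole buffer on every chunk, which is the array re-creation mentioned above; a custom unsynchronized buffer that exposes its backing array would avoid it.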
Check Openfire's XMLLightweightParser and how it assembles complete XML messages from individual chunks, which it has to do because of NIO. The whole project is a great source of answers to NIO and XMPP questions.
Adding another answer as this question remains high in relevant Google searches: aalto-xml 0.9.7 (March 2011) has asynchronous XML parsing. It lets you pass arbitrarily sized chunks of a document to the parser as they arrive, and adds a new StAX event type, EVENT_INCOMPLETE, to indicate that the input buffer is exhausted and the document remains incomplete.
This is Tatu Saloranta's (the author's) example:
    byte[] msg = "<html>Very <b>simple</b> input document!</html>".getBytes();
    AsyncXMLStreamReader asyncReader = new InputFactoryImpl().createAsyncXMLStreamReader();
    final AsyncInputFeeder feeder = asyncReader.getInputFeeder();
    int inputPtr = 0; // as we feed byte at a time
    int type = 0;

    do {
        // May need to feed multiple "segments"
        while ((type = asyncReader.next()) == AsyncXMLStreamReader.EVENT_INCOMPLETE) {
            feeder.feedInput(msg, inputPtr++, 1);
            if (inputPtr >= msg.length) { // to indicate end-of-content (important for error handling)
                feeder.endOfInput();
            }
        }
        // and once we have full event, we just dump out event type (for now)
        System.out.println("Got event of type: " + type);
        // could also just copy event as is, using Stax, or do any other normal non-blocking handling:
        // xmlStreamWriter.copyEventFromReader(asyncReader, false);
    } while (type != AsyncXMLStreamReader.END_DOCUMENT);
NioSax works with ByteBuffers:
http://blog.retep.org/2010/06/25/niosax-sax-style-xml-parser-for-java-nio/
The source code for the latest version I could find (10.6 from 2010) is in the Sonatype Maven repository:
https://oss.sonatype.org/content/repositories/releases/uk/org/retep/
This is an April 2009 post from the Xerces J-Users mailing list, where the original poster has exactly the same issue. One potentially very good response, by "Jeff", is given, but there is no follow-up to the original poster's reply:
http://www.nabble.com/parsing-an-xml-document-chunk-by-chunk-td22945319.html
It's potentially recent enough to bump on the list, or at the very least to help with the search.
Edit
Found another useful link, mentioning a library called Woodstox and describing the state of stream-based vs. NIO-based parsers, along with some possible approaches to emulating a stream:
http://markmail.org/message/ogqqcj7dt3lwkbov