Question
I am trying to implement my Node.js service to send back its response using HTTP streaming (Transfer-Encoding: chunked).
I use "response.on('data', func(chunk))" to receive each chunk sent back from the service, and it is working now.
My question is: does "response.on('data', func(chunk))" guarantee that the chunks in the callback are exactly the same chunks that the service sends? (Or may it combine multiple service-sent chunks and invoke the callback only once, or split a single service-sent chunk into pieces and invoke the callback multiple times?)
Thanks, Michael
Answer 1:
My question is: does "response.on('data', func(chunk))" guarantee that the chunks in the callback are exactly the same chunks that the service sends?
No. Streams have zero guarantees about chunk boundaries. Boundaries can literally be anywhere.
Or may it combine multiple service-sent chunks and invoke the callback only once?
Yes, it may.
Or may it split a single service-sent chunk into pieces and invoke the callback multiple times?
Yes, it may.
If you have to process discrete pieces of data (e.g. specific chunks of data), which is generally not the best fit for a stream, then you need to create delineations in the stream that tell you where a chunk you want to process starts and stops. You can then read and buffer a chunk until you reach its end and process the whole chunk, even if that chunk spans two or more actual data events.
There are lots of different ways to delineate a specific chunk of data in a stream and which technique to use depends entirely upon the type of data. The simplest example of delineations are the CRLF that delineate lines in a text file. There are many other ways of doing this. For example in binary work, you may stream a header that contains a content length that tells you exactly how many bytes to expect before the end of the chunk. MIME creates unique string markers that delineate sections. There are lots of different ways to do it, depending upon the circumstances of the data.
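The length-prefix technique mentioned above can be sketched in a few lines. The helper names below (encodeFrame, makeFrameParser) are hypothetical, not part of any library; each message is written as a 4-byte big-endian length followed by the payload, and the parser accepts arbitrarily split input chunks while yielding only complete payloads:

```javascript
// Sketch of length-prefixed framing (hypothetical helper names).
// Writer side: prepend each payload with its byte length.
function encodeFrame(payload) {
  const body = Buffer.from(payload);
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Reader side: feed in chunks with arbitrary boundaries;
// get back an array of any payloads that are now complete.
function makeFrameParser() {
  let buffered = Buffer.alloc(0);
  return function feed(chunk) {
    buffered = Buffer.concat([buffered, chunk]);
    const frames = [];
    while (buffered.length >= 4) {
      const len = buffered.readUInt32BE(0);
      if (buffered.length < 4 + len) break; // incomplete frame: wait for more data
      frames.push(buffered.subarray(4, 4 + len).toString());
      buffered = buffered.subarray(4 + len);
    }
    return frames;
  };
}
```

Note that the parser never assumes a frame arrives in one data event; the 4-byte header tells it exactly how many bytes to wait for.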
FYI, if the writer of the data (on the other end of the stream) writes a chunk of data, then pauses for a bit (long enough that the chunk is physically sent over the network), then writes another chunk, then pauses again, the recipient is likely to get each chunk by itself in a single data event. But that is by no means guaranteed and should not be counted on. Any perturbation in the transport could easily lead to delays or retransmissions that end up co-mingling the data from separately sent chunks, such that more than one chunk is received in a given data event. Similarly, if the data being sent gets large, or there are transmission hiccups, or other network infrastructure in the path causes the data to be broken into smaller pieces, a single chunk of data could be received across multiple data events.
If you need to gather a specific chunk of data before processing it, then you need to have your own code that assembles that chunk into a buffer as data arrives and recognizes when you have a whole chunk and then process that chunk. That code needs to handle all these situations:
- One chunk arrives across multiple data events.
- One data event contains multiple chunks, or parts of multiple chunks.
- The boundaries of what arrives in a data event are not the same as the boundaries of your chunk (e.g. your chunk gets broken into multiple data events, and the next data event may contain the end of your chunk and the start of the next chunk).
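All three situations above reduce to the same pattern: keep whatever partial data is left after each event and carry it into the next one. For a simple newline delimiter, a minimal sketch (makeLineAssembler is a hypothetical helper, not a library function) looks like this:

```javascript
// Sketch: assembling newline-delimited records from arbitrary 'data' chunks.
// `leftover` carries any incomplete record from one event into the next.
function makeLineAssembler(onLine) {
  let leftover = '';
  return function onData(chunk) {
    const parts = (leftover + chunk.toString()).split('\n');
    leftover = parts.pop(); // last piece may be an incomplete line
    for (const line of parts) onLine(line);
  };
}
```

You would attach the returned function as the data handler, e.g. response.on('data', makeLineAssembler(processLine)), where processLine is your own whole-record handler.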
FYI, it is often useful to create your own stream subclass that handles the boundaries in your data automatically and then emits its own message when it has a fully formed "chunk" of your data. It will typically have to use internal buffers and boundary detection in order to implement that.
For example, there are many modules that implement line-by-line reading of a text stream. They buffer the data from a data event, split it into whole lines (retaining any trailing part of the last line that may not yet be a whole line), then emit their own line events for each arriving line.
You may want to do something similar for your own chunk data type. That will then let you use your derived stream in a much simpler manner where it will emit only whole chunks and the code you write to use that object will be a lot simpler because you will have centralized the chunk detection logic so the rest of your code doesn't have to worry about it.
Source: https://stackoverflow.com/questions/49020366/does-node-js-response-ondata-funcchunk-guarantee-the-chunk-boundary