I want to create a write stream and write to it as my data comes in. However, I am able to create the file but nothing is written to it. Eventually, the process runs out of
To supplement @MikeC's excellent answer, here are some relevant details from the current docs (v8.4.0) for writable.write():
If
false
is returned, further attempts to write data to the stream should stop until the'drain'
event is emitted.While a stream is not draining, calls to
write()
will bufferchunk
, and returnfalse
. Once all currently buffered chunks are drained (accepted for delivery by the operating system), the'drain'
event will be emitted. It is recommended that oncewrite()
returnsfalse
, no more chunks be written until the'drain'
event is emitted. While callingwrite()
on a stream that is not draining is allowed, Node.js will buffer all written chunks until maximum memory usage occurs, at which point it will abort unconditionally. Even before it aborts, high memory usage will cause poor garbage collector performance and high RSS (which is not typically released back to the system, even after the memory is no longer required).
and for backpressuring in streams:
In any scenario where the data buffer has exceeded the
highWaterMark
or the write queue is currently busy,.write()
will returnfalse
.When a
false
value is returned, the backpressure system kicks in.Once the data buffer is emptied, a
.drain()
event will be emitted and resume the incoming data flow.Once the queue is finished, backpressure will allow data to be sent again. The space in memory that was being used will free itself up and prepare for the next batch of data.
+-------------------+ +=================+
| Writable Stream +---------> .write(chunk) |
+-------------------+ +=======+=========+
|
+------------------v---------+
+-> if (!chunk) | Is this chunk too big? |
| emit .end(); | Is the queue busy? |
+-> else +-------+----------------+---+
| emit .write(); | |
^ +--v---+ +---v---+
^-----------------------------------< No | | Yes |
+------+ +---v---+
|
emit .pause(); +=================+ |
^-----------------------+ return false; <-----+---+
+=================+ |
|
when queue is empty +============+ |
^-----------------------< Buffering | |
| |============| |
+> emit .drain(); | ^Buffer^ | |
+> emit .resume(); +------------+ |
| ^Buffer^ | |
+------------+ add chunk to queue |
| <---^---------------------<
+============+
Here are some visualisations (running the script with a V8 heap memory size of 512MB by using --max-old-space-size=512).
This visualisation shows the heap memory usage (red) and delta time (purple) for every 10,000 steps of i
(the X axis shows i
):
'use strict'
var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');
var latestTime = (new Date()).getTime();
var currentTime;
for (var i = 0; i < 10000000000; i++) {
wstream.write(i+'\n');
if (i % 10000 === 0) {
currentTime = (new Date()).getTime();
console.log([ // Output CSV data for visualisation
i,
(currentTime - latestTime) / 5,
process.memoryUsage().heapUsed / (1024 * 1024)
].join(','));
latestTime = currentTime;
}
}
console.log('End!')
wstream.end();
The script runs slower and slower as the memory usage approaches the maximum limit of 512MB, until it finally crashes when the limit is reached.
This visualisation uses v8.setFlagsFromString() with --trace_gc to show the current memory usage (red) and execution time (purple) of each garbage collection (the X axis shows total elapsed time in seconds):
'use strict'
var fs = require('fs');
var v8 = require('v8');
var wstream = fs.createWriteStream('myOutput.txt');
v8.setFlagsFromString('--trace_gc');
for (var i = 0; i < 10000000000; i++) {
wstream.write(i+'\n');
}
console.log('End!')
wstream.end();
Memory usage reaches 80% after about 4 seconds, and the garbage collector gives up trying to Scavenge and is forced to use Mark-sweep (more than 10 times slower) – see this article for more details.
For comparison, here are the same visualisations for @MikeC's code which waits for drain
when the write
buffer becomes full:
The problem is that you aren't ever giving it a chance to drain the buffer. Eventually this buffer gets full and you run out of memory.
WriteStream.write returns a boolean value indicating if the data was successfully written to disk. If the data was not successfully written, you should wait for the drain event, which indicates the buffer has been drained.
Here's one way of writing your code which utilizes the return value of write
and the drain
event:
'use strict'
var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');
function writeToStream(i) {
for (; i < 10000000000; i++) {
if (!wstream.write(i + '\n')) {
// Wait for it to drain then start writing data from where we left off
wstream.once('drain', function() {
writeToStream(i + 1);
});
return;
}
}
console.log('End!')
wstream.end();
}
writeToStream(0);
To supplement (even more) @Mike Cluck's answer I implemented solution with the same behaviour using node stream pipe()
. Maybe it will be useful for somebody.
According to docs (Node 11.13.0):
The readable.pipe() method attaches a Writable stream to the readable, causing it to switch automatically into flowing mode and push all of its data to the attached Writable. The flow of data will be automatically managed so that the destination Writable stream is not overwhelmed by a faster Readable stream.
So, pipe()
provides backpressure strategy out of the box. All what is needed is to create Readable stream somehow. In my example I'm extending Readable
class from node stream module to create simple counter:
const { Readable } = require('stream');
const fs = require('fs');
const writeStream = fs.createWriteStream('./bigFile.txt');
class Counter extends Readable {
constructor(opt) {
super(opt);
this._max = 1e7;
this._index = 1;
}
_read() {
const i = this._index++;
if (i > this._max)
this.push(null);
else {
this.push(i + '\n');
}
}
}
new Counter().pipe(writeStream);
The behaviour is exactly the same - data is push to file constantly in small chunks and memory consumption is constant (on my machine ~50MB).
The great thing about pipe()
is that if you have provided readable stream (from request i.e.) all what you need to do is use: readable.pipe(writable)
.