Node: fs write() doesn't write inside loop. Why not?

为君一笑 提交于 2019-12-18 05:08:18

问题


I want to create a write stream and write to it as my data comes in. However, I am able to create the file but nothing is written to it. Eventually, the process runs out of memory.

The problem, I've discovered is that I'm calling write() whilst inside a loop.

Here's a simple example:

'use strict'

var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');

for (var i = 0; i < 10000000000; i++) {
    wstream.write(i+'\n');
}

console.log('End!')
wstream.end();

Nothing ever gets written, not even hello. But why? How can I write to the file within a loop?


回答1:


To supplement @MikeC's excellent answer, here are some relevant details from the current docs (v8.4.0) for writable.write():

If false is returned, further attempts to write data to the stream should stop until the 'drain' event is emitted.

While a stream is not draining, calls to write() will buffer chunk, and return false. Once all currently buffered chunks are drained (accepted for delivery by the operating system), the 'drain' event will be emitted. It is recommended that once write() returns false, no more chunks be written until the 'drain' event is emitted. While calling write() on a stream that is not draining is allowed, Node.js will buffer all written chunks until maximum memory usage occurs, at which point it will abort unconditionally. Even before it aborts, high memory usage will cause poor garbage collector performance and high RSS (which is not typically released back to the system, even after the memory is no longer required).

and for backpressuring in streams:

In any scenario where the data buffer has exceeded the highWaterMark or the write queue is currently busy, .write() will return false.

When a false value is returned, the backpressure system kicks in.

Once the data buffer is emptied, a .drain() event will be emitted and resume the incoming data flow.

Once the queue is finished, backpressure will allow data to be sent again. The space in memory that was being used will free itself up and prepare for the next batch of data.

               +-------------------+         +=================+
               |  Writable Stream  +--------->  .write(chunk)  |
               +-------------------+         +=======+=========+
                                                     |
                                  +------------------v---------+
   +-> if (!chunk)                |    Is this chunk too big?  |
   |     emit .end();             |    Is the queue busy?      |
   +-> else                       +-------+----------------+---+
   |     emit .write();                   |                |
   ^                                   +--v---+        +---v---+
   ^-----------------------------------<  No  |        |  Yes  |
                                       +------+        +---v---+
                                                           |
           emit .pause();          +=================+     |
           ^-----------------------+  return false;  <-----+---+
                                   +=================+         |
                                                               |
when queue is empty     +============+                         |
^-----------------------<  Buffering |                         |
|                       |============|                         |
+> emit .drain();       |  ^Buffer^  |                         |
+> emit .resume();      +------------+                         |
                        |  ^Buffer^  |                         |
                        +------------+   add chunk to queue    |
                        |            <---^---------------------<
                        +============+

Here are some visualisations (running the script with a V8 heap memory size of 512MB by using --max-old-space-size=512).

This visualisation shows the heap memory usage (red) and delta time (purple) for every 10,000 steps of i (the X axis shows i):

'use strict'

var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');
var latestTime = (new Date()).getTime();
var currentTime;

for (var i = 0; i < 10000000000; i++) {
    wstream.write(i+'\n');
    if (i % 10000 === 0) {
        currentTime = (new Date()).getTime();
        console.log([  // Output CSV data for visualisation
            i,
            (currentTime - latestTime) / 5,
            process.memoryUsage().heapUsed / (1024 * 1024)
        ].join(','));
        latestTime = currentTime;
    }
}

console.log('End!')
wstream.end();

The script runs slower and slower as the memory usage approaches the maximum limit of 512MB, until it finally crashes when the limit is reached.


This visualisation uses v8.setFlagsFromString() with --trace_gc to show the current memory usage (red) and execution time (purple) of each garbage collection (the X axis shows total elapsed time in seconds):

'use strict'

var fs = require('fs');
var v8 = require('v8');
var wstream = fs.createWriteStream('myOutput.txt');

v8.setFlagsFromString('--trace_gc');

for (var i = 0; i < 10000000000; i++) {
    wstream.write(i+'\n');
}

console.log('End!')
wstream.end();

Memory usage reaches 80% after about 4 seconds, and the garbage collector gives up trying to Scavenge and is forced to use Mark-sweep (more than 10 times slower) – see this article for more details.


For comparison, here are the same visualisations for @MikeC's code which waits for drain when the write buffer becomes full:




回答2:


The problem is that you aren't ever giving it a chance to drain the buffer. Eventually this buffer gets full and you run out of memory.

WriteStream.write returns a boolean value indicating if the data was successfully written to disk. If the data was not successfully written, you should wait for the drain event, which indicates the buffer has been drained.

Here's one way of writing your code which utilizes the return value of write and the drain event:

'use strict'

var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');

function writeToStream(i) {
  for (; i < 10000000000; i++) {
    if (!wstream.write(i + '\n')) {
      // Wait for it to drain then start writing data from where we left off
      wstream.once('drain', function() {
        writeToStream(i + 1);
      });
      return;
    }
  }
  console.log('End!')
  wstream.end();
}

writeToStream(0);



回答3:


To supplement (even more) @Mike Cluck's answer I implemented solution with the same behaviour using node stream pipe(). Maybe it will be useful for somebody. According to docs (Node 11.13.0):

The readable.pipe() method attaches a Writable stream to the readable, causing it to switch automatically into flowing mode and push all of its data to the attached Writable. The flow of data will be automatically managed so that the destination Writable stream is not overwhelmed by a faster Readable stream.

So, pipe() provides backpressure strategy out of the box. All what is needed is to create Readable stream somehow. In my example I'm extending Readable class from node stream module to create simple counter:

const { Readable } = require('stream');
const fs = require('fs');
const writeStream = fs.createWriteStream('./bigFile.txt');

class Counter extends Readable {
    constructor(opt) {
        super(opt);
        this._max = 1e7;
        this._index = 1;
    }

    _read() {
        const i = this._index++;
        if (i > this._max)
            this.push(null);
        else {
            this.push(i + '\n');
        }
    }
}

new Counter().pipe(writeStream); 

The behaviour is exactly the same - data is push to file constantly in small chunks and memory consumption is constant (on my machine ~50MB).

The great thing about pipe() is that if you have provided readable stream (from request i.e.) all what you need to do is use: readable.pipe(writable).



来源:https://stackoverflow.com/questions/37579605/node-fs-write-doesnt-write-inside-loop-why-not

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!