Read a file one line at a time in node.js?

深忆病人 2020-11-22 04:33

I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject, but I'm missing some connections to make the whole thing fit together.

29 answers
  • 2020-11-22 04:57

    I have looked through all the answers above, and they all use a third-party library to solve it. Node's own API has a simple solution, e.g.:

    const fs = require('fs')
    
    let stream = fs.createReadStream('<filename>', { autoClose: true })
    
    stream.on('data', chunk => {
        // Note: each 'data' event delivers a chunk, not a line; a chunk may
        // contain several lines or end in the middle of one.
        let row = chunk.toString('ascii')
    })
    
  • 2020-11-22 04:58

    Since Node.js v0.12 and as of Node.js v4.0.0, there is a stable readline core module. Here's the easiest way to read lines from a file, without any external modules:

    const fs = require('fs');
    const readline = require('readline');
    
    async function processLineByLine() {
      const fileStream = fs.createReadStream('input.txt');
    
      const rl = readline.createInterface({
        input: fileStream,
        crlfDelay: Infinity
      });
      // Note: we use the crlfDelay option to recognize all instances of CR LF
      // ('\r\n') in input.txt as a single line break.
    
      for await (const line of rl) {
        // Each line in input.txt will be successively available here as `line`.
        console.log(`Line from file: ${line}`);
      }
    }
    
    processLineByLine();
    

    Or alternatively:

    var lineReader = require('readline').createInterface({
      input: require('fs').createReadStream('file.in')
    });
    
    lineReader.on('line', function (line) {
      console.log('Line from file:', line);
    });
    

    The last line is read correctly (as of Node v0.12 or later), even if there is no final \n.

    UPDATE: this example has been added to Node's official API documentation.

  • 2020-11-22 04:58

    This is my favorite way of going through a file: a simple native solution for a progressive (as in not a "slurp" or all-in-memory) file read with modern async/await. It's a solution that I find "natural" when processing large text files without having to resort to the readline module or any non-core dependency.

    const fs = require('fs')
    
    // (top-level for await works in an ES module or inside an async function)
    let buf = '';
    for await ( const chunk of fs.createReadStream('myfile') ) {
        const lines = buf.concat(chunk).split(/\r?\n/);
        buf = lines.pop();
        for( const line of lines ) {
            console.log(line);
        }
    }
    if(buf.length) console.log(buf);  // last line, if file does not end with newline
    

    You can adjust the encoding in fs.createReadStream or use chunk.toString(<arg>). This also lets you fine-tune the line splitting to your taste, i.e. use .split(/\n+/) to skip empty lines, and you can control the chunk size with { highWaterMark: <chunkSize> } (see the sketch below).
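
    A minimal sketch of those tuning options (the values are just illustrative, and the snippet assumes the same async context as above):

    const fs = require('fs');

    let buf = '';
    const stream = fs.createReadStream('myfile', {
        encoding: 'utf8',         // chunks arrive as strings, no chunk.toString() needed
        highWaterMark: 64 * 1024  // chunk size in bytes
    });
    for await (const chunk of stream) {
        const lines = buf.concat(chunk).split(/\n+/);  // /\n+/ also skips empty lines
        buf = lines.pop();
        for (const line of lines) console.log(line);
    }
    if (buf.length) console.log(buf);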

    Don't forget to create a function like processLine(line) so you don't repeat the line-processing code twice because of the trailing buf leftover (a sketch follows below). Unfortunately, the ReadStream instance does not update its end-of-file flags in this setup, so there's no way, afaik, to detect within the loop that we're in the last iteration without more verbose tricks like comparing the file size from fs.stat() with .bytesRead. Hence the final buf processing step, unless you're absolutely sure your file ends with a newline \n, in which case the for await loop alone should suffice.
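
    One way to implement that refactor (the processLine name is just a placeholder):

    function processLine(line) {
        console.log(line);  // or whatever per-line processing you need
    }

    let buf = '';
    for await (const chunk of fs.createReadStream('myfile')) {
        const lines = buf.concat(chunk).split(/\r?\n/);
        buf = lines.pop();
        lines.forEach(processLine);
    }
    if (buf.length) processLine(buf);  // handle the leftover last line in the same place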

    ★ If you prefer the evented asynchronous version, this would be it:

    let buf = '';
    fs.createReadStream('myfile')
    .on('data', chunk => {
        const lines = buf.concat(chunk).split(/\r?\n/);
        buf = lines.pop();
        for( const line of lines ) {
            console.log(line);
        }
    })
    .on('end', () => buf.length && console.log(buf) );
    

    ★ Now if you don't mind importing the stream core package, then this is the equivalent piped-stream version, which allows for chaining transforms like gzip decompression (a gunzip sketch follows the snippet):

    const { Writable } = require('stream');
    let buf = '';
    fs.createReadStream('myfile').pipe(
        new Writable({
            write: (chunk, enc, next) => {
                const lines = buf.concat(chunk).split(/\r?\n/);
                buf = lines.pop();
                for (const line of lines) {
                    console.log(line);
                }
                next();
            }
        })
    ).on('finish', () => buf.length && console.log(buf) );
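
    For instance (not part of the original snippet), the same Writable can sit behind a zlib gunzip transform to read a gzipped file line by line; 'myfile.gz' is a placeholder:

    const fs = require('fs');
    const zlib = require('zlib');
    const { Writable } = require('stream');

    let buf = '';
    fs.createReadStream('myfile.gz')
        .pipe(zlib.createGunzip())      // decompress on the fly
        .pipe(new Writable({
            write: (chunk, enc, next) => {
                const lines = buf.concat(chunk).split(/\r?\n/);
                buf = lines.pop();
                for (const line of lines) console.log(line);
                next();
            }
        }))
        .on('finish', () => buf.length && console.log(buf));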
    
  • 2020-11-22 04:59

    There is a very nice module for reading a file line by line; it's called line-reader.

    With it, you simply write:

    var lineReader = require('line-reader');
    
    lineReader.eachLine('file.txt', function(line, last) {
      console.log(line);
      // do whatever you want with line...
      if(last){
        // or check if it's the last one
      }
    });
    

    You can even iterate over the file with a "java-style" interface, if you need more control:

    lineReader.open('file.txt', function(reader) {
      if (reader.hasNextLine()) {
        reader.nextLine(function(line) {
          console.log(line);
        });
      }
    });
    
  • 2020-11-22 04:59

    Two questions we must ask ourselves while doing such operations are:

    1. What's the amount of memory used to perform it?
    2. Is the memory consumption increasing drastically with the file size?

    Solutions like require('fs').readFileSync() load the whole file into memory. That means the amount of memory required is roughly equivalent to the file size. We should avoid this for anything larger than 50 MB.
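
    For contrast, this is what the all-in-memory "slurp" approach looks like (the filename is a placeholder); every line is held in memory at once:

    const fs = require('fs');

    // Reads the entire file into one string, then splits it: memory usage
    // grows with the file size, which is what this answer warns against.
    const lines = fs.readFileSync('big.txt', 'utf8').split(/\r?\n/);
    console.log(`read ${lines.length} lines`);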

    We can easily track the amount of memory used by a function by placing these lines of code after the function invocation:

        const used = process.memoryUsage().heapUsed / 1024 / 1024;
        console.log(
          `The script uses approximately ${Math.round(used * 100) / 100} MB`
        );
    

    Right now, the best way to read particular lines from a large file is using Node's readline. The documentation has amazing examples.
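
    As a rough sketch of that idea (not from the readline docs; the filename and line numbers are placeholders), you can count lines as you stream and keep only the ones you were asked for:

    const fs = require('fs');
    const readline = require('readline');

    async function readLineNumbers(file, wantedNumbers) {
        const rl = readline.createInterface({
            input: fs.createReadStream(file),
            crlfDelay: Infinity
        });
        const wanted = new Set(wantedNumbers);
        const found = [];
        let lineNo = 0;
        for await (const line of rl) {
            lineNo++;
            if (wanted.has(lineNo)) {
                found.push(line);
                if (found.length === wanted.size) break;  // stop early once everything is collected
            }
        }
        rl.close();
        return found;
    }

    readLineNumbers('big.txt', [10, 163845]).then(lines => console.log(lines));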

    We don't need any third-party module to do it. But if you are writing enterprise code, you have to handle lots of edge cases. I had to write a very lightweight module called Apick File Storage to handle all those edge cases.

    Apick File Storage module: https://www.npmjs.com/package/apickfs
    Documentation: https://github.com/apickjs/apickFS#readme

    Example file: https://1drv.ms/t/s!AtkMCsWInsSZiGptXYAFjalXOpUx

    Example: install the module and use it:

    npm i apickfs
    
    // import modules
    const path = require('path');
    const apickFileStorage = require('apickfs');
    
    // invoke the readByLineNumbers() method
    apickFileStorage
      .readByLineNumbers(path.join(__dirname), 'big.txt', [163845])
      .then(d => {
        console.log(d);
      })
      .catch(e => {
        console.log(e);
      });
    

    This method was successfully tested with dense files of up to 4 GB.

    big.txt is a dense text file with 163,845 lines, weighing 124 MB. A script that reads 10 different lines from this file uses only about 4.63 MB of memory. It also parses valid JSON into objects or arrays for free.

  • 2020-11-22 05:00

    You can always roll your own line reader. I haven't benchmarked this snippet yet, but it correctly splits the incoming stream of chunks into lines without the trailing '\n':

    var last = "";
    
    process.stdin.on('data', function(chunk) {
        var lines, i;
    
        lines = (last+chunk).split("\n");
        for(i = 0; i < lines.length - 1; i++) {
            console.log("line: " + lines[i]);
        }
        last = lines[i];
    });
    
    process.stdin.on('end', function() {
        console.log("line: " + last);
    });
    
    process.stdin.resume();
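
    Since the snippet reads from process.stdin, you would typically drive it by redirecting or piping a file into it, e.g. node lines.js < file.txt (the script and file names here are just placeholders).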
    

    I came up with this while working on a quick log-parsing script that needed to accumulate data during the parsing, and I felt it would be nice to try doing this with JS and Node instead of Perl or bash.

    Anyway, I do feel that small Node.js scripts should be self-contained and not rely on third-party modules, so after reading all the answers to this question, each using various modules to handle line parsing, a 13-SLOC native Node.js solution might be of interest.
