Read a file from an AWS S3 bucket using Node fs

逝去的感伤 2020-12-07 21:37

I am attempting to read a file that is in an AWS S3 bucket using the following code:

fs.readFile(file, function (err, contents) {
  var myLines = contents.Body.toString().split('\n');
});

11 answers
  • 2020-12-07 22:16

    This will do it:

    const AWS = require('aws-sdk');

    new AWS.S3().getObject({ Bucket: this.awsBucketName, Key: keyName }, function(err, data)
    {
        if (err) {
            console.error(err);
            return;
        }
        console.log(data.Body.toString()); // data.Body is a Buffer
    });
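
The question's goal is to split the contents into lines rather than only print them. A minimal sketch of that step, assuming the object is UTF-8 text and that data.Body is a Buffer (as it is with the v2 aws-sdk in Node.js); bodyToLines is a hypothetical helper name, not part of the SDK:

```javascript
// Hypothetical helper: turn an S3 object body (a Buffer with the v2 SDK in Node.js)
// into an array of lines. Assumes UTF-8 text, accepts both \n and \r\n endings,
// and drops the empty string produced by a trailing newline.
function bodyToLines(body) {
  const lines = Buffer.from(body).toString('utf8').split(/\r?\n/);
  if (lines.length > 0 && lines[lines.length - 1] === '') {
    lines.pop();
  }
  return lines;
}

// Inside the getObject callback you would call: bodyToLines(data.Body)
console.log(bodyToLines(Buffer.from('line 1\r\nline 2\n'))); // [ 'line 1', 'line 2' ]
```

Splitting on /\r?\n/ rather than '\n' avoids stray '\r' characters when the file was written on Windows.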
    
  • 2020-12-07 22:19

    Since you seem to want to process an S3 text file line by line, here is a Node version that uses the standard readline module and the AWS SDK's createReadStream():

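    const AWS = require('aws-sdk');
    const readline = require('readline');

    const s3 = new AWS.S3();
    const params = { Bucket: 'myBucket', Key: 'myFile.txt' }; // placeholder bucket and key

    const rl = readline.createInterface({
        input: s3.getObject(params).createReadStream(),
        crlfDelay: Infinity // treat \r\n as a single line break
    });

    rl.on('line', function(line) {
        console.log(line);
    })
    .on('close', function() {
        // all lines have been read
    });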
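
If you would rather collect or process the lines with async/await instead of event handlers, the readline interface is also async-iterable (Node.js 11.14 or later). A sketch, where readLines is a hypothetical helper and an in-memory Readable stands in for the S3 stream so it runs without AWS access:

```javascript
const readline = require('readline');
const { Readable } = require('stream');

// Collect every line from a Readable, such as s3.getObject(params).createReadStream().
async function readLines(input) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const lines = [];
  for await (const line of rl) {
    lines.push(line); // process each line here instead of buffering, if memory matters
  }
  return lines;
}

// Usage with an in-memory stream standing in for the S3 object;
// note that "second" is split across two chunks.
readLines(Readable.from([Buffer.from('first\nsec'), Buffer.from('ond\n')]))
  .then((lines) => console.log(lines)); // [ 'first', 'second' ]
```

readline reassembles lines that span chunk boundaries, which a naive split on each chunk would not.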
    
  • 2020-12-07 22:22

    With the new version of the SDK, the accepted answer does not work: it does not wait for the object to be downloaded. The following snippet works with the new version:

    // dependencies
    const AWS = require('aws-sdk');

    // get reference to S3 client
    const s3 = new AWS.S3();

    exports.handler = async (event, context) => {
        const bucket = "TestBucket";
        const key = "TestKey";

        try {
            const params = {
                Bucket: bucket,
                Key: key
            };

            // .promise() returns a Promise, so the handler waits for the download
            const theObject = await s3.getObject(params).promise();
            console.log(theObject.Body.toString('utf8'));
        } catch (error) {
            console.log(error);
            return;
        }
    };
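
Why the callback version does not wait can be shown without AWS at all. In this sketch, makeFakeS3 is a hypothetical stand-in whose getObject loosely mimics the v2 SDK's shape (an optional callback, or a request object with .promise()); the handler names are illustrative too. The callback version returns before its callback fires, while the awaited version returns the body. In Lambda, the invocation typically ends once an async handler's promise resolves, so a late callback may never run.

```javascript
// Hypothetical stand-in for a v2-style S3 client: getObject either takes a
// callback (invoked asynchronously) or returns an object with .promise().
function makeFakeS3(bodyText) {
  const result = { Body: Buffer.from(bodyText, 'utf8') };
  const later = (fn) => setTimeout(fn, 10); // simulate network latency
  return {
    getObject(params, callback) {
      if (callback) {
        later(() => callback(null, result));
        return undefined;
      }
      return { promise: () => new Promise((resolve) => later(() => resolve(result))) };
    },
  };
}

// Callback inside an async handler: nothing awaits the callback, so the
// handler's promise resolves before the body arrives.
async function handlerWithCallback(s3) {
  let text;
  s3.getObject({ Bucket: 'TestBucket', Key: 'TestKey' }, function (err, data) {
    if (!err) text = data.Body.toString('utf8');
  });
  return text; // undefined: the callback has not run yet
}

// Awaiting .promise(): the handler resumes only after the download completes.
async function handlerWithAwait(s3) {
  const data = await s3.getObject({ Bucket: 'TestBucket', Key: 'TestKey' }).promise();
  return data.Body.toString('utf8');
}

handlerWithCallback(makeFakeS3('hello')).then((v) => console.log('callback:', v)); // callback: undefined
handlerWithAwait(makeFakeS3('hello')).then((v) => console.log('await:', v));       // await: hello
```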
    
  • 2020-12-07 22:23

    If you want to save memory and get each row as a JSON object, you can use fast-csv to parse the S3 read stream and receive the rows one at a time:

    const csv = require('fast-csv');
    const AWS = require('aws-sdk');

    // Explicit credentials are only required for local execution (placeholder values)
    const credentials = new AWS.Credentials("ACCESSKEY", "SECRETKEY", "SESSIONTOKEN");
    AWS.config.update({
        credentials: credentials,
        region: 'your_region'
    });
    const s3 = new AWS.S3();
    const stream = s3.getObject({ Bucket: 'your_bucket', Key: 'example.csv' }).createReadStream();

    // Newer fast-csv releases use csv.parseStream; older 2.x releases used csv.fromStream
    const parser = csv.parseStream(stream, { headers: true })
        .on("data", function (row) {
            parser.pause();   // pause reading at this row, e.g. while doing async work
            console.log(row); // one object per row, keyed by the header names
            parser.resume();  // continue reading
        })
        .on("error", function (err) {
            console.error(err);
        })
        .on("end", function () {
            console.log('process finished');
        });
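
For illustration only, here is roughly what headers: true does, sketched without fast-csv: the first line supplies the keys and every later line becomes an object, yielded one at a time so memory stays bounded. This simplified parser is an assumption-laden stand-in (no quoted fields, embedded commas, or multi-line values), so use fast-csv or another real CSV parser for actual data; csvRows is a hypothetical name.

```javascript
const readline = require('readline');

// Simplified stand-in for a CSV parser's { headers: true } mode.
// Assumption: plain comma-separated values with no quoting.
async function* csvRows(stream) {
  const rl = readline.createInterface({ input: stream, crlfDelay: Infinity });
  let headers = null;
  for await (const line of rl) {
    if (line === '') continue; // skip blank lines
    const fields = line.split(',');
    if (headers === null) {
      headers = fields; // first non-empty line is the header row
      continue;
    }
    const row = {};
    headers.forEach((name, i) => { row[name] = fields[i]; });
    yield row;
  }
}

// Usage: for await (const row of csvRows(s3.getObject(params).createReadStream())) { ... }
```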
    
  • 2020-12-07 22:24

    I couldn't figure out why yet, but the createReadStream/pipe approach didn't work for me. I was trying to download a large CSV file (300 MB+) and got duplicated lines. The issue seemed random: the final file size varied with each download attempt.

    I ended up using another approach, based on the AWS SDK for JavaScript examples:

    const AWS = require('aws-sdk');
    const fs = require('fs');

    var s3 = new AWS.S3();
    var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
    var file = fs.createWriteStream('/path/to/file.jpg');

    // write each chunk as it arrives, then close the file once the HTTP response is complete
    s3.getObject(params).
        on('httpData', function(chunk) { file.write(chunk); }).
        on('httpDone', function() { file.end(); }).
        send();
    

    This way, it worked like a charm.
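
Another option, different from the one used above, is stream.pipeline from Node's core stream module, which handles backpressure and destroys both streams if either one fails. This is a general-purpose sketch, not a diagnosis of the duplicated lines described above; saveStreamToFile is a hypothetical helper, and in practice the readable would be the stream returned by s3.getObject(params).createReadStream():

```javascript
const fs = require('fs');
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// Copy any Readable into a local file. pipeline() respects backpressure and,
// if either stream errors, destroys both and rejects the returned promise.
async function saveStreamToFile(readable, filePath) {
  await pipelineAsync(readable, fs.createWriteStream(filePath));
}

// Usage (placeholder path):
// saveStreamToFile(s3.getObject(params).createReadStream(), '/path/to/file.csv')
//   .then(() => console.log('saved'))
//   .catch(console.error);
```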

  • 2020-12-07 22:24

    I prefer Buffer.from(data.Body).toString('utf8') because it accepts an encoding parameter. With other AWS services (e.g. Kinesis Streams), you may want to replace the 'utf8' encoding with 'base64'.

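    const AWS = require('aws-sdk');

    new AWS.S3().getObject(
      { Bucket: this.awsBucketName, Key: keyName },
      function(err, data) {
        if (err) {
          console.error(err);
          return;
        }
        // decode the Buffer with an explicit encoding
        const body = Buffer.from(data.Body).toString('utf8');
        console.log(body);
      }
    );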
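
A small sketch of the encoding parameter in action, using a locally built Buffer in place of data.Body; decodeBody is a hypothetical helper name:

```javascript
// Decode a Buffer (e.g. data.Body from getObject) using an explicit encoding.
function decodeBody(body, encoding = 'utf8') {
  return Buffer.from(body).toString(encoding);
}

const body = Buffer.from('hello', 'utf8'); // stands in for data.Body

console.log(decodeBody(body));           // hello
console.log(decodeBody(body, 'base64')); // aGVsbG8=
console.log(decodeBody(body, 'hex'));    // 68656c6c6f
```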
    