I am dealing with potentially huge CSV files which I want to export from my Rails app, and since it runs on Heroku, my idea was to stream these CSV files directly to S3 while generating them.
You can use S3 multipart upload, which lets you upload a large object by splitting it into multiple chunks. https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
Multipart upload itself requires fairly involved code, but aws-sdk-ruby V3 provides the upload_stream
method, which performs a multipart upload internally and is very easy to use. It may be exactly the solution for this use case.
https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#upload_stream-instance_method
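For comparison, this is roughly what the manual multipart flow looks like with the V3 SDK (the bucket name, key, and chunks below are just placeholders); upload_stream does all of this bookkeeping for you:

require 'aws-sdk-s3'

client = Aws::S3::Client.new(region: 'ap-northeast-1')
bucket = 'your-bucket-here'
key    = 'path-to-output'

# Start the multipart upload and remember its id.
mpu = client.create_multipart_upload(bucket: bucket, key: key)

# Upload each chunk as a numbered part (every part except the last must be at least 5 MB).
chunks = ['a' * 5 * 1024 * 1024, 'last chunk']
parts = chunks.each_with_index.map do |chunk, i|
  part = client.upload_part(
    bucket: bucket,
    key: key,
    upload_id: mpu.upload_id,
    part_number: i + 1,
    body: chunk
  )
  { etag: part.etag, part_number: i + 1 }
end

# Tell S3 to assemble the parts into the final object.
client.complete_multipart_upload(
  bucket: bucket,
  key: key,
  upload_id: mpu.upload_id,
  multipart_upload: { parts: parts }
)

With upload_stream, all of that reduces to: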
require 'aws-sdk-s3'
require 'csv'

client = Aws::S3::Client.new(
  region: 'ap-northeast-1',
  credentials: your_credential
)
obj = Aws::S3::Object.new('your-bucket-here', 'path-to-output', client: client)
obj.upload_stream do |write_stream|
  [
    %w(this is first line),
    %w(this is second line),
    %w(this is third line),
  ].each do |line|
    write_stream << line.to_csv
  end
end
The uploaded object then contains:

this,is,first,line
this,is,second,line
this,is,third,line
The write_stream argument yielded to the upload_stream block can generally be treated as an IO object, which lets you chain and wrap CSV generation just as you would with a file or any other IO object:
obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    [
      %w(this is first line),
      %w(this is second line),
      %w(this is third line),
    ].each do |line|
      csv << line
    end
  end
end
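In a Rails app you would typically stream rows out of the database rather than a literal array. A minimal sketch, assuming a hypothetical User model with name and email columns:

obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    csv << %w(name email)                # header row
    # find_each loads records in batches, so the whole table never sits in memory.
    User.find_each(batch_size: 1000) do |user|
      csv << [user.name, user.email]     # each row is streamed to S3 as it is generated
    end
  end
end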
Or, for example, you could compress the CSV while generating and uploading it, using a tempfile to reduce the memory footprint:
require 'zlib'

obj.upload_stream(tempfile: true) do |write_stream|
  # When uploading compressed data, use binmode to avoid an encoding error.
  write_stream.binmode
  Zlib::GzipWriter.wrap(write_stream) do |gzw|
    CSV(gzw) do |csv|
      [
        %w(this is first line),
        %w(this is second line),
        %w(this is third line),
      ].each do |line|
        csv << line
      end
    end
  end
end
Edit: in the compressed example you have to call write_stream.binmode; without it the upload fails with the following error:

Aws::S3::MultipartUploadError: multipart upload failed: "\x8D" from ASCII-8BIT to UTF-8