Upload CSV stream from Ruby to S3

陌清茗 2021-02-13 20:56

I am dealing with potentially huge CSV files which I want to export from my Rails app, and since it runs on Heroku, my idea was to stream these CSV files directly to S3 while generating them, rather than building the whole file in memory or on disk first.

3 Answers
  • 2021-02-13 20:57

    I would have a look at http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html#write-instance_method as that might be what you're looking for.

    EDIT: http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjSingleOpRuby.html might be more relevant, as the first link points to the Ruby aws-sdk v1.

    require 'aws-sdk'
    
    s3 = Aws::S3::Resource.new(region:'us-west-2')
    obj = s3.bucket('bucket-name').object('key')
    
    # string data
    obj.put(body: 'Hello World!')
    
    # IO object
    File.open('source', 'rb') do |file|
      obj.put(body: file)
    end
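
    If the source is a large file already on disk, the v3 SDK also has Object#upload_file, which, as an alternative to put, switches to a multipart upload automatically once the file crosses a size threshold. A minimal sketch (bucket name, key, and path are placeholders):

    require 'aws-sdk-s3'
    
    s3 = Aws::S3::Resource.new(region: 'us-west-2')
    obj = s3.bucket('bucket-name').object('key')
    
    # upload_file performs a multipart upload transparently for large files;
    # the path is just a placeholder for your exported CSV
    obj.upload_file('/path/to/large_export.csv')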
    
  • 2021-02-13 21:07
    s3 = Aws::S3::Resource.new(region: 'us-west-2')
    # bucket() needs a name; BUCKET_NAME stands in for whatever bucket the export goes to
    obj = s3.bucket(BUCKET_NAME).object("#{FOLDER_NAME}/#{file_name}.csv")
    
    file_csv = CSV.generate do |csv|
      csv << ActionLog.column_names
      ActionLog.all.each do |action_log|
        csv << action_log.attributes.values
      end
    end
    
    obj.put(body: file_csv)
    

    CSV.generate builds the CSV data as a string in Ruby. After generating that string, we put it to S3 on the bucket, under the path

    #{FOLDER_NAME}/#{file_name}.csv
    

    In my code, I export all the data from the ActionLog model.
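
    Note that ActionLog.all.each loads every record into memory at once. A hedged variant (assuming the same ActionLog model) uses ActiveRecord's find_each to fetch rows in batches, which keeps query memory down even though the CSV string itself is still built in memory:

    require 'csv'
    
    file_csv = CSV.generate do |csv|
      csv << ActionLog.column_names
      # find_each pulls records in batches (1,000 by default) instead of all at once
      ActionLog.find_each do |action_log|
        csv << action_log.attributes.values
      end
    end
    
    obj.put(body: file_csv)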

  • 2021-02-13 21:16

    You can use S3 multipart upload, which uploads a large object by splitting it into multiple chunks. https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html

    Multipart upload normally requires more complex code, but aws-sdk-ruby v3 provides the upload_stream method, which performs a multipart upload internally and is very easy to use. It may be exactly what this use case needs. https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#upload_stream-instance_method

    client = Aws::S3::Client.new(
      region: 'ap-northeast-1',
      credentials: your_credential
    )
    
    obj = Aws::S3::Object.new('your-bucket-here', 'path-to-output', client: client)
    
    require "csv"
    obj.upload_stream do |write_stream|
      [
        %w(this is first line),
        %w(this is second line),
        %w(this is third line),
      ].each do |line|
        write_stream << line.to_csv
      end
    end
    
    The uploaded object then contains:
    
    this,is,first,line
    this,is,second,line
    this,is,third,line
    

    The block argument of upload_stream can be treated as an IO object, which lets you chain and wrap CSV generation just as you would with a file or any other IO:

    obj.upload_stream do |write_stream|
      CSV(write_stream) do |csv|
        [
          %w(this is first line),
          %w(this is second line),
          %w(this is third line),
        ].each do |line|
          csv << line
        end
      end
    end
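
    Tying this back to a Rails export like the one in the earlier answer, a sketch (assuming the same ActionLog model) that streams rows straight from the database into the multipart upload, so neither the records nor the full CSV ever have to sit in memory:

    obj.upload_stream do |write_stream|
      CSV(write_stream) do |csv|
        csv << ActionLog.column_names
        # find_each reads records in batches; each row is uploaded as it is generated
        ActionLog.find_each do |action_log|
          csv << action_log.attributes.values
        end
      end
    end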
    

    Or, for example, you could compress the CSV while generating and uploading it, using a tempfile to reduce the memory footprint:

    obj.upload_stream(tempfile: true) do |write_stream|
      # When uploading compressed data, use binmode to avoid an encoding error.
      write_stream.binmode
    
      Zlib::GzipWriter.wrap(write_stream) do |gzw|
        CSV(gzw) do |csv|
          [
            %w(this is first line),
            %w(this is second line),
            %w(this is third line),
          ].each do |line|
            csv << line
          end
        end
      end
    end
    

    Edited: in the compressed example, you have to call binmode on the write stream to avoid the following error:

    Aws::S3::MultipartUploadError: multipart upload failed: "\x8D" from ASCII-8BIT to UTF-8
    