I am dealing with potentially huge CSV files which I want to export from my Rails app, and since it runs on Heroku, my idea was to stream these CSV files directly to S3 while generating them.
You can use S3 multipart upload, which lets you upload a large object by splitting it into multiple chunks. https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
Multipart upload itself requires fairly involved code, but aws-sdk-ruby V3 provides the upload_stream
method, which performs a multipart upload internally and is very easy to use. It may be exactly the solution for this use case.
https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#upload_stream-instance_method
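For comparison, this is roughly what the manual multipart flow looks like with the V3 SDK (the bucket name, key, and chunks below are just placeholders); upload_stream does all of this bookkeeping for you:

require 'aws-sdk-s3'

client = Aws::S3::Client.new(region: 'ap-northeast-1')
bucket = 'your-bucket-here'
key    = 'path-to-output'

# Start the multipart upload and remember its id.
mpu = client.create_multipart_upload(bucket: bucket, key: key)

# Upload each chunk as a numbered part (every part except the last must be at least 5 MB).
chunks = ['a' * 5 * 1024 * 1024, 'last chunk']
parts = chunks.each_with_index.map do |chunk, i|
  part = client.upload_part(
    bucket: bucket,
    key: key,
    upload_id: mpu.upload_id,
    part_number: i + 1,
    body: chunk
  )
  { etag: part.etag, part_number: i + 1 }
end

# Tell S3 to assemble the parts into the final object.
client.complete_multipart_upload(
  bucket: bucket,
  key: key,
  upload_id: mpu.upload_id,
  multipart_upload: { parts: parts }
)

With upload_stream, all of that reduces to: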
require 'aws-sdk-s3'
require 'csv'

client = Aws::S3::Client.new(
  region: 'ap-northeast-1',
  credentials: your_credential
)
obj = Aws::S3::Object.new('your-bucket-here', 'path-to-output', client: client)
obj.upload_stream do |write_stream|
  [
    %w(this is first line),
    %w(this is second line),
    %w(this is third line),
  ].each do |line|
    write_stream << line.to_csv
  end
end
The uploaded object then contains:

this,is,first,line
this,is,second,line
this,is,third,line
The write_stream argument yielded to the upload_stream block can generally be treated as an IO object, which lets you chain and wrap CSV generation just as you would with a file or any other IO object:
obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    [
      %w(this is first line),
      %w(this is second line),
      %w(this is third line),
    ].each do |line|
      csv << line
    end
  end
end
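In a Rails app you would typically stream rows out of the database rather than a literal array. A minimal sketch, assuming a hypothetical User model with name and email columns:

obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    csv << %w(name email)                # header row
    # find_each loads records in batches, so the whole table never sits in memory.
    User.find_each(batch_size: 1000) do |user|
      csv << [user.name, user.email]     # each row is streamed to S3 as it is generated
    end
  end
end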
Or, for example, you could compress the CSV while generating and uploading it, using a tempfile to reduce the memory footprint:
require 'zlib'

obj.upload_stream(tempfile: true) do |write_stream|
  # When uploading compressed data, use binmode to avoid an encoding error.
  write_stream.binmode
  Zlib::GzipWriter.wrap(write_stream) do |gzw|
    CSV(gzw) do |csv|
      [
        %w(this is first line),
        %w(this is second line),
        %w(this is third line),
      ].each do |line|
        csv << line
      end
    end
  end
end
Edit: in the compressed example you have to call write_stream.binmode; without it the upload fails with the following error:

Aws::S3::MultipartUploadError: multipart upload failed: "\x8D" from ASCII-8BIT to UTF-8