Compress file on S3


Question


I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed.

I know that by compressing it, it will shrink to about 2.2GB (gzip). How can I download this file locally as quickly as possible when transfer speed is the bottleneck (250kB/s)?

I haven't found any straightforward way to compress a file in place on S3, or to enable compression on transfer in s3cmd, boto, or related tools.


Answer 1:


S3 does not support stream compression, nor is it possible to compress an uploaded file remotely.

If this is a one-time process, I suggest downloading it to an EC2 machine in the same region, compressing it there, and then uploading it to your destination.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
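For example (a minimal sketch; the bucket and key names are placeholders), from an EC2 instance in the same region you can stream the object through gzip so the uncompressed 17.7GB copy never touches disk, then fetch only the small .gz over the slow link:

aws s3 cp s3://your-bucket/hive-output/data.csv - | gzip > data.csv.gz

aws s3 cp data.csv.gz s3://your-bucket/hive-output/data.csv.gz

Transfer between EC2 and S3 within a region is fast, so only the ~2.2GB compressed file has to cross the 250kB/s link.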

If you need to do this more frequently, see:

Serving gzipped CSS and JavaScript from Amazon CloudFront via S3
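That approach boils down to uploading the gzipped bytes under the original key with a Content-Encoding header, so HTTP clients decompress transparently. A minimal sketch (the file name, bucket, and key are placeholders):

gzip -9 style.css

mv style.css.gz style.css

aws s3 cp style.css s3://your-bucket/css/style.css --content-encoding gzip --content-type text/css

Clients that understand gzip inflate the response automatically, so the object behaves like the uncompressed file while only the compressed bytes travel over the wire.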




Answer 2:


Late answer, but I found this to work perfectly.

aws s3 sync s3://your-pics .

for i in `find . -type f | grep -E "\.jpg$|\.jpeg$"`; do gzip "$i"; echo "$i"; done

aws s3 sync . s3://your-pics --content-encoding gzip --dryrun

This downloads all the files in the S3 bucket to the machine (or EC2 instance), compresses the image files, and uploads them back to the S3 bucket. Verify the data before removing the --dryrun flag.
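After the real sync, you can spot-check that the encoding was recorded on an object (the bucket and key here are placeholders):

aws s3api head-object --bucket your-pics --key photo.jpg.gz

The response should include "ContentEncoding": "gzip". Note also that gzip renames each file to *.gz, so the re-uploaded keys will not match the original ones.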



Source: https://stackoverflow.com/questions/14495176/compress-file-on-s3
