Question
I've got a whole heap of files on a server that I want to upload to S3. The files are stored with a .data extension, but really they're just JPEGs, PNGs, ZIPs, or PDFs.
I've already written a short script that finds the MIME type and uploads them to S3, and it works, but it's slow. Is there any way to make the script below run using GNU parallel?
#!/bin/bash
for n in $(find . -name "*.data")
do
    # The real type is the first word of file's output, e.g. "JPEG image data" -> "jpeg"
    extension=$(file -b "$n" | awk '{print tolower($1)}')
    mimetype=$(file -b --mime-type "$n")
    fullpath=$(readlink -f "$n")
    # Swap the .data suffix for the real extension, then keep only the path after "internal_data"
    changed="${fullpath/.data/.$extension}"
    filePathWithExtensionChanged=${changed#*internal_data}
    s3cmd put -m "$mimetype" --acl-public "$fullpath" "s3://tff-xenforo-data$filePathWithExtensionChanged"
done
Also I'm sure this code could be greatly improved in general :) Feedback tips would be greatly appreciated.
Answer 1:
You are clearly skilled in writing shell, and extremely close to a solution: wrap the body of your loop in a function, export it, and let parallel invoke it once per file:
s3upload_single() {
    local n=$1
    local extension mimetype fullpath changed filePathWithExtensionChanged
    extension=$(file -b "$n" | awk '{print tolower($1)}')
    mimetype=$(file -b --mime-type "$n")
    fullpath=$(readlink -f "$n")
    changed="${fullpath/.data/.$extension}"
    filePathWithExtensionChanged=${changed#*internal_data}
    s3cmd put -m "$mimetype" --acl-public "$fullpath" "s3://tff-xenforo-data$filePathWithExtensionChanged"
}
export -f s3upload_single
find . -name "*.data" | parallel s3upload_single
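One caveat: the pipeline above hands parallel one filename per line, so a filename containing a newline would break it. If that can happen here, a null-delimited variant is safer; -0 (--null) is GNU parallel's standard pairing with find -print0, and -j sets how many uploads run concurrently (the count of 8 is just an example):

find . -name "*.data" -print0 | parallel -0 -j8 s3upload_single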
Answer 2:
You can just use s3cmd-modification, which allows you to put/get/sync with multiple workers in parallel:
$ git clone https://github.com/pcorliss/s3cmd-modification.git
$ cd s3cmd-modification
$ python setup.py install
$ s3cmd --parallel --workers=4 sync /source/path s3://target/path
Answer 3:
Try s3-cli: a command-line utility frontend to node-s3-client, inspired by s3cmd and intended as a drop-in replacement.
Paraphrasing from https://erikzaadi.com/2015/04/27/s3cmd-is-dead-long-live-s3-cli/ :
It is an in-place replacement for s3cmd, written in Node (yay!), which works flawlessly with the existing s3cmd configuration and which (among other awesome stuff) uploads to S3 in parallel, saving loads of time.
- system "s3cmd sync --delete-removed . s3://yourbucket.com/"
+ system "s3-cli sync --delete-removed . s3://yourbucket.com/"
Answer 4:
Use the AWS CLI. It supports parallel upload of files and is really fast at both uploading and downloading:
http://docs.aws.amazon.com/cli/latest/reference/s3/
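For a straight recursive upload, a single sync command already transfers files concurrently; a minimal sketch, where /path/to/internal_data is a placeholder for the source directory (note that sync keeps the .data names and guesses content types from extensions, so it does not reproduce the renaming logic from the question):

aws s3 sync /path/to/internal_data s3://tff-xenforo-data --acl public-read

To keep the per-file renaming and MIME detection, the s3cmd call in the function from Answer 1 could instead be swapped for aws s3 cp, which accepts --content-type and --acl:

aws s3 cp "$fullpath" "s3://tff-xenforo-data$filePathWithExtensionChanged" --content-type "$mimetype" --acl public-read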
Source: https://stackoverflow.com/questions/26934506/uploading-files-to-s3-using-s3cmd-in-parallel