Question
I am transferring DynamoDB data to S3 using AWS Data Pipeline. The backup arrives in the S3 bucket, but it is split into multiple files. To get the data into a single file, I used a ShellCommandActivity that runs the following command:
aws s3 cat #{myOutputS3Loc}/#{format(@scheduledStartTime,'YYYY-MM-dd')}/* > #{myRenamedFile}
This should concatenate all the files in the S3 folder into a single file named `#{myRenamedFile}`. But I get the following error in Data Pipeline:
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help

aws: error: argument subcommand: Invalid choice, valid choices are:

ls | website
cp | mv
rm | sync
mb | rb
Does this mean `cat` is not supported in ShellCommandActivity, or is something else wrong here? Is there any other way to combine the different files into a single file within S3 itself?
Answer 1:
There is no `cat` command in `aws s3`. Other options:

- `cp`/`sync` the files to local storage and concatenate them there with the shell's `cat` command (see the first sketch below).
- Get the file names and loop through the list, calling `aws s3 cp s3://<file> -` and appending the output to a new file (see the second sketch below). You could do this in a single command with the `--recursive` option to `cp`, but `--recursive` is not supported when the file is copied to stdout.
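
A minimal sketch of the first option, as it might run inside the ShellCommandActivity. The bucket, prefix, and file names here are placeholders, not anything from the original pipeline:

```bash
#!/bin/bash
set -e

# Hypothetical locations; substitute your own bucket, prefix, and output key.
SRC=s3://my-bucket/dynamodb-backup/2016-03-24
DEST=s3://my-bucket/dynamodb-backup/combined.txt

# Pull all the part files down to a local scratch directory.
mkdir -p /tmp/parts
aws s3 sync "$SRC" /tmp/parts/

# Concatenate them locally, then push the single file back to S3.
cat /tmp/parts/* > /tmp/combined.txt
aws s3 cp /tmp/combined.txt "$DEST"
```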
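
And a sketch of the second option, streaming each object to stdout so the individual parts never need to be kept on disk; again, the bucket and prefix are hypothetical:

```bash
#!/bin/bash
set -e

BUCKET=my-bucket
PREFIX=dynamodb-backup/2016-03-24

# List the keys under the prefix (the key name is the 4th column of
# `aws s3 ls` output; this assumes key names contain no spaces),
# stream each object to stdout, and append it to one local file.
> /tmp/combined.txt
for key in $(aws s3 ls "s3://$BUCKET/$PREFIX/" | awk '{print $4}'); do
    aws s3 cp "s3://$BUCKET/$PREFIX/$key" - >> /tmp/combined.txt
done

# Upload the combined file back to S3.
aws s3 cp /tmp/combined.txt "s3://$BUCKET/combined.txt"
```

Either way, the combining happens on the machine running the shell command; S3 itself has no server-side concatenation in the CLI.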
Source: https://stackoverflow.com/questions/36217110/shellcommandactivity-in-aws-data-pipeline