Question
I am transferring DynamoDB data to S3 using AWS Data Pipeline. The backup arrives in the S3 bucket, but it is split into multiple files. To get the data into a single file, I used a ShellCommandActivity that runs the following command:
aws s3 cat #{myOutputS3Loc}/#{format(@scheduledStartTime,'YYYY-MM-dd')}/* > #{myRenamedFile}
This should concatenate all the files in the S3 folder into a single file named `#{myRenamedFile}`. But I get the following error in Data Pipeline:
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help

aws: error: argument subcommand: Invalid choice, valid choices are:

ls | website
cp | mv
rm | sync
mb | rb
Does this mean `cat` is not supported in ShellCommandActivity, or is something else wrong here? Is there any other way to combine the different files into a single file within S3 itself?
Answer 1:
There is no `cat` command in `aws s3`. Other options:

- `cp`/`sync` the files to local storage and concatenate them there with the shell's `cat` command (see the first sketch below).
- Get the file names and loop through the list, calling `aws s3 cp s3://<file> -` and appending the output to a new file (see the second sketch below). You could do this in a single command with the `--recursive` option to `cp`, but `--recursive` is not supported when the file is copied to stdout.
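
A minimal sketch of the first option, as it might run inside the ShellCommandActivity. The bucket, prefix, and file names here are placeholders, not anything from the original pipeline:

```bash
#!/bin/bash
set -e

# Hypothetical locations; substitute your own bucket, prefix, and output key.
SRC=s3://my-bucket/dynamodb-backup/2016-03-24
DEST=s3://my-bucket/dynamodb-backup/combined.txt

# Pull all the part files down to a local scratch directory.
mkdir -p /tmp/parts
aws s3 sync "$SRC" /tmp/parts/

# Concatenate them locally, then push the single file back to S3.
cat /tmp/parts/* > /tmp/combined.txt
aws s3 cp /tmp/combined.txt "$DEST"
```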
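
And a sketch of the second option, streaming each object to stdout so the individual parts never need to be kept on disk; again, the bucket and prefix are hypothetical:

```bash
#!/bin/bash
set -e

BUCKET=my-bucket
PREFIX=dynamodb-backup/2016-03-24

# List the keys under the prefix (the key name is the 4th column of
# `aws s3 ls` output; this assumes key names contain no spaces),
# stream each object to stdout, and append it to one local file.
> /tmp/combined.txt
for key in $(aws s3 ls "s3://$BUCKET/$PREFIX/" | awk '{print $4}'); do
    aws s3 cp "s3://$BUCKET/$PREFIX/$key" - >> /tmp/combined.txt
done

# Upload the combined file back to S3.
aws s3 cp /tmp/combined.txt "s3://$BUCKET/combined.txt"
```

Either way, the combining happens on the machine running the shell command; S3 itself has no server-side concatenation in the CLI.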
Source: https://stackoverflow.com/questions/36217110/shellcommandactivity-in-aws-data-pipeline