Using Amazon Data Pipeline to back up DynamoDB data to S3


Question


I need to back up my DynamoDB table data to S3 using AWS Data Pipeline.

My question is: can I use a single Data Pipeline to back up multiple DynamoDB tables to S3, or do I have to create a separate pipeline for each of them?

Also, since my tables have a year_month prefix (e.g., 2014_3_tableName), I was thinking of using the Data Pipeline SDK to change the table name in the pipeline definition once the month changes. Will this work? Is there an alternative/better way?

Thanks!!


Answer 1:


If you are setting up your Data Pipeline through the DynamoDB console's Import/Export button, you will have to create a separate pipeline per table. If you are using Data Pipeline directly (either through the Data Pipeline API or through the Data Pipeline console), you can export multiple tables in the same pipeline. For each table, simply add an additional DynamoDBDataNode and an EmrActivity that links that data node to the output S3DataNode.
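For illustration, here is a rough boto3 sketch of what a single pipeline covering two tables could look like. The table names, bucket, region, and pipeline name are placeholders, and a real definition would also need a Default object (schedule, IAM roles) plus the export step on each EmrActivity, which are omitted here:

```python
# Minimal sketch, not a drop-in definition: one pipeline, two DynamoDB tables,
# each with its own DynamoDBDataNode, S3DataNode, and EmrActivity.
# All names, the bucket, and the region are hypothetical.
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

def table_objects(table_name, suffix):
    """Build the source node, output node, and export activity for one table."""
    return [
        {"id": f"DDBSourceTable{suffix}", "name": f"DDBSourceTable{suffix}",
         "fields": [
             {"key": "type", "stringValue": "DynamoDBDataNode"},
             {"key": "tableName", "stringValue": table_name},
         ]},
        {"id": f"S3Output{suffix}", "name": f"S3Output{suffix}",
         "fields": [
             {"key": "type", "stringValue": "S3DataNode"},
             {"key": "directoryPath",
              "stringValue": f"s3://my-backup-bucket/{table_name}/"},
         ]},
        {"id": f"ExportActivity{suffix}", "name": f"ExportActivity{suffix}",
         "fields": [
             {"key": "type", "stringValue": "EmrActivity"},
             {"key": "input", "refValue": f"DDBSourceTable{suffix}"},
             {"key": "output", "refValue": f"S3Output{suffix}"},
             {"key": "runsOn", "refValue": "EmrClusterForBackup"},
             # A real activity also needs the export "step"; copy it from a
             # console-generated export pipeline.
         ]},
    ]

pipeline_objects = (
    [{"id": "EmrClusterForBackup", "name": "EmrClusterForBackup",
      "fields": [{"key": "type", "stringValue": "EmrCluster"}]}]
    + table_objects("2014_3_users", "1")    # hypothetical table
    + table_objects("2014_3_orders", "2")   # hypothetical table
)

pipeline_id = dp.create_pipeline(name="ddb-backup", uniqueId="ddb-backup")["pipelineId"]
dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=pipeline_objects)
# Activation will fail validation until the omitted Default object, schedule,
# roles, and steps are filled in.
dp.activate_pipeline(pipelineId=pipeline_id)
```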

Regarding your year_month prefix use case, using the Data Pipeline SDK to change the table names periodically seems like the best approach. Another approach would be to make a copy of the script that the export EmrActivity runs (you can see the script location under the activity's "step" field) and change the way the Hive script determines the table name so that it derives it from the current date. You would host the modified script in your own S3 bucket and point the EmrActivity at that location instead of the default. I have not tried either approach before, but both are theoretically possible.
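As a rough sketch of the SDK approach (assuming boto3, a hypothetical pipeline ID, and a hypothetical base table name), a small script run once a month could rewrite the tableName field in the existing definition and re-activate the pipeline:

```python
# Minimal sketch, assuming an existing pipeline whose DynamoDBDataNode carries
# a tableName like "2014_3_tableName". Pipeline ID and base name are placeholders.
# Run this from a monthly cron job (or similar) to point the pipeline at the
# current month's table.
import datetime
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")
PIPELINE_ID = "df-EXAMPLE1234567"   # hypothetical pipeline ID
BASE_NAME = "tableName"             # hypothetical base table name

now = datetime.datetime.utcnow()
current_table = f"{now.year}_{now.month}_{BASE_NAME}"   # e.g. 2014_3_tableName

# Fetch the current definition, patch every tableName field, and push it back.
definition = dp.get_pipeline_definition(pipelineId=PIPELINE_ID)
for obj in definition["pipelineObjects"]:
    for field in obj["fields"]:
        if field["key"] == "tableName":
            field["stringValue"] = current_table

dp.put_pipeline_definition(
    pipelineId=PIPELINE_ID,
    pipelineObjects=definition["pipelineObjects"],
    parameterObjects=definition.get("parameterObjects", []),
    parameterValues=definition.get("parameterValues", []),
)
dp.activate_pipeline(pipelineId=PIPELINE_ID)
```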

More general information about exporting DynamoDB tables can be found in the DynamoDB Developer Guide, and more detailed information in the AWS Data Pipeline Developer Guide.




Answer 2:


It's an old question, but I was looking for the answer recently. When adding multiple DynamoDBDataNodes, you can still use a single S3DataNode as the output. Just differentiate folders within the S3 bucket by specifying a different output.directoryPath in each EmrActivity's Step field.

Like this: #{output.directoryPath}/newFolder

Each new folder will be created automatically in the S3 bucket.
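To make the idea concrete, here is a hedged Python sketch of the relevant pipeline objects: two EmrActivities share one S3DataNode, and each step appends its own subfolder to #{output.directoryPath}. The jar path and step layout are illustrative only; in practice you would copy them from a console-generated export pipeline:

```python
# Sketch of the objects this answer describes; names, bucket, jar path, and
# read-throughput value are placeholders.
shared_output = {
    "id": "S3BackupLocation", "name": "S3BackupLocation",
    "fields": [
        {"key": "type", "stringValue": "S3DataNode"},
        {"key": "directoryPath", "stringValue": "s3://my-backup-bucket/backups"},
    ],
}

def export_activity(suffix, table_node_id, subfolder):
    """EmrActivity whose step writes under its own subfolder of the shared output."""
    step = (
        # Illustrative jar/class; copy the real step from a generated pipeline.
        "s3://dynamodb-emr-us-east-1/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,"
        "org.apache.hadoop.dynamodb.tools.DynamoDbExport,"
        f"#{{output.directoryPath}}/{subfolder},"   # per-table folder under the shared path
        "#{input.tableName},0.25"
    )
    return {
        "id": f"ExportActivity{suffix}", "name": f"ExportActivity{suffix}",
        "fields": [
            {"key": "type", "stringValue": "EmrActivity"},
            {"key": "input", "refValue": table_node_id},
            {"key": "output", "refValue": "S3BackupLocation"},  # same output node for both
            {"key": "runsOn", "refValue": "EmrClusterForBackup"},
            {"key": "step", "stringValue": step},
        ],
    }

activities = [
    export_activity("1", "DDBSourceTable1", "table1"),
    export_activity("2", "DDBSourceTable2", "table2"),
]
```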



Source: https://stackoverflow.com/questions/23510704/using-amazon-data-pipeline-to-backup-dynamodb-data-to-s3
