问题
I wanted to use AWS Data Pipeline to pipe data from a Postgres RDS to AWS S3. Does anybody know how this is done?
More precisely, I wanted to export a Postgres Table to AWS S3 using data Pipeline. The reason I am using Data Pipeline is I want to automate this process and this export is going to run once every week.
Any other suggestions will also work.
回答1:
There is a sample on github. https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RDStoS3
Here is the code: https://github.com/awslabs/data-pipeline-samples/blob/master/samples/RDStoS3/RDStoS3Pipeline.json
回答2:
I built a Pipeline from scratch using the MySQL and the documentation as reference.
You need to have the roles on place, DataPipelineDefaultResourceRole && DataPipelineDefaultRole.
I haven't load the parameters, so, you need to get into the architech and put your credentials and folders.
Hope it helps.
{
"objects": [
{
"failureAndRerunMode": "CASCADE",
"resourceRole": "DataPipelineDefaultResourceRole",
"role": "DataPipelineDefaultRole",
"pipelineLogUri": "#{myS3LogsPath}",
"scheduleType": "ONDEMAND",
"name": "Default",
"id": "Default"
},
{
"database": {
"ref": "DatabaseId_WC2j5"
},
"name": "DefaultSqlDataNode1",
"id": "SqlDataNodeId_VevnE",
"type": "SqlDataNode",
"selectQuery": "#{myRDSSelectQuery}",
"table": "#{myRDSTable}"
},
{
"*password": "#{*myRDSPassword}",
"name": "RDS_database",
"id": "DatabaseId_WC2j5",
"type": "RdsDatabase",
"rdsInstanceId": "#{myRDSId}",
"username": "#{myRDSUsername}"
},
{
"output": {
"ref": "S3DataNodeId_iYhHx"
},
"input": {
"ref": "SqlDataNodeId_VevnE"
},
"name": "DefaultCopyActivity1",
"runsOn": {
"ref": "ResourceId_G9GWz"
},
"id": "CopyActivityId_CapKO",
"type": "CopyActivity"
},
{
"dependsOn": {
"ref": "CopyActivityId_CapKO"
},
"filePath": "#{myS3Container}#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
"name": "DefaultS3DataNode1",
"id": "S3DataNodeId_iYhHx",
"type": "S3DataNode"
},
{
"resourceRole": "DataPipelineDefaultResourceRole",
"role": "DataPipelineDefaultRole",
"instanceType": "m1.medium",
"name": "DefaultResource1",
"id": "ResourceId_G9GWz",
"type": "Ec2Resource",
"terminateAfter": "30 Minutes"
}
],
"parameters": [
]
}
回答3:
You can define a copy-activity in the Data Pipeline interface to extract data from a Postgres RDS instance into S3.
- Create a data node of the type SqlDataNode. Specify table name and select query.
- Setup the database connection by specifying RDS instance ID (the instance ID is in your URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with username, password and database name.
- Create a data node of the type S3DataNode.
- Create a Copy activity and set the SqlDataNode as input and the S3DataNode as output.
Another option is to use an external tool like Alooma. Alooma can replicate tables from PostgreSQL database hosted Amazon RDS to Amazon S3 (https://www.alooma.com/integrations/postgresql/s3). The process can be automated and you can run it once a week.
来源:https://stackoverflow.com/questions/39899210/exporting-a-aws-postgres-rds-table-to-aws-s3