How to pipe data from AWS Postgres RDS to S3 (then Redshift)?

假如想象 提交于 2019-11-29 03:11:50

问题


I'm using AWS data pipeline service to pipe data from a RDS MySql database to s3 and then on to Redshift, which works nicely.

However, I also have data living in an RDS Postres instance which I would like to pipe the same way but I'm having a hard time setting up the jdbc-connection. If this is unsupported, is there a work-around?

"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB”

回答1:


this doesn't work yet. aws hasnt built / released the functionality to connect nicely to postgres. you can do it in a shellcommandactivity though. you can write a little ruby or python code to do it and drop that in a script on s3 using scriptUri. you could also just write a psql command to dump the table to a csv and then pipe that to OUTPUT1_STAGING_DIR with "staging: true" in that activity node.

something like this:

{
  "id": "DumpCommand",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEC2Resource" },
  "stage": "true",
  "output": { "ref": "S3ForRedshiftDataNode" },
  "command": "PGPASSWORD=password psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
}

i didn't run this to verify because it's a pain to spin up a pipeline :( so double check the escaping in the command.

  • pros: super straightforward and requires no additional script files to upload to s3
  • cons: not exactly secure. your db password will be transmitted over the wire without encryption.

look into the new stuff aws just launched on parameterized templating data pipelines: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. it looks like it will allow encryption of arbitrary parameters.




回答2:


Nowadays you can define a copy-activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface:

  1. Create a data node of the type SqlDataNode. Specify table name and select query
  2. Setup the database connection by specifying RDS instance ID (the instance ID is in your URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with username, password and database name.
  3. Create a data node of the type S3DataNode
  4. Create a Copy activity and set the SqlDataNode as input and the S3DataNode as output



回答3:


AWS now allow partners to do near real time RDS -> Redshift inserts.

https://aws.amazon.com/blogs/aws/fast-easy-free-sync-rds-to-redshift/



来源:https://stackoverflow.com/questions/26781758/how-to-pipe-data-from-aws-postgres-rds-to-s3-then-redshift

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!