AWS Data Pipelines with a Heroku Database

Submitted by 纵饮孤独 on 2019-12-13 03:45:56


I'm wondering about the feasibility of connecting an AWS Data Pipeline to a Heroku database. Heroku databases are hosted on EC2 instances (east region) and require SSL.

I've tried to open a connection using a JdbcDatabase object, but have run into issues at every turn.

I've tried the following:

      "id" : "heroku_database",
      "name" : "heroku_database",
      "type" : "JdbcDatabase",
      "jdbcDriverClass" : "org.postgresql.Driver",
      "connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
      "jdbcProperties": "ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory",
      "username" : "#{myHerokuDatabaseUserName}",
      "*password" : "#{*myHerokuDatabasePassword}"

with the result of:

unable to find valid certification path to requested target

as well as:

      "id" : "heroku_database",
      "name" : "heroku_database",
      "type" : "JdbcDatabase",
      "jdbcDriverClass" : "org.postgresql.Driver",
      "connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
      "jdbcProperties": "sslmode=require",
      "username" : "#{myHerokuDatabaseUserName}",
      "*password" : "#{*myHerokuDatabasePassword}"

with the result of:

amazonaws.datapipeline.database.ConnectionFactory: Unable to establish connection to jdbc:postgresql:// FATAL: no pg_hba.conf entry for host "", user "redacted", database "redacted", SSL off

On top of that, I have also tried using a ShellCommandActivity to dump the Postgres table from the EC2 instance and write its stdout to my S3 bucket. However, the EC2 instance doesn't recognize the psql command:

      "id": "herokuDatabaseDump",
      "name": "herokuDatabaseDump",
      "type": "ShellCommandActivity",
      "runsOn": {
        "ref": "Ec2Instance"
      },
      "stage": "true",
      "stdout": "#{myOutputS3Loc}/#{myOutputFileName}",
      "command": "PGPASSWORD=#{*myHerokuDatabasePassword} psql -h #{myHerokuDatabaseHost} -U #{myHerokuDatabaseUserName} -d #{myHerokuDatabaseName} -p #{myHerokuDatabasePort} -t -A -F',' -c 'select * from #{myHerokuDatabaseTableName}'"

I also cannot yum install postgres on the instance beforehand.

It sucks to have both RDS and Heroku as our database sources. Any ideas on how to get a select query to run against a Heroku Postgres database via a Data Pipeline would be a great help. Thanks.


It looks like Heroku needs/wants the postgres 42.2.1 driver, or at least that's what they tell you to use if you are compiling a Java app.

I wasn't able to find out which driver Data Pipeline uses by default, but it lets you specify a custom driver jar through the jdbcDriverJarUri field.

An important note here is that Data Pipeline requires a Java 7 compatible driver, so you are going to want to use the postgresql-42.2.1.jre7.jar build.

That combined with a jdbcProperties field of sslmode=require should allow it to go through and create the dump file you are looking for.
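
Putting those pieces together, the database object might look something like the sketch below. I haven't run this against Heroku, so treat it as a starting point. It assumes you have uploaded the driver jar to an S3 location the pipeline is allowed to read, and myDriverJarS3Uri is a hypothetical parameter name I made up for that location (something of the form s3://your-bucket/path/postgresql-42.2.1.jre7.jar); everything else is copied from your definition.

      "id" : "heroku_database",
      "name" : "heroku_database",
      "type" : "JdbcDatabase",
      "jdbcDriverClass" : "org.postgresql.Driver",
      "jdbcDriverJarUri" : "#{myDriverJarS3Uri}",
      "connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
      "jdbcProperties": "sslmode=require",
      "username" : "#{myHerokuDatabaseUserName}",
      "*password" : "#{*myHerokuDatabasePassword}"

If the connection still reports "SSL off", my guess would be that the custom jar is not being picked up, so the S3 path and the pipeline role's read permission on it are the first things I would check.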

