Amazon Redshift to MySQL using Pentaho Data Integration

Submitted by 寵の児 on 2020-01-30 11:29:26

Question


We are using Amazon Redshift, whose database is based on PostgreSQL. The data sits in the Amazon cloud. We need to load data from Amazon Redshift into MySQL using Pentaho Data Integration. Could you please tell us how to connect to Redshift via Pentaho?


Answer 1:


I'll try to help you.

The Redshift connection needs the PostgreSQL JDBC driver in the lib folder of your Pentaho data-integration installation. However, the driver that ships with Pentaho has some issues with Redshift; this can be solved by removing the existing jar and using version 8.4 of the driver instead (as seen on this link).
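
A minimal sketch of the driver swap, assuming a default Linux install path and the 8.4 JDBC4 jar name (both are assumptions and may differ on your machine):

    # Paths and jar names below are assumptions; adjust for your installation.
    cd /opt/pentaho/data-integration/lib
    # Remove the bundled PostgreSQL driver that misbehaves against Redshift.
    rm postgresql-*.jar
    # Drop in the 8.4 driver downloaded from https://jdbc.postgresql.org/
    cp ~/Downloads/postgresql-8.4-703.jdbc4.jar .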

After that you can create a new connection in a transformation, using a Table Input step. Your query should run just fine.
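
Before setting up the step, a quick command-line check can rule out network or credential problems; the endpoint, port, user, and database below are all placeholders for your own cluster:

    # All connection parameters here are placeholders (Redshift listens on 5439 by default).
    psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 \
         -U myuser -d mydb -c "SELECT 1;"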

You can then add a Table Output step connected to a MySQL database (you'll need to download the MySQL JDBC connector and place it in the lib folder too).
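
As with the PostgreSQL driver, the connector jar simply goes into the lib folder; the version and paths here are assumptions:

    # Use the connector version matching your MySQL server; paths are assumptions.
    cp ~/Downloads/mysql-connector-java-5.1.34-bin.jar /opt/pentaho/data-integration/lib/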

An alternative output is the MySQL Bulk Loader step, which has excellent performance, but for first tests the Table Output step should do the job.
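
Once the transformation runs in Spoon, it can also be executed headlessly with Pan, which ships with PDI; the install path and .ktr file name here are assumptions:

    # Paths and file names are assumptions; pan.sh is in the data-integration folder.
    /opt/pentaho/data-integration/pan.sh -file=/path/to/redshift_to_mysql.ktr -level=Basic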




Answer 2:


We solved exactly this problem in my current project, where we needed to aggregate a large data set from Redshift and import the aggregated data into MySQL for dashboard reports. If you have already decided to use Pentaho tools, well and good; it's a really nice tool. But we took an alternate approach, because with our large data set the speed we got with Kettle/Spoon was not meeting our benchmarks and business needs.

I'm listing the solution here so that it may be helpful to someone else.

  1. Fire a psql SELECT command and redirect the result set into a CSV/TXT file.

    psql -U $User -d $db_name -c "Copy (Select * From foo_table LIMIT 10) To STDOUT With CSV HEADER DELIMITER '|';" > foo_data.csv
    
  2. Use the mysqlimport utility to import the data into MySQL. Note that mysqlimport derives the target table name from the file's base name, so the export file should be named after the table (e.g. TABLE_NAME.csv).

    mysqlimport --local --compress -u $MYSQL_USER -p$MYSQL_PASSWORD -h $MYSQL_HOST \
        $MYSQL_DATABASE --fields-terminated-by='|' --ignore-lines=1 \
        --columns='C1,C2,C3,..,C4' TABLE_NAME.csv
    

    With the above approach, we achieved results ~100 times faster.

Using the same approach, MySQL to Redshift is doable too; the only change is that you may need to push the MySQL-exported CSVs to S3, or enable SSH, in order to use the COPY command in your psql scripts.
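
A rough sketch of that reverse direction via S3 (every host, bucket, table, and credential below is a placeholder, and the tab-to-pipe translation assumes the data contains neither character):

    # 1. Export from MySQL as pipe-delimited text.
    mysql --batch --skip-column-names -u $MYSQL_USER -p$MYSQL_PASSWORD \
        -h $MYSQL_HOST $MYSQL_DATABASE -e "SELECT * FROM foo_table" \
        | tr '\t' '|' > foo_data.csv

    # 2. Push the export to S3.
    aws s3 cp foo_data.csv s3://my-bucket/foo_data.csv

    # 3. Load into Redshift with COPY, run through psql.
    psql -U $User -d $db_name -c "COPY foo_table FROM 's3://my-bucket/foo_data.csv' CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>' DELIMITER '|';"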



Source: https://stackoverflow.com/questions/26926048/amazon-redshift-to-mysql-using-pentaho-data-integration
