Question
We are using Amazon Redshift, and the database is PostgreSQL. The data sits in the Amazon cloud. We need to load data from Amazon Redshift to MySQL using the Pentaho Data Integration software. Could you please tell us how to connect to Redshift via Pentaho?
Answer 1:
I'll try to help you.
The Redshift connection will need the PostgreSQL JDBC driver in the lib folder of your Pentaho Data Integration install. But the one that comes with Pentaho has some issues with Redshift; this can be solved by removing the existing driver and using version 8.4 (as seen on this link).
After that you can create a new connection in a transformation, using a Table Input step. Your query should run just fine.
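As a concrete illustration of the connection settings, the sketch below uses the PostgreSQL connection type pointed at the Redshift endpoint, which listens on port 5439 by default. The cluster hostname and database name are placeholders, not values from the original question:

```
Connection Type : PostgreSQL
Host Name       : your-cluster.xxxx.us-east-1.redshift.amazonaws.com
Port Number     : 5439
Database Name   : yourdb

Equivalent JDBC URL:
jdbc:postgresql://your-cluster.xxxx.us-east-1.redshift.amazonaws.com:5439/yourdb
```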
You can then add a Table Output step connected to a MySQL database (you'll need to download the MySQL JDBC connector and place it in the lib folder too).
An alternative output is the MySQL Bulk Loader step, which has awesome performance. But for first tests the Table Output step should do the job.
Answer 2:
We solved exactly the same problem in my current project, where we needed to aggregate a large data set from Redshift and import the aggregated data into MySQL for dashboard reports. If you have already decided to use Pentaho tools, well and good. It's a really nice tool, but we took an alternate approach because we had a large data set, and the speed we got with Kettle/Spoon did not meet our benchmarks and business needs.
I'm listing the solution here, so that it may be helpful to someone else.
Fire a psql command that copies a SELECT result to STDOUT, and redirect the result set into a CSV/TXT file:
psql -U $User -d $db_name -c "Copy (Select * From foo_table LIMIT 10) To STDOUT With CSV HEADER DELIMITER '|';" > foo_data.csv
Use the mysqlimport utility to import the data into MySQL:
mysqlimport --local --compress -u $MYSQL_USER -p$MYSQL_PASSWORD -h $MYSQL_HOST $MYSQL_DATABASE --fields-terminated-by='|' --ignore-lines=1 --columns=C1,C2,C3,..,C4 TABLE_NAME.CSV
With the above approach, we achieved ~100 times faster results.
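To sketch how the two commands above chain together, here is a minimal wrapper script. All credentials, hosts, and table names are placeholders, and DRY_RUN defaults to 1 so the script only prints the commands for inspection; set DRY_RUN=0 once it points at real databases:

```shell
#!/bin/sh
# Sketch of the Redshift -> CSV -> mysqlimport pipeline described above.
# Every connection value here is a placeholder, not from the original answer.
set -eu

PGUSER="${PGUSER:-redshift_user}"
PGDATABASE="${PGDATABASE:-analytics}"
MYSQL_USER="${MYSQL_USER:-mysql_user}"
MYSQL_PASSWORD="${MYSQL_PASSWORD:-secret}"
MYSQL_HOST="${MYSQL_HOST:-localhost}"
MYSQL_DATABASE="${MYSQL_DATABASE:-reports}"
TABLE="${TABLE:-foo_table}"
CSV_FILE="${TABLE}.csv"   # mysqlimport derives the target table name from the file name

# Export from Redshift as pipe-delimited CSV with a header row.
EXPORT_CMD="psql -U $PGUSER -d $PGDATABASE -c \"Copy (Select * From $TABLE) To STDOUT With CSV HEADER DELIMITER '|';\" > $CSV_FILE"

# Import into MySQL, skipping the header line.
IMPORT_CMD="mysqlimport --local --compress -u $MYSQL_USER -p$MYSQL_PASSWORD -h $MYSQL_HOST $MYSQL_DATABASE --fields-terminated-by='|' --ignore-lines=1 $CSV_FILE"

# DRY_RUN=1 (the default) just prints the commands; DRY_RUN=0 executes them.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "$EXPORT_CMD" "$IMPORT_CMD"
else
  eval "$EXPORT_CMD"
  eval "$IMPORT_CMD"
fi
```

The dry-run default makes it easy to verify the quoting and delimiters before touching either database.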
Using the same approach, MySQL to Redshift is doable too; the only change is that you may need to push the MySQL-exported CSVs to S3, or enable SSH, to use the COPY command in your psql scripts.
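For that reverse direction, the Redshift side loads from S3 with a COPY statement rather than mysqlimport. A hedged sketch of such a statement, assuming the same pipe-delimited CSV with a header row (bucket path and credentials are placeholders; Redshift also accepts an IAM role instead of access keys):

```
COPY foo_table
FROM 's3://your-bucket/path/foo_data.csv'
CREDENTIALS 'aws_access_key_id=<KEY>;aws_secret_access_key=<SECRET>'
CSV DELIMITER '|' IGNOREHEADER 1;
```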
Source: https://stackoverflow.com/questions/26926048/amazon-redshift-to-mysql-using-pentaho-data-integration