We are using Amazon Redshift, and the database is PostgreSQL-based. The data sits in the Amazon cloud. We need to load data from Amazon Redshift into MySQL using the Pentaho Data Integration software.
We solved exactly the same problem in my current project, where we need to aggregate a large data set from Redshift and import the aggregated data into MySQL for dashboard reports. If you have already decided to use the Pentaho tools, well and good. It is a really nice tool, but we took an alternative approach because our data set was large and the speed we got with Kettle/Spoon did not meet our benchmarks and business needs.
I'm listing the solution here so that it may be helpful to someone else.
Run a psql SELECT command and redirect the result set into a CSV/TXT file.
psql -U $User -d $db_name -c "Copy (Select * From foo_table LIMIT 10) To STDOUT With CSV HEADER DELIMITER '|';" > foo_data.csv
Use the mysqlimport utility to import the data into MySQL.
mysqlimport --local --compress -u $MYSQL_USER -p$MYSQL_PASSWORD -h $MYSQL_HOST --fields-terminated-by='|' --ignore-lines=1 --columns='C1,C2,C3,..,C4' $MYSQL_DATABASE TABLE_NAME.CSV
With the above approach, we achieved results roughly 100 times faster than we got with Kettle/Spoon.
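For anyone who wants to script this, the two steps can be wrapped in a few shell functions. This is only a sketch, not the exact script we ran: the function names and the PG_USER / PG_DB variables are placeholders I made up for illustration, and it assumes psql and mysqlimport are on the PATH. Note that mysqlimport takes the column list comma-separated and derives the target table from the file name.

```shell
#!/usr/bin/env bash
# Sketch only: PG_USER, PG_DB, MYSQL_* and the function names are placeholders.

# Step 1: write the query result to a pipe-delimited file with a header row.
export_table() {
  local table="$1" outfile="$2"
  psql -U "$PG_USER" -d "$PG_DB" \
    -c "Copy (Select * From ${table}) To STDOUT With CSV HEADER DELIMITER '|';" \
    > "$outfile"
}

# Step 2: load the file into MySQL. mysqlimport derives the target table
# from the file name, so foo_table.csv is loaded into foo_table.
import_table() {
  local infile="$1" columns="$2"   # columns: comma-separated, e.g. "c1,c2"
  mysqlimport --local --compress \
    -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" -h "$MYSQL_HOST" \
    --fields-terminated-by='|' --ignore-lines=1 \
    --columns="$columns" \
    "$MYSQL_DATABASE" "$infile"
}

# Run both steps; stop if the export fails so a partial file is never loaded.
transfer_table() {
  local table="$1" columns="$2"
  local file="${table}.csv"
  export_table "$table" "$file" || return 1
  import_table "$file" "$columns"
}
```

After sourcing the file and setting the variables, a call such as transfer_table foo_table "c1,c2" would export foo_table to foo_table.csv and then import it into the MySQL table of the same name.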
The same approach also works in the other direction, from MySQL to Redshift. The only change is that you may need to push the MySQL-exported CSVs to S3, or enable SSH, so that the COPY command can be used in the psql scripts.
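A rough sketch of the S3 route for that reverse direction might look like the following. Everything specific here is an assumption on my part rather than something we ran: the AWS CLI being installed and configured, the S3_BUCKET, RS_* and RS_IAM_ROLE variables, the function names, and the choice of tab-separated output from mysql --batch. Values containing NULLs, tabs or newlines would need extra handling.

```shell
#!/usr/bin/env bash
# Sketch only: bucket, IAM role and variable names are hypothetical placeholders.

# Step 1: export a MySQL table as tab-separated text (header row included).
export_mysql_table() {
  local table="$1" outfile="$2"
  mysql --batch -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" -h "$MYSQL_HOST" \
    -e "SELECT * FROM ${table}" "$MYSQL_DATABASE" > "$outfile"
}

# Step 2: upload the file to S3 (assumes the AWS CLI is installed and configured).
upload_to_s3() {
  local file="$1"
  aws s3 cp "$file" "s3://${S3_BUCKET}/${file}"
}

# Step 3: have Redshift load the file from S3 with COPY, issued through psql.
copy_into_redshift() {
  local table="$1" file="$2"
  psql -h "$RS_HOST" -p "${RS_PORT:-5439}" -U "$RS_USER" -d "$RS_DB" \
    -c "COPY ${table} FROM 's3://${S3_BUCKET}/${file}' IAM_ROLE '${RS_IAM_ROLE}' DELIMITER '\t' IGNOREHEADER 1;"
}

# Chain the three steps; each one runs only if the previous one succeeded.
mysql_to_redshift() {
  local table="$1" file="${1}.tsv"
  export_mysql_table "$table" "$file" &&
    upload_to_s3 "$file" &&
    copy_into_redshift "$table" "$file"
}
```

With the variables set, mysql_to_redshift foo_table would export foo_table to foo_table.tsv, upload it, and ask Redshift to load it into a table of the same name, which must already exist there.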