Amazon Redshift to Mysql using Pentaho Data Integration

栀梦 2021-01-27 23:49

We are using Amazon Redshift, whose database engine is based on PostgreSQL. The data sits in the Amazon cloud. We need to load data from Amazon Redshift into MySQL using the Pentaho Data Integration software.

2 Answers
  •  星月不相逢
    2021-01-27 23:59

    We solved exactly the same problem in my current project, where we needed to aggregate a large data set from Redshift and import the aggregated data into MySQL for dashboard reports. If you have already decided to use the Pentaho tools, well and good; it is a really nice toolset. We took an alternate approach, however, because our data set was large and the speed we got with Kettle/Spoon did not meet our benchmarks and business needs.

    I'm listing the solution here so that it may be helpful to someone else.

    1. Fire a psql SELECT command and redirect the result set into a CSV/TXT file.

      psql -U $User -d $db_name -c "Copy (Select * From foo_table LIMIT 10) To STDOUT With CSV HEADER DELIMITER '|';" > foo_data.csv
      
    2. Use the mysqlimport utility to import the data into MySQL. Note that mysqlimport derives the target table name from the file name, so the file must be named after the table.

      mysqlimport --local --compress -u $MYSQL_USER -p$MYSQL_PASSWORD -h $MYSQL_HOST $MYSQL_DATABASE --fields-terminated-by='|' --ignore-lines=1 --columns='C1,C2,C3,..,C4' TABLE_NAME.csv
      

      With the above approach, we achieved roughly 100 times faster results.
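    The two steps above can be stitched into a single shell script. This is a minimal sketch under assumptions: every credential, host, and the table name `foo_table` are placeholders, not values from the original post, and by default the script only prints the commands for review (set `DRY_RUN=0` to actually run them):

    ```shell
    #!/bin/sh
    # Sketch of the Redshift -> MySQL pipeline above. All credentials,
    # hosts, and the table name "foo_table" are placeholder assumptions.
    PG_USER="redshift_user"
    PG_DB="analytics"
    MYSQL_USER="mysql_user"
    MYSQL_PASSWORD="secret"
    MYSQL_HOST="localhost"
    MYSQL_DATABASE="reports"
    # mysqlimport infers the target table ("foo_table") from this file name
    EXPORT_FILE="foo_table.csv"

    # Step 1: export from Redshift as pipe-delimited CSV with a header row
    EXPORT_CMD="psql -U $PG_USER -d $PG_DB -c \"COPY (SELECT * FROM foo_table) TO STDOUT WITH CSV HEADER DELIMITER '|';\""

    # Step 2: bulk-load into MySQL; --ignore-lines=1 skips the header row
    IMPORT_CMD="mysqlimport --local --compress -u $MYSQL_USER -p$MYSQL_PASSWORD -h $MYSQL_HOST $MYSQL_DATABASE --fields-terminated-by='|' --ignore-lines=1 $EXPORT_FILE"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Review mode (default): print the commands instead of running them
        echo "$EXPORT_CMD > $EXPORT_FILE"
        echo "$IMPORT_CMD"
    else
        eval "$EXPORT_CMD" > "$EXPORT_FILE"
        eval "$IMPORT_CMD"
    fi
    ```

    Building the commands as strings first keeps the export and import halves easy to inspect and to wrap in error handling before running them against real databases.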

    Using the same approach, MySQL to Redshift is doable too; the only change is that you may need to push the MySQL-exported CSVs to S3 (or enable SSH) so that the COPY command in your psql scripts can reach them.
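    The reverse direction can be sketched the same way: export from MySQL, push the file to S3, then let Redshift's COPY load it straight from the bucket. The bucket name, IAM role ARN, credentials, and the table name `bar_table` below are all assumptions for illustration, and as above the script only prints the commands unless `DRY_RUN=0`:

    ```shell
    #!/bin/sh
    # Sketch of the reverse (MySQL -> Redshift) flow mentioned above.
    # Bucket, IAM role ARN, credentials, and table name are placeholders.
    MYSQL_USER="mysql_user"
    MYSQL_PASSWORD="secret"
    MYSQL_HOST="localhost"
    MYSQL_DATABASE="reports"
    PG_USER="redshift_user"
    PG_DB="analytics"
    EXPORT_FILE="bar_table.csv"
    S3_URI="s3://my-etl-bucket/exports/$EXPORT_FILE"
    IAM_ROLE_ARN="arn:aws:iam::123456789012:role/RedshiftCopyRole"

    # 1. Export from MySQL; --batch output is tab-separated, so translate
    #    tabs to the pipe delimiter used throughout this thread
    DUMP_CMD="mysql --batch -u $MYSQL_USER -p$MYSQL_PASSWORD -h $MYSQL_HOST $MYSQL_DATABASE -e 'SELECT * FROM bar_table' | tr '\t' '|' > $EXPORT_FILE"

    # 2. Push the file to S3 so Redshift's COPY can reach it
    UPLOAD_CMD="aws s3 cp $EXPORT_FILE $S3_URI"

    # 3. Load into Redshift; IGNOREHEADER 1 skips the header row
    COPY_CMD="psql -U $PG_USER -d $PG_DB -c \"COPY bar_table FROM '$S3_URI' IAM_ROLE '$IAM_ROLE_ARN' DELIMITER '|' IGNOREHEADER 1;\""

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Review mode (default): print the three commands
        printf '%s\n' "$DUMP_CMD" "$UPLOAD_CMD" "$COPY_CMD"
    else
        eval "$DUMP_CMD" && eval "$UPLOAD_CMD" && eval "$COPY_CMD"
    fi
    ```

    Granting Redshift an IAM role is the usual way to authorize COPY against S3; the alternative SSH-based COPY the answer mentions needs a manifest file and an SSH-reachable host instead.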
