amazon-data-pipeline

Automatic AWS DynamoDB to S3 export failing with “role/DataPipelineDefaultRole is invalid”

一曲冷凌霜 submitted on 2019-12-05 02:12:26
Precisely following the step-by-step instructions on this page, I am trying to export the contents of one of my DynamoDB tables to an S3 bucket. I create a pipeline exactly as instructed, but it fails to run. It seems to have trouble identifying/running an EC2 resource to do the export. When I access EMR through the AWS Console, I see entries like this: Cluster: df-0..._@EmrClusterForBackup_2015-03-06T00:33:04, Terminated with errors: EMR service role arn:aws:iam::...:role/DataPipelineDefaultRole is invalid. Why am I getting this message? Do I need to set up/configure something else for the pipeline to…
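
A frequent cause of this "role is invalid" error is that DataPipelineDefaultRole's trust relationship does not allow the EMR service to assume it. A minimal sketch of the trust policy the role typically needs, assuming the default role names from the tutorial:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "datapipeline.amazonaws.com",
              "elasticmapreduce.amazonaws.com"
            ]
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

If the trust policy checks out, also verify that the role has the AWSDataPipelineRole managed policy attached; both are preconditions that the console's one-click role creation normally handles.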

Run a python script via AWS Data Pipelines

半腔热情 submitted on 2019-12-02 03:41:34
I use AWS Data Pipelines to run nightly SQL queries that populate tables for summary statistics. The UI's a bit funky, but eventually I got it up and working. Now I'd like to do something similar with a Python script. I have a file that I run every morning on my laptop (forecast_rev.py), but of course that means I have to turn on my laptop and kick this off every day. Surely I can schedule a pipeline to do the same thing, and thus go away on vacation and not care. For the life of me, I can't find a tutorial, AWS doc, or StackOverflow post about this! I'm not even sure how to get started. Does…
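
One way to do this, sketched below on the assumption that the script lives in S3 and that an Ec2Resource named Ec2Instance is defined elsewhere in the pipeline, is a ShellCommandActivity that pulls the file down and runs it (bucket and paths here are placeholders):

    {
      "id": "RunForecastRev",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "Ec2Instance" },
      "schedule": { "ref": "DefaultSchedule" },
      "command": "aws s3 cp s3://my-bucket/scripts/forecast_rev.py /tmp/forecast_rev.py && python /tmp/forecast_rev.py"
    }

ShellCommandActivity also accepts a scriptUri pointing at a shell script in S3, which can in turn install dependencies before invoking Python.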

Need strategy advice for migrating large tables from RDS to DynamoDB

让人想犯罪 __ submitted on 2019-12-01 06:44:41
Question: We have a couple of MySQL tables in RDS that are huge (over 700 GB) that we'd like to migrate to a DynamoDB table. Can you suggest a strategy, or a direction, to do this in a clean, parallelized way? Perhaps using EMR or AWS Data Pipeline. Answer 1: You can use AWS Data Pipeline. There are two basic templates, one for moving RDS tables to S3 and a second for importing data from S3 to DynamoDB. You can create your own pipeline using both templates. Regards. Answer 2: One thing to consider with such…
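
To make Answer 1 concrete, here is a rough skeleton (object names hypothetical, details trimmed) of how the two template stages could be chained in a single definition so that the DynamoDB import only starts after the RDS dump finishes:

    {
      "objects": [
        {
          "id": "RdsToS3Copy",
          "type": "CopyActivity",
          "input": { "ref": "SourceRdsTable" },
          "output": { "ref": "S3StagingDir" }
        },
        {
          "id": "S3ToDynamoImport",
          "type": "EmrActivity",
          "dependsOn": { "ref": "RdsToS3Copy" },
          "runsOn": { "ref": "ImportEmrCluster" },
          "step": "…as generated by the stock import-DynamoDB-data-from-S3 template…"
        }
      ]
    }

For a 700 GB source, parallelism would come from splitting the copy across several activities (for example, one per key range) and from raising the DynamoDB table's write throughput during the EMR import.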

Amazon Data Pipeline: How to use a script argument in a SqlActivity?

谁都会走 submitted on 2019-11-30 13:22:28
Question: When trying to use a script argument in the SqlActivity:

    {
      "id" : "ActivityId_3zboU",
      "schedule" : { "ref" : "DefaultSchedule" },
      "scriptUri" : "s3://location_of_script/unload.sql",
      "name" : "unload",
      "runsOn" : { "ref" : "Ec2Instance" },
      "scriptArgument" : [
        "'s3://location_of_unload/#format(minusDays(@scheduledStartTime,1),'YYYY/MM/dd/hhmm/')}'",
        "'aws_access_key_id=????;aws_secret_access_key=*******'"
      ],
      "type" : "SqlActivity",
      "dependsOn" : { "ref" : "ActivityId_YY69k" },
      "database" : { …
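
For what it's worth, the first scriptArgument in the excerpt has a stray closing brace and is missing the opening brace of the pipeline expression syntax, which is #{...}; assuming the goal is yesterday's date path, it would presumably read:

    "scriptArgument" : [
      "'s3://location_of_unload/#{format(minusDays(@scheduledStartTime,1),'YYYY/MM/dd/hhmm/')}'",
      "'aws_access_key_id=????;aws_secret_access_key=*******'"
    ]

format and minusDays are built-in Data Pipeline expression functions; the credential argument is left elided as in the original.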

How to upgrade Data Pipeline definition from EMR 3.x to 4.x/5.x?

无人久伴 submitted on 2019-11-29 11:08:32
I would like to upgrade my AWS Data Pipeline definition to EMR 4.x or 5.x so I can take advantage of Hive's latest features (version 2.0+), such as CURRENT_DATE and CURRENT_TIMESTAMP. The change from EMR 3.x to 4.x/5.x requires the use of releaseLabel in EmrCluster rather than amiVersion. When I use "releaseLabel": "emr-4.1.0", I get the following error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Below is my data pipeline definition for EMR 3.x. It works well, so I hope others find this useful (including the answer for EMR 4.x/5.x), as the…
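
The TezTask failure is consistent with Hive defaulting to the Tez engine on newer EMR releases. One workaround that has been reported, sketched here as an assumption rather than a confirmed fix, is to pin Hive back to MapReduce through an EmrConfiguration object, which the releaseLabel-style EmrCluster supports:

    {
      "id": "EmrClusterForBackup",
      "type": "EmrCluster",
      "releaseLabel": "emr-4.1.0",
      "configuration": { "ref": "HiveSiteConfiguration" }
    },
    {
      "id": "HiveSiteConfiguration",
      "type": "EmrConfiguration",
      "classification": "hive-site",
      "property": [ { "ref": "HiveExecEngine" } ]
    },
    {
      "id": "HiveExecEngine",
      "type": "Property",
      "key": "hive.execution.engine",
      "value": "mr"
    }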

How to pipe data from AWS Postgres RDS to S3 (then Redshift)?

假如想象 submitted on 2019-11-29 03:11:50
Question: I'm using the AWS Data Pipeline service to pipe data from an RDS MySQL database to S3 and then on to Redshift, which works nicely. However, I also have data living in an RDS Postgres instance which I would like to pipe the same way, but I'm having a hard time setting up the JDBC connection. If this is unsupported, is there a work-around? "connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB" Answer 1: This doesn't work yet. AWS hasn't built/released the functionality to connect nicely to…
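
Once generic JDBC support is available, a Postgres source can be declared with a JdbcDatabase object instead of RdsDatabase, supplying the driver class explicitly. A minimal sketch, with credentials and hostnames as placeholders:

    {
      "id": "PostgresRdsDatabase",
      "type": "JdbcDatabase",
      "connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB",
      "jdbcDriverClass": "org.postgresql.Driver",
      "username": "db_user",
      "*password": "db_password"
    }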

Exporting an AWS Postgres RDS Table to AWS S3

对着背影说爱祢 submitted on 2019-11-28 13:44:19
I wanted to use AWS Data Pipeline to pipe data from a Postgres RDS to AWS S3. Does anybody know how this is done? More precisely, I wanted to export a Postgres table to AWS S3 using Data Pipeline. The reason I am using Data Pipeline is that I want to automate this process, and this export is going to run once every week. Any other suggestions will also work. There is a sample on GitHub: https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RDStoS3. Here is the code: https://github.com/awslabs/data-pipeline-samples/blob/master/samples/RDStoS3/RDStoS3Pipeline.json. I built a Pipeline from…
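
Short of the full sample, a lighter-weight approach is a ShellCommandActivity that streams a psql \copy straight into S3. This sketch assumes psql and the AWS CLI are available on the pipeline's EC2 resource and that credentials come from a .pgpass file; table, host, and bucket names are placeholders:

    {
      "id": "ExportPostgresTable",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "Ec2Instance" },
      "schedule": { "ref": "DefaultSchedule" },
      "command": "psql \"host=THE_RDS_INSTANCE dbname=THE_DB user=THE_USER\" -c \"\\copy my_table TO STDOUT WITH CSV HEADER\" | aws s3 cp - s3://my-bucket/exports/my_table.csv"
    }

Piping to "aws s3 cp - s3://…" streams the dump without staging it on local disk, which matters when the weekly export is larger than the instance's storage.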