amazon-data-pipeline

How to transfer a file/files from one S3 bucket/directory to another using AWS Data Pipeline

Posted by 爱⌒轻易说出口 on 2019-12-13 18:07:48
Question: I would like to transfer a file (i.e., copy it to a target directory and delete it from the source directory) from one S3 bucket/directory to another using AWS Data Pipeline. I tried using the ShellCommandActivity and wrote a script that would move a file/files from one S3 bucket/directory to another, but the result was that it only copied the file to the target S3 bucket/directory and did not remove it from the S3 source directory. Thanks in advance! Answer 1: If you want to remove something from an
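A minimal sketch of the move-then-delete step the question describes, as the kind of script a ShellCommandActivity could call with boto3; the bucket and key names are placeholders, and it assumes the pipeline's resource role may read and delete objects in the source bucket.

```python
import boto3

s3 = boto3.client("s3")

def move_object(src_bucket, src_key, dst_bucket, dst_key):
    # Copy first, then delete the original only if the copy succeeded.
    s3.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )
    s3.delete_object(Bucket=src_bucket, Key=src_key)

# Placeholder bucket/key names for illustration only.
move_object("my-source-bucket", "incoming/data.csv",
            "my-target-bucket", "archive/data.csv")
```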

AWS Data Pipelines with a Heroku Database

Posted by 纵饮孤独 on 2019-12-13 03:45:56
Question: I'm wondering about the feasibility of connecting an AWS Data Pipeline to a Heroku database. The Heroku databases are stored on EC2 instances (east region) and require SSL. I've tried to open a connection using a JdbcDatabase object, but have run into issues at every turn. I've tried the following: { "id" : "heroku_database", "name" : "heroku_database", "type" : "JdbcDatabase", "jdbcDriverClass" : "org.postgresql.Driver", "connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#
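For reference, a hypothetical, completed JdbcDatabase object sketched as a Python dict (the truncated JSON above is left exactly as posted). The #{...} parameter names and the SSL-related jdbcProperties are assumptions based on how Heroku Postgres JDBC connections are usually configured, not a confirmed working definition.

```python
# Hypothetical pipeline object, written as a Python dict for readability.
# Field names follow the Data Pipeline JdbcDatabase object; parameter names
# and the SSL properties below are illustrative assumptions.
heroku_database = {
    "id": "heroku_database",
    "name": "heroku_database",
    "type": "JdbcDatabase",
    "jdbcDriverClass": "org.postgresql.Driver",
    "connectionString": "jdbc:postgresql://#{myHerokuHost}:#{myHerokuPort}/#{myHerokuDbName}",
    # Heroku Postgres enforces SSL; the non-validating SSL factory is a common
    # workaround for the Postgres JDBC driver when cert validation fails.
    "jdbcProperties": ["ssl=true", "sslfactory=org.postgresql.ssl.NonValidatingFactory"],
    "username": "#{myHerokuUsername}",
    "*password": "#{*myHerokuPassword}",
}
```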

How do I call a stored procedure in SQL Server from a ShellCommandActivity (AWS Data Pipeline)?

Posted by 烂漫一生 on 2019-12-12 07:03:42
Question: I know you can call a MySQL procedure with the script below, but is the same possible for SQL Server? mysql --host host_url --port port_number --user username --password password --execute="CALL stored_proc_name;" I have SQL Server Express and need to set up a procedure to run daily. It's on RDS, and SQL Server Express doesn't have a task scheduler. Answer 1: The following should work: Download the SQL Server JDBC Driver. Choose to download the tar.gz file and unzip it. Among the extracted
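The truncated answer takes the JDBC-driver route. As an alternative sketch (not the answer's method), here is roughly what the script invoked by the ShellCommandActivity could do with pyodbc, assuming the resource has pyodbc and the Microsoft ODBC driver installed; the host, database, credentials, and procedure name are placeholders.

```python
import pyodbc

# Placeholder connection details for a SQL Server Express instance on RDS.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-instance.example.us-east-1.rds.amazonaws.com,1433;"
    "DATABASE=mydb;UID=admin;PWD=my-password"
)
conn.autocommit = True  # let the procedure manage its own transactions

cursor = conn.cursor()
cursor.execute("EXEC dbo.stored_proc_name")  # T-SQL syntax for invoking a proc
conn.close()
```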

How to fail a ShellCommandActivity on AWS Data Pipeline

Posted by 只谈情不闲聊 on 2019-12-11 15:48:14
Question: I am using AWS Data Pipeline, specifically the ShellCommandActivity object. This object calls a Python script which extracts data via FTP, etc. There is a chance that the file may not exist, or may be of the wrong type, in which case the pipeline can no longer run. If the Python script errors out or fails, the ShellCommandActivity object still gets marked as Finished, when I'm trying to get it marked as Failed. I have tried, in Python, doing sys.exit(500) on failure, but the object still gets
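A minimal sketch of how the called script can surface failure: return a nonzero status from the top-level script, and make sure the shell command the activity runs doesn't swallow that status (e.g., a trailing pipe or a wrapper that always exits 0 would hide it). The function names below are placeholders for the FTP/extract logic in the question.

```python
import sys

def fetch_file_via_ftp():
    # Placeholder for the real FTP/extract logic from the question.
    raise FileNotFoundError("expected file is missing on the FTP server")

def main():
    try:
        fetch_file_via_ftp()
    except Exception as exc:
        # Log and return nonzero so the shell command itself exits with failure.
        print(f"extract failed: {exc}", file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```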

Security-Configuration Field For AWS Data Pipeline EmrCluster

Posted by 不羁岁月 on 2019-12-11 06:24:28
Question: I created an AWS EMR cluster through the regular EMR cluster wizard on the AWS Management Console, and I was able to select a security configuration; e.g., when you export the CLI command it's --security-configuration 'mySecurityConfigurationValue'. I now need to create a similar EMR cluster through AWS Data Pipeline, but I don't see any option where I can specify this security-configuration field. The only similar fields I see are EmrManagedSlaveSecurityGroup, EmrManagedMasterSecurityGroup,

How does AWS Data Pipeline run an EC2 instance?

Posted by ぐ巨炮叔叔 on 2019-12-11 02:23:46
Question: I have an AWS Data Pipeline built and keep getting warnings that an EC2 resource's TerminateAfter field is missing. My pipeline is designed to use the same instance various times throughout the entire process, which is to run every hour (I haven't run the pipeline yet). So if I set the TerminateAfter field to 3 minutes, I'm wondering whether the EC2 instance is terminated 3 minutes after every time it is spun up. Or is the EC2 instance terminated 3 minutes after the last time it is used in

AWS EMR Spark: Error: Cannot load main class from JAR

Posted by ▼魔方 西西 on 2019-12-10 20:44:07
Question: I am trying to submit a Spark job to an AWS EMR cluster using the AWS console, but it fails with: Cannot load main class from JAR. The job runs successfully when I specify the main class via --class in the Arguments option in the AWS EMR Console -> Add Step. On the local machine, the job works perfectly fine when no main class is specified, as below: ./spark-submit /home/astro/spark-programs/SpotEMR/MyJob.jar I have set the main class on the jar using the run configuration. The main reason to avoid passing main class
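One way around the console behaviour is to add the step programmatically and pass --class explicitly; below is a rough boto3 sketch using the usual command-runner.jar/spark-submit step form. The cluster id, main class, and jar path are placeholders, not values from the question.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Placeholder cluster id, main class, and jar location.
emr.add_job_flow_steps(
    JobFlowId="j-PLACEHOLDER",
    Steps=[{
        "Name": "Spark job",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "--class", "com.example.MyJobMain",   # explicit main class
                "s3://my-bucket/jars/MyJob.jar",
            ],
        },
    }],
)
```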

Automating Hive Activity using AWS

Posted by 旧城冷巷雨未停 on 2019-12-06 06:59:13
Question: I would like to automate my Hive script to run every day, and one option I have for that is Data Pipeline. The problem is that I am exporting data from DynamoDB to S3 and manipulating that data with a Hive script. I give the input and output in the Hive script, and that's where the problem starts, because a HiveActivity has to have an input and output, but I have to give them in the script file. I am trying to find a way to automate this Hive script and am waiting for some ideas.
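One commonly used workaround (named here as an alternative, not necessarily what the asker settled on) is to skip HiveActivity's required data nodes and run the script from a ShellCommandActivity on the EMR resource, since the script already handles its own input and output paths. A rough sketch of that activity as a Python dict; the object ids, S3 path, and schedule are placeholders, and the referenced EMR cluster and schedule objects would be defined elsewhere in the pipeline.

```python
# Hypothetical pipeline activity expressed as a Python dict.
run_hive_script = {
    "id": "RunHiveScript",
    "name": "RunHiveScript",
    "type": "ShellCommandActivity",
    "runsOn": {"ref": "MyEmrCluster"},       # EmrCluster resource defined elsewhere
    "schedule": {"ref": "DailySchedule"},    # runs once a day
    # The Hive script reads its own DynamoDB-export input from S3 and writes its
    # own output, so no input/output data nodes are attached to the activity.
    "command": "hive -f s3://my-bucket/scripts/transform.q",
}
```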

Using a custom AMI (with s3cmd) in a Data Pipeline

Posted by 柔情痞子 on 2019-12-06 03:27:31
Question: How can I install s3cmd on an AMI that is used in the pipeline? This should be a fairly basic thing to do, but I can't seem to get it done. Here's what I've tried:
- Started a Pipeline without the Image-id option => everything works fine
- Navigated to EC2 and created an Image of the running Instance, to make sure all the needed stuff to run in the pipeline is installed on my custom AMI
- Started this AMI manually as an Instance
- SSH'd into the machine and installed s3cmd
- Created another Image of the machine, this time with s3cmd installed
- Shut down the Instance
- Started the Pipeline again, this time
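For context, a hypothetical Ec2Resource definition (as a Python dict) pointing at the custom AMI; imageId is the standard field for this, while the AMI id, instance type, roles, and termination window below are placeholders. Note that an AMI is region-specific, so the image must exist in the region the pipeline launches instances in and must work with the instance type the resource requests.

```python
# Hypothetical Ec2Resource object for the pipeline, expressed as a Python dict.
custom_ami_resource = {
    "id": "CustomAmiEc2Resource",
    "name": "CustomAmiEc2Resource",
    "type": "Ec2Resource",
    "imageId": "ami-0123456789abcdef0",   # the image baked with s3cmd installed
    "instanceType": "m1.small",           # must be compatible with the AMI
    "terminateAfter": "2 Hours",
    "role": "DataPipelineDefaultRole",
    "resourceRole": "DataPipelineDefaultResourceRole",
}
```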