amazon-data-pipeline

How to transfer a file/files from one S3 bucket/directory to another using AWS Data Pipeline

Posted by 爱⌒轻易说出口 on 2019-12-13 18:07:48
Question: I would like to transfer a file (i.e., copy it to a target directory and delete it from the source directory) from one S3 bucket/directory to another using AWS Data Pipeline. I tried using the ShellCommandActivity and wrote a script that would move a file/files from one S3 bucket/directory to another, but the result was that it only copied the file to the target S3 bucket/directory and did not remove it from the S3 source directory. Thanks in advance! Answer 1: If you want to remove something from an
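A minimal sketch of the move-then-delete step the question describes, as the kind of script a ShellCommandActivity could call with boto3; the bucket and key names are placeholders, and it assumes the pipeline's resource role may read and delete objects in the source bucket.

```python
import boto3

s3 = boto3.client("s3")

def move_object(src_bucket, src_key, dst_bucket, dst_key):
    # Copy first, then delete the original only if the copy succeeded.
    s3.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )
    s3.delete_object(Bucket=src_bucket, Key=src_key)

# Placeholder bucket/key names for illustration only.
move_object("my-source-bucket", "incoming/data.csv",
            "my-target-bucket", "archive/data.csv")
```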

AWS Data Pipelines with a Heroku Database

Posted by 纵饮孤独 on 2019-12-13 03:45:56
Question: I'm wondering about the feasibility of connecting an AWS Data Pipeline to a Heroku database. The Heroku databases are stored on EC2 instances (east region) and require SSL. I've tried to open a connection using a JdbcDatabase object, but have run into issues at every turn. I've tried the following: { "id" : "heroku_database", "name" : "heroku_database", "type" : "JdbcDatabase", "jdbcDriverClass" : "org.postgresql.Driver", "connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#
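For reference, a hypothetical, completed JdbcDatabase object sketched as a Python dict (the truncated JSON above is left exactly as posted). The #{...} parameter names and the SSL-related jdbcProperties are assumptions based on how Heroku Postgres JDBC connections are usually configured, not a confirmed working definition.

```python
# Hypothetical pipeline object, written as a Python dict for readability.
# Field names follow the Data Pipeline JdbcDatabase object; parameter names
# and the SSL properties below are illustrative assumptions.
heroku_database = {
    "id": "heroku_database",
    "name": "heroku_database",
    "type": "JdbcDatabase",
    "jdbcDriverClass": "org.postgresql.Driver",
    "connectionString": "jdbc:postgresql://#{myHerokuHost}:#{myHerokuPort}/#{myHerokuDbName}",
    # Heroku Postgres enforces SSL; the non-validating SSL factory is a common
    # workaround for the Postgres JDBC driver when cert validation fails.
    "jdbcProperties": ["ssl=true", "sslfactory=org.postgresql.ssl.NonValidatingFactory"],
    "username": "#{myHerokuUsername}",
    "*password": "#{*myHerokuPassword}",
}
```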

How do I call a stored procedure in SQL Server from a ShellCommandActivity (AWS Data Pipeline)?

Posted by 烂漫一生 on 2019-12-12 07:03:42
Question: I know you can call a MySQL procedure with the script below, but is the same possible for SQL Server? mysql --host host_url --port port_number --user username --password password --execute="CALL stored_proc_name;" I have SQL Server Express and need to set up a procedure to run daily. It's on RDS, and SQL Server Express doesn't have a task scheduler. Answer 1: The following should work: Download the SQL Server JDBC Driver. Choose to download the tar.gz file and unzip it. Among the extracted
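The truncated answer takes the JDBC-driver route. As an alternative sketch (not the answer's method), here is roughly what the script invoked by the ShellCommandActivity could do with pyodbc, assuming the resource has pyodbc and the Microsoft ODBC driver installed; the host, database, credentials, and procedure name are placeholders.

```python
import pyodbc

# Placeholder connection details for a SQL Server Express instance on RDS.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-instance.example.us-east-1.rds.amazonaws.com,1433;"
    "DATABASE=mydb;UID=admin;PWD=my-password"
)
conn.autocommit = True  # let the procedure manage its own transactions

cursor = conn.cursor()
cursor.execute("EXEC dbo.stored_proc_name")  # T-SQL syntax for invoking a proc
conn.close()
```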

How to fail a ShellCommandActivity on AWS Data Pipeline

Posted by 只谈情不闲聊 on 2019-12-11 15:48:14
Question: I am using AWS Data Pipeline, specifically the ShellCommandActivity object. This object calls a Python script which extracts data via FTP, etc. There is a chance that the file may not exist, or may be of the wrong type, in which case the pipeline can no longer run. If the Python script errors out or fails, the ShellCommandActivity object still gets marked as Finished, when I'm trying to get it marked as Failed. I have tried, in Python, doing sys.exit(500) on failure, but the object still gets
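A minimal sketch of how the called script can surface failure: return a nonzero status from the top-level script, and make sure the shell command the activity runs doesn't swallow that status (e.g., a trailing pipe or a wrapper that always exits 0 would hide it). The function names below are placeholders for the FTP/extract logic in the question.

```python
import sys

def fetch_file_via_ftp():
    # Placeholder for the real FTP/extract logic from the question.
    raise FileNotFoundError("expected file is missing on the FTP server")

def main():
    try:
        fetch_file_via_ftp()
    except Exception as exc:
        # Log and return nonzero so the shell command itself exits with failure.
        print(f"extract failed: {exc}", file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```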

Security-Configuration Field For AWS Data Pipeline EmrCluster

Posted by 不羁岁月 on 2019-12-11 06:24:28
Question: I created an AWS EMR cluster through the regular EMR cluster wizard on the AWS Management Console, and I was able to select a security configuration; e.g., when you export the CLI command it's --security-configuration 'mySecurityConfigurationValue'. I now need to create a similar EMR cluster through AWS Data Pipeline, but I don't see any option where I can specify this security-configuration field. The only similar fields I see are EmrManagedSlaveSecurityGroup, EmrManagedMasterSecurityGroup,

How does AWS Data Pipeline run an EC2 instance?

Posted by ぐ巨炮叔叔 on 2019-12-11 02:23:46
Question: I have an AWS Data Pipeline built and keep getting warnings that an EC2 resource's TerminateAfter field is missing. My pipeline is designed to use the same instance various times throughout the entire process, which is to run every hour (I haven't run the pipeline yet). So if I set the TerminateAfter field to 3 minutes, I'm wondering whether the EC2 instance is terminated 3 minutes after every time it is spun up. Or is the EC2 instance terminated 3 minutes after the last time it is used in

AWS EMR Spark: Error: Cannot load main class from JAR

Posted by ▼魔方 西西 on 2019-12-10 20:44:07
Question: I am trying to submit a Spark job to an AWS EMR cluster using the AWS console, but it fails with: Cannot load main class from JAR. The job runs successfully when I specify the main class via --class in the Arguments option in the AWS EMR Console -> Add Step. On the local machine, the job works perfectly fine when no main class is specified, as below: ./spark-submit /home/astro/spark-programs/SpotEMR/MyJob.jar I have set the main class on the jar using the run configuration. The main reason to avoid passing main class
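One way around the console behaviour is to add the step programmatically and pass --class explicitly; below is a rough boto3 sketch using the usual command-runner.jar/spark-submit step form. The cluster id, main class, and jar path are placeholders, not values from the question.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Placeholder cluster id, main class, and jar location.
emr.add_job_flow_steps(
    JobFlowId="j-PLACEHOLDER",
    Steps=[{
        "Name": "Spark job",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "--class", "com.example.MyJobMain",   # explicit main class
                "s3://my-bucket/jars/MyJob.jar",
            ],
        },
    }],
)
```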

Automating Hive Activity using AWS

Posted by 旧城冷巷雨未停 on 2019-12-06 06:59:13
Question: I would like to automate my Hive script to run every day, and one option I have for that is Data Pipeline. The problem is that I am exporting data from DynamoDB to S3 and manipulating that data with a Hive script. I give the input and output in the Hive script, and that's where the problem starts, because a HiveActivity has to have an input and output, but I have to give them in the script file. I am trying to find a way to automate this Hive script and am waiting for some ideas.
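One commonly used workaround (named here as an alternative, not necessarily what the asker settled on) is to skip HiveActivity's required data nodes and run the script from a ShellCommandActivity on the EMR resource, since the script already handles its own input and output paths. A rough sketch of that activity as a Python dict; the object ids, S3 path, and schedule are placeholders, and the referenced EMR cluster and schedule objects would be defined elsewhere in the pipeline.

```python
# Hypothetical pipeline activity expressed as a Python dict.
run_hive_script = {
    "id": "RunHiveScript",
    "name": "RunHiveScript",
    "type": "ShellCommandActivity",
    "runsOn": {"ref": "MyEmrCluster"},       # EmrCluster resource defined elsewhere
    "schedule": {"ref": "DailySchedule"},    # runs once a day
    # The Hive script reads its own DynamoDB-export input from S3 and writes its
    # own output, so no input/output data nodes are attached to the activity.
    "command": "hive -f s3://my-bucket/scripts/transform.q",
}
```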

Using a custom AMI (with s3cmd) in a Data Pipeline

Posted by 柔情痞子 on 2019-12-06 03:27:31
Question: How can I install s3cmd on an AMI that is used in the pipeline? This should be a fairly basic thing to do, but I can't seem to get it done. Here's what I've tried:
- Started a Pipeline without the Image-id option => everything works fine
- Navigated to EC2 and created an Image of the running Instance, to make sure all the needed stuff to run in the pipeline is installed on my custom AMI
- Started this AMI manually as an Instance
- SSH'd into the machine and installed s3cmd
- Created another Image of the machine, this time with s3cmd installed
- Shut down the Instance
- Started the Pipeline again, this time
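For context, a hypothetical Ec2Resource definition (as a Python dict) pointing at the custom AMI; imageId is the standard field for this, while the AMI id, instance type, roles, and termination window below are placeholders. Note that an AMI is region-specific, so the image must exist in the region the pipeline launches instances in and must work with the instance type the resource requests.

```python
# Hypothetical Ec2Resource object for the pipeline, expressed as a Python dict.
custom_ami_resource = {
    "id": "CustomAmiEc2Resource",
    "name": "CustomAmiEc2Resource",
    "type": "Ec2Resource",
    "imageId": "ami-0123456789abcdef0",   # the image baked with s3cmd installed
    "instanceType": "m1.small",           # must be compatible with the AMI
    "terminateAfter": "2 Hours",
    "role": "DataPipelineDefaultRole",
    "resourceRole": "DataPipelineDefaultResourceRole",
}
```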