Run a python script via AWS Data Pipeline

一整个雨季 2021-01-22 00:33

I use AWS Data Pipeline to run nightly SQL queries that populate tables of summary statistics. The UI is a bit funky, but I eventually got it up and working.

Now I'd like to run a python script via the pipeline as well.

2 Answers
  •  伪装坚强ぢ
    2021-01-22 01:22

    I faced a similar situation; here is how I overcame it.
    I am going to describe how I did it with Ec2Resource. If you are looking for a solution with an EMRCluster, refer to @franklinsijo's answer.

    Steps

    1. Store your python script in S3.
    2. Create a shell script (hello.sh, given below) and store it in S3 as well.
    3. Create an Ec2Resource node and a ShellCommandActivity node, and wire them up as follows.

    • Provide the shell script's S3 URL in "Script Uri" and set "stage" to true on the ShellCommandActivity; it should then run on your default resource. (A minimal end-to-end sketch of these steps is given right below.)
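
    As a point of reference, here is a minimal sketch of what steps 1-3 can look like from the command line. The bucket, file names, pipeline name, instance type, and IAM roles here are my assumptions, not something the steps prescribe, so substitute your own.

    pipeline.json (the standard definition format accepted by aws datapipeline put-pipeline-definition)

    {
      "objects": [
        {
          "id": "Default",
          "name": "Default",
          "scheduleType": "ondemand",
          "role": "DataPipelineDefaultRole",
          "resourceRole": "DataPipelineDefaultResourceRole",
          "pipelineLogUri": "s3://my-bucket/logs/"
        },
        {
          "id": "MyEc2Resource",
          "name": "MyEc2Resource",
          "type": "Ec2Resource",
          "instanceType": "t1.micro",
          "terminateAfter": "30 Minutes"
        },
        {
          "id": "RunHello",
          "name": "RunHello",
          "type": "ShellCommandActivity",
          "runsOn": { "ref": "MyEc2Resource" },
          "scriptUri": "s3://my-bucket/scripts/hello.sh",
          "stage": "true"
        }
      ]
    }

    And the commands to upload the scripts and register the pipeline:

    # Steps 1-2: upload the python script and its shell wrapper (assumed paths)
    aws s3 cp hello_world.py s3://my-bucket/scripts/hello_world.py
    aws s3 cp hello.sh s3://my-bucket/scripts/hello.sh

    # Step 3 via the CLI: create, define, and activate the pipeline
    PIPELINE_ID=$(aws datapipeline create-pipeline --name hello-pipeline \
        --unique-id hello-pipeline --query pipelineId --output text)
    aws datapipeline put-pipeline-definition --pipeline-id "$PIPELINE_ID" \
        --pipeline-definition file://pipeline.json
    aws datapipeline activate-pipeline --pipeline-id "$PIPELINE_ID"

    "scheduleType": "ondemand" keeps the sketch small; for nightly runs like the question describes you would attach a Schedule object with "period": "1 day" instead.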

    Here is the shell script (hello.sh) that downloads your python program from S3, stores it locally, installs python and the required third-party libraries, and finally executes your python file. (The pip install line below is a placeholder; fill in your script's actual dependencies.)

    hello.sh

    echo 'Download python file to local temp'
    aws s3 cp s3://path/to/python_file/hello_world.py /tmp/hello.py
    # Install pip for python (on CentOS / Amazon Linux)
    sudo yum -y install python-pip
    # Placeholder: install whatever third-party libraries your script imports
    pip install <your-required-libraries>
    python /tmp/hello.py
    

    I had a hard time getting the shebang line to work, so I have not included one here.
    If the aws s3 cp command doesn't work (because the pre-installed awscli is too old), here is a quick workaround:

    1. Follow steps 1-3 above, and in addition create an S3DataNode.
      I. Provide your python script's S3 URL in the "File Path" of the S3DataNode.
      II. Provide the S3DataNode as "input" to the ShellCommandActivity.
      III. Write the following in the "Command" field of the ShellCommandActivity (a sketch of the matching pipeline objects follows the command).

    Command

    echo 'Install pip for Python 2'
    sudo yum -y install python-pip
    # Placeholder again: install your script's third-party libraries
    pip install <your-required-libraries>
    # The staged input (the S3DataNode) is downloaded to ${INPUT1_STAGING_DIR}
    python ${INPUT1_STAGING_DIR}/hello_world.py
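
    For completeness, here is a sketch of the matching objects in the same pipeline.json as above, again with my assumed names and paths: the S3DataNode's "File Path" points at the python file, and referencing it as the activity's "input" (together with "stage": "true") is what makes ${INPUT1_STAGING_DIR} resolve on the instance.

    {
      "id": "HelloScript",
      "name": "HelloScript",
      "type": "S3DataNode",
      "filePath": "s3://my-bucket/scripts/hello_world.py"
    },
    {
      "id": "RunHelloStaged",
      "name": "RunHelloStaged",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEc2Resource" },
      "input": { "ref": "HelloScript" },
      "stage": "true",
      "command": "sudo yum -y install python-pip\npython ${INPUT1_STAGING_DIR}/hello_world.py"
    }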
    
