Question:
I have a PySpark script which I can run in AWS Glue, but every time I have to create the job from the UI and copy my code into it. Is there any way I can automatically create the job from my file in an S3 bucket? (I have all the libraries and the Glue context that will be used while running.)
Answer 1:
Another alternative is to use AWS CloudFormation. You can define all the AWS resources you want to create (not only Glue jobs) in a template file, and then update the stack whenever you need, from the AWS Console or using the CLI.
A template for a Glue job would look like this:
MyJob:
  Type: AWS::Glue::Job
  Properties:
    Command:
      Name: glueetl
      ScriptLocation: "s3://aws-glue-scripts//your-script-file.py"
    DefaultArguments:
      "--job-bookmark-option": "job-bookmark-enable"
    ExecutionProperty:
      MaxConcurrentRuns: 2
    MaxRetries: 0
    Name: cf-job1
    Role: !Ref MyJobRole # reference to a Role resource which is not presented here
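The stack can also be created programmatically rather than from the Console. Below is a minimal sketch using boto3, assuming the template above has been saved locally as glue-job.yaml (a hypothetical file name) and that the stack name is arbitrary:

import boto3

# Minimal sketch: create a CloudFormation stack containing the Glue job.
# "glue-job.yaml" and "my-glue-jobs" are hypothetical placeholder names.
cf = boto3.client("cloudformation")

with open("glue-job.yaml") as f:
    template_body = f.read()

cf.create_stack(
    StackName="my-glue-jobs",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],  # required when the stack creates IAM roles
)

# Block until the stack is fully created before using the job
cf.get_waiter("stack_create_complete").wait(StackName="my-glue-jobs")

Subsequent template changes would go through update_stack (or the higher-level aws cloudformation deploy CLI command) rather than create_stack.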
Answer 2:
Yes, it is possible. For instance, you can use the boto3 library for this purpose:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
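For illustration, here is a minimal sketch of the create_job call documented above, which registers a script already uploaded to S3 as a Glue job; the bucket path, job name, and IAM role are hypothetical placeholders, not values from the question:

import boto3

glue = boto3.client("glue")

# Minimal sketch: register an existing S3 script as a Glue job.
# Job name, role, and script path below are hypothetical placeholders.
response = glue.create_job(
    Name="my-glue-job",
    Role="MyGlueServiceRole",  # IAM role that Glue assumes to run the job
    Command={
        "Name": "glueetl",     # Spark ETL job type
        "ScriptLocation": "s3://my-bucket/scripts/my_script.py",
    },
    DefaultArguments={
        "--job-bookmark-option": "job-bookmark-enable",
    },
    MaxRetries=0,
)
print(response["Name"])

# The job can then also be started programmatically:
glue.start_job_run(JobName="my-glue-job")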
Answer 3:
I wrote a script which does the following:
- We have a (glue)_dependency.txt file; the script gets the paths of all dependency files and creates a zip file.
- It uploads the Glue script and the zip file to S3 using s3 sync.
- Optionally, if any job setting changes, it re-deploys the CloudFormation template.
You may write a shell script to do this; a rough sketch of the same steps is shown below.
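The answer suggests a shell script; for illustration only, here is the same workflow sketched in Python with boto3, assuming glue_dependency.txt lists one dependency file path per line (file, bucket, and key names are placeholders):

import zipfile

import boto3

# Sketch of the workflow described above, in Python instead of shell.
# Assumes glue_dependency.txt lists one dependency file path per line.
with open("glue_dependency.txt") as f:
    dependency_paths = [line.strip() for line in f if line.strip()]

# Bundle all dependency files into a single zip archive
with zipfile.ZipFile("dependencies.zip", "w") as zf:
    for path in dependency_paths:
        zf.write(path)

# Upload the job script and the zip to S3 (the equivalent of `s3 sync` here)
s3 = boto3.client("s3")
s3.upload_file("my_glue_script.py", "my-bucket", "scripts/my_glue_script.py")
s3.upload_file("dependencies.zip", "my-bucket", "scripts/dependencies.zip")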
Source: https://stackoverflow.com/questions/54193618/aws-glue-automatic-job-creation