AWS Glue automatic job creation

Submitted by 試著忘記壹切 on 2020-03-03 10:12:10

Question


I have a PySpark script that I can run in AWS Glue. But every time, I create the job from the UI and copy my code into it. Is there any way to create the job automatically from my file in an S3 bucket? (I have all the libraries and the Glue context that will be used when running.)


Answer 1:


Another alternative is to use AWS CloudFormation. You can define all the AWS resources you want to create (not only Glue jobs) in a template file and then update the stack whenever you need to, from the AWS Console or using the CLI.

A template for a Glue job would look like this:

  MyJob:
    Type: AWS::Glue::Job
    Properties:
      Command:
        Name: glueetl
        ScriptLocation: "s3://aws-glue-scripts//your-script-file.py"
      DefaultArguments:
        "--job-bookmark-option": "job-bookmark-enable"
      ExecutionProperty:
        MaxConcurrentRuns: 2
      MaxRetries: 0
      Name: cf-job1
      Role: !Ref MyJobRole # reference to a Role resource which is not presented here
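The answer mentions updating the stack from the Console or the CLI. As a rough sketch of the same step in Python, the function below uses boto3's CloudFormation client instead (a substitution, not what the answer itself used). The function name `deploy_template`, the `stack_exists` flag, and the template path are hypothetical; the client is passed in so the function can be tried without AWS credentials.

```python
def deploy_template(cfn_client, stack_name, template_path, stack_exists):
    """Create the stack, or update it when it already exists.

    cfn_client is assumed to be boto3.client("cloudformation").
    stack_exists is a hypothetical flag supplied by the caller.
    """
    with open(template_path, encoding="utf-8") as handle:
        template_body = handle.read()

    params = {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Acknowledges IAM resources in the template (such as a job role);
        # a role with a custom name needs CAPABILITY_NAMED_IAM instead.
        "Capabilities": ["CAPABILITY_IAM"],
    }
    if stack_exists:
        response = cfn_client.update_stack(**params)
    else:
        response = cfn_client.create_stack(**params)
    return response["StackId"]
```

Note that `update_stack` raises an error when the template has not changed, so a real deployment script would probably want to catch that case.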



Answer 2:


Yes, it is possible. For instance, you can use boto3 (the AWS SDK for Python) for this purpose.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
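A minimal sketch of what a `create_job` call might look like, assuming the script is already in S3 and an IAM role for Glue exists. The helper names `build_job_definition` and `create_glue_job` are hypothetical, and the job settings simply mirror common Glue ETL options rather than anything the answer specified. The client is passed in as an argument so the code can be tried without AWS access.

```python
def build_job_definition(name, role_arn, script_location, max_concurrent_runs=1):
    """Return keyword arguments for Glue.Client.create_job."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,  # s3:// path to the PySpark script
        },
        "DefaultArguments": {"--job-bookmark-option": "job-bookmark-enable"},
        "ExecutionProperty": {"MaxConcurrentRuns": max_concurrent_runs},
        "MaxRetries": 0,
    }


def create_glue_job(glue_client, **job_definition):
    """Create the job and return its name as reported by the API."""
    response = glue_client.create_job(**job_definition)
    return response["Name"]


# With real AWS access, usage would look roughly like this
# (role ARN and S3 path are placeholders):
#   import boto3
#   definition = build_job_definition("my-job", "<role-arn>", "s3://<bucket>/<script>.py")
#   create_glue_job(boto3.client("glue"), **definition)
```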




Answer 3:


I wrote a script that does the following:

  1. We have a (glue)_dependency.txt file; the script gets the paths of all dependency files and creates a zip file.
  2. It uploads the Glue file and the zip file to S3 using s3 sync.
  3. Optionally, if any job setting has changed, it re-deploys the CloudFormation template.

You could write a shell script to do this.
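Steps 1 and 2 could be sketched in Python roughly as follows. This is not the answerer's script: it assumes the dependency file lists one path per line, that each dependency is a single file that can be stored in the zip under its base name, and it uses boto3's `upload_file` in place of `s3 sync`. All function names are hypothetical.

```python
import os
import zipfile


def read_dependency_paths(list_file):
    """Read one dependency path per line, skipping blanks and '#' comments."""
    with open(list_file, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def build_dependency_zip(paths, zip_path):
    """Bundle the listed files into a zip archive, stored by base name."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))
    return zip_path


def upload_artifacts(s3_client, bucket, prefix, local_files):
    """Upload each file to s3://bucket/prefix/<basename>; return the keys.

    s3_client is assumed to be boto3.client("s3").
    """
    keys = []
    for local_file in local_files:
        key = f"{prefix.rstrip('/')}/{os.path.basename(local_file)}"
        s3_client.upload_file(local_file, bucket, key)
        keys.append(key)
    return keys
```

The uploaded zip could then be referenced from the job's `--extra-py-files` argument, which is how Glue picks up additional Python modules.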



Source: https://stackoverflow.com/questions/54193618/aws-glue-automatic-job-creation
