Question:
I have a PySpark script which I can run in AWS Glue, but every time I have to create the job from the UI and copy my code into it. Is there any way I can automatically create the job from my file in an S3 bucket? (I have all the libraries and the Glue context that will be used while running.)
Answer 1:
Another alternative is to use AWS CloudFormation. You can define all the AWS resources you want to create (not only Glue jobs) in a template file, and then update the stack whenever you need, from the AWS Console or using the CLI.
A template for a Glue job would look like this:
MyJob:
  Type: AWS::Glue::Job
  Properties:
    Command:
      Name: glueetl
      ScriptLocation: "s3://aws-glue-scripts//your-script-file.py"
    DefaultArguments:
      "--job-bookmark-option": "job-bookmark-enable"
    ExecutionProperty:
      MaxConcurrentRuns: 2
    MaxRetries: 0
    Name: cf-job1
    Role: !Ref MyJobRole # reference to a Role resource which is not presented here
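The stack can also be created programmatically rather than from the Console. Below is a minimal sketch using boto3, assuming the template above has been saved locally as glue-job.yaml (a hypothetical file name) and that the stack name is arbitrary:

import boto3

# Minimal sketch: create a CloudFormation stack containing the Glue job.
# "glue-job.yaml" and "my-glue-jobs" are hypothetical placeholder names.
cf = boto3.client("cloudformation")

with open("glue-job.yaml") as f:
    template_body = f.read()

cf.create_stack(
    StackName="my-glue-jobs",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],  # required when the stack creates IAM roles
)

# Block until the stack is fully created before using the job
cf.get_waiter("stack_create_complete").wait(StackName="my-glue-jobs")

Subsequent template changes would go through update_stack (or the higher-level aws cloudformation deploy CLI command) rather than create_stack.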
Answer 2:
Yes, it is possible. For instance, you can use the boto3 library for this purpose:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
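For illustration, here is a minimal sketch of the create_job call documented above, which registers a script already uploaded to S3 as a Glue job; the bucket path, job name, and IAM role are hypothetical placeholders, not values from the question:

import boto3

glue = boto3.client("glue")

# Minimal sketch: register an existing S3 script as a Glue job.
# Job name, role, and script path below are hypothetical placeholders.
response = glue.create_job(
    Name="my-glue-job",
    Role="MyGlueServiceRole",  # IAM role that Glue assumes to run the job
    Command={
        "Name": "glueetl",     # Spark ETL job type
        "ScriptLocation": "s3://my-bucket/scripts/my_script.py",
    },
    DefaultArguments={
        "--job-bookmark-option": "job-bookmark-enable",
    },
    MaxRetries=0,
)
print(response["Name"])

# The job can then also be started programmatically:
glue.start_job_run(JobName="my-glue-job")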
Answer 3:
I wrote a script which does the following:
- We have a (glue)_dependency.txt file; the script gets the paths of all dependency files and creates a zip file.
- It uploads the Glue script and the zip file to S3 using s3 sync.
- Optionally, if any job setting changes, it re-deploys the CloudFormation template.
You may write a shell script to do this; a rough sketch of the same steps is shown below.
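The answer suggests a shell script; for illustration only, here is the same workflow sketched in Python with boto3, assuming glue_dependency.txt lists one dependency file path per line (file, bucket, and key names are placeholders):

import zipfile

import boto3

# Sketch of the workflow described above, in Python instead of shell.
# Assumes glue_dependency.txt lists one dependency file path per line.
with open("glue_dependency.txt") as f:
    dependency_paths = [line.strip() for line in f if line.strip()]

# Bundle all dependency files into a single zip archive
with zipfile.ZipFile("dependencies.zip", "w") as zf:
    for path in dependency_paths:
        zf.write(path)

# Upload the job script and the zip to S3 (the equivalent of `s3 sync` here)
s3 = boto3.client("s3")
s3.upload_file("my_glue_script.py", "my-bucket", "scripts/my_glue_script.py")
s3.upload_file("dependencies.zip", "my-bucket", "scripts/dependencies.zip")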
Source: https://stackoverflow.com/questions/54193618/aws-glue-automatic-job-creation