AWS Glue Job Input Parameters

前端 未结 1 946
感情败类
感情败类 2020-12-24 15:13

I am relatively new to AWS and this may be a bit less technical question, but at present AWS Glue notes a maximum of 25 jobs permitted to be created. We are loading in a ser

相关标签:
1条回答
  • 2020-12-24 15:57

    In the below example I present how to use Glue job input parameters in the code. This code takes the input parameters and it writes them to the flat file.

    1) Setting the input parameters in the job configuration.

    2) The code of Glue job

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    
    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])
    
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    args = getResolvedOptions(sys.argv, ['JOB_NAME','VAL1','VAL2','VAL3','DEST_FOLDER'])
    job.init(args['JOB_NAME'], args)
    
    v_list=[{"VAL1":args['VAL1'],"VAL2":args['VAL2'],"VAL3":args['VAL3']}]
    
    df=sc.parallelize(v_list).toDF()
    df.repartition(1).write.mode('overwrite').format('csv').options(header=True, delimiter = ';').save("s3://"+ args['DEST_FOLDER'] +"/")
    
    job.commit()
    

    3) There is also possible to provide input parameters during using boto3, CloudFormation or StepFunctions. This example shows how to do it by using boto3.

    import boto3
    
    def lambda_handler(event, context):
        glue = boto3.client('glue')
    
    
        myJob = glue.create_job(Name='example_job2', Role='AWSGlueServiceDefaultRole',
                                Command={'Name': 'glueetl','ScriptLocation': 's3://aws-glue-scripts/example_job'},
                                DefaultArguments={"VAL1":"value1","VAL2":"value2","VAL3":"value3"}       
                                       )
    
        glue.start_job_run(JobName=myJob['Name'], Arguments={"VAL1":"value11","VAL2":"value22","VAL3":"value33"})
    

    Useful links:

    1. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html
    2. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
    3. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job
    4. https://docs.aws.amazon.com/step-functions/latest/dg/connectors-glue.html
    0 讨论(0)
提交回复
热议问题