How do you automate pyspark jobs on emr using boto3 (or otherwise)?

悲&欢浪女 2021-02-01 08:44

I am creating a job to parse massive amounts of server data, and then upload it into a Redshift database.

My job flow is as follows:

  • Grab the
4 Answers
  • 难免孤独 2021-02-01 09:14

    Actually, I went with AWS Step Functions, which is a state-machine wrapper around Lambda functions. You can use boto3 to start the EMR Spark job with run_job_flow, check the cluster's status with describe_cluster, and then use a Choice state to branch. So your state machine looks something like this (Step Functions state types in brackets):

    Run job (task) -> Wait for X min (wait) -> Check status (task) -> Branch (choice) [ => back to wait, or => done ]
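    The two Lambda tasks above boil down to one boto3 call each. Below is a minimal sketch of the payloads involved; the job name, S3 path, instance types, and bucket are placeholders, and the IAM roles assume the EMR defaults exist in your account.

```python
def build_run_job_flow_args(script_s3_path: str) -> dict:
    """Kwargs for emr_client.run_job_flow; names/paths/roles are placeholders."""
    return {
        "Name": "parse-server-logs",  # hypothetical job name
        "ReleaseLabel": "emr-6.9.0",
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Transient cluster: terminate once the step finishes.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "spark-parse",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         script_s3_path],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def cluster_is_done(describe_response: dict) -> bool:
    """Map describe_cluster output to the Branch choice: done vs. back to wait."""
    state = describe_response["Cluster"]["Status"]["State"]
    return state in ("TERMINATED", "TERMINATED_WITH_ERRORS")


# Usage against real AWS (requires credentials; not executed here):
# import boto3
# emr = boto3.client("emr", region_name="us-east-1")
# resp = emr.run_job_flow(**build_run_job_flow_args("s3://my-bucket/parse.py"))
# cluster_id = resp["JobFlowId"]
# ...later, in the Check-status task...
# done = cluster_is_done(emr.describe_cluster(ClusterId=cluster_id))
```

    The "Run job" Lambda would call run_job_flow and pass the returned JobFlowId through the state machine; the "Check status" Lambda calls describe_cluster, and the Choice state loops back to the Wait state until cluster_is_done is true.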
