I am creating a job to parse massive amounts of server data and then load it into a Redshift database.
My job flow is as follows:
Actually, I've gone with AWS Step Functions, which is a state machine wrapper for Lambda functions. You can use boto3 to start the EMR Spark job with run_job_flow, and you can use describe_cluster to get the status of the cluster. Finally, use a Choice state to branch on that status. So your step function looks something like this (state types in brackets; sketches of each piece follow below):
Run job (Task) -> Wait X minutes (Wait) -> Check status (Task) -> Branch (Choice) [ => back to Wait, or => Done ]
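Here's a minimal sketch of the launcher Lambda, assuming a hypothetical cluster configuration (the job name, instance types, script location, and IAM roles are all placeholders you'd replace with your own):

```python
import boto3

emr = boto3.client("emr")

def start_job(event, context):
    """Launch an EMR cluster and submit the Spark job as a step."""
    response = emr.run_job_flow(
        Name="parse-server-logs",        # hypothetical job name
        ReleaseLabel="emr-5.36.0",
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Tear the cluster down once the step finishes.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "Parse and load",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Hypothetical script location on S3.
                "Args": ["spark-submit", "s3://my-bucket/parse_job.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    # Pass the cluster id along the state machine output so the
    # status-check Lambda knows which cluster to poll.
    return {"ClusterId": response["JobFlowId"]}
```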
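And a sketch of the status-check Lambda, assuming the cluster id from the launcher is carried through the state machine input:

```python
import boto3

emr = boto3.client("emr")

def check_status(event, context):
    """Poll the cluster and report its state back to the state machine."""
    cluster_id = event["ClusterId"]
    state = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]
    # Possible states include STARTING, BOOTSTRAPPING, RUNNING, WAITING,
    # TERMINATING, TERMINATED, and TERMINATED_WITH_ERRORS.
    return {"ClusterId": cluster_id, "State": state}
```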
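The state machine itself is defined in Amazon States Language. Here's one possible definition of the Task -> Wait -> Task -> Choice loop above, built as a Python dict and registered with boto3; the Lambda ARNs, role ARN, and wait time are placeholders:

```python
import json
import boto3

# Hypothetical Lambda ARNs; substitute your own.
START_ARN = "arn:aws:lambda:us-east-1:123456789012:function:start_job"
CHECK_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check_status"

definition = {
    "StartAt": "Run job",
    "States": {
        "Run job": {"Type": "Task", "Resource": START_ARN, "Next": "Wait"},
        # Wait X minutes between polls (here, 5 minutes).
        "Wait": {"Type": "Wait", "Seconds": 300, "Next": "Check status"},
        "Check status": {"Type": "Task", "Resource": CHECK_ARN, "Next": "Branch"},
        "Branch": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.State", "StringEquals": "TERMINATED",
                 "Next": "Done"},
                {"Variable": "$.State", "StringEquals": "TERMINATED_WITH_ERRORS",
                 "Next": "Failed"},
            ],
            # Still starting or running: loop back and wait again.
            "Default": "Wait",
        },
        "Done": {"Type": "Succeed"},
        "Failed": {"Type": "Fail"},
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="emr-spark-poller",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder
)
```

The Choice state's Default branch is what closes the loop: anything other than a terminal cluster state sends execution back to the Wait state, so you get polling without paying for a Lambda that sleeps.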