How do you automate PySpark jobs on EMR using boto3 (or otherwise)?

Asked by 悲&欢浪女 on 2021-02-01 08:44

I am creating a job to parse massive amounts of server data, and then upload it into a Redshift database.

My job flow is as follows:

  • Grab the …
4 Answers

  • 伪装坚强ぢ · 2021-02-01 08:56

    I put a complete example on GitHub that shows how to do all of this with Boto3.

    The long-lived cluster example shows how to create a cluster and run job steps on it: the steps grab data from a public S3 bucket that contains historical Amazon review data, do some PySpark processing on it, and write the output back to another S3 bucket. The full demo does the following (a minimal sketch of the core boto3 calls follows the list):

    • Creates an Amazon S3 bucket and uploads a job script.
    • Creates AWS Identity and Access Management (IAM) roles used by the demo.
    • Creates Amazon Elastic Compute Cloud (Amazon EC2) security groups used by the demo.
    • Creates short-lived and long-lived clusters and runs job steps on them.
    • Terminates clusters and cleans up all resources.
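
    For reference, here is a minimal sketch of what that pattern boils down to with boto3. The region, bucket, script path, release label, and instance sizes below are placeholders of my own, not values from the demo, and it assumes the default EMR roles already exist (e.g. created once with `aws emr create-default-roles`):

    ```python
    import boto3

    emr = boto3.client("emr", region_name="us-west-2")  # region is an assumption

    # Hypothetical locations -- substitute your own bucket, script, and log prefix.
    SCRIPT_URI = "s3://my-bucket/scripts/process_reviews.py"
    LOG_URI = "s3://my-bucket/emr-logs/"


    def spark_step(name, script_uri, *script_args):
        """Build an EMR step that runs the given script with spark-submit."""
        return {
            "Name": name,
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",  # EMR helper jar that runs CLI commands
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         script_uri, *script_args],
            },
        }


    # Create a long-lived cluster. KeepJobFlowAliveWhenNoSteps=True keeps the
    # cluster running after each step finishes so more steps can be added later.
    cluster = emr.run_job_flow(
        Name="pyspark-demo",
        ReleaseLabel="emr-6.9.0",  # assumed release label; pick a current one
        Applications=[{"Name": "Spark"}],
        LogUri=LOG_URI,
        Instances={
            "MasterInstanceType": "m5.xlarge",  # instance sizes are assumptions
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        JobFlowRole="EMR_EC2_DefaultRole",  # default EC2 instance-profile role
        ServiceRole="EMR_DefaultRole",      # default EMR service role
        VisibleToAllUsers=True,
    )
    cluster_id = cluster["JobFlowId"]

    # Wait until the cluster is up, then submit a PySpark step and block on it.
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    step_ids = emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[spark_step("process-reviews", SCRIPT_URI)],
    )["StepIds"]
    emr.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_ids[0])

    # Shut the cluster down once all work is done.
    emr.terminate_job_flows(JobFlowIds=[cluster_id])
    ```

    A short-lived cluster is the same `run_job_flow` call with `KeepJobFlowAliveWhenNoSteps=False` and the `Steps` list passed directly to it, in which case EMR terminates the cluster on its own once the steps finish.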
