PySpark using IAM roles to access S3

野趣味 2021-02-08 23:48

I'm wondering if PySpark supports S3 access using IAM roles. Specifically, I have a business constraint where I have to assume an AWS role in order to access a given bucket. …

5 Answers
  •  深忆病人 2021-02-09 00:06

    Hadoop 2.8+'s s3a connector supports IAM roles via a new credential provider; it is not in the Hadoop 2.7 release.

    To use it, you need to switch the credential provider:

    fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
    fs.s3a.access.key = 
    fs.s3a.secret.key = 
    fs.s3a.session.token = 
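
    A minimal sketch of wiring that up from PySpark (not from the answer itself): it assumes the Hadoop 2.8+ s3a jars are on the classpath and uses boto3/STS to assume the role; the role ARN, session name, and bucket names are placeholders.

    import boto3
    from pyspark.sql import SparkSession

    # Assume the role and grab short-lived session credentials.
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/my-bucket-reader",  # placeholder
        RoleSessionName="pyspark-s3a",
    )["Credentials"]

    # Hand the temporary credentials to s3a via the session-token provider.
    spark = (
        SparkSession.builder.appName("s3a-iam-role")
        .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
        .config("spark.hadoop.fs.s3a.access.key", creds["AccessKeyId"])
        .config("spark.hadoop.fs.s3a.secret.key", creds["SecretAccessKey"])
        .config("spark.hadoop.fs.s3a.session.token", creds["SessionToken"])
        .getOrCreate()
    )

    df = spark.read.text("s3a://my-bucket/some-prefix/")  # placeholder bucket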
    

    What Hadoop 2.7 does have (and enables by default) is picking up the AWS_ environment variables.

    If you set the AWS session environment variables on both your local system and the remote one, they should get picked up.
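
    For example, a rough sketch of relying on that default: the credential values below are placeholders, the variables must be set before the JVM is launched, and on a cluster they must also reach the executors.

    import os
    from pyspark.sql import SparkSession

    # Placeholder session credentials; in practice these would come from
    # `aws sts assume-role` or similar, exported before launching Spark.
    os.environ["AWS_ACCESS_KEY_ID"] = "ASIA...placeholder"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "placeholder-secret"
    os.environ["AWS_SESSION_TOKEN"] = "placeholder-token"

    spark = SparkSession.builder.appName("s3a-env-creds").getOrCreate()
    # Hadoop 2.7's s3a should now pick the credentials up from the environment.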

    I know it's a pain, but as far as the Hadoop team is concerned, Hadoop 2.7 shipped in mid-2016 and we've done a lot since then, stuff which we aren't going to backport.
