I'm wondering if PySpark supports S3 access using IAM roles. Specifically, I have a business constraint where I have to assume an AWS role in order to access a given bucket.
Hadoop 2.8+'s s3a connector supports IAM roles via a new credential provider; it's not in the Hadoop 2.7 release.
To use it, you need to change the credential provider:
fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
fs.s3a.access.key =
fs.s3a.secret.key =
fs.s3a.session.token =
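For PySpark specifically, a minimal sketch of wiring this up (assuming Hadoop 2.8+ is on the classpath; the role ARN, session name, and bucket path are placeholders) is to assume the role with boto3 and pass the temporary credentials through as spark.hadoop.* settings:

import boto3
from pyspark.sql import SparkSession

# Assume the role with STS; ARN and session name are placeholders.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/my-bucket-role",
    RoleSessionName="pyspark-s3a",
)["Credentials"]

# Hand the temporary credentials to s3a via the spark.hadoop.* prefix.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    .config("spark.hadoop.fs.s3a.access.key", creds["AccessKeyId"])
    .config("spark.hadoop.fs.s3a.secret.key", creds["SecretAccessKey"])
    .config("spark.hadoop.fs.s3a.session.token", creds["SessionToken"])
    .getOrCreate()
)

df = spark.read.text("s3a://my-protected-bucket/some/path/")

Keep in mind that STS credentials expire, so a long-running job would need to re-assume the role and restart with fresh credentials.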
What Hadoop 2.7 does have (and enables by default) is picking up the AWS_* environment variables.
If you set the AWS env vars for the session login on your local system and on the remote ones, then they should get picked up.
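As a sketch of that Hadoop 2.7-era route (values are placeholders, and the variables have to be visible wherever the S3A filesystem is created, which on a real cluster usually means the executor environment too, e.g. via spark.executorEnv.*):

import os
from pyspark.sql import SparkSession

# Session credentials obtained earlier, e.g. from an `aws sts assume-role`
# call; the values here are placeholders.
os.environ["AWS_ACCESS_KEY_ID"] = "<temporary access key>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<temporary secret key>"
os.environ["AWS_SESSION_TOKEN"] = "<session token>"

# Per the note above, the default credential chain should pick these up,
# so no fs.s3a.* keys need to be set on the driver.
spark = SparkSession.builder.getOrCreate()
df = spark.read.text("s3a://my-protected-bucket/some/path/")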
I know it's a pain, but as far as the Hadoop team is concerned, Hadoop 2.7 shipped mid-2016 and we've done a lot since then, stuff which we aren't going to backport.