I'm wondering if PySpark supports S3 access using IAM roles. Specifically, I have a business constraint where I have to assume an AWS role in order to access a given bucket.
IAM role-based access to S3 is only supported by s3a, because s3a uses the AWS SDK.
You need to put the hadoop-aws JAR and the aws-java-sdk JAR (plus the third-party JARs bundled with it) on your CLASSPATH. Both are available from Maven Central.
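If you prefer not to manage the JARs by hand, Spark can also resolve them from Maven at startup through spark.jars.packages. A minimal sketch, assuming hadoop-aws 3.3.4 (pick the version that matches your Hadoop build; the AWS SDK bundle is pulled in transitively):

    from pyspark.sql import SparkSession

    # Resolve hadoop-aws (and, transitively, the bundled AWS SDK) from Maven Central
    # at startup instead of placing the JARs on the CLASSPATH manually.
    # The version (3.3.4) is only an example; match it to your Hadoop version.
    spark = (
        SparkSession.builder
        .appName("s3a-classpath-example")
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
        .getOrCreate()
    )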
Then set this in core-site.xml:

    <property>
      <name>fs.s3.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
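These properties can also be passed at runtime with the spark.hadoop. prefix instead of editing core-site.xml, and for the role-assumption part of the question, hadoop-aws 3.1+ ships an AssumedRoleCredentialProvider for s3a. A minimal sketch, assuming a recent hadoop-aws on the classpath; the role ARN and bucket path are placeholders:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3a-assumed-role-example")
        # Equivalent to the core-site.xml entry above.
        .config("spark.hadoop.fs.s3a.impl",
                "org.apache.hadoop.fs.s3a.S3AFileSystem")
        # Ask s3a to assume a role: your base credentials (env vars, instance
        # profile, ...) are used to call STS and obtain session credentials.
        .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
        .config("spark.hadoop.fs.s3a.assumed.role.arn",
                "arn:aws:iam::123456789012:role/my-bucket-role")  # placeholder ARN
        .getOrCreate()
    )

    # Read from the bucket through the s3a connector (placeholder path).
    df = spark.read.csv("s3a://my-bucket/path/to/data.csv", header=True)
    df.show()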