PySpark using IAM roles to access S3

野趣味, asked 2021-02-08 23:48

I'm wondering if PySpark supports S3 access using IAM roles. Specifically, I have a business constraint where I have to assume an AWS role in order to access a given bucket.

5 Answers
  •  灰色年华, answered 2021-02-09 00:17

    IAM roles for accessing S3 are only supported by the s3a connector, because it uses the AWS SDK.

    You need to put the hadoop-aws JAR and the aws-java-sdk JAR (along with the third-party JARs it depends on) on your CLASSPATH.

    Both JARs are available from Maven Central (hadoop-aws and aws-java-sdk).
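
    If you would rather not manage the JARs by hand, Spark can also fetch them from Maven at startup via the spark.jars.packages setting. A minimal sketch, assuming hadoop-aws 3.3.4 (the version is an assumption: pick the release matching the Hadoop version of your Spark build; hadoop-aws pulls in the matching AWS SDK bundle transitively):

        from pyspark.sql import SparkSession

        # Have Spark download the s3a connector from Maven Central at startup.
        # The version below is an assumption; it must match the Hadoop version
        # your Spark distribution was built against.
        spark = (
            SparkSession.builder
            .appName("s3a-classpath-example")
            .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
            .getOrCreate()
        )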

    Then set this in core-site.xml:

    
        <property>
            <name>fs.s3.impl</name>
            <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
        </property>
        <property>
            <name>fs.s3a.impl</name>
            <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
        </property>
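
    To satisfy the assume-role constraint from the question, one common pattern is to call STS yourself and hand the temporary credentials to s3a through TemporaryAWSCredentialsProvider. A minimal sketch; the role ARN and bucket path are hypothetical placeholders:

        import boto3
        from pyspark.sql import SparkSession

        # Assume the target role via STS; the ARN is a placeholder.
        sts = boto3.client("sts")
        creds = sts.assume_role(
            RoleArn="arn:aws:iam::123456789012:role/my-bucket-reader",
            RoleSessionName="pyspark-s3a",
        )["Credentials"]

        spark = SparkSession.builder.appName("s3a-assume-role").getOrCreate()
        hconf = spark.sparkContext._jsc.hadoopConfiguration()

        # Point s3a at the temporary (session) credentials.
        hconf.set("fs.s3a.aws.credentials.provider",
                  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
        hconf.set("fs.s3a.access.key", creds["AccessKeyId"])
        hconf.set("fs.s3a.secret.key", creds["SecretAccessKey"])
        hconf.set("fs.s3a.session.token", creds["SessionToken"])

        # Read with the assumed role's permissions; the path is a placeholder.
        df = spark.read.csv("s3a://my-bucket/data.csv")

    Note that session credentials expire, so long-running jobs need to refresh them. On Hadoop 3.1+, s3a can also assume the role itself via fs.s3a.assumed.role.arn together with org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider, which keeps the temporary keys out of application code.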
