PySpark using IAM roles to access S3

野趣味 2021-02-08 23:48

I'm wondering if PySpark supports S3 access using IAM roles. Specifically, I have a business constraint where I have to assume an AWS role in order to access a given bucket. …

5 Answers
  •  深忆病人 2021-02-09 00:06

    Hadoop 2.8+'s s3a connector supports IAM roles via a new credential provider; it is not in the Hadoop 2.7 release.

    To use it, you need to switch the credential provider:

    fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
    fs.s3a.access.key = 
    fs.s3a.secret.key = 
    fs.s3a.session.token = 
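
    A minimal sketch of wiring that up from PySpark (not from the answer itself): it assumes the Hadoop 2.8+ s3a jars are on the classpath and uses boto3/STS to assume the role; the role ARN, session name, and bucket names are placeholders.

    import boto3
    from pyspark.sql import SparkSession

    # Assume the role and grab short-lived session credentials.
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/my-bucket-reader",  # placeholder
        RoleSessionName="pyspark-s3a",
    )["Credentials"]

    # Hand the temporary credentials to s3a via the session-token provider.
    spark = (
        SparkSession.builder.appName("s3a-iam-role")
        .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
        .config("spark.hadoop.fs.s3a.access.key", creds["AccessKeyId"])
        .config("spark.hadoop.fs.s3a.secret.key", creds["SecretAccessKey"])
        .config("spark.hadoop.fs.s3a.session.token", creds["SessionToken"])
        .getOrCreate()
    )

    df = spark.read.text("s3a://my-bucket/some-prefix/")  # placeholder bucket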
    

    What Hadoop 2.7 does have (and enables by default) is picking up the AWS_ environment variables.

    If you set the AWS session environment variables on both your local system and the remote one, they should get picked up.
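
    For example, a rough sketch of relying on that default: the credential values below are placeholders, the variables must be set before the JVM is launched, and on a cluster they must also reach the executors.

    import os
    from pyspark.sql import SparkSession

    # Placeholder session credentials; in practice these would come from
    # `aws sts assume-role` or similar, exported before launching Spark.
    os.environ["AWS_ACCESS_KEY_ID"] = "ASIA...placeholder"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "placeholder-secret"
    os.environ["AWS_SESSION_TOKEN"] = "placeholder-token"

    spark = SparkSession.builder.appName("s3a-env-creds").getOrCreate()
    # Hadoop 2.7's s3a should now pick the credentials up from the environment.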

    I know it's a pain, but as far as the Hadoop team is concerned, Hadoop 2.7 shipped in mid-2016 and we've done a lot since then, stuff which we aren't going to backport.
