S3 bucket policy for an instance to read from two different accounts


Question


I have an instance which needs to read data from S3 in two different accounts:

  1. Bucket in DataAccount with bucket name "dataaccountlogs"
  2. Bucket in UserAccount with bucket name "userlogs"

I have console access to both accounts, so I now need to configure bucket policies that allow my instance to read S3 data from the buckets dataaccountlogs and userlogs. The instance runs in UserAccount.

I need to access both buckets from the command line as well as from a Spark job.


Answer 1:


You will need a role in UserAccount which will be used to access the mentioned buckets, say RoleA (attached to the instance as an instance profile, so its trust policy must allow ec2.amazonaws.com to assume it). The role should have permissions for the required S3 operations.
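As a sketch, an identity policy attached to RoleA could grant read access to both buckets like this (the read-only actions are an assumption based on the question; widen them if you need more than reads):

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadBothBuckets",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::dataaccountlogs",
                    "arn:aws:s3:::dataaccountlogs/*",
                    "arn:aws:s3:::userlogs",
                    "arn:aws:s3:::userlogs/*"
                ]
            }
        ]
    }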

Then you will be able to configure a bucket policy for each bucket:

  1. For DataAccount (the principal is RoleA in UserAccount, since that is where the role lives; "UserAccount" stands for that account's ID):

    {
        "Version": "2012-10-17",
        "Id": "Policy1",
        "Statement": [
            {
                "Sid": "test1",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::UserAccount:role/RoleA"
                },
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::dataaccountlogs",
                    "arn:aws:s3:::dataaccountlogs/*"
                ]
            }
        ]
    }
    
  2. For UserAccount:

    {
        "Version": "2012-10-17",
        "Id": "Policy1",
        "Statement": [
            {
                "Sid": "test1",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::UserAccount:role/RoleA"
                },
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::userlogs",
                    "arn:aws:s3:::userlogs/*"
                ]
            }
        ]
    }
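You can apply each policy with the put-bucket-policy call, running the first command under credentials for DataAccount and the second under credentials for UserAccount (the policy file names are illustrative):

    aws s3api put-bucket-policy --bucket dataaccountlogs --policy file://dataaccount-policy.json
    aws s3api put-bucket-policy --bucket userlogs --policy file://useraccount-policy.json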
    

Accessing them from the command line:

You will need to set up the AWS CLI tool first: https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html
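One common way to install it, assuming a Python environment is available (any method from the linked guide works):

    pip install awscli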

Then you will need to configure a profile for using your role. First, create a profile for your user to log in with:

aws configure --profile YourProfileAlias

and follow the instructions for setting up credentials.
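This writes your access keys to ~/.aws/credentials in roughly this form (key values elided):

    [YourProfileAlias]
    aws_access_key_id = ...
    aws_secret_access_key = ...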

Then you will need to edit ~/.aws/config and add a profile for the role. Add a block like this at the end:

[profile YourRoleProfileName]
role_arn = arn:aws:iam::UserAccount:role/RoleA
source_profile = YourProfileAlias

After that you will be able to use aws s3api ... --profile YourRoleProfileName to access both of your buckets on behalf of the created role.
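For example (the object key is a placeholder):

    aws s3 ls s3://dataaccountlogs/ --profile YourRoleProfileName
    aws s3 cp s3://userlogs/some-object.log . --profile YourRoleProfileName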

To access them from Spark:

  1. If you run your cluster on EMR, you should use a SecurityConfiguration and fill in the section for S3 role configuration. A different role can be specified for each specific bucket: use the "Prefix" constraint and list all destination prefixes after it, like "s3://dataaccountlogs/,s3://userlogs". A sketch of such a configuration follows the note below.

Note: you should strictly use the s3 protocol for this, not s3a. There are also a number of limitations, which you can find here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3-optimized-committer.html
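As a sketch (my reading of the EMRFS role-mapping format; verify the field names against the EMR security configuration docs), the relevant part could look like:

    {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {
                "RoleMappings": [
                    {
                        "Role": "arn:aws:iam::UserAccount:role/RoleA",
                        "IdentifierType": "Prefix",
                        "Identifiers": [
                            "s3://dataaccountlogs/",
                            "s3://userlogs/"
                        ]
                    }
                ]
            }
        }
    }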

  2. Another way with Spark is to configure Hadoop to assume your role, putting

    spark.hadoop.fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider

and configuring the role to be used:

    spark.hadoop.fs.s3a.assumed.role.arn = arn:aws:iam::UserAccount:role/RoleA

This way is more general, since the EMR committer has various limitations. You can find more information for configuring this in the Hadoop docs: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/assumed_roles.html
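A minimal PySpark sketch putting these settings together (bucket names come from the question; the account ID in the role ARN and the object prefixes are placeholders):

    from pyspark.sql import SparkSession

    # Configure s3a to assume RoleA before touching either bucket.
    spark = (
        SparkSession.builder
        .appName("cross-account-read")
        .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
        .config("spark.hadoop.fs.s3a.assumed.role.arn",
                "arn:aws:iam::UserAccount:role/RoleA")
        .getOrCreate()
    )

    # Hypothetical prefixes: adjust to the actual layout of your logs.
    data_logs = spark.read.text("s3a://dataaccountlogs/some/prefix/")
    user_logs = spark.read.text("s3a://userlogs/some/prefix/")

    print(data_logs.count(), user_logs.count())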



Source: https://stackoverflow.com/questions/53246014/s3-bucket-policy-for-instance-to-read-from-two-different-accounts
