Question
I have an instance which needs to read data from S3 buckets in two different accounts:
- Bucket in DataAccount with bucket name "dataaccountlogs"
- Bucket in UserAccount with bucket name "userlogs"
I have console access to both accounts, so I need to configure bucket policies to allow the instance to read S3 data from the buckets dataaccountlogs and userlogs. My instance is running in UserAccount.
I need to access these two buckets both from the command line and from a Spark job.
Answer 1:
You will need a role in UserAccount, which will be used to access the mentioned buckets, say RoleA. The role should have permissions for the required S3 operations.
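For example, a minimal permission (identity) policy for RoleA, assuming read-only access is enough, could look roughly like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::dataaccountlogs",
        "arn:aws:s3:::dataaccountlogs/*",
        "arn:aws:s3:::userlogs",
        "arn:aws:s3:::userlogs/*"
      ]
    }
  ]
}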
Then you will be able to configure a bucket policy for each bucket:
For DataAccount:
{ "Version": "2012-10-17", "Id": "Policy1", "Statement": [ { "Sid": "test1", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::DataAccount:role/RoleA" }, "Action": "s3:*", "Resource": [ "arn:aws:s3:::dataaccountlogs", "arn:aws:s3:::dataaccountlogs/*" ] } ] }
For UserAccount:
{ "Version": "2012-10-17", "Id": "Policy1", "Statement": [ { "Sid": "test1", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::DataAccount:role/RoleA" }, "Action": "s3:*", "Resource": [ "arn:aws:s3:::userlogs", "arn:aws:s3:::userlogs/*" ] } ] }
To access them from the command line:
You will need to set up the AWS CLI tool first: https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html
Then you will need to configure a profile for using your role. First you will need to create a profile for your user to log in:
aws configure --profile YourProfileAlias
And follow instructions for setting up credentials.
Then you will need to edit the config file ~/.aws/config and add a profile for the role. Add a block at the end:
[profile YourRoleProfileName]
role_arn = arn:aws:iam::UserAccount:role/RoleA
source_profile = YourProfileAlias
After that you will be able to use aws s3api ... --profile YourRoleProfileName to access both buckets on behalf of the created role.
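For example, a quick sanity check could look like this (using the profile name configured above):
# hypothetical check with the role profile from ~/.aws/config
aws s3 ls s3://dataaccountlogs/ --profile YourRoleProfileName
aws s3 ls s3://userlogs/ --profile YourRoleProfileName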
To access from Spark:
- If you run your cluster on EMR, you should use a SecurityConfiguration and fill in the section for S3 role configuration. A different role can be specified for each specific bucket. You should use the "Prefix" constraint and list all destination prefixes after it, like "s3://dataaccountlogs/,s3://userlogs" (a sketch of such a security configuration is shown after this list).
Note: you should strictly use the s3 protocol for this, not s3a. There are also a number of limitations, which you can find here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3-optimized-committer.html
- Another way with Spark is to configure Hadoop to assume your role, by putting
spark.hadoop.fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.AssumedRoleCredentialProvider,org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider"
and configuring the role to be used:
spark.hadoop.fs.s3a.assumed.role.arn = arn:aws:iam::UserAccount:role/RoleA
This way is more general, since the EMR committer has various limitations. You can find more information on configuring this in the Hadoop docs: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/assumed_roles.html A spark-submit sketch of this setup is also shown after this list.
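For the EMR route, the security configuration JSON with EMRFS role mappings could look roughly like this (a sketch only; double-check the exact field names against the current EMR documentation, and replace the role ARN with your own):
{
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::UserAccount:role/RoleA",
          "IdentifierType": "Prefix",
          "Identifiers": [
            "s3://dataaccountlogs/",
            "s3://userlogs/"
          ]
        }
      ]
    }
  }
}
For the s3a route, a spark-submit invocation could pass the same settings on the command line (a sketch assuming Hadoop 3.1+, where the provider class lives in the .auth package; your_job.py is a placeholder for your application):
# sketch: pass the assumed-role settings as Spark conf overrides
spark-submit \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
  --conf spark.hadoop.fs.s3a.assumed.role.arn=arn:aws:iam::UserAccount:role/RoleA \
  your_job.py
The job would then read the data using s3a:// paths, e.g. s3a://dataaccountlogs/ and s3a://userlogs/.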
Source: https://stackoverflow.com/questions/53246014/s3-bucket-policy-for-instance-to-read-from-two-different-accounts