Question
I have an instance which needs to read data from S3 buckets in two different accounts:
- Bucket in DataAccount with bucket name "dataaccountlogs"
- Bucket in UserAccount with bucket name "userlogs"
I have console access to both accounts, so I need to configure bucket policies to allow the instance to read S3 data from the buckets dataaccountlogs and userlogs. My instance is running in UserAccount.
I need to access these two buckets both from the command line and from a Spark job.
Answer 1:
You will need a role in UserAccount, which will be used to access the mentioned buckets, say RoleA. The role should have permissions for the required S3 operations.
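For example, a minimal permission (identity) policy for RoleA, assuming read-only access is enough, could look roughly like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::dataaccountlogs",
        "arn:aws:s3:::dataaccountlogs/*",
        "arn:aws:s3:::userlogs",
        "arn:aws:s3:::userlogs/*"
      ]
    }
  ]
}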
Then you will be able to configure a bucket policy for each bucket:
For DataAccount:
{ "Version": "2012-10-17", "Id": "Policy1", "Statement": [ { "Sid": "test1", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::DataAccount:role/RoleA" }, "Action": "s3:*", "Resource": [ "arn:aws:s3:::dataaccountlogs", "arn:aws:s3:::dataaccountlogs/*" ] } ] }
For UserAccount:
{ "Version": "2012-10-17", "Id": "Policy1", "Statement": [ { "Sid": "test1", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::DataAccount:role/RoleA" }, "Action": "s3:*", "Resource": [ "arn:aws:s3:::userlogs", "arn:aws:s3:::userlogs/*" ] } ] }
To access them from the command line:
You will need to set up the AWS CLI tool first: https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html
Then you will need to configure a profile for using your role. First you will need to create a profile for your user to log in:
aws configure --profile YourProfileAlias
And follow instructions for setting up credentials.
Then you will need to edit the config file ~/.aws/config and add a profile for the role. Add a block at the end:
[profile YourRoleProfileName]
role_arn = arn:aws:iam::UserAccount:role/RoleA
source_profile = YourProfileAlias
After that you will be able to use aws s3api ... --profile YourRoleProfileName to access both buckets on behalf of the created role.
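For example, a quick sanity check could look like this (using the profile name configured above):
# hypothetical check with the role profile from ~/.aws/config
aws s3 ls s3://dataaccountlogs/ --profile YourRoleProfileName
aws s3 ls s3://userlogs/ --profile YourRoleProfileName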
To access from Spark:
- If you run your cluster on EMR, you should use a SecurityConfiguration and fill in the section for S3 role configuration. A different role can be specified for each specific bucket. You should use the "Prefix" constraint and list all destination prefixes after it, like "s3://dataaccountlogs/,s3://userlogs" (a sketch of such a security configuration is shown after this list).
Note: you should strictly use the s3 protocol for this, not s3a. There are also a number of limitations, which you can find here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3-optimized-committer.html
- Another way with Spark is to configure Hadoop to assume your role, by putting
spark.hadoop.fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.AssumedRoleCredentialProvider,org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider"
and configuring the role to be used:
spark.hadoop.fs.s3a.assumed.role.arn = arn:aws:iam::UserAccount:role/RoleA
This way is more general, since the EMR committer has various limitations. You can find more information on configuring this in the Hadoop docs: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/assumed_roles.html A spark-submit sketch of this setup is also shown after this list.
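For the EMR route, the security configuration JSON with EMRFS role mappings could look roughly like this (a sketch only; double-check the exact field names against the current EMR documentation, and replace the role ARN with your own):
{
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::UserAccount:role/RoleA",
          "IdentifierType": "Prefix",
          "Identifiers": [
            "s3://dataaccountlogs/",
            "s3://userlogs/"
          ]
        }
      ]
    }
  }
}
For the s3a route, a spark-submit invocation could pass the same settings on the command line (a sketch assuming Hadoop 3.1+, where the provider class lives in the .auth package; your_job.py is a placeholder for your application):
# sketch: pass the assumed-role settings as Spark conf overrides
spark-submit \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
  --conf spark.hadoop.fs.s3a.assumed.role.arn=arn:aws:iam::UserAccount:role/RoleA \
  your_job.py
The job would then read the data using s3a:// paths, e.g. s3a://dataaccountlogs/ and s3a://userlogs/.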
Source: https://stackoverflow.com/questions/53246014/s3-bucket-policy-for-instance-to-read-from-two-different-accounts