AWS Glue to access/crawl DynamoDB from another AWS account (cross-account access)

Submitted by 独自空忆成欢 on 2021-01-28 06:56:15

Question


I have written a Glue job that exports a DynamoDB table and stores it on S3 in CSV format. The Glue job and the table are in the same AWS account, but the S3 bucket is in a different AWS account. I was able to access the cross-account S3 bucket from the Glue job by attaching the following bucket policy to it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "tempS3Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AWS-ACCOUNT-ID>:role/<ROLE-PATH>"
            },
            "Action": [
                "s3:Get*",
                "s3:Put*",
                "s3:List*",
                "s3:DeleteObject*"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET-NAME>",
                "arn:aws:s3:::<BUCKET-NAME>/*"
            ]
        }
    ]
}

Now I also want to read a DynamoDB table that lives in another AWS account. Is it possible to access a cross-account DynamoDB table using a crawler? What do I need to do to achieve this?

Thanks


Answer 1:


Short answer: You can't. The crawler can only crawl DynamoDB tables in your own account.

Long answer:
You can use my workaround.

  1. Create a trust policy in account A (the account that owns the table). The one you have made will do.
  2. In account B, create a Glue job. Import boto3, assume the role from account A through STS, and create a DynamoDB resource with the temporary credentials. You can then scan the table with that resource. Check out my code:
import boto3

# Assume the role in account A (the account that owns the table).
sts_client = boto3.client('sts', region_name='your-region')
assumed_role_object = sts_client.assume_role(
    RoleArn="arn:aws:iam::accountAid:role/the-role-you-created",
    RoleSessionName="AssumeRoleSession1"
)
credentials = assumed_role_object['Credentials']

# Build a DynamoDB resource that uses the temporary credentials.
dynamodb = boto3.resource(
    'dynamodb',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken'],
    region_name='your-region'
)

table = dynamodb.Table('table-to-crawl')

# Note: a single scan call returns at most 1 MB of data.
response = table.scan()
data = response['Items']
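
One caveat on the scan: a single call returns at most 1 MB of data, so for a bigger table you have to keep scanning until the response no longer contains a 'LastEvaluatedKey'. Here is a minimal sketch of that loop. The helper name scan_all_items is my own invention (it is not part of boto3), and it only assumes that the object you pass in behaves like a boto3 Table resource.

```python
def scan_all_items(table, **scan_kwargs):
    """Collect every item from a DynamoDB table by following scan pagination.

    `table` is expected to behave like a boto3 DynamoDB Table resource:
    scan() returns a dict with 'Items' and, while more pages remain,
    a 'LastEvaluatedKey' that is passed back as 'ExclusiveStartKey'.
    """
    items = []
    kwargs = dict(scan_kwargs)  # copy so the caller's arguments stay untouched
    while True:
        response = table.scan(**kwargs)
        items.extend(response.get('Items', []))
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break  # no more pages
        kwargs['ExclusiveStartKey'] = last_key
    return items
```

With the table object from the code above, you would call it as data = scan_all_items(table). Keep in mind that this loads the whole table into the driver's memory, so it only suits tables that fit there.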

Now 'data' holds the items returned by the scan, and you can do a bunch of things with it. You can create a DynamicFrame if you wish to manipulate the data in some way:

dataF = glueContext.create_dynamic_frame.from_rdd(spark.sparkContext.parallelize(data), 'data')

Or a DataFrame, if that's what you need.
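
One thing to watch for before building a frame: the boto3 DynamoDB resource hands numbers back as decimal.Decimal and set attributes as Python sets, which may not be what you want in your frame. A rough sketch of a converter follows. The name to_plain_types is hypothetical, and turning whole numbers into int and everything else into float is my assumption about what you want, not something Glue requires.

```python
from decimal import Decimal


def to_plain_types(value):
    """Recursively replace Decimal values with int or float, and sets with lists."""
    if isinstance(value, Decimal):
        # Whole numbers become int, everything else becomes float.
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {key: to_plain_types(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain_types(item) for item in value]
    if isinstance(value, (set, frozenset)):
        # Sets have no order, so the resulting list order is arbitrary.
        return [to_plain_types(item) for item in value]
    return value
```

You could apply it as data = [to_plain_types(item) for item in data] before parallelizing. If an attribute holds whole numbers in some items and fractions in others, it ends up with mixed int and float values, so in that case you may prefer to convert everything to float or supply an explicit schema.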
I hope this helps. If you have any questions feel free to ask.



Source: https://stackoverflow.com/questions/56233581/aws-glue-to-access-crawl-dynamodb-from-another-aws-account-cross-account-access
